# Why do we need artificial neural networks?

We will train an `SGDClassifier` and an artificial neural network built with the `Sequential` API in `keras`. In both cases we build a versicolor detector (a binary classifier) from the classic iris dataset, loaded with `load_iris` from sklearn, and we will see that, in some sense, the neural network helps us avoid feature engineering.

Some functions for preprocessing the dataset live in the file `prepross_iris_data.py`. It is worth reading their documentation.

Let's begin by focusing only on the petal data (the last two columns of `iris.data`).

In [24]:
import numpy as np
from sklearn.datasets import load_iris
from prepross_iris_data import (only_sepal_or_petal, label_binary,
                                means_sepal_or_petal, traslate_data)

iris = load_iris()
X, y = iris.data, iris.target

X_pet = only_sepal_or_petal(X, kind='petal')
y_ver = label_binary(y, kind='versicolor')

As you can see, we focus only on the petal information, and the task is just to create a versicolor detector. The figure below shows that the versicolor class is not linearly separable.

<img src = 'Versicolor_class.png'> 
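The helpers imported from `prepross_iris_data.py` are not shown in this notebook. As a minimal sketch, assuming they only slice the petal columns and compare the target against the versicolor label, the same arrays can be built directly with NumPy. The names `select_petal_columns` and `versicolor_labels` are hypothetical stand-ins, not the module's actual API. The per-class means also hint at why a single straight line cannot isolate versicolor: it sits between setosa and virginica on both petal axes.

```python
import numpy as np
from sklearn.datasets import load_iris


def select_petal_columns(X):
    """Hypothetical stand-in for only_sepal_or_petal(X, kind='petal')."""
    # iris.data columns: sepal length, sepal width, petal length, petal width
    return X[:, 2:]


def versicolor_labels(y, target_names):
    """Hypothetical stand-in for label_binary(y, kind='versicolor')."""
    versicolor_id = list(target_names).index("versicolor")
    return (y == versicolor_id).astype(int)


iris = load_iris()
X_pet = select_petal_columns(iris.data)
y_ver = versicolor_labels(iris.target, iris.target_names)

# Mean petal length and width for each species
class_means = {
    str(name): X_pet[iris.target == i].mean(axis=0)
    for i, name in enumerate(iris.target_names)
}
for name, mean in class_means.items():
    print(f"{name:<10} petal length={mean[0]:.2f}, width={mean[1]:.2f}")
```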

### `SGDClassifier` with and without feature engineering

In [25]:
from sklearn.linear_model import SGDClassifier

classical_clf = SGDClassifier()
classical_clf.fit(X_pet, y_ver) # without feature engineering
print('For simplicity, the following score is just on the train set.')
print('Score: ', classical_clf.score(X_pet, y_ver))

For simplicity, the following score is just on the train set.
Score:  0.6666666666666666


We can help the classifier with some feature engineering, as follows:

1) Translate the data by the versicolor means of petal length and width, respectively.

2) Create new features by squaring the translated data from the previous step.

In [26]:
l_mean, w_mean = means_sepal_or_petal(X, kind='petal') # length mean and width mean
X_pet_tras = traslate_data(X_pet, l_mean, w_mean)
X_pet_tras_squared = np.square(X_pet_tras) # squaring the data counts as feature engineering

The following figure shows that, in the new representation, the versicolor class is closer to being linearly separable.

<img src = 'Versicolor_separable.png'> 
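To see why the two steps help, here is a minimal NumPy sketch. It assumes the translation subtracts the versicolor class means (rows where `iris.target == 1`); the notebook's `means_sepal_or_petal` and `traslate_data` may be implemented differently. After centering and squaring, versicolor samples cluster near the origin while the other two species are pushed away from it, a layout that a linear boundary handles much better.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X_pet = iris.data[:, 2:]            # petal length, petal width
is_versicolor = iris.target == 1    # label 1 is versicolor in load_iris

# Step 1: translate so the versicolor mean sits at the origin (assumption)
versicolor_mean = X_pet[is_versicolor].mean(axis=0)
X_centered = X_pet - versicolor_mean

# Step 2: square each coordinate
X_squared = np.square(X_centered)

# Average squared offset from the versicolor mean, per species
mean_sq = {
    str(name): X_squared[iris.target == i].mean(axis=0)
    for i, name in enumerate(iris.target_names)
}
for name, value in mean_sq.items():
    print(f"{name:<10} length^2={value[0]:.2f}, width^2={value[1]:.2f}")
```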

Now we train an `SGDClassifier` again, this time on the engineered features.

In [27]:
classical_clf = SGDClassifier()
classical_clf.fit(X_pet_tras_squared, y_ver) # with feature engineering
print('For simplicity, the following score is just on the train set.')
print('Score: ', classical_clf.score(X_pet_tras_squared, y_ver))

For simplicity, the following score is just on the train set.
Score:  0.9666666666666667


As you can see, the classifier performs better now: the training accuracy rises from about 0.67 to about 0.97.
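The scores above are computed on the training data. A minimal sketch of a held-out evaluation could look like the following. The stratified 70/30 split and the fixed `random_state` are illustrative choices, not part of the original notebook, and the centered and squared petal features are rebuilt with NumPy instead of the notebook's helpers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_pet = iris.data[:, 2:]                 # petal length, petal width
y_ver = (iris.target == 1).astype(int)   # 1 = versicolor, 0 = other species

X_train, X_test, y_train, y_test = train_test_split(
    X_pet, y_ver, test_size=0.3, stratify=y_ver, random_state=0
)

# Use only training rows for the versicolor mean, so no test information leaks in
train_mean = X_train[y_train == 1].mean(axis=0)


def engineer(X):
    """Center on the training versicolor mean, then square each coordinate."""
    return np.square(X - train_mean)


clf = SGDClassifier(random_state=0)
clf.fit(engineer(X_train), y_train)

train_score = clf.score(engineer(X_train), y_train)
test_score = clf.score(engineer(X_test), y_test)
print(f"Train accuracy: {train_score:.3f}")
print(f"Test accuracy:  {test_score:.3f}")
```

Computing the mean from the training rows only keeps the test set untouched until the final score.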

### Artificial neural network without feature engineering

In [28]:
import tensorflow as tf
from tensorflow import keras

model_pet = keras.models.Sequential()
model_pet.add(keras.layers.Flatten(input_shape=[2]))
model_pet.add(keras.layers.Dense(32, activation='relu'))
model_pet.add(keras.layers.Dense(32, activation='relu'))
model_pet.add(keras.layers.Dense(1, activation='sigmoid'))

model_pet.compile(loss='binary_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])

model_pet.fit(X_pet_tras, y_ver, epochs=100, verbose=0)

# evaluate() should be called on a held-out test set; here X_pet_tras is reused for simplicity
print("ANN's score: ", model_pet.evaluate(X_pet_tras, y_ver)[1])

ANN's score:  0.8999999761581421


The accuracy shows that the artificial neural network, trained without the squared features (it sees only the translated petal data), does better than the `SGDClassifier` trained without feature engineering: about 0.90 versus about 0.67 on the training set. In conclusion, we have seen a binary classification case in which, without the squared features, the artificial neural network beats the classical linear model.

The following figure shows how the artificial neural network detects the versicolor region, even though the data are not linearly separable. We could do much more to improve the model's generalization, but that was not the purpose of this notebook.

<img src='Versicolor_ann.png'>
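One common way to work on generalization is to hold out validation data and stop training once the validation loss stops improving. The sketch below is an illustration under assumptions, not the notebook's method: the split size, the `patience` value, and the epoch limit are arbitrary, and the data are centered on the versicolor mean computed with NumPy rather than with the notebook's helpers. A stratified split is used because the iris rows are ordered by species, so Keras's `validation_split` argument, which takes the last rows before shuffling, would produce a validation set containing a single class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow import keras

iris = load_iris()
X_pet = iris.data[:, 2:].astype("float32")      # petal length, petal width
y_ver = (iris.target == 1).astype("float32")    # 1 = versicolor

X_train, X_val, y_train, y_val = train_test_split(
    X_pet, y_ver, test_size=0.2, stratify=y_ver, random_state=0
)

# Center on the versicolor mean of the training rows (no squaring)
center = X_train[y_train == 1].mean(axis=0)
X_train, X_val = X_train - center, X_val - center

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])

# Stop when validation loss has not improved for 10 epochs; keep the best weights
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200, callbacks=[early_stop], verbose=0,
)

val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)
epochs_run = len(history.history["loss"])
print(f"Stopped after {epochs_run} epochs, validation accuracy {val_acc:.3f}")
```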