# Artificial Neural Networks

Have you ever wondered how we process and learn information? For example, how does our body process information so that we are able to move our hands or legs? To put it simply, the brain processes information and then sends signals to the rest of the body to trigger certain muscle movements. These signals travel through the nervous system, and neurons (nerve cells) are one of its main components. Neurons work on a threshold basis: a signal is passed on from one cell to the next only if it is stronger than a certain value. As such, when we decide to move our hands, the signals from the brain are transferred to the muscles in our hands and not to the muscles in our legs.

### From Human Learning to Artificial Neural Networks

However, this only explains how we process information. What about our ability to learn? For example, how do we know to stop at a red light, or how to kick a ball? We were trained to do so by looking at examples or by watching how other people did it. Through these examples, we were able to learn and remember.

Wouldn't it be great if computers could mimic the way humans process and learn information? With artificial neural networks, this can be done! Artificial neural networks are able to process data and 'learn' complex relationships within datasets. An illustration of a simple neural network is shown below.

The basic idea is that we feed data into the input layer. The data is then processed in the subsequent hidden layer(s); the picture below shows only one hidden layer, but there can be many. Each layer is made up of multiple artificial neurons, which apply functions to the data and pass the results on to the next layer, until the data finally reaches the output layer.

![ANN](assets/ANN.jpg)

The illustration above shows a simple neural network with 1 input layer, 1 hidden layer (between the input layer and the output layer) and 1 output layer. Each circle represents 1 node, or neuron. We usually do not count the input nodes when describing the model architecture, as the input layer is just the data being passed to the model. Described this way, the network has a hidden layer with 4 nodes/neurons and an output layer with 1 node/neuron.
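To make the flow of data through the layers concrete, here is a minimal NumPy sketch of one forward pass through a network shaped like the illustration: one hidden layer of 4 neurons and one output neuron. The number of inputs (3), the random weights and the ReLU activation are assumptions for illustration only; the picture does not specify them.

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged and turns negative values into 0
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

n_inputs = 3   # assumption: the picture's input size is not stated in the text
n_hidden = 4   # hidden layer with 4 neurons, as in the illustration
n_outputs = 1  # output layer with 1 neuron

# Each layer has a weight matrix (inputs x neurons) and one bias per neuron
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    # Hidden layer: weighted sum of the inputs plus bias, then the activation function
    hidden = relu(x @ W1 + b1)
    # Output layer: weighted sum of the hidden neurons' outputs plus bias
    output = hidden @ W2 + b2
    return hidden, output

x = np.array([[0.5, -1.2, 3.0]])  # one made-up data row with 3 features
hidden, output = forward(x)
print(hidden.shape, output.shape)  # (1, 4) (1, 1)
```

Each layer only needs the previous layer's output, which is why data can be passed through any number of hidden layers in the same way.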

The output layer will show the results of the neural network.

How does a neural network work? How can neural networks be useful for machine learning projects? Watch this [video](https://www.youtube.com/watch?v=aircAruvnKk) to find out more about artificial neural networks. Pause the video and take time to understand how the neural network works. Note down any interesting information about neural networks on your worksheet. Can you also draw the network (similar to the illustration above) if it has 1 input layer with 5 nodes, 2 hidden layers with 3 nodes each and 1 output layer with 2 nodes?

### Training the neural network
After understanding the different features of an artificial neural network, one question still remains: how does the network "learn"?

Take, for example, a young basketball player who is learning to shoot a 3-point shot. If he shoots and misses because the shot is too short, he will adjust and increase the strength of his next shot. If the next shot goes too far to the right of the basket, he will again adjust, aiming more towards the center. The player continues to do this until the shot is made, and then remembers the exact strength and direction to use for 3-point shots in the future.

This is similar to how neural networks are trained. First, the data is passed through the network and a predicted output is produced. This is known as forward propagation. The predicted output is then compared to the actual output, and the difference between the two is passed backwards through the model. During this backward pass, adjustments are made to the model's weights so that the difference between the predicted and actual outputs is reduced. This is known as backpropagation. After the adjustments are made, the data is passed through the network from the input layer again and a new predicted output is produced. This new prediction is again compared to the actual output, the difference is passed backwards through the model, and more adjustments are made.

The process of forward propagation and backpropagation is repeated until the difference between the predicted output and the actual output is minimised. The model is then trained and can be used to make predictions on similar data.
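The forward propagation and backpropagation loop described above can be sketched from scratch with NumPy. This is a minimal illustration, not how Keras implements training: it assumes a made-up XOR dataset, sigmoid activations, a mean squared error loss and plain gradient descent with a fixed learning rate. Whether this tiny network fully solves XOR depends on its random starting weights; what matters here is that repeated forward and backward passes reduce the loss.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset (made up for illustration): XOR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden neurons -> 1 output neuron
W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)

learning_rate = 0.5
losses = []
for epoch in range(5000):
    # Forward propagation: compute the predicted output
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Difference between predicted and actual output (mean squared error)
    loss = np.mean((y_hat - y) ** 2)
    losses.append(loss)

    # Backpropagation: pass the error backwards using the chain rule
    d_yhat = 2 * (y_hat - y) / len(X)   # dLoss/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)  # through the output sigmoid
    d_W2 = h.T @ d_z2
    d_b2 = d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T                    # error passed back to the hidden layer
    d_z1 = d_h * h * (1 - h)             # through the hidden sigmoid
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0)

    # Adjust every weight a small step in the direction that reduces the loss
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1

print(f"loss at start: {losses[0]:.4f}, loss at end: {losses[-1]:.4f}")
```

In Keras, model.fit runs this loop for you, and the optimizer (for example 'adam') decides exactly how each adjustment is made.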

To better visualise how backpropagation works, watch this [video](https://www.youtube.com/watch?v=Ilg3gGewQ5U) and note down any information that interests you.

**Bonus: You are not required to understand the actual adjustments made within the model. However, if you are mathematically inclined or really interested in understanding all the adjustments, and have the time, you can watch the 2 videos listed below.**

- [video 1](https://www.youtube.com/watch?v=IHZwWFHWa-w)
- [video 2](https://www.youtube.com/watch?v=tIeHLnjs5U8)

## Training a neural network with the Iris Flower dataset

Let's try training a neural network on the Iris Flower dataset! We will start by importing the libraries and the data.

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

In [2]:
df = pd.read_csv('datasets/iris.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   Class         150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [3]:
df.describe()

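|       | sepal_length | sepal_width | petal_length | petal_width |
|-------|--------------|-------------|--------------|-------------|
| count | 150.0        | 150.0       | 150.0        | 150.0       |
| mean  | 5.843333     | 3.057333    | 3.758        | 1.199333    |
| std   | 0.828066     | 0.435866    | 1.765298     | 0.762238    |
| min   | 4.3          | 2.0         | 1.0          | 0.1         |
| 25%   | 5.1          | 2.8         | 1.6          | 0.3         |
| 50%   | 5.8          | 3.0         | 4.35         | 1.3         |
| 75%   | 6.4          | 3.3         | 5.1          | 1.8         |
| max   | 7.9          | 4.4         | 6.9          | 2.5         |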


Now, we have to split the dataset into the x values (the features from which the model can learn relationships) and the y values (the target values, or expected output, of the model).

#### Standardization

We also have to standardize the dataset. What is standardization for? To understand this, look at the summary statistics above! Notice that each column has a different mean and standard deviation, which makes the variables difficult to compare. Standardization rescales each column to a common mean (0) and standard deviation (1), so that all the features are on a comparable scale. Refer to the [standardscaler graph here](https://benalexkeen.com/feature-scaling-with-scikit-learn/) and see how the data changes before and after scaling.

Putting all the features on a similar scale generally helps neural networks train faster and more reliably. The code below extracts the x values as x_values and standardises them. We will extract the y_values later.
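Standardisation applies the formula z = (x - mean) / standard deviation to each column separately. The sketch below, using a few made-up rows with two features on very different scales, checks that doing this by hand with NumPy gives the same result as StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few made-up rows with two features on very different scales
data = np.array([[5.1, 140.0],
                 [4.9, 300.0],
                 [6.3, 220.0],
                 [5.8, 180.0]])

# Manual standardisation: subtract each column's mean, divide by its standard deviation.
# StandardScaler uses the population standard deviation (divide by n), which is numpy's default.
mean = data.mean(axis=0)
std = data.std(axis=0)
manual = (data - mean) / std

scaled = StandardScaler().fit_transform(data)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(axis=0))          # each column now has mean ~0
print(scaled.std(axis=0))           # and standard deviation 1
```

After scaling, a change of "1" means "one standard deviation" in every column, so no feature dominates just because its raw numbers are larger.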

In [4]:
x_values = df.drop('Class', axis=1)
x_values.head()

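|   | sepal_length | sepal_width | petal_length | petal_width |
|---|--------------|-------------|--------------|-------------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         |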


In [5]:
standardize = StandardScaler()
x_values = standardize.fit_transform(x_values)
x_values_df = pd.DataFrame(x_values)

In [6]:
x_values_df.describe()

|       | 0             | 1            | 2             | 3             |
|-------|---------------|--------------|---------------|---------------|
| count | 150.0         | 150.0        | 150.0         | 150.0         |
| mean  | -4.736952e-16 | -7.81597e-16 | -4.263256e-16 | -4.736952e-16 |
| std   | 1.00335       | 1.00335      | 1.00335       | 1.00335       |
| min   | -1.870024     | -2.433947    | -1.567576     | -1.447076     |
| 25%   | -0.9006812    | -0.592373    | -1.226552     | -1.183812     |
| 50%   | -0.05250608   | -0.1319795   | 0.3364776     | 0.1325097     |
| 75%   | 0.6745011     | 0.5586108    | 0.7627583     | 0.7906707     |
| max   | 2.492019      | 3.090775     | 1.785832      | 1.712096      |


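See how this is different from the original? Each mean is now essentially 0 (values such as -4.7e-16 are just floating-point rounding error) and each standard deviation is essentially 1. The small difference from exactly 1 (1.00335) arises because describe() reports the sample standard deviation, which divides by n - 1, while StandardScaler scales using the population standard deviation, which divides by n: sqrt(150/149) ≈ 1.00335.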

### Build the neural network

We can now build a simple neural network. To do so, we need to import Sequential and Dense from the Keras library, which is available in TensorFlow as tensorflow.keras.

**Sequential**

The Sequential model allows you to first create an empty model object, and then add layers to it one after another in sequence.

**Dense**

A dense layer is simply a layer of neurons in which every neuron is connected to every output of the previous layer (this is also called a fully connected layer).

In [7]:
# Import both from tensorflow.keras so that they come from the same Keras package
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

Let us build a neural network with 1 input layer, 2 hidden layers and 1 output layer. There are no fixed rules for deciding how many nodes should be in the hidden layers.

For this neural network, we will use 6 hidden nodes for each hidden layer.

With regard to the output layer, we should use as many nodes as there are classes. How many nodes should we use for the output layer?

In [8]:
# initialize the network as model
model = Sequential()

# Add the first hidden layer with 6 nodes. input_dim is the number of columns/features in x_values, i.e. the size of the input layer.
# activation sets how the nodes/neurons are activated. We will use relu. Other common activations are 'sigmoid' and 'tanh'.
model.add(Dense(6, input_dim=4, activation='relu'))

# Add the second hidden layer with 6 nodes.
model.add(Dense(6, activation='relu'))

# Add the output layer with 3 nodes, one per class. We use 'softmax', the standard choice for categorical (multi-class) targets:
# it turns the 3 outputs into probabilities that add up to 1.
model.add(Dense(3, activation='softmax'))

# Compile the model. The optimizer is the method used to make the adjustments within the model.
# The loss measures the difference between the predicted output and the actual output; 'categorical_crossentropy' suits one-hot encoded classes.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

_Optional_: [Comparisons between activation functions](http://www.machineintellegence.com/different-types-of-activation-functions-in-keras/). There are many more activation functions. They act a bit like on-off switches that decide whether (and how strongly) a neuron passes its input on. You are not expected to know these functions in detail for now.
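For the curious, here is a minimal NumPy sketch of the activation functions mentioned in the code comments above (relu, sigmoid, tanh) and of softmax, which is used in the output layer. These are the standard textbook formulas written out for illustration, not Keras' own implementation.

```python
import numpy as np

def relu(z):
    # 'on' for positive inputs (passed through unchanged), 'off' (0) for negative inputs
    return np.maximum(0.0, z)

def sigmoid(z):
    # squashes any number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any number into the range (-1, 1)
    return np.tanh(z)

def softmax(z):
    # turns a vector of scores into probabilities that sum to 1;
    # subtracting the max first avoids overflow and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))
print(tanh(z))
print(softmax(z))  # three probabilities, largest for the largest score
```

Because softmax always produces probabilities that add up to 1, it is a natural fit for an output layer with one node per class.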

We can also print out the model summary after it has been compiled. Try the code below to see:

- The layers and their order in the model.
- The output shape of each layer.
- The number of parameters (weights) in each layer.
- The total number of parameters (weights) in the model.

In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 6)                 30        
                                                                 
 dense_1 (Dense)             (None, 6)                 42        
                                                                 
 dense_2 (Dense)             (None, 3)                 21        
                                                                 
Total params: 93
Trainable params: 93
Non-trainable params: 0
_________________________________________________________________
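The parameter counts in the summary can be checked by hand. A Dense layer has one weight for every connection between an input and a neuron, plus one bias per neuron, so its parameter count is inputs × neurons + neurons. The helper name dense_params below is hypothetical and used only for this check.

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_inputs * n_units + n_units

# (inputs, neurons) for each Dense layer in `model`: 4 -> 6 -> 6 -> 3
layers = [(4, 6), (6, 6), (6, 3)]
per_layer = [dense_params(i, u) for i, u in layers]
print(per_layer, sum(per_layer))  # [30, 42, 21] 93
```

The same formula explains model1 later in this notebook: its extra hidden layer of 6 nodes adds 6 × 6 + 6 = 42 parameters, giving 30 + 42 + 42 + 21 = 135.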


As the y_values are categorical (categories rather than numbers), we have to convert them into numbers before we can train the neural network. We will use one-hot encoding, which creates one column per class containing 1 for that class and 0 otherwise. pd.get_dummies works directly on text labels, so no separate label-encoding step is needed here. Some other tools, such as Keras' to_categorical, expect integer labels; in that case you would first perform label encoding (remember doing this in an earlier notebook?) and then one-hot encoding.

In [10]:
# Convert the classes to one-hot encoding.
# dtype=int gives 0/1 columns (recent pandas versions default to True/False).
y_values = pd.get_dummies(df['Class'], dtype=int)
print(y_values)

     Setosa  Versicolor  Virginica
0         1           0          0
1         1           0          0
2         1           0          0
3         1           0          0
4         1           0          0
..      ...         ...        ...
145       0           0          1
146       0           0          1
147       0           0          1
148       0           0          1
149       0           0          1

[150 rows x 3 columns]
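To see what label encoding followed by one-hot encoding looks like, here is a small NumPy sketch on a few made-up labels. np.unique sorts the class names alphabetically, which matches the column order that pd.get_dummies produced above.

```python
import numpy as np

labels = np.array(["Setosa", "Versicolor", "Virginica", "Setosa", "Virginica"])  # made-up sample

# Label encoding: map each class name to an integer (classes sorted alphabetically)
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['Setosa' 'Versicolor' 'Virginica']
print(codes)    # [0 1 2 0 2]

# One-hot encoding: row i of the identity matrix represents class i
one_hot = np.eye(len(classes), dtype=int)[codes]
print(one_hot)
```

Each row of one_hot contains a single 1, in the column of that row's class, exactly like the table printed by pd.get_dummies.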


Now, let us train the model.

In [11]:
# Train model with x_values and y_values.
# Epochs refer to the number of times the full dataset will be used to train the model.
# shuffle=True tells the model to randomise the order of the dataset before each epoch.
# batch_size=1 means the weights are adjusted after every single data row.
model.fit(x_values, y_values, epochs=10, shuffle=True, batch_size=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x211fa24ea70>

Well done! You have trained your first neural network. Before we look at the accuracy, we have to understand some of the terms in the output.

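An epoch is one complete pass over the full dataset; epochs=10 means the model sees every data row 10 times.

The time per step (shown as us/step or ms/step) is the average time taken for each training step, i.e. each batch. With batch_size=1, every step processes a single data row.

accuracy shows the fraction of training data rows that the model classified correctly during that epoch.

Notice how these numbers change over the different epochs.

From the model above, what accuracy did you get?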
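How is accuracy calculated for a model like this? With one-hot targets, it is the fraction of rows where the column with the highest predicted probability matches the column holding the 1. Below is a minimal NumPy sketch with made-up numbers; it illustrates the idea behind the 'accuracy' value Keras reports for this kind of model, while Keras computes it for you during training.

```python
import numpy as np

# Made-up one-hot targets and predicted probabilities for 4 data rows
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],   # wrong: predicts class 1, true class is 2
                   [0.3, 0.4, 0.3]])

# Predicted class = column with the highest probability; true class = position of the 1
correct = np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.75
```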

Try to see if you can get a better accuracy by adding another hidden layer to the model. The additional hidden layer can have the same number of nodes as the previous layer.

In [12]:
# Initialise the neural network as model
model1 = Sequential()

# Add the first hidden layer with 6 nodes. input_dim is the number of columns/features in x_values, i.e. the size of the input layer.
# activation sets how the nodes/neurons are activated. We will use relu. Other common activations are 'sigmoid' and 'tanh'.
model1.add(Dense(6,input_dim=4,activation='relu'))

# Add the second hidden layer with 6 nodes.
model1.add(Dense(6,activation='relu'))
# Add the third hidden layer with 6 nodes.
model1.add(Dense(6,activation='relu'))

# Add the output layer with 3 nodes, one per class. 'softmax' turns the outputs into class probabilities that add up to 1.
model1.add(Dense(3,activation='softmax'))

# Compile the model. The optimizer is the method used to make the adjustments within the model.
# The loss measures the difference between the predicted output and the actual output.
model1.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [18]:
model1.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 6)                 30        
                                                                 
 dense_4 (Dense)             (None, 6)                 42        
                                                                 
 dense_5 (Dense)             (None, 6)                 42        
                                                                 
 dense_6 (Dense)             (None, 3)                 21        
                                                                 
Total params: 135
Trainable params: 135
Non-trainable params: 0
_________________________________________________________________


In [13]:
# Train model with x_values and y_values. 
# Epochs refer to the number of times the full dataset will be used to train the model.
# shuffle=True tells the model to randomise the order of the dataset before each epoch.
model1.fit(x_values,y_values,epochs=10,shuffle=True, batch_size=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x211fa5f1390>

After adding another hidden layer, you may observe that the accuracy is higher than that of the initial model. This suggests that adding more layers can increase the accuracy, but this isn't always the case. Results also vary from run to run, because the weights start from random values.

**Bonus: You can try other methods to improve the accuracy. How about increasing the number of nodes in the hidden layer? Do you get a higher accuracy if you do so?**

In [14]:
# Initialise the neural network as model
model2 = Sequential()

# Add the first hidden layer with 6 nodes. input_dim is the number of columns/features in x_values, i.e. the size of the input layer.
# activation sets how the nodes/neurons are activated. We will use relu. Other common activations are 'sigmoid' and 'tanh'.
model2.add(Dense(6,input_dim=4,activation='relu'))

# Add the second hidden layer, now with 10 nodes.
model2.add(Dense(10,activation='relu'))
# Add the third hidden layer, also with 10 nodes.
model2.add(Dense(10,activation='relu'))

# Add the output layer with 3 nodes, one per class. 'softmax' turns the outputs into class probabilities that add up to 1.
model2.add(Dense(3,activation='softmax'))

# Compile the model. The optimizer is the method used to make the adjustments within the model.
# The loss measures the difference between the predicted output and the actual output.
model2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
# Train model with x_values and y_values.
# Epochs refer to the number of times the full dataset will be used to train the model.
# shuffle=True tells the model to randomise the order of the dataset before each epoch.
model2.fit(x_values,y_values,epochs=10,shuffle=True, batch_size=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x211fa5c2980>

### Validation Dataset

From the above, we can see that the neural network can be trained for any number of epochs. This means that the network can keep learning from the same dataset many times. What do you think will happen if the model keeps learning from the same dataset? Do you think the network will be able to obtain very high accuracy by doing so? The code below continues training model2 for another 40 epochs, starting from the weights it has already learnt.

In [15]:
model2.fit(x_values,y_values,epochs=40,shuffle=True, batch_size=1)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<keras.callbacks.History at 0x211fc892ef0>

You were right if you thought that the model's accuracy would increase as it continues to learn. You can try this on the same dataset: run the code below, which trains a fresh model for 200 epochs, and observe the accuracy. Is it higher than the accuracy of the first model, which had the same layers but was trained for only 10 epochs?

In [16]:
# Initialise the neural network as model
model4 = Sequential()

# Add the first hidden layer with 6 nodes. input_dim is the number of columns/features in x_values, i.e. the size of the input layer.
# activation sets how the nodes/neurons are activated. We will use relu. Other common activations are 'sigmoid' and 'tanh'.
model4.add(Dense(6,input_dim=4,activation='relu'))

# Add the second hidden layer with 6 nodes.
model4.add(Dense(6,activation='relu'))

# Add the output layer with 3 nodes, one per class. 'softmax' turns the outputs into class probabilities that add up to 1.
model4.add(Dense(3,activation='softmax'))

# Compile the model. The optimizer is the method used to make the adjustments within the model.
# The loss measures the difference between the predicted output and the actual output.
model4.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

# Train model with x_values and y_values. Epochs refer to the number of times the full dataset will be used to train the model.
# shuffle=True randomises the order of the dataset before each epoch, so the model does not always see the rows in the same order.
model4.fit(x_values,y_values,epochs=200,shuffle=True, batch_size=1)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

<keras.callbacks.History at 0x211fc893be0>

You can observe from the printout that the accuracy score is now above 99%. Wow! It seems amazing that just increasing the number of epochs allows the accuracy to increase. But is it good that the model is so highly accurate on the data it was trained on? Just imagine: will a soccer player do well in a match if he/she only trains extremely hard on scoring goals from one specific spot on the field? Or will a baker be able to bake a delicious cake to a customer's request if he/she has only learnt to bake a cake of one specific flavour, shape and size?

The soccer player will not perform well in a match as he/she may not be able to shoot accurately from other parts of the field. The baker will not be able to bake a delicious cake as he/she will only know 1 specific flavour.

The idea behind these questions is the concept of overfitting. If a model overfits, it will not be able to generalise, i.e. it will not properly predict data that it has not seen before.

Overfitting applies to all machine learning techniques. Once a model has fitted the training data too closely, it will not generalise to data that it has not seen before. As such, we usually train the model on only a fraction of the dataset and keep the remainder as a test or validation set to check whether the model has overfitted. While the model trains on the training set, its accuracy on the test set should at first increase. However, once the model starts to overfit, the accuracy on the test set will start to decrease. If you see the test accuracy begin to decrease after a certain epoch, you should not train the model any further.
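Stopping when the validation accuracy stops improving is called early stopping. The sketch below shows the idea in plain Python on a made-up list of validation accuracies (not results from this notebook): it remembers the best epoch and stops once accuracy has not improved for `patience` epochs in a row. The function name best_stopping_epoch and the patience value of 3 are illustrative choices.

```python
def best_stopping_epoch(val_accuracies, patience=3):
    """Return the (1-based) epoch with the best validation accuracy, stopping the
    search once accuracy has not improved for `patience` epochs in a row."""
    best_acc = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation accuracy has stopped improving: stop training
    return best_epoch

# Made-up validation accuracies: they rise, then fall as the model starts to overfit
history = [0.60, 0.72, 0.81, 0.87, 0.90, 0.89, 0.88, 0.86, 0.85]
print(best_stopping_epoch(history))  # 5
```

Keras provides the same idea as a callback: with `import tensorflow as tf`, you can pass `callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3, restore_best_weights=True)]` to model.fit together with validation_data.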

Let us apply the split between a train and a test set to the dataset that we are using here.

First, we need to decide how much data to withhold so that the model does not train on it. Usually, we withhold about 20% to 30% of the dataset. In this example, we will keep 20% of the dataset as the test/validation set. We can use the train_test_split function from the sklearn library. Try the code below.

In [18]:
from sklearn.model_selection import train_test_split

# Extract out original x_values from the dataframe df. 
# We have to re-extract the x_values as the standardisation should only be based on the data that the model will train with.
# Thus, we have to split the data first before standardising.
x_values = df[['sepal_length','sepal_width','petal_length','petal_width']]

# test_size=0.20 indicates that 20% of the datapoints will be in x_test and y_test whereas 80% will be in x_train and y_train.
# random_state=10 is used to ensure that the split is the same every time you run the code below.
# This is because the split is done randomly every time. Using the same random_state is the only way to ensure the same split every time.
x_train, x_test, y_train, y_test = train_test_split(x_values,y_values,test_size=0.20,random_state=10)

# Check the number of rows in x_train, x_test, y_train and y_test
print("Number of rows in x_train:", x_train.shape[0])
print("Number of rows in x_test:", x_test.shape[0])
print("Number of rows in y_train:", y_train.shape[0])
print("Number of rows in y_test:", y_test.shape[0])

# We can now standardise the x values.
# Initialise the StandardScaler
standardise = StandardScaler()

# Standardise the x_train values using .fit_transform
x_train = standardise.fit_transform(x_train)

# Standardise the x_test values using .transform. 
# There is no need to fit the data as the standardisation should be the same as that of x_train.
x_test = standardise.transform(x_test)

Number of rows in x_train: 120
Number of rows in x_test: 30
Number of rows in y_train: 120
Number of rows in y_test: 30
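Roughly speaking, train_test_split shuffles the rows and cuts off a fraction of them as the test set. The NumPy sketch below mimics that idea (it is not scikit-learn's actual implementation) on made-up data of the same size as the Iris dataset, and then standardises both sets using statistics computed from the training rows only, which is what fit_transform on x_train followed by transform on x_test does.

```python
import numpy as np

rng = np.random.default_rng(10)

# Stand-in data (made up): 150 rows, 4 features, like the Iris measurements
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))

# Shuffle the row indices, then cut off 20% of them for the test set
indices = rng.permutation(len(X))
n_test = int(len(X) * 0.20)
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]

# Learn the mean and standard deviation from the training rows only...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
# ...and apply the same numbers to both sets (like fit_transform / transform)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

The test set is scaled with the training statistics so that no information about the test rows leaks into the preparation of the data the model learns from.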


In [19]:
# Initialise the neural network as model
model_val = Sequential()

# Add the first hidden layer with 6 nodes. input_dim is the number of columns/features in x_values, i.e. the size of the input layer.
# activation sets how the nodes/neurons are activated. We will use relu. Other common activations are 'sigmoid' and 'tanh'.
model_val.add(Dense(6,input_dim=4,activation='relu'))

# Add the second hidden layer with 6 nodes.
model_val.add(Dense(6,activation='relu'))

# Add the output layer with 3 nodes, one per class. 'softmax' turns the outputs into class probabilities that add up to 1.
model_val.add(Dense(3,activation='softmax'))

# Compile the model. The optimizer is the method used to make the adjustments within the model.
# The loss measures the difference between the predicted output and the actual output.
model_val.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model_val.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 6)                 30        
                                                                 
 dense_15 (Dense)            (None, 6)                 42        
                                                                 
 dense_16 (Dense)            (None, 3)                 21        
                                                                 
Total params: 93
Trainable params: 93
Non-trainable params: 0
_________________________________________________________________


In [20]:
# Train model with x_train and y_train.
# Epochs refer to the number of times the full training set will be used to train the model.
# shuffle=True tells the model to randomise the order of the training data before each epoch.
# validation_data lets us pass in the test/validation set, so the printout also shows the model's accuracy on data it does not train on.
model_val.fit(x_train,y_train,epochs=50,shuffle=True,validation_data=(x_test,y_test), batch_size=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x211fdab2a70>

Did you notice that the printout now shows the validation accuracy as well? Is the validation accuracy higher or lower than the training accuracy?

We can also use the model to identify the flower types of newly gathered data. For example, imagine that your friend has measured the ``sepal_length``, ``sepal_width``, ``petal_length`` and ``petal_width`` of some flowers and saved the data in a file called ``"iris_predict.csv"``. Your friend wants to find out the flower types based on these measurements. Would you be able to use your model to help? Which flower types did your friend measure? Try the code below!

You can use the ``.predict`` method of the trained model (``model_val`` here) to obtain the flower types. The values returned by ``.predict`` are the probabilities that the model assigns to each flower type for each data row: the higher the probability, the more confident the model is in that flower type. For example, if the value in the second column is very high, the model thinks the flower is a versicolor. You also need to scale the new data, using the same scaler that was fitted on the training data, before obtaining the flower types.

In [26]:
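df2 = pd.read_csv('datasets/iris_predict.csv')
# Keep only the 4 measurement columns used for training
x = df2[['sepal_length','sepal_width','petal_length','petal_width']]
# Scale with the scaler fitted on x_train (transform only, no fitting)
x_scaled = standardise.transform(x)
# Each row of y_pred holds the probability of each of the 3 flower types
y_pred = model_val.predict(x_scaled)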



In [27]:
print(y_pred)

[[9.98875320e-01 1.12366502e-03 9.10445920e-07]
 [6.28031557e-05 5.40373147e-01 4.59564060e-01]
 [4.78343536e-05 9.37768579e-01 6.21835366e-02]
 [9.17562772e-08 7.06989737e-03 9.92929995e-01]
 [1.19578205e-02 9.47605550e-01 4.04366516e-02]
 [1.75949780e-03 8.62990618e-01 1.35249913e-01]
 [5.31231053e-04 9.96929348e-01 2.53946520e-03]
 [1.17281536e-06 9.44635482e-04 9.99054134e-01]
 [1.15284696e-02 9.88050640e-01 4.20994504e-04]
 [5.74647538e-06 2.09561363e-02 9.79038119e-01]]


What are the flower types for each data row? You can use the [argmax](https://www.geeksforgeeks.org/numpy-argmax-python/) function to help you identify the flower types from y_pred. 

In [28]:
# For each row, take the index of the column with the highest probability
flower_types = np.argmax(y_pred, axis=1).tolist()
print(flower_types)

[0, 1, 1, 2, 1, 1, 1, 2, 1, 2]


- 0: setosa
- 1: versicolor
- 2: virginica
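The argmax step can be combined with the class names and with the highest probability, which tells you how confident the model is about each row. The sketch below copies three of the y_pred rows printed above by hand and assumes the class order setosa, versicolor, virginica from the one-hot encoding.

```python
import numpy as np

# Three of the y_pred rows printed above (rows 1, 2 and 4), copied by hand
y_pred = np.array([[9.98875320e-01, 1.12366502e-03, 9.10445920e-07],
                   [6.28031557e-05, 5.40373147e-01, 4.59564060e-01],
                   [9.17562772e-08, 7.06989737e-03, 9.92929995e-01]])
class_names = ["setosa", "versicolor", "virginica"]  # column order of the one-hot encoding

predicted = np.argmax(y_pred, axis=1)  # index of the most likely class in each row
confidence = np.max(y_pred, axis=1)    # probability of that class

for idx, p in zip(predicted, confidence):
    print(f"{class_names[idx]:<10} (probability {p:.2f})")
```

The second row is only about 54% versicolor versus 46% virginica, so the model is much less certain about that flower than about the others.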

Congrats! You have successfully learnt how to create and train an artificial neural network model. Artificial neural networks are extremely powerful tools that can be used on very large datasets. For example, if you have a few hundred columns/features and over 100,000 data points, it may be beneficial to use an artificial neural network instead of other machine learning techniques. Just remember, there are no strict rules on the number of hidden layers or the number of nodes in each hidden layer. Experiment with different neural networks to see which one gives the best result for your dataset.