<a href="https://colab.research.google.com/github/hussain0048/Deep-Learning-with-Keras/blob/master/Deep_Learning_with_Keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1 Introduction to Deep Learning with Keras** [1]

## **1.1 Introducing Keras**


### **1.1.1 What is Keras?**

This course will add Keras as a powerful tool to your arsenal. Keras is a high-level deep learning framework. To understand what is meant by that, we can compare it to a lower-level framework like Theano.


**2. Theano vs Keras**

Building a neural network in Theano can take many lines of code and requires a deep understanding of how neural networks work internally. Building and training the very same network in Keras takes only a few lines of code. Much quicker, right?

**3. Keras**

Keras is an open source deep learning library that enables fast experimentation with neural networks. It runs on top of other frameworks like TensorFlow, Theano or CNTK, and it was created by French AI researcher François Chollet.

**4. Why use Keras?**

Why use Keras instead of lower-level libraries like TensorFlow? With Keras you can build industry-ready models in no time, with much less code than Theano, as we saw before, and a higher level of abstraction than that offered by TensorFlow. This allows you to quickly and easily check whether a neural network will solve your problem. In addition, you can build any architecture you can imagine, from simple networks to more complex ones like auto-encoders, convolutional or recurrent neural networks. Keras models can also be deployed across a wide range of platforms like Android, iOS, web apps, etc.

**5. Keras + TensorFlow**

It's the best moment to be learning Keras. Keras is now fully integrated into TensorFlow 2.0, so you can use the best of both worlds as needed and in the same code pipeline. If as you dive into deep learning you find yourself needing to use low-level features, for instance to have a finer control of how your network applies gradients, you could use TensorFlow and tweak whatever you need. Now that you know better what Keras is and why to use it, perhaps we should discuss when and why to use neural networks in the first place.

**6. Feature Engineering**

Neural networks are good feature extractors, since they learn the best way to make sense of unstructured data. Previously, it was the domain expert who had to set rules based on experimentation and heuristics to extract the relevant features of the data. Neural networks can learn the best features and their combinations, so they can perform feature engineering themselves. That's why they are so useful. But what is unstructured data?

**7. Unstructured data**

Unstructured data is data that is not easily put into a table, for instance sound, video, images, etc. It's also the type of data where performing feature engineering can be more challenging, which is why leaving this task to neural networks is a good idea.

**8. So, when to use neural networks?**

If you are dealing with unstructured data, you don't need to interpret the results and your problem can benefit from a known architecture, then you probably should use neural networks. For instance, when classifying images of cats and dogs: Images are unstructured data, we don't care as much about why the network knows it's a cat or a dog, and we can benefit from convolutional neural networks. So it's wise to use neural networks. You will learn about the usefulness of convolutional neural networks later on in the course.

### **1.1.2 Your first neural network**

**2. A neural network?**

A neural network is a machine learning algorithm in which the training data is fed to the input layer and the predicted value is read from the output layer.

![](
https://drive.google.com/uc?export=view&id=1vuNIxT1t1hCvufkkykiXz0qWzuf-Q6s_)


**3. Parameters**

Each connection from one neuron to another has an associated weight, w. Each neuron, except those in the input layer, which just hold the input values, also has an extra weight we call the bias weight, b. During feed-forward, our input gets transformed by weight multiplications and additions at each layer. The output of each neuron can also be transformed by applying what's called an activation function.

![](https://drive.google.com/uc?export=view&id=1oKV0pBpn3rKbpV_Anq4lOEjlb8psug00)
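
As a rough illustration of the feed-forward step described above, the NumPy sketch below pushes one input through a network with 3 inputs, 2 hidden neurons and 1 output. The weight and bias values are made up for the example, and ReLU is an assumed choice of activation; this is not Keras code, only the kind of arithmetic a stack of Dense layers performs.

```python
import numpy as np

def relu(z):
    """ReLU activation: keeps positive values, zeroes out negatives."""
    return np.maximum(0.0, z)

# Hypothetical parameters for a 3 -> 2 -> 1 network (values chosen by hand)
W1 = np.array([[0.5, -1.0],
               [1.0,  0.0],
               [-0.5, 2.0]])   # shape (3 inputs, 2 hidden neurons)
b1 = np.array([0.0, 1.0])      # one bias per hidden neuron
W2 = np.array([[1.0],
               [-1.0]])        # shape (2 hidden neurons, 1 output)
b2 = np.array([0.5])           # bias of the output neuron

def feed_forward(x):
    """Compute the network output for a single input vector x."""
    hidden = relu(x @ W1 + b1)   # weighted sum plus bias, then activation
    return hidden @ W2 + b2      # output layer, no activation

x = np.array([1.0, 2.0, 3.0])
output = feed_forward(x)
print(output)
```

The output layer applies no activation here, which mirrors the regression-style networks built later in this chapter.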

**4. Gradient descent**

Learning in neural networks consists of tuning the weights or parameters to give the desired output. One way of achieving this is by using the famous gradient descent algorithm, and applying weight updates incrementally via a process known as back-propagation. That was a lot of theory! The code in Keras is much simpler as we will see now.
![](https://drive.google.com/uc?export=view&id=10Gk8kB7JB5zASSu5irmBZUrBsgb0L03Z)
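
To make the idea of incremental weight updates more concrete, here is a minimal sketch of gradient descent on a model with a single weight. The toy data, the learning rate and the number of steps are all assumptions chosen for illustration; a real network has many weights, and back-propagation is what supplies the gradient for each of them.

```python
import numpy as np

# Toy data following y = 2 * x (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0               # initial guess for the single weight
learning_rate = 0.05  # assumed step size

losses = []
for step in range(100):
    y_pred = w * x
    error = y_pred - y
    losses.append(np.mean(error ** 2))   # mean squared error at this step
    gradient = np.mean(2.0 * error * x)  # derivative of the loss w.r.t. w
    w -= learning_rate * gradient        # move against the gradient

print(w, losses[0], losses[-1])
```

The weight moves towards 2 and the loss shrinks at every step, which is the same behaviour you will watch for in the Keras training logs.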


**5. The sequential API**

Keras allows you to build models in two different ways; using either the Functional API or the Sequential API. We will focus on the Sequential API. This is a simple, yet very powerful way of building neural networks that will get you covered for most use cases. With the sequential API you're essentially building a model as a stack of layers. You can start with an input layer.

![](
https://drive.google.com/uc?export=view&id=1EOGcX5oGwa2CI4A1Xt86Ap27oeA19DVO)

**6. The sequential API**

Add a couple of hidden layers.


**7. The sequential API**

And finally end your model by adding an output layer. Let's go through a code example.

**8. Defining a neural network**

To create a simple neural network we'd do the following: import the Sequential model from keras.models and the Dense layer, also known as a fully connected layer, from keras.layers. We can then create an instance of a Sequential model. In the next line of code we add two layers: a 2-neuron Dense fully connected layer, and an input layer consisting of 3 neurons. The input layer is defined with the input_shape parameter; this first layer matches the dimensions of our input data. We finally add another fully connected layer, this time with 1 neuron. We've built the network shown in the image.

![](
https://drive.google.com/uc?export=view&id=1DhHRl7HJTi2DMFPAufbxzWeMLGXe02a7)

**9. Adding activations**

In order to add an activation function to our layers we can make use of the activation argument.

**10. Adding activations**

For instance, this is how we'd add a ReLU activation to our hidden layer. Don't worry about the choice of activation functions; that will be covered later on in the course.

**11. Summarize your model!**

Once we've created our model we can call the summary() method on it. This displays a table with 3 columns: the first with each layer's name and type, the second with the shape of the outputs produced by each layer, and the third with the number of parameters, that is, the weights of each neuron in the layer, including its bias weight. When the input layer is defined via the input_shape parameter, as we did before, it is not shown as a layer in the summary; it's included in the layer where it was defined, in this case the dense_3 layer.

![](https://drive.google.com/uc?export=view&id=1KYJqGrtAE33RRvqAgn-RnbsXFa2xQSpX)

**12. Visualize parameters**

That's why we see that this layer has 8 parameters: 6 parameters or weights come from the connections of the 3 input neurons to the 2 neurons in this layer, and the remaining 2 parameters come from the bias weights, b0 and b1, one for each neuron in the hidden layer.

![](https://drive.google.com/uc?export=view&id=1QXwcJ72KfKBlMMgDNHG24elHfAAQRPv7)

**13. Visualize parameters**

These add up to 8 different parameters.
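
The counting rule above generalizes to any Dense layer: one weight per connection plus one bias per neuron. The helper below is a hypothetical function written for this note (it is not part of Keras) that applies the rule to the 3-2-1 network discussed in this lesson.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (one per input-to-neuron connection) plus one bias per neuron."""
    return n_inputs * n_units + n_units

# Layer sizes of the network discussed above: 3 inputs -> 2 hidden -> 1 output
layer_sizes = [3, 2, 1]

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # parameters per Dense layer
print(sum(counts))  # total number of parameters
```

The first entry reproduces the 8 parameters of the hidden layer; the second shows that the 1-neuron output layer contributes 2 weights and 1 bias.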

#### **1.1.2.1 Hello nets!**
You're going to build a simple neural network to get a feeling for how quick it is to accomplish this in Keras.

You will build a network that takes two numbers as an input, passes them through a hidden layer of 10 neurons, and finally outputs a single non-constrained number.

A non-constrained output can be obtained by avoiding setting an activation function in the output layer. This is useful for problems like regression, when we want our output to be able to take any non-constrained value.


**Instruction**

- Import the Sequential model from keras.models and the Dense layer from keras.layers.
- Create an instance of the Sequential model.
- Add a 10-neuron hidden Dense layer with an input_shape of two neurons.
- Add a final 1-neuron output layer and summarize your model with summary().

In [None]:
# Import the Sequential model and Dense layer
from keras.models import Sequential
from keras.layers import Dense

# Create a Sequential model
model = Sequential()

# Add an input layer and a hidden layer with 10 neurons
model.add(Dense(10, input_shape=(2,), activation="relu"))

# Add a 1-neuron output layer
model.add(Dense(1))

# Summarise your model
model.summary()

![](
https://drive.google.com/uc?export=view&id=1KYJqGrtAE33RRvqAgn-RnbsXFa2xQSpX)

#### **1.1.2.2 Counting parameters**

You've just created a neural network. But you're going to create a new one now, taking some time to think about the weights of each layer. The Keras Dense layer and the Sequential model are already loaded for you to use.

This is the network you will be creating:

**Instruction 1/2** 

- Instantiate a new Sequential() model.
- Add a Dense() layer with five neurons and three neurons as input.
- Add a final dense layer with one neuron and no activation.

**Hint**
- The input_shape is a tuple of the form (input_size,).
- Your dense layers should be enclosed between the model.add() parentheses.



In [None]:
# Instantiate a new Sequential model
model = Sequential()

# Add a Dense layer with five neurons and three inputs
model.add(Dense(5, input_shape=(3,), activation="relu"))

# Add a final Dense layer with one neuron and no activation
model.add(Dense(1))

# Summarize your model
model.summary()

![](
https://drive.google.com/uc?export=view&id=1v2Ql6jRe8604-ZWRFmF3GKvECZ4mnhmA)

#### **1.1.2.3 Build as shown!**

You will take on a final challenge before moving on to the next lesson. Build the network shown in the picture below. Prove you've mastered Keras basics in no time!

**Instruction**

- Instantiate a Sequential model.
- Build the input and hidden layer.
- Add the output layer.


In [None]:
from keras.models import Sequential
from keras.layers import Dense

# Instantiate a Sequential model
model = Sequential()

# Build the input and hidden layer
model.add(Dense(3, input_shape=(2,), activation='relu'))

# Add the output layer
model.add(Dense(1))

Perfect! You've shown you can already translate a visual representation of a neural network into Keras code. Let's keep going!

### **1.1.3-Surviving a meteor strike**

**1. Surviving a meteor strike**

Welcome back! We are going to learn just what you're missing to be able to save the earth from a meteor strike. But first, let's see how to compile, train and predict with your models!

**2. Recap**

In the previous lesson we saw how easy it was to create a model with Keras. You instantiate your Sequential model, add a couple of layers and their activations, and that's it: you've built a simple model in no time.

**3. Compiling**

A model needs to be compiled before training. We can compile our model by calling the compile method on it. The compile method receives an optimizer, which we can see as the algorithm that will be used to update our neural network weights, and a loss function, which is the function we want to minimize during training. In this case, we choose adam as our optimizer and mean squared error as our loss function. Optimizers and loss functions will be covered later on in the course, so don't worry about it for now. Compiling our model produces no output. Our model is now ready to train!


![](https://drive.google.com/uc?export=view&id=1Ai59-e0if_bvOZ9XNiXCHIQXJDJdzqWk)

**4. Training**

Creating a model is useless if we don't train it. We train our model by calling the fit method and passing the features in X_train, the labels in y_train and the number of epochs to train for. An epoch corresponds to our entire training data passing through the network once, together with the respective weight updates during back-propagation. As our model is being trained, we will get some output showing the progress. We can see the model is improving since the mean squared error loss is decreasing at each epoch.

In [None]:
# Train your model
model.fit(X_train, y_train, epochs=5)

**5. Predicting**

To obtain predictions from our trained model we just need to call predict on the new set of data. We can store the predictions in a variable for later use. The predictions are just numbers in a numpy array, we will interpret these depending on our dataset and problem at hand.

In [None]:
# Predict on new data
preds = model.predict(X_test)
# Look at the predictions
print(preds)

**6. Evaluating**

To quickly evaluate how well our model performs on unseen data we can use the model's evaluate method. This performs feed-forward with all samples in our test dataset (X_test). Feed-forward consists in computing a model's outputs from a given set of inputs. It then computes the error comparing the results to the true values stored in y_test. In this particular example, the model we trained for 5 epochs before, has a mean squared error of 0.25.

In [None]:
# evaluate your result
model.evaluate(X_test,y_test)
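
When the loss is mean squared error, the number reported by evaluate is the average squared difference between the true values and the model's predictions. The sketch below computes that quantity directly with NumPy; the arrays are hypothetical values made up for this note, not the data behind the 0.25 quoted above.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical true values and model predictions
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)
```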

**7. The problem at hand**

Are you ready?! A meteor is approaching the earth and we want to make sure it won't take us to extinction. A group of scientists is trying to estimate the orbit by using historical data gathered about previous orbits of similar meteors.

![](
https://drive.google.com/uc?export=view&id=1asHHOzX2uyr5dCmn0PGwEu1m8Q7eY1Pd)


**8. Scientific prediction**

Scientists have used this data alongside their knowledge to estimate an 80-minute orbit, that is, an orbit from -40 minutes to +40 minutes. t=0 corresponds to the time of crossing the impact region. It looks like the meteor will be close! Perhaps it won't hit us, but we must make sure we are right!

**9. Your task**

You have data for the path a previous meteor took during a period of 20 minutes, 10 minutes before and 10 minutes after crossing the impact region. You will train a model on this data and extrapolate your predictions to an 80-minute orbit to see how it compares to the scientists' prediction. Will your orbit be similar to that of the scientists, where we survive by only a small margin?
![](
https://drive.google.com/uc?export=view&id=1u9vRuASe1NlnsMmXeJScKLySxSzYztz5)

![](https://drive.google.com/uc?export=view&id=1WNeqkyKSxwa3Hc1TArGvjxnou_v8eTiP
)

#### **1.1.3.1 Specifying a model**

You will build a simple regression model to predict the orbit of the meteor!

Your training data consists of measurements taken at time steps from 10 minutes before the impact region to 10 minutes after (t = -10 to t = +10). Each time step can be viewed as an X coordinate in our graph, which has an associated position Y for the meteor orbit at that time step.

Note that you can view this problem as approximating a quadratic function via the use of neural networks.

This data is stored in two numpy arrays: one called time_steps, containing the features, and another called y_positions, containing the labels. Go on and build your model! It should be able to predict the y positions for the meteor orbit at future time steps.

Keras Sequential model and Dense layers are available for you to use.


**Instruction**
- Instantiate a Sequential model.
- Add a Dense layer of 50 neurons with an input shape of 1 neuron.
- Add two Dense layers of 50 neurons each and 'relu' activation.
- End your model with a Dense layer with a single neuron and no activation.

In [None]:
# Instantiate a Sequential model
model = Sequential()
# Add a Dense layer with 50 neurons and an input of 1 neuron
model.add(Dense(50, input_shape=(1,), activation='relu'))
# Add two Dense layers with 50 neurons and relu activation
model.add(Dense(50,activation='relu'))
model.add(Dense(50,activation='relu'))
# End your model with a Dense layer and no activation
model.add(Dense(1))

You are closer to forecasting the meteor orbit! It's important to note we aren't using an activation function in our output layer since y_positions aren't bounded and they can take any value. Your model is built to perform a regression task.

#### **1.1.3.2 Training**

You're going to train your first model in this course, and for a good cause!

Remember that before training your Keras models you need to compile them. This can be done with the .compile() method. The .compile() method takes arguments such as the optimizer, used for weight updating, and the loss function, which is what we want to minimize. Training your model is as easy as calling the .fit() method, passing on the features, labels and a number of epochs to train for.

The regression model you built in the previous exercise is loaded for you to use, along with the time_steps and y_positions data. Train it and evaluate it on this very same data, let's see if your model can learn the meteor's trajectory.

**Instruction**

- Compile your model making use of the 'adam' optimizer and 'mse' as your loss function.
- Fit your model using the features and labels for 30 epochs.
- Evaluate your model with the .evaluate() method, passing the features and labels used during training.

In [None]:
# Compile your model
model.compile(optimizer = 'adam', loss = 'mse')
print("Training started..., this can take a while:")
# Fit your model on your data for 30 epochs
model.fit(time_steps,y_positions, epochs = 30)
# Evaluate your model 
print("Final loss value:",model.evaluate(time_steps, y_positions))

Final loss value: 0.11601032888400369
Amazing! You can check the console to see how the loss function decreased as epochs went by. Your model is now ready to make predictions on unseen data.

#### **1.1.3.3 Predicting the orbit!**

You've already trained a model that approximates the orbit of the meteor approaching Earth and it's loaded for you to use.

Since you trained your model for values between -10 and 10 minutes, your model hasn't yet seen any other values for different time steps. You will now visualize how your model behaves on unseen data.

If you want to check the source code of plot_orbit, paste show_code(plot_orbit) into the console.

Hurry up, the Earth is running out of time!

Remember np.arange(x,y) produces a range of values from x to y-1. That is the [x, y) interval.
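
A quick check of the half-open interval makes it clear why the stop values in this exercise are 11 and 41 rather than 10 and 40. The variable names below are chosen for this note only.

```python
import numpy as np

# The stop value is excluded, so 11 is needed to include 10
twenty_min_steps = np.arange(-10, 11)
eighty_min_steps = np.arange(-40, 41)

print(twenty_min_steps[0], twenty_min_steps[-1], len(twenty_min_steps))
print(eighty_min_steps[0], eighty_min_steps[-1], len(eighty_min_steps))

# Optional: one column per feature, i.e. a (samples, features) layout
column = twenty_min_steps.reshape(-1, 1)
print(column.shape)
```

If your Keras version complains about the shape of a one-dimensional input, passing the reshaped column instead is a reasonable thing to try.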

Use the model's .predict() method to predict from -10 to 10 minutes.

In [None]:
# Import NumPy (preloaded in the original exercise environment)
import numpy as np

# Predict the twenty-minute orbit
twenty_min_orbit = model.predict(np.arange(-10, 11))

# Plot the twenty minute orbit 
plot_orbit(twenty_min_orbit)

![](https://drive.google.com/uc?export=view&id=1yrUjhuvgdxLMap8SNeKavX6JIfp5qn1H)

In [None]:
# Predict the eighty minute orbit
eighty_min_orbit = model.predict(np.arange(-40, 41))

# Plot the eighty minute orbit 
plot_orbit(eighty_min_orbit)

Your model fits the scientists' trajectory perfectly for time values between -10 and +10, the region where the meteor crosses the impact region, so we won't be hit! However, it starts to diverge when predicting for new values it wasn't trained on. This shows that neural networks learn according to the data they are fed. Data quality and diversity are very important. You've barely scratched the surface of what neural networks can do. Are you prepared for the next chapter?

![](
https://drive.google.com/uc?export=view&id=1rnb0PHvr-zHKO1J-wf2A0Bra5TztaPyT)

# **2 Going Deeper**


### **1.2.1 Binary classification**


**1. Binary classification**

You're now ready to learn about binary classification, so let's dive in.

**2. When to use binary classification?**

You will use binary classification when you want to solve problems where you predict whether an observation belongs to one of two possible classes. A simple binary classification problem could be learning the boundaries to separate blue from red circles as shown in the image.

![](https://drive.google.com/uc?export=view&id=1UYqpsy_0Lq4eIGZRa6idqXjQXyqDqU51)

**3. Our dataset**

The dataset for this problem is very simple. The coordinates are pairs of values corresponding to the X and Y coordinates of each circle in the graph. The labels are 1 for red circles and 0 for the blue circles.

![](https://drive.google.com/uc?export=view&id=1OnE1rovNG_UWwvLM1RJwpT6Redg6nXSN)

**4. Pairplots**

We can make use of seaborn's pairplot function to explore a small dataset and identify whether our classification problem will be easily separable. We can get an intuition for this if we see that the classes separate well enough along several variables. In this case, for the circles dataset, there is a very clear boundary: the red circles concentrate at the center while the blue ones are outside. It should be easy for our network to find a way to separate them based only on the x and y coordinates.

![](https://drive.google.com/uc?export=view&id=11Qext6FFSD9xD6ObdifRm9GvKzr_WAla)

**7. The NN architecture 3**

Then we have one hidden layer with four neurons. Four is a good enough number to learn the separation of classes in this dataset. This was found by experimentation.

**8. The NN architecture 4**

We finally end up with a single output neuron which makes use of the sigmoid activation function. It's important to note that, regardless of the activation functions used for the previous layers, we need the sigmoid activation function for this last output node.


**9. The sigmoid function**

The sigmoid activation function squashes the value that the output neuron computes from the second-to-last layer into a floating point number between 0 and 1.

**10. The sigmoid function**

You can consider the output of the sigmoid function as the probability of a pair of coordinates being in one class or another. So we can set a threshold and say everything below 0.5 will be a blue circle and everything above a red one.

![](https://drive.google.com/uc?export=view&id=16I4Jv7-qkldO9-6zGUN45hOx_MIdH1sW)

![](https://drive.google.com/uc?export=view&id=1q1gmN39GmBNI0K1-zUCHqe2KnZITE2Yx)
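
The squashing and thresholding described above can be sketched in a few lines of NumPy. The input values are hypothetical, and treating an output of exactly 0.5 as class 0 is an assumption made here, since the text only covers values below and above the threshold.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values of the output neuron
z = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])
probabilities = sigmoid(z)

# Threshold at 0.5: above is class 1 (red), otherwise class 0 (blue)
predicted_labels = (probabilities > 0.5).astype(int)

print(probabilities)
print(predicted_labels)
```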

**11. Let's build it**

So let's build our model in Keras. We start by importing the Sequential model and the Dense layer. We then instantiate a Sequential model.

**12. Let's build it**

We add a hidden layer of 4 neurons and define an input shape, which consists of 2 neurons. We use tanh as the activation function for this hidden layer. Activation functions are covered later in the course, so don't worry about this choice for now.

**13. Let's build it**

We finally add an output layer which contains a single neuron. We make use of the sigmoid activation function so that we get the behavior we expect from this network, that is, a value between 0 and 1. Our model is now ready to be trained.

![](https://drive.google.com/uc?export=view&id=1Z-s34OwqTGx__Okz0hBmzhMOTqlYL6Qs)

![](https://drive.google.com/uc?export=view&id=1Cdcj9wln-YAUnBg6Peq5OCEiOJkOdhuV)

#### **1.2.1.1 Exploring dollar bills**

You will practice building classification models in Keras with the Banknote Authentication dataset.

Your goal is to distinguish between real and fake dollar bills. In order to do this, the dataset comes with 4 features: variance, skewness, kurtosis and entropy. These features are calculated by applying mathematical operations over the dollar bill images. The labels are found in the dataframe's class column.

**Instruction**

- Import seaborn as sns.
- Use seaborn's pairplot() on banknotes and set hue to be the name of the column containing the labels.
- Generate descriptive statistics for the banknotes authentication data.
- Count the number of observations per label with .value_counts().

In [None]:
# Import seaborn and matplotlib (plt is needed to show the plot)
import seaborn as sns
import matplotlib.pyplot as plt
# Use pairplot and set the hue to be our class column
sns.pairplot(banknotes, hue='class')
# Show the plot
plt.show()
# Describe the data
print('Dataset stats: \n', banknotes.describe())
# Count the number of observations per class
print('Observations per class: \n', banknotes['class'].value_counts())

![](
https://drive.google.com/uc?export=view&id=1B1MpQeZt1Kg82l6mAED9jogBDJFa885J)

#### **1.2.1.2 Is this dollar bill fake?**

You are now ready to train your model and check how well it performs when classifying new bills! The dataset has already been partitioned into features: X_train & X_test, and labels: y_train & y_test.

**Instruction**

- Train your model for 20 epochs calling .fit(), passing in the training data.
- Check your model accuracy using the .evaluate() method on the test data.
- Print accuracy.


In [None]:
# Train your model for 20 epochs
model.fit(X_train, y_train, epochs = 20)
# Evaluate your model accuracy on the test set
accuracy = model.evaluate(X_test, y_test)[1]
# Print accuracy
print('Accuracy:', accuracy)

Accuracy: 0.8252427167105443


Alright! It looks like you are getting a high accuracy even with this simple model!

### **1.2.2 Multi-class classification**

**1. Multi-class classification**

What about when we have more than two classes to classify? We run into a multi-class classification problem, but don't worry, we just have to tweak our neural network architecture.

**2. Throwing darts**

Identifying who threw which dart in a game of darts is a good example of a multi-class classification problem. Each dart can only be thrown by one competitor. That means our classes are mutually exclusive: no dart can be thrown by two different competitors simultaneously.

**3. The dataset**

The darts dataset consists of dart throws by different competitors. The coordinate pairs xCoord and yCoord show where each dart landed.

**4. The dataset**

Based on the landing position of previously thrown darts we should be able to distinguish between throwers if there's enough variation among them. In our pairplot we can see players tend to aim at certain regions of the board.

![](https://drive.google.com/uc?export=view&id=19F566c9S2ANiBtqcoNjwwkfFJ5u_RPC4)

![](https://drive.google.com/uc?export=view&id=1BZs1-YxO9mebsdM9i_algsc2NCie2zpb)

**5. The architecture**

The model for this dataset has two neurons as inputs, since our predictors are xCoord and yCoord. We will define them using the input_shape argument, just as we've done before.

**6. The architecture**

In between there will be a series of hidden layers: we are using 3 Dense layers of 128, 64 and 32 neurons respectively.

**7. The architecture**

As outputs we have 4 neurons, one per competitor. Let's look closer at the output layer.

![](https://drive.google.com/uc?export=view&id=1eHha7Z85p0IgGhyDCR4j0BHTdiqeHLX3)

**8. The output layer**

We have 4 outputs, each linked to a possible competitor. Each competitor has a probability of having thrown a given dart, so we must make sure the total sum of probabilities for the output neurons equals one. We achieve this with the softmax activation function. Once we have a probability per output neuron we then choose as our prediction the competitor whose associated output has the highest probability.

![](https://drive.google.com/uc?export=view&id=1_hail1aH8YBmkvE2QkTUVcrEhchRtU81)
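
To see how softmax turns raw outputs into probabilities that sum to one, consider the NumPy sketch below. The scores are hypothetical values for the 4 output neurons, and subtracting the maximum before exponentiating is a common numerical-stability trick, not something stated in the lesson.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Hypothetical raw outputs of the 4 output neurons for one dart
scores = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(scores)

print(probs, probs.sum())
print(np.argmax(probs))  # index of the most likely competitor
```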

**9. Multi-class model**

You can build this model as we did in the previous lesson: instantiate a sequential model, add a hidden layer, also defining an input layer with the input_shape parameter, and finish by adding the remaining hidden layers and an output layer with softmax activation. You will do all this yourself in the exercises.

![](https://drive.google.com/uc?export=view&id=1RYWoEBIo6X1bw-BmbbLb7Be-njP8RIOF
)

**10. Categorical cross-entropy**

When compiling your model, instead of binary cross-entropy as we used before, we now use categorical cross-entropy or log loss. Categorical cross-entropy measures the difference between the predicted probabilities and the true label of the class we should have predicted. So if we should have predicted 1 for a given class, taking a look at the graph we see we would get high loss values for predicting close to 0 (since we'd be very wrong) and low loss values for predicting closer to 1 (the true label).

![](https://drive.google.com/uc?export=view&id=1DlfFP_A0DIJpgWMnhdFhqd-lEcdvEp5z
)
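
The behaviour of the loss described above can be checked numerically. The function below is a simplified NumPy sketch of categorical cross-entropy for a single observation, with made-up probability vectors; the clipping constant is an assumption used to avoid taking the logarithm of zero, and this is not Keras's own implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0, 0.0, 0.0])  # the first class is the true one

confident_right = np.array([0.90, 0.05, 0.03, 0.02])
confident_wrong = np.array([0.05, 0.90, 0.03, 0.02])

low_loss = categorical_crossentropy(y_true, confident_right)
high_loss = categorical_crossentropy(y_true, confident_wrong)
print(low_loss, high_loss)
```

Because the label is one-hot, only the probability assigned to the true class contributes, so the loss reduces to the negative logarithm of that probability.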

**11. Preparing a dataset**

Since our outputs are vectors containing the probabilities of each class, our neural network must also be trained with vectors representing this concept. To achieve that we make use of the to_categorical function from keras.utils. We first turn our response variable into a categorical variable with pandas Categorical, which allows us to redefine the column using the categorical codes (cat codes) of the different categories. Now that our categories are each represented by a unique integer, we can use the to_categorical function to turn them into one-hot encoded vectors, where each component is 0 except for the one corresponding to the labeled category.

![](https://drive.google.com/uc?export=view&id=1jW4UBNYXLNpq_wDL-NlZ-cnZSSWVPp7Z)

**12. One-hot encoding**

Keras to_categorical essentially performs the process described in the picture above. The label-encoded Apple, Chicken and Broccoli turn into vectors of 3 components. A 1 is placed to represent the presence of the class and a 0 to indicate its absence.

![](https://drive.google.com/uc?export=view&id=1tjy2FpyNgDXRyz_LFOsQb7ivk74ok4ce)
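
The two steps, label encoding followed by one-hot encoding, can be reproduced with NumPy alone. This is a sketch of the idea rather than what pandas or Keras do internally, and the integer codes follow alphabetical order of the categories here, which may differ from the numbering in the picture.

```python
import numpy as np

labels = np.array(["Apple", "Chicken", "Broccoli", "Apple"])

# Label encoding: map each distinct category to an integer code
categories, codes = np.unique(labels, return_inverse=True)

# One-hot encoding: row i of the identity matrix is the vector for code i
one_hot = np.eye(len(categories))[codes]

print(categories)
print(codes)
print(one_hot)
```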

#### **1.2.2.1 A multi-class model**

You're going to build a model that predicts who threw which dart only based on where that dart landed! (That is the dart's x and y coordinates on the board.)

This problem is a multi-class classification problem since each dart can only be thrown by one of 4 competitors. So classes/labels are mutually exclusive, and therefore we can build an output layer with as many neurons as competitors and use the softmax activation function to achieve a total sum of probabilities of 1 over all competitors.

Keras Sequential model and Dense layer are already loaded for you to use.

**Instruction**

- Instantiate a Sequential model.
- Add 3 dense layers of 128, 64 and 32 neurons each.
- Add a final dense layer with as many neurons as competitors.
- Compile your model using categorical_crossentropy loss.

#### **1.2.2.2 Prepare your dataset**

In the console you can check that your labels, darts.competitor, are not yet in a format your network can understand. They contain the names of the competitors as strings. You will first turn these competitors into unique numbers, then use the to_categorical() function from keras.utils to turn these numbers into their one-hot encoded representation.

This is useful for multi-class classification problems, since there are as many output neurons as classes and for every observation in our dataset we just want one of the neurons to be activated.

The darts dataset is loaded as darts. Pandas is imported as pd. Let's prepare this dataset!

**Instruction 1**

- Use the Categorical() method from pandas to transform the competitor column.
- Assign a number to each competitor using the cat.codes attribute from the competitor column.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes 

# Print the label encoded competitors
print('Label encoded competitors: \n',darts.competitor.head())

![](
https://drive.google.com/uc?export=view&id=1IJLVmZrkt69yjzUbZw97uAAmrGjJFKf-)

**Instruction 2**

- Import to_categorical from keras.utils.
- Apply to_categorical() to your labels.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes 

# Import to_categorical from keras utils module
from keras.utils import to_categorical

coordinates = darts.drop(['competitor'], axis=1)
# Use to_categorical on your labels
competitors = to_categorical(darts.competitor)

# Now print the one-hot encoded labels
print('One-hot encoded competitors: \n',competitors)

![](https://drive.google.com/uc?export=view&id=1sTycp1dzBzwwRg0zl9f8XGSpm5fHY5pj
)

Great! Each competitor is now a vector of length 4, full of zeroes except for the position representing her or himself.
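To make the two encoding steps concrete, here is a minimal NumPy-only sketch of what label encoding followed by one-hot encoding produces. The competitor names are made up for illustration and are not taken from the darts dataset; `np.unique` and `np.eye` are used here as stand-ins for `pd.Categorical` and `to_categorical()`.

```python
import numpy as np

# Hypothetical competitor names (illustrative, not the real darts data)
names = np.array(['Steve', 'Susan', 'Kate', 'Steve', 'Michael'])

# Label encoding: each distinct name gets an integer code (names sorted alphabetically)
classes, codes = np.unique(names, return_inverse=True)

# One-hot encoding: row i of the identity matrix is the vector for class i
one_hot = np.eye(len(classes))[codes]

print(classes)   # distinct names in sorted order
print(codes)     # one integer code per throw
print(one_hot)   # one row per throw, with a single 1 in each row
```

Each row sums to 1 because every throw belongs to exactly one competitor, which is what makes this encoding a natural match for a softmax output layer.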

#### **1.3.3 Training on dart throwers**

Your model is now ready, just as your dataset. It's time to train!

The coordinates features and competitors labels you just transformed have been partitioned into coord_train, coord_test and competitors_train, competitors_test.

Your model is also loaded. Feel free to visualize your training data or model.summary() in the console.

Let's find out who threw which dart just by looking at the board!

**instruction**

- Train your model on the training data for 200 epochs.
- Evaluate your model accuracy on the test data.


```python
# Fit your model to the training data for 200 epochs
model.fit(coord_train, competitors_train, epochs=200)

# Evaluate your model accuracy on the test data
accuracy = model.evaluate(coord_test, competitors_test)[1]

# Print accuracy
print('Accuracy:', accuracy)
```

Your model just trained for 200 epochs! The accuracy on the test set is quite high. How are the predictions looking? Let's find out!


Accuracy: 0.85

#### **1.3.4 Softmax predictions**

Your recently trained model is loaded for you. This model is generalizing well, which is why you got a high accuracy on the test set.

Since you used the softmax activation function, for every input of 2 coordinates provided to your model there's an output vector of 4 numbers. Each of these numbers encodes the probability of a given dart being thrown by one of the 4 possible competitors.

When computing accuracy with the model's .evaluate() method, your model takes the class with the highest probability as the prediction. np.argmax() can help you do this since it returns the index with the highest value in an array.

Use the collection of test throws stored in coords_small_test and np.argmax() to check this out!


**instruction  1**

- Predict with your model on coords_small_test.
- Print the model predictions.

```python
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions', 'True labels'))
for i, pred in enumerate(preds):
    print("{} | {}".format(pred, competitors_small_test[i]))
```

![](https://drive.google.com/uc?export=view&id=1BrNsjSp6G_EMRc9W_LOvCx1qcAywMU0r)

**instruction 2**

- Use np.argmax() to extract the index of the most probable competitor from each pred vector in preds.

```python
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions', 'True labels'))
for i, pred in enumerate(preds):
    print("{} | {}".format(pred, competitors_small_test[i]))

# Extract the position of highest probability from each pred vector
preds_chosen = [np.argmax(pred) for pred in preds]

# Print preds vs true values
print("{:10} | {}".format('Rounded Model Predictions', 'True labels'))
for i, pred in enumerate(preds_chosen):
    print("{:25} | {}".format(pred, competitors_small_test[i]))
```

![](https://drive.google.com/uc?export=view&id=1i9dek-iLNpynRFXVSpIkAWW2CrJbDVLl)

Well done! As you've seen you can easily interpret the softmax output. This can also help you spot those observations where your network is less certain on which class to predict, since you can see the probability distribution among classes per prediction. Let's learn how to solve new problems with neural networks!
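As a rough illustration of how the probability distribution reveals certainty, the sketch below applies a hand-written softmax to two made-up vectors of raw scores. The scores are hypothetical and the `softmax` helper is a plain NumPy version written for this example, not the Keras implementation.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical raw scores for two throws, 4 competitors each
confident = softmax(np.array([0.2, 4.0, 0.1, 0.3]))
uncertain = softmax(np.array([1.0, 1.2, 0.9, 1.1]))

for probs in (confident, uncertain):
    print(np.round(probs, 2), '-> class', np.argmax(probs),
          'with probability', round(float(probs.max()), 2))
```

Both throws are assigned to the same class by np.argmax(), but the second one wins with a much smaller probability, which is the kind of observation worth inspecting by hand.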

### **1.4 Multi-label classification**

**1. Multi-label classification**

Now that you know how multi-class classification works, we can take a look at multi-label classification. They both deal with predicting classes, but in multi-label classification, a single input can be assigned to more than one class.

**2. Real world examples**

We could use multi-label classification, for instance, to tag a series' genres by its plot summary.


**4. Multi-class vs multi-label**

Imagine we had three classes: sun, moon and clouds. In multi-class problems, if we took a sample of our observations, each individual in the sample would belong to exactly one class. However, in a multi-label problem each individual in the sample can have all, none or a subset of the available classes. As you can see in the image, multi-label vectors are also binary encoded: there's a 1 or a 0 representing the presence or absence of each class, but more than one position can be 1.

1 https://gombru.github.io/2018/05/23/cross_entropy_loss/

![](
https://drive.google.com/uc?export=view&id=1ZeFAOFjvQpIWhPQurHG0_cMbOxj3Z8vQ)

**5. The architecture**

Making a multi-label model for this problem is not very different to what you did when building your multi-class model. We first instantiate a sequential model. For the sake of this example, we will assume that to differentiate between these 3 classes, we need just one input and 2 hidden neurons. The biggest changes happen in the output layer and in its activation function. In the output layer, we use as many neurons as possible classes but we use sigmoid activation this time.

![](https://drive.google.com/uc?export=view&id=11wquJNNBo_bLTWGDIzgwotnDqk4wbRVM)

**6. Sigmoid outputs**

We use sigmoid outputs because we no longer care about the sum of probabilities. We want each output neuron to be able to individually take a value between 0 and 1. This can be achieved with the sigmoid activation because it constrains our neuron output in the range 0-1. That's what we did in binary classification, though we only had one output neuron there.

![](https://drive.google.com/uc?export=view&id=1ePx9KweW5gjN4UC0wo2xIiUbrd8sFRos)
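The difference between the two output activations can be sketched with NumPy. The raw scores below are hypothetical, and both helper functions are minimal versions written for this example rather than the Keras ones.

```python
import numpy as np

def sigmoid(z):
    """Squash each value independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Squash a vector into probabilities that compete and sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores for the classes sun, moon and clouds
scores = np.array([2.0, -1.0, 3.0])

sig = sigmoid(scores)
soft = softmax(scores)

print('sigmoid:', np.round(sig, 2), 'sum =', round(float(sig.sum()), 2))
print('softmax:', np.round(soft, 2), 'sum =', round(float(soft.sum()), 2))
```

With sigmoid, both sun and clouds can be above 0.5 at the same time, whereas softmax forces the classes to share a total probability of 1.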

**7. Compile and train**

Binary cross-entropy is now used as the loss function when compiling the model. You can look at it as if you were solving several binary classification problems at once: for each output we are deciding whether or not its corresponding label is present. When training our model we can use the validation_split argument to print validation loss and accuracy as it trains. By using validation_split, a fraction of the training data is held out and used to validate the model at the end of each epoch.

![](https://drive.google.com/uc?export=view&id=1IOj8sOaR5cVz4n9jKbPfIeBqfTom5cVd
)
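To see why this loss treats every output as its own yes/no decision, here is a small NumPy sketch of the binary cross-entropy formula averaged over the outputs. The target, the two predictions and the `eps` clipping value are assumptions made for this example; it is not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all outputs."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid taking log(0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(losses.mean())

# Hypothetical target and two candidate predictions for 3 labels
y_true = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])
bad = np.array([0.2, 0.7, 0.4])

good_loss = binary_crossentropy(y_true, good)
bad_loss = binary_crossentropy(y_true, bad)

print('good prediction loss:', good_loss)
print('bad prediction loss: ', bad_loss)
```

Each label contributes its own term to the average, so a prediction is penalized separately for every label it gets wrong.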

**8. An advantage**

Multi-label classification with neural networks only requires minor tweaks to our model architecture. If we were to use a classical machine learning approach to solve multi-label problems we would need more complex methods. One way to do so consists of training several classifiers to distinguish each particular class from the rest. This is called one versus rest classification.

![](https://drive.google.com/uc?export=view&id=1-XktWRtOKv1j3DF1LIVrZbZvV5dvVf4-
)

**9. An irrigation machine**

Let's tackle a new problem. A farm field has an array of 20 sensors distributed along 3 crop fields. These sensors measure, among other things, the humidity of the soil, radiation of the sun, etc. Your task is to use the combination of measurements of these sensors to decide which parcels to water, given each parcel has different environmental requirements.

**10. An irrigation machine**

Each sensor measures an integer value between 0 and 13 volts. Parcels can be represented as binary vectors of length 3, where each index is one of the parcels and a 1 means that parcel should be watered. Parcels can be watered simultaneously.

![](https://drive.google.com/uc?export=view&id=1NicNkAZL6MeV7olxScbkTUx_dP--0tJh)

![](https://drive.google.com/uc?export=view&id=1D0XRAscH9JnLMlgsrHPSZUAzpRehc-p2
)

#### **1.4.1 An irrigation machine**

You're going to automate the watering of farm parcels by making an intelligent irrigation machine. Multi-label classification problems differ from multi-class problems in that each observation can be labeled with zero or more classes. So classes/labels are not mutually exclusive: you could water all, none or any combination of farm parcels based on the inputs.

To account for this behavior what we do is have an output layer with as many neurons as classes but this time, unlike in multi-class problems, each output neuron has a sigmoid activation function. This makes each neuron in the output layer able to output a number between 0 and 1 independently.

Keras Sequential() model and Dense() layers are preloaded. It's time to build an intelligent irrigation machine!

**Instruction**

- Instantiate a Sequential() model.
- Add a hidden layer of 64 neurons with as many input neurons as there are sensors and relu activation.
- Add an output layer with as many neurons as parcels and sigmoid activation.
- Compile your model with the adam optimizer and binary_crossentropy loss.


```python
# Instantiate a Sequential model
model = Sequential()

# Add a hidden layer of 64 neurons that takes the 20 sensor inputs
model.add(Dense(64, activation='relu', input_shape=(20,)))

# Add an output layer of 3 neurons with sigmoid activation
model.add(Dense(3, activation='sigmoid'))

# Compile your model with binary crossentropy loss
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()
```

![](
https://drive.google.com/uc?export=view&id=1S1s1Z9ufGXvckmTxJbZVD0Gs_frMnElH
)

#### **1.4.2 Training with multiple labels**

An output of your multi-label model could look like this: [0.76, 0.99, 0.66]. If we round probabilities higher than 0.5 up to 1, this observation will be classified as containing all 3 possible labels [1, 1, 1]. For this particular problem, this would mean watering all 3 parcels in your farm is the right thing to do, according to the network, given the input sensor measurements.

You will now train and predict with the model you just built. sensors_train, parcels_train, sensors_test and parcels_test are already loaded for you to use.

Let's see how well your intelligent machine performs!

**Instruction**

- Train the model for 100 epochs using a validation_split of 0.2.
- Predict with your model using the test data.
- Round your preds to 0 or 1 with np.round().
- Evaluate your model's accuracy on the test data.

```python
# Train for 100 epochs using a validation split of 0.2
model.fit(sensors_train, parcels_train, epochs=100, validation_split=0.2)

# Predict on sensors_test and round the predictions to 0 or 1
preds = model.predict(sensors_test)
preds_rounded = np.round(preds)

# Print rounded preds
print('Rounded Predictions: \n', preds_rounded)

# Evaluate your model's accuracy on the test data
accuracy = model.evaluate(sensors_test, parcels_test)[1]

# Print accuracy
print('Accuracy:', accuracy)
```

Accuracy: 0.9044444600741068


### **1.5 Keras callbacks**


**1. Keras callbacks**

By now you've trained a lot of models. It's time to learn more about how to better control and supervise model training by using callbacks.

**2. What is a callback?**

A callback is a function that is executed after some other function, event, or task has finished. For instance, when you touch your phone screen, a block of code that identifies the type of gesture will be triggered. Since this block of code has been called after the touching event occurred, it's a callback.

![](https://drive.google.com/uc?export=view&id=10Qq4jNXDAYNfqvvQe_EB_fLGH4exiCLL
)

**3. Callbacks in Keras**

In the same way, a Keras callback is a block of code that gets executed after each epoch during training or after the training is finished. They are useful to store metrics as the model trains and to make decisions as the training goes by.

![](https://drive.google.com/uc?export=view&id=1fSWIRWa9McKbWfT9pZlUwq03p7kuBEYv)

**4. A callback you've been missing**

Every time you call the fit method on a Keras model there's a callback object that gets returned after the model finishes training. This is the history object. Accessing the history attribute, which is a Python dictionary, we can check the saved metrics of the model at each epoch during training as an array of numbers.

![](
https://drive.google.com/uc?export=view&id=18Ld8VcDVPyDmE4ilWWDf-xUu_3hU8ENy)

**5. A callback you've been missing**

To get the most out of the history object we should use the validation_data parameter in our fit method, passing X_test and y_test as a tuple. The validation_split parameter can be used instead, specifying a fraction of the training data that will be held out for validation. That way we not only have the training metrics but also the validation metrics.

![](https://drive.google.com/uc?export=view&id=16snzzSQ8_7DUvNJiTbVGx6ewbsYpmst_
)

**6. History plots**

You can compare training and validation metrics with a few matplotlib commands. We just need to define a figure and plot the values of the history attribute for the training accuracy (acc) and the validation accuracy (val_acc). Note that recent Keras versions store these under the keys accuracy and val_accuracy when the model is compiled with metrics=['accuracy']. We can then make our graph prettier by adding a title, axis labels and a legend.

![](https://drive.google.com/uc?export=view&id=1fw2DtyjwxxAJ6XcEK-h6HXwUPb92z53B)

**7. History plots**

We can see our model accuracy increases for both training and test sets until it reaches epoch 25. Then accuracy flattens for the test set whilst the training accuracy keeps improving. Overfitting is taking place, since the training accuracy keeps improving while accuracy on the test data decreases. More on this in the next chapter.
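The same reading of the curves can be done numerically. The sketch below uses a made-up history dictionary, shaped like the one stored in the history attribute and following the acc and val_acc key names used in these notes; the numbers are hypothetical and not taken from any trained model.

```python
import numpy as np

# Hypothetical history dictionary, shaped like h_callback.history
history = {
    'acc':     [0.60, 0.72, 0.80, 0.86, 0.90, 0.93, 0.95],
    'val_acc': [0.58, 0.70, 0.77, 0.80, 0.80, 0.79, 0.78],
}

train_acc = np.array(history['acc'])
val_acc = np.array(history['val_acc'])

# First epoch (0-indexed) at which validation accuracy reaches its maximum
best_epoch = int(np.argmax(val_acc))

# A gap that keeps widening after the best epoch suggests overfitting
gap = train_acc - val_acc

print('best validation epoch:', best_epoch)
print('train/validation gap per epoch:', np.round(gap, 2))
```

Here the gap keeps growing after the best validation epoch, which is the numeric counterpart of the two curves diverging in the plot.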

**8. Early stopping**

Early stopping a model can help prevent overfitting, since it stops training when the model no longer improves. This is extremely useful since deep neural models can take a long time to train and we don't know beforehand how many epochs will be needed. Early stopping, like other Keras callbacks, can be imported from keras.callbacks. We then need to instantiate it. The early stopping callback can monitor several metrics, like validation accuracy, validation loss, etc. These can be specified in the monitor parameter. It's also important to define a patience argument, that is, the number of epochs to wait for the model to improve before stopping its training. There are no rules to decide which patience number works best at all times; this depends on the implementation. It's good to avoid low values, so that your model has a chance to improve at a later epoch. The callback is passed inside a list to the callbacks parameter of the model's fit method.

![](https://drive.google.com/uc?export=view&id=1T4BFhuvSUxxsFUxkHWm5YWik8cPV7xLS)
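The role of patience can be illustrated without any deep learning library. The function below is a simplified sketch of patience-based stopping on a metric that should decrease; `early_stopping_epoch` and the list of validation losses are hypothetical and written for this example, and this is not the Keras source code.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-indexed epoch at which training would stop."""
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best value: remember it and reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out: training used every epoch
    return len(val_losses) - 1

# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.65, 0.50]

stop_low_patience = early_stopping_epoch(val_losses, patience=2)
stop_high_patience = early_stopping_epoch(val_losses, patience=5)

print('patience=2 stops at epoch', stop_low_patience)
print('patience=5 stops at epoch', stop_high_patience)
```

With a patience of 2 the run stops before the late improvement at the last epoch, while a patience of 5 lets the model reach it, which is why very low patience values are best avoided.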

**9. Model checkpoint**

The model checkpoint callback can also be imported from keras.callbacks. This callback allows us to save our model as it trains. We specify the model filename with a name and the .hdf5 extension. You can also decide what to monitor to determine which model is best with the monitor parameter; by default, validation loss is monitored. Setting the save_best_only parameter to True guarantees that the latest best model according to the quantity monitored will not be overwritten.

![](https://drive.google.com/uc?export=view&id=1kuXu6Ha_5n_GlQFGp74fZCkxk3G5538v)

####  **1.5.1-The history callback**

The history callback is returned by default every time you train a model with the .fit() method. To access these metrics you can access the history dictionary attribute inside the returned h_callback object with the corresponding keys.

The irrigation machine model you built in the previous lesson is loaded for you to train, along with its features and labels now loaded as X_train, y_train, X_test, y_test. This time you will store the model's history callback and use the validation_data parameter as it trains.

You will plot the results stored in history with plot_accuracy() and plot_loss(), two simple matplotlib functions. You can check their code in the console by pasting show_code(plot_loss).

Let's see the behind the scenes of our training!


**instruction**
1. Train your model on X_train and y_train, validate each epoch on X_test and y_test.
2. Use plot_loss, extracting loss and val_loss from h_callback.
3. Use plot_accuracy, extracting acc and val_acc from h_callback.


**hints**

- Use the .fit() method to train your model.
- The h_callback variable is an object, to access its dictionary with the metrics as keys you need to access it like this: h_callback.history.

![](https://drive.google.com/uc?export=view&id=1bItetEmlAb6V0PqI2dIs9rDQM_xQpYGG)

#### **1.5.2-Early stopping your model**

The early stopping callback is useful since it allows you to stop the model training if it no longer improves after a given number of epochs. To make use of this functionality you need to pass the callback inside a list to the callbacks parameter of the model's .fit() method.

The model you built to detect fake dollar bills is loaded for you to train, this time with early stopping. X_train, y_train, X_test and y_test are also available for you to use.

**instruction**

- Import the EarlyStopping callback from keras.callbacks.
- Define a callback, monitor 'val_acc' with a patience of 5 epochs.
- Train your model using the early stopping callback.

**Hints**

- Instantiate the callback with EarlyStopping().
- The validation_data parameter takes a tuple with X_test and y_test.
- Callbacks need to be passed inside a list, since the parameter is prepared to accept more than one.

![](https://drive.google.com/uc?export=view&id=1T0_t8a_FNfjo01kKJIoRFsLx5m_fzxIc)

#### **1.5.3 A combination of callbacks**


Deep learning models can take a long time to train, especially when you move to deeper architectures and bigger datasets. Saving your model every time it improves as well as stopping it when it no longer does allows you to worry less about choosing the number of epochs to train for. You can also restore a saved model anytime and resume training where you left it.

The model training and validation data are available in your workspace as X_train, X_test, y_train, and y_test.

Use the EarlyStopping() and the ModelCheckpoint() callbacks so that you can go eat a jar of cookies while you leave your computer to work!

**Instruction**

- Import both the EarlyStopping and ModelCheckpoint callbacks from keras.
- Create monitor_val_acc as an EarlyStopping callback that will monitor 'val_acc', with a patience of 3 epochs.
- Create modelCheckpoint as a ModelCheckpoint callback, save the best model as best_banknote_model.hdf5.
- Fit your model providing a list with the defined callbacks and X_test and y_test as validation data.

**Hints**

- Import ModelCheckpoint and EarlyStopping from keras.callbacks.
- Don't forget to monitor 'val_acc' and use the patience argument in your EarlyStopping callback.
- save_best_only takes a boolean value.
- The two callback objects must be passed inside a list to the model's callbacks parameter in the .fit() method.

![](
https://drive.google.com/uc?export=view&id=1TtewrFOlhg8XgWeD116WibYrPjp56QN6)

# **3-Improving Your Model Performance**

### **3.1 Learning Curve**

**1. Learning curves**

Learning curves provide a lot of insight into your model. Now that you know how to use the history callback to plot them, you will learn how to get the most value out of them.

**2. Learning curves**

So far we've seen two types of learning curves: loss curves and accuracy curves.

![](https://drive.google.com/uc?export=view&id=1-MnsnGt8ixJHn57DpSBuAOBgyERjab2f)

**3. Loss curve**

Loss tends to decrease as epochs go by. This is expected since our model is essentially learning to minimize the loss function. Epochs are shown on the X axis and loss on the Y-axis. As epochs go by our loss value decreases. After a certain amount of epochs, the value converges, meaning it no longer gets much lower than that. We've arrived at a minimum.



**4. Accuracy curve**

Accuracy curves are similar but opposite in tendency: if the Y-axis shows accuracy, it now tends to increase as epochs go by. This shows our model makes fewer mistakes as it learns.

**5. Overfitting**

If we plot training versus validation data we can identify overfitting. We will see the training and validation curves start to diverge. Overfitting is when our model starts learning particularities of our training data which don't generalize well on unseen data. The early stopping callback is useful to stop our model before it starts overfitting.

![](https://drive.google.com/uc?export=view&id=1xNus1mxxyuLVJpZWLqItnRthu_La209G)

**6. Unstable curves**

But not all curves are smooth and pretty, many times we will find unstable curves. There are many reasons that can lead to unstable learning curves; the chosen optimizer, learning rate, batch-size, network architecture, weight initialization, etc. All these parameters can be tuned to improve our model learning curves, as we aim for better accuracy and generalization power. We will cover this in the following videos.

![](https://drive.google.com/uc?export=view&id=18Qp7x8cZfyDFUqhmGjrl5KmZdgUc0Ps0)

**7. Can we benefit from more data?**

![](https://drive.google.com/uc?export=view&id=1U8QquSossUbjBX8Ge4daALhKPhYIStxV)

Neural networks are well known for surpassing traditional machine learning techniques as we increase the size of our datasets. We can check whether collecting more data would increase a model’s generalization and accuracy.

**8. Can we benefit from more data?**

We aim at producing a graph like this one, where we have fitted our model with increasing amounts of training data and plotted the values for the training and test accuracies of each run.

**9. Can we benefit from more data?**

If after using all our data we see that our test accuracy still has a tendency to improve, that is, its curve is not parallel to our training set curve and it's increasing, then it's worth gathering more data if possible to allow the model to keep learning.



**10. Coding train size comparison**

How would we go about coding a graph like the previous one? Imagine we want to evaluate an already built and compiled model and that we have partitioned our data into X_train, y_train, X_test and y_test. We first store the model's initial weights, which is done by calling get_weights on our model. We then initialize two lists to store train and test accuracies.

![](https://drive.google.com/uc?export=view&id=1XzQEg_8qe7WT5DNeEecLSYiQjFWbKQ8T)

**11. Coding train size comparison II**

We loop over a predefined list of train sizes and for each training size we get the corresponding training data fraction. Before any training, we make sure our model starts with the same set of weights by setting them to the initial_weights using the set_weights function. After that, we can fit our model on the training fraction. We use an EarlyStopping callback which monitors loss. It's important to note that this is the training loss, not the validation loss, since we haven't provided the fit method with validation data. After the training is done we can get the accuracy for the training set fraction and the accuracy for the test set and append them to our lists of accuracies. Observe that the same amount of test data was used for each iteration.

![](https://drive.google.com/uc?export=view&id=1AhWUTNSma18bhnvyVp9Z0nyLeYlcyXBb)
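The structure of that loop can be sketched without Keras. In the example below, `TinyLogReg` is a hypothetical stand-in for a compiled model (a small logistic regression written in NumPy, not a Keras class), the data is synthetic, and early stopping is left out to keep the sketch short. Only the pattern matters: store the initial weights, reset them before every run, train on a growing fraction, and evaluate on the same test set.

```python
import numpy as np

class TinyLogReg:
    """Hypothetical stand-in for a compiled Keras model (not a Keras class)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = 0.0

    def get_weights(self):
        return [self.w.copy(), self.b]

    def set_weights(self, weights):
        self.w, self.b = weights[0].copy(), weights[1]

    def _predict(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def fit(self, X, y, epochs=200, lr=0.5):
        for _ in range(epochs):   # plain gradient descent on the log loss
            error = self._predict(X) - y
            self.w -= lr * (X.T @ error) / len(y)
            self.b -= lr * error.mean()

    def evaluate(self, X, y):
        """Return accuracy only (unlike Keras, which also returns the loss)."""
        return float(((self._predict(X) > 0.5) == (y > 0.5)).mean())

# Synthetic, linearly separable data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

model = TinyLogReg(n_features=2)
initial_weights = model.get_weights()
training_sizes = [20, 100, 300]
train_accs, test_accs = [], []

for size in training_sizes:
    # Take the fraction of training data for this run
    X_frac, y_frac = X_train[:size], y_train[:size]
    # Every run starts from the same initial weights
    model.set_weights(initial_weights)
    model.fit(X_frac, y_frac)
    train_accs.append(model.evaluate(X_frac, y_frac))
    # The same test set is used for every training size
    test_accs.append(model.evaluate(X_test, y_test))

print('train accuracies:', train_accs)
print('test accuracies: ', test_accs)
```

Resetting the weights is what makes the runs comparable: without it, each training size would start from the weights learned on the previous, smaller fraction.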

**12. Time to dominate all curves!**

It's time for you to show you dominate learning curves!

#### **3.1.1-Learning the digits**


You're going to build a model on the digits dataset, a sample dataset that comes pre-loaded with scikit-learn. The digits dataset consists of 8x8 pixel handwritten digits from 0 to 9.


You want to distinguish between each of the 10 possible digits given an image, so we are dealing with multi-class classification.
The dataset has already been partitioned into X_train, y_train, X_test, and y_test, using 30% of the data as testing data. The labels are already one-hot encoded vectors, so you don't need to use Keras to_categorical() function.

Let's build this new model!

**Instruction** 

- Add a Dense layer of 16 neurons with relu activation and an input_shape that takes the total number of pixels of the 8x8 digit image.
- Add a Dense layer with 10 outputs and softmax activation.
- Compile your model with adam, categorical_crossentropy, and accuracy metrics.
- Make sure your model works by predicting on X_train.

**Hints**


- The total number of pixels in an image can be obtained by multiplying width and height.
- The input_shape parameter takes in a tuple.
- Activation functions are passed to the activation parameter as a string.

![](https://drive.google.com/uc?export=view&id=1wDYoNXQdSgbMb0miMIXIqm1MN-lWrxoC)

#### **3.1.2-Is the model overfitting?**


Let's train the model you just built and plot its learning curve to check whether it's overfitting! You can use the loaded function plot_loss() to plot training loss against validation loss; you can get both from the history callback.

If you want to inspect the plot_loss() function code, paste this in the console: show_code(plot_loss)

**instruction** 

- Train your model for 60 epochs, using X_test and y_test as validation data.
- Use plot_loss() passing loss and val_loss as extracted from the history attribute of the h_callback object.

**Hints**

- Validation data is to be passed to the validation_data parameter.
- The h_callback object has a history attribute which is a dictionary: h_callback.history['loss'] would give us the array of training losses for each epoch.

![](https://drive.google.com/uc?export=view&id=1t99xP9zPjFK0dvMJyaOFdGdUxuq3ByuR)

#### **3.1.3-Do we need more data?**


It's time to check whether the digits dataset model you built benefits from more training examples!

In order to keep code to a minimum, various things are already initialized and ready to use:

- The model you just built.
- X_train, y_train, X_test, and y_test.
- The initial_weights of your model, saved after using model.get_weights().
- A pre-defined list of training sizes: training_sizes.
- A pre-defined early stopping callback monitoring loss: early_stop.
- Two empty lists to store the evaluation results: train_accs and test_accs.

Train your model on the different training sizes and evaluate the results on X_test. End by plotting the results with plot_results().

The full code for this exercise can be found on the slides!

**Instruction**

- Get a fraction of the training data determined by the size we are currently evaluating in the loop.
- Set the model weights to the initial_weights with set_weights() and train your model on the fraction of training data using early_stop as a callback.
- Evaluate and store the accuracy for the training fraction and the test set.
- Call plot_results() passing in the training and test accuracies for each training size.

**Hints**

- The whole training data is stored in X_train and y_train.
- model.set_weights() receives the initially stored weights: initial_weights. early_stop is an already defined callback you can pass inside your callbacks list.
- You can obtain the accuracy of a model with model.evaluate() passing in predictors and labels.

![](
https://drive.google.com/uc?export=view&id=14CEUdCyyZZY8nvCm_Hyz27T-CxN1gapu)

### **3.2-Activation functions**


**1. Activation functions**

So far we've been using several activation functions in our models, but we haven't yet covered their role in neural networks other than when it comes to obtaining the output we want in our output layer.

![](https://drive.google.com/uc?export=view&id=1n1VueOFpJOZISxErXlXX2DCUBuBJ2q6B)

**2. An activation function**

Inside the neurons of any neural network the same process takes place:

**3. An activation function**

The inputs reaching the neuron are multiplied by the weights of each connection and summed, and the bias weight is added. This operation results in a number, a, which can be anything; it is not bounded.

**4. An activation function**

We pass this number into an activation function that essentially takes it as an input and decides how the neuron fires and which output it produces. Activation functions impact learning time, making our model converge faster or slower and achieving lower or higher accuracy. They also allow us to learn more complex functions.
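The two steps described above, the weighted sum and the activation, fit in a few lines of NumPy. The input values, weights and bias below are hypothetical, and sigmoid is used only as one possible choice of activation.

```python
import numpy as np

# Hypothetical inputs, connection weights and bias for a single neuron
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum of the inputs plus the bias: the unbounded number a
a = float(np.dot(inputs, weights) + bias)

# The activation function turns a into the neuron's output (sigmoid here)
output = 1.0 / (1.0 + np.exp(-a))

print('a =', a)
print('output =', output)
```

Swapping the last step for a different activation function changes how the same value of a is turned into the neuron's output.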

**5. Activation zoo**

Four very well known activation functions are: The **sigmoid**, which varies between 0 and 1 for all possible X input values. The **tanh or Hyperbolic tangent**, which is similar to the sigmoid in shape but varies between -1 and 1.

![](https://drive.google.com/uc?export=view&id=1auOZxKH3xHV2I8I5G-6Ak_t-BYXvTYp4)

**6. Activation zoo**

The **ReLU** (Rectified Linear Unit), which varies between 0 and infinity, and the **leaky ReLU**, which we can look at as a variant of ReLU that doesn't sit at 0 for negative inputs: it has a small slope there, allowing negative output values for negative inputs.

![](https://drive.google.com/uc?export=view&id=1jHV1IDdiD8hWJZrMvkSTT4w6RHV52IjQ
)
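For reference, the four functions can be written in a few lines of NumPy. This is a minimal sketch: the function names are chosen for this example, and the `alpha` slope of 0.01 for the leaky ReLU is an assumed value, not one given in these notes.

```python
import numpy as np

def sigmoid(x):
    """Output in the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Output in the range (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope alpha (assumed 0.01) for negative inputs."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))
```

Evaluating all four on the same inputs makes the differences visible: only tanh and the leaky ReLU return negative values, and only the two ReLU variants are unbounded for positive inputs.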

**7. Effects of activation functions**

![](https://drive.google.com/uc?export=view&id=1-ab9d6Lyc6dXsg3u-eq8puiiXRaRSbYG)

Changing the activation function used in the hidden layer of the model we built for binary classification results in different classification boundaries.

**8. Effects of activation functions**

![](https://drive.google.com/uc?export=view&id=1hVIe7DgcD6G1BYJPGc-4NqOUY9oYyg6R)

We can see that the previous model can not completely separate red crosses and blue circles if we use a sigmoid activation function in the hidden layer. Some blue circles are misclassified as red crosses along the diagonal. However, when we use the tanh we completely separate red crosses from blue circles, the separation region for the blue and red classification is smooth.

**9. Effects of activation functions**

Using a ReLU activation function we obtain sharper boundaries; the leaky ReLU shows similar behavior for this dataset. It's important to note that these boundaries will be different for every run of the same model because of the random initialization of weights and other random variables that aren't fixed.

![](https://drive.google.com/uc?export=view&id=1fjHl40cMhVoVn8v2S9Rq9hZsxR6MNqnl)

**10. Which activation function to use?**

All activation functions come with their pros and cons. There's no easy way to determine which activation function is best to use. Based on their properties, the problem at hand, and the layer we are looking at in our network, one activation function will perform better in terms of achieving our goal. A way to go is to start with ReLU, as it trains fast and tends to generalize well to most problems, avoid sigmoids, and tune with experimentation.

![](https://drive.google.com/uc?export=view&id=1tP4caPySi4einv-__fd5PfK_egAVbIJY
)

**11. Comparing activation functions**

It's easy to compare how models with different activation functions perform if they are small enough and train fast. It's important to set a random seed with numpy, that way the model weights are initialized the same for each activation function. We then define a function that returns a fresh new model each time, using the act_function parameter.

![](https://drive.google.com/uc?export=view&id=1XLKt0gUxDUI9SC-nEiuSUX7UNXQDi9s-)

**12. Comparing activation functions**

We can then use this function as we loop over several activation functions, training different models and saving their history callback. We store all these callbacks in a dictionary.

![](https://drive.google.com/uc?export=view&id=1yX4V4VR7gSjOo9Uz1R-rJ7bsJQrDzSj2
)

**13. Comparing activation functions**

With this dictionary of histories, we can extract the metrics we want to plot, build a pandas DataFrame and plot it.

![](https://drive.google.com/uc?export=view&id=1gNPaFqFEqHpB8GJFPYwqgaOON4HRW8Uk)

**14. Let's practice!**

Let's explore the effects of activation functions!

#### **3.2.1 Different activation functions**

![](https://drive.google.com/uc?export=view&id=1-TPR8StMFgGnhpiGMx3M-PLlkSJEWm8t)

#### **3.2.2 Comparing activation functions**


Comparing activation functions involves a bit of coding, but nothing you can't do!

You will try out different activation functions on the multi-label model you built for your farm irrigation machine in chapter 2. The function get_model('relu') returns a copy of this model and applies the 'relu' activation function to its hidden layer.

You will loop through several activation functions, generate a new model for each and train it. By storing the history callback in a dictionary you will be able to visualize which activation function performed best in the next exercise!

X_train, y_train, X_test, y_test are ready for you to use when training your models.

**Instruction**

- Fill up the activation functions array with relu, leaky_relu, sigmoid, and tanh.
- Get a new model for each iteration with get_model() passing the current activation function as a parameter.
- Fit your model providing the train and validation_data, use 20 epochs and set verbose to 0.

**Hints**

- Use quotes when defining your activation functions array.
- get_model(act) returns a new model applying act as the activation function for the hidden layer.
- In model.fit(), validation data is specified as a tuple (X_test, y_test) for the validation_data parameter. Passing verbose=0 makes our model give no output as it trains.

![](https://drive.google.com/uc?export=view&id=13DmrM-pXKu6D41dMRxm5A6mrEhA-oEJS)

#### **3.2.3 Comparing activation functions II**

What you coded in the previous exercise has been executed to obtain the activation_results variable. This time 100 epochs were used instead of 20, so you will have more epochs to compare how the training evolves per activation function.

For every h_callback of each activation function in activation_results:

- The h_callback.history['val_loss'] has been extracted.
- The h_callback.history['val_acc'] has been extracted.

Both are saved into two dictionaries: val_loss_per_function and val_acc_per_function.

Pandas is also loaded as pd for you to use. Let's plot some quick validation loss and accuracy charts!

**Instruction**

- Use pd.DataFrame() to create a new DataFrame from the val_loss_per_function dictionary.
- Call plot() on the DataFrame.
- Create another pandas DataFrame from val_acc_per_function.
- Once again, plot the DataFrame.

**Hints**

- You can create a pandas DataFrame from a dictionary by passing it to the pd.DataFrame() function.
- plot() is a method of the pandas DataFrame object, so it can be called as df.plot().

![](https://drive.google.com/uc?export=view&id=1ejfECYAf66lqIvc2LkLRSEt5dxkyXXap)

### **3.3 Batch size and batch normalization**

**1. Batch size and batch normalization**

It’s time to learn the concepts of batch size and batch normalization.

**2. Batches**

A mini-batch is a subset of data samples. If we were training a neural network with images, each image in our training set would be a sample, and we could take mini-batches of different sizes from the training set.

![](https://drive.google.com/uc?export=view&id=1_kSFhr0u7Fk9OqOW1BjIdPwnnvCEl58y)

**3. Mini-batch**

Remember that during an epoch we feed our network, calculate the errors and update the network weights. It's not very practical to update our network weights only once per epoch after looking at the error produced by all training samples. In practice, we take a mini-batch of training samples. That way, if our training set has 9 images and we choose a batch_size of 3, we will perform 3 weight updates per epoch, one per mini-batch.
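The arithmetic in this example can be checked with a few lines of plain Python. The helper names make_batches and updates_per_epoch are made up for this sketch, and the integers stand in for images.

```python
import math

def make_batches(samples, batch_size):
    # Split the samples into consecutive mini-batches; the last one may be smaller.
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def updates_per_epoch(n_samples, batch_size):
    # One weight update per mini-batch.
    return math.ceil(n_samples / batch_size)

images = list(range(9))               # stand-ins for 9 training images
batches = make_batches(images, batch_size=3)
print(batches)                        # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(updates_per_epoch(9, 3))        # 3
print(updates_per_epoch(9, 1))        # 9 (one update per sample)
print(updates_per_epoch(9, 9))        # 1 (one update per epoch)
```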

![](https://drive.google.com/uc?export=view&id=17_U0Ea9VjUys20KwZAs4b4oKZNIkR03_)

**4. Mini-batches**

Networks tend to train faster with mini-batches since weights are updated often. Sometimes datasets are so huge that they would not fit in RAM if we didn't use mini-batches. Also, the noise produced by a small batch size can help escape local minima. A couple of disadvantages are the need for more iterations and the need to find a good batch size.

![](https://drive.google.com/uc?export=view&id=1ui2Iou26Nd4IFsySzVU6YI4vl890BiwK)

**5. Effects of batch sizes**

Here you can see how different batch sizes converge towards a minimum as training goes by. Training with all samples is shown in blue. Mini-batching is shown in green. Stochastic gradient descent, in red, uses a batch_size of 1. We can see how the path towards the best value for our weights is noisier the smaller the batch_size. They reach the same value after a different number of iterations.

![](https://drive.google.com/uc?export=view&id=1PC_QDSZRAgQdECq-5FPpoJIaHYRauFr2)

Image source: Stack Exchange

**6. Batch size in Keras**

You can set your own batch_size with the batch_size parameter on the model's fit method. Keras uses a default batch size of 32. Powers of two tend to be used. As a rule of thumb, the bigger your dataset, the bigger you tend to make your batch size.

**7. Normalization in machine learning**

Normalization is a common pre-processing step in machine learning algorithms, especially when features have different scales. One way to normalize data is to subtract its mean value and divide by the standard deviation. We always tend to normalize our model inputs. This avoids problems with activation functions and gradients.

**8. Normalization in machine learning**

This leaves everything centered around 0 with a standard deviation of 1.
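A minimal sketch of this normalization in plain Python, using made-up feature values and a hypothetical standardize helper, shows that the result has a mean of 0 and a standard deviation of 1:

```python
import statistics

def standardize(values):
    # Subtract the mean and divide by the (population) standard deviation.
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]   # made-up feature values
scaled = standardize(heights_cm)

print([round(v, 3) for v in scaled])
print(round(statistics.mean(scaled), 6))     # mean after scaling
print(round(statistics.pstdev(scaled), 6))   # standard deviation after scaling
```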

![](https://drive.google.com/uc?export=view&id=1sxpiIEUGlPdH9xMih8l_kQbkLvRif6Y9)

**9. Reasons for batch normalization**

Normalizing a neural network's inputs improves our model. But deeper layers are trained on the outputs of previous layers, and since weights get updated via gradient descent, consecutive layers no longer benefit from this normalization. They need to adapt to the weight changes of previous layers, which makes it harder for them to learn their own weights. Batch normalization makes sure that, independently of these changes, the inputs to the next layers are normalized. It does this in a smart way, with trainable parameters that learn how much of this normalization to keep by scaling and shifting it.
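The idea can be sketched in plain Python for the activations of a single neuron across one mini-batch. This is a simplified training-time forward pass, not the Keras implementation: gamma and beta play the role of the trainable scale and shift, and eps is a small constant assumed here to avoid division by zero.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # batch: activations of ONE neuron across the samples of a mini-batch.
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # Normalize to roughly zero mean and unit variance...
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # ...then let the trainable scale (gamma) and shift (beta) adjust it.
    return [gamma * x_hat + beta for x_hat in normalized]

activations = [2.0, 4.0, 6.0, 8.0]                   # made-up layer outputs
print(batch_norm(activations))                       # plain normalization
print(batch_norm(activations, gamma=2.0, beta=0.5))  # scaled and shifted
```

With gamma=1 and beta=0 the layer simply normalizes. Other values let the network keep only as much of the normalization as helps training.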

**10. Batch normalization advantages**

This improves gradient flow, allows for higher learning rates, reduces dependence on weight initialization, adds regularization to our network, and limits internal covariate shift, which is a funny name for a layer's dependence on the previous layer's outputs when learning its weights. Batch normalization is widely used today in many deep learning models.

![](https://drive.google.com/uc?export=view&id=1j_ls4YzmTySriGr6V94REyLfE0quU1N_)

**11. Batch normalization in Keras**

Batch normalization in Keras is applied as a layer. So we can place it in between two layers. We import batch normalization from Keras layers. We then instantiate a sequential model, add an input layer, and then add a batch normalization layer. We finalize this binary classification model with an output layer.

![](https://drive.google.com/uc?export=view&id=1cU93eX05XEFT7-DMX5EJgmlDrVlyTUKA)

**12. Let's practice!**

#### **3.3.1 Changing batch sizes**

You've seen that models are usually trained in batches of a fixed size. The smaller the batch size, the more weight updates per epoch, but at the cost of a more unstable gradient descent. This is especially true if the batch size is too small and is not representative of the entire training set.

Let's see how different batch sizes affect the accuracy of a simple binary classification model that separates red from blue dots.

You'll use a batch size of one, updating the weights once per sample in your training set for each epoch. Then you will use the entire dataset, updating the weights only once per epoch.

**Instruction**

Use get_model() to get a new, already compiled, model, then train your model for 5 epochs with a batch_size of 1.

**Hints**


The batch_size parameter takes a number as the size of the batch.

![](https://drive.google.com/uc?export=view&id=16JDaLqJMFyXEGhBnbqO8sVL0_KSZxG3o)

#### **3.3.2 Batch normalizing a familiar model**


Remember the digits dataset you trained in the first exercise of this chapter?


A multi-class classification problem that you solved using softmax and 10 neurons in your output layer.
You will now build a new deeper model consisting of 3 hidden layers of 50 neurons each, using batch normalization in between layers. The kernel_initializer parameter is used to initialize weights in a similar way.

**Instruction** 

- Import BatchNormalization from keras.layers.
- Build your deep network model, use 50 neurons for each hidden layer adding batch normalization in between layers.
- Compile your model with stochastic gradient descent, sgd, as an optimizer.

**Hints**


- You can import BatchNormalization from keras.layers, similar to how you import your Dense layers.
- Remember to instantiate your model with Sequential().
- BatchNormalization() is used to add a batch normalization layer.

![](https://drive.google.com/uc?export=view&id=1U8q3l4QhJirNemPJSQLIXjN8rg5MjVLA)

#### **3.3.3 Batch normalization effects**


Batch normalization tends to increase the learning speed of our models and make their learning curves more stable. Let's see how two identical models with and without batch normalization compare.

The model you just built, batchnorm_model, is loaded for you to use. An exact copy of it without batch normalization, standard_model, is available as well. You can check their summary() in the console. X_train, y_train, X_test, and y_test are also loaded so that you can train both models.

You will compare the accuracy learning curves for both models by plotting them with compare_histories_acc().

You can check the function by pasting show_code(compare_histories_acc) in the console.

**Instruction** 

- Train the standard_model for 10 epochs passing in train and validation data, storing its history in h1_callback.
- Train your batchnorm_model for 10 epochs passing in train and validation data, storing its history in h2_callback.
- Call compare_histories_acc passing in h1_callback and h2_callback.

**Hints**

The validation_data is to be passed as a tuple.


![](https://drive.google.com/uc?export=view&id=1WgxDzdCG4dYW-sdkA6r97CgpeGEFErBa)

### **3.4 Hyperparameter tuning**


**1. Hyperparameter tuning**

You now know everything you need to perform hyperparameter tuning in neural networks! Our aim is to identify those parameters that make our model generalize better.

**2. Neural network hyperparameters**

A neural network is full of parameters that can be tweaked: the number of layers, neurons per layer, the order of such layers, the activation functions, batch sizes, learning rates, optimizers. That's a lot of things to keep in mind!

![](https://drive.google.com/uc?export=view&id=1OiGj_KWaomPUu-rhwh8iF_3D7pGeP_77)

**3. Sklearn recap**


In sklearn we can perform hyperparameter search by using methods like RandomizedSearchCV. We import RandomizedSearchCV from sklearn.model_selection. We instantiate a model, define a dictionary with a series of model parameters to try, and finally instantiate a RandomizedSearchCV object passing our model, the parameters and a number of cross-validation folds. We fit it on our data and print the best resulting combination of parameters. For this example, a min_samples_leaf of 1, a max_features of 3 and a max_depth of 3 gave us the best results.

![](https://drive.google.com/uc?export=view&id=1u2NoO11coOl9KKs35ElRb4lm5wyXFYft)

**4. Turn a Keras model into a Sklearn estimator**

We can do the same with our Keras models! But we first have to transform them into sklearn estimators. We do this by first defining a function that creates our model. Then we import the KerasClassifier wrapper from keras.wrappers.scikit_learn. We finish by simply instantiating a KerasClassifier object, passing create_model as the building function. Other parameters like epochs and batch_size are optional but should be passed if we want to specify them.

![](https://drive.google.com/uc?export=view&id=19EBwT4ngffw_mtbhOlYYqJB8YGy6Vcnb)

**5. Cross-validation**

This is very cool! Our model is now just like any other sklearn estimator, so we can, for instance, perform cross-validation on it to see the stability of its predictions across folds. We import cross_val_score and call it, passing in our recently converted Keras model, predictors, labels, and the number of folds. We can then check the mean of the per-fold accuracies or their standard deviation. Note that 6 epochs and a batch_size of 16 were used since we specified them before.
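To make the idea of folds concrete, here is a sketch of a plain, unshuffled k-fold split in pure Python. It is a simplification of what sklearn does internally, k_fold_indices is a hypothetical helper, and the per-fold accuracies at the end are made-up numbers used only to show how scores are summarized.

```python
import statistics

def k_fold_indices(n_samples, n_folds):
    # Yield (train_indices, validation_indices) for each fold.
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop

folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train, validation in folds:
    print(len(train), validation)

# Made-up per-fold accuracies, only to show how they are summarized.
fold_accuracies = [0.90, 0.94, 0.92]
print(statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies))
```

Every sample is used for validation exactly once, so a small spread across folds suggests that the model's performance does not depend much on which samples it was trained on.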

![](https://drive.google.com/uc?export=view&id=1Z3ZH72Ovv4Z0s42X7CH4nQPiJxPFxSSR)

**6. Tips for neural networks hyperparameter tuning**

It's much more probable that a good combination of parameters will be found by using random search instead of an exhaustive grid search. Grid search loops over all possible combinations of parameters, whilst random search tries a given number of random combinations. Normally not many epochs are needed to check how well your model is performing, and using a smaller representative sample of your dataset makes things faster if you've got a huge dataset. It's easier to play with things like optimizers, batch_sizes, activations, and learning rates.
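The difference in cost between the two strategies can be sketched with the standard library alone. The search space below is an assumption made up for this illustration, and no model is actually trained.

```python
import itertools
import random

# Illustrative search space; the values are assumptions for this sketch.
params = {
    "optimizer": ["sgd", "adam"],
    "activation": ["relu", "tanh"],
    "batch_size": [16, 32, 64],
}

keys = list(params)
# Grid search: every possible combination.
grid = [dict(zip(keys, combo)) for combo in itertools.product(*params.values())]

# Random search: a fixed number of combinations drawn at random.
rng = random.Random(0)
n_iter = 5
random_candidates = rng.sample(grid, n_iter)

print(len(grid))                # 12 combinations to train with grid search
print(len(random_candidates))   # 5 combinations to train with random search
for candidate in random_candidates:
    print(candidate)
```

The grid grows multiplicatively with every parameter added, while the cost of random search stays fixed at n_iter models.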

![](https://drive.google.com/uc?export=view&id=11hLe-7OSDjRSs1-DjdjWz7skMGJ1jK-a)

**7. Random search on Keras models**

To perform randomized search on a Keras model we just need to define the parameters to try. We can try different optimizers, activation functions for the hidden layers and batch sizes. The keys in the parameter dictionary must be named exactly as the parameters in our create_model function. We then instantiate a RandomizedSearchCV object passing our model and parameters with 3-fold cross-validation. We end by fitting our random_search object to obtain the results. We can print the best score and the parameters that were used. We get an accuracy of 94% with the adam optimizer, 3 epochs, a batch_size of 10 and relu activation.

![](https://drive.google.com/uc?export=view&id=1l7UN2iEeprdC_GXA4qTx6r6yhrhWpVC8)

**8. Tuning other hyperparameters**

Parameters like the number of neurons per layer and the number of layers can also be tuned using the same method. We just need to make some smart changes in our create_model function. The nl parameter determines the number of hidden layers and nn the number of neurons in these layers. We can have a loop inside our function that adds to our sequential model as many layers as provided in nl, each with the given number of neurons.
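The looping idea can be sketched without Keras by building only the list of layer sizes that such a function would produce. The helpers layer_sizes and count_parameters are hypothetical, and the 30 input features and single output neuron are assumptions for this example.

```python
def layer_sizes(n_inputs, nl, nn, n_outputs=1):
    # nl hidden layers of nn neurons each, added in a loop,
    # followed by a single output layer.
    sizes = [n_inputs]
    for _ in range(nl):
        sizes.append(nn)
    sizes.append(n_outputs)
    return sizes

def count_parameters(sizes):
    # Each dense layer has (inputs * outputs) weights plus one bias per output.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

sizes = layer_sizes(n_inputs=30, nl=2, nn=128)
print(sizes)                     # [30, 128, 128, 1]
print(count_parameters(sizes))   # 20609
```

Because nl and nn are ordinary function arguments, they can appear as keys in the parameter dictionary just like an optimizer or an activation function.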

**9. Tuning other hyperparameters**

Then we just need to use the exact same names in the parameter dictionary as we have in our function and repeat the process. The best result is 87% accuracy with 2 hidden layers of 128 neurons each.

![](https://drive.google.com/uc?export=view&id=1FTrA8kMO0WbeGSljMOaPVa-xxbasG3cG)

![](https://drive.google.com/uc?export=view&id=1rg7r5Mg_ZVswzywVw00zp7QzMEX1PNrW)

**10. Let's tune some networks!**

Let's practice!

#### **3.4.1 Preparing a model for tuning**


Let's tune the hyperparameters of a binary classification model that does well classifying the breast cancer dataset.

You've seen that the first step to turn a model into a sklearn estimator is to build a function that creates it. The definition of this function is important since hyperparameter tuning is carried out by varying the arguments your function receives.

Build a simple create_model() function that receives both a learning rate and an activation function as arguments. The Adam optimizer has been imported as an object from keras.optimizers so that you can also change its learning rate parameter.

**Instruction**

- Set the learning rate of the Adam optimizer object to the one passed in the arguments.
- Set the hidden layers activations to the one passed in the arguments.
- Pass the optimizer and the binary cross-entropy loss to the .compile() method.

**Hints**

- The optimizer is stored in the variable opt.
- The loss function needs to be written as a string: 'binary_crossentropy'.

![](https://drive.google.com/uc?export=view&id=1lK_K0dlwoX6a9fd0Mc6T6-K8ohTs2_kA)

#### **3.4.2 Tuning the model parameters**


It's time to try out different parameters on your model and see how well it performs!

The create_model() function you built in the previous exercise is ready for you to use.

Since fitting the RandomizedSearchCV object would take too long, the results you'd get are printed in the show_results() function. You could try random_search.fit(X,y) in the console yourself to check that it works after you have built everything else, but you will probably time out the exercise (so copy your code first if you try this, or you could lose your progress!).

You don't need to use the optional epochs and batch_size parameters when building your KerasClassifier object, since you are passing them as params to the random search, which works as well.

**Instruction**

- Import KerasClassifier from keras scikit_learn wrappers.
- Use your create_model function when instantiating your KerasClassifier.
- Set 'relu' and 'tanh' as activation, 32, 128, and 256 as batch_size, 50, 100, and 200 epochs, and learning_rate of 0.1, 0.01, and 0.001.
- Pass your converted model and the chosen params as you build your RandomizedSearchCV object.

**Hints**

Activation functions are defined as a list of strings whilst batch_size, epochs and learning rates are defined as lists of numbers.

![](https://drive.google.com/uc?export=view&id=1U707fEaf6iipZSrtcQEcBrtTdbsnhIWG)

#### **3.4.3 Training with cross-validation**


Time to train your model with the best parameters found: 0.001 for the learning rate, 50 epochs, a batch_size of 128 and relu activations.

The create_model() function from the previous exercise is ready for you to use. X and y are loaded as features and labels.

Use the best values found for your model when creating your KerasClassifier object so that they are used when performing cross-validation.

End this chapter by training an awesome tuned model on the breast cancer dataset!

**Instruction** 

- Import KerasClassifier from keras scikit_learn wrappers.
- Create a KerasClassifier object providing the best parameters found.
- Pass your model, features and labels to cross_val_score to perform cross-validation with 3 folds.

**Hints**


- KerasClassifier is imported from keras.wrappers.scikit_learn.
- You can find the best parameters for your model listed in the context. These come from the findings of the previous exercise.

![](https://drive.google.com/uc?export=view&id=1lpNppJeya2sd2F9JEa6icImxxucEpdjK)

# **4-Advanced Model Architectures**

## **4.1 Tensors, layers, and autoencoders**


**1. Tensors, layers and autoencoders**

Now that you know how to tune your models, it's time to better understand how they work and learn about new neural network architectures.

**2. Accessing Keras layers**

Model layers are easily accessible: we just need to call layers on a built model and access the index of the layer we want. From a chosen layer we can print its inputs, outputs, and weights. You can see that inputs and outputs are tensors of a given shape built with TensorFlow tensor objects, while weights are TensorFlow variable objects, which are just tensors that change their value as the neural network learns the best weights.

![](https://drive.google.com/uc?export=view&id=1BswUPE1R0eL-AmluhlkYnZimoR2Ffoak)

**3. What are tensors?**

Tensors are the main data structures used in deep learning: inputs, outputs, and transformations in neural networks are all represented using tensors. A tensor is a multi-dimensional array of numbers. A 2-dimensional tensor is a matrix, and a 3-dimensional tensor is an array of matrices.
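A small sketch with nested Python lists illustrates these dimensions. The shape helper below is hypothetical and assumes regular, non-empty nesting. Real frameworks store tensors far more efficiently.

```python
def shape(tensor):
    # Walk down the first element of each nesting level to read the shape.
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

vector = [1, 2, 3]                                   # 1-dimensional tensor
matrix = [[1, 2, 3], [4, 5, 6]]                      # 2-dimensional tensor
cube = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]],
        [[9, 10], [11, 12]]]                         # array of 3 matrices

print(shape(vector))   # (3,)
print(shape(matrix))   # (2, 3)
print(shape(cube))     # (3, 2, 2)
```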

![](https://drive.google.com/uc?export=view&id=1UhLwscTCERhuX38X2vt06arVYYE3T7E3)

**4. Keras backend**

If we import the Keras backend we can build a function that takes in an input tensor from a given layer and returns an output tensor from another or the same layer. TensorFlow is the backend Keras is using in this course, but it could be any other, like Theano. To define the function with our backend K we need to give it a list of inputs and a list of outputs, even if we just want 1 input and 1 output. Then we can use it on a tensor with the same shape as the input layer given during its definition. If the weights of the layers between our input and output change, the function output for the same input will change as well. We can use this to see the output of certain layers as weights change during training. We will check this in the exercises!
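The behaviour described above can be imitated in plain Python. This is an analogy rather than the Keras backend API: the two-layer network, its weights, and the make_function helper are all hypothetical.

```python
import math

# Hypothetical two-layer network: each layer is (weights, biases, activation).
# weights[j] holds the input weights of neuron j.
layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], lambda z: max(0.0, z)),
    ([[1.0, -1.0]], [0.0], lambda z: 1.0 / (1.0 + math.exp(-z))),
]

def layer_output(layer, inputs):
    weights, biases, activation = layer
    return [activation(sum(w * x for w, x in zip(neuron, inputs)) + b)
            for neuron, b in zip(weights, biases)]

def make_function(layers, first, last):
    # Returns a function mapping the input of layer `first`
    # to the output of layer `last`.
    def run(inputs):
        for layer in layers[first:last + 1]:
            inputs = layer_output(layer, inputs)
        return inputs
    return run

inp_to_out = make_function(layers, 0, 0)    # first layer only
print(inp_to_out([1.0, 2.0]))

layers[0][1][0] = 1.0                       # change a bias in the first layer
print(inp_to_out([1.0, 2.0]))               # same input, different output
```

The second call returns a different value for the same input because the function reads the current weights each time it runs, which is what makes it useful for watching a layer's output evolve during training.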

![](https://drive.google.com/uc?export=view&id=1R5vOe8GipLh5vTxFKhz_dnUN2mihP9sM)

**5. Introducing autoencoders**

It's time to introduce a new architecture.

**6. Autoencoders!**

Autoencoders! Autoencoders are models that aim to reproduce their inputs as their outputs.

![](https://drive.google.com/uc?export=view&id=1dFT9PCo2pVEVWULu-dPGZHwTbpn6HIcT)

**7. Autoencoders!**

This task alone wouldn't be very useful, but since along the way we decrease the number of neurons, we are effectively making our network learn to compress its inputs into a small set of neurons.

![](https://drive.google.com/uc?export=view&id=1oFKTioPRFg3BiWBnlhKr-lMSOs1d5xqw)

**8. Autoencoder use cases**

This makes autoencoders useful for things like:

- Dimensionality reduction, since we can obtain a lower-dimensional representation of our inputs.
- De-noising: if trained with clean data and then fed with noisy data, they will be able to decode back a good representation of the input data without noise.
- Anomaly detection: if you train an autoencoder to map inputs to outputs with your data but then pass in strange values, the network will fail to give accurate output values.

Many other applications can also benefit from this architecture.

![](https://drive.google.com/uc?export=view&id=1Hq82H3mWuKLuqqNjKXUNwpr-t13E5EzF)

**9. Building a simple autoencoder**

To make an autoencoder that maps a hundred inputs to a hundred outputs, encoding the inputs into a layer of 4 neurons, we would do the following: instantiate a sequential model, add a dense layer of 4 neurons with an input_shape of a hundred, and end with an output layer of 100 neurons. We use a sigmoid activation because we assume that our output can take a value between 0 and 1. We finish by compiling our model with the adam optimizer and binary_crossentropy loss, since we used sigmoid.

![](https://drive.google.com/uc?export=view&id=12c7TASVU5DPUN9HJJtzrZHlsayergieD)

**10. Breaking it into an encoder**

Once you've built and trained your autoencoder you might want to encode your inputs. To do this, you just have to build a new model and add the first layer of your previously trained autoencoder. This new model's predictions return the 4 numbers given by the 4 neurons of the hidden layer for each observation in the input dataset.

![](https://drive.google.com/uc?export=view&id=1FJKBvAqGGRG8m-LgJEpvP4d_vnT9s2ai)

### **4.1.1 It's a flow of tensors**

If you have already built a model, you can use the model.layers and the keras.backend to build functions that, provided with a valid input tensor, return the corresponding output tensor.

This is a useful tool when we want to obtain the output of a network at an intermediate layer.

For instance, if you get the input and output from the first layer of a network, you can build an inp_to_out function that returns the result of carrying out forward propagation through only the first layer for a given input tensor.

So that's what you're going to do right now!

X_test from the Banknote Authentication dataset and its model are preloaded. Type model.summary() in the console to check it.

**Instruction**

- Import keras.backend as K.
- Use the model.layers list to get a reference to the input and output of the first layer.
- Use K.function() to define a function that maps inp to out.
- Print the results of passing X_test through the 1st layer.

**Hints**

- Get the first layer of your model with model.layers[0].
- Don't forget that K.function() expects two lists as arguments, one for the inputs and another one for the outputs.

![](https://drive.google.com/uc?export=view&id=1J8Ti2qQZ3MEI-Fn9MPI0AB-JpBG6nLUG)

### **4.1.2 Neural separation**


Put on your gloves because you're going to perform brain surgery!

Neurons learn by updating their weights to output values that help them better distinguish between the different output classes in your dataset. You will make use of the inp_to_out() function you just built to visualize the output of two neurons in the first layer of the Banknote Authentication model as it learns.

The model you built in chapter 2 is ready for you to use, just like X_test and y_test. Paste show_code(plot) in the console if you want to check plot().

You're performing heavy-duty work! Once all is done, click through the graphs to watch the separation live!

**Instruction** 

- Use the previously defined inp_to_out() function to get the outputs of the first layer when fed with X_test.
- Use the model.evaluate() method to obtain the validation accuracy for the test dataset at each epoch.

**Hints**

- Use inp_to_out() passing X_test as argument.
- The model.evaluate() method receives both X_test and y_test as arguments.

![](https://drive.google.com/uc?export=view&id=1n5kDeeVotjSxwW0nsORtWkjd2TDLwGHH)

### **4.1.3 Building an autoencoder**


Autoencoders have several interesting applications like anomaly detection or image denoising. They aim at producing an output identical to their input. The input is compressed, or encoded, into a lower-dimensional space. The model then learns to decode it back to its original form.

You will encode and decode the MNIST dataset of handwritten digits. The hidden layer will encode a 32-dimensional representation of the image, which originally consists of 784 pixels (28 x 28). The autoencoder will essentially learn to turn the original 784-pixel image into a compressed 32-pixel image and learn how to use that encoded representation to bring back the original 784-pixel image.

The Sequential model and Dense layers are ready for you to use.

Let's build an autoencoder!

**Instruction** 

- Create a Sequential model.
- Add a dense layer with as many neurons as the encoded image dimensions and input_shape the number of pixels in the original image.
- Add a final layer with as many neurons as pixels in the input image.
- Compile your autoencoder using adadelta as an optimizer and binary_crossentropy loss, then summarise it.

**Hints**

- The original image has 784 pixels (28x28).
- The encoded image dimensions are 32, this matches the number of neurons in the hidden layer.
- To summarize your model structure remember you can use .summary().
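As a sanity check on what the summary should report, the number of trainable parameters of this 784 to 32 to 784 autoencoder can be worked out by hand. The sketch below assumes two plain dense layers with biases and nothing else. dense_params is a helper made up for this example.

```python
def dense_params(n_inputs, n_neurons):
    # A dense layer has one weight per input-neuron pair plus one bias per neuron.
    return n_inputs * n_neurons + n_neurons

n_pixels = 28 * 28          # 784 input pixels
n_encoded = 32              # size of the bottleneck

encoder_params = dense_params(n_pixels, n_encoded)
decoder_params = dense_params(n_encoded, n_pixels)

print(n_pixels)                          # 784
print(encoder_params)                    # 25120
print(decoder_params)                    # 25872
print(encoder_params + decoder_params)   # 50992
print(n_pixels / n_encoded)              # 24.5 (compression factor)
```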

![](https://drive.google.com/uc?export=view&id=1VRkgk4NajmlF-r9Si_Goa5YJrrRG-qFt)

### **4.1.4 De-noising like an autoencoder**


Okay, you have just built an autoencoder model. Let's see how it handles a more challenging task.

First, you will build a model that encodes images, and you will check how different digits are represented with show_encodings(). To build the encoder you will make use of your autoencoder, which has already been trained. You will just use the first half of the network, which contains the input and the bottleneck output. That way, you will obtain a 32-number output which represents the encoded version of the input image.

Then, you will apply your autoencoder to noisy images from MNIST. It should be able to clean up the noisy artifacts.

X_test_noise is loaded in your workspace. The digits in this noisy dataset look like this:



**Instruction** 

- Build an encoder model with the first layer of your trained autoencoder model.
- Predict on X_test_noise with your encoder and show the results with show_encodings().

**Hints**

You can access the layers of a model with model.layers, this returns an array of layers.

![](https://drive.google.com/uc?export=view&id=1C9VESqrPlnIHFsdeiNvYuPjrxrIqkagS)

## **4.2 Intro to CNNs**


**1. Intro to CNNs**

Let’s introduce Convolutional Neural Networks, a specific type of network that has led to a lot of advances in computer vision, as well as in other areas.

**2. How do they work?**

A convolutional model uses convolutional layers. A convolution is a simple mathematical operation that preserves spatial relationships. When applied to images it can detect relevant areas of interest like edges, corners, vertical lines, etc.

![](https://drive.google.com/uc?export=view&id=1Wq7A2b6ajrUENEvO7KOSO2bN7zxpRD6t)

**3. Convolutions demonstration**

A convolution consists of applying a filter, also known as a kernel, of a given size. In this image, we are applying a 3 by 3 kernel. We center the kernel matrix of numbers on each pixel as we slide through the image, multiplying the kernel and pixel values at each location and summing the values obtained. This effectively computes a new image where certain characteristics are amplified depending on the filter used. The secret sauce of CNNs resides in letting the network itself find the best filter values and combine them to achieve a given task.
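The sliding operation can be sketched in pure Python. This minimal version only visits positions where the kernel fits entirely inside the image (no padding), and both the tiny image and the vertical-edge kernel are made up for illustration.

```python
def convolve2d(image, kernel):
    # Sliding-window convolution as used in CNNs (no kernel flip, no padding):
    # multiply kernel and pixel values element-wise and sum the results.
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(k) for b in range(k))
            row.append(total)
        output.append(row)
    return output

# 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# 3x3 kernel that responds to vertical edges (values chosen for illustration).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

for row in convolve2d(image, kernel):
    print(row)   # each row is [0, 3, 3]
```

The output is 0 over the flat dark region and large where the window covers the dark-to-bright transition, which is how a filter amplifies a specific characteristic of the image.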

![](https://drive.google.com/uc?export=view&id=1VIqK7VwdiIYI8QckJPBv31fbJSwlRWoP)

**4. Typical architectures**

For a classification problem with many possible classes, CNNs tend to become very deep. Architectures consist of stacks of convolutional layers combined with other layers, known as pooling layers, that we won't cover here. Convolutional layers perform feature learning. We then flatten the outputs into a unidimensional vector and pass it to fully connected layers that carry out classification.

![](https://drive.google.com/uc?export=view&id=1Q89_bt1I7Mjuf2mxOfSjag4ispCNTK9o)

**5. Input shape to convolutional neural networks**

Images are 3D tensors: they have width, height, and depth. This depth is given by the color channels. If we use black and white images we will just have one channel, so the depth will be 1.

![](https://drive.google.com/uc?export=view&id=1RNvZHsQqqgkOf82ZeW81Sd6zL6CTe2Qv)

**6. How to build a simple convolutional net in keras?**

To build a CNN in Keras we first import the Conv2D and Flatten layers from keras.layers. We instantiate our model and add a convolutional layer. This first convolutional layer has 32 filters, which means it will learn 32 different convolutional masks. These masks will be 3 by 3 squares, as defined by the kernel_size. For 28 by 28 black and white images with only one channel, we use an input shape of (28, 28, 1). We can use any activation, as usual. We then add another convolutional layer and end by flattening its output into a unidimensional vector with the Flatten layer. We finish with an output dense layer.
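The output shape and parameter count of that first convolutional layer can be worked out by hand. The sketch below assumes no padding and a stride of 1, which to my knowledge are the Conv2D defaults. The helper names are made up for this example.

```python
def conv2d_output_shape(height, width, n_filters, kernel_size):
    # No padding and stride 1: each spatial dimension shrinks by kernel_size - 1.
    return (height - kernel_size + 1, width - kernel_size + 1, n_filters)

def conv2d_params(n_channels, n_filters, kernel_size):
    # Each filter has kernel_size * kernel_size * n_channels weights plus one bias.
    return (kernel_size * kernel_size * n_channels + 1) * n_filters

out_shape = conv2d_output_shape(28, 28, n_filters=32, kernel_size=3)
print(out_shape)                                                  # (26, 26, 32)
print(conv2d_params(n_channels=1, n_filters=32, kernel_size=3))   # 320

# Flattening this output would give a vector of this many values:
print(out_shape[0] * out_shape[1] * out_shape[2])                 # 21632
```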

![](https://drive.google.com/uc?export=view&id=1aB7hnK-jesVi3a6Agkk2jpqk80M2uvw_)

**7. Deep convolutional models**

ResNet50 is a 50-layer-deep model that performs well on the ImageNet dataset, a huge dataset of more than 14 million images. ResNet50 can distinguish between 1000 different classes. This model would take too long to train on a regular computer, but Keras makes it easy for us to use it. We just need to prepare the image we want to classify for the model, predict on the processed image, and decode the predictions!

![](https://drive.google.com/uc?export=view&id=1SIvSQLWgunXZO0QbMQm6Zp9GQpEd61IS)

**8. Pre-processing images for ResNet50**

To use pre-trained models to classify images, we first have to adapt these images so that they can be understood by the model. To prepare images for ResNet50 we would do the following. First, we import image from keras.preprocessing and preprocess_input from keras.applications.resnet50. We then load our image with load_img, providing the target size, which for this particular model is 224 by 224. We turn the image into a numpy array with img_to_array, expand the dimensions of the array, and preprocess the input in the same way the training images were.

![](https://drive.google.com/uc?export=view&id=1cd2J5NTaIFYmbtSKyRy6l_wnQyqel8TG)

**9. Using the ResNet50 model in Keras**

We import ResNet50 and decode_predictions, load the model with ImageNet pre-trained weights, predict on our image, and decode the predictions, that is, get the predicted classes with the highest probabilities.

![](https://drive.google.com/uc?export=view&id=12YynNzF32uoDvhTQoo6VBxBJZ1Nqe3bq)

**10. What is going on inside a convnet?**

Inside a CNN we can check how the different filters activate in response to an input image. We will explore this in the exercises!

**11. Let's experiment!**

Let's experiment with convolutional networks!

#**References**

[[1] Introduction to Deep Learning with Keras](https://learn.datacamp.com/courses/introduction-to-deep-learning-with-keras)

[[2] Machine-Learning-Scientist-with-Python-by-DataCamp](https://github.com/abdelrahmaan/Machine-Learning-Scientist-with-Python-by-DataCamp)