<a href="https://colab.research.google.com/github/hussain0048/Deep-Learning-with-Keras/blob/master/Deep_Learning_with_Keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1 Introduction to Deep Learning with Keras** [1]

## **1.1 Introducing Keras**


### **1.1.1 What is Keras?**

This course will add Keras as a powerful tool to your arsenal. Keras is a high-level deep learning framework. To understand what that means, we can compare it to a lower-level framework like Theano.


**2. Theano vs Keras**

Building a neural network in Theano can take many lines of code and requires a deep understanding of how neural networks work internally. Building and training this very same network in Keras takes only a few lines of code. Much quicker, right?

**3. Keras**

Keras is an open-source deep learning library that enables fast experimentation with neural networks. It runs on top of other frameworks like TensorFlow, Theano or CNTK, and it was created by French AI researcher François Chollet.

**4. Why use Keras?**

Why use Keras instead of lower-level libraries like TensorFlow? With Keras you can build industry-ready models in no time, with much less code than Theano, as we saw before, and a higher level of abstraction than that offered by TensorFlow. This lets you quickly and easily check whether a neural network will solve your problem. In addition, you can build any architecture you can imagine, from simple networks to more complex ones like auto-encoders, convolutional or recurrent neural networks. Keras models can also be deployed across a wide range of platforms like Android, iOS, web apps, etc.

**5. Keras + TensorFlow**

It's the best moment to be learning Keras. Keras is now fully integrated into TensorFlow 2.0, so you can use the best of both worlds as needed and in the same code pipeline. If, as you dive into deep learning, you find yourself needing low-level features, for instance to have finer control over how your network applies gradients, you can use TensorFlow and tweak whatever you need. Now that you know better what Keras is and why to use it, perhaps we should discuss when and why to use neural networks in the first place.

**6. Feature Engineering**

Neural networks are good feature extractors, since they learn the best way to make sense of unstructured data. Previously, it was the domain expert who had to set rules based on experimentation and heuristics to extract the relevant features of the data. Neural networks can learn the best features and their combinations, so they can perform feature engineering themselves. That's why they are so useful. But what is unstructured data?

**7. Unstructured data**

Unstructured data is data that is not easily put into a table, for instance sound, video, images, etc. It's also the type of data where performing feature engineering can be more challenging. That's why leaving this task to neural networks is a good idea.

**8. So, when to use neural networks?**

If you are dealing with unstructured data, you don't need to interpret the results and your problem can benefit from a known architecture, then you probably should use neural networks. For instance, when classifying images of cats and dogs: Images are unstructured data, we don't care as much about why the network knows it's a cat or a dog, and we can benefit from convolutional neural networks. So it's wise to use neural networks. You will learn about the usefulness of convolutional neural networks later on in the course.

### **1.1.2 Your first neural network**

**2. A neural network?**

A neural network is a machine learning algorithm in which the training data is fed to the input layer and the predicted value is read from the output layer.

![](
https://drive.google.com/uc?export=view&id=1vuNIxT1t1hCvufkkykiXz0qWzuf-Q6s_)


**3. Parameters**

Each connection from one neuron to another has an associated weight, w. Each neuron, except those in the input layer, which just hold the input values, also has an extra weight we call the bias weight, b. During feed-forward our input gets transformed by weight multiplications and additions at each layer. The output of each neuron can also be transformed by applying what's called an activation function.

![](https://drive.google.com/uc?export=view&id=1oKV0pBpn3rKbpV_Anq4lOEjlb8psug00)
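The feed-forward step for a single layer can be sketched in plain Python. This is only an illustration and not how Keras implements it: `relu` and `dense_forward` are hypothetical helper names, and the inputs, weights and biases are made-up values.

```python
def relu(z):
    """ReLU activation: negative values become 0, others pass through."""
    return max(0.0, z)


def dense_forward(inputs, weights, biases, activation=None):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight w connecting input i to neuron j,
    and biases[j] is the bias weight b of neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of the inputs plus the bias weight
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # Optionally transform the result with an activation function
        outputs.append(activation(z) if activation else z)
    return outputs


# Made-up example: 3 inputs feeding a layer of 2 neurons
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.25],   # weights of neuron 0
     [-0.5, 0.0, 0.5]]    # weights of neuron 1
b = [1.5, -2.0]           # one bias weight per neuron

hidden = dense_forward(x, W, b, activation=relu)
print(hidden)  # [0.75, 0.0]
```

Without the activation, the second neuron would output -1.0; ReLU clips it to 0.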

**4. Gradient descent**

Learning in neural networks consists of tuning the weights or parameters to give the desired output. One way of achieving this is by using the famous gradient descent algorithm, and applying weight updates incrementally via a process known as back-propagation. That was a lot of theory! The code in Keras is much simpler as we will see now.
![](https://drive.google.com/uc?export=view&id=10Gk8kB7JB5zASSu5irmBZUrBsgb0L03Z)
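As a rough illustration of gradient descent, the sketch below tunes a single weight so that `w * x` matches some made-up data. The data, the learning rate and the number of steps are assumptions chosen for the example. With only one weight there are no layers to propagate through, so back-propagation reduces to computing one derivative.

```python
# Made-up data following y = 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # initial weight
learning_rate = 0.01  # size of each update step

for step in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Move against the gradient to reduce the loss
    w -= learning_rate * grad

print(round(w, 3))
```

After enough small updates the weight should settle very close to 2, the value that generated the data.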


**5. The sequential API**

Keras allows you to build models in two different ways; using either the Functional API or the Sequential API. We will focus on the Sequential API. This is a simple, yet very powerful way of building neural networks that will get you covered for most use cases. With the sequential API you're essentially building a model as a stack of layers. You can start with an input layer.

![](
https://drive.google.com/uc?export=view&id=1EOGcX5oGwa2CI4A1Xt86Ap27oeA19DVO)

**6. The sequential API**

Add a couple of hidden layers.


**7. The sequential API**

And finally end your model by adding an output layer. Let's go through a code example.

**8. Defining a neural network**

To create a simple neural network we'd do the following: import the Sequential model from keras.models and import a Dense layer, also known as a fully connected layer, from keras.layers. We can then create an instance of a Sequential model. In the next line of code we add two layers at once: a 2-neuron Dense (fully connected) layer and an input layer consisting of 3 neurons. The input layer is defined with the input_shape parameter, so this first layer matches the dimensions of our input data. We finally add another fully connected layer, this time with 1 neuron. We've built the network to the right.

![](
https://drive.google.com/uc?export=view&id=1DhHRl7HJTi2DMFPAufbxzWeMLGXe02a7)

**9. Adding activations**

In order to add an activation function to our layers we can make use of the activation argument.

**10. Adding activations**

For instance, this is how we'd add a ReLU activation to our hidden layer. Don't worry about the choice of activation functions; that will be covered later on in the course.

**11. Summarize your model!**

Once we've created our model we can call the summary() method on it. This displays a table with 3 columns: the first with each layer's name and type, the second with the shape of the outputs produced by each layer, and the third with the number of parameters, that is, the weights of each layer including the bias weight of each neuron. When the input layer is defined via the input_shape parameter, as we did before, it is not shown as a layer in the summary; it is included in the layer where it was defined, in this case the dense_3 layer.

![](https://drive.google.com/uc?export=view&id=1KYJqGrtAE33RRvqAgn-RnbsXFa2xQSpX)

**12. Visualize parameters**

That's why we see that this layer has 8 parameters: 6 weights come from the connections of the 3 input neurons to the 2 neurons in this layer, and the remaining 2 come from the bias weights, b0 and b1, one per neuron in the hidden layer.

![](https://drive.google.com/uc?export=view&id=1QXwcJ72KfKBlMMgDNHG24elHfAAQRPv7)

**13. Visualize parameters**

These add up to 8 different parameters.
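The same arithmetic can be written as a small helper. This is a sketch for checking the numbers by hand; `dense_param_count` is a hypothetical name, not a Keras function.

```python
def dense_param_count(n_inputs, n_neurons):
    """One weight per connection plus one bias weight per neuron."""
    return n_inputs * n_neurons + n_neurons


hidden_params = dense_param_count(3, 2)   # 3 inputs -> 2 hidden neurons
output_params = dense_param_count(2, 1)   # 2 hidden neurons -> 1 output neuron

print(hidden_params)                  # 8
print(output_params)                  # 3
print(hidden_params + output_params)  # 11
```

The per-layer values correspond to what the third column of summary() reports for Dense layers.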

#### **1.1.2.1 Hello nets!**
You're going to build a simple neural network to get a feeling of how quick it is to accomplish this in Keras.

You will build a network that takes two numbers as an input, passes them through a hidden layer of 10 neurons, and finally outputs a single non-constrained number.

A non-constrained output can be obtained by avoiding setting an activation function in the output layer. This is useful for problems like regression, when we want our output to be able to take any non-constrained value.


**Instruction**

- Import the Sequential model from keras.models and the Dense layer from keras.layers.
- Create an instance of the Sequential model.
- Add a 10-neuron hidden Dense layer with an input_shape of two neurons.
- Add a final 1-neuron output layer and summarize your model with summary().

In [None]:
# Import the Sequential model and Dense layer
from keras.models import Sequential
from keras.layers import Dense

# Create a Sequential model
model = Sequential()

# Add an input layer and a hidden layer with 10 neurons
model.add(Dense(10, input_shape=(2,), activation="relu"))

# Add a 1-neuron output layer
model.add(Dense(1))

# Summarise your model
model.summary()

![](
https://drive.google.com/uc?export=view&id=1KYJqGrtAE33RRvqAgn-RnbsXFa2xQSpX)

#### **1.1.2.2 Counting parameters**

You've just created a neural network. But you're going to create a new one now, taking some time to think about the weights of each layer. The Keras Dense layer and the Sequential model are already loaded for you to use.

This is the network you will be creating:

**Instruction 1/2** 

- Instantiate a new Sequential() model.
- Add a Dense() layer with five neurons and three neurons as input.
- Add a final dense layer with one neuron and no activation.

**Hint**
- The input_shape is a tuple of the form (input_size,).
- Your dense layers should be enclosed between the model.add() parentheses.



In [None]:
# Instantiate a new Sequential model
model = Sequential()

# Add a Dense layer with five neurons and three inputs
model.add(Dense(5, input_shape=(3,), activation="relu"))

# Add a final Dense layer with one neuron and no activation
model.add(Dense(1))

# Summarize your model
model.summary()

![](
https://drive.google.com/uc?export=view&id=1v2Ql6jRe8604-ZWRFmF3GKvECZ4mnhmA)

#### **1.1.2.3 Build as shown!**

You will take on a final challenge before moving on to the next lesson. Build the network shown in the picture below. Prove you've mastered the Keras basics in no time!

**Instruction**

- Instantiate a Sequential model.
- Build the input and hidden layer.
- Add the output layer.


In [None]:
from keras.models import Sequential
from keras.layers import Dense

# Instantiate a Sequential model
model = Sequential()

# Build the input and hidden layer
model.add(Dense(3, input_shape=(2,), activation='relu'))

# Add the output layer
model.add(Dense(1))

Perfect! You've shown you can already translate a visual representation of a neural network into Keras code. Let's keep going!

### **1.1.3 Surviving a meteor strike**

**1. Surviving a meteor strike**

Welcome back! We are going to learn just what you're missing to be able to save the earth from a meteor strike. But first, let's see how to compile, train and predict with your models!

**2. Recap**

In the previous lesson we saw how easy it was to create a model with Keras. You instantiate your Sequential model, add a couple of layers and their activations, and that's it: you've built a simple model in no time.

**3. Compiling**

A model needs to be compiled before training. We can compile our model by calling the compile method on it. The compile method receives an optimizer, which we can see as the algorithm that will be used to update our neural network weights, and a loss function, which is the function we want to minimize during training. In this case, we choose adam as our optimizer and mean squared error as our loss function. Optimizers and loss functions will be covered later on in the course, so don't worry about it for now. Compiling our model produces no output. Our model is now ready to train!


![](https://drive.google.com/uc?export=view&id=1Ai59-e0if_bvOZ9XNiXCHIQXJDJdzqWk)
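To make the loss function more concrete, here is a minimal sketch of mean squared error in plain Python. The helper name and the sample values are assumptions made for illustration; Keras computes this for you once the loss is passed to compile.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

loss = mean_squared_error(y_true, y_pred)
print(loss)  # (0.25 + 0.0 + 1.0) / 3
```

Training tries to make this number as small as possible by adjusting the weights.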

**4. Training**

Creating a model is useless if we don't train it. We train our model by calling the fit method and passing the features in X_train, the labels in y_train and the number of epochs to train for. An epoch corresponds to our entire training data passing through the network once and the respective weight updates during back-propagation. As our model is being trained, we will get some output showing the progress. We can see the model is improving since the mean squared error loss is decreasing at each epoch.

In [None]:
# Train your model
model.fit(X_train, y_train, epochs=5)

**5. Predicting**

To obtain predictions from our trained model we just need to call predict on the new set of data. We can store the predictions in a variable for later use. The predictions are just numbers in a numpy array, we will interpret these depending on our dataset and problem at hand.

In [None]:
# Predict on new data
preds = model.predict(X_test)
# Look at the predictions
print(preds)

**6. Evaluating**

To quickly evaluate how well our model performs on unseen data we can use the model's evaluate method. This performs feed-forward with all samples in our test dataset (X_test). Feed-forward consists of computing a model's outputs from a given set of inputs. It then computes the error by comparing the results to the true values stored in y_test. In this particular example, the model we trained for 5 epochs before has a mean squared error of 0.25.

In [None]:
# evaluate your result
model.evaluate(X_test,y_test)

**7. The problem at hand**

Are you ready?! A meteor is approaching the earth and we want to make sure it won't take us to extinction. A group of scientists is trying to estimate the orbit by using historical data gathered about previous orbits of similar meteors.

![](
https://drive.google.com/uc?export=view&id=1asHHOzX2uyr5dCmn0PGwEu1m8Q7eY1Pd)


**8. Scientific prediction**

Scientists have used this data alongside their knowledge to estimate an 80-minute orbit, that is, an orbit from -40 minutes to +40 minutes. t=0 corresponds to the time of crossing the impact region. It looks like the meteor will be close! Perhaps it won't hit us, but we must make sure we are right!

**9. Your task**

You have data for the path a previous meteor took during a period of 20 minutes: 10 minutes before and 10 minutes after crossing the impact region. You will train a model on this data and extrapolate your predictions to an 80-minute orbit to see how it compares to the scientists' prediction. Will your orbit be similar to that of the scientists, where we survive by just a small margin?
![](
https://drive.google.com/uc?export=view&id=1u9vRuASe1NlnsMmXeJScKLySxSzYztz5)

![](https://drive.google.com/uc?export=view&id=1WNeqkyKSxwa3Hc1TArGvjxnou_v8eTiP
)

#### **1.1.3.1 Specifying a model**

You will build a simple regression model to predict the orbit of the meteor!

Your training data consist of measurements taken at time steps from -10 minutes before the impact region to +10 minutes after. Each time step can be viewed as an X coordinate in our graph, which has an associated position Y for the meteor orbit at that time step.

Note that you can view this problem as approximating a quadratic function via the use of neural networks.

This data is stored in two numpy arrays: one called time_steps, containing what we call the features, and another called y_positions, with the labels. Go on and build your model! It should be able to predict the y positions for the meteor orbit at future time steps.

Keras Sequential model and Dense layers are available for you to use.


**Instruction**
- Instantiate a Sequential model.
- Add a Dense layer of 50 neurons with an input shape of 1 neuron.
- Add two Dense layers of 50 neurons each and 'relu' activation.
- End your model with a Dense layer with a single neuron and no activation.

In [None]:
# Instantiate a Sequential model
model = Sequential()
# Add a Dense layer with 50 neurons and an input of 1 neuron
model.add(Dense(50, input_shape=(1,), activation='relu'))
# Add two Dense layers with 50 neurons and relu activation
model.add(Dense(50,activation='relu'))
model.add(Dense(50,activation='relu'))
# End your model with a Dense layer and no activation
model.add(Dense(1))

You are closer to forecasting the meteor orbit! It's important to note we aren't using an activation function in our output layer since y_positions aren't bounded and they can take any value. Your model is built to perform a regression task.

#### **1.1.3.2 Training**

You're going to train your first model in this course, and for a good cause!

Remember that before training your Keras models you need to compile them. This can be done with the .compile() method. The .compile() method takes arguments such as the optimizer, used for weight updating, and the loss function, which is what we want to minimize. Training your model is as easy as calling the .fit() method, passing on the features, labels and a number of epochs to train for.

The regression model you built in the previous exercise is loaded for you to use, along with the time_steps and y_positions data. Train it and evaluate it on this very same data. Let's see if your model can learn the meteor's trajectory.

**Instruction**

- Compile your model making use of the 'adam' optimizer and 'mse' as your loss function.
- Fit your model using the features and labels for 30 epochs.
- Evaluate your model with the .evaluate() method, passing the features and labels used during training.

In [None]:
# Compile your model
model.compile(optimizer = 'adam', loss = 'mse')
print("Training started..., this can take a while:")
# Fit your model on your data for 30 epochs
model.fit(time_steps,y_positions, epochs = 30)
# Evaluate your model 
print("Final loss value:",model.evaluate(time_steps, y_positions))

Final loss value: 0.11601032888400369
Amazing! You can check the console to see how the loss function decreased as epochs went by. Your model is now ready to make predictions on unseen data.

#### **1.1.3.3 Predicting the orbit!**

You've already trained a model that approximates the orbit of the meteor approaching Earth and it's loaded for you to use.

Since you trained your model for values between -10 and 10 minutes, your model hasn't yet seen any other values for different time steps. You will now visualize how your model behaves on unseen data.

If you want to check the source code of plot_orbit, paste show_code(plot_orbit) into the console.

Hurry up, the Earth is running out of time!

Remember np.arange(x,y) produces a range of values from x to y-1. That is the [x, y) interval.
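A quick check of the half-open interval, as a small sketch. The variable names are made up for this example and are not part of the exercise.

```python
import numpy as np

# The stop value is excluded, so 11 is needed to include 10
twenty_min_steps = np.arange(-10, 11)
eighty_min_steps = np.arange(-40, 41)

print(twenty_min_steps[0], twenty_min_steps[-1], len(twenty_min_steps))  # -10 10 21
print(eighty_min_steps[0], eighty_min_steps[-1], len(eighty_min_steps))  # -40 40 81
```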

Use the model's .predict() method to predict from -10 to 10 minutes.

In [None]:
import numpy as np

# Predict the twenty-minute orbit
twenty_min_orbit = model.predict(np.arange(-10, 11))

# Plot the twenty minute orbit 
plot_orbit(twenty_min_orbit)

![](https://drive.google.com/uc?export=view&id=1yrUjhuvgdxLMap8SNeKavX6JIfp5qn1H)

In [None]:
# Predict the eighty minute orbit
eighty_min_orbit = model.predict(np.arange(-40, 41))

# Plot the eighty minute orbit 
plot_orbit(eighty_min_orbit)

Your model fits the scientists' trajectory perfectly for time values between -10 and +10, the region where the meteor crosses the impact region, so we won't be hit! However, it starts to diverge when predicting for new values we haven't trained on. This shows that neural networks learn according to the data they are fed. Data quality and diversity are very important. You've barely scratched the surface of what neural networks can do. Are you prepared for the next chapter?

![](
https://drive.google.com/uc?export=view&id=1rnb0PHvr-zHKO1J-wf2A0Bra5TztaPyT)

## **1.2 Going Deeper**


### **1.2.1 Binary classification**


**1. Binary classification**

You're now ready to learn about binary classification, so let's dive in.

**2. When to use binary classification?**

You will use binary classification when you want to solve problems where you predict whether an observation belongs to one of two possible classes. A simple binary classification problem could be learning the boundaries to separate blue from red circles as shown in the image.

![](https://drive.google.com/uc?export=view&id=1UYqpsy_0Lq4eIGZRa6idqXjQXyqDqU51)

**3. Our dataset**

The dataset for this problem is very simple. The coordinates are pairs of values corresponding to the X and Y coordinates of each circle in the graph. The labels are 1 for red circles and 0 for the blue circles.

![](https://drive.google.com/uc?export=view&id=1OnE1rovNG_UWwvLM1RJwpT6Redg6nXSN)

**4. Pairplots**

We can make use of seaborn's pairplot function to explore a small dataset and identify whether our classification problem will be easily separable. We can get an intuition for this if we see that the classes separate well enough along several variables. In this case, for the circles dataset, there is a very clear boundary: the red circles concentrate at the center while the blue ones are outside. It should be easy for our network to find a way to separate them based only on the x and y coordinates.

![](https://drive.google.com/uc?export=view&id=11Qext6FFSD9xD6ObdifRm9GvKzr_WAla)

**7. The NN architecture 3**

Then we have one hidden layer with four neurons. Four is a good enough number to learn the separation of classes in this dataset. This was found by experimentation.

**8. The NN architecture 4**

We finally end up with a single output neuron which makes use of the sigmoid activation function. It's important to note that, regardless of the activation functions used for the previous layers, we need the sigmoid activation function for this last output node.


**9. The sigmoid function**

The sigmoid activation function squashes the neuron output of the second to last layer to a floating point number between 0 and 1.

**10. The sigmoid function**

You can consider the output of the sigmoid function as the probability of a pair of coordinates being in one class or another. So we can set a threshold and say everything below 0.5 will be a blue circle and everything above a red one.

![](https://drive.google.com/uc?export=view&id=16I4Jv7-qkldO9-6zGUN45hOx_MIdH1sW)

![](https://drive.google.com/uc?export=view&id=1q1gmN39GmBNI0K1-zUCHqe2KnZITE2Yx)
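The sigmoid and the 0.5 threshold can be sketched in a few lines of plain Python. The input values are made up, and `classify` is a hypothetical helper that maps outputs above the threshold to red (label 1) and the rest to blue (label 0).

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z, threshold=0.5):
    """Interpret the sigmoid output as the probability of the red class."""
    return "red" if sigmoid(z) > threshold else "blue"


# Made-up values arriving at the output neuron
for z in (-4.0, -1.0, 1.0, 4.0):
    print(z, round(sigmoid(z), 3), classify(z))
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of 0 gives exactly 0.5.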

**11. Let's build it**

So let's build our model in Keras: we start by importing the Sequential model and the Dense layer. We then instantiate a Sequential model.

**12. Let's build it**

We add a hidden layer of 4 neurons and we define an input shape, which consists of 2 neurons. We use tanh as the activation function for this hidden layer. Activation functions are covered later in the course, so don't worry about this choice for now.

**13. Let's build it**

We finally add an output layer which contains a single neuron. We make use of the sigmoid activation function so that we achieve the behavior we expect from this network, that is, obtaining a value between 0 and 1. Our model is now ready to be trained.

![](https://drive.google.com/uc?export=view&id=1Z-s34OwqTGx__Okz0hBmzhMOTqlYL6Qs)

![](https://drive.google.com/uc?export=view&id=1Cdcj9wln-YAUnBg6Peq5OCEiOJkOdhuV)

#### **1.2.1.1 Exploring dollar bills**

You will practice building classification models in Keras with the Banknote Authentication dataset.

Your goal is to distinguish between real and fake dollar bills. In order to do this, the dataset comes with 4 features: variance, skewness, kurtosis and entropy. These features are calculated by applying mathematical operations over the dollar bill images. The labels are found in the dataframe's class column.

**Instruction**

- Import seaborn as sns.
- Use seaborn's pairplot() on banknotes and set hue to be the name of the column containing the labels.
- Generate descriptive statistics for the banknotes authentication data.
- Count the number of observations per label with .value_counts().

In [None]:
# Import seaborn and matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
# Use pairplot and set the hue to be our class column
sns.pairplot(banknotes, hue='class')
# Show the plot
plt.show()
# Describe the data
print('Dataset stats: \n', banknotes.describe())
# Count the number of observations per class
print('Observations per class: \n', banknotes['class'].value_counts())

![](
https://drive.google.com/uc?export=view&id=1B1MpQeZt1Kg82l6mAED9jogBDJFa885J)

#### **1.2.1.2 Is this dollar bill fake?**

You are now ready to train your model and check how well it performs when classifying new bills! The dataset has already been partitioned into features: X_train & X_test, and labels: y_train & y_test.

**Instruction**

- Train your model for 20 epochs calling .fit(), passing in the training data.
- Check your model accuracy using the .evaluate() method on the test data.
- Print accuracy.


In [None]:
# Train your model for 20 epochs
model.fit(X_train, y_train, epochs = 20)
# Evaluate your model accuracy on the test set
accuracy = model.evaluate(X_test, y_test)[1]
# Print accuracy
print('Accuracy:', accuracy)

Accuracy: 0.8252427167105443


Alright! It looks like you are getting a high accuracy even with this simple model!

### **1.2.2 Multi-class classification**

**1. Multi-class classification**

What about when we have more than two classes to classify? We run into a multi-class classification problem, but don't worry, we just have to tweak our neural network architecture.

**2. Throwing darts**

Identifying who threw which dart in a game of darts is a good example of a multi-class classification problem. Each dart can only be thrown by one competitor. That means our classes are mutually exclusive: no dart can be thrown by two different competitors simultaneously.

**3. The dataset**

The darts dataset consists of dart throws by different competitors. The coordinate pairs xCoord and yCoord show where each dart landed.

**4. The dataset**

Based on the landing position of previously thrown darts we should be able to distinguish between throwers if there's enough variation among them. In our pairplot we can see players tend to aim at certain regions of the board.

![](https://drive.google.com/uc?export=view&id=19F566c9S2ANiBtqcoNjwwkfFJ5u_RPC4)

![](https://drive.google.com/uc?export=view&id=1BZs1-YxO9mebsdM9i_algsc2NCie2zpb)

**5. The architecture**

The model for this dataset has two neurons as inputs, since our predictors are xCoord and yCoord. We will define them using the input_shape argument, just as we've done before.

**6. The architecture**

In between there will be a series of hidden layers: we are using 3 Dense layers of 128, 64 and 32 neurons respectively.

**7. The architecture**

As outputs we have 4 neurons, one per competitor. Let's look closer at the output layer.

![](https://drive.google.com/uc?export=view&id=1eHha7Z85p0IgGhyDCR4j0BHTdiqeHLX3)

**8. The output layer**

We have 4 outputs, each linked to a possible competitor. Each competitor has a probability of having thrown a given dart, so we must make sure the total sum of probabilities for the output neurons equals one. We achieve this with the softmax activation function. Once we have a probability per output neuron we then choose as our prediction the competitor whose associated output has the highest probability.

![](https://drive.google.com/uc?export=view&id=1_hail1aH8YBmkvE2QkTUVcrEhchRtU81)
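A minimal sketch of softmax in plain Python shows how raw output values become probabilities that sum to one. The four scores are made-up values, one per competitor, and the function is an illustration rather than the Keras implementation.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs of the 4 output neurons
scores = [2.0, 1.0, 0.1, -1.0]
probs = softmax(scores)

# The prediction is the competitor with the highest probability
predicted = probs.index(max(probs))

print([round(p, 3) for p in probs])
print(predicted)  # 0
```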

**9. Multi-class model**

You can build this model as we did in the previous lesson: instantiate a sequential model, add a hidden layer, also defining an input layer with the input_shape parameter, and finish by adding the remaining hidden layers and an output layer with softmax activation. You will do all this yourself in the exercises.

![](https://drive.google.com/uc?export=view&id=1RYWoEBIo6X1bw-BmbbLb7Be-njP8RIOF
)

**10. Categorical cross-entropy**

When compiling your model, instead of binary cross-entropy as we used before, we now use categorical cross-entropy or log loss. Categorical cross-entropy measures the difference between the predicted probabilities and the true label of the class we should have predicted. So if we should have predicted 1 for a given class, taking a look at the graph we see we would get high loss values for predicting close to 0 (since we'd be very wrong) and low loss values for predicting closer to 1 (the true label).

![](https://drive.google.com/uc?export=view&id=1DlfFP_A0DIJpgWMnhdFhqd-lEcdvEp5z
)
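The behavior of the loss can be checked numerically. Below is a sketch of categorical cross-entropy for a single sample in plain Python; the probabilities are made up, and Keras additionally averages over samples and guards against taking the log of zero, which this sketch does not do.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Loss for one sample: y_true is one-hot, y_pred holds probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0, 0]                        # the true class is index 1
confident_right = [0.05, 0.85, 0.05, 0.05]   # high probability on the true class
confident_wrong = [0.85, 0.05, 0.05, 0.05]   # low probability on the true class

low_loss = categorical_crossentropy(y_true, confident_right)
high_loss = categorical_crossentropy(y_true, confident_wrong)

print(round(low_loss, 3), round(high_loss, 3))
```

Only the probability assigned to the true class contributes, so the loss grows quickly as that probability approaches 0.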

**11. Preparing a dataset**

Since our outputs are vectors containing the probabilities of each class, our neural network must also be trained with vectors representing this concept. To achieve that we make use of the to_categorical function from keras.utils. We first turn our response variable into a categorical variable with pandas Categorical. This allows us to redefine the column using the categorical codes (cat codes) of the different categories. Now that our categories are each represented by a unique integer, we can use the to_categorical function to turn them into one-hot encoded vectors, where each component is 0 except for the one corresponding to the labeled category.

![](https://drive.google.com/uc?export=view&id=1jW4UBNYXLNpq_wDL-NlZ-cnZSSWVPp7Z)

**12. One-hot encoding**

Keras to_categorical essentially performs the process described in the picture above. The label-encoded Apple, Chicken and Broccoli each turn into a vector of 3 components. A 1 is placed to represent the presence of the class and a 0 to indicate its absence.

![](https://drive.google.com/uc?export=view&id=1tjy2FpyNgDXRyz_LFOsQb7ivk74ok4ce)
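The two steps, label encoding followed by one-hot encoding, can be sketched without any libraries. The integer assigned to each category here follows alphabetical order, which is an assumption of this example, and `one_hot` is a hypothetical helper rather than the Keras function.

```python
labels = ["Apple", "Chicken", "Broccoli", "Apple"]

# Label encoding: map each category to a unique integer
categories = sorted(set(labels))            # ['Apple', 'Broccoli', 'Chicken']
codes = [categories.index(label) for label in labels]


def one_hot(code, n_classes):
    """Vector of zeros with a single 1 at the position of the class."""
    vector = [0] * n_classes
    vector[code] = 1
    return vector


encoded = [one_hot(code, len(categories)) for code in codes]

print(codes)    # [0, 2, 1, 0]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```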

#### **1.2.2.1 A multi-class model**

You're going to build a model that predicts who threw which dart only based on where that dart landed! (That is the dart's x and y coordinates on the board.)

This problem is a multi-class classification problem since each dart can only be thrown by one of 4 competitors. So classes/labels are mutually exclusive, and therefore we can build a network with as many output neurons as competitors and use the softmax activation function to achieve a total sum of probabilities of 1 over all competitors.

Keras Sequential model and Dense layer are already loaded for you to use.

**Instruction**

- Instantiate a Sequential model.
- Add 3 dense layers of 128, 64 and 32 neurons each.
- Add a final dense layer with as many neurons as competitors.
- Compile your model using categorical_crossentropy loss.

#### **1.2.2.2 Prepare your dataset**

In the console you can check that your labels, darts.competitor, are not yet in a format your network can understand. They contain the names of the competitors as strings. You will first turn these competitors into unique numbers, then use the to_categorical() function from keras.utils to turn these numbers into their one-hot encoded representation.

This is useful for multi-class classification problems, since there are as many output neurons as classes and for every observation in our dataset we just want one of the neurons to be activated.

The darts dataset is loaded as darts. Pandas is imported as pd. Let's prepare this dataset!

**Instruction 1**

- Use the Categorical() method from pandas to transform the competitor column.
- Assign a number to each competitor using the cat.codes attribute from the competitor column.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes 

# Print the label encoded competitors
print('Label encoded competitors: \n',darts.competitor.head())

![](
https://drive.google.com/uc?export=view&id=1IJLVmZrkt69yjzUbZw97uAAmrGjJFKf-)

**Instruction 2**

- Import to_categorical from keras.utils.
- Apply to_categorical() to your labels.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes 

# Import to_categorical from keras utils module
from keras.utils import to_categorical

coordinates = darts.drop(['competitor'], axis=1)
# Use to_categorical on your labels
competitors = to_categorical(darts.competitor)

# Now print the one-hot encoded labels
print('One-hot encoded competitors: \n',competitors)

![](https://drive.google.com/uc?export=view&id=1sTycp1dzBzwwRg0zl9f8XGSpm5fHY5pj)

Great! Each competitor is now a vector of length 4, full of zeroes except for the position representing her or himself.

#### **1.3.3 Training on dart throwers**

Your model is now ready, just as your dataset. It's time to train!

The coordinates features and competitors labels you just transformed have been partitioned into coord_train, coord_test and competitors_train, competitors_test.

Your model is also loaded. Feel free to visualize your training data or model.summary() in the console.

Let's find out who threw which dart just by looking at the board!

**instruction**

- Train your model on the training data for 200 epochs.
- Evaluate your model accuracy on the test data.


In [None]:
# Fit your model to the training data for 200 epochs
model.fit(coord_train,competitors_train,epochs=200)

# Evaluate your model accuracy on the test data
accuracy = model.evaluate(coord_test, competitors_test)[1]

# Print accuracy
print('Accuracy:', accuracy)

Your model just trained for 200 epochs! The accuracy on the test set is quite high. How are the predictions looking? Let's find out!


Accuracy: 0.85

#### **1.3.4 Softmax predictions**

Your recently trained model is loaded for you. This model is generalizing well, which is why you got a high accuracy on the test set.

Since you used the softmax activation function, for every input of 2 coordinates provided to your model there's an output vector of 4 numbers. Each of these numbers encodes the probability of a given dart being thrown by one of the 4 possible competitors.

When computing accuracy with the model's .evaluate() method, your model takes the class with the highest probability as the prediction. np.argmax() can help you do this since it returns the index with the highest value in an array.
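To make the argmax step concrete, here is a minimal sketch of how an accuracy figure could be reproduced by hand. The probabilities and labels are made-up values, not output from the darts model.

```python
import numpy as np

# Made-up softmax outputs for 3 throws (each row sums to 1)
probs = np.array([[0.10, 0.70, 0.15, 0.05],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.25, 0.25, 0.10, 0.40]])

# Made-up one-hot encoded true labels for the same 3 throws
labels = np.array([[0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]])

# The predicted class is the index of the highest probability in each row
predicted = np.argmax(probs, axis=1)

# The true class is the index of the single 1 in each one-hot row
actual = np.argmax(labels, axis=1)

# Accuracy is the fraction of rows where both indices agree
accuracy = np.mean(predicted == actual)

print(predicted, actual, accuracy)
```

Here the second throw is misclassified, so two out of three predictions match.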

Use the collection of test throws stored in coords_small_test and np.argmax() to check this out!


**instruction 1**

- Predict with your model on coords_small_test.
- Print the model predictions.

In [None]:
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions','True labels'))
for i,pred in enumerate(preds):
  print("{} | {}".format(pred,competitors_small_test[i]))

![](https://drive.google.com/uc?export=view&id=1BrNsjSp6G_EMRc9W_LOvCx1qcAywMU0r)

**instruction 2**

- Use np.argmax() to extract the index of the most probable competitor from each pred vector in preds.

In [None]:
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions','True labels'))
for i,pred in enumerate(preds):
  print("{} | {}".format(pred,competitors_small_test[i]))

# Extract the position of highest probability from each pred vector
preds_chosen = [np.argmax(pred) for pred in preds]

# Print chosen preds vs true values
print("{:10} | {}".format('Rounded Model Predictions','True labels'))
for i,pred in enumerate(preds_chosen):
  print("{:25} | {}".format(pred,competitors_small_test[i]))

![](https://drive.google.com/uc?export=view&id=1i9dek-iLNpynRFXVSpIkAWW2CrJbDVLl)

Well done! As you've seen you can easily interpret the softmax output. This can also help you spot those observations where your network is less certain on which class to predict, since you can see the probability distribution among classes per prediction. Let's learn how to solve new problems with neural networks!
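One possible way to spot the less certain observations mentioned above is to look at the highest probability in each prediction and flag the rows where it falls below a cut-off. The predictions and the 0.5 threshold below are arbitrary choices made for illustration.

```python
import numpy as np

# Made-up softmax outputs for 3 throws
preds = np.array([[0.97, 0.01, 0.01, 0.01],   # very confident
                  [0.30, 0.28, 0.22, 0.20],   # probability spread over all classes
                  [0.05, 0.48, 0.45, 0.02]])  # torn between two classes

# Confidence of each prediction is the probability of its chosen class
confidence = preds.max(axis=1)

# Assumed cut-off; a sensible value depends on the problem
threshold = 0.5

# Indices of the predictions whose top probability is below the cut-off
uncertain = np.where(confidence < threshold)[0]

print(uncertain)
```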

### **1.4 Multi-label classification**

**1. Multi-label classification**

Now that you know how multi-class classification works, we can take a look at multi-label classification. They both deal with predicting classes, but in multi-label classification, a single input can be assigned to more than one class.

**2. Real world examples**

We could use multi-label classification, for instance, to tag a series' genres based on its plot summary.


**4. Multi-class vs multi-label**

Imagine we had three classes: sun, moon and clouds. In a multi-class problem, each individual in a sample of our observations belongs to exactly one class. In a multi-label problem, however, each individual can have all, none or a subset of the available classes. As you can see in the image, multi-label vectors are encoded much like one-hot vectors, with a 1 or a 0 representing the presence or absence of each class, except that more than one position can hold a 1.

1 https://gombru.github.io/2018/05/23/cross_entropy_loss/

![](https://drive.google.com/uc?export=view&id=1ZeFAOFjvQpIWhPQurHG0_cMbOxj3Z8vQ)
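The difference between the two encodings can be sketched with a few hand-written label vectors for the sun, moon and clouds classes. The rows and the decode() helper are hypothetical and exist only for this illustration.

```python
import numpy as np

classes = ["sun", "moon", "clouds"]

# Multi-class: every row contains exactly one 1
multi_class = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1]])

# Multi-label: a row can contain a subset, none, or all of the classes
multi_label = np.array([[1, 0, 1],
                        [0, 0, 0],
                        [1, 1, 1]])

def decode(row):
    """Return the names of the classes marked with a 1 (hypothetical helper)."""
    return [name for name, flag in zip(classes, row) if flag == 1]

print(multi_class.sum(axis=1))              # always one class per row
print([decode(row) for row in multi_label])
```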

**5. The architecture**

Making a multi-label model for this problem is not very different from what you did when building your multi-class model. We first instantiate a sequential model. For the sake of this example, we will assume that to differentiate between these 3 classes we need just one input and 2 hidden neurons. The biggest changes happen in the output layer and in its activation function: we still use as many output neurons as there are possible classes, but this time with sigmoid activation.

![](https://drive.google.com/uc?export=view&id=11wquJNNBo_bLTWGDIzgwotnDqk4wbRVM)
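A minimal sketch of the architecture described above, assuming the Keras API bundled with TensorFlow 2. The text does not say which activation the hidden layer uses, so relu is an assumption here.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(1,)),                      # one input feature
    keras.layers.Dense(2, activation="relu"),     # 2 hidden neurons (relu is an assumption)
    keras.layers.Dense(3, activation="sigmoid"),  # one sigmoid output neuron per class
])

# Inspect the layers and parameter counts
model.summary()
```

Apart from the sigmoid activation in the last layer, this is built the same way as the multi-class model.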

**6. Sigmoid outputs**

We use sigmoid outputs because we no longer care about the sum of probabilities. We want each output neuron to be able to individually take a value between 0 and 1. This can be achieved with the sigmoid activation because it constrains our neuron output in the range 0-1. That's what we did in binary classification, though we only had one output neuron there.

![](https://drive.google.com/uc?export=view&id=1ePx9KweW5gjN4UC0wo2xIiUbrd8sFRos)
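The contrast with softmax can be checked numerically. The sketch below applies both activations to the same made-up pre-activation values for the sun, moon and clouds neurons; the 0.5 cut-off used to turn sigmoid outputs into labels is an assumption, not something fixed by the method.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D array; outputs sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    """Element-wise sigmoid; each output lies between 0 and 1 independently."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up pre-activation values for the sun, moon and clouds neurons
logits = np.array([2.0, 1.5, -3.0])

soft = softmax(logits)  # classes compete: probabilities sum to 1
sig = sigmoid(logits)   # classes are independent: the sum is unconstrained

# Assumed cut-off: a class counts as present when its output exceeds 0.5
labels = (sig > 0.5).astype(int)

print(soft, soft.sum())
print(sig, sig.sum())
print(labels)
```

With these values both sun and moon end up above the cut-off, something a softmax output cannot express because its probabilities must share a total of 1.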

#**References**
[[1] Introduction to Deep Learning with Keras](https://learn.datacamp.com/courses/introduction-to-deep-learning-with-keras)

[[2] Machine-Learning-Scientist-with-Python-by-DataCamp](https://github.com/abdelrahmaan/Machine-Learning-Scientist-with-Python-by-DataCamp)