1\. Binary classification
-------------------------

00:00 - 00:04

You're now ready to learn about binary classification, so let's dive in.

2\. When to use binary classification?
--------------------------------------

00:04 - 00:19

You will use binary classification when you want to solve problems where you predict which of two possible classes an observation belongs to. A simple binary classification problem could be learning the boundaries that separate blue from red circles, as shown in the image.

3\. Our dataset
---------------

00:19 - 00:32

The dataset for this problem is very simple. The coordinates are pairs of values corresponding to the X and Y coordinates of each circle in the graph. The labels are 1 for red circles and 0 for the blue circles.
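To make this layout concrete, here is a rough sketch that builds a toy dataset of the same shape using only the standard library. The function name `make_toy_circles`, the radii, and the seed are assumptions made for illustration; this is not the course's actual data.

```python
import math
import random


def make_toy_circles(n_points=200, inner_radius=1.0, outer_radius=3.0, seed=0):
    """Return (coordinates, labels): label 1 = red (center), 0 = blue (outside)."""
    rng = random.Random(seed)
    coordinates, labels = [], []
    for i in range(n_points):
        label = i % 2  # alternate classes so both stay balanced
        # Red points fall inside the inner disc, blue points in an outer ring.
        if label == 1:
            radius = rng.uniform(0.0, inner_radius)
        else:
            radius = rng.uniform(inner_radius + 0.5, outer_radius)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        coordinates.append((radius * math.cos(angle), radius * math.sin(angle)))
        labels.append(label)
    return coordinates, labels


coordinates, labels = make_toy_circles()
print(coordinates[:2], labels[:2])
```

Each entry of `coordinates` is an (x, y) pair and the entry of `labels` at the same position is its class.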

4\. Pairplots
-------------

00:32 - 01:00

We can make use of seaborn's pairplot function to explore a small dataset and identify whether our classification problem will be easily separable. We can get an intuition for this if we see that the classes separate well enough along several variables. In this case, for the circles dataset, there is a very clear boundary: the red circles concentrate at the center while the blue ones lie outside. It should be easy for our network to find a way to separate them just based on x and y coordinates.

5\. The NN architecture
-----------------------

01:00 - 01:05

This is the neural network we will build to classify red and blue dots in our graph.

6\. The NN architecture 2
-------------------------

01:05 - 01:13

We have two neurons as an input layer, one for the x coordinate and another for the y coordinate of each of the red and blue circles in the graph.

7\. The NN architecture 3
-------------------------

01:13 - 01:22

Then we have one hidden layer with four neurons. Four is a good enough number to learn the separation of classes in this dataset. This was found by experimentation.

8\. The NN architecture 4
-------------------------

01:22 - 01:36

We finally end up with a single output neuron which makes use of the sigmoid activation function. It's important to note that, regardless of the activation functions used for the previous layers, we do need the sigmoid activation function for this last output node.

9\. The sigmoid function
------------------------

01:36 - 01:44

The sigmoid activation function squashes the output neuron's weighted input, computed from the second-to-last layer's outputs, into a floating-point number between 0 and 1.
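As a small sketch of that squashing, the sigmoid can be written as 1 / (1 + e^(-z)). The helper below is a plain-Python illustration, not the Keras implementation; the two branches are only there to avoid overflow for very negative inputs.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return exp_z / (1.0 + exp_z)


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

Large negative inputs land close to 0, large positive inputs close to 1, and an input of 0 gives exactly 0.5.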

10\. The sigmoid function
-------------------------

01:44 - 01:57

You can consider the output of the sigmoid function as the probability of a pair of coordinates being in one class or another. So we can set a threshold and say everything below 0.5 will be a blue circle and everything above a red one.
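A minimal sketch of that thresholding step is shown below. The probabilities are made up, and sending a value of exactly 0.5 to the blue class is an arbitrary choice, since the rule above only covers values below and above the threshold.

```python
def to_class(probability, threshold=0.5):
    """Return 1 (red) when the probability exceeds the threshold, else 0 (blue)."""
    return 1 if probability > threshold else 0


probabilities = [0.03, 0.48, 0.51, 0.97]  # made-up sigmoid outputs
predicted = [to_class(p) for p in probabilities]
print(predicted)  # [0, 0, 1, 1]
```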

11\. Let's build it
-------------------

01:57 - 02:06

So let's build our model in Keras. We start by importing the Sequential model and the Dense layer. We then instantiate a sequential model. We add a hidden

12\. Let's build it
-------------------

02:06 - 02:21

layer of 4 neurons and we define an input shape, which consists of 2 neurons. We use tanh as the activation function for this hidden layer. Activation functions are covered later in the course, so don't worry about this choice for now. We finally add an output layer

13\. Let's build it
-------------------

02:21 - 02:34

which contains a single neuron. We make use of the sigmoid activation function so that we achieve the behavior we expect from this network, that is, obtaining a value between 0 and 1. Our model is now ready to be trained.
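The steps just described could look roughly like the sketch below. It follows the same import style used in the exercises of this chapter; the slide's exact code is not reproduced in the transcript, so treat this as one plausible version.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Instantiate a sequential model
model = Sequential()

# Hidden layer: 4 neurons with tanh, fed by 2 inputs (the x and y coordinates)
model.add(Dense(4, input_shape=(2,), activation='tanh'))

# Output layer: 1 neuron with sigmoid, so the output lies between 0 and 1
model.add(Dense(1, activation='sigmoid'))

# Inspect the layers and their parameter counts
model.summary()
```

The hidden layer has 2 × 4 weights plus 4 biases and the output layer has 4 weights plus 1 bias, so the summary should report 17 parameters in total.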

14\. Compiling, training, predicting
------------------------------------

02:34 - 03:01

Just as before, we need to compile our model before training. We will use stochastic gradient descent as an optimizer and binary cross-entropy as our loss function. Binary cross-entropy is the function we use when our output neuron is using sigmoid as its activation function. We train our model for 20 epochs, passing our coordinates and labels as arguments. Then, we obtain the predictions by calling predict on coordinates; these are the sigmoid outputs, which can be thresholded to obtain labels.
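To give a feel for what binary cross-entropy measures, here is a plain-Python sketch of the formula, averaged over a batch. It is an illustration rather than the Keras implementation; the `eps` clipping is an assumption that mirrors common practice for avoiding log(0), and the labels and probabilities are made up.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]  # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]  # high probability on the wrong class

print(binary_cross_entropy(labels, mostly_right))
print(binary_cross_entropy(labels, mostly_wrong))
```

The loss is small when the predicted probabilities agree with the labels and grows quickly as confident predictions turn out to be wrong.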

15\. Results
------------

03:01 - 03:07

These are the boundaries that were learned to classify our circles. It looks like our model did pretty well!

16\. Let's practice!
--------------------

03:07 - 03:11

It's time to have some fun now with this new architecture you've learned.

Exploring dollar bills
======================

You will practice building classification models in Keras with the **Banknote Authentication** dataset. 

Your goal is to distinguish between real and fake dollar bills. In order to do this, the dataset comes with 4 features: `variance`, `skewness`, `kurtosis` and `entropy`. These features are calculated by applying mathematical operations over the dollar bill images. The labels are found in the dataframe's `class` column.

![](https://assets.datacamp.com/production/repositories/4335/datasets/6ce6fd4fdc548ecd6aaa27b033073c5bfc0995da/dollar_bills.png)

A pandas DataFrame named `banknotes` is ready to use. Let's do some data exploration!

Instructions
------------

-   Import `seaborn` as `sns`.
-   Use `seaborn`'s `pairplot()` on `banknotes` and set `hue` to be the name of the column containing the labels.
-   Generate descriptive statistics for the banknotes authentication data.
-   Count the number of observations per label with `.value_counts()`.

In [None]:
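# Import seaborn, plus pyplot in case it is not already loaded (needed for plt.show())
import seaborn as sns
import matplotlib.pyplot as plt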

# Use pairplot and set the hue to be our class column
sns.pairplot(banknotes, hue='class') 

# Show the plot
plt.show()

# Describe the data
print('Dataset stats: \n', banknotes.describe())

# Count the number of observations per class
print('Observations per class: \n', banknotes['class'].value_counts())


A binary classification model
=============================

Now that you know what the **Banknote Authentication** dataset looks like, we'll build a simple model to distinguish between real and fake bills. 

You will perform binary classification by using a single neuron as an output. The input layer will have 4 neurons since we have 4 features in our dataset. The model's output will be a value constrained between 0 and 1. 

We will interpret this output number as the probability of our input variables coming from a fake dollar bill, with 1 meaning we are certain it's a fake bill.

![](https://assets.datacamp.com/production/repositories/4335/datasets/db1c482fd8cb154572c3ce79fe9a406c25ed1a9b/model_chapter2_binary_classification.JPG)

Instructions
------------

-   Import the `Sequential` model and `Dense` layer from `tensorflow.keras`.
-   Create a sequential model.
-   Add a 4-neuron input layer with the `input_shape` parameter and a 1-neuron output layer with `sigmoid` activation.
-   Compile your model using `sgd` as an optimizer.

In [None]:
# Import the sequential model and dense layer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create a sequential model
model = Sequential()

# Add a dense layer
model.add(Dense(1, input_shape=(4,), activation='sigmoid'))

# Compile your model
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Display a summary of your model
model.summary()
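With no hidden layer, this model is a single sigmoid neuron: it computes a weighted sum of the 4 features, adds a bias, and squashes the result. The sketch below spells that out in plain Python. The weights, bias, and feature values are hypothetical numbers chosen for illustration, not trained parameters or real data.

```python
import math


def predict_fake_probability(features, weights, bias):
    """Sigmoid of the weighted sum of the features plus a bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters and one hypothetical banknote
weights = [-1.0, -0.5, -0.5, 0.1]
bias = 0.5
bill = [3.6, 8.7, -2.8, -0.4]  # variance, skewness, kurtosis, entropy

print('Probability of fake:', predict_fake_probability(bill, weights, bias))
print('Trainable parameters:', len(weights) + 1)  # 4 weights + 1 bias = 5
```

This is why `model.summary()` should report 5 trainable parameters for this architecture.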

Is this dollar bill fake?
=========================

You are now ready to train your `model` and check how well it performs when classifying new bills! The dataset has already been partitioned into features: `X_train` & `X_test`, and labels: `y_train` & `y_test`.

Instructions
------------

-   Train your model for 20 epochs calling `.fit()`, passing in the training data.
-   Check your model accuracy using the `.evaluate()` method on the test data.
-   Print `accuracy`.

In [None]:
# Train your model for 20 epochs
model.fit(X_train, y_train, epochs=20)

# Evaluate your model accuracy on the test set
accuracy = model.evaluate(X_test, y_test)[1]

# Print accuracy
print('Accuracy:', accuracy)
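`.evaluate()` returns the loss followed by the metrics passed to `compile()`, which is why index `[1]` picks out the accuracy here. As a rough sketch of what that number means for a binary model, the snippet below thresholds probabilities at 0.5 and counts matches. The values are made up, and the 0.5 threshold is assumed to match the default behavior.

```python
def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Fraction of observations whose thresholded prediction equals the label."""
    predicted = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, predicted) if t == p)
    return correct / len(y_true)


y_true = [0, 1, 1, 0, 1]            # made-up labels
y_prob = [0.2, 0.9, 0.4, 0.1, 0.7]  # made-up sigmoid outputs

print(accuracy_from_probabilities(y_true, y_prob))  # 0.8
```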


1\. Multi-class classification
------------------------------

00:00 - 00:10

What about when we have more than two classes to classify? We run into a multi-class classification problem, but don't worry, we just have to make a minor tweak to our neural network architecture.

2\. Throwing darts
------------------

00:10 - 00:26

Identifying who threw which dart in a game of darts is a good example of a multi-class classification problem. Each dart can only be thrown by one competitor. And that means our classes are mutually exclusive since no dart can be thrown by two different competitors simultaneously.

3\. The dataset
---------------

00:26 - 00:35

The darts dataset consists of dart throws by different competitors. The coordinate pairs xCoord and yCoord show where each dart landed.

4\. The dataset
---------------

00:35 - 00:49

Based on the landing position of previously thrown darts we should be able to distinguish between throwers if there's enough variation between their throws. In our pairplot we can see that different players tend to aim at specific regions of the board.

5\. The architecture
--------------------

00:49 - 01:00

The model for this dataset has two neurons as inputs, since our predictors are xCoord and yCoord. We will define them using the input_shape argument, just as we've done before.

6\. The architecture
--------------------

01:00 - 01:08

In between there will be a series of hidden layers: we are using 3 Dense layers with 128, 64 and 32 neurons, respectively.

7\. The architecture
--------------------

01:08 - 01:14

As outputs we have 4 neurons, one per competitor. Let's look closer at the output layer now.

8\. The output layer
--------------------

01:14 - 01:39

We have 4 outputs, each linked to a possible competitor. Each competitor has a probability of having thrown a given dart, so we must make sure the total sum of probabilities for the output neurons equals one. We achieve this with the softmax activation function. Once we have a probability per output neuron we then choose as our prediction the competitor whose associated output has the highest probability.
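A plain-Python sketch of softmax, followed by picking the most probable competitor, might look like this. The raw scores are made-up values standing in for the output layer's pre-activation values; subtracting the maximum is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [2.0, 1.0, 0.1, -1.0]  # made-up values, one per competitor
probabilities = softmax(raw_scores)

# The prediction is the index of the output with the highest probability
prediction = probabilities.index(max(probabilities))

print(probabilities)
print('Sum:', sum(probabilities), '| Predicted competitor index:', prediction)
```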

9\. Multi-class model
---------------------

01:39 - 01:58

You can build this model as we did in the previous lesson: instantiate a sequential model, add a hidden layer, also defining an input layer with the input_shape parameter, and finish by adding the remaining hidden layers and an output layer with softmax activation. You will do all this yourself in the exercises.

10\. Categorical cross-entropy
------------------------------

01:58 - 02:26

When compiling your model, instead of binary cross-entropy as we used before, we now use categorical cross-entropy or log loss. Categorical cross-entropy measures the difference between the predicted probabilities and the true label of the class we should have predicted. So if we should have predicted 1 for a given class, taking a look at the graph we see we would get high loss values for predicting close to 0 (since we'd be very wrong) and low loss values for predicting closer to 1 (the true label).
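For a single observation with a one-hot label, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below illustrates this in plain Python; the `eps` floor is an assumption to avoid log(0), and the probability vectors are made up.

```python
import math


def categorical_cross_entropy(one_hot_label, probabilities, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    return -sum(
        y * math.log(max(p, eps)) for y, p in zip(one_hot_label, probabilities)
    )


true_label = [0, 0, 1, 0]                  # the third class is the correct one
good_prediction = [0.05, 0.05, 0.85, 0.05]
bad_prediction = [0.60, 0.30, 0.05, 0.05]

print(categorical_cross_entropy(true_label, good_prediction))  # low loss
print(categorical_cross_entropy(true_label, bad_prediction))   # high loss
```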

11\. Preparing a dataset
------------------------

02:26 - 03:06

Since our outputs are vectors containing the probabilities of each class, our neural network must also be trained with vectors representing this concept. To achieve that, we make use of the to_categorical function from tensorflow.keras.utils. We first turn our response variable into a categorical variable with pandas Categorical; this allows us to redefine the column using the categorical codes (cat codes) of the different categories. Now that our categories are each represented by a unique integer, we can use the to_categorical function to turn them into one-hot encoded vectors, where each component is 0 except for the one corresponding to the labeled category.

12\. One-hot encoding
---------------------

03:06 - 03:23

Keras to_categorical essentially performs the process described in the picture above. Label encoded Apple, Chicken and Broccoli turn into a vector of 3 components. A 1 is placed to represent the presence of the class and a 0 to indicate its absence.
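The two steps, label encoding and then one-hot encoding, can be sketched in plain Python as follows. The helper names `label_encode` and `one_hot` are hypothetical, and the integer given to each food follows sorted order here, which may differ from the numbering in the picture.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    categories = sorted(set(values))
    codes = {category: index for index, category in enumerate(categories)}
    return [codes[value] for value in values], categories


def one_hot(codes, num_classes):
    """Turn integer codes into vectors with a single 1 at the code's position."""
    return [[1.0 if i == code else 0.0 for i in range(num_classes)]
            for code in codes]


foods = ['Apple', 'Chicken', 'Broccoli', 'Apple']
codes, categories = label_encode(foods)
vectors = one_hot(codes, len(categories))

print(categories)  # ['Apple', 'Broccoli', 'Chicken']
print(codes)       # [0, 2, 1, 0]
for food, vector in zip(foods, vectors):
    print(food, vector)
```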

13\. Let's practice!
--------------------

03:23 - 03:29

Let's further explore these concepts as you build a multi-class model that predicts who threw which dart!

A multi-class model
===================

You're going to build a model that predicts who threw which dart only based on where that dart landed! (That is, the dart's x and y coordinates on the board.)

This problem is a multi-class classification problem since each dart can only be thrown by one of 4 competitors. So classes/labels are mutually exclusive, and therefore we can build a network with as many output neurons as competitors and use the `softmax` activation function to achieve a total sum of probabilities of 1 over all competitors.

The `Sequential` model and `Dense` layers are already imported for you to use.

Instructions
------------

-   Instantiate a `Sequential` model.
-   Add 3 dense layers of 128, 64 and 32 neurons each.
-   Add a final dense layer with as many neurons as competitors.
-   Compile your model using `categorical_crossentropy` loss.

In [None]:
# Instantiate a sequential model
model = Sequential()
  
# Add 3 dense layers of 128, 64 and 32 neurons each
model.add(Dense(128, input_shape=(2,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
  
# Add a dense layer with as many neurons as competitors
model.add(Dense(4, activation='softmax'))  # Assuming there are 4 competitors
  
# Compile your model using categorical_crossentropy loss
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
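As a quick sanity check on this architecture, you can count its parameters by hand: each dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Inputs, three hidden layers, and the 4-neuron output layer
layer_sizes = [2, 128, 64, 32, 4]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer, sum(per_layer))  # [384, 8256, 2080, 132] 10852
```

Calling `model.summary()` on the model above should show the same per-layer counts.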


Prepare your dataset
====================

In the console you can check that your labels, `darts.competitor`, are not yet in a format to be understood by your network. They contain the names of the competitors as strings. You will first turn these competitors into unique numbers, then use the `to_categorical()` function from `tensorflow.keras.utils` to turn these numbers into their one-hot encoded representation.

This is useful for multi-class classification problems, since there are as many output neurons as classes and for every observation in our dataset we just want one of the neurons to be activated.

The darts dataset is loaded as `darts`. Pandas is imported as `pd`. Let's prepare this dataset!

Instructions 1/2
----------------

-   Use the `Categorical()` constructor from pandas to transform the `competitor` column.
-   Assign a number to each competitor using the `cat.codes` attribute from the competitor column.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes

# Print the label encoded competitors
print('Label encoded competitors: \n', darts.competitor.head())


Instructions 2/2
----------------

-   Import `to_categorical` from `tensorflow.keras.utils`.
-   Apply `to_categorical()` to your labels.

In [None]:
# Transform into a categorical variable
darts.competitor = pd.Categorical(darts.competitor)

# Assign a number to each category (label encoding)
darts.competitor = darts.competitor.cat.codes 

# Import to_categorical from keras utils module
from tensorflow.keras.utils import to_categorical

coordinates = darts.drop(['competitor'], axis=1)
# Use to_categorical on your labels
competitors = to_categorical(darts.competitor)

# Now print the one-hot encoded labels
print('One-hot encoded competitors: \n', competitors)


Training on dart throwers
=========================

Your model is now ready, and so is your dataset. It's time to train!

The `coordinates` features and `competitors` labels you just transformed have been partitioned into `coord_train`, `coord_test` and `competitors_train`, `competitors_test`.

Your `model` is also loaded. Feel free to visualize your training data or `model.summary()` in the console. 

Let's find out who threw which dart just by looking at the board!

Instructions
------------

-   Train your `model` on the training data for 200 `epochs`.
-   Evaluate your `model` accuracy on the test data.

In [None]:
# Fit your model to the training data for 200 epochs
model.fit(coord_train, competitors_train, epochs=200)

# Evaluate your model accuracy on the test data
accuracy = model.evaluate(coord_test, competitors_test)[1]

# Print accuracy
print('Accuracy:', accuracy)


Softmax predictions
===================

Your recently trained `model` is loaded for you. This model is generalizing well! That's why you got a high accuracy on the test set.

Since you used the `softmax` activation function, for every input of 2 coordinates provided to your model there's an output vector of 4 numbers. Each of these numbers encodes the probability of a given dart being thrown by one of the 4 possible competitors. 

When computing accuracy with the model's `.evaluate()` method, your model takes the class with the highest probability as the prediction. `np.argmax()` can help you do this since it returns the index with the highest value in an array. 

Use the collection of test throws stored in `coords_small_test` and `np.argmax()` to check this out!

Instructions 1/2
----------------

-   Predict with your `model` on  `coords_small_test`.
-   Print the model predictions.

In [None]:
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions', 'True labels'))
for i, pred in enumerate(preds):
    print("{} | {}".format(pred, competitors_small_test[i]))


Instructions 2/2
----------------

-   Use `np.argmax()` to extract the index of the most probable competitor from each `pred` vector in `preds`.

In [None]:
# Predict on coords_small_test
preds = model.predict(coords_small_test)

# Print preds vs true values
print("{:45} | {}".format('Raw Model Predictions', 'True labels'))
for i, pred in enumerate(preds):
    print("{} | {}".format(pred, competitors_small_test[i]))

# Extract the position of highest probability from each pred vector
preds_chosen = [np.argmax(pred) for pred in preds]

# Print preds vs true values
print("{:25} | {}".format('Rounded Model Predictions', 'True labels'))
for i, pred in enumerate(preds_chosen):
    print("{:25} | {}".format(pred, competitors_small_test[i]))
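The list comprehension above can also be written as a single vectorized call by passing `axis=1` to `np.argmax()`, which returns one index per row. The sketch below uses made-up prediction and label arrays rather than the exercise's data.

```python
import numpy as np

# Made-up softmax outputs for 3 throws and 4 competitors
preds = np.array([[0.10, 0.70, 0.15, 0.05],
                  [0.25, 0.25, 0.40, 0.10],
                  [0.90, 0.03, 0.03, 0.04]])

# Made-up one-hot encoded true labels for the same throws
true_one_hot = np.array([[0, 1, 0, 0],
                         [0, 0, 0, 1],
                         [1, 0, 0, 0]])

# Index of the highest value in each row
preds_chosen = np.argmax(preds, axis=1)
true_chosen = np.argmax(true_one_hot, axis=1)

print(preds_chosen)  # [1 2 0]
print(true_chosen)   # [1 3 0]
print('Accuracy:', (preds_chosen == true_chosen).mean())
```

Comparing the two index arrays element by element gives the fraction of throws attributed to the right competitor.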
