<a target="_blank" href="https://colab.research.google.com/github/JLDC/Data-Science-Fundamentals/blob/master/notebooks/207_neural-networks-in-sklearn.ipynb">
    <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Open this notebook in Google Colab
</a>

___

# Neural Networks with `scikit-learn`
___

In the previous notebook, we saw how to write out the equations for the forward pass and backpropagation to build our own neural network using only NumPy.

While this is a nice didactic example, in practice you would never code your own neural network from scratch, as this would be terribly inefficient. A lot of very smart people have spent a long time figuring out how to make neural network training extremely efficient, and it would be a mistake not to reuse their work.

In this notebook, we will very briefly show you how to use `scikit-learn` to set up a neural network for either classification or regression. **We recommend using `scikit-learn` for your project**, as it follows the same syntax as the other estimators you have seen until now. However, as your Python knowledge grows, you will realize that `scikit-learn` is rarely the tool of choice for neural networks. Instead, you will encounter either [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), or [JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html). Those are the three dominant libraries in Python when it comes to deep learning. Unfortunately, they are a bit more complicated and require knowledge of a few somewhat advanced concepts that we have not taught you in this course. For your project, working with a known framework such as `scikit-learn` is more than enough.

Lastly, note that we did not teach you `scikit-learn` merely because it is easy: it is, without doubt, the dominant package in Python when it comes to machine learning. The three packages mentioned above are simply specialized for deep learning (i.e., neural networks with many layers); they do not offer the other types of machine learning techniques, such as decision trees, random forests, or clustering algorithms, that `scikit-learn` implements.

In [None]:
# Import our standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the multilayer perceptron regressor and classifier from sklearn
from sklearn.neural_network import MLPRegressor, MLPClassifier
# Import a helper to scale our data
from sklearn.preprocessing import StandardScaler

# Define the path where the data is stored
DATA_PATH = "https://raw.githubusercontent.com/JLDC/Data-Science-Fundamentals/master/data"

## Classification
___

In this short example, we show how to build a neural network to classify multiple categories (virginica, setosa, and versicolor iris flowers).

In [None]:
# Use the iris dataset
df = pd.read_csv(f"{DATA_PATH}/data/iris.csv")

In [None]:
# Initialize a classifier which we will use to classify the species
# Use three hidden layers, with sizes 32, 64, and 32, ReLU activation functions,
# stochastic gradient descent optimization and a regularization parameter
# (lambda or alpha) of 0.001, a batch size of 32, and at most 1000 epochs
nnet = MLPClassifier(hidden_layer_sizes=(32, 64, 32), activation="relu", solver="sgd",
                    alpha=0.001, batch_size=32, max_iter=1000)
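
To make the role of `hidden_layer_sizes` more concrete, the following minimal sketch fits the same architecture and inspects the shapes of the learned weight matrices. It assumes the copy of the iris data bundled with `scikit-learn` (`load_iris`) instead of the course CSV, and the names `X_demo`, `y_demo`, and `demo_net` are only illustrative. Training is deliberately cut short because only the shapes matter here.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris data stands in for the course CSV
X_demo, y_demo = load_iris(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# Same architecture as above, but only a few epochs since we just inspect shapes
# (sklearn may emit a ConvergenceWarning, which is expected here)
demo_net = MLPClassifier(hidden_layer_sizes=(32, 64, 32), activation="relu",
                         solver="sgd", alpha=0.001, batch_size=32,
                         max_iter=20, random_state=0)
demo_net.fit(X_demo, y_demo)

# One weight matrix per layer transition: 4 inputs -> 32 -> 64 -> 32 -> 3 classes
weight_shapes = [w.shape for w in demo_net.coefs_]
# Trainable parameters = all weights plus all biases (intercepts)
n_params = (sum(w.size for w in demo_net.coefs_)
            + sum(b.size for b in demo_net.intercepts_))

print(weight_shapes)
print(f"Total number of trainable parameters: {n_params}")
```

Even this small network has several thousand parameters for a dataset of 150 flowers, which is one reason why regularization (`alpha`) matters.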

In [None]:
# Features of our dataset
X = df[df.columns[:-1]]
# Output to predict
y = df["species"]

# Scale the inputs
scaler = StandardScaler()
X = scaler.fit_transform(X)

In [None]:
# Fit the network
nnet.fit(X, y)

# Make predictions
ypred = nnet.predict(X)

# Count the number of misclassifications
misclassifications = ypred != y

print(f"The network misclassifies {misclassifications.sum()} flowers")
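
Note that the count above is computed on the same flowers the network was trained on, so it says little about how the network handles unseen data. A minimal sketch of a hold-out evaluation follows. It assumes the iris data bundled with `scikit-learn` (`load_iris`) rather than the course CSV, and the 25% test share and the seed are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris data stands in for the course CSV
X_all, y_all = load_iris(return_X_y=True)

# Keep 25% of the flowers aside; stratify to preserve the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# The pipeline fits the scaler on the training data only, then the network
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 64, 32), activation="relu",
                  solver="sgd", alpha=0.001, batch_size=32,
                  max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

# Evaluate on flowers the network has never seen
test_errors = int((model.predict(X_test) != y_test).sum())
test_accuracy = model.score(X_test, y_test)

print(f"Misclassified {test_errors} of {len(y_test)} held-out flowers")
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the network in one pipeline avoids leaking information from the test set into the scaling step.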

## Regression
___

In this short example, we show how to build a neural network to predict a continuous variable such as the crop yields.

In [None]:
# Use the US crops dataset
df = pd.read_csv(f"{DATA_PATH}/data/us_crops.csv")

In [None]:
# Initialize a neural network to predict the crop yields given the temperature
# Use three hidden layers, with sizes 32, 64, and 32, ReLU activation functions,
# stochastic gradient descent optimization and a regularization parameter
# (lambda or alpha) of 0.001, a batch size of 32, and at most 1000 epochs
nnet = MLPRegressor(hidden_layer_sizes=(32, 64, 32), activation="relu", solver="sgd",
                    alpha=0.001, batch_size=32, max_iter=1000)

In [None]:
# Features of our dataset
X = df[["temp"]]
# Output to predict
y = df["yield"]

# Scale the inputs
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Scale the outputs (we cannot use StandardScaler() on a 1D array)
mu, sigma = y.mean(), y.std() # We will use this to scale back to original values!
y = (y - mu) / sigma
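
As an alternative to scaling by hand, `StandardScaler` can also be applied to the target once it is reshaped into a single column. The sketch below uses made-up yield values (not the course data) and illustrative names such as `y_raw` and `y_scaler`. One small difference to keep in mind: `StandardScaler` divides by the population standard deviation, whereas the pandas `.std()` used above defaults to the sample standard deviation, so the scaled values differ slightly between the two approaches.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up yields, for illustration only
y_raw = np.array([120.0, 135.0, 150.0, 165.0, 180.0])

# StandardScaler expects a 2D input, so reshape the target to one column
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y_raw.reshape(-1, 1)).ravel()

# inverse_transform maps scaled values (or predictions) back to original units
y_back = y_scaler.inverse_transform(y_scaled.reshape(-1, 1)).ravel()

print(y_scaled)
print(y_back)
```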

In [None]:
# Fit the network
nnet.fit(X, y)

# Make predictions
ypred = nnet.predict(X)

# Reconstruct outputs and scale predictions
y = y * sigma + mu
ypred = ypred * sigma + mu

# Compute the MAE (mean, not sum, of the absolute errors)
mae = np.mean(np.abs(ypred - y))

print(f"The mean absolute error is {mae}")
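
For reference, the mean absolute error is the average of the absolute differences between predictions and true values. The tiny example below, with made-up numbers and illustrative variable names, checks a manual NumPy computation against `sklearn.metrics.mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up values, for illustration only
y_true_demo = np.array([3.0, 5.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 9.0])

# Absolute errors are 1, 0 and 2, so their mean is 1.0
mae_manual = np.mean(np.abs(y_pred_demo - y_true_demo))
mae_sklearn = mean_absolute_error(y_true_demo, y_pred_demo)

print(mae_manual, mae_sklearn)  # both are 1.0
```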

In [None]:
# Create a grid from the min to the max (scaled) temperature for plotting
xmin, xmax = X.min(), X.max()
xs = np.linspace(xmin, xmax, 100)
# Predict and scale back to original range for the plot
ys = nnet.predict(xs.reshape(-1, 1)) * sigma + mu

In [None]:
# Plot the results
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the true values
ax.scatter(df["temp"], df["yield"], label="True values", alpha=0.8)
# Plot the predictions
ax.plot(
    scaler.inverse_transform(xs.reshape(-1, 1)),
    ys, color="red", linestyle="dashdot", label="Predictions"
)

# Add labels and legend
ax.set_xlabel("Temperature")
ax.set_ylabel("Crop yields")
ax.legend()

⚠️ As a last warning, neural networks are not the *be-all and end-all* of machine learning models. They come with many hyperparameters, and choosing the right ones is a daunting task. As you might find out, a change of hyperparameters can lead to drastic changes in the model (good or bad). If your network is very unstable, a good performance might simply mean that you got a lucky seed on your validation set, so be wary!
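
One way to get a feeling for this instability is to refit the same network with different random seeds and compare the validation scores. The sketch below is only an illustration under a few assumptions: it uses the iris data bundled with `scikit-learn`, keeps the train/validation split fixed, varies only the `random_state` of the network, and limits training to 300 epochs so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris data stands in for the course CSV
X_all, y_all = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X_all, y_all, test_size=0.3, stratify=y_all, random_state=0
)

val_scores = []
for seed in range(5):
    # Only the seed changes: it controls weight initialization and batch shuffling
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 64, 32), activation="relu",
                      solver="sgd", alpha=0.001, batch_size=32,
                      max_iter=300, random_state=seed),
    )
    model.fit(X_train, y_train)
    val_scores.append(model.score(X_val, y_val))

val_scores = np.array(val_scores)
print("Validation accuracy per seed:", np.round(val_scores, 3))
print(f"Spread (max - min): {val_scores.max() - val_scores.min():.3f}")
```

If the spread across seeds is of the same order as the difference between two candidate models, a single validation score is not enough to prefer one over the other.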

In this notebook, we have chosen some hyperparameters that might be good or bad. Don't just reuse the ones from these examples; try to find the best ones for your problem instead (*Hint*: Think about some methods we have seen to figure out which model is best!). Furthermore, scaling is incredibly important: if you try running the regression problem without scaling the output, you will see that the network does much worse than even the simplest estimator we covered in class.

#### <font style="color:green">**➡️ ✏️ Question 1**</font>

Using the code above as inspiration, create either a classifier or a regressor neural network for a dataset of your choice (you can pick one from the data folder). Try fitting multiple features, play around with different hyperparameters, and discuss how the results compare.