# Your first venture into Deep Learning

Hello, and welcome to this last part of your journey towards deep learning and AI. Or maybe it isn't the last part at all, but really your first step.
So far you have learned a lot about data science and the power of data and machine learning. Deep learning goes a step further and lets us work with almost any kind of data, structured or unstructured, and extract insights and value from it. Deep learning is an important, probably the most important, part of any "weak" AI system (that is, a system that is only intelligent in a restricted setting) and of any upcoming strong or general AI.

But before we build an AI butler or a self-driving car, we should start with something more familiar to you.
Deep learning essentially follows the same structure as any supervised machine learning: it can perform classification and regression given features and a target. (There are more advanced deep learning use cases like reinforcement learning, but we won't cover them here.)

Deep learning is a computer technique to extract and transform data, with use cases ranging from human speech recognition to animal imagery classification, by using multiple layers of neural networks. Each of these layers takes its inputs from previous layers and progressively refines them. The layers are trained by algorithms that minimize their errors and improve their accuracy. In this way, the network learns to perform a specified task.
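
To make the idea of stacked layers a bit more concrete, here is a minimal NumPy sketch of a forward pass through two layers. The layer sizes, the random weights and the helper names `relu` and `forward` are made up for illustration; a real network learns its weights during training, and fast.ai will build the layers for us later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # activation function: keeps positive values, sets negative values to zero
    return np.maximum(x, 0)

# hypothetical sizes: 3 input features -> 5 hidden units -> 1 output
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

def forward(x):
    hidden = relu(x @ W1 + b1)   # first layer transforms the raw inputs
    return hidden @ W2 + b2      # second layer works on the first layer's output

batch = rng.normal(size=(4, 3))  # 4 made-up rows with 3 features each
out = forward(batch)
print(out.shape)
```

Each layer only sees the output of the layer before it, which is what "progressively refines" means in practice.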

This first notebook will walk you through a regression task on tabular data (something you are familiar with) using deep learning.
A lot of people assume that you need all kinds of hard-to-understand stuff to get great results with deep learning, but as you'll see it can be quite simple.
We will be using the fast.ai library, which is built on top of PyTorch and makes deep learning relatively simple. fast.ai also provides a course and a book that make up a good portion of our Techlabs AI track, so if you haven't had enough after this track you know where to find more!

You first need to have the following libraries installed.
If you find yourself without one of them, uncomment the corresponding line and install it (make sure this notebook is opened in the environment you want the package to be installed in).

In [None]:
## not necessary if run in Colab:
#!pip install -q seaborn 
## necessary even in Colab:
#!pip install -Uqq fastbook # if this does not work, try:
#!pip install fastai==2.0

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import mean_absolute_error
from fastai import * #good for experimentation not production code
from fastai.tabular.all import * #good for experimentation not production code

# Regression using a deep neural net

**Remember:** In a regression problem, we aim to predict a continuous value, like a price. In a classification problem, by contrast, we aim to select a class from a list of classes (for example, recognizing whether a picture contains a dog or a cat).

This notebook uses the classic Auto MPG dataset and builds a model to predict the fuel efficiency of late-1970s and early-1980s automobiles.
To do this, we'll provide the model with a description of many automobiles from that time period. 
This description includes attributes like: cylinders, displacement, horsepower, and weight.

We load the data through the public URL and name the columns according to the documentation. Run the code below to load the raw dataset into a pandas DataFrame.

In [None]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=' ', skipinitialspace=True)
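
If you are wondering what the `read_csv` arguments do, the sketch below parses two made-up rows that mimic the layout of the raw file: numbers separated by runs of spaces, then a tab followed by the car name, with `?` marking a missing value. The sample rows and the name `toy` are assumptions for illustration only.

```python
import io
import pandas as pd

# two made-up rows in the same layout as the raw file
sample = (
    '18.0   8   307.0   130.0   3504.   12.0   70  1\t"car a"\n'
    '25.0   4   98.00   ?       2046.   19.0   71  1\t"car b"\n'
)
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

toy = pd.read_csv(io.StringIO(sample), names=column_names,
                  na_values='?',          # treat '?' as a missing value
                  comment='\t',           # ignore everything after the tab (the car name)
                  sep=' ',                # columns are separated by spaces...
                  skipinitialspace=True)  # ...and extra spaces after a separator are skipped
print(toy)
```

The car name is dropped because everything after the tab is treated as a comment, and the `?` ends up as `NaN` in the `Horsepower` column.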

Let's look at the data to get a feel of what the dataset contains.

#### NOW YOU
Use functions and methods you know, like `.head()`, to get a good look at the raw data. Don't change anything in the data just yet.

In [None]:
data = raw_dataset.copy()
##YOUR CODE HERE

In [None]:
##YOUR CODE HERE

In [None]:
##YOUR CODE HERE

### Clean the data
As with any data application or ML algorithm, deep learning requires relatively clean data. Sometimes the model itself may help you clean the data (e.g. in image recognition), but more on that later.

Let's check the data.

The dataset may contain a few unknown or missing values.
#### NOW YOU 
Find out how many missing values are in each column of the dataset.

In [None]:
##YOUR CODE HERE

As there are only very few missing values we can just drop them.

In [None]:
data.dropna(inplace = True) #careful with 'inplace = True', only use it if you are sure what the result is
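
If you would rather not modify a DataFrame in place, `dropna()` without `inplace` returns a new DataFrame and leaves the original untouched. The toy frame below is made up for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# made-up frame with one missing value
toy = pd.DataFrame({'Horsepower': [130.0, np.nan, 95.0],
                    'Weight': [3504.0, 2046.0, 2372.0]})

cleaned = toy.dropna()  # returns a new frame without the incomplete row
print(len(toy), len(cleaned))
```

This way you can always compare the result with the original before deciding to keep it.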

### Looking at the individual columns

If you haven't inspected the individual columns yet, it is a good time to do so here.

If we look at the "Origin" column we see that, although it is stored as numbers, it is really categorical: each number stands for a region of origin. To make it usable for our neural network we first replace the numeric codes with readable category names and then one-hot encode them.

#### NOW YOU 

Overwrite the 'Origin' column using the mappings given in the `origin_mappings` dictionary.

In [None]:
origin_mappings = {1: 'USA', 2: 'Europe', 3: 'Japan'}
##YOUR CODE HERE
data.head()#check if everything is correct

Next we will one-hot encode our categories. There are multiple ways to do that; we will use plain pandas. Other options include scikit-learn and fast.ai. Check out their documentation to find out how we might have accomplished this task with their functions. Maybe fast.ai would have been faster?

In [None]:
data = pd.get_dummies(data, prefix='', prefix_sep='')
data.head()
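
To see what one-hot encoding produces, here is a small sketch on a made-up frame (the values and the name `toy` are for illustration only). The `dtype=int` argument is an addition here so that the new columns show up as 0/1 regardless of the pandas version.

```python
import pandas as pd

# made-up frame with one categorical column
toy = pd.DataFrame({'MPG': [18.0, 26.0, 31.0],
                    'Origin': ['USA', 'Europe', 'Japan']})

# every category becomes its own 0/1 column; empty prefix keeps the plain category names
encoded = pd.get_dummies(toy, prefix='', prefix_sep='', dtype=int)
print(encoded)
```

Each row has exactly one 1 among the new columns, marking the category that row belongs to.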

### Split the data into train and test

#### NOW YOU
Split the dataset into a training set and a test set. Use 80% of the data for our training set. Set a random_state.

We will use the test set only in the final evaluation of our models and won't look at it before that, to avoid any data leakage.

In [None]:
##YOUR CODE HERE
##YOUR CODE HERE
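
If you are unsure what a reproducible split looks like, the sketch below shows one possible pattern on a made-up frame. The names `toy`, `toy_train` and `toy_test` are hypothetical, and scikit-learn's `train_test_split` is an equally valid route. Note that the cells further down expect your own split to be stored in variables called `train` and `test`.

```python
import pandas as pd

# made-up frame with 10 rows
toy = pd.DataFrame({'x': range(10), 'y': range(10, 20)})

# draw 80% of the rows at random; random_state makes the draw reproducible
toy_train = toy.sample(frac=0.8, random_state=0)
# everything that was not drawn becomes the test set
toy_test = toy.drop(toy_train.index)

print(len(toy_train), len(toy_test))
```

Because the test rows are exactly the rows that were not sampled, no row can end up in both sets.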

### Inspect the data visually
Let's have a quick look at the joint distribution of a few pairs of columns from the training set.

In [None]:
sns.pairplot(train[['MPG', 'Cylinders', 'Displacement', 'Weight']], diag_kind='kde')

#### NOW YOU

Take your time and study the plot, try to spot some relationships. Think about which features could have the highest importance.


- ...
- ...
- ...





**ONLY READ ON IF YOU HAVE CAREFULLY THOUGHT ABOUT THE TASK ABOVE**

**No Cheating**



Looking at the top row, it should be clear that the fuel efficiency (MPG) is a function of all the other parameters. Looking at the other rows, it should become evident that they are each functions of each other.

Let's also look at the overall statistics. Note how each feature covers a very different range:

In [None]:
train.describe().T #.T transposes the output dataframe

#### NOW YOU
Think about what this might mean for our neural network. Do we need to scale the features? Normalize them? Which range is appropriate and why? Note down your answer; we will get to this issue in a moment.

- ...
- ...
- ...


### Split features from labels

We still need to separate the target value, the "label", from the features. This label is the value that you will train the model to predict, in our case 'MPG' (miles per gallon).

In [None]:
train_features = train.copy()
test_features = test.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')

#### BONUS: think about other ways of accomplishing the operation above and share them with your fellow learners on Slack

#### NOW YOU 
Check if everything went as expected by looking at the data and the shape of the data.

In [None]:
##YOUR CODE HERE

In [None]:
##YOUR CODE HERE

In [None]:
##YOUR CODE HERE

In [None]:
##YOUR CODE HERE

### Normalization
Just a moment ago you thought about the different scales our features have. If you want to look again: In the table of statistics it's easy to see how different the ranges of each feature are.

In [None]:
train_features.describe().T[["mean", "std","min","max"]]

It is good practice to normalize features that use different scales and ranges.

One reason this is important is that in a neural network the features are multiplied by the model weights, so the scale of the outputs and the scale of the gradients are affected by the scale of the inputs.

Although a model might converge without feature normalization, normalization makes training much more stable and is a basic step in most Deep Learning workflows.
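
As a rough illustration of what normalization means here, the sketch below standardizes a made-up feature by subtracting the mean and dividing by the standard deviation, using statistics computed on the training values only. The numbers are invented, and how fast.ai's `Normalize` computes its statistics in detail is best checked in its documentation.

```python
import numpy as np

# made-up weights (in lbs) for a training and a test split
train_weight = np.array([3504.0, 2046.0, 2372.0, 4100.0])
test_weight = np.array([2800.0, 3300.0])

# statistics come from the training data only
mean, std = train_weight.mean(), train_weight.std()

train_scaled = (train_weight - mean) / std
test_scaled = (test_weight - mean) / std   # reuse the training statistics

print(train_scaled.mean(), train_scaled.std())
```

After this step the training feature has a mean of roughly 0 and a standard deviation of roughly 1, so features like weight and acceleration end up on comparable scales.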


**fast.ai** makes normalizing very easy. It provides DataLoaders that can load and transform your data as needed. The `from_df()` method takes the `procs=` argument, which lets you specify which processing steps you want the DataLoaders to perform. In our case we could use `Categorify`, `Normalize` and maybe even others that handle missing values, such as `FillMissing`. fast.ai provides a lot of these handy functionalities that you will learn about further down the road, or when you check out their extensive documentation.

Here we will use `TabularDataLoaders`, a class that combines data loading and preparation. It makes our data 'ready' to be used by our deep learning model. If this seems a little 'black-box' right now, that's fine. You can dig into the code and create more custom ways to load and prepare data later if you want. The AI track will give you more knowledge on how to do these things. For now we will stick to the high-level basics so that we can create something that works before going into all of the details.



More about the tabular data loaders and related functions [here](https://docs.fast.ai/tabular.data.html)

The `TabularDataLoaders` from fast.ai also expects you to pass the names of all continuous and categorical variables.

#### NOW YOU
List the categorical and continuous variables from our dataframe below to pass them to the function.

In [None]:
cat_names = ##YOUR CODE HERE
cont_names = ##YOUR CODE HERE
procs = [Categorify, Normalize]
y_names = 'MPG'
# build the DataLoaders from the training split only, so the held-out test rows stay unseen
dls = TabularDataLoaders.from_df(train, procs=procs, 
                                 cat_names=##YOUR CODE HERE, 
                                 cont_names=##YOUR CODE HERE, 
                                 y_names=y_names, bs=64) # bs is the batch size, more on that later


We can use our DataLoaders object (`dls`) and look at a batch with `.show_batch()`.

In [None]:
dls.show_batch()

## Training the model

Now we need to specify the model. fast.ai makes this very easy: if you are just starting out, you can use their functions and defaults to create working solutions without having to specify layers, activation functions, optimizers etc. (which is the goal here, as we don't expect you to know everything from the start).

We simply create a learner with `tabular_learner()`, pass in our DataLoaders object and set a metric. In our case the mean absolute error will do fine; remember, this is a regression problem.

In [None]:
learner = tabular_learner(dls, metrics = mae)

Now that we have specified the model it's time to train it.
fast.ai provides different ways to train a model; we will use the one-cycle method. If you are curious what it does, check out the documentation.

In order to fit a model, we have to provide at least one piece of information: how many times to look at each data point (known as number of epochs). The number of epochs you select will largely depend on how much time you have available, and how long you find it takes in practice to fit your model. If you select a number that is too small, you can always train for more epochs later.

Here we will use 35 epochs. If you want to learn how to select the number of epochs or what automatic methods for the epoch settings there are (e.g. early stopping) check out our AI track.
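
To get a feeling for what an epoch is, here is a deliberately tiny sketch: plain gradient descent on a made-up linear problem, where one epoch is one pass over all data points. This is not how fast.ai trains (there are no mini-batches and no one-cycle schedule here); the data, the learning rate and the function name `train` are assumptions for illustration.

```python
import numpy as np

# made-up data following y = 3x + 1
x = np.linspace(0, 1, 20)
y = 3 * x + 1

def train(n_epochs, lr=0.1):
    """Fit y = w*x + b with gradient descent and return the final mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(n_epochs):
        pred = w * x + b
        # gradients of the mean squared error with respect to w and b
        grad_w = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return np.mean((w * x + b - y) ** 2)

for n in (5, 50, 500):
    print(n, train(n))
```

On this simple problem the loss keeps shrinking with more epochs. On real data with a flexible model, the training loss can keep falling while the validation loss starts to rise, which is why the number of epochs matters.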

### BONUS
Once you are done with this notebook, experiment with using more or fewer epochs to train, and notice what happens to the MAE and to the train and validation losses.

In [None]:
learner.fit_one_cycle(35)

## Model evaluation

After we have trained our model and looked at the validation MAE (that is, the score on the validation set which was automatically created by fast.ai), it's time to see how it performs on unseen data. You can use `learner.predict()` to get predictions for a single row of data. To get predictions on our test data (`test_features`) and compare the predictions to the actual labels (`test_labels`), you can use the `test_dl` method of the DataLoaders. This will load the test dataframe and prepare it for predictions. *Note*: this dataframe should not have the dependent variable in its columns, so we will just use our `test_features`, not the `test_labels`.

In [None]:
test_dl = learner.dls.test_dl(test_features)

preds = learner.get_preds(dl = test_dl)
# get_preds returns a tuple; take the prediction tensor, convert it to an array and flatten it for the MAE calculation
preds = np.array(preds[0].T)[0]
preds

#### NOW YOU 

Calculate the mean absolute error using the respective function from scikit-learn, the test labels and our predictions.

In [None]:
##YOUR CODE HERE

If everything went correctly, this error looks really good (in my version around 2.1, which is even slightly better than my validation error).
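
As a reminder of what the mean absolute error actually measures, here is the calculation done by hand on made-up numbers. The arrays are invented for illustration; scikit-learn's `mean_absolute_error` computes the same quantity.

```python
import numpy as np

# made-up true values and predictions in MPG
y_true = np.array([20.0, 30.0, 25.0])
y_pred = np.array([22.0, 27.0, 25.0])

# average of the absolute differences: (2 + 3 + 0) / 3
mae_by_hand = np.mean(np.abs(y_pred - y_true))
print(mae_by_hand)
```

An MAE of about 2 therefore means that the predictions are off by roughly 2 MPG on average.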

### Visualize predictions

#### Scatterplot (with regression line)

You can use this to visualize how close the predictions are to the actual values. If everything worked well, the dots should lie very close to the diagonal (where the prediction equals the true value, which would mean perfect prediction), and the fitted regression line should roughly follow it.

In [None]:
sns.regplot(x = test_labels, y = preds)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
sns.despine()
plt.xlim([5,45])
plt.ylim([5,45])
plt.show()

#### Error distribution

In [None]:
error = preds - test_labels
sns.histplot(error, bins=25)
sns.despine()
plt.xlabel('Prediction Error [MPG]')

And that's basically it. Of course you can add steps, or use other techniques to gain more insights. But you have just trained a Neural Network on tabular data and gotten pretty good results.

Check out the following notebooks to see how to use fast.ai and neural nets for image classification.

### BONUS

1. Compare your results with other methods (e.g. linear regression, random forest, SVM). Post your results in the Slack channel.

2. Try to reach the best MAE without overfitting. If necessary revise how to spot overfitting and discuss it with your fellow learners!