# Introduction to Neural Networks with TensorFlow
## Overview

### What You'll Learn
In this section, you'll learn:
1. Why use neural networks for machine learning
2. When to use neural networks
3. How to implement a neural network with TensorFlow

### Prerequisites
Before starting this section, you should have an understanding of:
1. [scikit-learn](https://colab.research.google.com/github/HackBinghamton/MachineLearningWorkshopWeek1/blob/master/intro_ml_scikit.ipynb)
2. [Basic Python (functions, loops, lists)](https://github.com/HackBinghamton/PythonWorkshop)
3. (Optional) [Matplotlib and numpy](https://github.com/HackBinghamton/DataScienceWorkshop)

### Introduction
Neural networks are a type of machine learning algorithm loosely inspired by the way neurons in the human brain connect to each other. They can model far more complex patterns than the linear models we used from scikit-learn, but they aren't always the best approach to a machine learning problem.

### Initial Setup Commands

```python
# RUN ME
!pip3 install tensorflow
!pip3 install scikit-learn
```

---

## Why Neural Networks?

So why use Neural Networks at all when we have such great models like Linear and Logistic Regression at our disposal?

Well, sometimes these models just aren't enough to separate our data. Take this dataset, for example:

![Difficult data](images/gaussian-kernel.png)

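We can't draw a straight line through this data to cleanly divide the red and blue points,
and logistic regression can't either: its S-shaped (sigmoid) curve describes probabilities,
but the decision boundary it draws is still a straight line. We need a model that can take on
more non-linear shapes. That's where neural networks can be very useful!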
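To see how a little non-linearity changes things, here is a minimal NumPy sketch on the classic XOR problem, a smaller cousin of the pictured dataset: no single straight line separates its four points. The weights below are hand-picked for illustration (assumed, not learned by training), and the tiny network uses the same ReLU activation we'll meet later in TensorFlow.

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, replace negatives with 0
    return np.maximum(0, z)

# The four XOR points: label 1 when exactly one coordinate is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hidden layer: 2 ReLU units with hand-picked (not trained) weights
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: combine the two hidden units
w2 = np.array([1.0, -2.0])

hidden = relu(X @ W1 + b1)   # shape (4, 2)
output = hidden @ w2         # shape (4,)
predictions = (output > 0.5).astype(int)
print(predictions)  # [0 1 1 0]
```

Remove the `relu` call and the whole network collapses into a single linear function of the inputs, which cannot produce this pattern.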

> **NOTE**: Neural networks can still be overkill for problems like this; we use them here for the sake
of an introductory workshop. There are plenty of other models in the scikit-learn library that
can handle data like this. Try checking out [Support Vector Machines](https://scikit-learn.org/stable/modules/svm.html),
for example.

---

## The MNIST Dataset
We were able to achieve a respectable accuracy on scikit-learn's `digits` dataset using logistic regression. However,
`digits` is quite small and simple - each image was only 8 pixels by 8 pixels. Modern computer monitors display millions of pixels at a time, so an 8x8 image is primitive compared to real-world data we'd be working with.

The MNIST dataset, on the other hand, consists of 28 x 28 pixel images of handwritten digits for classification. This is closer to a real-world dataset's complexity, but this also means that our logistic regression approach from Part 1 will see a noticeable decrease in accuracy.

## Using Logistic Regression with the MNIST Dataset

### Loading in the MNIST Dataset
First, we'll import TensorFlow and access its built-in copy of the MNIST dataset.

```python
# RUN ME
import tensorflow as tf

# Load in the MNIST dataset from TensorFlow
mnist = tf.keras.datasets.mnist
```

Next, we'll load our training and testing sets. Unlike the `digits` dataset, the MNIST dataset comes already split: `load_data()` returns 60,000 training images and 10,000 testing images, each paired with its label.

```python
# RUN ME
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
```

Many machine learning algorithms, neural networks included, train more reliably when input values are on a small, consistent scale such as 0 to 1. However, each pixel value in the MNIST dataset is an integer between 0 and 255. Let's fix this by dividing each pixel value by 255.

```python
# RUN ME
X_train, X_test = X_train / 255, X_test / 255
```
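Here is the same scaling on a tiny stand-in array, a sketch assuming MNIST-style `uint8` pixels. It shows that dividing by 255 produces floating-point values from 0.0 to 1.0 rather than integers.

```python
import numpy as np

# A tiny stand-in for an MNIST image: uint8 pixel values from 0 to 255
pixels = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# True division turns the integer pixels into floating-point values
scaled = pixels / 255
print(scaled.min(), scaled.max())  # 0.0 1.0
```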

By default, the MNIST training images are loaded as a 60000 x 28 x 28 array: 60000 images, each 28 x 28 pixels. However, scikit-learn's Logistic Regression requires two-dimensional input with one row per sample and one column per feature! We need to flatten each image into a 784-element vector (28 x 28 = 784), reshaping our data from three dimensions to two.

```python
# RUN ME
# Keep the first axis (one row per image) and let -1 infer 28 * 28 = 784 columns
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
```
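To see what `reshape(n, -1)` does without downloading MNIST, here's a small sketch on three random fake "images" (stand-ins, not real digits). Each image becomes one row, read off the original grid one pixel row at a time.

```python
import numpy as np

# Stand-in for X_train: 3 fake "images" of 28 x 28 random pixel values
rng = np.random.default_rng(0)
images = rng.random((3, 28, 28))

# Keep the first axis (one row per image); -1 lets NumPy infer 28 * 28 = 784
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (3, 784)

# Pixel (row r, column c) of an image lands at column r * 28 + c of its flat row
assert flat[1, 5 * 28 + 7] == images[1, 5, 7]
```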

### Training a Logistic Regression Model on the MNIST Dataset
As we did in Part 1, we'll now use `scikit-learn` to implement a Logistic Regression model.

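```python
# RUN ME
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Initialize a one-vs-rest ("ovr") logistic regression: one binary classifier per digit.
# Wrapping LogisticRegression in OneVsRestClassifier keeps this setup working in newer
# scikit-learn versions, where the multi_class='ovr' parameter is deprecated.
logistic_model = OneVsRestClassifier(LogisticRegression(solver='liblinear'))

# Fit the model on the first 10,000 training images to keep training time short
logistic_model.fit(X_train[:10000, :], Y_train[:10000])
```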

Let's see how it holds up:

```python
# RUN ME
# score() returns the fraction of test images classified correctly
print(f"Logistic Regression accuracy: {logistic_model.score(X_test, Y_test) * 100:.2f}%")
```
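For classifiers, scikit-learn's `score` reports accuracy: the fraction of predictions that match the true labels, which is the same as `np.mean(logistic_model.predict(X_test) == Y_test)`. A minimal sketch of that calculation, using a few hypothetical labels rather than real MNIST predictions:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the true labels
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Hypothetical labels: 4 of the 5 predictions are correct
y_true = [7, 2, 1, 0, 4]
y_pred = [7, 2, 1, 0, 9]
print(accuracy(y_true, y_pred))  # 0.8
```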

While logistic regression achieved ~95% accuracy on the digits dataset, it only achieves ~90% on the MNIST dataset.
But what if we need better than this? This is where neural networks come in handy. 

## Building Our First Neural Network
### Reloading the MNIST data
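Unlike scikit-learn's Logistic Regression model, TensorFlow's neural networks can accept multi-dimensional input directly, because the model's first layer can flatten each image for us. Let's reload the MNIST dataset so the images are back in their original 28 x 28 shape, and scale the pixels to the 0 to 1 range again.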

```python
# RUN ME
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train, X_test = X_train / 255, X_test / 255
```

### Creating a Neural Network Model

```python
# Flatten each 28 x 28 image into a single 784-element vector so the
# dense layers that follow receive one flat row of pixels per image
flatten_layer = tf.keras.layers.Flatten(input_shape=(28, 28))


# This line creates a dense layer for the neural network. Dense layers are
# fully connected layers, which are the standard "layers" in a neural network.
#
# 128 - the number of units, i.e. the output dimensionality. We often pick
# powers of 2 (64, 128, 256, etc.) here, largely by convention and because
# they can map well onto GPU hardware. Finding a good value is important:
# 2048 would be overkill on the MNIST dataset, but 16 might not be enough.
#
# activation='relu' - relu stands for Rectified Linear Unit, relu(x) = max(0, x).
# This activation adds non-linearity to the neural network. If you try to run
# a linear regression model on this dataset, you'll see it does very poorly,
# which suggests that some non-linearity would help on this problem.
dense_relu_layer = tf.keras.layers.Dense(128, activation='relu')


# Dropout is a good layer for avoiding overfitting: fitting the training set so
# closely that the model learns irrelevant details ("noise") of the training
# data instead of patterns that generalize. During training, it adds randomness
# by zeroing a random ~20% of the previous layer's outputs on each step.
# Dropout is switched off when the model is evaluated or makes predictions.
#
# "If you're good at something while drunk, you'll be really good at it sober"
#     - Ryan McCormick, former HackBU co-director
dropout_layer = tf.keras.layers.Dropout(0.2)


# Add another dense layer, but this time with an output dimensionality of
# 10 units, one for each possible digit (0 - 9).
#
# activation='softmax' turns the arbitrary outputs of the neural network into
# "probabilities": non-negative values that add up to 1.
#
# The final decision made by the neural network should be the output with the
# highest probability.
#
# Example:
#     Input: Image of handwritten "2"
#     Output: [0.01, 0.02, 0.9, 0.01, 0.005, 0.005, 0.02, 0.01, 0.01, 0.01]
#         > Softmax gives us 10 probabilities, one for each digit 0 - 9.
#     Interpretation: This image is a handwritten "2" with probability 0.9, or 90%.
dense_softmax_layer = tf.keras.layers.Dense(10, activation='softmax')
```
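Keras documents that `Dropout` sets each input to 0 with probability `rate` during training and scales the surviving values up by `1 / (1 - rate)`, so their expected sum stays the same; outside of training it passes inputs through unchanged. Here is a minimal NumPy sketch of that "inverted dropout" idea. The `dropout` function is our own illustration, not Keras's implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    # Outside of training, dropout passes values through untouched
    if not training:
        return x
    # Keep each value with probability (1 - rate), zero it otherwise
    keep_mask = rng.random(x.shape) >= rate
    # Scale survivors so the expected sum matches the input's sum
    return np.where(keep_mask, x / (1 - rate), 0.0)

rng = np.random.default_rng(42)
activations = np.ones(10)

# Training: roughly 20% of values become 0.0, the rest become 1.25
print(dropout(activations, 0.2, training=True, rng=rng))
# Inference: the values come back unchanged
print(dropout(activations, 0.2, training=False, rng=rng))
```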

```python
# Stack the layers in order: each layer's output feeds the next layer
model = tf.keras.models.Sequential([
    flatten_layer,
    dense_relu_layer,
    dropout_layer,
    dense_softmax_layer
])
```
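`Sequential` simply feeds each layer's output into the next one. As a rough picture of what this model computes for one image at prediction time, here's a NumPy sketch of the same stack. It assumes random weights in place of the ones training would learn and a fake random image, so the "prediction" is meaningless; the point is the shapes (784 → 128 → 10) and the softmax output summing to 1.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtracting the row max avoids overflow and does not change the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
image = rng.random((1, 28, 28))  # one fake 28 x 28 "image" (a batch of 1)

# Random weights stand in for the values training would learn
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)

x = image.reshape(1, -1)      # Flatten:            (1, 784)
h = relu(x @ W1 + b1)         # Dense(128, relu):   (1, 128)
# Dropout is skipped here: it is only active during training
probs = softmax(h @ W2 + b2)  # Dense(10, softmax): (1, 10)

print(probs.shape)            # (1, 10)
print(probs.sum())            # 1.0, up to floating-point rounding
print(probs.argmax(axis=-1))  # index of the most probable "digit"
```

With the real trained model, `model.predict(X_test)` returns rows of probabilities like `probs`, and `np.argmax(..., axis=1)` turns them into predicted digits.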

## Training Our Model

Now, let's compile our model:

```python
# RUN ME
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
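The loss `sparse_categorical_crossentropy` measures how much probability the model gives to the correct digit: for each image it is `-log(p[true_label])`, averaged over the batch. "Sparse" means the labels are plain integers such as `2` (which is what `Y_train` holds) rather than one-hot vectors. A minimal NumPy sketch, using the softmax example output for a handwritten "2" from the layer comments; Keras also guards against `log(0)` internally, which this sketch does not.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    # labels: integer class ids, shape (n,); probs: softmax outputs, shape (n, classes)
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    # Probability the model assigned to the correct class for each example
    correct_class_probs = probs[np.arange(len(labels)), labels]
    # Average negative log-probability over the batch
    return -np.mean(np.log(correct_class_probs))

# Softmax output for a handwritten "2", with true label 2
probs = [[0.01, 0.02, 0.9, 0.01, 0.005, 0.005, 0.02, 0.01, 0.01, 0.01]]
print(sparse_categorical_crossentropy([2], probs))  # about 0.105, i.e. -ln(0.9)
```

A confident, correct prediction gives a loss near 0, while putting low probability on the true digit makes the loss grow quickly; the `adam` optimizer adjusts the weights to push this number down.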

and fit it to our training set:

```python
# RUN ME
model.fit(X_train, Y_train, epochs=5)
```

`epochs` is the number of full passes our neural network makes over the training set during fitting. This value needs to be chosen carefully in a neural network.
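Within each epoch, Keras updates the weights once per mini-batch rather than once per image, and `fit` uses a batch size of 32 when none is given. That's why the progress bar counts batches. The arithmetic for our 60,000 training images:

```python
import math

num_train_images = 60000  # size of the MNIST training set
batch_size = 32           # Keras's default when fit() is given no batch_size

steps_per_epoch = math.ceil(num_train_images / batch_size)
print(steps_per_epoch)      # 1875
print(steps_per_epoch * 5)  # 9375 weight updates over epochs=5
```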

While it may be tempting to make `epochs` as high as possible, too many epochs can cause our neural network to pick up irrelevant patterns in the training data, decreasing its accuracy on new, unseen data. This is known as **overfitting**.

We all have that one friend that reads too much into 1 or 2 events and does something stupid as a result. Don't let your neural networks be that friend.
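One common way to pick the number of epochs is **early stopping**: hold out some validation data, and stop once the validation loss hasn't improved for a few epochs in a row. Keras provides this as `tf.keras.callbacks.EarlyStopping` (with parameters such as `monitor='val_loss'`, `patience`, and `restore_best_weights`), typically combined with a validation set via `fit(..., validation_split=...)`. The plain-Python sketch below only illustrates the stopping rule; `early_stopping_epoch` is a hypothetical helper, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    # Return the index of the epoch whose weights we'd keep, stopping once
    # validation loss has failed to improve for `patience` epochs in a row.
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch

# Made-up validation losses: improving at first, then rising as the model overfits
losses = [0.30, 0.20, 0.15, 0.16, 0.18, 0.21]
print(early_stopping_epoch(losses, patience=2))  # 2 (the third epoch)
```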

### Assessing our Model

```python
# RUN ME
# Evaluate the accuracy of the neural network and print it out
test_loss, test_acc = model.evaluate(X_test, Y_test)

print(test_acc)
```

We can see that our simple neural network greatly outperforms our logistic regression model (~98% vs ~90%). Although an 8 percentage point increase may seem somewhat underwhelming, it cuts the error rate from about 10% to about 2%. Gains also get harder as accuracy approaches 100%: it's often easier to go from 60% to 70% accuracy than it is to go from 98% to 99%, because that last step means eliminating half of the remaining errors, and those are usually the hardest examples.
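Comparing error rates instead of accuracies makes the difference concrete. A small sketch using the approximate accuracies quoted for the two models (~90% and ~98%) and the 10,000-image MNIST test set; actual numbers will vary from run to run.

```python
def error_rate(accuracy):
    # Error rate is the fraction of test images the model gets wrong
    return 1 - accuracy

logistic_error = error_rate(0.90)  # logistic regression, ~90% accuracy
network_error = error_rate(0.98)   # neural network, ~98% accuracy

# Approximate number of mistakes on the 10,000-image MNIST test set
print(round(10000 * logistic_error), round(10000 * network_error))  # 1000 200

# How many times fewer mistakes the neural network makes
print(round(logistic_error / network_error, 2))  # 5.0

# Going from 98% to 99% accuracy halves the remaining errors
print(round(network_error / error_rate(0.99), 2))  # 2.0
```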

---
## Next section (recommended): [House Price Prediction with Machine Learning](https://colab.research.google.com/github/HackBinghamton/MachineLearningWorkshopWeek1/blob/master/boston_housing_price_exercise.ipynb)