# Neural Networks

## Agenda
* Fit a neural network to a dataset with two classes
    * Tweak parameters to fit the data
* You Try: Handwritten Digits dataset

Let's make a neural net that separates these two classes.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification, make_moons

# A linearly separable two-class dataset with uniform noise added
# (kept for comparison; it is not used in the cells below)
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=0, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_separable = (X, y)

# Two interleaving half-moons: the two-class dataset we will fit
X1, Y1 = make_moons(noise=0.3, random_state=0)

plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=200, edgecolor='k')

In [None]:
import pandas as pd
import numpy as np

data = pd.DataFrame(X1)
data['target'] = Y1
data.columns = ['x_pt','y_pt', 'class']

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Create train, test split
X_train, X_test, y_train, y_test = train_test_split(data[['x_pt','y_pt']], data['class'], random_state=0)

# Instantiate a Multilayer Perceptron Classifier
mlp = MLPClassifier(activation='logistic',solver='lbfgs',random_state=42)

mlp.fit(X_train, y_train)

print('accuracy on training set is ' + str(mlp.score(X_train, y_train)))
print('accuracy on test set is ' + str(mlp.score(X_test, y_test)))

Note about the `solver` parameter:
* The default solver, 'adam', works well on relatively large datasets (thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.
* Rule of thumb: use 'adam' for large datasets (more than about 1,000 rows) and 'lbfgs' otherwise.
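
The rule of thumb above can be checked empirically. The sketch below is only an illustration: it rebuilds the same small moons dataset under its own variable names (`X_demo`, `Xtr`, and so on are chosen here so that nothing in the notebook is overwritten) and fits one model per solver. The scores depend on the data and the random seed, so treat them as a single data point rather than a general result.

```python
import warnings

from sklearn.datasets import make_moons
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same small two-class dataset as above, under separate (illustrative) names
X_demo, y_demo = make_moons(noise=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

solver_scores = {}
for solver in ("lbfgs", "adam"):
    clf = MLPClassifier(activation="logistic", solver=solver, random_state=42)
    with warnings.catch_warnings():
        # A solver may stop at max_iter before converging; silence that warning here
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        clf.fit(Xtr, ytr)
    solver_scores[solver] = (clf.score(Xtr, ytr), clf.score(Xte, yte), clf.n_iter_)

for solver, (train_acc, test_acc, n_iter) in solver_scores.items():
    print("{}: train={:.3f} test={:.3f} iterations={}".format(
        solver, train_acc, test_acc, n_iter))
```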

**What about the total number of hidden layers?**

In [None]:
mlp

* By default the model has one hidden layer with 100 perceptrons (`hidden_layer_sizes=(100,)`). This could lead to overfitting on a dataset this small. Let's change it so that we only have two perceptrons.
* What does it mean when we set the model to have two perceptrons?

In [None]:
# (2,) is a one-element tuple: a single hidden layer with two perceptrons
mlp = MLPClassifier(activation='logistic',solver='lbfgs',random_state=42, hidden_layer_sizes=(2,))

mlp.fit(X_train, y_train)

print('Accuracy on the training subset: {:.3f}'.format(mlp.score(X_train, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(mlp.score(X_test, y_test)))

**Let's take a look at the decision boundary and see how the model fit the data**

In [None]:
from matplotlib.colors import ListedColormap

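def plot_decision_boundary(mlp, h=.02):
    # h is the step size of the mesh used to draw the decision regions
    x_min, x_max = X_train['x_pt'].min() - .5, X_train['x_pt'].max() + .5
    y_min, y_max = X_train['y_pt'].min() - .5, X_train['y_pt'].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Colormaps for the decision regions and for the data points
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])

    # Predict a class for every point on the mesh
    # (a DataFrame with the training column names avoids a feature-name warning)
    grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=X_train.columns)
    Z = mlp.predict(grid)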

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=cm, alpha=.8)

    # Plot also the training points
    plt.scatter(X_train['x_pt'], X_train['y_pt'], c=y_train, cmap=cm_bright,
               edgecolors='black', s=25)
    # and testing points
    #plt.scatter(X_test['x_pt'], X_test['y_pt'], c=y_test, cmap=cm_bright,
    #           alpha=0.6, edgecolors='black', s=25)

plot_decision_boundary(mlp)

Note that there's only one decision boundary, since we initially set the model to have two perceptrons in the hidden layer. We can take a look at the weights (`mlp.coefs_`) and bias terms (`mlp.intercepts_`) that define it.

In [None]:
print(mlp.coefs_)
print('\n')
print(mlp.intercepts_)
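
To make the printed arrays less opaque, here is a minimal sketch of how `coefs_` and `intercepts_` turn an input into a prediction. It refits a two-perceptron model on the moons data under its own names (`demo_mlp`, `X_demo`; these are illustrative, not part of the notebook) and recomputes the forward pass by hand, assuming the logistic activation used above and a two-class problem (for which scikit-learn also applies a logistic function at the output).

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic (sigmoid) function
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
demo_mlp = MLPClassifier(activation="logistic", solver="lbfgs",
                         hidden_layer_sizes=(2,), random_state=42,
                         max_iter=1000)
demo_mlp.fit(X_demo, y_demo)

# One weight matrix and one bias vector per layer
W1, W2 = demo_mlp.coefs_        # shapes (2, 2): inputs -> hidden, (2, 1): hidden -> output
b1, b2 = demo_mlp.intercepts_   # shapes (2,) and (1,)

# Forward pass by hand: each perceptron is a weighted sum followed by a sigmoid
hidden = expit(X_demo @ W1 + b1)             # activations of the two hidden perceptrons
p_class1 = expit(hidden @ W2 + b2).ravel()   # probability of class 1

# Compare with what scikit-learn computes
max_diff = np.abs(p_class1 - demo_mlp.predict_proba(X_demo)[:, 1]).max()
print("largest difference from predict_proba:", max_diff)
```

Each hidden perceptron contributes one weighted sum of the two inputs, so two perceptrons mean the network combines just two such sums before the output layer.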

What happens if we vary the number of perceptrons in our hidden layer?

In [None]:
plt.figure(figsize=(16, 32))

# Let's create a neural net with a hidden layer of 1, 2, 3, 4, 5, 20 and 50 nodes
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]

for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    mlp = MLPClassifier(activation='logistic',solver='lbfgs',random_state=42, hidden_layer_sizes=(nn_hdim,))
    mlp.fit(X_train, y_train)
    plot_decision_boundary(mlp)
plt.show()

Let's also standardize our features, since a Multilayer Perceptron is sensitive to feature scaling.

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit the scaler on the training data only, then apply the same transform to both splits
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(activation='logistic',solver='lbfgs',random_state=42, hidden_layer_sizes=(3,))

mlp.fit(X_train_scaled, y_train)

print('Accuracy on the training subset: {:.3f}'.format(mlp.score(X_train_scaled, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(mlp.score(X_test_scaled, y_test)))
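
One way to keep the scaling step and the classifier in sync is to wrap them in a scikit-learn pipeline, so the scaler is always fitted on whatever data the model is trained on and reused unchanged at prediction time. This is a sketch of that alternative, not what the cell above does; the names `scaled_mlp`, `X_demo`, `Xtr` and `Xte` are introduced here for illustration only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

# The pipeline standardizes the features, then feeds them to the network
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(activation="logistic", solver="lbfgs",
                  hidden_layer_sizes=(3,), random_state=42, max_iter=1000),
)
scaled_mlp.fit(Xtr, ytr)  # the scaler sees the training split only

train_acc = scaled_mlp.score(Xtr, ytr)
test_acc = scaled_mlp.score(Xte, yte)  # test data is transformed with the training statistics
print("Accuracy on the training subset: {:.3f}".format(train_acc))
print("Accuracy on the test subset: {:.3f}".format(test_acc))
```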

# Optical Recognition of Handwritten Digits Data Set

* Can we build a neural network to classify handwritten digits?

![title](https://shapeofdata.files.wordpress.com/2013/12/digits.png)

The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.

* Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.

* The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel-values of the associated image.

* Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (indexing from zero).

* Read more on [kaggle](https://www.kaggle.com/c/digit-recognizer/data)
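
The pixel naming scheme described above can be sketched in a few lines. The helper names `pixel_position` and `pixel_index` are hypothetical, introduced here only to illustrate the arithmetic x = i * 28 + j.

```python
import numpy as np

def pixel_position(x, width=28):
    """Return (row i, column j) of the column named pixel<x>."""
    if not 0 <= x < width * width:
        raise ValueError("x must be between 0 and {}".format(width * width - 1))
    return divmod(x, width)  # i = x // width, j = x % width

def pixel_index(i, j, width=28):
    """Inverse mapping: the x in pixel<x> for row i, column j."""
    return i * width + j

print(pixel_position(0))    # (0, 0)   top-left corner
print(pixel_position(27))   # (0, 27)  end of the first row
print(pixel_position(28))   # (1, 0)   start of the second row
print(pixel_position(783))  # (27, 27) bottom-right corner

# Reshaping a flat row of 784 values to 28 x 28 follows the same layout
flat = np.arange(784)
image = flat.reshape(28, 28)
print(image[1, 0] == pixel_index(1, 0))  # True
```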

## Partner-Up

In [None]:
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

digits = pd.read_csv('../../../datasets/digit_train.csv')

In [None]:
digits.head()

* How many features do we have?
* How do we go about choosing the number of perceptrons to have in one hidden layer?
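
There is no single formula for the number of perceptrons; one common approach is to try a few sizes and compare cross-validated accuracy. The sketch below does that with `GridSearchCV`. To stay self-contained and quick, it uses an assumption that is not part of this notebook: scikit-learn's small built-in 8 x 8 digits dataset (`load_digits`), cut down to 600 rows, stands in for the Kaggle file, and the candidate sizes are arbitrary.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): 8 x 8 digit images with pixel values from 0 to 16
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small[:600] / 16.0   # scale pixel values to [0, 1]
y_small = y_small[:600]

search = GridSearchCV(
    MLPClassifier(activation="logistic", solver="lbfgs",
                  max_iter=200, random_state=42),
    param_grid={"hidden_layer_sizes": [(5,), (10,), (25,)]},  # arbitrary candidates
    cv=3,
)
with warnings.catch_warnings():
    # Some fits may hit max_iter before converging; silence that warning here
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_small, y_small)

for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params["hidden_layer_sizes"], "mean CV accuracy: {:.3f}".format(score))
print("best:", search.best_params_)
```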

Let's try using ten perceptrons for visualization purposes

In [None]:
import time

# Time it
now = time.time()

# Create train, test split
X_train, X_test, y_train, y_test = train_test_split(digits.drop('label',axis=1), digits.label, random_state=0)

# Let's set a hidden_layer param: one hidden layer with 10 perceptrons
hidden_layers_neurons = (10,)

# Scale pixel values from [0, 255] to [0, 1]
X_train_scaled = X_train / 255.0
X_test_scaled = X_test / 255.0

# Instantiate a Multilayer Perceptron Classifier
mlp = MLPClassifier(activation='logistic', max_iter=500, random_state=42, hidden_layer_sizes=hidden_layers_neurons)  

mlp.fit(X_train_scaled, y_train)

print('accuracy on training set is ' + str(mlp.score(X_train_scaled, y_train)))
print('accuracy on test set is ' + str(mlp.score(X_test_scaled, y_test)))

current = time.time()
print('time difference is ' + str(current - now))

That took a long time! We will try to address the processing time later. Let's see what the weights (coefficients) of our hidden layer look like and whether they separate the data well.

In [None]:
fig, axes = plt.subplots(2,5)

# use global min / max to ensure all weights are shown on the same scale

vmin, vmax = mlp.coefs_[0].min(), mlp.coefs_[0].max()
for coef, ax in zip(mlp.coefs_[0].T, axes.ravel()):
    ax.matshow(coef.reshape(28, 28), cmap=plt.cm.gray, vmin=.5 * vmin,
               vmax=.5 * vmax)
    ax.set_xticks(())
    ax.set_yticks(())

You may not have the luxury of setting the number of iterations (`max_iter`) to a very high value because of processing limitations.
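
One option when compute is limited is to let training stop on its own once a held-out validation score stops improving, using the `early_stopping` option of `MLPClassifier` (it applies to the 'adam' and 'sgd' solvers). The following is a sketch under an assumption: scikit-learn's small built-in `load_digits` dataset stands in for the Kaggle file so the example runs quickly, and the settings shown are illustrative rather than tuned.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): 8 x 8 digit images with pixel values from 0 to 16
X_small, y_small = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X_small / 16.0, y_small, random_state=0)

budget_mlp = MLPClassifier(
    activation="logistic",
    hidden_layer_sizes=(10,),
    max_iter=500,             # upper bound on the number of epochs
    early_stopping=True,      # hold out part of the training data for validation
    validation_fraction=0.1,  # size of that held-out part
    n_iter_no_change=10,      # stop after 10 epochs without improvement
    random_state=42,
)
with warnings.catch_warnings():
    # Reaching max_iter without converging only triggers a warning; silence it here
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    budget_mlp.fit(Xtr, ytr)

test_acc = budget_mlp.score(Xte, yte)
print("epochs actually run:", budget_mlp.n_iter_)
print("best validation accuracy: {:.3f}".format(max(budget_mlp.validation_scores_)))
print("accuracy on test set: {:.3f}".format(test_acc))
```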