# Neural network methods

Author: Gaurav Vaidya

## Learning objectives
* Understand what a neural network is and how it can be used.
* Implement a neural network to label data based on multiple input features.
* Understand when a neural network might be useful and when it might be worse than other methods.
* Learn where to learn more about neural networks.

## Learn deeply
[Artificial Neural Networks (ANNs)](https://en.wikipedia.org/wiki/Artificial_neural_network) and [deep learning](https://en.wikipedia.org/wiki/Deep_learning) are currently attracting a lot of interest, both as a subject of research and as a tool for analyzing datasets. ANNs are similar to feature crosses, except that thanks to an algorithm called [backpropagation](https://en.wikipedia.org/wiki/Backpropagation), an ANN is able to learn its own features from the training data. Many of the other advantages of ANNs relate specifically to interpreting visual and auditory data, which we won't be doing today, but I'll point you to some resources to learn about [convolutional neural networks](https://en.wikipedia.org/wiki/Convolutional_neural_network) yourself.
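
To make the "learned features" idea concrete, here is a minimal sketch of a forward pass through a tiny network with one hidden layer. The weights are hand-picked for illustration (they are my assumption, not something a library produced); in a real ANN, backpropagation would find weights like these from the training data. The two hidden units act as constructed features that let a simple output unit compute XOR, which no single linear combination of the two inputs can do.

```python
import numpy as np

# All four combinations of two binary input features.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (hypothetical) weights for a hidden layer with two ReLU units.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])  # shape: (inputs, hidden units)
b_hidden = np.array([0.0, -1.0])

# Hand-picked weights for a single linear output unit.
w_out = np.array([1.0, -2.0])


def forward(inputs):
    """Run the inputs through the hidden layer, then the output unit."""
    hidden = np.maximum(0.0, inputs @ W_hidden + b_hidden)  # ReLU activation
    return hidden @ w_out


outputs = forward(X)
print(outputs)  # [0. 1. 1. 0.], i.e. XOR of the two inputs
```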

## Reminders of the ground rules
* Always have training data (used to train the ANN) and testing data (used to test how well the ANN might work against data it has never seen before).
* Never, ever, ever, ever, *ever* let the ANN see testing data while training!

## What sort of forest is this?
Let's jump in with a dataset called [Covertype](https://archive.ics.uci.edu/ml/datasets/Covertype), where we try to predict forest cover type based on a number of features of a 30x30m area of forest as follows:

| Column | Feature | Units | Description | How measured |
|---|--------|-------|-------------|--------------|
| 1 | Elevation | meters | Elevation in meters | Quantitative |
| 2 | Aspect | degrees azimuth | Aspect in degrees azimuth | Quantitative |
| 3 | Slope | degrees | Slope in degrees | Quantitative |
| 4 | Horizontal_Distance_To_Hydrology | meters | Horz Dist to nearest surface water features | Quantitative |
| 5 | Vertical_Distance_To_Hydrology | meters | Vert Dist to nearest surface water features | Quantitative |
| 6 | Horizontal_Distance_To_Roadways | meters | Horz Dist to nearest roadway | Quantitative |
| 7 | Hillshade_9am | 0 to 255 index | Hillshade index at 9am, summer solstice | Quantitative |
| 8 | Hillshade_Noon | 0 to 255 index | Hillshade index at noon, summer solstice | Quantitative |
| 9 | Hillshade_3pm | 0 to 255 index | Hillshade index at 3pm, summer solstice | Quantitative |
| 10 | Horizontal_Distance_To_Fire_Points | meters | Horz Dist to nearest wildfire ignition points | Quantitative |
| 11-14 | Wilderness_Area | 4 binary columns with 0 (absence) or 1 (presence) | Which wilderness area this plot is in | Qualitative |
| 15-54 | Soil_Type | 40 binary columns with 0 (absence) or 1 (presence) | Soil Type designation | Qualitative |

Using this information, we are trying to classify each 30x30m plot as one of seven forest types.
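
The wilderness area and soil type columns are one-hot encoded: each plot has a single 1 within each group of binary columns. Here is a rough sketch of how you could turn those groups back into a single category number per plot. It assumes a 0-indexed layout of 10 quantitative columns, then 4 wilderness columns, then 40 soil columns (54 in total), and it uses made-up rows rather than the real dataset.

```python
import numpy as np

# Assumed 0-indexed column layout: 10 quantitative columns, then 4 wilderness
# area indicators, then 40 soil type indicators (54 columns in total).
WILDERNESS_COLUMNS = slice(10, 14)
SOIL_COLUMNS = slice(14, 54)

# Two made-up rows standing in for real plots.
rows = np.zeros((2, 54))
rows[0, 10 + 2] = 1   # first plot: wilderness area 3
rows[0, 14 + 28] = 1  # first plot: soil type 29
rows[1, 10 + 0] = 1   # second plot: wilderness area 1
rows[1, 14 + 39] = 1  # second plot: soil type 40

# argmax finds the position of the single 1 in each group of indicators;
# adding 1 converts the 0-based position to a 1-based category number.
wilderness_area = rows[:, WILDERNESS_COLUMNS].argmax(axis=1) + 1
soil_type = rows[:, SOIL_COLUMNS].argmax(axis=1) + 1

print(wilderness_area)  # [3 1]
print(soil_type)        # [29 40]
```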

This dataset is built into scikit-learn, so we can use scikit-learn to download and load it.

In [16]:
from sklearn import datasets
help(datasets.fetch_covtype)

Help on function fetch_covtype in module sklearn.datasets.covtype:

fetch_covtype(data_home=None, download_if_missing=True, random_state=None, shuffle=False, return_X_y=False)
    Load the covertype dataset (classification).
    
    Download it if necessary.
    
    Classes                        7
    Samples total             581012
    Dimensionality                54
    Features                     int
    
    Read more in the :ref:`User Guide <covtype_dataset>`.
    
    Parameters
    ----------
    data_home : string, optional
        Specify another download and cache folder for the datasets. By default
        all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
    
    download_if_missing : boolean, default=True
        If False, raise a IOError if the data is not locally available
        instead of trying to download the data from the source site.
    
    random_state : int, RandomState instance or None (default)
        Determines random number genera

So we don't need to provide any arguments, but it warns us that it may need to download this dataset. The help text also describes the returned dataset object, which will have the following properties:
- .data: a numpy array with the features.
- .target: a numpy array with the target labels. Note that each plot is classified into only one of these values.
- .DESCR: a description of the forest covertype dataset.

In [37]:
covtype = datasets.fetch_covtype(shuffle=True)
print("Data: ", covtype.data)
print("Data shape: ", covtype.data.shape) # Describe the size of this array

Data:  [[3.066e+03 4.700e+01 9.000e+00 ... 0.000e+00 0.000e+00 0.000e+00]
 [3.106e+03 2.390e+02 1.500e+01 ... 0.000e+00 0.000e+00 0.000e+00]
 [2.975e+03 2.510e+02 1.000e+01 ... 0.000e+00 0.000e+00 0.000e+00]
 ...
 [3.317e+03 1.420e+02 2.700e+01 ... 0.000e+00 1.000e+00 0.000e+00]
 [3.218e+03 9.000e+01 3.000e+00 ... 0.000e+00 0.000e+00 0.000e+00]
 [3.210e+03 7.600e+01 1.700e+01 ... 0.000e+00 0.000e+00 0.000e+00]]
Data shape:  (581012, 54)


In [38]:
print("Target: ", covtype.target)
print("Target shape: ", covtype.target.shape)

Target:  [1 1 2 ... 1 1 2]
Target shape:  (581012,)
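
Before training, it can be worth checking how many plots fall into each class, since a very common class can make accuracy look better than it is. Here is a small sketch using `np.unique`; the `example_target` array is made up for illustration and stands in for `covtype.target`.

```python
import numpy as np

# Made-up labels standing in for covtype.target (values 1 to 7).
example_target = np.array([1, 1, 2, 2, 2, 3, 7, 2, 1, 5])

# return_counts=True gives the number of occurrences of each distinct label.
labels, counts = np.unique(example_target, return_counts=True)
for label, count in zip(labels, counts):
    share = count / example_target.size
    print(f"Cover type {label}: {count} plots ({share:.0%})")
```

Running the same two lines on `covtype.target` should show you the class balance of the real dataset.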


In [39]:
print("Description: ", covtype.DESCR)

Description:  .. _covtype_dataset:

Forest covertypes
-----------------

The samples in this dataset correspond to 30×30m patches of forest in the US,
collected for the task of predicting each patch's cover type,
i.e. the dominant species of tree.
There are seven covertypes, making this a multiclass classification problem.
Each sample has 54 features, described on the
`dataset's homepage <http://archive.ics.uci.edu/ml/datasets/Covertype>`__.
Some of the features are boolean indicators,
while others are discrete or continuous measurements.

**Data Set Characteristics:**

    Classes                        7
    Samples total             581012
    Dimensionality                54
    Features                     int

:func:`sklearn.datasets.fetch_covtype` will load the covertype dataset;
it returns a dictionary-like object
with the feature matrix in the ``data`` member
and the target values in ``target``.
The dataset will be downloaded from the web if necessary.



## Training data for training, testing data for testing, and validation data for validation.


In [43]:
# Out of 581,012 data entries, let's hold back:
# - 50,000 records as our validation dataset
# - 250,000 records as our test dataset
# - remaining records as our training dataset

validation_data = covtype.data[0:50_000]
validation_labels = covtype.target[0:50_000]

test_data = covtype.data[50_000:300_000]
test_labels = covtype.target[50_000:300_000]

train_data = covtype.data[300_000:]
train_labels = covtype.target[300_000:]

print("Validation data shape: ", validation_data.shape)
print("Validation labels shape: ", validation_labels.shape)

print("Test data shape: ", test_data.shape)
print("Test labels shape: ", test_labels.shape)

print("Training data shape: ", train_data.shape)
print("Training labels shape: ", train_labels.shape)

Validation data shape:  (50000, 54)
Validation labels shape:  (50000,)
Test data shape:  (250000, 54)
Test labels shape:  (250000,)
Training data shape:  (281012, 54)
Training labels shape:  (281012,)
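
Slicing works here because we asked `fetch_covtype` to shuffle the data. As an alternative sketch, scikit-learn's `train_test_split` can do the shuffling and splitting in one step, and its `stratify` argument keeps the class proportions similar across the splits. The data below is synthetic, and the split sizes are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for covtype.data and covtype.target.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))
labels = rng.integers(1, 4, size=1000)  # three made-up classes: 1, 2, 3

# First split off the training set, then divide the remainder into
# validation and test sets. stratify keeps class proportions similar.
train_X, rest_X, train_y, rest_y = train_test_split(
    features, labels, train_size=0.5, stratify=labels, random_state=0
)
val_X, test_X, val_y, test_y = train_test_split(
    rest_X, rest_y, train_size=0.2, stratify=rest_y, random_state=0
)

print(train_X.shape, val_X.shape, test_X.shape)
```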


Our data is ready for processing. But remember that we have a variety of input types: binary (0 or 1), continuous in small ranges (0-255), and continuous in large ranges (such as elevation). Before we process this data, we should scale all of the features to a standard range.

In [58]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# scaler = MinMaxScaler()
scaler = StandardScaler()

# Figure out how to scale all the input features in the training dataset.
scaler.fit(train_data)
scaled_train_data = scaler.transform(train_data)
print(scaled_train_data[0])

# Also transform our validation and testing data in the same way.
scaled_test_data = scaler.transform(test_data)
print(scaled_test_data[0])

scaled_validation_data = scaler.transform(validation_data)
print(scaled_validation_data[0])

[ 0.14495808 -0.44460642 -0.01483259 -1.12871848 -0.95158403 -1.14205634
  1.1539444  -0.06448008 -1.00587464 -0.1625825   1.10680138 -0.23269186
 -0.87887473 -0.26017494 -0.07323182 -0.11580935 -0.09114007 -0.14768141
 -0.0528258  -0.10609663 -0.01178148 -0.01819495 -0.0444448  -0.24423533
 -0.14848449 -0.23403979 -0.17405532 -0.03180653  0.         -0.06989097
 -0.07593357 -0.05607992 -0.08400155 -0.12647098 -0.03850401 -0.24617625
 -0.33199924 -0.1942708  -0.02905327 -0.06761817 -0.04355168 -0.04062432
  2.01133316 -0.23438637 -0.21600486 -0.31576769 -0.28939566 -0.0532979
 -0.05668334 -0.0148553  -0.02310996 -0.1657519  -0.15665416 -0.1247316 ]
[ 0.18782869 -0.77477467  0.11871014  0.32895147 -0.34977489  0.2451638
  0.85484102 -0.77104378 -1.11038191 -0.35210596 -0.90350447 -0.23269186
  1.13781858 -0.26017494 -0.07323182 -0.11580935 -0.09114007 -0.14768141
 -0.0528258  -0.10609663 -0.01178148 -0.01819495 -0.0444448  -0.24423533
 -0.14848449 -0.23403979 -0.17405532 -0.03180653  0.
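
If you're curious what `StandardScaler` is doing, here is a minimal sketch on made-up data that repeats the calculation by hand: subtract each column's mean and divide by its standard deviation. It also includes a column that never varies, which ends up as all zeros after scaling; that may be why one of the scaled values printed above is exactly 0.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data: a large-range column, a binary column,
# and a column that never varies.
toy_train = np.array([
    [3000.0, 0.0, 0.0],
    [3100.0, 1.0, 0.0],
    [2900.0, 0.0, 0.0],
    [3200.0, 1.0, 0.0],
])

toy_scaler = StandardScaler().fit(toy_train)

# The same calculation by hand: subtract the column mean, divide by the
# column standard deviation. A zero standard deviation is replaced by 1
# to avoid dividing by zero, which leaves the constant column at 0.
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)
std[std == 0] = 1.0
manual = (toy_train - mean) / std

print(np.allclose(manual, toy_scaler.transform(toy_train)))  # True
```

Note that the mean and standard deviation come from the training data only, which is why we call `fit` on the training data and only `transform` on the validation and testing data.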

In [60]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    solver='adam', alpha=1e-5,
    hidden_layer_sizes=(100, 20, 20),
    batch_size='auto',
    verbose=True,
    early_stopping=True
)
clf.fit(scaled_train_data, train_labels)

Iteration 1, loss = 0.63656137
Validation score: 0.757277
Iteration 2, loss = 0.52483717
Validation score: 0.780443
Iteration 3, loss = 0.48668358
Validation score: 0.795175
Iteration 4, loss = 0.46069857
Validation score: 0.805530
Iteration 5, loss = 0.44184685
Validation score: 0.808305
Iteration 6, loss = 0.42752549
Validation score: 0.818696
Iteration 7, loss = 0.41688971
Validation score: 0.821863
Iteration 8, loss = 0.40834953
Validation score: 0.824354
Iteration 9, loss = 0.40131312
Validation score: 0.826774
Iteration 10, loss = 0.39367988
Validation score: 0.835599
Iteration 11, loss = 0.38872091
Validation score: 0.831151
Iteration 12, loss = 0.38319309
Validation score: 0.824496
Iteration 13, loss = 0.37791768
Validation score: 0.838624
Iteration 14, loss = 0.37328673
Validation score: 0.845136
Iteration 15, loss = 0.37011426
Validation score: 0.845598
Iteration 16, loss = 0.36716183
Validation score: 0.851149
Iteration 17, loss = 0.36332916
Validation score: 0.846737
Iterat

MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=True, epsilon=1e-08,
       hidden_layer_sizes=(100, 20, 20), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=True, warm_start=False)
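
A note on the "Validation score" lines in the log: with `early_stopping=True`, `MLPClassifier` sets aside a fraction of the data passed to `fit` (`validation_fraction`, 0.1 by default) and scores against that, so it is not the same as a validation dataset you hold back yourself. Here is a small sketch on synthetic data (not Covertype) showing where those scores end up after training; the layer sizes and dataset are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic classification problem, not the Covertype data.
toy_X, toy_y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=0
)

toy_clf = MLPClassifier(
    hidden_layer_sizes=(20, 10),
    early_stopping=True,      # hold back part of the data passed to fit()
    validation_fraction=0.1,  # 10% of that data is used for the validation score
    n_iter_no_change=10,      # stop after 10 epochs without enough improvement
    max_iter=200,
    random_state=0,
)
toy_clf.fit(toy_X, toy_y)

print("Epochs run:", toy_clf.n_iter_)
print("Best validation score:", toy_clf.best_validation_score_)
print("Scores recorded:", len(toy_clf.validation_scores_))
```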

In [61]:
clf.score(scaled_validation_data, validation_labels)

0.87518

In [62]:
clf.score(scaled_test_data, test_labels)

0.875876
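
`clf.score` reports overall accuracy, which can hide poor performance on rare classes. As a rough sketch of how you might look closer, the example below builds a confusion matrix from made-up labels and predictions (not output from the model above) and computes the recall for each class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for an imbalanced three-class problem.
true_labels = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 3])
predicted = np.array([1, 1, 1, 1, 1, 1, 2, 2, 1, 1])

accuracy = accuracy_score(true_labels, predicted)
print("Accuracy:", accuracy)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(true_labels, predicted, labels=[1, 2, 3])
print(matrix)

# Per-class recall: correct predictions divided by true examples of each class.
recall = matrix.diagonal() / matrix.sum(axis=1)
print("Recall per class:", recall)
```

Here the accuracy is 80%, yet class 3 is never predicted correctly. To try this on the real model, you could pass `test_labels` and `clf.predict(scaled_test_data)` to `confusion_matrix`.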