<a href="https://colab.research.google.com/github/RaeganGutierrez/Intro-Deep-Learning/blob/main/Sequential_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# P1 (40pt): A first look at a neural network

We will now look at a first concrete example of a neural network, which uses the Python library Keras to learn to classify
handwritten digits. The problem is to classify grayscale images of handwritten digits (28 pixels by 28 pixels) into their 10
categories (0 to 9). The dataset we will use is the MNIST dataset, a classic dataset in the machine learning community, which has been
around for almost as long as the field itself and has been very intensively studied. It's a set of 60,000 training images, plus 10,000 test
images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of "solving" MNIST
as the "Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected. As you become a machine
learning practitioner, you will see MNIST come up over and over again, in scientific papers, blog posts, and so on.



The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays. Read in the MNIST dataset and print out the shapes of the train and test sets. **(5pt)**

In [1]:
#load libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist

#read in data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

#print train and test set shapes
print(train_images.shape)
print(test_images.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


(60000, 28, 28)
(10000, 28, 28)

The core building block of neural networks is the "layer", a data-processing module which you can conceive as a "filter" for data. Some
data comes in, and comes out in a more useful form. Precisely, layers extract _representations_ out of the data fed into them -- hopefully
representations that are more meaningful for the problem at hand. Most of deep learning really consists of chaining together simple layers
which will implement a form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a
succession of increasingly refined data filters -- the "layers".

Give this network a sequence of two `Dense` layers, which are densely-connected (also called "fully-connected") neural layers.
The first layer will have 512 neurons and a "ReLU" activation; be sure to specify its input shape.
The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). Each
score will be the probability that the current digit image belongs to one of our 10 digit classes. **(10pt)**

In [2]:
from tensorflow.keras import models
from tensorflow.keras import layers

#add layers
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
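
To make the two layers concrete, here is a minimal NumPy sketch of the same forward pass: a 512-unit ReLU layer followed by a 10-way softmax. The weights are random placeholders (an assumption of this sketch, not the values Keras would learn), and the input is a stand-in for one flattened image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights: a trained network would have learned these values.
W1 = rng.normal(scale=0.05, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

def relu(z):
    # ReLU keeps positive values and zeroes out negative ones.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.random(28 * 28)             # stand-in for one flattened, scaled image
hidden = relu(x @ W1 + b1)          # shape (512,)
probs = softmax(hidden @ W2 + b2)   # shape (10,), non-negative, sums to 1

print(probs.shape, probs.sum())
```

Whatever the weights are, the softmax output is a valid probability distribution over the 10 digit classes.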

To make our network ready for training, we need to pick three more things, as part of the "compilation" step:

* A loss function: this is how the network measures how well it is doing on its training data, and thus how it
steers itself in the right direction.
* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.
* Metrics: the quantities to monitor during training and testing. Here we only track accuracy (the fraction of images that are correctly classified).

Define the appropriate optimizer, loss function, and metrics to compile the NN model. **(10pt)**

In [3]:
#compile model
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
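
As a worked illustration of what `categorical_crossentropy` measures, the sketch below computes the loss by hand for a single example. It uses three classes instead of ten to keep the arithmetic readable, and the clipping constant `eps` is an assumption of this sketch rather than a documented Keras value.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 so that log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0)
    # Only the true class contributes, because y_true is one-hot.
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([0.0, 0.0, 1.0])      # true class is index 2
good_pred = np.array([0.1, 0.2, 0.7])   # most probability on the true class
bad_pred = np.array([0.7, 0.2, 0.1])    # most probability on a wrong class

loss_good = categorical_crossentropy(y_true, good_pred)  # -ln(0.7)
loss_bad = categorical_crossentropy(y_true, bad_pred)    # -ln(0.1)
print(loss_good, loss_bad)
```

The loss is small when the predicted probability of the true class is high and grows quickly as that probability approaches zero, which is what gives the optimizer a direction to move in.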

Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling it so that all values are in
the `[0, 1]` interval. Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with
values in the `[0, 255]` interval.

Reshape and transform the data into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1. **(5pt)**

In [4]:
#reshape data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
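
The same preprocessing can be written without hard-coding the number of images. The sketch below runs on a small random `uint8` array (a hypothetical stand-in for MNIST images) and lets NumPy infer the first dimension with `-1`.

```python
import numpy as np

# Hypothetical stand-in for a small batch of 28x28 uint8 images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = images.reshape((-1, 28 * 28))     # -1 lets NumPy infer the number of images
scaled = flat.astype("float32") / 255    # values now lie in [0, 1]

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```

Using `-1` means the same two lines work unchanged for the training and test arrays.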

Once we one-hot encode the labels, we will then be ready to train our network. In Keras, this is done via a call to the `fit` method of the network:
we "fit" the model to its training data.

Transform all labels into their one-hot encoding forms. Fit the model to its training data with 5 epochs and a batch size of 128. **(5pt)**

In [5]:
from tensorflow.keras.utils import to_categorical

#one-hot encode labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

#train model
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x79dc76ac28f0>

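For readers unfamiliar with one-hot encoding, the following NumPy sketch shows the idea on three example digit labels. The helper `one_hot` is a hypothetical name introduced here for illustration; it is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])     # example digit labels
encoded = one_hot(labels, 10)    # shape (3, 10)

print(encoded[0])                # a 1 at index 5, zeros elsewhere
print(encoded.argmax(axis=1))    # recovers the original labels
```

Each label becomes a length-10 vector with a single 1, which matches the 10-way softmax output that the loss compares it against.
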
We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. Now let's check that our model performs well on the test set too.

Make predictions on the testing dataset and print out the test accuracy. **(5pt)**

In [6]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9811000227928162


Our test set accuracy turns out to be 98.1% -- which is slightly lower than the training set accuracy. This gap between training accuracy and test accuracy is an example of "overfitting", the fact that machine learning models tend to perform worse on new data than on their training data.
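
For one-hot labels, accuracy amounts to comparing the most probable predicted class with the true class. A minimal sketch with made-up probabilities over three classes (the numbers are illustrative only, not model output):

```python
import numpy as np

def accuracy(probs, one_hot_labels):
    # A prediction counts as correct when the most probable class matches the label.
    return float(np.mean(probs.argmax(axis=1) == one_hot_labels.argmax(axis=1)))

# Illustrative softmax outputs for four samples.
probs = np.array([
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
labels = np.eye(3)[[1, 0, 2, 1]]   # true classes 1, 0, 2, 1

acc = accuracy(probs, labels)      # 3 of 4 predictions are correct
print(acc)
```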

# P2 (60pt): Write Python code in Colab using NumPy, pandas, scikit-learn, and Keras to complete the following tasks:
1.	Import the Auto MPG dataset, using the attribute names given in the dataset description as the column names, treating the string ‘?’ as a missing value, and using whitespace as the column delimiter. Print out the shape and first 5 rows of the DataFrame. **(5pt)**

    a.	Dataset source file: http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

    b.	Dataset description: http://archive.ics.uci.edu/ml/datasets/Auto+MPG

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import tensorflow as tf

#fix the random seeds for reproducibility
np.random.seed(100)
tf.random.set_seed(100)

In [None]:
#read in data
auto = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data',
                   header = None,
                   names =["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration",
                           "model year", "origin", "car name"],
                   na_values ='?',
                   delimiter = r'\s+')  #raw string avoids an invalid-escape warning
print(auto.shape)
print(auto.head(5))

(398, 9)
    mpg  cylinders  displacement  horsepower  weight  acceleration  \
0  18.0          8         307.0       130.0  3504.0          12.0   
1  15.0          8         350.0       165.0  3693.0          11.5   
2  18.0          8         318.0       150.0  3436.0          11.0   
3  16.0          8         304.0       150.0  3433.0          12.0   
4  17.0          8         302.0       140.0  3449.0          10.5   

   model year  origin                   car name  
0          70       1  chevrolet chevelle malibu  
1          70       1          buick skylark 320  
2          70       1         plymouth satellite  
3          70       1              amc rebel sst  
4          70       1                ford torino  
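
To see how `na_values` and the whitespace delimiter interact, the sketch below parses two rows in the same layout from an in-memory string. The rows are illustrative and typed in by hand (the second has a `?` in the horsepower field); nothing is downloaded.

```python
import io
import pandas as pd

# Two illustrative rows in the whitespace-delimited layout; the second lacks horsepower.
sample = io.StringIO(
    '18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"\n'
    '25.0 4 98.0 ? 2046. 19.0 71 1 "ford pinto"\n'
)
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model year", "origin", "car name"]

df = pd.read_csv(sample, header=None, names=columns, na_values="?", sep=r"\s+")

print(df.shape)
print(df["horsepower"].isna().sum())   # number of missing horsepower values
```

Because `?` is converted to NaN at read time, the horsepower column stays numeric and the affected rows can later be removed with `dropna()`.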


2.	Delete the “car name” column and drop the rows containing NULL values. Print out the shape of the DataFrame. **(5pt)**

In [None]:
#drop null values and 'car name' column
auto2 = auto.drop('car name', axis = 'columns')
auto3 = auto2.dropna()
print(auto3.shape)
auto3.info()  #info() prints its report itself and returns None

(392, 8)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 0 to 397
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           392 non-null    float64
 1   cylinders     392 non-null    int64  
 2   displacement  392 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        392 non-null    float64
 5   acceleration  392 non-null    float64
 6   model year    392 non-null    int64  
 7   origin        392 non-null    int64  
dtypes: float64(5), int64(3)
memory usage: 27.6 KB


3.	The ‘origin’ column is categorical: replace it with numerical columns using one-hot encoding. Print out the shape and first 5 rows of the new DataFrame. **(5pt)**

In [None]:
#one-hot encode the 'origin' column (dtype = int keeps 0/1 indicators across pandas versions)
auto3 = pd.get_dummies(data = auto3, columns = ['origin'], dtype = int)
print(auto3.shape)
print(auto3.head(5))

(392, 10)
    mpg  cylinders  displacement  horsepower  weight  acceleration  \
0  18.0          8         307.0       130.0  3504.0          12.0   
1  15.0          8         350.0       165.0  3693.0          11.5   
2  18.0          8         318.0       150.0  3436.0          11.0   
3  16.0          8         304.0       150.0  3433.0          12.0   
4  17.0          8         302.0       140.0  3449.0          10.5   

   model year  origin_1  origin_2  origin_3  
0          70         1         0         0  
1          70         1         0         0  
2          70         1         0         0  
3          70         1         0         0  
4          70         1         0         0  


4.	Separate the “mpg” column from other columns and view it as the label vector and others as the feature matrix. Split the data into a training set (80%) and testing set (20%) and print out their shapes. Print out the statistics of your training feature matrix.  **(5pt)**

In [None]:
from sklearn.model_selection import train_test_split

#create train test split
X = auto3.drop('mpg', axis = 'columns')
y = auto3['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0, train_size = .8)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

X_train.describe()

(313, 9)
(79, 9)
(313,)
(79,)


| | cylinders | displacement | horsepower | weight | acceleration | model year | origin_1 | origin_2 | origin_3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 |
| mean | 5.447284 | 192.78754 | 104.009585 | 2972.255591 | 15.560383 | 76.070288 | 0.623003 | 0.169329 | 0.207668 |
| std | 1.690263 | 103.201153 | 37.915348 | 841.134947 | 2.785476 | 3.660449 | 0.48541 | 0.375643 | 0.406287 |
| min | 3.0 | 68.0 | 46.0 | 1613.0 | 8.0 | 70.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4.0 | 105.0 | 78.0 | 2230.0 | 13.6 | 73.0 | 0.0 | 0.0 | 0.0 |
| 50% | 4.0 | 151.0 | 95.0 | 2815.0 | 15.5 | 76.0 | 1.0 | 0.0 | 0.0 |
| 75% | 8.0 | 260.0 | 120.0 | 3574.0 | 17.0 | 79.0 | 1.0 | 0.0 | 0.0 |
| max | 8.0 | 455.0 | 230.0 | 5140.0 | 24.8 | 82.0 | 1.0 | 1.0 | 1.0 |


5.	Normalize the feature columns in both the training and testing datasets so that each has zero mean and unit variance. Describe the statistics of your normalized feature matrix. **(5pt)**




In [None]:
#normalize data with StandardScaler: fit on the training set only, then reuse its statistics on the test set
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)


X_train = pd.DataFrame(X_train, columns = X.columns)
X_test = pd.DataFrame(X_test, columns = X.columns)

X_train.describe()

| | cylinders | displacement | horsepower | weight | acceleration | model year | origin_1 | origin_2 | origin_3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 | 313.0 |
| mean | 2.553868e-17 | 1.418815e-17 | -1.163429e-16 | -2.411986e-16 | 2.447457e-16 | -2.965324e-16 | 3.405157e-17 | 0.0 | -5.675261e-17 |
| std | 1.001601 | 1.001601 | 1.001601 | 1.001601 | 1.001601 | 1.001601 | 1.001601 | 1.001601 | 1.001601 |
| min | -1.450191 | -1.211104 | -1.532426 | -1.618566 | -2.718563 | -1.661 | -1.285512 | -0.451493 | -0.5119539 |
| 25% | -0.8576193 | -0.8520071 | -0.6870894 | -0.8838584 | -0.7049146 | -0.8401165 | -1.285512 | -0.451493 | -0.5119539 |
| 50% | -0.8576193 | -0.4055619 | -0.2380042 | -0.1872558 | -0.02171266 | -0.01923264 | 0.7778999 | -0.451493 | -0.5119539 |
| 75% | 1.512666 | 0.6523191 | 0.4224152 | 0.7165414 | 0.5176573 | 0.8016512 | 0.7778999 | -0.451493 | -0.5119539 |
| max | 1.512666 | 2.544859 | 3.328261 | 2.581293 | 3.322381 | 1.622535 | 0.7778999 | 2.214873 | 1.953301 |
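
The reported standard deviation is 1.001601 rather than exactly 1 because the two libraries use different estimators: `StandardScaler` divides by the population standard deviation (`ddof=0`), while `describe()` reports the sample standard deviation (`ddof=1`). With 313 rows the two differ by a factor of sqrt(313/312). The sketch below checks this on random data (the data itself is a placeholder; only the row count matters).

```python
import numpy as np

n = 313                                  # number of training rows
ratio = np.sqrt(n / (n - 1))             # sample std / population std
print(round(ratio, 6))                   # 1.001601

# Standardize a placeholder column the way StandardScaler does (ddof=0).
x = np.random.default_rng(0).normal(size=n)
z = (x - x.mean()) / x.std()             # np.std defaults to ddof=0

print(z.std())                           # population std
print(z.std(ddof=1))                     # sample std, as reported by describe()
```

So the normalization worked as intended; the small excess over 1 is purely a difference in convention.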


6.	Build a sequential neural network model in Keras with two densely connected hidden layers (32 neurons and ReLU activation function for each hidden layer), and an output layer that returns a single, continuous value. Print out the model summary. **(10pt)**

In [None]:
#build NN
from tensorflow.keras import models
from tensorflow.keras import layers

nnmod = models.Sequential()
nnmod.add(layers.Dense(32, activation = 'relu', input_shape = (9,)))
nnmod.add(layers.Dense(32, activation = 'relu'))
nnmod.add(layers.Dense(1))

nnmod.summary()  #summary() prints the table itself and returns None

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 32)                320       
                                                                 
 dense_1 (Dense)             (None, 32)                1056      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
Total params: 1,409
Trainable params: 1,409
Non-trainable params: 0
_________________________________________________________________
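
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic (the helper `dense_params` is a name introduced here for illustration):

```python
def dense_params(n_inputs, n_units):
    # One weight for every (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

layer_sizes = [9, 32, 32, 1]   # 9 input features, two hidden layers, one output
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))     # [320, 1056, 33] 1409
```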


7.	Define the appropriate loss function, optimizer, and metrics for this specific problem and compile the NN model. **(10pt)**

In [None]:
#compile model
nnmod.compile(optimizer = 'rmsprop',
              loss = 'mse',
              metrics = ['mae'])
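
For a regression target such as mpg, mean squared error (MSE) is the loss being minimized and mean absolute error (MAE) is a metric in the target's own units. The sketch below computes both by hand on three hypothetical predictions (the numbers are made up for illustration).

```python
import numpy as np

# Hypothetical true and predicted mpg values.
y_true = np.array([18.0, 15.0, 24.0])
y_pred = np.array([20.0, 14.0, 21.0])

errors = y_pred - y_true           # [ 2., -1., -3.]
mse = np.mean(errors ** 2)         # (4 + 1 + 9) / 3
mae = np.mean(np.abs(errors))      # (2 + 1 + 3) / 3

print(mse, mae)
```

MSE penalizes large errors more heavily because of the square, while MAE reads directly as "miles per gallon off, on average".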

8.	Put aside 20% of the normalized training data as the validation dataset and set verbose = 0 to suppress the training log. Train the NN model for 100 epochs with a batch size of 32 and plot the training and validation loss against the epoch number. **(10pt)**

In [None]:
#train model and keep the returned History object for plotting
history = nnmod.fit(X_train, y_train, batch_size = 32, epochs = 100, validation_split = 0.2, verbose = 0)

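The task also asks for a plot of training and validation loss per epoch. One possible sketch is shown below, assuming the `History` object returned by `fit` has been kept in a variable (called `history` here, a name chosen for this example). The helper `plot_loss` is hypothetical; it relies only on the `loss` and `val_loss` entries that Keras records when `validation_split` is set.

```python
import matplotlib.pyplot as plt

def plot_loss(history_dict):
    """Plot training and validation loss per epoch from a History.history dict."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history_dict["loss"], label="training loss")
    ax.plot(epochs, history_dict["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss (MSE)")
    ax.legend()
    return ax

# Usage, assuming `history` holds the object returned by nnmod.fit(...):
# plot_loss(history.history)
# plt.show()
```

A validation curve that flattens or rises while the training curve keeps falling would suggest the model has started to overfit.
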
9.	Use the trained NN model to make predictions on the normalized testing dataset and observe the prediction error. **(5pt)**

In [None]:
#evaluate model
metrics = nnmod.evaluate(X_test, y_test)
print(metrics)

[6.127064228057861, 1.8070756196975708]
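
The two numbers returned by `evaluate` are the loss (MSE) followed by the compiled metric (MAE). Since MSE is in squared units, taking its square root gives an error in mpg that is easier to interpret. A small sketch using the values printed above:

```python
import math

# Values copied from the evaluate() output above: [MSE, MAE].
test_mse, test_mae = 6.127064228057861, 1.8070756196975708

rmse = math.sqrt(test_mse)
print(f"RMSE: {rmse:.3f} mpg, MAE: {test_mae:.3f} mpg")
```

On the test set the predictions are off by about 1.8 mpg on average, with a root mean squared error of about 2.5 mpg.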
