# Small image classification project

### First, let's import TensorFlow and Keras and confirm versions

In [None]:
import tensorflow as tf
from tensorflow import keras
tf.__version__

In [None]:
keras.__version__

### Load and define the data set

In [None]:
# We will use Fashion MNIST, a variant of MNIST (a famous, frequently used training data set for images).
# Fashion MNIST includes 70,000 grayscale images of 28x28 px in 10 classes, each representing a fashion item.
# With Keras you can load MNIST, Fashion MNIST, and a housing data set out of the box.
fashion_mnist = keras.datasets.fashion_mnist

In [None]:
# data set is split for us into train and test
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

In [None]:
# every image is an array of pixel intensities, stored as integers (0-255)
X_train_full.shape

In [None]:
X_train_full.dtype

In [None]:
from matplotlib import pyplot

In [None]:
# What's in the test data? Let's look at some of the images in a plot.
# summarize the loaded dataset
print('X_Test: X=%s, y=%s' % (X_test.shape, y_test.shape))
# plot the first nine images in a 3x3 grid
for i in range(9):
    # define subplot (rows, columns, 1-based position)
    pyplot.subplot(3, 3, i + 1)
    # plot raw pixel data
    pyplot.imshow(X_test[i], cmap='gray')
# show the figure
pyplot.show()

In [None]:
# Next, hold out the first 5,000 training images as a validation set
# and scale the pixel values to the 0-1 range (divide by 255.0), which suits gradient descent training.
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
# Note: we do not scale the dependent variable (i.e., the class ids we are going to predict).
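
To see what the slicing and scaling do without loading the dataset, here is a minimal sketch on made-up data. The names `fake_images`, `fake_labels`, `n_valid`, and `scale` are illustrative assumptions, not part of the notebook, and plain lists stand in for the NumPy arrays.

```python
# Toy stand-in for X_train_full: 6 "images" of 2 pixels each, values 0-255.
# (Hypothetical data, used only to illustrate the split and the scaling.)
fake_images = [[0, 255], [51, 102], [255, 255], [0, 0], [204, 153], [25, 230]]
fake_labels = [9, 0, 3, 3, 1, 7]

n_valid = 2  # the notebook holds out 5000 images; 2 is enough here


def scale(images):
    """Divide every pixel by 255.0 so values fall in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in image] for image in images]


X_valid_toy = scale(fake_images[:n_valid])   # first n_valid images
X_train_toy = scale(fake_images[n_valid:])   # everything after them
y_valid_toy = fake_labels[:n_valid]          # labels are class ids, left unscaled
y_train_toy = fake_labels[n_valid:]

print(len(X_valid_toy), len(X_train_toy))
print(X_valid_toy)
```

The same slice boundary is used for images and labels, so each image stays paired with its label.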

### Define the classes of the data set

In [None]:
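# We know there are 10 classes, but the data set stores them only as integer ids (0-9).
# We need readable labels for validation, so let's add them.
# This is the standard Fashion MNIST list, which you can look up in the Keras documentation and on Kaggle.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [None]:
# Validate the classes using the images in the plot above.
# Index 0 (the first test image) is an arbitrary choice; try other indices too.
class_names[y_test[0]]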

### Create the model (NN) - Sequential API - in this case a single stack of layers, connected sequentially

In [None]:
model = keras.models.Sequential()  # create the model
model.add(keras.layers.Flatten(input_shape=[28, 28]))  # first layer: flattens each 28x28 image into a 1D array of 784 values (preprocessing only)
model.add(keras.layers.Dense(300, activation="relu"))  # add a dense hidden layer of 300 neurons
model.add(keras.layers.Dense(100, activation="relu"))  # add a second dense hidden layer of 100 neurons
model.add(keras.layers.Dense(10, activation="softmax"))  # add an output layer of 10 neurons (one per class)

# Note: dense layers often have a lot of parameters (the first dense layer alone has 784 * 300 + 300 = 235,500).
# This adds flexibility, but risks overfitting if the training data is not substantial.
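
The parameter count quoted above can be checked by hand. This is a small sketch of the arithmetic, assuming the layer widths of the model above; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths as in the model above: 28*28 flattened inputs -> 300 -> 100 -> 10.
layer_sizes = [28 * 28, 300, 100, 10]

# Pair each layer's input width with its output width.
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [235500, 30100, 1010]
print(sum(per_layer))  # 266610
```

In the notebook itself, `model.summary()` reports the per-layer counts, so the two can be compared.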

ReLU:

In order to use stochastic gradient descent with backpropagation of errors to train deep neural networks, an activation function is needed that looks and acts like a linear function, but is, in fact, a nonlinear function allowing complex relationships in the data to be learned.

The function must also provide more sensitivity to the activation sum input and avoid easy saturation.

The rectified linear activation function is a simple calculation that returns the input value directly, or 0.0 if the input is 0.0 or less. That is: if input > 0, return input; otherwise return 0.

The rectifier function mostly looks and acts like a linear activation function.

In general, a neural network is easier to optimize when its behavior is linear or close to linear.
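
The rule described above fits in a few lines of plain Python. This is only an illustration; `relu` is a hypothetical helper, not the Keras implementation.

```python
def relu(x):
    """Rectified linear unit: pass positive inputs through, clamp the rest to 0."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 0.5, 3.0]
print([relu(x) for x in inputs])  # [0.0, 0.0, 0.0, 0.5, 3.0]
```

For positive inputs the output equals the input (the linear part); the kink at zero is what makes the function nonlinear overall.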

Softmax:

Recall that logistic regression produces a decimal between 0 and 1.0. For example, a logistic regression output of 0.8 from an email classifier suggests an 80% chance of an email being spam and a 20% chance of it being not spam. Clearly, the sum of the probabilities of an email being either spam or not spam is 1.0.

Softmax extends this idea into a multi-class world. That is, Softmax assigns decimal probabilities to each class in a multi-class problem. Those decimal probabilities must add up to 1.0. This additional constraint helps training converge more quickly than it otherwise would.

Softmax is implemented through a neural network layer just before the output, and that layer must have one node per class. In the model above, softmax is the activation of the final dense layer, which has 10 neurons, one per class.
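
As a rough illustration of how softmax turns raw scores into probabilities that sum to 1.0, here is a plain-Python sketch. The `scores` values are made up, and `softmax` is a hypothetical helper rather than the Keras implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three classes
probs = softmax(scores)

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0, up to floating-point rounding
```

The largest score gets the largest probability, but every class keeps a non-zero share.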

### Compile the model - optionally specify metrics to compute during training and evaluation

In [None]:
model.compile(loss="sparse_categorical_crossentropy",  # labels are sparse integer class ids (0-9) and the classes are exclusive (e.g. trousers vs ankle boots)
              optimizer="sgd",  # use stochastic gradient descent to train the model
              metrics=["accuracy"])  # as it's a classifier, we are interested in accuracy
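
To make the loss less abstract, here is a minimal sketch of what sparse categorical cross-entropy computes for a single sample: the negative log of the probability the model gave to the true class. The `probs` values are made up, and the helper is a simplified stand-in that ignores the batching and numerical safeguards a real library applies.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: -log of the probability assigned to the true class.

    `label` is an integer class id (not a one-hot vector), hence "sparse".
    """
    return -math.log(probs[label])


# Hypothetical softmax output over 3 classes.
probs = [0.1, 0.7, 0.2]

loss_if_true_is_1 = sparse_categorical_crossentropy(1, probs)  # true class got 0.7
loss_if_true_is_0 = sparse_categorical_crossentropy(0, probs)  # true class got 0.1

print(loss_if_true_is_1)
print(loss_if_true_is_0)
```

The loss is small when the true class receives a high probability and grows quickly as that probability approaches zero.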


### Train and evaluate the model

In [None]:
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

In [None]:
# At each epoch we can see the number of instances processed and the mean training time per sample,
# plus the loss and accuracy, or any other metrics requested at the compile stage.

# Your computer is likely working hard at this point.
# Note that complex models need more compute than an average computer can supply.
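
Depending on the Keras version, the progress bar counts either samples or batches per epoch. A quick sketch of the batch arithmetic, assuming the 55,000 training images left after the validation split and the Keras default batch size of 32 (the `fit` call above does not set `batch_size`):

```python
import math

n_train = 60000 - 5000  # training images left after the validation split
batch_size = 32         # Keras default when fit() is called without batch_size

# The last batch may be smaller than batch_size, so round up.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 1719
```

With 30 epochs, that is 30 passes over the training data, each made of this many gradient updates.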

In [None]:
# We can also plot the learning curves from the training history.

import pandas as pd
from matplotlib import pyplot as plt
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]
plt.show()
# Training and validation accuracy slowly increase; training and validation loss steadily decrease.
# If you aren't satisfied with model performance, go back and tune hyperparameters, e.g. number of layers, neurons, activation functions, epochs, batch size.

In [None]:
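# Evaluate on the test set to see accuracy on unseen data (test images get the same 0-1 scaling as the training data).
model.evaluate(X_test / 255.0, y_test)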

### Model predictions
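
A minimal sketch of turning one row of predicted class probabilities into a readable label. In the notebook such a row would come from calling `model.predict` on scaled images. Here `probs` is a made-up example so the snippet runs on its own, and `class_names` is the standard Fashion MNIST label list.

```python
# Standard Fashion MNIST label names, indexed by class id 0-9.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical softmax output for one image (one probability per class).
probs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.03, 0.0, 0.01, 0.0, 0.96]

# The predicted class is the index with the highest probability (argmax).
predicted_id = max(range(len(probs)), key=lambda i: probs[i])

print(predicted_id, class_names[predicted_id])  # 9 Ankle boot
```

Comparing the predicted name with `class_names[y_test[i]]` for the same image is a quick sanity check on individual predictions.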