## In-class exercise: implement "accuracy" - the fraction of images correctly labeled
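
As a starting point, here is a minimal sketch of the accuracy computation. It assumes the model returns one row of class scores per image; the function name `accuracy_from_scores` and the toy arrays are made up for illustration and are not part of the course code.

```python
import numpy as np

def accuracy_from_scores(scores, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    # Pick the class with the largest score for each row (image).
    predictions = np.argmax(scores, axis=1)
    # The mean of a boolean array is the fraction of True entries.
    return float(np.mean(predictions == labels))

# Toy example: 4 images, 3 classes (made-up scores).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
labels = np.array([1, 0, 1, 0])

toy_accuracy = accuracy_from_scores(scores, labels)
print(toy_accuracy)
```

Three of the four toy predictions match their labels, so the printed value is 0.75.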

## In-class exercise: split the training data into training & validation, and track validation loss during the training loop. 

Tip: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- How do we know when to stop training? For example, you might stop when the validation loss stops improving.
- Now that we have the basics, we can experiment with more complicated networks. Rather than implement these all by hand, we will move to using existing Python packages next week.
- There are variants of "universal approximation theorems" roughly stating that there exists a nonlinear neural network with one hidden layer (possibly very wide) that can fit an "arbitrary" nice/smooth function arbitrarily well. However, we can make the optimization easier with fancier layers than "fully connected," like convolutional layers, which we will learn about next week.
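
One common way to turn "stop when the validation loss stops improving" into a rule is early stopping with a patience counter. The sketch below is only an illustration: the function name, the `patience` and `min_delta` parameters, and the loss values are all assumptions and do not come from the course code.

```python
def early_stopping_step(val_losses, patience=3, min_delta=0.0):
    """Return the index at which training would stop, or None if it never would.

    Training stops once the validation loss has failed to improve on the best
    value seen so far by more than min_delta for `patience` checks in a row.
    """
    best = float("inf")
    waited = 0
    for step, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best validation loss: remember it and reset the counter.
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return step
    return None

# Made-up validation losses, one per epoch.
val_losses = [2.30, 1.90, 1.60, 1.55, 1.56, 1.58, 1.57, 1.40]
stop_at = early_stopping_step(val_losses, patience=3)
print(stop_at)
```

In this toy history the rule stops at index 6 and never sees the later improvement to 1.40, which shows the trade-off: a small `patience` saves time but can stop too early when the validation loss is noisy.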

## Homework: improve the accuracy of this model. 

Update this notebook so that the accuracy is improved. How high can you get it? You could change things directly in the notebook, such as increasing the number of epochs, changing the learning rate, changing the width of the hidden layer, etc. If you're more ambitious, you could also try changing the model definition itself by checking out the associated Python files. For example, you could add more layers to the network. The current notebook has a training accuracy of about 43%, but this will vary with randomness.
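
Before changing the learning-rate settings, it can help to look at the schedule they produce. The sketch below tabulates an exponential decay of the form `initial_rate * (1 - decay_rate) ** step`, the same formula as `calc_decay` in the code cell below. The helper name `decay_schedule` is hypothetical. The 25 steps assume 5 epochs of 5 batches each, which is what a batch size of 10,000 gives on the 54,000 training images left after a 10% validation split.

```python
def decay_schedule(initial_rate, decay_rate, num_steps):
    """Learning rate at each step: initial_rate * (1 - decay_rate) ** step."""
    return [initial_rate * (1.0 - decay_rate) ** step for step in range(num_steps)]

# Values taken from the notebook's hyperparameters; 25 steps is an assumption.
schedule = decay_schedule(initial_rate=0.7, decay_rate=0.005, num_steps=25)

print("first: %.4f, last: %.4f" % (schedule[0], schedule[-1]))
```

With so few updates the rate barely moves under this formula, so in a run this short the initial learning rate probably matters more than the decay rate.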

In [None]:
# !git clone https://github.com/AlexLeeSmith/ai-science-training-series.git
# %cd ai-science-training-series/02_neural_networks_python/

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from numpy import power, argmax, float32, prod, int32, arange, zeros, random
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from fc_net import TwoLayerNet

def calc_decay(t, y0, decayRate: float):
    """
    Calculates the value at a specified time point of an exponential decay function.

    Side Effects: None
    """
    return y0 * power(1 - decayRate, t)

def sgd(model, gradients, learningRate: float):
    """
    Performs one iteration of stochastic gradient descent (SGD) for each 
    parameter in the model.

    Side Effects:
    - The model's weights will be changed.
    """
    # Iterate over each parameter in the model.
    for key, weights in model.params.items():
        # Get the gradient of the loss with respect to the current parameter.
        dw = gradients[key]

        # Step the current weights against the gradient, scaled by the learning rate.
        model.params[key] = weights - learningRate * dw

    return model

def calc_accuracy(model, x, trueValues):
    """
    Calculates the fraction of inputs in x whose predicted class matches trueValues.

    Side Effects: None
    """
    # Get the score of each output class for each input in x.
    scores = model.loss(x) # [x.shape[0], model.num_classes]

    # Select the output class with the highest score for each x input
    predictions = argmax(scores, axis=1)

    # Calculate the fraction of correct predictions (a value between 0 and 1).
    accuracy = (trueValues == predictions).sum() / predictions.shape[0]
    
    return accuracy


# ===================================================================== #


learningRate = 0.7      # Initial rate at which the model learns from each batch.
decayRate = 0.005       # Rate at which the learning rate decays.
numHiddenDim = 300      # Number of nodes in the hidden layer.
modelWeightScale = 0.1  # Standard deviation for random initialization of the model weights.
numEpochs = 5           # Number of times the entire dataset will be iterated over.
# delta = 2e-3
batchSizeTrain = 10000  # Size of each training data batch.

# Load and split the MNIST data set.
(xTrain, yTrain), (xTest, yTest) = mnist.load_data()

# Do some pre-processing on the images.
## Convert pixel values from integer to float32.
xTrain = xTrain.astype(float32)
xTest  = xTest.astype(float32)

## Normalize the pixels to be within 0 and 1.
xTrain /= 255.
xTest  /= 255.

## Flatten each image to a vector.
xTrain = xTrain.reshape(xTrain.shape[0], prod(xTrain[0,:,:].shape)) # [60000, 28 * 28]
xTest = xTest.reshape(xTest.shape[0], prod(xTest[0,:,:].shape))

## Set the correct classifications as integers.
yTrain = yTrain.astype(int32) # [60000,] => (0, 1, ..., or 9)
yTest  = yTest.astype(int32)

# Split the training data into training and validation.
xTrain, xValid, yTrain, yValid = train_test_split(xTrain, yTrain, test_size=0.1)

# One-hot encoding.
nbClasses = 10
yTrainOnehot = to_categorical(yTrain, nbClasses)
yValidOnehot = to_categorical(yValid, nbClasses)
yTestOnehot = to_categorical(yTest, nbClasses)

# Calculate batches.
batchesPerEpoch = xTrain.shape[0] // batchSizeTrain
batchSizeValid = xValid.shape[0] // batchesPerEpoch

# Generate necessary arrays.
indicesTrain = arange(xTrain.shape[0])
lossesTrain = zeros(batchesPerEpoch * numEpochs,)
accTrain = zeros(batchesPerEpoch * numEpochs,)

indicesValid = arange(xValid.shape[0])
lossesValid = zeros(batchesPerEpoch * numEpochs,)
accValid = zeros(batchesPerEpoch * numEpochs,)

# Generate the model as a two-layer network.
model = TwoLayerNet(input_dim=xTrain.shape[1], hidden_dim=numHiddenDim, num_classes=nbClasses, weight_scale=modelWeightScale)

# Iterate over the training data to update the model's weights.
batchCount = 0
for curEpoch in range(0, numEpochs):
    for curBatch in range(0, batchesPerEpoch):
        # Get the next batch of data (training and validation).
        offsetTrain = curBatch * batchSizeTrain
        batchRangeTrain = range(offsetTrain, offsetTrain + batchSizeTrain)
        xTrainBatch = xTrain[batchRangeTrain, :]
        yTrainBatch = yTrain[batchRangeTrain]
        yTrainOnehotBatch = yTrainOnehot[batchRangeTrain,:]

        offsetValid = curBatch * batchSizeValid
        batchRangeValid = range(offsetValid, offsetValid + batchSizeValid)        
        xValidBatch = xValid[batchRangeValid, :]
        yValidBatch = yValid[batchRangeValid]
        yValidOnehotBatch = yValidOnehot[batchRangeValid,:]
        
        # Calculate the losses.
        lossTrain, gradientsTrain = model.loss(xTrainBatch, yTrainOnehotBatch)
        lossValid, gradientsValid = model.loss(xValidBatch, yValidOnehotBatch)

        # Update the model's weights.
        model = sgd(model, gradientsTrain, learningRate)

        # Decay the learning rate by one step per batch, so that after t batches
        # it equals the initial rate times (1 - decayRate)^t.
        learningRate = calc_decay(1, learningRate, decayRate)

        # Save the losses and accuracies.
        lossesTrain[batchCount] = lossTrain
        accTrain[batchCount] = calc_accuracy(model, xTrainBatch, yTrainBatch)

        lossesValid[batchCount] = lossValid
        accValid[batchCount] = calc_accuracy(model, xValidBatch, yValidBatch)

        # Increment the batch count.
        batchCount += 1
    
    # Shuffle the data so that we get a new set of batches.
    random.shuffle(indicesTrain)
    xTrain = xTrain[indicesTrain,:]
    yTrain = yTrain[indicesTrain]
    yTrainOnehot = yTrainOnehot[indicesTrain,:]

    random.shuffle(indicesValid)    
    xValid = xValid[indicesValid,:]    
    yValid = yValid[indicesValid]    
    yValidOnehot = yValidOnehot[indicesValid,:]
  
# Output the results.
print("Loss: %.4f\nAccuracy: %.4f" % (lossesTrain[batchCount - 1], accTrain[batchCount - 1]))
fig, ax = pyplot.subplots(1, 2, figsize=(18, 6), dpi=80) # Create a 1 by 2 plot grid.

## Plot the loss history.
ax[0].set_xlabel('Batch #')
ax[0].set_ylabel('Loss')
ax[0].plot(lossesTrain, label='Training')
ax[0].plot(lossesValid, label='Validation')
ax[0].legend()

## Plot the accuracy history.
ax[1].set_xlabel('Batch #')
ax[1].set_ylabel('Accuracy')
ax[1].plot(accTrain, label='Training')
ax[1].plot(accValid, label='Validation')
ax[1].legend()
pyplot.show()
