Write an MNIST classifier that trains to 99% accuracy or above without running a fixed number of epochs; that is, training should stop as soon as it reaches that level of accuracy. In the lecture you saw how this was done for the loss, but here you will use accuracy instead.

Some notes:

Given the architecture of the net, it should succeed in fewer than 10 epochs.
When it reaches 99% or greater it should print out the string "Reached 99% accuracy so cancelling training!" and stop training.
If you add any additional variables, make sure you use the same names as the ones used in the class. This is important for the function signatures (the parameters and names) of the callbacks.

In [5]:
import os
import tensorflow as tf
from tensorflow import keras
# Load the data

# Get current working directory
current_dir = os.getcwd()

# Append data/mnist.npz to the previous path to get the full path.
# If the dataset is already cached at this path, TensorFlow uses it instead of
# downloading it; otherwise the file is downloaded to this location.
# load_data() returns (x_train, y_train), (x_test, y_test); assigning the
# second tuple to _ discards the test set.
data_path = os.path.join(current_dir, "data/mnist.npz")

# Discard test set
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data(path=data_path)
        
# Normalize pixel values
"""
Normalizing the input features is an important preprocessing step in ML. The main reasons to normalize are:
1. Uniform Scale:  Normalizing the data brings all feature values into a similar scale. This is crucial because features with larger scales (e.g., feature A ranges from 0 to 1000 and feature B ranges from 0 to 1) can dominate the learning process and prevent the model from effectively learning from other features.

2. Faster Convergence: Normalization helps the optimization algorithm (e.g., gradient descent) converge faster during training. When data is normalized, the optimization process is more efficient and stable, leading to quicker convergence to the optimal solution.

3. Better Model Performance: Normalization can lead to improved performance of the model. It can help prevent issues like vanishing or exploding gradients, which are common in deep neural networks with large input feature scales.

4. Regularization-like Effect: Data normalization is sometimes described as a mild form of regularization. Constraining the range of input values can make training more stable, which may help generalization, but it does not replace explicit regularization such as dropout or weight decay.

5. Handling Different Units: When dealing with features that have different units or magnitudes (e.g., weight in kilograms vs. height in centimeters), normalization ensures that the model treats each feature equally regardless of its original scale.

6. Compatibility with Activation Functions: Some activation functions (e.g., sigmoid, tanh) saturate for large-magnitude inputs, which makes their gradients very small. Normalizing input data to a specific range (e.g., between 0 and 1 or -1 and 1) keeps the inputs in the region where these functions respond well.

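Common Normalization Techniques:
1. Min-Max Scaling (Normalization): Scales the data to a fixed range, typically between 0 and 1 or -1 and 1. Dividing the MNIST pixel values by 255.0, as done below, is min-max scaling with fixed bounds of 0 and 255.
2. Standardization (Z-score Normalization): Centers each feature at a mean of zero and scales it to unit variance (a standard deviation of one).
3. Feature Scaling: The general term for rescaling features so they share a comparable range; both techniques above are forms of it.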

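import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative feature matrix (rows are samples, columns are features)
X = np.array([[0.0, 1.0], [500.0, 0.5], [1000.0, 0.0]])

# Min-Max Scaling: each column is rescaled to [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column gets zero mean and unit variance
X_standard = StandardScaler().fit_transform(X)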


"""
x_train = x_train / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

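As a small, dependency-free illustration of the two techniques named in the notes above, the sketch below uses plain Python rather than scikit-learn. The sample pixel values and the helper names `min_max_scale` and `standardize` are made up for this example. It shows that dividing by 255 is the same as min-max scaling with fixed bounds of 0 and 255.

```python
import math

def min_max_scale(values, lo, hi):
    """Map values from the range [lo, hi] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical pixel intensities in the 8-bit range used by MNIST images
pixels = [0, 51, 102, 255]

scaled = min_max_scale(pixels, lo=0, hi=255)   # min-max scaling to [0, 1]
divided = [p / 255.0 for p in pixels]          # what the notebook does
z_scores = standardize(pixels)                 # zero mean, unit variance

print(scaled == divided)
```

For image data with a known fixed range, the simple division is usually enough; standardization tends to matter more when features have different units or unbounded ranges.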

In [6]:
#Now take a look at the shape of the training data:
data_shape = x_train.shape

print(f"There are {data_shape[0]} examples with shape ({data_shape[1]}, {data_shape[2]})")

There are 60000 examples with shape (28, 28)


In [7]:
# GRADED CLASS: myCallback
### START CODE HERE

# Remember to inherit from the correct class
class myCallback(tf.keras.callbacks.Callback):
    # Define the correct function signature for on_epoch_end
    def on_epoch_end(self, epoch, logs=None):
        # Avoid a mutable default argument; fall back to an empty dict
        logs = logs or {}
        accuracy = logs.get('accuracy')
        if accuracy is not None and accuracy >= 0.99: # @KEEP
            print("\nReached 99% accuracy so cancelling training!")

            # Stop training once the above condition is met
            self.model.stop_training = True

### END CODE HERE
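
To see the mechanics without TensorFlow, here is a rough sketch of how a training loop can honour the `stop_training` flag. `StubModel`, `StubCallback`, and `run_epochs` are hypothetical stand-ins, not Keras classes, and the real `fit` loop does much more. The accuracies are the per-epoch values reported in this notebook's training log.

```python
class StubModel:
    """Hypothetical stand-in for a Keras model: it only carries the stop flag."""
    def __init__(self):
        self.stop_training = False

class StubCallback:
    """Same hook name and arguments as the graded callback, minus TensorFlow."""
    def __init__(self, model, threshold=0.99):
        self.model = model          # Keras attaches the model itself; here it is passed in
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        accuracy = logs.get("accuracy")
        # 99% or greater, as the task statement asks
        if accuracy is not None and accuracy >= self.threshold:
            self.model.stop_training = True

def run_epochs(accuracies, max_epochs=10):
    """Simplified loop: call the hook after each epoch and stop when flagged."""
    model = StubModel()
    callback = StubCallback(model)
    epochs_run = 0
    for epoch, accuracy in enumerate(accuracies[:max_epochs]):
        epochs_run += 1
        callback.on_epoch_end(epoch, logs={"accuracy": accuracy})
        if model.stop_training:
            break
    return epochs_run

# Per-epoch training accuracies from the run logged in this notebook
logged = [0.8988, 0.9752, 0.9841, 0.9888, 0.9915]
epochs_needed = run_epochs(logged)
print(f"Training stopped after {epochs_needed} epochs")
```

The key design point is that the callback never interrupts training directly. It only sets a flag, and the loop checks that flag at the end of each epoch.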

In [8]:
# GRADED FUNCTION: train_mnist
def train_mnist(x_train, y_train):

    ### START CODE HERE
    
    # Instantiate the callback class
    callbacks = myCallback()
    
    # Define the model, it should have 3 layers:
    # - A Flatten layer that receives inputs with the same shape as the images
    # - A Dense layer with 512 units and ReLU activation function
    # - A Dense layer with 10 units and softmax activation function
    model = tf.keras.models.Sequential([ 
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(512, activation=tf.nn.relu),
        keras.layers.Dense(10, activation=tf.nn.softmax)
    ]) 

    # Compile the model
    model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy']) 
    
    # Fit the model for 10 epochs adding the callbacks
    # and save the training history
    history = model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])

    ### END CODE HERE

    return history
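
For a sense of scale, the size of this network can be worked out by hand. The sketch below assumes the standard fully connected layout (one weight per input-output pair plus one bias per unit); `dense_params` is a helper name invented here.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                     # Flatten: 28x28 image -> 784 values, no parameters
hidden = dense_params(flattened, 512)   # Dense layer with 512 units
output = dense_params(512, 10)          # Dense layer with 10 units
total = hidden + output

print(f"Flatten output size: {flattened}")
print(f"Hidden layer parameters: {hidden}")
print(f"Output layer parameters: {output}")
print(f"Total trainable parameters: {total}")
```

Calling `model.summary()` on the built model should report the same total.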

In [9]:
hist = train_mnist(x_train, y_train)

Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.8988 - loss: 0.3401
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - accuracy: 0.9752 - loss: 0.0807
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9841 - loss: 0.0508
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - accuracy: 0.9888 - loss: 0.0341
Epoch 5/10
1869/1875 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9915 - loss: 0.0255
Reached 99% accuracy so cancelling training!
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - accuracy: 0.9915 - loss: 0.0255

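One way to read the loss column in the log above: sparse categorical cross-entropy is the mean of -log(p) over the examples, where p is the probability the model assigns to the correct digit. Under that definition, exp(-loss) is the geometric mean of those probabilities. The sketch below applies this to the logged values. The helper name is made up here, and the logged figures are running averages over each epoch, so treat the results as approximate.

```python
import math

def geometric_mean_confidence(mean_cross_entropy):
    """Return exp(-L) for a mean cross-entropy L = mean(-log p_correct)."""
    return math.exp(-mean_cross_entropy)

# Per-epoch training losses from the log above
logged_losses = [0.3401, 0.0807, 0.0508, 0.0341, 0.0255]

for epoch, loss in enumerate(logged_losses, start=1):
    confidence = geometric_mean_confidence(loss)
    print(f"Epoch {epoch}: loss {loss:.4f} -> "
          f"geometric-mean correct-class probability {confidence:.3f}")
```

This is why the loss keeps falling even after accuracy is already high: the model is not only picking the right digit more often but also assigning it a higher probability.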