The dataset is MNIST, a standard benchmark for handwritten digit recognition. More here: http://yann.lecun.com/exdb/mnist/
It contains 70,000 grayscale images (60,000 for training and 10,000 for testing), each 28x28 pixels.
Goals:
- write an algorithm that recognizes which digit an image shows
- build a neural network (NN) with 2 hidden layers

#### Import packages and load the data:

In [34]:
import numpy as np 
import tensorflow as tf 
import tensorflow_datasets as tfds 

In [35]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# tfds.load loads a dataset from tensorflow_datasets
# tfds offers many datasets, so the argument name='mnist' selects the one we want; the first call also downloads the data to the computer
# as_supervised=True loads the dataset as 2-tuples (input, target)
# with_info=True also returns a second object with information about the dataset (version, features, number of examples)




#### Preprocess the data:

In [36]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
# extract the train and test datasets

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
# reserve 10% of the training set for validation
# tf.cast(x, dtype) converts a variable to the given data type; here it turns the float count into an integer

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

In [37]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# image = tf.cast(image, tf.float32) makes sure that all values are floats
# image /= 255. rescales the gray levels from 0..255 to 0..1; the trailing dot makes 255 a float
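
The scaling step can be checked on a tiny array without TensorFlow. This is a minimal NumPy sketch; the 2x2 `pixels` array is a made-up example, not MNIST data:

```python
import numpy as np

# Hypothetical 2x2 "image" with 8-bit grayscale values (0 = black, 255 = white).
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Convert to float first, then divide so every value lands in [0, 1].
scaled = pixels.astype(np.float32) / 255.

print(scaled)
```

Casting before dividing mirrors the order used in `scale`: the integer pixel values become floats first, and only then are they rescaled.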


In [38]:
scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

# dataset.map(function) applies a custom transformation to every element of the dataset
# it takes as input the function that defines the transformation

#### Shuffle and batch: 

Shuffling the data means keeping the same data but in a different order. We do this because the data may be stored in order of target, in which case a batch could contain only "0" or only "1" as its target.

In [39]:
BUFF_SIZE = 10000
# a big dataset may not fit in memory at once, so TF keeps a buffer of 10,000 samples, draws from it at random and refills it with the next samples

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFF_SIZE)

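validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
# create the validation dataset from the first num_validation_samples shuffled samples

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# the training dataset is everything that remains after skipping those samples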
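
The take/skip split can be sketched with plain index arrays. This is only an illustration in NumPy: it uses a full permutation, whereas `shuffle(BUFF_SIZE)` with a buffer smaller than the dataset shuffles only approximately, and the seed is an arbitrary assumption to make the sketch repeatable:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

num_examples = 60_000                # size of the MNIST training split
num_validation = num_examples // 10  # 10% held out for validation

# Shuffle the example indices, then split them the way take() and skip() do.
indices = rng.permutation(num_examples)
validation_idx = indices[:num_validation]  # analogous to .take(n)
train_idx = indices[num_validation:]       # analogous to .skip(n)

print(len(validation_idx), len(train_idx))  # 6000 54000
```

Because both parts come from the same shuffled order, no index ends up in both the validation and the training part.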


The batch size should be small compared to the dataset but large enough to keep training efficient. This value can be tuned to trade off accuracy against training speed.

In [40]:
BATCH_SIZE = 100 
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

# train_data.batch(BATCH_SIZE) combines consecutive elements of the dataset into batches
# validation and test data are each put into one single batch: no gradient updates happen on them, but the model expects batched input, so batching adds the batch dimension to the tensors
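
The number of steps per epoch follows directly from the batch size. A small arithmetic sketch, assuming the 54,000 training samples left after holding out 10% of the 60,000 training images:

```python
import math

num_train = 54_000  # 60,000 training images minus 6,000 held out for validation
batch_size = 100    # same value as BATCH_SIZE above

# One batch is processed per step, so the step count is a ceiling division.
steps_per_epoch = math.ceil(num_train / batch_size)

# The validation set is wrapped in one single batch, so one step covers all of it.
validation_steps = math.ceil(6_000 / 6_000)

print(steps_per_epoch, validation_steps)  # 540 1
```

The 540 steps match the `540/540` progress counter that Keras prints for each epoch during training.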



The validation data must have the same shape and properties as the train and test data. Since it is one single batch, we can extract its inputs and targets directly:

In [41]:
validation_inputs, validation_targets = next(iter(validation_data))
# iter() creates an object that can be iterated over one element at a time
# next() loads the next element of that object (here the single validation batch)

#### Model: 

- 784 inputs = 28 pixels x 28 pixels 
- 10 outputs (0,1,2,3,4,5,6,7,8 or 9) 
- 2 hidden layers (50 units each)


In [42]:
input_size = 784
output_size = 10 
hidden_layer_size = 50 

In [43]:
model = tf.keras.Sequential([
                            tf.keras.layers.Flatten(input_shape=(28,28,1)),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(output_size, activation='softmax')
                            ])

# tf.keras.Sequential() lays down the model (used to 'stack layers')
# tf.keras.layers.Flatten(input_shape=(28,28,1)) flattens each 28x28x1 input into a vector of 784 values
# tf.keras.layers.Dense(hidden_layer_size, activation='relu') appears twice because we have two hidden layers
# 'relu' and 'softmax' are activation functions, more: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
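
To see what the stacked layers compute, here is a rough NumPy sketch of the same 784-50-50-10 architecture. It is not how Keras implements it: the weights are random and untrained, and the names `layer_sizes`, `forward` and the seed are assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for repeatability
layer_sizes = [784, 50, 50, 10]      # input, two hidden layers, output

# Randomly initialised weights and zero biases, for illustration only (untrained).
weights = [rng.normal(0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(image):
    """Flatten a 28x28x1 image and push it through the three dense layers."""
    x = image.reshape(-1)                   # Flatten: 28*28*1 -> 784
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)      # hidden layers use ReLU
    logits = x @ weights[-1] + biases[-1]
    exp = np.exp(logits - logits.max())     # softmax, shifted for stability
    return exp / exp.sum()

probs = forward(rng.random((28, 28, 1)))
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)  # (10,) 42310
```

The softmax output is a vector of 10 probabilities that sum to 1, one per digit. The parameter count (39,250 + 2,550 + 510) shows that almost all of the weights sit in the first hidden layer.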

#### Loss function and optimizer: 

In [49]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.compile(optimizer, loss) configures the model for training
# adam = adaptive moment estimation, more: https://towardsdatascience.com/optimisation-algorithm-adaptive-moment-estimation-adam-92144d75e232

Loss options:

- binary_crossentropy - for binary targets
- categorical_crossentropy - expects targets that are already one-hot encoded
- sparse_categorical_crossentropy - expects integer targets (as in MNIST) and treats them as if they were one-hot encoded
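
The relation between the two categorical losses can be checked by hand. This is a minimal NumPy sketch with made-up probabilities; it leaves out details of the Keras implementation such as clipping and averaging over the batch:

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples over 3 classes (rows sum to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
int_labels = np.array([0, 2])    # integer class labels, as MNIST provides
one_hot = np.eye(3)[int_labels]  # the same labels, one-hot encoded

# Categorical cross-entropy: needs one-hot targets.
cce = -np.sum(one_hot * np.log(probs), axis=1)

# Sparse categorical cross-entropy: indexes the true class directly.
scce = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(np.allclose(cce, scce))  # True
```

Both versions give the same per-sample loss. The sparse one simply saves us from one-hot encoding the targets ourselves.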

#### Training: 

In [50]:
number_of_epochs = 5
# an arbitrary value; more epochs mean longer training and a higher risk of overfitting

In [51]:
VALIDATION_STEPS = num_validation_samples // BATCH_SIZE

In [52]:
model.fit(train_data, epochs = number_of_epochs, validation_data=(validation_inputs, validation_targets), validation_steps = VALIDATION_STEPS, verbose = 2)

Epoch 1/5
540/540 - 92s - loss: 0.0563 - accuracy: 0.9826 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/5
540/540 - 88s - loss: 0.0515 - accuracy: 0.9846 - val_loss: 0.0624 - val_accuracy: 0.9808
Epoch 3/5
540/540 - 73s - loss: 0.0453 - accuracy: 0.9859 - val_loss: 0.0606 - val_accuracy: 0.9807
Epoch 4/5
540/540 - 56s - loss: 0.0426 - accuracy: 0.9868 - val_loss: 0.0598 - val_accuracy: 0.9818
Epoch 5/5
540/540 - 57s - loss: 0.0398 - accuracy: 0.9878 - val_loss: 0.0599 - val_accuracy: 0.9818


<tensorflow.python.keras.callbacks.History at 0x65a3b6810>

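Values:
- loss: the loss on the training data, which should get smaller with every epoch
- val_loss: the loss on the validation data; if it starts rising from one epoch to the next while loss keeps falling, the model is overfitting
- accuracy: the share of training cases where the output matched the target
- val_accuracy: the accuracy on the validation data, a better estimate of real performance than the training accuracy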
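
The accuracy metric itself is easy to reproduce. A small NumPy sketch, where the `outputs` and `targets` arrays are invented for illustration:

```python
import numpy as np

# Hypothetical model outputs: one row of class probabilities per sample.
outputs = np.array([[0.1, 0.8, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.2, 0.2, 0.6],
                    [0.5, 0.4, 0.1]])
targets = np.array([1, 0, 2, 1])

predictions = outputs.argmax(axis=1)               # most probable class per sample
accuracy = float(np.mean(predictions == targets))  # fraction of correct predictions

print(accuracy)  # 0.75
```

The last sample is predicted as class 0 while its target is 1, so three of the four predictions are correct.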


We can still try to improve the model, for example by changing the number or the size of the hidden layers. The aim is a val_accuracy of more than 98%.

#### Test:

In [53]:
test_loss, test_accuracy = model.evaluate(test_data)



The test accuracy is the REAL accuracy of our model. It is usually a little lower than the validation accuracy, because the model was tuned against the validation data. A test accuracy very close to the validation accuracy shows that we have not overfit.

- After this step we should not change the model any more, because from this point on the test data is no longer data that the model has never seen.