<a href="https://colab.research.google.com/github/blue-slushy9/py-tensorflow-tutorials/blob/main/keras_beginners.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import tensorflow as tf
print('TensorFlow version:', tf.__version__)

# MNIST (Modified National Institute of Standards and Technology) is a large
# database of handwritten digits that is commonly used to train and benchmark
# image-processing systems, particularly in machine learning. The line below
# gets a reference to the Keras module that provides access to the dataset;
mnist = tf.keras.datasets.mnist

# load_data() returns two tuples: (x_train, y_train) and (x_test, y_test).
# x_train and x_test are arrays containing the images; y_train and y_test are
# arrays containing the corresponding integer labels (0-9);
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the pixel values of the training and test images. MNIST pixels are
# integers from 0 to 255 representing intensity (0 for black, 255 for white),
# so dividing by 255.0 scales them to floating-point values in [0, 1]. Keeping
# the inputs on a small, consistent scale generally helps gradient-based
# training converge. The same preprocessing must be applied to the test data
# as to the training data so that evaluation is consistent and fair;
x_train, x_test = x_train / 255.0, x_test / 255.0
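
# A minimal sketch of what the division above does, using a small made-up
# array in place of a real image (demo_pixels is a hypothetical stand-in,
# not part of the tutorial's data):
import numpy as np

demo_pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # raw 0-255 intensities
demo_scaled = demo_pixels / 255.0  # true division promotes uint8 to float64
print(demo_scaled.dtype, demo_scaled.min(), demo_scaled.max())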

# Build a tf.keras.Sequential model:
# Sequential represents a linear stack of layers. It is a straightforward way
# to build neural networks in which each layer has exactly one input tensor
# and one output tensor;
# 1) Create a Sequential model object;
model = tf.keras.models.Sequential([
    # 2) Add layers to it one by one; these can be instances of the various
    # layers provided by Keras, e.g. Flatten, which converts each 28x28 image
    # into a one-dimensional array of 784 values; input_shape=(28, 28) gives
    # the dimensions of a single input image, which are those of MNIST images;
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # 3) Each layer added to the Sequential model gets stacked one after the
    # other, forming a linear stack of layers; Dense is a fully connected 
    # neural network layer, i.e. every neuron in the current layer is connected
    # to every neuron in the previous layer, and each connection has its own
    # weight parameter that will be learned during training; 128 is the number
    # of neurons; the activation function is ReLU (Rectified Linear Unit), i.e.
    # it returns the input directly if it's positive and returns zero otherwise;
    tf.keras.layers.Dense(128, activation='relu'),
    # Dropout is a regularization technique used to reduce overfitting in
    # neural networks. Overfitting occurs when a model memorizes the training
    # data instead of learning the underlying patterns, resulting in poor
    # generalization to unseen data. Dropout addresses this by randomly
    # setting a fraction of its input units to zero during training, which
    # forces the network to reduce its reliance on any individual neuron. A
    # rate of 0.2 means about 20% of the units are zeroed at each training
    # step; the remaining units are scaled up by 1 / (1 - 0.2) so the expected
    # sum is unchanged. Dropout is inactive during inference;
    tf.keras.layers.Dropout(0.2),
    # If no activation function is specified, the Dense layer applies a linear
    # (identity) activation by default, i.e. the output of each neuron is the
    # weighted sum of its inputs plus a bias, with no non-linearity applied.
    # The 10 outputs are the logits, one raw score per digit class;
    tf.keras.layers.Dense(10)
])
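
# A rough NumPy sketch of what the layers above compute in the forward pass,
# included only to make the shapes concrete. The weights here are random
# placeholders (Keras uses its own initializers), and all demo_* names are
# hypothetical helpers, not part of the Keras API:
import numpy as np

demo_rng = np.random.default_rng(0)
demo_batch = demo_rng.random((2, 28, 28))            # two fake 28x28 "images"

demo_flat = demo_batch.reshape(len(demo_batch), -1)  # Flatten: (2, 28, 28) -> (2, 784)

demo_w1 = demo_rng.normal(scale=0.05, size=(784, 128))
demo_b1 = np.zeros(128)
demo_hidden = np.maximum(0.0, demo_flat @ demo_w1 + demo_b1)  # Dense(128) + ReLU

# Dropout(0.2) in training mode: zero ~20% of units, scale the rest by 1/0.8.
demo_keep = demo_rng.random(demo_hidden.shape) >= 0.2
demo_dropped = demo_hidden * demo_keep / 0.8

demo_w2 = demo_rng.normal(scale=0.05, size=(128, 10))
demo_b2 = np.zeros(10)
demo_logits = demo_dropped @ demo_w2 + demo_b2       # Dense(10), no activation

# Trainable parameters: one weight per connection plus one bias per neuron.
demo_param_count = demo_w1.size + demo_b1.size + demo_w2.size + demo_b2.size
print(demo_flat.shape, demo_hidden.shape, demo_logits.shape, demo_param_count)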

# Runs the model, which has not been trained yet, on a single example from the
# training set. x_train[:1] is a slice containing only the first example; the
# slice keeps the batch dimension, so its shape is (1, 28, 28). Keras models
# are callable, so passing input data directly to the model instance returns
# its output: here, one row of 10 logits. Because the weights are still at
# their random initial values, these scores are not meaningful predictions.
# .numpy() converts the resulting TensorFlow tensor into a NumPy array;
predictions = model(x_train[:1]).numpy()
# Evaluates the variable so that its value is displayed. The interactive
# interpreter echoes any bare expression, while a Jupyter Notebook shows only
# the value of the last expression in a cell. The effect is similar to
# print(predictions), except that the repr() form of the value is shown;
predictions

# Converts the logits above into probabilities for each class. softmax is an
# activation function commonly used in classification problems: it
# exponentiates each raw score and divides by the sum of the exponentials, so
# every output lies between 0 and 1 and the outputs add up to 1. The result is
# converted to a NumPy array with .numpy();
tf.nn.softmax(predictions).numpy()
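
# For intuition, softmax can be written in a few lines of NumPy. This is an
# illustrative sketch, not TensorFlow's implementation; demo_softmax and
# demo_scores are hypothetical names with made-up values:
import numpy as np

def demo_softmax(scores):
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

demo_scores = np.array([[2.0, 1.0, 0.1]])
demo_probs = demo_softmax(demo_scores)
print(demo_probs, demo_probs.sum())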

# Define a loss function for training. It takes a vector of ground-truth labels
# and a vector of logits and returns a scalar loss for each example. This loss
# is the negative log probability that the model assigns to the true class, so
# it approaches zero when the model is confident in the correct class.
# tf.keras.losses.SparseCategoricalCrossentropy creates an instance of the
# sparse categorical cross-entropy loss class from TensorFlow's Keras API; it
# is commonly used for multi-class classification when the target labels are
# integers rather than one-hot vectors. from_logits=True tells the loss
# function that its inputs are unnormalized scores (logits) rather than
# probabilities, so it applies softmax internally before computing the
# cross-entropy;
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Calls the loss function defined above to compute the loss between the true
# label and the model's output. y_train[:1] is the ground-truth label for a
# single training example, and predictions holds the logits (not predicted
# labels) for the same example. Because the untrained model assigns roughly
# equal probability to each of the 10 classes, the result should be close to
# -ln(1/10), about 2.3;
loss_fn(y_train[:1], predictions).numpy()
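
# As a sanity check on the formula, the same loss can be sketched in NumPy as
# the negative log of the softmax probability at the true label. This is an
# illustration with made-up logits, not Keras's implementation;
# demo_sparse_cce is a hypothetical helper:
import numpy as np

def demo_sparse_cce(labels, logits):
    # Log-softmax computed stably, then pick out the entry for each true label.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

# All-equal logits give every class probability 1/10, so the loss is ln(10).
demo_uniform_loss = demo_sparse_cce(np.array([3]), np.zeros((1, 10)))
print(demo_uniform_loss, np.log(10.0))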

# Configure and compile the model for training. optimizer='adam' selects the
# optimizer used during training; Adam is a popular choice for neural networks
# because it adapts the learning rate for each parameter and is efficient.
# loss=loss_fn is the loss function defined above. metrics=['accuracy']
# specifies the metrics to report during training and evaluation; here,
# accuracy is the fraction of examples whose predicted class matches the
# label. Additional metrics can be added to the list as needed;
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
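
# A sketch of what the 'accuracy' metric measures for integer labels: the
# fraction of examples whose highest-scoring class matches the label. The
# arrays below are made-up values for illustration only:
import numpy as np

demo_logits_batch = np.array([[0.1, 2.0, -1.0],
                              [1.5, 0.3, 0.2],
                              [0.0, 0.1, 3.0],
                              [2.2, 0.4, 0.9]])
demo_labels_batch = np.array([1, 0, 2, 1])  # the last example is misclassified

demo_predicted = demo_logits_batch.argmax(axis=1)
demo_accuracy = (demo_predicted == demo_labels_batch).mean()
print(demo_predicted, demo_accuracy)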

# 4) Once the model is constructed, you compile it by specifying the loss
# function, optimizer, and any evaluation metrics; then you can train the model
# on your data using the fit() method;

# Adjust the model parameters to minimize the loss. The model defined earlier
# consists of an input (Flatten) layer, a hidden layer with dropout, and an
# output layer, and has been compiled with an optimizer, a loss function, and
# a metric. fit() is the Keras method for training a model on a dataset; it
# iterates over the training data for a given number of epochs. As before,
# x_train is the input training data and y_train holds the corresponding
# labels. An epoch is one complete pass through the entire training dataset,
# and epochs=5 requests five such passes. During training, the optimizer
# adjusts the model's weights and biases to reduce the loss on the training
# data. After training completes, the model will have learned patterns in the
# training data that enable it to make predictions on new, unseen data;
model.fit(x_train, y_train, epochs=5)
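
# A quick arithmetic sketch of how many weight updates the call above performs.
# It assumes fit() is left at its default batch_size of 32 and that the MNIST
# training set has 60,000 images:
import math

demo_num_examples = 60_000
demo_batch_size = 32     # Keras default when batch_size is not passed to fit()
demo_epochs = 5

demo_steps_per_epoch = math.ceil(demo_num_examples / demo_batch_size)
demo_total_updates = demo_steps_per_epoch * demo_epochs
print(demo_steps_per_epoch, demo_total_updates)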

# Checks the model's performance on data it was not trained on, here the test
# set. The Keras evaluate() method computes the loss and any metrics specified
# in compile() on the test data and returns them; for this model that is a
# list of the form [loss, accuracy]. verbose controls how much is printed
# during evaluation: 0 is silent, 1 shows a progress bar, and 2 prints a
# single summary line without a progress bar;
model.evaluate(x_test, y_test, verbose=2)

# 5) After training, you can use the model to make predictions on new data
# using the predict() method; the difference between evaluate() and predict()
# is that evaluate is used for assessing the model's performance on a dataset
# by computing loss and metrics, while predict() is used for obtaining
# predictions from the model for a given input dataset;
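
# A sketch of turning per-class scores into predicted digits with argmax. The
# demo_outputs array is made up and stands in for the rows a model would
# return; the same call works on logits or probabilities, because softmax
# preserves the ordering of the scores:
import numpy as np

demo_outputs = np.array([[0.05, 0.90, 0.05],
                         [0.70, 0.20, 0.10]])
demo_classes = demo_outputs.argmax(axis=1)  # index of the largest score per row
print(demo_classes)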

# If you want the model to return probabilities, you can wrap the trained
# model and attach a softmax to it. The code below defines a new Keras model
# called probability_model made of two stacked parts: model (defined and
# trained above) and a Softmax layer, which converts the raw outputs (logits)
# into a probability distribution over the classes, so each output value is
# the predicted probability that the input belongs to the corresponding class.
# This wrapper is useful when inference on new data needs class probabilities
# rather than raw scores;
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
# i.e. print(probability_model(x_test[:5]))
probability_model(x_test[:5])



TensorFlow version: 2.15.0
