<a href="https://colab.research.google.com/github/KaosElegent/tensorflow-fun/blob/main/tfTutorial1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.15.0


Refer to the following link for terminology reference: https://developers.google.com/machine-learning/glossary

Functions such as load_data() and train_test_split() split the data into 4 parts: x_train, y_train, x_test and y_test.

X stands for the independent variables (x1, x2, x3, ...).
Y stands for the dependent variable (e.g. the class we are predicting).
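
To make the four parts concrete, here is a minimal sketch of a split done by hand. `split_dataset` and the toy data are hypothetical names made up for illustration; real helpers such as train_test_split() usually shuffle the rows first, which this sketch skips.

```python
def split_dataset(features, labels, test_fraction=0.25):
    """Hypothetical helper: split parallel lists into train and test parts."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n_test = int(len(features) * test_fraction)
    n_train = len(features) - n_test
    # No shuffling here (an assumption to keep the sketch deterministic).
    x_train, x_test = features[:n_train], features[n_train:]
    y_train, y_test = labels[:n_train], labels[n_train:]
    return (x_train, y_train), (x_test, y_test)


# Toy data: each row of X holds the independent variables (x1, x2),
# and Y holds the class we want to predict for that row.
X = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3]]
Y = [1, 1, 0, 0]

(x_train, y_train), (x_test, y_test) = split_dataset(X, Y)
print(len(x_train), len(x_test))
```

The return shape mirrors the `(x_train, y_train), (x_test, y_test)` unpacking used with load_data().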

In [3]:
mnist = tf.keras.datasets.mnist

# Loading MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Converting the 0-255 int values to 0-1 float
x_train, x_test = x_train / 255.0, x_test / 255.0

1) Why Sequential?
- Sequential models are useful for stacking layers where each layer has 1 input tensor and 1 output tensor.

2) What does a layer do?
- A layer is basically a function with a known mathematical structure. This structure is trainable and can be reused.

3) What layers are used in this image classification model?
- This model uses 'Flatten', 'Dense' and 'Dropout' layers.

4) What does this stack of layers return?
- The model returns a 'vector of logits or log-odds scores'.
More info can be found here: https://developers.google.com/machine-learning/glossary#logits
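
As a rough picture of what the stack computes, the sketch below pushes a tiny 2x2 "image" through plain-Python stand-ins for Flatten, Dense and Dropout. The function names, the image and all the weights are made up for illustration; they are not the Keras implementations, and a real model learns its weights during training.

```python
import random


def flatten(image):
    """Flatten: turn a 2-D grid of pixels into one flat list."""
    return [pixel for row in image for pixel in row]


def dense(inputs, weights, biases, activation=None):
    """Dense: one weighted sum plus bias per output unit."""
    outputs = [sum(w * x for w, x in zip(unit_weights, inputs)) + b
               for unit_weights, b in zip(weights, biases)]
    if activation == "relu":
        outputs = [max(0.0, value) for value in outputs]
    return outputs


def dropout(inputs, rate, training, rng):
    """Dropout: randomly zero values while training, pass through otherwise."""
    if not training:
        return list(inputs)
    keep_scale = 1.0 / (1.0 - rate)  # kept values are scaled up
    return [0.0 if rng.random() < rate else value * keep_scale
            for value in inputs]


# Made-up input and weights (assumptions, chosen only to keep the math small).
image = [[0.0, 0.5], [1.0, 0.25]]
hidden_weights = [[1.0, 1.0, 0.5, 0.0], [-1.0, 0.0, -1.0, 2.0]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]
output_biases = [0.0, 0.0, 0.2]

x = flatten(image)
hidden = dense(x, hidden_weights, hidden_biases, activation="relu")
hidden = dropout(hidden, rate=0.2, training=False, rng=random.Random(0))
logits = dense(hidden, output_weights, output_biases)  # no activation: raw logits
print(logits)
```

The last layer has no activation, which is why the stack returns logits rather than probabilities.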

In [4]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

In [12]:
# Test code
sample_predictions = model(x_train[:1])
sample_predictions

<tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[ 0.53169596, -0.92048705, -0.4032669 ,  0.53544474,  0.39081994,
        -0.6868503 ,  0.757062  , -0.45413458, -0.5543986 , -0.8083941 ]],
      dtype=float32)>

1) What are the numbers above?
- They are called logits.

2) What are logits?
- Logits are the vector of raw (non-normalized) predictions that a classification model generates. Normally these are then passed to a normalization function.

3) Are logits always fed to a normalization function?
- Not always. If the model is solving a multi-class classification problem, logits typically become an input to the 'softmax' function. This function generates a vector of normalized probabilities with 1 value for each possible class.

**Note: It is possible to incorporate the softmax function directly into the activation function of the neurons in the last layer of the network. While this can make the model's output easier to interpret directly (without doing the following step), it is discouraged. This is because it is impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.**
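
One way to see the stability issue is to compute the same loss two ways on an extreme, made-up pair of logits. This is a plain-Python sketch with hypothetical helper names, not what TensorFlow does internally; it only shows that once softmax has rounded a probability down to 0, the loss can no longer be recovered, while working from the logits keeps it finite.

```python
import math


def stable_softmax(logits):
    """Softmax with the max subtracted first so exp() cannot overflow."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def loss_from_probabilities(probabilities, label):
    """Negative log probability, computed after softmax was applied."""
    p = probabilities[label]
    return math.inf if p == 0.0 else -math.log(p)


def loss_from_logits(logits, label):
    """Same quantity via log-sum-exp, working on the raw logits."""
    largest = max(logits)
    log_sum_exp = largest + math.log(
        sum(math.exp(value - largest) for value in logits))
    return log_sum_exp - logits[label]


# Made-up logits: the model is extremely confident in class 0,
# but the true class is 1.
extreme_logits = [1000.0, 0.0]
true_label = 1

loss_via_softmax = loss_from_probabilities(stable_softmax(extreme_logits), true_label)
loss_via_logits = loss_from_logits(extreme_logits, true_label)
print(loss_via_softmax)  # inf: the probability underflowed to 0
print(loss_via_logits)   # 1000.0
```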

In [13]:
# Test code
tf.nn.softmax(sample_predictions)

<tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[0.16611472, 0.03888061, 0.06521671, 0.16673861, 0.14428675,
        0.04911342, 0.2081054 , 0.06198225, 0.05606905, 0.0434925 ]],
      dtype=float32)>
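
The probabilities above can be reproduced by hand from the logits printed earlier. A minimal sketch, assuming the standard softmax formula exp(z_i) / sum_j exp(z_j); `softmax` here is a hand-written helper, not the TensorFlow function, and the results agree with the printed values only up to rounding.

```python
import math


def softmax(logits):
    """exp(z_i) / sum_j exp(z_j), with the max subtracted for safety."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Logits copied from the sample_predictions output.
sample_logits = [0.53169596, -0.92048705, -0.4032669, 0.53544474, 0.39081994,
                 -0.6868503, 0.757062, -0.45413458, -0.5543986, -0.8083941]

probabilities = softmax(sample_logits)
print([round(p, 4) for p in probabilities])
print(sum(probabilities))  # the probabilities sum to 1 (up to float rounding)
```

The largest logit (index 6) becomes the largest probability, since softmax preserves the ordering of its inputs.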

## Loss Function
Now that we have a model, we need to train it. To do so, we'll need to use a loss function. For this model, we'll use: losses.SparseCategoricalCrossentropy().

1) What are this loss function's inputs and outputs?
- The loss function takes a 'vector of ground truth values' and a 'vector of logits'. It then returns a 'scalar loss' for each example we provide.

2) What does this scalar loss represent mathematically?
- This loss is equal to the 'negative log probability' of the true class. If the model is sure of the correct class, the loss becomes 0.
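
The 'negative log probability' can be checked with a few lines of plain Python. `negative_log_probability` is a hypothetical helper, and the first two probability vectors are made up. The last line assumes the first training image has label 5, which is not stated in these notes but is consistent with the loss value the notebook reports for that example.

```python
import math


def negative_log_probability(probabilities, true_class):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[true_class])


# Made-up cases for a 10-class problem where the true class is 3.
uniform_guess = [0.1] * 10
confident_correct = [0.001] * 10
confident_correct[3] = 0.991

loss_uniform = negative_log_probability(uniform_guess, 3)
loss_confident = negative_log_probability(confident_correct, 3)
print(round(loss_uniform, 4))    # close to log(10)
print(round(loss_confident, 4))  # close to 0

# Probability the untrained model gave to class 5 in the softmax output
# (assumption: the true label of the first training image is 5).
loss_first_example = -math.log(0.04911342)
print(round(loss_first_example, 4))
```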

In [19]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# testing the loss function on the untrained model
loss_fn(y_train[0:1], model(x_train[:1])) # the 2nd parameter is just sample_predictions

<tf.Tensor: shape=(), dtype=float32, numpy=3.0136228>

Although we now have our data, model and loss function, we still need to configure and compile the model before we start training (last step).

In [20]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

## Training

In [21]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7ddbca9b3df0>

## Evaluate

In [22]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 1s - loss: 0.0726 - accuracy: 0.9781 - 746ms/epoch - 2ms/step


[0.07257018238306046, 0.9781000018119812]

## Probability Output

In [23]:
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[4.18566382e-09, 7.25947302e-09, 1.97881172e-06, 2.54628430e-05,
        9.27672609e-12, 2.29215942e-07, 9.29694554e-14, 9.99971986e-01,
        3.75240745e-08, 2.39933058e-07],
       [1.71039574e-07, 5.38491295e-05, 9.99937415e-01, 8.15597105e-06,
        2.18755767e-15, 3.20579034e-07, 1.87809999e-08, 8.62294351e-12,
        9.73333556e-08, 2.36361935e-14],
       [8.10277743e-06, 9.98400152e-01, 3.88404209e-04, 1.08080267e-05,
        8.70907661e-06, 4.14348915e-06, 1.85158606e-05, 8.72604491e-04,
        2.87413772e-04, 1.16955778e-06],
       [9.99435484e-01, 3.03244252e-09, 3.43543652e-04, 1.69965162e-07,
        2.49371897e-06, 1.18702355e-05, 9.62124323e-05, 9.37564691e-05,
        2.49177656e-08, 1.64682642e-05],
       [1.40902125e-06, 4.72845541e-09, 6.21301115e-06, 2.97799630e-07,
        9.98634279e-01, 1.19562992e-05, 1.47840012e-06, 3.38060963e-05,
        5.25195787e-07, 1.30996923e-03]], dtype=float32)>
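
To turn a row of probabilities into a predicted digit, take the index of the largest value. A minimal sketch using the first two rows of the output above; `predicted_class` is a hypothetical helper written for illustration.

```python
# First two rows copied from the probability_model output.
first_two_rows = [
    [4.18566382e-09, 7.25947302e-09, 1.97881172e-06, 2.54628430e-05,
     9.27672609e-12, 2.29215942e-07, 9.29694554e-14, 9.99971986e-01,
     3.75240745e-08, 2.39933058e-07],
    [1.71039574e-07, 5.38491295e-05, 9.99937415e-01, 8.15597105e-06,
     2.18755767e-15, 3.20579034e-07, 1.87809999e-08, 8.62294351e-12,
     9.73333556e-08, 2.36361935e-14],
]


def predicted_class(probabilities):
    """Index of the largest probability, i.e. the most likely class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


predictions = [predicted_class(row) for row in first_two_rows]
print(predictions)  # [7, 2]
```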