## TensorFlow

TensorFlow is an open-source platform developed by Google for machine learning and deep learning tasks. It allows developers to build and train models using both high-level APIs (like Keras) and low-level operations.

🧠 Why TensorFlow for AI Research?

- Scalable across CPUs, GPUs, and TPUs
- Integrated Keras API for quick prototyping
- Production-ready (used by Google internally)
- Tooling such as TensorBoard for visualization, plus TensorFlow Lite and TensorFlow.js for deploying to mobile/edge devices and the browser

In [None]:
%pip install tensorflow tensorflow-datasets tensorboard
%pip install keras
%pip install keras-nlp

Verify Metal GPU support on macOS (Mac GPUs need the `tensorflow-metal` plugin: `pip install tensorflow-metal`)

In [None]:
import tensorflow as tf

# Check if TensorFlow detects the GPU
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("GPU Devices:", tf.config.list_physical_devices('GPU'))

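# Log the device each operation runs on (with tensorflow-metal, the Mac GPU appears as GPU:0)
tf.debugging.set_log_device_placement(True)

# Pick the GPU if TensorFlow sees one, otherwise fall back to the CPU.
# This only selects a device name; ops are pinned to it inside `with tf.device(device):`
device = "/GPU:0" if tf.config.list_physical_devices('GPU') else "/CPU:0"
print("Using device:", device)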
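The cell above chooses a device name but never applies it. The following sketch pins work to that device with `tf.device`. It uses the same GPU-or-CPU fallback, and the matrix values are arbitrary:

```python
import tensorflow as tf

# Same fallback as above: first GPU if TensorFlow sees one, else the CPU
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(device):  # ops created inside this block are placed on `device`
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)  # 2x2 matrix product a @ a

print("Requested:", device)
print("Computed on:", b.device)  # full device name, e.g. .../device:CPU:0
print(b.numpy().tolist())        # [[7.0, 10.0], [15.0, 22.0]]
```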

In [None]:
# To install TensorFlow: pip install tensorflow
import tensorflow as tf

""" 
Tensors
A tensor is a multi-dimensional array used for computation.
Constants are immutable tensors: their values cannot be changed after they are created.
"""
x = tf.constant([[1,2], [3,4]]) # nested lists set the shape: 2 inner lists (rows) of 2 values (columns), so this is a 2D tensor
print(x) # prints the values plus shape=(2, 2) (2 rows, 2 columns) and dtype=int32 (32-bit integers)

""" 
Computational Graph
A computational graph represents the operations and the data flow in a TensorFlow program.
In TensorFlow's graphs the nodes are operations and the edges are the tensors flowing between them.
For example, say you want to add two tensors A = [[1,2],[3,4]] and B = [[5,6],[7,8]].
The graph has an addition node (+) that receives A and B on its input edges and sends the result C = [[6,8],[10,12]] along its output edge.
The graph will look like this:
A ----> + ----> C
       ^
       |
       B
where A and B are the input tensors, + is the addition operation, and C is the resulting tensor after the addition.

- real life graph 


"""

# Eager execution: in TF2 this addition runs immediately, without building a graph by hand
x = tf.constant([[3, 4]])
y = tf.constant([[5, 6]])
print(x+y) # [[ 8 10]] since 3+5 = 8 and 4+6 = 10
# Unlike TF1, TF2 does not require you to create a session to run a graph.
# (In TF1, a session was the object that executed a graph and returned its results.)
# In TF2, operations run eagerly, i.e. immediately, by default.
# The @tf.function decorator traces a Python function into a static graph and runs that graph
# (no session needed); this can be faster because TensorFlow can optimize the whole graph.
# EX: use the decorator to turn a function into a graph.
@tf.function
def multiply_tensors(a, b):
    return a * b # element-wise multiplication, also a graph operation
print(multiply_tensors(x, y)) # [[15 24]] since 3*5 = 15 and 4*6 = 24

""" 
Building a Model
A model is a collection of layers connected together to form a neural network.
Each layer is a function that takes an input tensor and produces an output tensor.
The input tensor can be, for example, an image's grayscale pixel values, and the output tensor holds values that represent the image's features.
This output is passed to the next layer, and so on, until the final output tensor, the model's prediction, is produced.

- We use Keras to build the model; Keras is a high-level API for building and training deep learning models.
- From Keras we get the Sequential model, a linear stack of layers built by adding layers one by one.
- The output of one layer is the input to the next. Sequential models are easy to use and understand, and they fit most simple feed-forward networks.
- The end result is a model that can be trained on data and used to make predictions.

- We also use Dense layers, which are fully connected: every neuron in the layer is connected to every neuron in the previous layer.
- Not every neural network is built this way, but it is a common way to build a model.
- A Dense layer computes output = activation(input @ kernel + bias): a linear transformation (weight matrix "kernel" plus bias vector) followed by an optional activation function.

- In short: Sequential is a linear stack of layers and Dense layers are fully connected; combining them gives a stack of fully connected layers.
"""
# Keras ships with TensorFlow as tensorflow.keras; the standalone package installs with: pip install keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

""" 
Here we first define the Sequential model, then add Dense (fully connected) layers.
Each input is a vector of 784 features, the hidden layer has 64 neurons, and the output layer has 10 neurons.
The input can be an image of 28x28 pixels = 784 pixels, and the output is a 10-class prediction: the digits 0-9 in a digit classification problem.

ReLU is an activation function used to introduce non-linearity into the model.
We cannot use only linear activations: two linear layers W2(W1x + b1) + b2 collapse into one linear layer (W2W1)x + (W2b1 + b2),
so however many layers we stack, the model could not learn complex patterns in the data.
For example, the XOR problem. The XOR gate outputs 1 if the two inputs are different and 0 if they are the same. In a table:
A B | A XOR B
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
Plotted with input A on the horizontal axis and input B on the vertical axis:
1| 1   0
 | 
0| 0   1
 |______
   0   1
There is no way to separate the two classes with a straight line, i.e. you cannot set apart the true and false values with a linear function.
But a hidden layer with a non-linear activation such as sigmoid or ReLU lets the network bend its decision boundary, so it can separate the two classes.

Softmax is an activation function used to convert the output of the model into a probability distribution.
For example, if the model outputs a tensor with 10 values, softmax converts it into a tensor with 10 values between 0 and 1 that sum to 1.
So no matter how many outputs we have, each output is a probability and all the probabilities sum to 1.
Each neuron in the output layer represents a class, and its value is the probability (0-1) of that class.
What is a class: a group of objects that are similar to each other in some way. For example, in the digit prediction problem
we have 10 classes, one for each digit from 0 to 9, and the model outputs a tensor with 10 values, one for each class.
For example, if after softmax the model outputs [0.01, 0.02, 0.03, 0.04, 0.05, 0.05, 0.05, 0.05, 0.60, 0.10],
the digits 0-9 have these probabilities respectively (they sum to 1), and the model predicts 8, the class with the highest probability (0.60).

ReLU (Rectified Linear Unit) sets all negative values to zero and keeps positive values unchanged.
In a simple NN, it adds non-linearity, helping the model learn complex patterns instead of just straight lines.
Mathematically:
ReLU(x) = max(0, x)
✅ Keeps positives → same
✅ Turns negatives → 0
"""
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax')
])

""" 
Compile and train
- In the compile step, we define the loss function, optimizer, and metrics to use during training.
- In the training step, we fit the model to the data for a number of epochs.

In our example we first define some data representing the 784 pixels of an image, plus labels for the digit classes.
So there are 1000 samples, each with 784 random inputs and one random label.

Then we compile the model with the adam optimizer and the sparse_categorical_crossentropy loss function.
- The loss function measures how badly the model is performing (0 to infinity). For example, if the model assigns a low probability to the correct class, the loss is high, and vice versa.
  sparse_categorical_crossentropy is used for multi-class classification problems like our digit classification problem, when the labels are integers (0-9) rather than one-hot vectors.
  The optimizer updates the model's weights during training; Adam is a popular default that works well for many problems.
  Metrics such as accuracy (0-1, i.e. 0-100%) report the model's performance during training; accuracy measures how often the predicted class is correct.
  Loss vs metrics: the loss is the quantity the optimizer actually minimizes (it must be differentiable), while metrics are only reported so we can judge performance; they are not optimized directly.

Then we fit the model to the data and train it for 10 epochs.
- We define our data and labels using NumPy and run 10 epochs of training.
  An epoch is one pass through the entire dataset: with 1000 samples, 1 epoch processes all 1000 samples. With Keras's default batch_size of 32, that is ceil(1000 / 32) = 32 weight updates per epoch.
"""
# Define some data
import numpy as np
train_data = np.random.rand(1000, 784) # 1000 samples x 784 features: each row holds the 784 random "pixel" values of one image
# the data looks like this (illustrative; the real values are random numbers in [0, 1))
# [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 ..... 784 values],
#   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 .....784 values],
#   ...... 1000 rows]
# i.e. 1000 samples of 784 features each
train_labels = np.random.randint(10, size=(1000,)) # 1000 labels from 0 to 9 (the classes); row i of train_data is paired with label i

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=10)
# Results: the last epoch of one run reported accuracy: 0.2501 - loss: 2.1262. The accuracy is poor, but the labels are random,
# so we cannot expect good accuracy; anything above ~10% (chance) here comes from memorizing training samples, not from learning a pattern.

""" 
Model Subclassing
- Model subclassing creates custom models by subclassing tf.keras.Model,
- i.e. we define our own model as a class that inherits from the Model class.
- We can create as many layers as we want and define our own forward pass, the way data flows through the model.
- This is useful for custom models that are not linear stacks of layers.

In our example the model has only two layers, but we can add as many as we want.
The first layer has 64 neurons and the second has 10 neurons. We pass the inputs through the first layer, pass its output through the second layer,
and return the second layer's output as the model's final output.
"""
class MyModel(tf.keras.Model): # inheriting from the Model class
    def __init__(self): # constructor
        super(MyModel, self).__init__() # call the constructor of the parent class so initialize the model
        self.dense1 = tf.keras.layers.Dense(64, activation='relu') # first layer 64 neurons + relu activation
        self.out = tf.keras.layers.Dense(10) # output layer: 10 neurons, no activation, so it outputs raw scores (logits)

    def call(self, inputs): # forward pass
        x = self.dense1(inputs) # pass the inputs through the first layer
        return self.out(x) # pass the output of the first layer through the output layer

""" 
Model evaluation and prediction
- After training the model, we can evaluate it on test data to see how well it performs.

- evaluate takes data and labels and returns the loss plus the metrics passed to compile (here: accuracy).
- predict takes only data and returns the model's outputs; here these are softmax probabilities per class, and np.argmax(..., axis=1) turns them into predicted labels.
"""
model.evaluate(train_data, train_labels) # this toy example has no separate test set, so we reuse the training data; a real evaluation needs unseen data
new_data = np.random.rand(10, 784) # 10 new, unseen samples of 784 features each
model.predict(new_data) 
# output is a 10x10 matrix: one row per sample, one probability per class
# Example run:
""" 
array([[0.07370737, 0.06734557, 0.09007742, 0.09084072, 0.15472364,
        0.10088342, 0.15501715, 0.0846708 , 0.06713989, 0.11559402],
       [0.13559042, 0.08671413, 0.06254287, 0.10968324, 0.10897283,
        0.06311018, 0.140859  , 0.08376517, 0.05650004, 0.1522621 ],
       [0.06658942, 0.06215975, 0.12574631, 0.08794341, 0.24458086,
        0.15032104, 0.03119767, 0.06260405, 0.07711761, 0.09173983],
       [0.18518034, 0.03801233, 0.10225594, 0.05082671, 0.14458576,
        0.08729083, 0.1261759 , 0.06465442, 0.07568176, 0.12533602],
       [0.25580242, 0.0704459 , 0.09154338, 0.12910502, 0.15939018,
        0.06951969, 0.07261899, 0.05154628, 0.06342862, 0.03659946],
       [0.16323559, 0.06115254, 0.11572555, 0.0481673 , 0.2106075 ,
        0.08192023, 0.08052816, 0.06937485, 0.05051486, 0.11877337],
       [0.08927758, 0.09356026, 0.07743935, 0.04602603, 0.05058305,
        0.2109955 , 0.1288731 , 0.13315801, 0.12589532, 0.04419181],
       [0.12444694, 0.21631704, 0.09856055, 0.12358341, 0.12699795,
        0.04524506, 0.07598381, 0.09751168, 0.03933394, 0.0520197 ],
       [0.14829713, 0.12787852, 0.04917813, 0.08518186, 0.20955724,
        0.06541336, 0.12108779, 0.07558911, 0.05305878, 0.06475811],
       [0.09848133, 0.1336912 , 0.06284648, 0.1111226 , 0.17866668,
        0.10355557, 0.10615086, 0.0537717 , 0.06240985, 0.08930375]],
      dtype=float32)
      
This has 10 rows and 10 columns: each row is a sample and each column is a class.
For example, the first row is the prediction for the first sample, and its first column is the probability of class 0.
In simple terms, for the first sample the model is about 7% sure it is a 0, 7% sure it is a 1, 9% sure it is a 2, and so on; its highest value (about 15.5%) is class 6, so the predicted digit is 6.
The second row does the same for the second sample, then the third row for the third sample, and so on.
There are 10 rows because we passed 10 samples, and 10 columns because there are 10 classes (digits 0-9). Each row sums to 1.
"""

""" 
Handling Missing Data: a few ways to handle missing values:
- Drop the rows with missing data
- Fill the missing data with a constant value
- Fill the missing data with the mean, median, or mode of the column
- Fill the missing data with the previous or next value in the column (forward or backward fill)
- Fill the missing data with a value from another column
- Fill the missing data with a value from a different dataset
- Fill the missing data from the surrounding values (interpolation)
"""
x = tf.constant([1.0, float('nan'), 2.0, float('nan')]) # creating a tensor with missing data
x_clean = tf.where(tf.math.is_nan(x), tf.zeros_like(x), x) # replace the missing data with 0
print(x_clean)  # [1.0, 0.0, 2.0, 0.0] 


""" 
TensorBoard
- TensorBoard is a tool for visualizing the training process of a model.

The example below saves the model's training logs to a log directory; TensorBoard can then visualize them.
"""
from tensorflow.keras.callbacks import TensorBoard

log_dir = "logs/fit/" # directory to save the logs
tensorboard_callback = TensorBoard(log_dir=log_dir) # create a TensorBoard callback meaning it will save the logs to the directory
model.fit(train_data, train_labels, epochs=10, callbacks=[tensorboard_callback]) # using the same data and labels as before
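The notes above describe a Dense layer as a linear transformation followed by an activation, ReLU as max(0, x), and softmax as turning scores into probabilities that sum to 1. The NumPy sketch below recomputes that forward pass by hand and checks it against Keras. It assumes a freshly initialized 784-64-10 model like the one above, with random untrained weights and a random input, so the predicted class itself is meaningless:

```python
import numpy as np
import tensorflow as tf

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def softmax(z):
    # exponentiate and normalize each row; subtracting the row max avoids overflow
    # and does not change the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Same 784 -> 64 (ReLU) -> 10 (softmax) architecture; weights are random, untrained
hidden_layer = tf.keras.layers.Dense(64, activation="relu")
output_layer = tf.keras.layers.Dense(10, activation="softmax")
model = tf.keras.Sequential([tf.keras.Input(shape=(784,)), hidden_layer, output_layer])

rng = np.random.default_rng(0)
x = rng.random((1, 784)).astype("float32")  # one fake flattened 28x28 image

# A Dense layer stores a kernel W (inputs x units) and a bias b (units,)
W1, b1 = hidden_layer.get_weights()
W2, b2 = output_layer.get_weights()

hidden = relu(x @ W1 + b1)          # shape (1, 64)
probs = softmax(hidden @ W2 + b2)   # shape (1, 10)
keras_probs = model(x).numpy()

print("matches Keras:", np.allclose(probs, keras_probs, atol=1e-5))
print("sum of probabilities:", probs.sum())
print("predicted class:", int(np.argmax(probs)))
```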

In [None]:
# We can visualize the TF logs using TensorBoard
# Option 1) in a terminal: tensorboard --logdir=logs/fit (serves on localhost:6006)
# Option 2) inside the Jupyter notebook (comments go on their own lines, not after the magics):
# load the TensorBoard notebook extension
%load_ext tensorboard
# start TensorBoard inside the notebook
%tensorboard --logdir logs/fit
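TensorBoard treats each subdirectory under `--logdir` as a separate run. If every training run writes to the same `logs/fit/` folder, their curves can end up mixed together. A common pattern gives each run its own timestamped folder; the timestamp format is only a convention, not a requirement. Here it is sketched with the same kind of random data:

```python
import datetime
import os

import numpy as np
import tensorflow as tf

# One folder per run, e.g. logs/fit/20240101-120000
run_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

x = np.random.rand(200, 784)
y = np.random.randint(10, size=(200,))
model.fit(x, y, epochs=2, verbose=0,
          callbacks=[tf.keras.callbacks.TensorBoard(log_dir=run_dir)])

# `tensorboard --logdir logs/fit` now lists this run next to earlier ones
print("Logs written to:", run_dir, os.listdir(run_dir))
```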

In [None]:
import tensorflow as tf
import numpy as np

# 1. Dummy data for an image classification problem
train_data = np.random.rand(1000, 784)  # 1000 samples, 784 features each, i.e. a 2D array with 1000 rows and 784 columns: each row is a sample, each column a feature
train_labels = np.random.randint(10, size=(1000,))  # 1000 labels (classes 0-9), each an integer between 0 and 9
batch_size = 32 # how many examples we process before each update of the model weights

# Create a TensorFlow dataset of (data, label) pairs and split it into batches of 32
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels)).batch(batch_size)

# 2. Define the model
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__() # initialize the parent class gives us access to the Model class methods and properties
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')  # hidden layer (64 neurons) with relu activation
        self.out = tf.keras.layers.Dense(10)  # output layer (10 classes)

    # this defines the forward pass of the model whenever we do model(inputs) this is called automatically
    def call(self, inputs): # inputs is the input to the model
        x = self.dense1(inputs) # pass the inputs through the first layer this will pass it through the hidden layer (dense layer)
        return self.out(x) # pass the output of the first layer through the output layer # this will take the dense layer output and pass it through the output layer to get the final output

model = MyModel() # create an instance of the model

# 3. Define a custom loss function
# Note: this custom loss is MSE on raw logits, a regression-style loss; it trains, but it is not the usual choice for classification
# Normally for classification you use SparseCategoricalCrossentropy
def custom_loss(y_true, y_pred):
    # Mean Squared Error between the true labels and the predicted values.
    # The predictions arrive as 10 logits per sample (one raw score per class 0-9, e.g. [0.1, 0.2, 0.3, ..., 1.0], where the index is the digit),
    # while the true labels are integers 0-9, so we cannot subtract one from the other directly.
    # tf.one_hot converts each label into a list of 10 values with a 1 at the label's index and 0 elsewhere (e.g. label 3 -> [0,0,0,1,0,0,0,0,0,0]).
    # Now we can subtract the two, square the errors, and average them to get the final loss value.
    # We receive a whole batch at once, so both tensors have shape (batch_size, 10) and reduce_mean averages over every element.
    return tf.reduce_mean(tf.square(y_pred - tf.one_hot(y_true, depth=10))) # depth must equal the number of classes

# 4. Define optimizer
# this is used to update the weights of the model during training Adam is a popular optimizer that is used for most problems. it combines the benefits of two other extensions of stochastic gradient descent.
optimizer = tf.keras.optimizers.Adam()

# 5. Custom training loop
epochs = 10 # an epoch is one pass through the entire dataset
for epoch in range(epochs): # for each epoch (in this loop we will see the entire dataset once)
    print(f"Epoch {epoch+1}/{epochs}")
    for step, (x_batch, y_batch) in enumerate(train_dataset): # for each step (in this loop we will process a batch of data) here from the dataset we get a batch of data and labels 
        with tf.GradientTape() as tape: # GradientTape is used to record the operations for automatic differentiation in basic terms it helps us calculate the gradients of the loss with respect to the model weights by recording the operations that are performed on the tensors in the model
            # we set training=True so that layers which behave differently in training and inference (like dropout or batch normalization) run in training mode, e.g. dropout is allowed to drop neurons
            logits = model(x_batch, training=True) # forward pass (get the model predictions for the batch of data) this will give us a list for each sample in the batch with 10 logits (one for each class 0-9) we will have 32 lists if the batch size is 32 with shape (32, 10)
            loss = custom_loss(y_batch, logits)  # we calculate the loss using our custom loss function (compare the predicted labels with the true labels for the batch) this will give us the loss value for the entire batch (a single number we will use this to update all the model weights)
        gradients = tape.gradient(loss, model.trainable_variables) # compute gradients (get the gradients of the loss with respect to the model weights) this will give us a list of gradients for each weight in the model basically how much we need to change each weight to reduce the loss so theres 1 gradient for each weight in the model
        optimizer.apply_gradients(zip(gradients, model.trainable_variables)) # update weights (apply the gradients to the model weights) this will update the weights of the model using the gradients we calculated and the optimizer we defined, here we pass a zip object that pairs each gradient with its corresponding weight in the model
        if step % 10 == 0: # print loss every 10 steps
            print(f"Step {step}: Loss = {loss.numpy():.4f}")

print("Training complete ✅")
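The loop above uses MSE on one-hot labels and runs every step eagerly. The sketch below shows the more usual setup mentioned in the comments. It applies `SparseCategoricalCrossentropy(from_logits=True)` to the raw logits and compiles the per-batch step with `@tf.function`. It uses the same random-data shape, and a small Sequential model stands in for `MyModel`; either would work:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 784).astype("float32")
y = np.random.randint(10, size=(256,))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),  # raw logits, no softmax
])
# from_logits=True: the loss applies softmax internally, which is numerically safer
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

@tf.function  # traced into a graph on the first call, then reused
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        logits = model(x_batch, training=True)
        loss = loss_fn(y_batch, logits)  # integer labels, no one-hot needed
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

losses = []
for epoch in range(3):
    for x_batch, y_batch in dataset:
        loss = train_step(x_batch, y_batch)
    losses.append(float(loss))
    print(f"Epoch {epoch + 1}: last-batch loss = {losses[-1]:.4f}")
```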


## Keras

# 🧠 Introduction to Keras with TensorFlow

Welcome to your **Beginner-Friendly Introduction to Keras**, the high-level API built on top of **TensorFlow**.

Keras is designed to make building, training, and deploying deep learning models easy and intuitive.

---

## 📘 What You'll Learn
1. What Keras is and how it relates to TensorFlow  
2. How to build models using the **Sequential API**  
3. How to train and evaluate models  
4. How to use **callbacks** to improve training  
5. How to save and load models  
6. Two practical examples:
   - A simple **regression model**
   - A **classification model** using MNIST digits

---


## 🧩 1. Building a Simple Sequential Model

In [None]:

from tensorflow import keras
from tensorflow.keras import layers

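# 2 ways of making models:
# Method 1: the Functional API (more verbose, but also supports models that are not a simple stack)
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(8, activation='relu')(inputs) # 8 neurons + relu activation, applied to inputs
outputs = keras.layers.Dense(1)(x) # output layer with 1 neuron
functional_model = keras.Model(inputs=inputs, outputs=outputs) # tie inputs to outputs to get a model

# Method 2: the Sequential API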
# Create a Sequential model
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

# NOTE: Using strings vs classes for optimizers, loss functions, and activations etc 
""" 
When we specify the optimizer, loss function, metrics, etc. in the compile method as strings, Keras maps each string to its corresponding class or function.
For example, the Adam optimizer is the class keras.optimizers.Adam, with its own methods and properties. When we pass the string, Keras creates an instance of that class for us with default parameters.
If you want to customize the optimizer or loss function, create an instance of the class yourself and pass it to the compile method instead of the string.

EX
optimizer = keras.optimizers.Adam(learning_rate=0.0001) # create an instance of the Adam optimizer with a custom learning rate
model.compile(optimizer=optimizer, loss='mse') # pass the instance of the optimizer to the compile method

* For the parameters of the optimizers and loss functions, refer to the Keras documentation.

# Activations:
### IF THE ACTIVATION EXISTS BOTH AS A FUNCTION AND AS A LAYER (like LeakyReLU, ReLU), i.e. Activation Layers

# Example of passing LeakyReLU as a layer instance with a custom slope
# (Keras 3 names this argument negative_slope; older tf.keras versions call it alpha)
leaky_relu = tf.keras.layers.LeakyReLU(negative_slope=0.05)  # create an instance of LeakyReLU with a custom slope
x = keras.layers.Dense(8, activation=leaky_relu)(inputs)  # use the instance as the activation function

# second example: Sequential with a custom LeakyReLU activation
# (vocab_size, embedding_dim, seq_length and hidden_units are placeholders you define yourself;
#  Keras 3 deprecates Embedding's input_length argument, so the sequence length goes into tf.keras.Input)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_length,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(hidden_units, activation=tf.keras.layers.LeakyReLU(negative_slope=0.2)), # pass in inline activation
    tf.keras.layers.Dense(1, activation="sigmoid")
])

or 

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_length,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(hidden_units),
    tf.keras.layers.LeakyReLU(negative_slope=0.2), # separate activation layer
    tf.keras.layers.Dense(1, activation="sigmoid")
])

NOTE: you can import the layer directly with from tensorflow.keras.layers import LeakyReLU, then write LeakyReLU(...) instead of tf.keras.layers.LeakyReLU(...)
NOTE: here we use the layer API (LeakyReLU); Keras 3 also has the function keras.activations.leaky_relu(x, negative_slope=0.2), which works directly on tensors.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1) Use the stateless activation function (default negative_slope=0.2)
model = keras.Sequential([
    layers.Dense(64, activation=tf.keras.activations.leaky_relu),
    layers.Dense(1, activation='sigmoid')
])

# 2) Same but with a custom slope (e.g. 0.05); a lambda works, but Keras 3 may refuse to load lambdas from a saved model by default (safe_mode)
model_custom = keras.Sequential([
    layers.Dense(64, activation=lambda x: tf.keras.activations.leaky_relu(x, negative_slope=0.05)),
    layers.Dense(1, activation='sigmoid')
])

model.summary()
model_custom.summary()

#### IN SHORT ####
- Use the function for one-off ops on tensors; use the Layer when you want it to be its own configurable, serializable step in a model.

- stateless function: tf.keras.activations.leaky_relu(x, negative_slope=0.2)
import tensorflow as tf

x = tf.constant([[-3.0, -0.5, 0.0, 2.0]])
y = tf.keras.activations.leaky_relu(x, negative_slope=0.2)
print(y.numpy())  # negative values scaled by 0.2: -3.0 -> -0.6, -0.5 -> -0.1; 0.0 and 2.0 unchanged

- as a layer instance: tf.keras.layers.LeakyReLU(negative_slope=0.2)
layer = tf.keras.layers.LeakyReLU(negative_slope=0.2)
y2 = layer(x)    # same result; the layer has no weights but stores its slope in its config, so it can be saved with the model
print(y2.numpy())

### IF THE ACTIVATION EXISTS ONLY AS A FUNCTION IN keras.activations (like gelu), i.e. there is no dedicated Activation Layer

# example with the GELU activation function: keras.activations.gelu(x, approximate=False)  # x = input tensor, approximate = whether to use the faster tanh-based approximation of GELU

- using default parameters

1) # use string
model = keras.Sequential([
    layers.Dense(64, activation='gelu'),
    layers.Dense(1, activation='sigmoid')
])
2) # use default class
model = keras.Sequential([
    layers.Dense(64, activation=tf.keras.activations.gelu),
    layers.Dense(1, activation='sigmoid')
])

- using custom parameters (create a custom layer class)

class GELULayer(tf.keras.layers.Layer): # inheriting from Layer class
    def __init__(self, approximate=False, **kwargs): # constructor with custom parameter
        super().__init__(**kwargs) # call the constructor of the parent class
        self.approximate = approximate # store the custom parameter 
    def call(self, inputs): # forward pass: inputs is the tensor coming from the previous layer
        return tf.keras.activations.gelu(inputs, approximate=self.approximate) # apply the gelu activation function with the custom parameter to inputs of the layer
    def get_config(self): # method to serialize the layer configuration (so we can save and load the model later)
        cfg = super().get_config()
        cfg.update({"approximate": self.approximate})
        return cfg
    
model = keras.Sequential([
    layers.Dense(64, activation=GELULayer(approximate=True)), # use the custom GELULayer with approximate=True
    layers.Dense(1, activation='sigmoid')
])

"""

# Summary
model.summary()
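The note above says Keras maps strings such as 'adam' or 'relu' to the matching classes or functions. The `get` lookup helpers in `tf.keras.optimizers` and `tf.keras.activations` make that mapping visible, as this short sketch shows:

```python
import tensorflow as tf

# A string is looked up and turned into an object with default settings
opt_from_string = tf.keras.optimizers.get("adam")
print(type(opt_from_string).__name__)                # Adam
print(float(opt_from_string.learning_rate.numpy()))  # default learning rate, about 0.001

relu = tf.keras.activations.get("relu")  # activation strings map to functions
print(relu(tf.constant([-1.0, 2.0])).numpy().tolist())  # [0.0, 2.0]

# Building the instance yourself lets you override the defaults
opt_custom = tf.keras.optimizers.Adam(learning_rate=1e-4)
print(float(opt_custom.learning_rate.numpy()))
```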


## 🧮 2. Regression Example: Predicting Continuous Values

In [None]:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

# Generate synthetic data
np.random.seed(42)
X = np.random.rand(1000, 3)
y = X @ [3.5, 1.2, -2.0] + np.random.randn(1000) * 0.2

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Scale data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build model
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(3,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train model
history = model.fit(X_train, y_train, epochs=50, validation_split=0.2, verbose=0)

# Evaluate
loss, mae = model.evaluate(X_test, y_test, verbose=0)
print(f"Mean Absolute Error: {mae:.3f}")
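A test MAE on its own is hard to judge. The sketch below regenerates the same kind of synthetic data, adding `random_state=0` to the split for repeatability. It then compares the model against a trivial baseline that always predicts the training mean. It also shows that new raw inputs must pass through the same fitted `scaler` before `predict`:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

np.random.seed(42)
X = np.random.rand(1000, 3)
y = X @ np.array([3.5, 1.2, -2.0]) + np.random.randn(1000) * 0.2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)  # fit on the training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

model = keras.Sequential([
    keras.Input(shape=(3,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X_train_s, y_train, epochs=50, verbose=0)

_, model_mae = model.evaluate(X_test_s, y_test, verbose=0)
baseline_mae = np.mean(np.abs(y_test - y_train.mean()))  # always predict the training mean
print(f"model MAE {model_mae:.3f} vs baseline MAE {baseline_mae:.3f}")

# New raw inputs must go through the SAME fitted scaler before predict()
x_new = np.array([[0.5, 0.5, 0.5]])
print(model.predict(scaler.transform(x_new), verbose=0))  # noise-free target here is 1.35
```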


## 🔢 3. Classification Example: MNIST Handwritten Digits

In [None]:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow import keras
from tensorflow.keras import layers

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize and flatten images
X_train = X_train.reshape(-1, 28*28) / 255.0
X_test = X_test.reshape(-1, 28*28) / 255.0

# One-hot encode labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build model
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
model.fit(X_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

# Evaluate
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.3f}")
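The cell above one-hot encodes the labels so it can use `categorical_crossentropy`. Keras also accepts integer labels directly with `sparse_categorical_crossentropy`. This sketch checks that the two losses agree on the same predictions and that `np.argmax` turns probabilities, or one-hot rows, back into class labels. Random probabilities stand in for real model outputs so the example runs offline:

```python
import numpy as np
import tensorflow as tf

y_int = np.array([3, 0, 9])                                      # integer labels, as mnist.load_data() returns them
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=10)  # what the cell above feeds to categorical_crossentropy
probs = tf.nn.softmax(tf.random.normal((3, 10)))                 # stand-in for model.predict() output

cce = tf.keras.losses.CategoricalCrossentropy()(y_onehot, probs)
scce = tf.keras.losses.SparseCategoricalCrossentropy()(y_int, probs)
print(float(cce), float(scce))  # same value: both average -log(probability of the true class)

predicted = np.argmax(probs.numpy(), axis=1)  # highest-probability class per sample
recovered = np.argmax(y_onehot, axis=1)       # one-hot back to integer labels
print(predicted, recovered)
```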


## ⏸️ 4. Using Callbacks (EarlyStopping, ModelCheckpoint)

In [None]:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(patience=3, monitor='val_loss', restore_best_weights=True),
    ModelCheckpoint('best_model.keras', save_best_only=True)
]

model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=128,
    validation_split=0.1,
    callbacks=callbacks
)
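To see what the two callbacks actually did, here is a small self-contained sketch on random data. The file name `best_model_demo.keras` is hypothetical. Random labels make the validation loss stop improving quickly, so early stopping usually triggers long before epoch 50:

```python
import os

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

x = np.random.rand(500, 20)
y = np.random.randint(10, size=(500,))  # random labels, so validation loss soon stops improving

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
checkpoint = ModelCheckpoint("best_model_demo.keras", monitor="val_loss", save_best_only=True)
history = model.fit(x, y, epochs=50, validation_split=0.2, verbose=0,
                    callbacks=[early_stop, checkpoint])

epochs_run = len(history.history["loss"])
print("Epochs actually run:", epochs_run, "of 50")
print("EarlyStopping stopped_epoch:", early_stop.stopped_epoch)  # stays 0 if it never triggered
print("Best val_loss:", min(history.history["val_loss"]))
restored = keras.models.load_model("best_model_demo.keras")  # the checkpointed best model
```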


## 💾 5. Saving and Loading Models

In [None]:

# Save the entire model
model.save('my_model.keras')

# Load it back
new_model = keras.models.load_model('my_model.keras')
new_model.summary()
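Saving matters most once a model contains custom pieces. This sketch reuses the hypothetical `GELULayer` idea from the activation notes and a hypothetical file name, `roundtrip_demo.keras`. It checks that a reloaded model predicts exactly what the original did. `custom_objects` tells `load_model` which Python class the saved name `GELULayer` refers to, and `get_config` is what lets the layer's `approximate` setting survive the round trip:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class GELULayer(keras.layers.Layer):
    # hypothetical custom layer: GELU with a configurable `approximate` flag
    def __init__(self, approximate=False, **kwargs):
        super().__init__(**kwargs)
        self.approximate = approximate

    def call(self, inputs):
        return keras.activations.gelu(inputs, approximate=self.approximate)

    def get_config(self):
        cfg = super().get_config()
        cfg.update({"approximate": self.approximate})  # saved so it can be rebuilt on load
        return cfg

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8),
    GELULayer(approximate=True),
    keras.layers.Dense(1),
])
x = np.random.rand(5, 4).astype("float32")
before = model.predict(x, verbose=0)

model.save("roundtrip_demo.keras")
restored = keras.models.load_model("roundtrip_demo.keras", custom_objects={"GELULayer": GELULayer})
after = restored.predict(x, verbose=0)
print(np.allclose(before, after))  # same weights and config, so same predictions
```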


## 🚀 6. Summary: What You Learned

| Concept | Description |
|----------|--------------|
| **Sequential API** | Simple stack of layers for feedforward networks |
| **Activation Functions** | Add non-linearity (e.g., ReLU, sigmoid, softmax) |
| **Loss Function** | Defines how model performance is measured |
| **Optimizer** | Controls how weights are updated (Adam, SGD, etc.) |
| **Callbacks** | Automate training behavior (e.g., stop early, save model) |
| **Saving Models** | `model.save()` and `keras.models.load_model()` |

You now have a **solid beginner foundation** in Keras, ready to move toward CNNs, RNNs, and advanced deep learning!


## Comparing TensorFlow with Keras

In [None]:
# TensorFlow vs Keras example:

import tensorflow as tf

# Create random data
X = tf.random.normal((100, 3))
y = tf.random.normal((100, 1))

# Only with TF:
# Initialize weights and bias manually
W = tf.Variable(tf.random.normal((3, 1)))
b = tf.Variable(tf.zeros(1))

# Define a simple forward function
def model(x):
    return tf.matmul(x, W) + b  # Linear layer

# Define loss function (MSE)
def loss_fn(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Define optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Training loop
for step in range(1000):
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = loss_fn(y, y_pred)
    # Compute gradients and update weights
    grads = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))

print("Trained weights:", W.numpy())
print("Trained bias:", b.numpy())

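# With Keras: the same linear model (one Dense unit, no activation) in a few lines
from tensorflow.keras import layers, models

keras_model = models.Sequential([
    layers.Input(shape=(3,)),
    layers.Dense(1)  # computes x @ W + b, like model() above
])

keras_model.compile(optimizer='sgd', loss='mse')  # 'sgd' uses learning_rate=0.01 by default
keras_model.fit(X, y, epochs=100)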
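What `tf.GradientTape` computes in the low-level loop above can be derived by hand for a linear model. The sketch below is a pure-Python version (no TensorFlow) on hypothetical noise-free data generated from known weights. It applies the MSE gradients ∂L/∂w_j = (2/N) Σ (ŷ_i − y_i) x_ij and ∂L/∂b = (2/N) Σ (ŷ_i − y_i), followed by the plain SGD update w ← w − η·∂L/∂w. The learning rate and step count are illustrative choices, not values from the cell above.

```python
import random


def train_linear_regression(features, targets, lr=0.1, steps=500):
    """Full-batch gradient descent on MSE for targets ≈ features @ w + b (pure Python)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Forward pass: predictions and residuals (prediction - target)
        preds = [sum(x[j] * w[j] for j in range(d)) + b for x in features]
        resid = [p - t for p, t in zip(preds, targets)]
        # Hand-derived MSE gradients (what tape.gradient returns for [W, b])
        grad_w = [2.0 / n * sum(r * x[j] for r, x in zip(resid, features)) for j in range(d)]
        grad_b = 2.0 / n * sum(resid)
        # Plain SGD update: parameter <- parameter - lr * gradient
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


random.seed(0)
true_w, true_b = [1.5, -2.0, 0.5], 0.3  # hypothetical ground-truth parameters
features = [[random.gauss(0, 1) for _ in range(3)] for _ in range(100)]
targets = [sum(xj * wj for xj, wj in zip(x, true_w)) + true_b for x in features]

w_hat, b_hat = train_linear_regression(features, targets)
print("Learned weights:", [round(v, 3) for v in w_hat])
print("Learned bias:", round(b_hat, 3))
```

Because the data has no noise, the learned parameters should land very close to `true_w` and `true_b`.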



### Eager Execution

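In TensorFlow 2.x, eager execution is on by default: top-level code runs operation by operation. Code inside a `@tf.function` (including the training step Keras builds for `model.fit()`) is instead traced into a graph. `tf.config.run_functions_eagerly(True)` forces those functions to run eagerly as well, which is mainly useful for debugging.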

**Eager Execution (Enabled)**
What it does: Operations execute immediately and return concrete values, just like normal Python code.

**Benefits:**

- Intuitive debugging - You can use print statements, breakpoints, and inspect values directly
- Works with Python control flow - Standard if/else, loops work naturally
- Easy .numpy() calls - Can convert tensors to NumPy arrays anytime
- Great for prototyping - Fast iteration and development
  
Example:
```py
x = tf.constant([1, 2, 3])
print(x.numpy())  # Works! Prints [1 2 3]
```

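**Graph Execution (inside `tf.function`; Keras `model.fit()` uses it by default)**
What it does: TensorFlow first traces the Python function into a computational graph, optimizes that graph, then executes it.

**Benefits:**

- Often faster - Graph optimizations and less per-operation Python overhead can give a noticeable speedup, especially for many small operations; the actual gain depends on the workload
- Better for production - Graphs can be exported (e.g. as a SavedModel) and converted for mobile and edge devices
- Parallelization - Independent operations in the graph can be scheduled in parallel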
  
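Trade-off:
```py
@tf.function
def show():
    x = tf.constant([1, 2, 3])
    print(x.numpy())  # Error! x is a symbolic graph tensor here, so it has no .numpy()
show()  # raises AttributeError while tracing
```

**NOTE:** `tf.config.run_functions_eagerly(True)` is a debugging switch: it makes every `tf.function` run eagerly and gives up the graph optimizations, so keep it off (the default) in production.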

**"Immediate mode" explained**

Immediate/Eager Mode (the default in TF 2.x):
```py
# Code runs line by line, like normal Python
x = tf.constant([1, 2, 3])
y = x * 2
print(y.numpy())  # Executes immediately → [2 4 6]
```
Operations execute instantly and return real values you can inspect.

Graph Mode (inside `tf.function`, e.g. Keras training steps):
```py
# TensorFlow builds a "recipe" first, then executes it all at once
@tf.function
def double():
    x = tf.constant([1, 2, 3])  # Records: "create constant"
    y = x * 2                   # Records: "multiply by 2"
    return y                    # Nothing is computed while tracing
print(double().numpy())  # TF then executes the traced graph → [2 4 6]
```
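The difference between recording a recipe and computing immediately can be shown without TensorFlow. The toy below is a hypothetical illustration, not how TensorFlow is implemented: arithmetic on `Node` objects only records operations, and a separate `run()` executes the recorded graph once concrete inputs are supplied. The same expression on plain Python numbers is evaluated eagerly.

```python
class Node:
    """A recorded operation in a toy graph (hypothetical; not TensorFlow's internals)."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "placeholder", "const", "add", "sub" or "mul"
        self.inputs = inputs  # upstream nodes this operation reads from
        self.value = value    # placeholder name or constant value

    def __add__(self, other):
        return Node("add", (self, wrap(other)))

    def __sub__(self, other):
        return Node("sub", (self, wrap(other)))

    def __mul__(self, other):
        return Node("mul", (self, wrap(other)))


def wrap(v):
    """Turn plain Python numbers into constant nodes."""
    return v if isinstance(v, Node) else Node("const", value=v)


def placeholder(name):
    return Node("placeholder", value=name)


OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def run(node, feed):
    """Execute the recorded graph, feeding concrete values for the placeholders."""
    if node.op == "placeholder":
        return feed[node.value]
    if node.op == "const":
        return node.value
    return OPS[node.op](*(run(n, feed) for n in node.inputs))


# Graph-style: building the expression only records nodes; nothing is computed yet
a, b = placeholder("a"), placeholder("b")
graph = (a + b) * (a - b)
print(graph.op)  # the root node is the recorded "mul"

# Execution happens later, once concrete inputs are fed in
print(run(graph, {"a": 5, "b": 3}))

# Eager-style: the same expression on plain numbers is computed immediately
print((5 + 3) * (5 - 3))
```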

### Real-World TensorFlow Graph Example

Below is a **real computational graph** that TensorFlow builds when training a neural network. This shows actual nodes and operations, not just constants!

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers, models

# tf.function bodies already run as graphs by default in TF 2.x;
# this call only undoes any earlier tf.config.run_functions_eagerly(True)
tf.config.run_functions_eagerly(False)

# Create a simple neural network
model = models.Sequential([
    layers.Input(shape=(3,)),
    layers.Dense(4, activation='relu', name='hidden_layer'),
    layers.Dense(2, activation='softmax', name='output_layer')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Create dummy data (float32 to match the layer weights)
X_train = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]], dtype=np.float32)
y_train = np.array([0, 1, 0, 1])

# Use @tf.function to trace the computational graph
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        # Forward pass
        predictions = model(x, training=True)
        # Average the per-sample losses into a single scalar
        loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(y, predictions))
    
    # Backward pass
    gradients = tape.gradient(loss, model.trainable_variables)
    model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

# Run one training step
sample_x = tf.constant(X_train[:2])
sample_y = tf.constant(y_train[:2])
loss = train_step(sample_x, sample_y)

print("✅ Graph created and executed!")
print(f"Loss: {tf.reduce_mean(loss).numpy():.4f}")
print("\n📊 The computational graph includes:")
print("  - Input tensors (x, y)")
print("  - Matrix multiplications (Dense layers)")
print("  - ReLU activation")
print("  - Softmax activation")
print("  - Loss calculation")
print("  - Gradient computation")
print("  - Weight updates")
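The loss node in this graph is sparse categorical cross-entropy: for each sample it takes the negative log of the probability the model assigned to the true class index, and `tf.reduce_mean` averages those values. A plain-Python sketch with hypothetical softmax outputs (it skips the small epsilon clipping Keras applies to avoid `log(0)`):

```python
import math


def sparse_categorical_crossentropy(y_true, probs):
    """Per-sample loss: -log(probability assigned to the true class index)."""
    return [-math.log(p[label]) for label, p in zip(y_true, probs)]


softmax_out = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical softmax outputs for 2 samples
labels = [0, 1]                          # integer class labels, like y_train above

per_sample = sparse_categorical_crossentropy(labels, softmax_out)
mean_loss = sum(per_sample) / len(per_sample)  # what tf.reduce_mean(loss) reports
print([round(v, 4) for v in per_sample], round(mean_loss, 4))
```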

### Visualizing the Graph with TensorBoard

Let's save and visualize the actual computational graph:

In [None]:
import tensorflow as tf
import numpy as np
from datetime import datetime

# Clear any previous graphs
tf.keras.backend.clear_session()

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(3,), name='hidden'),
    tf.keras.layers.Dense(2, activation='softmax', name='output')
])

# Dummy data
X = np.random.rand(10, 3).astype(np.float32)
y = np.random.randint(0, 2, 10)

# Set up TensorBoard writer
log_dir = "assets/logs/graph/" + datetime.now().strftime("%Y%m%d-%H%M%S")
writer = tf.summary.create_file_writer(log_dir)

# Trace the graph using @tf.function
@tf.function
def forward_pass(x):
    return model(x)

# Execute and trace the graph
tf.summary.trace_on(graph=True, profiler=False)
forward_pass(X[:1])
with writer.as_default():
    tf.summary.trace_export(name="model_graph", step=0)

print(f"✅ Graph saved to: {log_dir}")
print("\n🔍 To visualize the graph, run:")
print(f"   tensorboard --logdir={log_dir}")
print("   Then open http://localhost:6006 in your browser")
print("\n📊 Or run in notebook:")
print("   %load_ext tensorboard")
print(f"   %tensorboard --logdir {log_dir}")

In [None]:
%load_ext tensorboard
# Point at the parent folder so every timestamped run is listed
%tensorboard --logdir assets/logs/graph

TensorBoard shows two graphs for this run (graph1 and graph2):

**Small Graph**: High-level conceptual view showing your model architecture (what you designed)

**Big Graph**: Low-level detailed view showing every single mathematical operation TensorFlow executes (what actually runs)

### What a Real Computational Graph Looks Like

Here's a detailed breakdown of what happens in the graph above:

```
INPUT (shape: [batch, 3])
    ↓
MatMul (weights: [3, 4])  ← First Dense Layer
    ↓
BiasAdd (bias: [4])
    ↓
ReLU Activation
    ↓
MatMul (weights: [4, 2])  ← Output Dense Layer
    ↓
BiasAdd (bias: [2])
    ↓
Softmax Activation
    ↓
OUTPUT (shape: [batch, 2])
```
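The forward pass in this diagram can be traced by hand. This plain-Python sketch (no TensorFlow) runs MatMul → BiasAdd → ReLU → MatMul → BiasAdd → Softmax with small hypothetical weights that only share the model's shapes ([3, 4], [4], [4, 2], [2]); the trained model's actual weights would differ.

```python
import math


def matmul(x, w):
    """[batch, n] x [n, m] -> [batch, m]"""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))] for row in x]


def bias_add(x, b):
    return [[v + bj for v, bj in zip(row, b)] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def softmax(x):
    out = []
    for row in x:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Hypothetical weights with the same shapes as the model's layers
W1 = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2], [-0.3, 0.2, 0.1, 0.5]]  # [3, 4]
b1 = [0.0, 0.1, 0.0, -0.1]                                                # [4]
W2 = [[0.5, -0.5], [0.3, 0.2], [-0.2, 0.4], [0.1, 0.1]]                   # [4, 2]
b2 = [0.0, 0.0]                                                           # [2]

x_batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]              # INPUT, shape [2, 3]
hidden = relu(bias_add(matmul(x_batch, W1), b1))           # shape [2, 4]
probs = softmax(bias_add(matmul(hidden, W2), b2))          # OUTPUT, shape [2, 2]
print(probs)
```

Each output row is a probability distribution over the two classes, so it sums to 1.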

**During Training (with GradientTape):**
```
Forward Pass: Input → Hidden → Output → Loss
                ↓         ↓       ↓       ↓
Backward Pass: ∂Loss/∂Input ← ∂Loss/∂Hidden ← ∂Loss/∂Output ← Loss
                        ↓
                Update Weights
```
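The backward pass is the chain rule applied from the loss toward the inputs. The sketch below does it by hand for a hypothetical one-unit network (h = relu(w1·x + b1), out = w2·h + b2, loss = (out − y)²), checks the result against central finite differences, and then applies one plain gradient-descent update. It illustrates what `tape.gradient` automates; it is not TensorFlow code.

```python
def forward(params, x, y):
    """Tiny scalar network: h = relu(w1*x + b1), out = w2*h + b2, loss = (out - y)**2."""
    w1, b1, w2, b2 = params
    z1 = w1 * x + b1
    h = max(0.0, z1)
    out = w2 * h + b2
    return (out - y) ** 2, (z1, h, out)


def backward(params, x, y):
    """Chain rule from the loss back to every parameter."""
    w1, b1, w2, b2 = params
    _, (z1, h, out) = forward(params, x, y)
    d_out = 2.0 * (out - y)                # ∂Loss/∂Output
    d_w2, d_b2 = d_out * h, d_out
    d_h = d_out * w2                       # ∂Loss/∂Hidden
    d_z1 = d_h * (1.0 if z1 > 0 else 0.0)  # ReLU passes gradient only where z1 > 0
    d_w1, d_b1 = d_z1 * x, d_z1
    return [d_w1, d_b1, d_w2, d_b2]


def numeric_grad(params, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((forward(up, x, y)[0] - forward(down, x, y)[0]) / (2 * eps))
    return grads


params = [0.8, 0.1, -0.5, 0.2]  # hypothetical w1, b1, w2, b2
analytic = backward(params, x=1.5, y=1.0)
numeric = numeric_grad(params, x=1.5, y=1.0)
print("analytic:", analytic)
print("numeric: ", numeric)

# "Update Weights": one plain gradient-descent step
lr = 0.1
params = [p - lr * g for p, g in zip(params, analytic)]
print("loss after one step:", forward(params, 1.5, 1.0)[0])
```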

**Key Nodes in the Graph:**
- **Placeholder nodes**: Input data (x, y)
- **Variable nodes**: Trainable weights and biases
- **Operation nodes**: MatMul, Add, ReLU, Softmax, Loss calculation
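- **Gradient nodes**: Automatic differentiation operations (only when a training step such as `train_step` is traced, not in the forward-only graph exported above)
- **Optimizer nodes**: Weight update operations (Adam optimizer; also only in a traced training step)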

### Comparing Graph vs Eager Execution

In [None]:
import tensorflow as tf
import time

# Simple computation: (a + b) * (a - b)
def computation(a, b):
    return (a + b) * (a - b)

# Test data
x = tf.random.normal([1000, 1000])
y = tf.random.normal([1000, 1000])

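print("=" * 60)
print("🐢 EAGER MODE (runs immediately, line by line)")
print("=" * 60)
# computation() is a plain Python function, so it runs eagerly anyway;
# this flag only affects @tf.function-decorated functions
tf.config.run_functions_eagerly(True)

_ = computation(x, y).numpy()  # warm-up so one-time setup is not timed
start = time.time()
for _ in range(100):
    result = computation(x, y)
_ = result.numpy()  # wait for any asynchronous (GPU) work to finish
eager_time = time.time() - start
print(f"Time taken: {eager_time:.3f} seconds")
print(f"Can inspect: {result[0, 0].numpy():.6f}")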

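print("\n" + "=" * 60)
print("🚀 GRAPH MODE (builds optimized graph first)")
print("=" * 60)
tf.config.run_functions_eagerly(False)  # restore the default: tf.function runs as a graph

# Wrap in tf.function to create graph
@tf.function
def graph_computation(a, b):
    return (a + b) * (a - b)

_ = graph_computation(x, y).numpy()  # first call traces the graph; keep it out of the timing
start = time.time()
for _ in range(100):
    result = graph_computation(x, y)
_ = result.numpy()  # wait for any asynchronous (GPU) work to finish
graph_time = time.time() - start
print(f"Time taken: {graph_time:.3f} seconds")
# For a function this small the gap can be tiny; a value below 1 means graph mode was slower
print(f"Speedup: {eager_time / graph_time:.2f}x")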

print("\n" + "=" * 60)
print("📊 WHAT'S HAPPENING IN GRAPH MODE:")
print("=" * 60)
print("1. First call: TensorFlow traces the function and builds a graph")
print("   - Records: inputs 'a' and 'b' (placeholders), add, subtract, multiply")
print("   - Optimizes: the Grappler optimizer may prune, constant-fold or fuse operations")
print("   - Compiles with XLA only if you pass jit_compile=True")
print("\n2. Later calls with the same input shapes/dtypes reuse the graph")
print("   - No per-operation Python overhead")
print("   - Independent operations can run in parallel")
print("   - Graph optimizations apply on GPU/TPU as well")
print("\n3. The graph is a static computation plan:")
print("   - All operations predefined")
print("   - Can be serialized (e.g. as a SavedModel) and deployed")
print("   - Can be converted with TensorFlow Lite for mobile/edge devices")
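Step 1 and step 2 above (trace once, then reuse) can be mimicked in plain Python. The decorator below is a hypothetical toy, not `tf.function`: on the first call it runs the Python body with symbolic placeholders to record a graph, caches it, and evaluates the cached graph on every later call. Real `tf.function` keys its cache on input shapes and dtypes rather than argument count, and, as in this toy, Python side effects such as `print()` inside the body happen only while tracing.

```python
class Sym:
    """Symbolic value recorded during tracing (hypothetical stand-in for a graph tensor)."""

    def __init__(self, op, args=()):
        self.op = op      # "input", "add", "sub" or "mul"
        self.args = args  # input index, or the operand nodes/constants

    def __add__(self, other):
        return Sym("add", (self, other))

    def __sub__(self, other):
        return Sym("sub", (self, other))

    def __mul__(self, other):
        return Sym("mul", (self, other))


def evaluate(node, inputs):
    """Run a traced graph on concrete inputs."""
    if not isinstance(node, Sym):
        return node  # a Python constant captured while tracing
    if node.op == "input":
        return inputs[node.args[0]]
    left, right = (evaluate(arg, inputs) for arg in node.args)
    if node.op == "add":
        return left + right
    if node.op == "sub":
        return left - right
    return left * right


def toy_function(fn):
    """Hypothetical decorator: trace fn once per argument count, then reuse that graph."""
    cache = {}

    def wrapper(*args):
        key = len(args)  # tf.function keys on shapes/dtypes; this toy only uses arity
        if key not in cache:
            symbols = [Sym("input", (i,)) for i in range(key)]
            cache[key] = fn(*symbols)  # the Python body runs here, recording operations
            wrapper.trace_count += 1
        return evaluate(cache[key], args)

    wrapper.trace_count = 0
    return wrapper


python_body_runs = []


@toy_function
def toy_graph_computation(a, b):
    python_body_runs.append("traced")  # Python side effect: happens only while tracing
    return (a + b) * (a - b)


print(toy_graph_computation(5, 3))   # first call: traces, then evaluates
print(toy_graph_computation(10, 4))  # reuses the cached graph
print(toy_graph_computation.trace_count, len(python_body_runs))
```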