
In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

In [None]:
(x_train,y_train),(x_test,y_test) = mnist.load_data()   # MNIST already comes split into training and test sets

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [None]:
x_train.shape

(60000, 28, 28)

1. Convert 2D Images into 1D Vectors
* Each image has shape (28, 28), a 2D array of pixel intensities.
* Dense (fully connected) layers expect each sample to be a 1D feature vector, so the images have to be flattened first.
* Flattening turns each image into a vector of 28×28 = 784 values (taken row by row), so the whole training set becomes a (60000, 784) array.
* Example:
Original image: 28x28 (2D matrix)
After reshaping: 784 (1D array)
2. Normalize Pixel Values
* Pixel values in the MNIST dataset are integers from 0 to 255.
* Dividing by 255.0 scales them to floating-point values in the range [0, 1].
* Neural networks usually train better on normalized inputs: small, consistently scaled values keep activations and gradients in a reasonable range, which typically helps the optimizer converge faster.
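Both preprocessing steps can be checked on a small NumPy stand-in. This sketch uses three random uint8 arrays in place of real MNIST images (an assumption for illustration) and shows the shape change, the row-by-row pixel ordering that `reshape` uses, and the value range after scaling.

```python
import numpy as np

# Hypothetical stand-in for MNIST: three random 28x28 "images" with uint8 pixels in 0..255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image row by row into 784 values; -1 lets NumPy infer the number of images.
flat = images.reshape(-1, 28 * 28)

# Pixel (row r, column c) lands at position r * 28 + c of the flattened vector.
r, c = 5, 7
assert flat[0, r * 28 + c] == images[0, r, c]

# Dividing by 255.0 converts the uint8 array to float64 values in [0, 1].
scaled = flat / 255.0

print(images.shape, "->", flat.shape)  # (3, 28, 28) -> (3, 784)
print(scaled.dtype, scaled.min(), scaled.max())
```

Flattening discards the 2D neighbourhood structure, which dense layers cannot use anyway; convolutional layers would keep the (28, 28) shape instead.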

In [None]:
x_train = x_train.reshape(-1,28*28)/255.0
x_test = x_test.reshape(-1,28*28)/255.0

In [None]:
x_train.shape

(60000, 784)

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_val,y_train,y_val = train_test_split(x_train,y_train,test_size=0.2,random_state=42)

In [None]:
x_train.shape

(48000, 784)
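`train_test_split` shuffles the rows and then holds out a fraction of them, keeping inputs and labels aligned, which is how 60,000 samples become 48,000 for training and 12,000 for validation. A rough NumPy sketch of that idea follows; `split_80_20` is a hypothetical helper, and its shuffle will not reproduce the exact rows sklearn picks with `random_state=42`. Each row of the toy `x` stores its own original index so the split is easy to inspect.

```python
import numpy as np

def split_80_20(x, y, val_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle row indices once, hold out val_fraction of them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))            # the same shuffle is applied to x and y
    n_val = int(round(val_fraction * len(x)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Toy data: each row stores its original index, and the label is that index mod 10.
x = np.arange(60000).reshape(-1, 1)
y = np.arange(60000) % 10

x_tr, x_va, y_tr, y_va = split_80_20(x, y)
print(x_tr.shape, x_va.shape)  # (48000, 1) (12000, 1)
```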

1. Sequential API:
The Sequential API is a simple and straightforward way to build models layer-by-layer, where each layer has exactly one input tensor and one output tensor. It’s best suited for linear stacks of layers without any branching, skipping, or multiple inputs/outputs.
2. Functional API:
The Functional API is more flexible and powerful. It allows you to define models as directed acyclic graphs (DAGs) of layers. It supports complex architectures like models with multiple inputs, multiple outputs, shared layers, or skip connections.
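To make the "graph of layers" idea concrete, here is a NumPy-only analogy (not Keras code): the hypothetical `dense` helper returns a callable with its own weights, and calling layers on arrays wires them into a graph with two inputs, a merge, and a skip connection, none of which a plain linear stack can express. In Keras the merge and the addition would be layers such as `layers.Concatenate()` and `layers.Add()`, and the inputs would be `keras.Input` tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim, activation=None):
    """Hypothetical helper: a callable 'layer' with its own weights, loosely like Dense."""
    W = rng.normal(0, 0.05, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    def apply(x):
        z = x @ W + b
        return np.maximum(z, 0) if activation == "relu" else z
    return apply

# Two separate inputs for a batch of 4 samples (e.g. an image vector plus 3 metadata values).
image = rng.random((4, 784))
meta = rng.random((4, 3))

h = dense(784, 64, "relu")(image)         # branch 1
m = dense(3, 64, "relu")(meta)            # branch 2
merged = np.concatenate([h, m], axis=1)   # join the branches -> shape (4, 128)
z = dense(128, 64, "relu")(merged)
z = z + h                                 # skip connection: reuse an earlier tensor
out = dense(64, 10)(z)                    # 10 output scores per sample

print(out.shape)  # (4, 10)
```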

In [None]:
# Functional API: layers are called on tensors to build a graph. It also supports multiple
# inputs/outputs, although this particular model has a single input and a single output.
inputs = keras.Input(shape=(784,))  # each sample is a vector of length 784 (batch size is left implicit)
x = layers.Dense(512, activation='relu')(inputs)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dense(128, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)  # one probability per digit class
model = keras.Model(inputs=inputs, outputs=outputs)  # with more than one input or output, pass lists instead

In [None]:
model.summary()
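`model.summary()` lists a parameter count for each layer. A Dense layer with `n_in` inputs and `n_out` units stores an `n_in × n_out` weight matrix plus `n_out` biases, so the totals for this 784-512-256-128-10 network can be worked out by hand. Assuming the layer sizes above, the numbers below should match what the summary reports.

```python
# Layer widths of the functional model above (input, three hidden layers, output).
layer_sizes = [784, 512, 256, 128, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out   # weight matrix + one bias per unit
    total += params
    print(f"Dense {n_in} -> {n_out}: {params:,} parameters")
print(f"Total: {total:,}")          # Total: 567,434
```

The first layer alone holds about 71% of the parameters (401,920 of 567,434), which is typical when a wide dense layer sits directly on raw pixels.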

The compile() method is used to configure the model for training. It specifies:

* The optimizer: How the model updates its weights during training (Adam with a learning rate of 0.0003 in this case).
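* The loss function: How the model measures its error (SparseCategoricalCrossentropy for multi-class classification with integer labels such as 0 to 9; because the last layer already applies softmax, the default from_logits=False is the right setting).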
* The metrics: What performance metrics to track during training and evaluation.
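The loss can be written out in a few lines of NumPy. This sketch assumes the model's outputs are already softmax probabilities (as in this model) and that labels are integer class ids; it picks out each sample's predicted probability for its true class and averages the negative log. The clipping constant is a common guard against `log(0)`, not necessarily Keras's exact internals.

```python
import numpy as np

def softmax(logits):
    # Subtracting each row's max avoids overflow in exp() and leaves the result unchanged.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    # y_true holds integer class ids, so the true-class probability is indexed directly
    # instead of building one-hot vectors.
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])
y_true = np.array([0, 2])   # integer labels, as in MNIST's y_train
probs = softmax(logits)     # what a softmax output layer would produce
loss = sparse_categorical_crossentropy(y_true, probs)
print(round(loss, 4))
```

The "sparse" variant gives the same value as ordinary categorical cross-entropy on one-hot labels; it only skips the one-hot conversion.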

In [None]:
model.compile(optimizer=keras.optimizers.Adam(3e-4),loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])
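`keras.optimizers.Adam(3e-4)` sets `learning_rate=3e-4`; the other arguments keep their defaults (`beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). Below is a sketch of the textbook Adam update for a single scalar parameter (Keras's internal arithmetic places epsilon slightly differently), applied to the toy function f(w) = (w - 3)^2. The loop uses a larger learning rate than the notebook purely so the toy example converges within a few thousand steps.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# First step: the update has a magnitude of about lr, whatever the gradient's scale.
w1, _, _ = adam_step(0.0, -6.0, 0.0, 0.0, t=1)
print(w1)  # approximately 3e-4

# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.01)
print(round(float(w), 3))
```

Because each step is normalized by the running gradient magnitude, the learning rate acts roughly as a cap on how far a weight moves per update, which is one reason small values like 3e-4 are common with Adam.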

1. x_train:
The input training data. Here x_train holds the flattened, normalized MNIST images with shape (num_samples, 784); each row is one training sample.
2. y_train:
The target labels (ground truth) for x_train. It must contain exactly one label per sample in x_train; after the 80/20 validation split above, both hold 48,000 entries.
For MNIST, y_train is a 1D vector of integer class labels, each between 0 and 9.
3. batch_size=64
The number of samples the model processes before each weight update.
* What is a batch? Instead of processing all training samples at once, the data is split into smaller "batches", and the model updates its weights after each batch.
* Batch size: 64 means the model processes 64 samples, averages their loss, computes the gradients, and then updates the weights.
* Why batch processing?
  * It reduces memory usage, since only one batch's activations and gradients have to be held during each step.
  * It makes training more efficient: the weights are updated many times per epoch instead of once, as in full-batch training.

4. epochs=10
This specifies the number of complete passes through the training dataset:

* 1 Epoch: The model processes all the training samples once (split into batches).
* 10 Epochs: The model processes the entire dataset 10 times. Each pass allows the model to learn and adjust weights further.
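* More epochs usually keep lowering the training loss, but too many can cause overfitting. In the log below, val_loss is lowest at epoch 5 and rises at epoch 6 while the training loss keeps falling.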

5. verbose=1
This controls the level of detail printed during training:

* 0: No output (silent mode).
* 1: Displays a progress bar for each epoch, along with training metrics like loss and accuracy.
* 2: Displays one line per epoch with the metrics, without the progress bar.
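Batches and epochs fit together as two nested loops: the outer loop runs once per epoch, and the inner loop performs one weight update per batch. The sketch below is plain NumPy, not what `fit()` runs internally. It first checks the batch-count arithmetic for 60,000 and 48,000 samples, then trains a hypothetical one-feature linear model y = 3x + 1 with mini-batch gradient descent, reshuffling every epoch (as `fit()` does by default with `shuffle=True`).

```python
import math
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays for one epoch: shuffle, then slice; the last batch may be smaller."""
    idx = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

rng = np.random.default_rng(0)

# Batches per epoch = ceil(n / batch_size).
sizes = [len(b) for b in iterate_minibatches(60000, 64, rng)]
print(len(sizes), sizes[-1])   # 938 32
print(math.ceil(48000 / 64))   # 750

# Toy mini-batch gradient descent on noisy y = 3x + 1 (hypothetical data).
x = rng.random(1000)
y = 3 * x + 1 + rng.normal(0, 0.05, size=1000)
w, b, lr = 0.0, 0.0, 0.1
history = []
for epoch in range(10):                              # epochs=10: ten full passes
    for j in iterate_minibatches(len(x), 64, rng):   # batch_size=64: one update per batch
        err = (w * x[j] + b) - y[j]
        w -= lr * 2 * np.mean(err * x[j])            # d(MSE)/dw on this batch
        b -= lr * 2 * np.mean(err)                   # d(MSE)/db on this batch
    history.append(float(np.mean((w * x + b - y) ** 2)))  # full-data MSE after the epoch
print([round(h, 4) for h in history])
```

With 1,000 samples and batch size 64, each epoch performs ceil(1000/64) = 16 updates, so ten epochs give 160 updates in total.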
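--------
The progress output printed by fit() shows:

* 750/750: the number of batches per epoch. Keras rounds up, so 60,000 samples with batch size 64 would give ceil(60000/64) = 938 batches; after the validation split there are 48,000 training samples, giving 48000/64 = 750 batches.
* loss: the training loss, averaged over the batches processed so far in the current epoch.
* accuracy: the training accuracy, averaged the same way.
* val_loss: the loss on the validation set (x_val, y_val), computed at the end of each epoch.
* val_accuracy: the accuracy on the validation set, computed at the end of each epoch.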
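Because the displayed `loss` and `accuracy` are averaged over the epoch so far, epoch 1's training loss (0.5793 in the log below) still includes the poor predictions from the first batches, while `val_loss` (0.1489) is measured after the epoch has finished. That is why validation metrics can look better than training metrics early on. The running average itself is simple; the per-batch losses here are made-up numbers, and equal batch sizes are assumed.

```python
import numpy as np

# Made-up per-batch training losses for the first five batches of an epoch.
batch_losses = np.array([2.30, 1.10, 0.70, 0.50, 0.40])

# After batch k the progress bar shows the mean of batches 1..k (equal batch sizes assumed);
# the average starts over at the beginning of every epoch.
running = np.cumsum(batch_losses) / np.arange(1, len(batch_losses) + 1)
print(np.round(running, 3))  # the final average (about 1.0) is well above the latest batch loss of 0.40
```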

In [None]:
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1,validation_data=(x_val,y_val))

Epoch 1/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8451 - loss: 0.5793 - val_accuracy: 0.9571 - val_loss: 0.1489
Epoch 2/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9637 - loss: 0.1231 - val_accuracy: 0.9679 - val_loss: 0.1056
Epoch 3/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9768 - loss: 0.0749 - val_accuracy: 0.9729 - val_loss: 0.0887
Epoch 4/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9859 - loss: 0.0476 - val_accuracy: 0.9762 - val_loss: 0.0782
Epoch 5/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9908 - loss: 0.0329 - val_accuracy: 0.9778 - val_loss: 0.0763
Epoch 6/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9918 - loss: 0.0256 - val_accuracy: 0.9745 - val_loss: 0.0862
Epoch 7/10
750/750

<keras.src.callbacks.history.History at 0x7fa76c10a710>

In [None]:
# Subclassing API: define two methods. __init__ creates the layers that make up the network,
# and call describes the forward pass (how inputs flow through those layers).
class Model(keras.Model):  # inherits from keras.Model, so compile(), fit(), evaluate(), etc. are available
  def __init__(self):
    super().__init__()  # run the parent constructor: weight tracking, backpropagation, and training logic are all inherited
    self.dense1 = layers.Dense(512, activation='relu')
    self.dense2 = layers.Dense(256, activation='relu')
    self.dense3 = layers.Dense(128, activation='relu')
    self.out = layers.Dense(10, activation='softmax')  # avoid naming this `output`: Keras layers/models already define an `output` property

  def call(self, inputs):
    x = self.dense1(inputs)
    x = self.dense2(x)
    x = self.dense3(x)
    x = self.out(x)
    return x
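The subclassing pattern relies on the base class: calling `model(inputs)` goes through the parent's `__call__`, which eventually invokes your `call()`. The toy classes below (`ToyLayer`, `ToyDense`, `ToyMLP`, all hypothetical and NumPy-only) mimic just that dispatch; the real `keras.layers.Layer.__call__` does much more, such as building weights lazily on first use and tracking training state.

```python
import numpy as np

class ToyLayer:
    """Hypothetical minimal base class: __call__ forwards to call(), like a Keras layer."""
    def __call__(self, inputs):
        return self.call(inputs)

    def call(self, inputs):
        raise NotImplementedError

class ToyDense(ToyLayer):
    def __init__(self, n_in, n_out, activation=None, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.05, size=(n_in, n_out))
        self.b = np.zeros(n_out)
        self.activation = activation

    def call(self, x):
        z = x @ self.W + self.b
        return np.maximum(z, 0) if self.activation == "relu" else z

class ToyMLP(ToyLayer):
    def __init__(self):
        # __init__ only creates the sub-layers...
        self.dense1 = ToyDense(784, 512, "relu", seed=1)
        self.dense2 = ToyDense(512, 10, seed=2)

    def call(self, inputs):
        # ...call() decides how data flows through them.
        return self.dense2(self.dense1(inputs))

model = ToyMLP()
print(model(np.ones((2, 784))).shape)  # (2, 10)
```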

In [None]:
model1 = Model()

In [None]:
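# Compile and train the subclassed model1 with the same settings as the functional model.
# The log below was recorded when this cell called model.compile/model.fit instead, so it kept
# training the functional model; that is why epoch 1 already starts at about 99.6% accuracy.
model1.compile(optimizer=keras.optimizers.Adam(3e-4),loss=keras.losses.SparseCategoricalCrossentropy(),metrics=['accuracy'])
model1.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1,validation_data=(x_val,y_val))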

Epoch 1/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9964 - loss: 0.0104 - val_accuracy: 0.9808 - val_loss: 0.0788
Epoch 2/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.9976 - loss: 0.0066 - val_accuracy: 0.9799 - val_loss: 0.0907
Epoch 3/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.9970 - loss: 0.0096 - val_accuracy: 0.9804 - val_loss: 0.0894
Epoch 4/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9984 - loss: 0.0049 - val_accuracy: 0.9794 - val_loss: 0.0924
Epoch 5/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step - accuracy: 0.9985 - loss: 0.0060 - val_accuracy: 0.9798 - val_loss: 0.0945
Epoch 6/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.9980 - loss: 0.0065 - val_accuracy: 0.9826 - val_loss: 0.0867
Epoch 7/10
750/

<keras.src.callbacks.history.History at 0x796569177210>

In [None]:
# model.evaluate(x_test, y_test, batch_size=32, verbose=2) -- for evaluation
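`model.evaluate()` reports the loss and the compiled metrics on the test set. For the accuracy part, the idea is to take the most probable class in each row of softmax outputs and compare it with the integer label. This is a sketch with made-up probabilities; in practice the probability array would come from something like `model.predict(x_test)`.

```python
import numpy as np

# Made-up softmax outputs for 4 test images over 3 classes (each row sums to 1).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.20, 0.70],
                  [0.30, 0.40, 0.30],
                  [0.60, 0.30, 0.10]])
y_test = np.array([0, 2, 0, 0])

y_pred = probs.argmax(axis=1)         # most probable class per row
accuracy = np.mean(y_pred == y_test)  # fraction of correct predictions
print(y_pred, accuracy)               # [0 2 1 0] 0.75
```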