<a href="https://colab.research.google.com/github/Adityeah18/tensorflow/blob/main/module-3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module-3

## 📚 Convolutional Neural Networks (CNN) – Complete Explanation

### ✅ CNN vs DNN – Core Differences

| Feature         | DNN (Deep Neural Network)                          | CNN (Convolutional Neural Network)                     |
|----------------|----------------------------------------------------|--------------------------------------------------------|
| Input Type     | Flat structured data                               | 3D structured data (Image: H × W × Channels)           |
| Layer Type     | Dense (Fully Connected)                            | Convolutional + Pooling + Dense                       |
| Pattern Type   | Global pattern recognition                         | Local pattern recognition                             |
| Image Handling | Poor with spatial changes, sensitive to shifts     | Good with translations, detects features anywhere     |
| Efficiency     | High computation on images                         | More efficient due to local connections and weight sharing |

---

## 🧠 CNN Architecture Concepts

### 1. 📏 Input Dimensions

- Shape: **Height × Width × Channels**
- Channels:
  - RGB image → 3 channels
  - Grayscale → 1 channel
- Pixel range: **0 to 255** (usually normalized to 0–1)

---

### 2. 🧰 Filters (Kernels)

- Learnable matrices: e.g., `3x3`, `5x5`
- Slides over image to extract features like:
  - Edges
  - Textures
  - Patterns
- Each filter outputs a **Feature Map**

---

### 3. ⚙️ Convolution Operation

- Filter **slides** across input
- At each step:
  - Performs **dot product** between filter and image patch
  - Result is stored in **feature map**
- Deeper layers = more abstract features
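
To make the sliding dot product concrete, here is a minimal pure-Python sketch. The helper name `conv2d_valid` and the hand-picked vertical-edge kernel are illustrative assumptions; a real CNN learns its kernel values during training, and Keras does this far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (2D lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the kernel and the image patch under it
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x5 grayscale image: dark on the left (0), bright on the right (1)
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The feature map is 0 over the flat dark region and 3 wherever the kernel overlaps the edge, which is how a filter "detects" a feature.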

---

### 4. 🧍‍♂️ Stride

- How many pixels the filter moves at a time:
  - `stride = 1`: moves one pixel → high-resolution output
  - `stride = 2`: skips pixels → faster but lower-res

---

### 5. 🧱 Padding

- Adds borders to input so the filter can cover the edges
- Types:
  - `valid` → no padding (output shrinks)
  - `same` → zero-padding to keep size same as input
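
Stride and padding together determine the output size. The sketch below assumes the usual rules for a square input: `valid` gives `floor((n - k) / s) + 1` and `same` gives `ceil(n / s)`. The function name `conv_output_size` is made up for illustration.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output height (or width) of a convolution over an input of size n."""
    if padding == "same":
        # Zero-padding keeps the size equal to the input when stride is 1
        return math.ceil(n / stride)
    # 'valid': the filter must fit entirely inside the input
    return (n - kernel) // stride + 1


print(conv_output_size(32, 3, stride=1, padding="valid"))  # 30
print(conv_output_size(32, 3, stride=1, padding="same"))   # 32
print(conv_output_size(32, 3, stride=2, padding="valid"))  # 15
print(conv_output_size(32, 3, stride=2, padding="same"))   # 16
```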

---

### 6. 💧 Pooling (Downsampling)

- Reduces dimensionality
- Helps generalization and speed
- Common types:
  - **Max Pooling**: keeps max value in patch
  - **Average Pooling**: takes average
- Example: `2x2 Max Pooling` halves H and W
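
As a rough illustration of 2x2 max pooling, the sketch below works on a plain nested list. The helper name `max_pool_2x2` is an assumption for this example, not a library function.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 5], [7, 9]]
```

The 4x4 input becomes 2x2: height and width are halved while the strongest activation in each window survives.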

---

## 🔄 Typical CNN Layer Flow

```text
Input Image (32x32x3)
↓
Conv Layer (with 3x3 filters)
↓
ReLU Activation
↓
Pooling Layer (2x2)
↓
Flatten
↓
Dense Layer
↓
Softmax Output (for classification)
```

## CNN Representation

### 🧠 CNN Architecture Visual Diagram

Below is a basic representation of how a Convolutional Neural Network processes an image input:

```text
┌────────────────────┐
│  Input Image       │   ← [32 x 32 x 3] (H x W x Channels)
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Convolution Layer  │   ← Apply multiple filters (e.g., 3x3)
│ (Conv2D)           │   → Feature maps: edges, textures
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Activation (ReLU)  │   ← ReLU introduces non-linearity
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Pooling Layer      │   ← MaxPooling2D (e.g., 2x2)
│ (Downsampling)     │   → Reduces spatial size
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ More Conv + Pool   │   ← Stack deeper layers to learn
│ (Optional)         │      complex patterns
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Flatten Layer      │   ← Converts 2D feature maps into 1D
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Fully Connected    │   ← Dense layer
│ (Dense Layer)      │
└────────────────────┘
           │
           ▼
┌────────────────────┐
│ Output Layer       │   ← e.g., 10 classes → Dense(10, softmax)
└────────────────────┘
```

## Essential Libraries

In [None]:
import tensorflow as tf

tf.__version__


'2.18.0'

## Dataset
CIFAR-10 contains 60,000 32×32 color images across 10 labels, 6,000 per label (50,000 for training and 10,000 for testing).

Labels: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck

### Splitting, Loading, and Preprocessing

In [None]:
import matplotlib.pyplot as plt

from tensorflow.keras import datasets

# load_data() returns NumPy arrays already split into training and test sets
(train_image, train_label), (test_image, test_label) = datasets.cifar10.load_data()

# Pixel values range from 0 to 255, so normalize them to 0-1
train_image, test_image = train_image / 255.0, test_image / 255.0

# The 10 CIFAR-10 labels in label-index order, used when plotting
class_name = ['airplane', 'automobile', 'bird', 'cat', 'deer',
              'dog', 'frog', 'horse', 'ship', 'truck']

# Look at one example image
IMG_INDEX = 7

plt.imshow(train_image[IMG_INDEX])
plt.xlabel(class_name[train_label[IMG_INDEX][0]])
plt.show()
# Image 7 is labeled as a horse

## CNN Architecture
The architecture is simple: a stack of convolution layers (filters) and pooling layers extracts features from the images. These features are then flattened and fed to dense layers, which determine the class each image falls under.

### Model

In [None]:
from tensorflow.keras import layers, models

model = models.Sequential()
# Declare the input shape up front: 32x32 pixels, 3 channels (RGB)
model.add(layers.Input(shape=(32, 32, 3)))

# Convolution stack
# 32 filters, each 3x3, slide over the image and produce 32 feature maps
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
# Max pooling keeps the largest value in each 2x2 window, halving height and width
model.add(layers.MaxPooling2D((2, 2)))

# Extra layers add depth so the model can learn more complex patterns.
# Pooling has already shrunk the feature maps, so the added filters stay affordable.
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

# The summary shows depth (channels) increasing while the spatial dimensions shrink
model.summary()


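
To see why the summary reports growing depth and shrinking spatial size, the shapes can be traced by hand. This is a sketch that assumes the three blocks above: 3x3 convolutions with `valid` padding, each followed by 2x2 max pooling. `trace_shapes` is a hypothetical helper, not part of Keras.

```python
def trace_shapes(height, width, channels, conv_filters, kernel=3, pool=2):
    """Follow (H, W, C) through repeated Conv2D('valid') + MaxPooling2D blocks."""
    shapes = [("input", (height, width, channels))]
    for filters in conv_filters:
        # A 'valid' convolution trims kernel - 1 pixels from each dimension
        height, width = height - kernel + 1, width - kernel + 1
        channels = filters  # one output channel per filter
        shapes.append(("conv", (height, width, channels)))
        # Pooling divides height and width, dropping any leftover row/column
        height, width = height // pool, width // pool
        shapes.append(("pool", (height, width, channels)))
    return shapes


shapes = trace_shapes(32, 32, 3, conv_filters=[32, 64, 64])
for name, shape in shapes:
    print(f"{name:5s} {shape}")
```

The final shape is `(2, 2, 64)`: the image has been squeezed from 32x32 down to 2x2, while the depth has grown from 3 channels to 64.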


### Dense Layers

In [None]:
# Flatten the feature maps into a 1D vector
model.add(layers.Flatten())

# Dense layer that learns to classify from the extracted features
model.add(layers.Dense(64, activation='relu'))

# Output layer: one unit per label; softmax turns the scores into probabilities
model.add(layers.Dense(10, activation='softmax'))

# Each summary row lists a layer, its output shape, and its number of parameters
model.summary()
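
One way to read the parameter column of the summary is to recompute it by hand. The sketch below assumes the layer stack defined above (three 3x3 convolutions, each followed by pooling, ending in a 2x2x64 feature map); the helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


flattened = 2 * 2 * 64  # final feature map (2, 2, 64) flattened to 256 values

counts = {
    "conv2d, 32 filters": conv2d_params(3, 3, 3, 32),
    "conv2d, 64 filters": conv2d_params(3, 3, 32, 64),
    "conv2d, 64 filters (second)": conv2d_params(3, 3, 64, 64),
    "dense, 64 units": dense_params(flattened, 64),
    "dense, 10 units": dense_params(64, 10),
}
for layer, count in counts.items():
    print(f"{layer:30s} {count}")
print("total", sum(counts.values()))
```

Pooling and flatten layers have no parameters, so they contribute nothing to the total.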

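## Compile

In [None]:
from tensorflow.keras import losses

# Labels are integer class indices, so use the sparse variant of categorical cross-entropy
model.compile(optimizer='adam',
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])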
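
For intuition about what this loss measures, here is a minimal single-example sketch in plain Python. It is a simplification: Keras averages the loss over a batch and clips probabilities for numerical safety, and the function names here are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class, probabilities):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probabilities[true_class])


probs = softmax([2.0, 1.0, 0.1])
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)  # confident and right
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)  # confident and wrong
print(loss_if_class_0 < loss_if_class_2)  # True

# A model that guesses uniformly over 10 classes scores ln(10), about 2.30
uniform_loss = sparse_categorical_crossentropy(3, [0.1] * 10)
print(round(uniform_loss, 2))  # 2.3
```

"Sparse" only refers to the label format: the true class is given as an integer index rather than a one-hot vector.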

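## Fitting/Training

In [None]:
# Validate on the held-out test split. Passing the training data here instead
# (as in the run logged below) only measures how well the model fits data it has already seen.
model.fit(train_image, train_label, epochs=10,
          validation_data=(test_image, test_label))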

Epoch 1/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 17s 8ms/step - accuracy: 0.3352 - loss: 1.7885 - val_accuracy: 0.5365 - val_loss: 1.2836
Epoch 2/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 7ms/step - accuracy: 0.5656 - loss: 1.2184 - val_accuracy: 0.6335 - val_loss: 1.0399
Epoch 3/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.6326 - loss: 1.0448 - val_accuracy: 0.6543 - val_loss: 0.9858
Epoch 4/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 12s 7ms/step - accuracy: 0.6710 - loss: 0.9407 - val_accuracy: 0.7016 - val_loss: 0.8615
Epoch 5/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.6993 - loss: 0.8560 - val_accuracy: 0.7284 - val_loss: 0.7763
Epoch 6/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.7239 - loss: 0.8005 - val_accuracy: 0.7578 - val_loss: 0.7021
Epoch 7/10
[output truncated]

<keras.src.callbacks.history.History at 0x7a137f193510>

## Evaluate

In [None]:
loss,accuracy=model.evaluate(test_image,test_label,verbose=2)
print(f'the accuracy is:{accuracy:.04f}')

313/313 - 1s - 2ms/step - accuracy: 0.7136 - loss: 0.8616
the accuracy is:0.7136


---
## Working with Small Datasets

### Data Augmentation
Data augmentation is a technique that artificially expands a training dataset by creating modified versions of its images.

These modifications can include:

- Rotation
- Width/height shift
- Zoom in/out
- Horizontal/vertical flip
- Brightness/contrast adjustment
- Shear/stretch/skew

This helps the model:

- Learn features from different perspectives
- Become robust to variations
- Generalize better to unseen data

🧠 In short: instead of collecting 10,000 new images, you can teach your model more by twisting and remixing your 1,000 images in creative ways.
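
Two of these transformations can be sketched on a tiny grid of pixel values to show what "a modified version of an image" means. The helper names are hypothetical, and a real augmentation pipeline picks the amounts randomly for every image.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def shift_right(image, pixels):
    """Shift every row right, filling the gap by repeating the nearest edge pixel."""
    return [[row[0]] * pixels + row[: len(row) - pixels] for row in image]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)
shifted = shift_right(image, 1)
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
print(shifted)  # [[1, 1, 2], [4, 4, 5]]
```

Both results still show the same "object", so the label is unchanged, but the model sees pixel patterns it has not met before.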



In [None]:
# Imports for image augmentation
# Focus on understanding how it works rather than the syntax
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generator object whose parameters set the random transformations to apply
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest'
)
# Pick one training image to transform
img_gen_test = train_image[20]
# Convert the image to a NumPy array
img = image.img_to_array(img_gen_test)
# Add a batch dimension: (32, 32, 3) -> (1, 32, 32, 3)
img = img.reshape((1,) + img.shape)

# flow() yields augmented batches forever, so stop after five images
i = 0
for batch in datagen.flow(img, batch_size=1):
    plt.figure(i)
    plt.imshow(image.img_to_array(batch[0]))
    i += 1
    if i > 4:
        break
plt.show()

### Pretrained Models
CNNs trained on massive datasets (like ImageNet, which has over 1 million images across 1,000 classes) can be reused for transfer learning. Instead of training a CNN from scratch, we can:

- Use the pretrained CNN as a feature extractor, and
- Attach our own custom classifier (like a DNN) at the end for a specific task.

📌 Think of it like this: the CNN already knows how to detect edges, textures, and shapes, so why reinvent the wheel?

---

#### Fine-Tuning
Fine-tuning is the process of unfreezing some layers of the pretrained model and retraining them (usually the deeper ones) on your specific dataset.

Why?

Early layers in a CNN learn general features like edges, corners, textures. These are useful across all kinds of images.

Later layers learn task-specific features (like cat faces, dog paws, etc.). You might want to tweak these to match your dataset (like medical scans, satellite images, etc.).

So instead of retraining the entire model, we just fine-tune the final layers to improve performance without starting from zero.
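
The freezing pattern can be sketched as follows. The helper `unfreeze_from` and the index `fine_tune_at=7` are assumptions for illustration, and the stand-in objects exist only so the sketch runs without TensorFlow; with Keras you would pass the real base model, whose layers expose the same `trainable` attribute.

```python
from types import SimpleNamespace


def unfreeze_from(base_model, fine_tune_at):
    """Freeze layers before index `fine_tune_at`; leave the later ones trainable."""
    base_model.trainable = True
    for index, layer in enumerate(base_model.layers):
        layer.trainable = index >= fine_tune_at
    # Report how many layers will be updated during training
    return sum(layer.trainable for layer in base_model.layers)


# Stand-ins for a pretrained model with 10 frozen layers (not real Keras objects)
fake_layers = [SimpleNamespace(name=f"layer_{i}", trainable=False) for i in range(10)]
fake_model = SimpleNamespace(layers=fake_layers, trainable=False)

trainable_count = unfreeze_from(fake_model, fine_tune_at=7)
print(trainable_count)  # 3
```

With a real Keras model, the model has to be compiled again after changing `trainable`, and a low learning rate is the usual choice so the pretrained weights are only nudged.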




### Using Pretrained Models
We will separate cats from dogs.


In [None]:

import tensorflow_datasets as tfds
tfds.disable_progress_bar()  # call the function to silence download progress bars

# cats_vs_dogs ships a single 'train' split, so slice it into training, validation, and test sets
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)
# Unlike datasets.cifar10.load_data(), which returns NumPy arrays held in memory,
# tfds.load returns tf.data.Dataset objects that stream (image, label) pairs




Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/cats_vs_dogs/4.0.1...



Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.1. Subsequent calls will reuse this data.


In [None]:
import matplotlib.pyplot as plt
get_label_name = metadata.features['label'].int2str  # function that maps a label index to its name

# Display 5 images from the dataset
for img, label in raw_train.take(5):
  plt.figure()
  plt.imshow(img)
  plt.title(get_label_name(label))

### Data Processing

In [None]:
# As seen above, the images come in different sizes.
# Pick a target size that shrinks the images rather than stretching them,
# since stretching makes the subject harder to recognize.
# For this dataset, every image is resized to 160x160.
import tensorflow as tf

IMG_SIZE = 160

def format_example(image, label):
  """Scale pixels from [0, 255] to [-1, 1] and resize to IMG_SIZE x IMG_SIZE."""
  image = tf.cast(image, tf.float32)
  image = (image / 127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label

In [None]:
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)
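
The `image / 127.5 - 1` step maps pixel values from [0, 255] to [-1, 1], the input range MobileNetV2 was trained with. A small sketch of the arithmetic, using illustrative function names:

```python
def to_model_range(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1


def to_display_range(value):
    """Map a value from [-1, 1] back to [0, 1], e.g. for plotting."""
    return (value + 1) / 2


print(to_model_range(0))      # -1.0
print(to_model_range(127.5))  # 0.0
print(to_model_range(255))    # 1.0
print(to_display_range(to_model_range(255)))  # 1.0
```

The inverse mapping is useful when plotting, because image viewers expect floats in [0, 1] rather than [-1, 1].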

In [None]:
for image, label in train.take(5):
  plt.figure()
  plt.imshow((image + 1) / 2)  # map [-1, 1] back to [0, 1] for display
  plt.title(get_label_name(label))

Now every image has the same size.

In [None]:
#To check the image sizes
for img, label in raw_train.take(5):
  print("Original shape:", img.shape)

for img, label in train.take(5):
  print("New shape:", img.shape)

Original shape: (262, 350, 3)
Original shape: (409, 336, 3)
Original shape: (493, 500, 3)
Original shape: (375, 500, 3)
Original shape: (240, 320, 3)
New shape: (160, 160, 3)
New shape: (160, 160, 3)
New shape: (160, 160, 3)
New shape: (160, 160, 3)
New shape: (160, 160, 3)


In [None]:
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
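
Batching groups examples so the model updates its weights once per batch rather than once per image. The sketch below imitates that grouping on a toy list of (image, label) pairs; `batches` and `steps_per_epoch` are hypothetical helpers, not `tf.data` functions.

```python
import math


def batches(examples, batch_size):
    """Yield (images, labels) groups of at most `batch_size` examples."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        images = [image for image, _ in chunk]
        labels = [label for _, label in chunk]
        yield images, labels


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


toy = [(f"img{i}", i % 2) for i in range(5)]
toy_batches = list(batches(toy, 2))
print(toy_batches[-1])  # (['img4'], [0])

# CIFAR-10 with 32 images per batch: 50,000 training and 10,000 test images
print(steps_per_epoch(50000, 32))  # 1563
print(steps_per_epoch(10000, 32))  # 313
```

The last two numbers match the `1563/1563` and `313/313` step counts in the CIFAR-10 logs earlier. Because every batch already carries its labels, a batched dataset is passed to `fit` on its own, without a separate `y` argument.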

## Picking a Pretrained Model
The pretrained model is MobileNetV2, developed by Google. It was trained on ImageNet, which has over a million images across 1,000 classes.

In [None]:
IMG_SHAPE=(160,160,3)
base_model=tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,include_top=False,
                                             weights='imagenet')
base_model.summary()


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_160_no_top.h5
[1m9406464/9406464[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


In [None]:
# Freeze the base model so its weights do not change during training
base_model.trainable=False
base_model.summary()

### Model

In [None]:
# Classifier head: global average pooling collapses each feature map to a single value
global_avg = tf.keras.layers.GlobalAveragePooling2D()

# There are only two classes (cat or dog), so a single output unit is enough.
# The sigmoid turns it into a probability, which is what the loss expects by default.
prediction = tf.keras.layers.Dense(1, activation='sigmoid')

# Assemble the model
model = tf.keras.Sequential([base_model,
                             global_avg,
                             prediction])
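
Global average pooling replaces each feature map with its mean, so a stack of maps becomes one number per channel. A minimal sketch on nested lists indexed as `[row][column][channel]`; the function name is illustrative.

```python
def global_average_pooling(feature_maps):
    """Average every channel over all spatial positions."""
    height = len(feature_maps)
    width = len(feature_maps[0])
    channels = len(feature_maps[0][0])
    return [
        sum(feature_maps[r][c][k] for r in range(height) for c in range(width))
        / (height * width)
        for k in range(channels)
    ]


# A 2x2 feature map with 2 channels
maps = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]
pooled = global_average_pooling(maps)
print(pooled)  # [2.5, 25.0]
```

For a 160x160 input, MobileNetV2 without its top produces a 5x5x1280 output, so this step turns it into a vector of 1280 values for the dense layer.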

### Compile

In [None]:
#compiling
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss='BinaryCrossentropy',
              metrics=['accuracy'])
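
Binary cross-entropy can be computed from a probability or directly from a raw score (a logit). The sketch below shows the two forms agree when the probability is the sigmoid of the logit. It handles one example at a time, and the function names are illustrative.

```python
import math


def sigmoid(logit):
    """Squash a raw score into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-logit))


def bce_from_probability(label, probability):
    """Binary cross-entropy when the model outputs a probability."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1 - probability))


def bce_from_logit(label, logit):
    """Same loss computed from a raw score, in a numerically stable form."""
    return max(logit, 0) - logit * label + math.log1p(math.exp(-abs(logit)))


logit = 2.0
for label in (0, 1):
    from_prob = bce_from_probability(label, sigmoid(logit))
    from_logit = bce_from_logit(label, logit)
    print(label, abs(from_prob - from_logit) < 1e-9)  # True for both labels
```

In Keras, `BinaryCrossentropy` assumes probabilities by default (`from_logits=False`). If the final layer has no activation and emits a logit, the loss should be created with `from_logits=True`; with a sigmoid output, the default applies.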

### Fitting/Training

In [None]:
# Each element of train_batches is an (images, labels) batch, so no separate y is passed
history = model.fit(train_batches,
                    epochs=3,
                    validation_data=validation_batches)


Epoch 1/3
582/582 ━━━━━━━━━━━━━━━━━━━━ 47s 66ms/step - accuracy: 0.6522 - loss: 2.8789 - val_accuracy: 0.8091 - val_loss: 1.4504
Epoch 2/3
582/582 ━━━━━━━━━━━━━━━━━━━━ 34s 57ms/step - accuracy: 0.8438 - loss: 1.1254 - val_accuracy: 0.8749 - val_loss: 0.9421
Epoch 3/3
582/582 ━━━━━━━━━━━━━━━━━━━━ 39s 54ms/step - accuracy: 0.8958 - loss: 0.8062 - val_accuracy: 0.9089 - val_loss: 0.5936


### Evaluate

In [None]:
#Evaluate
loss,accuracy=model.evaluate(test_batches,verbose=2)
print(f'{accuracy:.03f}')

73/73 - 3s - 39ms/step - accuracy: 0.9144 - loss: 0.5648
0.914


### Saving

In [None]:
# Save the trained model in HDF5 format
model.save('dogs_vs_cats.h5')



### Loading the Saved Model

In [None]:
# Load the saved model
new_model = tf.keras.models.load_model('dogs_vs_cats.h5')


