# CNNs and Image Tasks

* A Convolutional Neural Network (CNN) is a deep learning model designed for understanding images. It automatically learns patterns such as edges, curves, textures, and shapes directly from an image's pixels.

* A CNN stacks multiple layers, so the model first learns simple features and then combines them into complex ones, loosely similar to how human vision works.

* Convolution, pooling (which reduces the spatial dimensions, width and height, of a feature map while retaining the important information), activation, and fully connected layers together form a pipeline that turns an image into a numerical representation and extracts its meaning.

* CNNs have become a standard model for tasks such as image classification, object detection, and face recognition because they remove the need for manual feature engineering.

### Why in AI/ML:

* Because a CNN learns directly from raw pixels, it is one of the most reliable techniques for visual tasks in AI. It is what lets a machine "see" an image and identify what is in it.

* Weight sharing (the same filter is reused at every position) and local connectivity (each output looks only at a small patch) drastically cut the number of parameters, which makes CNNs computationally efficient and lets them train on large image datasets.

* Having far fewer parameters also helps reduce overfitting compared with a fully connected network: instead of memorizing every pixel-level detail of the whole image, the model learns general, reusable features. It does not eliminate overfitting on its own.

* CNNs are a core building block of much real-world computer vision work, such as medical image analysis, self-driving cars, and CCTV analytics.
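To make the weight-sharing point concrete, here is a small sketch that counts parameters for a 3x3 convolution with 8 filters on a 28x28 grayscale image, versus a fully connected layer that produces an output of the same size (8x26x26) from the same image. It uses only the standard parameter-count formulas (weights plus one bias per output), not PyTorch; the layer sizes are chosen to match the example model in this notebook.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    # each of the out_channels filters has in_channels * k * k weights,
    # and those weights are shared across every position in the image
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    # every output is connected to every input with its own weight
    return in_features * out_features + (out_features if bias else 0)


conv = conv2d_params(1, 8, 3)
# a dense layer mapping the 28x28 image to the same 8x26x26 output
dense = linear_params(28 * 28, 8 * 26 * 26)
print(conv, dense)
```

The convolution needs 80 parameters, while the dense layer needs over four million for the same output shape, which is why CNNs scale to large images.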

## Main components of a CNN:


### Convolution Layer:

* Slides a small filter (kernel) across the image one patch at a time and computes a weighted sum at each position, which detects important patterns such as edges.
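A minimal pure-Python sketch of what a single convolution filter computes (one channel, stride 1, no padding). The 4x6 image and the vertical-edge kernel are made-up illustrative values; in a real CNN the kernel weights are learned during training. Like PyTorch's `nn.Conv2d`, it does not flip the kernel, so strictly speaking it is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    # "valid" convolution: no padding, stride 1, output is (H-k+1) x (W-k+1)
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # weighted sum of the k_h x k_w patch whose top-left corner is (i, j)
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# a 4x6 image with a vertical edge: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# hand-picked vertical-edge filter (a CNN would learn these weights)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d_valid(image, kernel))  # -> [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The response is non-zero only where the 3x3 window straddles the dark-to-bright boundary, and zero over flat regions on either side.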

### Pooling Layer:

* Reduces the width and height of the feature maps, so the following layers have less to compute and the model runs faster.
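A pure-Python sketch of 2x2 max pooling with stride 2, which is what `nn.MaxPool2d(2)` does (its stride defaults to the kernel size). The 4x4 feature map holds made-up values.

```python
def max_pool2d(feature_map, size=2):
    # non-overlapping size x size windows (stride = size);
    # leftover rows/columns that do not fill a full window are dropped
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
print(max_pool2d(fmap))  # 4x4 -> 2x2: [[4, 5], [3, 7]]
```

Each output keeps the strongest response in its region, so width and height are halved while the most prominent features survive.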

### Activation (ReLU):

* Replaces negative values with zero (ReLU(x) = max(0, x)), which makes the model nonlinear.
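A minimal sketch of ReLU on a plain Python list, plus a one-line check of why it counts as nonlinear: a linear function f always satisfies f(a + b) = f(a) + f(b), and ReLU breaks that rule.

```python
def relu(values):
    # ReLU(x) = max(0, x): negatives become 0, positives pass through unchanged
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # -> [0.0, 0.0, 0.0, 1.5, 3.0]

# nonlinearity check: f(a + b) vs f(a) + f(b)
a, b = -1.0, 2.0
print(relu([a + b])[0], relu([a])[0] + relu([b])[0])  # -> 1.0 2.0
```

Without such a nonlinearity, stacking convolution and fully connected layers would collapse into a single linear transformation.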

### Fully Connected Layer:

* This layer makes the final decision, for example predicting whether the image shows a dog or a cat.
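Before writing the fully connected layer, you need the size of the flattened feature map. The standard output-size formula for a convolution or pooling layer (dilation 1) is floor((n + 2p - k) / s) + 1, with input size n, padding p, kernel size k, and stride s. The sketch below applies it to the setup used in the example model (28x28 input, 3x3 convolution with 8 filters, 2x2 max pooling).

```python
def output_size(n, kernel, stride=1, padding=0):
    # floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1


after_conv = output_size(28, kernel=3)                   # 3x3 conv, stride 1, no padding
after_pool = output_size(after_conv, kernel=2, stride=2)  # 2x2 max pooling
filters = 8
flat_features = filters * after_pool * after_pool         # in_features of the Linear layer
print(after_conv, after_pool, flat_features)              # -> 26 13 1352
```

Without the pooling step, the flattened size would be 8 × 26 × 26 = 5408 instead.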


```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 = input channels (grayscale), 8 = number of filters (output channels)
        # kernel_size=3 means each filter is computed over a 3x3 patch
        self.conv = nn.Conv2d(1, 8, kernel_size=3)

        self.relu = nn.ReLU()

        # picks the maximum value in each 2x2 region, halving width and height
        self.pool = nn.MaxPool2d(2)

        # final class prediction: 28x28 input -> 26x26 after conv -> 13x13 after pooling
        self.fc = nn.Linear(8 * 13 * 13, 2)

    def forward(self, x):
        x = self.relu(self.conv(x))  # feature maps: (batch, 8, 26, 26)
        x = self.pool(x)             # (batch, 8, 13, 13)
        x = torch.flatten(x, 1)      # the fully connected layer needs one vector per image
        return self.fc(x)            # class scores (logits): (batch, 2)
```

# Example:

```python
import torch.optim as optim

model = CNN()
loss_fn = nn.CrossEntropyLoss()                         # loss for classification
optimizer = optim.Adam(model.parameters(), lr=0.0001)  # updates the weights

# random fake images: batch=5, channel=1, size=28x28
images = torch.randn(5, 1, 28, 28)

# dummy target classes
labels = torch.tensor([0, 1, 0, 1, 0])

for epoch in range(1):
    # clear gradients left over from any previous step
    optimizer.zero_grad()

    # forward pass: the CNN extracts feature maps and returns class scores
    out = model(images)

    # difference between the predicted output and the real labels
    loss = loss_fn(out, labels)

    # compute gradients
    loss.backward()

    # update the parameters so the model improves
    optimizer.step()

    print("Epoch:", epoch + 1, "Loss:", loss.item())
```

The printed loss changes on every run because the images and the initial weights are random.
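The loss printed by the training loop is cross-entropy. Below is a small pure-Python sketch of what `nn.CrossEntropyLoss` computes with its default settings: the log-softmax of the scores, then the negative log-probability of the correct class, averaged over the batch. The score values are made up. It shows why a two-class model whose scores are still close together (as is typical with untrained weights) starts near ln 2 ≈ 0.69; a first-step loss far above that can hint that the output layer has more classes than intended.

```python
import math


def cross_entropy(logits, label):
    # -log(softmax(logits)[label]), computed with the log-sum-exp trick for stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    # like nn.CrossEntropyLoss with its default reduction="mean": average over the batch
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


labels = [0, 1, 0, 1, 0]
unsure = [[0.0, 0.0]] * 5      # equal scores: the model cannot tell the 2 classes apart
print(mean_cross_entropy(unsure, labels), math.log(2))

confident = [[5.0, -5.0]] * 5  # always strongly predicts class 0
print(mean_cross_entropy(confident, labels))  # large, because 2 of the 5 labels are class 1
```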


# Use pre-trained MobileNet or ResNet for inference

### CNNs with Pre-trained MobileNet or ResNet (Inference)

* Pre-trained models are models that have already been trained on large datasets such as ImageNet. Its standard classification subset contains over a million images across a thousand categories, so the model learns powerful visual features.

* MobileNet is a lightweight model designed for fast inference on mobile and low-power devices. It uses depthwise separable convolutions, which need far fewer parameters and multiply-adds than standard convolutions, making it faster.

* ResNet is a deep, high-accuracy model built from residual blocks. Residual (skip) connections let gradients flow through very deep networks, such as the 50-layer and 101-layer variants, so they can be trained while largely avoiding the vanishing-gradient problem.

* Inference means we do not train the model; we only use an already trained model to predict an output for an image, such as a class, a feature vector, or a probability.
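A rough sketch of why depthwise separable convolutions are cheaper, counting weights (bias ignored) for a 3x3 layer that maps 32 channels to 64. The channel counts are illustrative assumptions, not taken from a specific MobileNet layer. In PyTorch the two parts correspond to `nn.Conv2d(c_in, c_in, k, groups=c_in)` (depthwise) followed by `nn.Conv2d(c_in, c_out, 1)` (pointwise).

```python
def standard_conv_params(c_in, c_out, k):
    # one k x k x c_in filter per output channel
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    # depthwise: one k x k filter per input channel, no mixing across channels
    depthwise = k * k * c_in
    # pointwise: 1x1 convolution that mixes the channels into c_out outputs
    pointwise = c_in * c_out
    return depthwise + pointwise


c_in, c_out, k = 32, 64, 3   # illustrative sizes
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 1))  # -> 18432 2336 7.9
```

With stride 1, every weight is applied once per output position in both versions, so the multiply-add count shrinks by the same factor of roughly 8.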

#### Why in AI/ML:

* Pre-trained models save both time and compute. You do not have to train a model from scratch, yet you still get strong, near state-of-the-art accuracy on the tasks they were trained for.

* These models have already learned general features such as edges, textures, and shapes, so their features often transfer well to new datasets, even without further training (for example, when used as a fixed feature extractor).

* With inference alone, a developer can quickly prototype, deploy to real-time systems, and build production-ready pipelines.

* In real-world applications such as CCTV analytics, healthcare, e-commerce image search, and robotics, pre-trained CNN models are often the fastest and most reliable choice.

### MobileNet / ResNet inference PyTorch example
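MobileNet example:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# load the pretrained model (ImageNet-1k weights)
weights = models.MobileNet_V2_Weights.IMAGENET1K_V1
model = models.mobilenet_v2(weights=weights)
# for ResNet instead: weights = models.ResNet50_Weights.IMAGENET1K_V2; model = models.resnet50(weights=weights)

# eval mode because we are doing inference (dropout and batch norm switch to inference behaviour)
model.eval()

# image preprocessing: standard ImageNet resize, crop, and normalization
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# convert to RGB so grayscale or RGBA files also give 3 channels
img = Image.open("test.jpg").convert("RGB")

# add the batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
img_t = transform(img).unsqueeze(0)

# no_grad disables gradient tracking, which inference does not need
with torch.no_grad():
    # the model returns one score per ImageNet class: shape (1, 1000)
    out = model(img_t)

prob = torch.softmax(out, dim=1)
top_class = prob.argmax(dim=1).item()

# final predicted class index and its human-readable label
print(top_class, weights.meta["categories"][top_class])
```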
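The inference code keeps only the single most likely class. A common extension is to report the top few classes with their probabilities; in PyTorch that is `prob.topk(5, dim=1)`, which returns both values and indices. The pure-Python sketch below shows the same softmax-then-top-k step on made-up scores for four hypothetical classes.

```python
import math


def softmax(logits):
    # subtract the max for numerical stability; the probabilities are unchanged
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    # indices of the k largest probabilities, highest first
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


logits = [2.0, 1.0, 0.1, 3.0]   # made-up scores for 4 hypothetical classes
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(top_k(probs, 2))          # -> [3, 0]
```

Softmax preserves the ranking of the scores, so the argmax of the probabilities is always the argmax of the raw logits; the probabilities are only needed when you want confidence values.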