In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
"""
What is Deep Learning?

Deep Learning is a subset of Machine Learning (ML), which itself is a subset of Artificial Intelligence (AI).
    AI = making machines smart enough to do tasks that typically require human intelligence.
    ML = making machines learn from data and improve over time without explicit programming for each task.
    Deep Learning (DL) = a specialized branch of ML that uses artificial neural networks with many layers (hence "deep") to model and solve complex problems.

The term "deep" refers to the number of layers in the neural network. Traditional ML models are shallow (no layers at all, or only one or two), whereas deep learning models have:
    Multiple hidden layers between the input and output layers.
    Layers that learn hierarchical representations of data, from simple features in early layers to complex features in later layers.

"""

In [1]:
"""
+----------------------------------+--------------------------------------------------------+-----------------------------------------------------------+
| Feature                          | Machine Learning (ML)                                  | Deep Learning (DL)                                        |
+----------------------------------+--------------------------------------------------------+-----------------------------------------------------------+
| Definition                       | Subset of AI where algorithms learn from data.         | Subset of ML that uses multi-layered neural networks.     |
| Data Dependency                  | Works well with smaller datasets.                      | Requires large amounts of data to perform well.           |
| Feature Engineering              | Manual: features must be selected and created.         | Automatic: extracts features via neural layers.           |
| Execution Time                   | Generally faster for small/medium datasets.            | Slower due to complex architectures and high computation. |
| Interpretability                 | More interpretable (e.g., decision trees, SVM).        | Less interpretable (black-box models).                    |
| Hardware Dependency              | Can run on CPU easily.                                 | Usually needs a GPU/TPU to train efficiently.             |
| Accuracy with Big Data           | Plateaus or improves marginally.                       | Accuracy improves significantly with more data.           |
| Examples of Algorithms           | Linear Regression, SVM, Decision Trees, KNN.           | CNN, RNN, LSTM, GAN, Transformers.                        |
| Use Cases                        | Fraud detection, recommendation systems, spam filters. | Image recognition, speech-to-text, NLP tasks.             |
| Architecture                     | Simple models, shallow structures.                     | Deep Neural Networks (many layers).                       |
| Learning Approach                | Supervised/semi-supervised/unsupervised.               | Mostly supervised (some unsupervised like autoencoders).  |
| Model Size                       | Relatively small models.                               | Large models with millions of parameters.                 |
| Training Time                    | Shorter training time.                                 | Requires longer training time.                            |
| Scalability                      | Harder to scale on large datasets.                     | Easily scales with data and computing power.              |
| Performance on Unstructured Data | Limited (needs preprocessing).                         | Excellent (especially for images, audio, text).           |
+----------------------------------+--------------------------------------------------------+-----------------------------------------------------------+

"""


In [None]:
"""
Artificial neurons and artificial neural networks

Neurons are the fundamental units of the brain and nervous system. They receive, process,
and transmit information via electrical and chemical signals.

Biological neurons are specialized cells. Artificial neurons (the units of neural networks
in machine learning) are mathematical functions that are loosely inspired by them
but are structurally very different.

How a biological neuron works (signal flow):
    1. Signal reception: Dendrites receive chemical/electrical signals from other neurons.
    2. Processing: Signals are summed in the cell body. If the total input exceeds a threshold, the neuron fires.
    3. Signal transmission: An action potential is generated and travels down the axon.
    4. Synaptic transmission: Neurotransmitters are released into the synaptic cleft and bind to receptors on the next neuron.
       This process is known as synaptic communication.

Inspired by biological neurons, artificial neurons (used in artificial neural networks)
are mathematical functions that compute weighted sums and apply activation functions like ReLU, Sigmoid, Tanh, etc.

An artificial neuron takes multiple inputs, applies weights, adds a bias, 
and passes the result through an activation function
E.g. Output = Activation( w1*x1 + w2*x2 + ... + wn*xn + b )
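
As a minimal sketch of the formula above, a single artificial neuron can be written with NumPy. The input values, weights, and bias are made up for illustration, and the helper name neuron_output is hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through, clips negatives to 0
    return np.maximum(0.0, z)

def neuron_output(x, w, b, activation):
    # Weighted sum of inputs plus bias, passed through the activation
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.0, 2.0])   # illustrative inputs x1..x3
w = np.array([0.4, 0.3, -0.2])   # illustrative weights w1..w3
b = 0.1                          # illustrative bias

# Here z = 0.2 - 0.3 - 0.4 + 0.1, which is negative
print("ReLU output:   ", neuron_output(x, w, b, relu))
print("Sigmoid output:", neuron_output(x, w, b, sigmoid))
print("Tanh output:   ", neuron_output(x, w, b, np.tanh))
```

Because the weighted sum is negative, ReLU outputs 0, sigmoid outputs a value below 0.5, and tanh outputs a negative value.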



Major types of deep learning networks covered in this notebook:
    1. Perceptron
    2. Multi-layered perceptron (MLP, often simply called an ANN)
    3. Convolutional neural network (CNN)
"""

In [None]:
#Perceptron explanation

![Alt text](7.images\perceptron.png)


In [None]:
"""
A Perceptron is the simplest type of artificial neural network,
and is used for binary classification. It was invented by Frank Rosenblatt in 1958.

It mimics a single biological neuron.

Structure of a Perceptron:

1. Inputs: one input per feature (column), e.g. x1 = CGPA and x2 = SGPA.
2. Weights: each input has a weight w, a learnable parameter that determines how much influence that input feature has on the final output.
3. Bias: a learnable constant that allows the model to shift the decision boundary.
   (A decision boundary is the surface (line, curve, plane, or hyperplane) that separates different classes in the feature space based on the learned model.)
4. Summation function: computes the weighted sum (the dot product of inputs and weights) and adds the bias: z = w1*x1 + w2*x2 + ... + wn*xn + b
5. Activation function: a mathematical operation applied to the output of the summation (z) to decide whether the neuron should be activated (fired) or not, i.e. y = f(z).
   The classic perceptron uses a step function: y = 1 if z >= 0, else 0.
   In larger networks, non-linear activation functions such as sigmoid, tanh, and ReLU add non-linearity, enabling the network to learn complex patterns.
6. Output: must be binary (0 or 1).


How it learns:
1. Weights and bias are initialized (usually to 0 or small random numbers).
2. For a given input x, compute the prediction y_pred.
3. If the prediction is wrong, update the weights and bias using the perceptron learning rule:
       w = w + eta * (y_true - y_pred) * x
       b = b + eta * (y_true - y_pred)
   where eta is the learning rate (eta0 in scikit-learn's Perceptron).

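The learning steps above can be sketched from scratch as follows. This is a minimal illustration of the standard perceptron learning rule, not scikit-learn's implementation; the function names, the learning rate, and the AND-gate toy data are assumptions chosen for the example:

```python
import numpy as np

def step(z):
    # Step activation: 1 if z >= 0, else 0
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, eta=1.0, epochs=100):
    # Step 1: initialize weights and bias to zero
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # Step 2: compute the prediction for one sample
            y_pred = 1 if np.dot(w, xi) + b >= 0 else 0
            # Step 3: update only when the prediction is wrong
            update = eta * (yi - y_pred)
            if update != 0:
                w += update * xi
                b += update
                mistakes += 1
        if mistakes == 0:   # every sample classified correctly
            break
    return w, b

# Toy, linearly separable data: the logical AND of two inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = step(X @ w + b)
print("Predictions:", preds.tolist())   # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule is guaranteed to stop making mistakes after a finite number of updates; on data that is not linearly separable it would keep updating until the epoch limit.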
"""

In [4]:
#Perceptron

data = {
    'CGPA': [9.1, 7.5, 8.3, 6.8, 7.9, 9.0, 6.2, 8.6, 7.2, 5.9,
             8.9, 9.2, 6.5, 7.8, 7.0, 8.1, 9.5, 6.7, 8.0, 5.5],
    'SGPA': [9.3, 7.2, 8.5, 6.7, 8.1, 9.4, 6.0, 8.7, 7.1, 5.8,
             9.1, 9.5, 6.3, 7.9, 6.8, 8.3, 9.6, 6.6, 8.2, 5.2],
    'Placed': ['Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No',
               'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No']
}

df = pd.DataFrame(data)

# Encode the target as numbers: 'Yes' -> 1, 'No' -> 0
d = {'Yes': 1, 'No': 0}
df['Placed'] = df['Placed'].map(d)
df

Unnamed: 0,CGPA,SGPA,Placed
0,9.1,9.3,1
1,7.5,7.2,0
2,8.3,8.5,1
3,6.8,6.7,0
4,7.9,8.1,1
5,9.0,9.4,1
6,6.2,6.0,0
7,8.6,8.7,1
8,7.2,7.1,0
9,5.9,5.8,0


In [6]:
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

X = df[['CGPA', 'SGPA']]  # Features
y = df['Placed']          # Target (1 = placed, 0 = not placed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Perceptron(max_iter=1000, eta0=0.1, random_state=42)
model.fit(X_train, y_train)

# Predict once for a new student, then map the numeric label back to 'Yes'/'No'
new_student = pd.DataFrame({'CGPA': [8.5], 'SGPA': [8.2]})
label_names = {val: key for key, val in d.items()}
print(label_names[int(model.predict(new_student)[0])])


Yes


In [None]:
#multi layered perceptron (ANN)

![Alt text](7.images\ANN.png)


In [None]:
"""
# Multi-Layered Perceptron (MLP) - Detailed Explanation

# 1. Definition:
# Multi-Layered Perceptron (MLP) is an Artificial Neural Network (ANN) consisting of multiple layers of neurons (also called perceptrons).
# It is designed to model complex, non-linear relationships in data by learning hierarchical feature representations.

# 2. Extension of Single-Layer Perceptron:
# Unlike a single-layer perceptron which can only solve linearly separable problems,
# MLP overcomes this limitation by adding one or more hidden layers and using non-linear activation functions.
# This allows it to learn and approximate non-linear decision boundaries.

# 3. Linear vs Non-linear Separability:
# - Linear separability means classes can be separated by a single straight line (or hyperplane in higher dimensions) without error.
# - If such a hyperplane exists, data is linearly separable.
# - Otherwise, data is non-linearly separable, requiring more complex models like MLP.

# 4. Architecture of MLP:
# - Input Layer:
#   Accepts raw input features (x1, x2, x3, ..., xn). 
#   This layer only passes inputs forward and does not perform computations.
#
# - Hidden Layer(s):
#   One or more layers between input and output layers.
#   Each neuron in these layers:
#     * Receives inputs weighted by learned weights.
#     * Adds a bias term.
#     * Applies a non-linear activation function (e.g., ReLU, sigmoid, tanh).
#   Hidden layers transform input features into intermediate representations,
#   enabling the network to learn non-linear and hierarchical patterns in data.
#   Intermediate representations are the new features or abstractions that the network
#   creates inside the hidden layers by transforming raw input features through learned
#   weights, biases, and non-linear activations.

# - Output Layer:
#   Receives activations from the last hidden layer.
#   Combines them linearly (weighted sum + bias).
#   Applies an activation function suitable for the task (e.g., sigmoid for binary classification, softmax for multi-class).
#   Produces the final output/prediction.

# 5. Computation Flow (Forward Propagation):
# For each layer l:
#   z^(l) = W^(l) * a^(l-1) + b^(l)    # Linear transformation
#   a^(l) = f(z^(l))                   # Non-linear activation function
# where:
#   - W^(l) is the weight matrix of layer l
#   - b^(l) is the bias vector of layer l
#   - a^(l-1) is the activation from previous layer (input features for l=1)
#   - f(.) is the activation function (e.g., ReLU, sigmoid)

# 6. Training:
# MLPs are trained using supervised learning.
# - A loss function (e.g., cross-entropy, mean squared error) measures prediction error.
# - Backpropagation computes gradients of loss w.r.t weights and biases using chain rule.
# - Optimizers (e.g., gradient descent, Adam) update parameters to minimize loss.

# 7. Importance of Hidden Layers:
# - Introduce non-linearity to the model.
# - Extract and learn useful intermediate features.
# - Enable learning of complex patterns beyond linear decision boundaries.
# - Facilitate hierarchical feature abstraction (simple features combined into complex ones).

# 8. Summary:
# MLPs are universal function approximators: with enough hidden units and a non-linear activation, they can approximate any continuous function on a closed, bounded input region to arbitrary accuracy.
# They form the basis of deep learning when stacked into many layers.

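The forward-propagation equations in section 5 can be sketched in NumPy as shown here. The layer sizes (4 inputs, 5 hidden units, 3 outputs), the random weights, and the input vector are assumptions for illustration only; no training happens in this snippet:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, layers):
    # layers is a list of (W, b) pairs, one pair per layer
    a = x                                  # a^(0) is the input
    for i, (W, b) in enumerate(layers):
        z = W @ a + b                      # z^(l) = W^(l) a^(l-1) + b^(l)
        is_output = i == len(layers) - 1
        # Hidden layers use ReLU; the output layer uses softmax
        a = softmax(z) if is_output else relu(z)
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(5, 4)), np.zeros(5)),   # hidden layer: 4 -> 5
    (rng.normal(size=(3, 5)), np.zeros(3)),   # output layer: 5 -> 3
]

x = np.array([5.1, 3.5, 1.4, 0.2])   # one hypothetical sample with 4 features
probs = forward(x, layers)
print("Class probabilities:", probs)
print("Sum of probabilities:", probs.sum())
```

With untrained random weights the probabilities are meaningless; the point is only that each layer applies a linear transformation followed by an activation.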
"""


In [8]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import numpy as np

# 1. Load and preprocess the Iris dataset
iris = load_iris()
X = iris.data       # shape (150,4)
y = iris.target     # shape (150,)

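# Scale features for better training
# (note: fitting the scaler on the full dataset before splitting leaks test-set
#  statistics into training; in real work, fit it on the training split only)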
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert to tensors (a tensor is a multi-dimensional array)
X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_tensor, y_tensor, test_size=0.3, random_state=42, stratify=y_tensor
)

# 2. Define a simple MLP with one hidden layer
class SimpleMLP(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=5, output_dim=3):
        super(SimpleMLP, self).__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.activation = nn.ReLU()
        self.output = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Forward pass with intermediate representation extraction
        z1 = self.hidden(x)          # Linear transform (hidden layer)
        a1 = self.activation(z1)     # Hidden layer activations (intermediate representation)
        out = self.output(a1)        # Output layer (logits)
        return out, a1               # Return both final output and intermediate reps

# Instantiate model
model = SimpleMLP()

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# 3. Training loop
epochs = 100
for epoch in range(epochs):
    model.train()
    
    optimizer.zero_grad()
    outputs, _ = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 20 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# 4. Evaluate on test data and print accuracy
model.eval()
with torch.no_grad():
    outputs, intermediate_activations = model(X_test)
    predicted = outputs.argmax(dim=1)  # index of the highest logit = predicted class
    acc = accuracy_score(y_test.numpy(), predicted.numpy())
    print(f"\nTest Accuracy: {acc*100:.2f}%")

# 5. Print the intermediate representations (hidden layer activations) for the test samples
print("\nIntermediate Representations (Hidden layer activations) for test samples:")
print(intermediate_activations.numpy())


Epoch [20/100], Loss: 0.7768
Epoch [40/100], Loss: 0.6019
Epoch [60/100], Loss: 0.4525
Epoch [80/100], Loss: 0.3176
Epoch [100/100], Loss: 0.2221

Test Accuracy: 93.33%

Intermediate Representations (Hidden layer activations) for test samples:
[[3.7363749  1.8109498  2.4422274  1.6650616  0.        ]
 [2.0104938  0.28940052 0.55733836 0.8819067  0.        ]
 [2.6664417  0.7620626  1.1081314  1.109611   0.        ]
 [1.161186   0.31915683 0.41723835 0.9242025  0.        ]
 [2.1518424  1.0848136  1.0675782  1.2661293  0.        ]
 [3.0458632  2.5037713  2.3608027  2.021271   0.        ]
 [3.0107298  0.32392246 0.5312183  0.9194027  0.        ]
 [2.2813072  0.         0.01360255 0.6162778  0.        ]
 [0.         0.         0.         0.         0.        ]
 [2.9028175  2.0751238  2.0483236  1.9226776  0.        ]
 [0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.        ]
 [4.320433   1.9523396  2.4099846  1.7166615  0.        ]
 [

In [None]:
#convolutional neural network 

![Alt text](7.images\CNN.webp)


In [None]:
"""
A **Convolutional Neural Network (CNN)** is a type of deep learning model specifically designed for **processing data with grid-like topology**, such as **images** (2D grids of pixels), **audio waveforms** (1D sequences), or **audio spectrograms** (2D time-frequency grids). CNNs are a subclass of **Artificial Neural Networks (ANNs)** that are particularly effective for **visual tasks** such as classification, detection, and segmentation.

---

## 🔍 Why Use CNNs?

Traditional ANNs **struggle with high-dimensional inputs** like images. For example, a 256×256 RGB image has 256×256×3 = **196,608 input features**. Fully connecting every input to every neuron in the first hidden layer would require a very large number of weights, which leads to **huge computation and overfitting**.

CNNs **reduce the number of parameters**, capture **spatial features**, and gain a degree of **translation invariance** using three core concepts:

1. **Local connectivity**
2. **Parameter sharing**
3. **Downsampling (pooling)**
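
To make the parameter savings concrete, here is a rough back-of-the-envelope sketch. The layer sizes (a dense layer with 128 units, a convolutional layer with 32 filters of size 3×3) are assumptions chosen for illustration, and the helper names are hypothetical:

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

def conv2d_params(in_channels, out_channels, kernel_size):
    # Each filter has kernel_size x kernel_size weights per input channel, plus one bias
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

n_inputs = 256 * 256 * 3               # flattened 256x256 RGB image
dense = dense_params(n_inputs, 128)    # every pixel value connected to every unit
conv = conv2d_params(3, 32, 3)         # 32 filters shared across all image positions

print(f"Dense layer parameters: {dense:,}")   # 25,165,952
print(f"Conv layer parameters:  {conv:,}")    # 896
```

The convolutional layer's parameter count does not depend on the image size at all, because the same small filters are reused at every position.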

---

## 🧱 CNN Architecture Components (Layer by Layer)

### 1. **Input Layer**

* Accepts the raw image (e.g., 28×28 grayscale image → shape = 28×28×1)

---

### 2. **Convolutional Layer (`Conv2D`)**

* **Purpose**: Detect **features** like edges, corners, textures, shapes, etc.
* **How it works**: Uses small filters (kernels), e.g., 3×3, that **slide over** the input image and perform **element-wise multiplication and summation**.

📌 **Mathematically:**

$$
\text{Feature map} = \text{Input} * \text{Kernel}
$$

* Each kernel produces **one feature map**
* **Parameters** in kernels are learned during training

#### 🔑 Hyperparameters:

* **Number of filters** (feature detectors)
* **Kernel size** (e.g., 3×3, 5×5)
* **Stride** (how many pixels to move the filter each step)
* **Padding** (zero-padding to retain dimensions)
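
The sliding-window operation and the effect of stride and padding can be sketched in plain NumPy. This is an unoptimized, single-channel illustration; the function name conv2d, the 5×5 input, and the edge-detecting kernel are assumptions for the example. Like deep learning frameworks, it does not flip the kernel (strictly speaking this is cross-correlation):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Zero-pad all four sides of the image
    if padding > 0:
        image = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    # Output size: (padded size - kernel size) // stride + 1
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            # Element-wise multiplication followed by summation
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)        # simple vertical-edge filter

print(conv2d(image, kernel).shape)             # (3, 3): no padding shrinks the output
print(conv2d(image, kernel, padding=1).shape)  # (5, 5): padding of 1 retains the size
print(conv2d(image, kernel, stride=2).shape)   # (2, 2): larger stride, smaller output
```

In this toy image every value increases by 1 from left to right, so the filter gives the same response (-6) at every position of the unpadded output.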

---

### 3. **Activation Layer (ReLU)**

* Applies **non-linearity** to feature maps

📌 **Why?** CNN without non-linearity becomes just a linear model.

* Most common: **ReLU (Rectified Linear Unit)**

  $$
  \text{ReLU}(x) = \max(0, x)
  $$

---

### 4. **Pooling Layer (`MaxPooling2D`)**

* **Purpose**: Downsample the spatial dimension → reduce computation and overfitting

📌 Types:

* **Max pooling**: selects maximum value in a patch
* **Average pooling**: takes average

E.g., a 2×2 max pool reduces 4 values to 1.
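
A small NumPy sketch of both pooling types, assuming a 2×2 window with stride 2 on a made-up 4×4 feature map (the function name pool2d is hypothetical):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Reduce each size x size patch to a single value
            out[i, j] = reduce(x[r:r + size, c:c + size])
    return out

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]], dtype=float)

print(pool2d(feature_map, mode="max"))   # [[6. 4.] [8. 9.]]
print(pool2d(feature_map, mode="avg"))   # [[3.75 2.25] [4.5 4.]]
```

The 4×4 map shrinks to 2×2: each output value summarizes one non-overlapping 2×2 patch.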

---

### 5. **Flatten Layer**

* Converts the final pooled 2D feature maps into a **1D vector** for the dense layers.

---

### 6. **Fully Connected Layer (Dense Layer)**

* Traditional neural network layer
* Learns to map the extracted features to class scores.

E.g., last Dense layer might have `n` neurons for `n` classes with a **Softmax** activation for classification.

---

### 7. **Output Layer**

* Depends on the task:

  * **Binary classification**: 1 neuron with `sigmoid`
  * **Multi-class classification**: `n` neurons with `softmax`
  * **Regression**: linear activation
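
As a rough illustration of the two classification activations, the sketch below applies sigmoid to a single raw score and softmax to a vector of raw scores. The score values are made up for the example:

```python
import numpy as np

def sigmoid(z):
    # Maps one raw score to a probability for the positive class
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Maps n raw scores to n probabilities that sum to 1
    exp = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return exp / exp.sum()

binary_score = 0.0                         # illustrative output of 1 neuron
class_scores = np.array([2.0, 1.0, 0.1])   # illustrative outputs of 3 neurons

p_binary = sigmoid(binary_score)
p_classes = softmax(class_scores)

print("Binary probability:", p_binary)          # 0.5
print("Class probabilities:", p_classes)
print("Predicted class:", int(np.argmax(p_classes)))   # 0
```

A raw score of 0 maps to a probability of exactly 0.5, and the largest raw score always receives the largest softmax probability.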

---

## 🧠 Summary of Data Flow

```
Image → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Dense → Output
```

---

## 🖼️ Visual Example (Simplified)

```
Input Image (28×28×1)
   ↓
Conv Layer (32 filters, 3×3) → Output: 26×26×32
   ↓
ReLU
   ↓
MaxPooling (2×2) → Output: 13×13×32
   ↓
Conv Layer (64 filters, 3×3) → Output: 11×11×64
   ↓
ReLU
   ↓
MaxPooling (2×2) → Output: 5×5×64
   ↓
Flatten → 1600 units
   ↓
Dense (128)
   ↓
Dense (10) → Softmax (for classification into 10 classes)
```
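
The shapes in the example above can be checked with the usual output-size formula, `(size + 2*padding - kernel) // stride + 1`. The sketch below assumes stride 1 and no padding for the convolutions and a 2×2 window with stride 2 for pooling; the helper names are hypothetical:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    # Spatial output size of a pooling layer along one dimension
    return (size - window) // stride + 1

size = 28
size = conv_out(size, kernel=3)   # 26
size = pool_out(size)             # 13
size = conv_out(size, kernel=3)   # 11
size = pool_out(size)             # 5 (the last row and column are dropped)

flattened = size * size * 64      # 64 feature maps after the second conv layer
print(size, flattened)            # 5 1600
```

The floor division explains why 11 pools down to 5 rather than 5.5, which is also where the `64 * 5 * 5` input size of the first dense layer comes from.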

---

## 🧠 Key Benefits of CNNs

| Feature                    | Benefit                                                             |
| -------------------------- | ------------------------------------------------------------------- |
| **Local receptive fields** | Neurons focus on small regions, making it computationally efficient |
| **Weight sharing**         | Fewer parameters than fully connected layers                        |
| **Translation invariance** | Ability to detect features anywhere in the image                    |
| **Compositionality**       | Early layers detect edges, later layers detect shapes or objects    |

---

## 🧪 CNN in PyTorch (Simple Example)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Conv layer 1: in_channels=1, out_channels=32, kernel=3
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)  # For 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # Conv1 + ReLU + Pool
        x = self.pool(F.relu(self.conv2(x)))  # Conv2 + ReLU + Pool
        x = x.view(-1, 64 * 5 * 5)            # Flatten
        x = F.relu(self.fc1(x))               # FC1
        x = self.fc2(x)                       # Output layer (raw logits, no softmax here)
        return x
```

---

## 📦 Applications of CNNs

| Domain                | Example                                   |
| --------------------- | ----------------------------------------- |
| **Computer Vision**   | Image classification (e.g., MNIST, CIFAR) |
| **Medical Imaging**   | Tumor detection, X-ray analysis           |
| **Self-driving Cars** | Object and lane detection                 |
| **Face Recognition**  | Feature extraction and classification     |
| **Document Analysis** | OCR (Optical Character Recognition)       |

---

## 🚧 Limitations

* Not rotationally or scale invariant by default
* Still needs large amounts of labeled data
* Struggles with very complex contextual understanding (better handled with transformers today)


"""