# 🧠 Convolutional Neural Networks (CNNs) - Architecture

---

## 📐 Architecture Components

### 1. 🧭 Input Layer
- Accepts image data, e.g., shape **(28×28×3)** for RGB images.

---

### 2. 🧲 Convolutional Layer
- Applies filters (kernels) to scan the image and detect features like edges, textures, and shapes.
- Each filter generates a **feature map**.

**Mathematically:**

$$(f * x)(i, j) = \sum_m \sum_n x(i + m, j + n) \cdot f(m, n)$$

Where:
- \( x \) = input image
- \( f \) = filter (kernel)
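- \( * \) = convolution (as written, without flipping the kernel, this is strictly cross-correlation, which is what deep learning libraries compute and call convolution)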
- \( (i, j) \) = output location
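The formula above can be sketched directly in NumPy. This is a minimal illustration, assuming no padding and a stride of 1; the function name `correlate2d_valid`, the toy image, and the edge filter are hypothetical choices made for the example, not part of any library.

```python
import numpy as np

def correlate2d_valid(x, f):
    """Slide filter f over image x (no padding, stride 1) and sum elementwise products."""
    kh, kw = f.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Matches the formula: sum over m, n of x(i + m, j + n) * f(m, n)
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * f)
    return out

# A 4x4 image with a vertical edge between columns 1 and 2
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hypothetical 2x2 filter that responds to left-to-right intensity increases
edge_filter = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

feature_map = correlate2d_valid(image, edge_filter)
print(feature_map)
```

The resulting feature map is non-zero only in the column where the window straddles the edge, which is the sense in which a filter "detects" a feature.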

---

### 3. ⚡ Activation Function (ReLU)
- Applies non-linearity:

  $$\text{ReLU}(x) = \max(0, x)$$

- Helps learn complex patterns.
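One way to see why the non-linearity matters: without it, stacked linear layers collapse into a single linear layer. The sketch below uses small hypothetical weight matrices (`W1`, `W2`) picked only to keep the arithmetic easy to follow.

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(0, x)

# Hypothetical weights and input, chosen only for readable arithmetic
W1 = np.array([[1.0, -1.0], [2.0, 0.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 3.0])

# Without an activation, two linear layers equal one linear layer with weights W2 @ W1
linear_stack = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

# With ReLU in between, negative pre-activations are zeroed and the result changes
with_relu = W2 @ relu(W1 @ x)

print(linear_stack, collapsed, with_relu)
```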

---

### 4. 🔽 Pooling Layer (Subsampling)
- Reduces spatial dimensions, improves efficiency.
- Commonly uses **Max Pooling** (e.g., 2×2).
- Helps achieve local **translation invariance**: small shifts of a feature within a pooling window leave the output unchanged.
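A minimal NumPy sketch of 2×2 max pooling with stride 2 follows. The helper name `max_pool_2x2` and the sample values are hypothetical; odd trailing rows or columns are simply cropped here, which is one possible convention.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single-channel 2D array."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    # Group pixels into non-overlapping 2x2 blocks, then take the max of each block
    blocks = x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
], dtype=float)
pooled = max_pool_2x2(feature_map)
print(pooled)

# A strong activation moved by one pixel inside the same 2x2 window pools identically
a = np.zeros((4, 4))
a[0, 0] = 9.0
b = np.zeros((4, 4))
b[1, 1] = 9.0
same_after_pooling = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```

The invariance is only local: a shift that crosses a window boundary does change the pooled output.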

---

### 5. 🧮 Fully Connected (Dense) Layer
- Flattens the feature maps into a vector.
- Passes through one or more dense layers to make predictions.

---

### 6. 🎯 Output Layer
- Outputs class probabilities using **Softmax**:

  $$\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
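The softmax formula can be sketched in a few lines of NumPy. Subtracting the maximum logit before exponentiating is a common stability trick that leaves the result unchanged; the example logits are hypothetical.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1D vector of logits, shifted by the max for numerical stability."""
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for three classes
probs = softmax(logits)
print(probs, probs.sum())

# Very large logits would overflow a naive exp(z); the shifted version stays finite
stable = softmax(np.array([1000.0, 1000.0]))
```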

---

## 📈 Why CNNs Work Well for Images

- **Local connectivity**: Focus on small image patches.
- **Weight sharing**: Filters reused across space → fewer parameters.
- **Translation invariance**: Pooling helps recognize patterns in varied positions.
- **Hierarchical learning**: Layers learn from edges → shapes → objects.
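To make the weight-sharing point concrete, the sketch below counts parameters for a 3×3 convolution with 32 filters on a 32×32×3 input, and for a dense layer that would produce an output of the same size. The helper names are hypothetical and the layer sizes are assumptions picked for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units

# 3x3 convolution, 3 input channels, 32 filters
conv_params = conv2d_params(3, 3, 3, 32)

# Dense layer mapping the flattened 32x32x3 image to a 32x32x32 output
dense_equiv = dense_params(32 * 32 * 3, 32 * 32 * 32)

print(conv_params)   # 896
print(dense_equiv)   # 100696064
```

The convolution's parameter count does not depend on the image size, because the same filters are reused at every position.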

---

## 🔧 Key Considerations When Writing CNN Code

### 1. 📥 Input Data
- ✅ Ensure consistent image size (e.g., 224×224×3).
- ✅ Normalize pixel values ([0, 1] or standardize).
- ✅ Use data augmentation for robustness.
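The preprocessing steps above might look like the following in NumPy. The batch of random pixels stands in for real images, and per-channel standardization is only one of several reasonable conventions; both are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of 8 RGB images with integer pixel values in [0, 255]
images = rng.integers(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)

# Option 1: scale to [0, 1]
scaled = images.astype(np.float32) / 255.0

# Option 2: standardize each channel to zero mean and unit variance
# (in practice, compute mean and std on the training set only)
mean = scaled.mean(axis=(0, 1, 2))
std = scaled.std(axis=(0, 1, 2))
standardized = (scaled - mean) / std

# A simple augmentation: horizontal flip along the width axis
flipped = scaled[:, :, ::-1, :]
```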

---

### 2. 🏗 Model Architecture
- ✅ Use small filters (e.g., 3×3) with `padding='same'` so that convolutions preserve the spatial size.
- ✅ Add pooling and dropout layers.
- ✅ Batch normalization to stabilize training.
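When stacking convolution and pooling layers, it helps to track the spatial size by hand. The sketch below uses the usual output-size arithmetic for `'same'` and `'valid'` padding; the helper names are hypothetical, and the example assumes a 32×32 input passed through two blocks of 3×3 convolution plus 2×2 pooling, with 64 filters in the last convolution.

```python
import math

def conv_output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def pool_output_size(size, pool=2):
    """Spatial output size of non-overlapping pooling along one dimension."""
    return size // pool

size = 32
size = conv_output_size(size, kernel=3, padding="same")   # 32
size = pool_output_size(size)                              # 16
size = conv_output_size(size, kernel=3, padding="same")   # 16
size = pool_output_size(size)                              # 8

flattened_units = size * size * 64   # 8 * 8 * 64 = 4096 inputs to the first dense layer
valid_size = conv_output_size(32, kernel=3, padding="valid")  # 30
```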

---

### 3. 🔀 Activation Functions
- ✅ Use ReLU after each convolution.
- ❗ Avoid Sigmoid/Tanh in hidden layers unless needed; they saturate for large inputs, which can cause vanishing gradients in deep networks.

---

### 4. 🧾 Output Layer
- ✅ Use `softmax` for multi-class, `sigmoid` for binary (paired with `binary_crossentropy`).
- ✅ Match loss function to activation:
  - `categorical_crossentropy` for one-hot
  - `sparse_categorical_crossentropy` for integer labels
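The two categorical losses compute the same quantity and differ only in the label format they expect. A small NumPy sketch, with hypothetical predicted probabilities, shows the equivalence:

```python
import numpy as np

# Hypothetical softmax outputs for two samples and three classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
])
int_labels = np.array([0, 1])      # integer labels (sparse form)
one_hot = np.eye(3)[int_labels]    # the same labels, one-hot encoded

# Categorical cross-entropy: -sum over classes of y_true * log(y_pred)
cce = -np.sum(one_hot * np.log(probs), axis=1)

# Sparse variant: pick the log-probability of the true class directly
scce = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(cce, scce)
```

Mixing them up (for example, feeding integer labels to the one-hot loss) typically shows up as a shape error or a loss that does not decrease.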

---

### 5. ⚙️ Optimizer
- ✅ Try Adam, SGD, or RMSprop.
- ✅ Start with learning rate ~0.001.
- ❗ Use learning rate schedulers or early stopping for long training.
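A learning rate schedule is often just a function of the epoch. Here is a minimal step-decay sketch; the function name and the constants (halve the rate every 10 epochs) are assumptions chosen for illustration, not recommended values.

```python
INITIAL_LR = 0.001       # starting learning rate
DROP_FACTOR = 0.5        # multiply the rate by this factor at each drop
EPOCHS_PER_DROP = 10     # number of epochs between drops

def step_decay(epoch):
    """Learning rate for a given zero-based epoch under step decay."""
    return INITIAL_LR * DROP_FACTOR ** (epoch // EPOCHS_PER_DROP)

schedule = [step_decay(e) for e in (0, 9, 10, 25)]
print(schedule)
```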

---

### 6. 🧪 Training Strategy
- ✅ Track train and val accuracy/loss.
- ✅ Use `EarlyStopping`, `ModelCheckpoint`.
- ✅ Consider k-fold cross-validation (e.g., scikit-learn's `KFold`) for small datasets.
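The idea behind early stopping with a patience window can be sketched in plain Python. This is a simplified illustration of the logic, not the Keras implementation; the function name and the validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to completion without triggering the stop
    return len(val_losses) - 1, best_epoch

losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.66, 0.7, 0.5]
stopped_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stopped_epoch, best_epoch)
```

With a patience of 3, training stops at epoch 5 and the best weights come from epoch 2, so the later improvement at epoch 8 is never reached. This is the trade-off that the patience value controls.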

---

### 7. 🚀 Hardware Utilization
- ✅ Use GPU for acceleration (e.g., via Colab or CUDA).
- ✅ Use `tf.data` for large datasets; `ImageDataGenerator` still works but is deprecated in recent TensorFlow releases.

---

### 8. ⚖️ Model Complexity
- ❗ Don't overbuild models for small datasets.
- ✅ Start small → add layers as needed.

---

### 9. 🔁 Reproducibility
- ✅ Set seeds for random number generators.
- ✅ Save models, use `TensorBoard` or logging tools (e.g., WandB).

---

## 🎯 Real-World Applications

- ✅ Image classification (Cats vs Dogs)
- ✅ Object detection (YOLO, SSD)
- ✅ Face recognition
- ✅ Medical imaging (CT, MRI)
- ✅ OCR and document analysis

---



In [None]:
# ## 📌 1. Imports and Setup
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import random
import os

# ## 📌 2. Reproducibility
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
random.seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)

# ## 📌 3. Data Preparation
# For demonstration, use CIFAR-10 dataset (10 classes, RGB images of 32x32)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0

# One-hot encode labels
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Data augmentation
train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)
train_generator = train_datagen.flow(x_train, y_train, batch_size=64)

# ## 📌 4. CNN Model Architecture
model = Sequential([
    Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    Conv2D(64, (3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# ## 📌 5. Compilation
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# ## 📌 6. Callbacks
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ModelCheckpoint('best_model.h5', save_best_only=True)
]

# ## 📌 7. Training
history = model.fit(
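    train_generator,
    # NOTE: the test set doubles as validation data here, so the final test
    # accuracy is optimistic; hold out part of x_train for validation in real use.
    validation_data=(x_test, y_test),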
    epochs=30,
    callbacks=callbacks
)

# ## 📌 8. Evaluation
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

# ## 📌 9. Summary
model.summary()


# 🔁 Recurrent Neural Networks (RNNs)

---

## 📐 Architecture Components

### 1. 🧭 Input Layer
- Accepts sequential data (e.g., text, time-series).
- Shape: **(batch_size, time_steps, features)**.

---

### 2. 🔁 Recurrent Layer
- Processes sequence **one time step at a time**, maintaining a hidden state.
- Core idea: **memory of previous inputs** helps predict current/future values.

**Mathematical Representation:**

Let:
- \( x_t \) = input at time t
- \( h_t \) = hidden state at time t
- \( W \), \( U \), \( b \) = learnable parameters

$$
h_t = \tanh(Wx_t + Uh_{t-1} + b)
$$
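The recurrence above can be unrolled over a sequence in a few lines of NumPy. This is a minimal sketch for a single sequence, assuming a zero initial hidden state; the function name, sizes, and random weights are hypothetical.

```python
import numpy as np

def rnn_forward(x_seq, W, U, b):
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) over a sequence, starting from h_0 = 0."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)   # shape: (time_steps, hidden_size)

rng = np.random.default_rng(0)
time_steps, features, hidden = 10, 5, 4
x_seq = rng.normal(size=(time_steps, features))
W = rng.normal(scale=0.1, size=(hidden, features))   # input-to-hidden weights
U = rng.normal(scale=0.1, size=(hidden, hidden))     # hidden-to-hidden weights
b = np.zeros(hidden)

all_states = rnn_forward(x_seq, W, U, b)
last_state = all_states[-1]   # what a classifier head would typically consume
print(all_states.shape, last_state.shape)
```

The same `W`, `U`, and `b` are reused at every time step, which is the recurrent analogue of weight sharing.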

---

### 3. ⛓ Variants of RNN

- **Vanilla RNN** – simple, but suffers from vanishing gradients.
- **LSTM (Long Short-Term Memory)** – mitigates the vanishing-gradient problem on long-term dependencies by using gates (input, forget, output) around a separate cell state.
- **GRU (Gated Recurrent Unit)** – similar to LSTM, simpler, faster to train.
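To show how the gates protect long-term information, here is a sketch of a single LSTM step in NumPy. The stacked weight layout and the gate order (input, forget, output, candidate) are assumptions of this example; libraries use their own internal layouts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*hidden, features), U: (4*hidden, hidden), b: (4*hidden,)."""
    z = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(z, 4)          # assumed order: input, forget, output, candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g             # forget part of the old memory, write new content
    h_t = o * np.tanh(c_t)               # expose a gated view of the memory
    return h_t, c_t

hidden, features = 3, 2
W = np.zeros((4 * hidden, features))
U = np.zeros((4 * hidden, hidden))
# Biases chosen so the input gate is closed and the forget gate is fully open
b = np.concatenate([
    np.full(hidden, -100.0),   # input gate close to 0
    np.full(hidden, 100.0),    # forget gate close to 1
    np.zeros(hidden),          # output gate at 0.5
    np.zeros(hidden),          # candidate at 0
])

c_prev = np.array([0.5, -1.0, 2.0])
h_t, c_t = lstm_step(np.ones(features), np.zeros(hidden), c_prev, W, U, b)
print(c_t)
```

With the forget gate open and the input gate closed, the cell state passes through the step unchanged, which is how an LSTM can carry information across many time steps.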

---

### 4. 🧮 Fully Connected (Dense) Layer
- Final hidden state is passed to a dense layer for classification or regression.

---

### 5. 🎯 Output Layer
- Uses `softmax` for multi-class classification, `sigmoid` for binary classification, and `linear` for regression.

---

## ⚙️ Why RNNs Are Useful

- Capture **temporal dependencies** and **context** in sequences.
- Great for tasks where **order matters**.

---

## 🎯 Applications

- Natural Language Processing (NLP): sentiment analysis, language modeling.
- Time Series Forecasting: stock prices, weather.
- Speech Recognition.
- Music Generation.
- Anomaly Detection.

---

## 🔧 Key Considerations When Writing RNN Code

### 1. 📥 Input Data
- ✅ Shape: (samples, time_steps, features)
- ✅ Tokenize and pad sequences for text data.
- ✅ Normalize time series inputs.
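Getting data into the (samples, time_steps, features) shape is often the first hurdle. The sketch below shows two hypothetical helpers: one that slices a univariate series into sliding windows, and one that pads tokenized sequences at the end to a common length. Both are illustrative assumptions rather than library functions.

```python
import numpy as np

def make_windows(series, time_steps):
    """Turn a 1D series into (samples, time_steps, 1) inputs and next-value targets."""
    X = np.stack([series[i:i + time_steps] for i in range(len(series) - time_steps)])
    y = series[time_steps:]
    return X[..., np.newaxis], y

def pad_sequences_post(seqs, max_len, value=0):
    """Pad (or truncate) integer sequences at the end so all have length max_len."""
    out = np.full((len(seqs), max_len), value, dtype=int)
    for row, seq in enumerate(seqs):
        trimmed = seq[:max_len]
        out[row, :len(trimmed)] = trimmed
    return out

series = np.arange(10, dtype=float)
X, y = make_windows(series, time_steps=3)
print(X.shape, y.shape)   # (7, 3, 1) (7,)

tokens = [[5, 8, 2], [7], [1, 2, 3, 4, 9]]   # hypothetical token ids
padded = pad_sequences_post(tokens, max_len=4)
print(padded)
```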

---

### 2. 🔁 RNN Layer Choice
- ✅ Use `SimpleRNN` for small, simple problems.
- ✅ Use `LSTM` or `GRU` for long sequences or better performance.

---

### 3. 🔀 Activation Functions
- ✅ Keep the default `tanh` and `sigmoid` (gate) activations in LSTM/GRU; Keras only uses its fast cuDNN kernel with these defaults.
- ✅ Use `relu` in dense layers if needed.

---

### 4. 🧾 Output Layer
- ✅ `softmax` for multi-class output.
- ✅ `sigmoid` for binary classification.
- ✅ `linear` for regression.

---

### 5. ⚙️ Optimizer and Loss
- ✅ `categorical_crossentropy` or `sparse_categorical_crossentropy` for multi-class classification; `binary_crossentropy` for binary classification.
- ✅ `mean_squared_error` or `mean_absolute_error` for regression.
- ✅ Use `Adam` or `RMSprop` optimizers.

---

### 6. ⏱ Sequence Handling
- ✅ Use `return_sequences=True` if stacking RNNs.
- ✅ Use `return_state=True` if you want to keep states for prediction chaining.
- ✅ Use masking for padded inputs.
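The effect of masking can be illustrated with a small NumPy recurrence: at padded steps the hidden state is carried forward unchanged, so the final state matches what the unpadded sequence would produce. The function name, sizes, and random weights are assumptions of this sketch, and real frameworks implement masking in their own way.

```python
import numpy as np

def masked_rnn_last_state(x_seq, mask, W, U, b):
    """Vanilla RNN over x_seq; steps where mask is False leave the state untouched."""
    h = np.zeros(U.shape[0])
    for x_t, keep in zip(x_seq, mask):
        if keep:
            h = np.tanh(W @ x_t + U @ h + b)
        # Padded step: skip the update so padding cannot alter the state
    return h

rng = np.random.default_rng(0)
features, hidden = 2, 3
W = rng.normal(size=(hidden, features))
U = rng.normal(size=(hidden, hidden))
b = rng.normal(size=hidden)

real_steps = rng.normal(size=(3, features))
padded = np.vstack([real_steps, np.zeros((2, features))])   # pad to length 5
mask = np.array([True, True, True, False, False])

h_masked = masked_rnn_last_state(padded, mask, W, U, b)
h_unpadded = masked_rnn_last_state(real_steps, np.ones(3, dtype=bool), W, U, b)
h_unmasked = masked_rnn_last_state(padded, np.ones(5, dtype=bool), W, U, b)
```

Without the mask, the two extra zero-valued steps still pass through the recurrence and shift the final state.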

---

### 7. 🧪 Training Strategy
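- ✅ Shuffle samples only if the order between sequences doesn't matter (e.g., independent reviews); for forecasting, keep chronological order and split train/validation by time.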
- ✅ Use `EarlyStopping` and `ModelCheckpoint`.
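For time-ordered data, a chronological split keeps every validation sample later than every training sample. A minimal sketch, with a hypothetical helper name and toy arrays:

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.2):
    """Hold out the most recent val_fraction of samples, without shuffling."""
    n_val = int(round(len(X) * val_fraction))
    split = len(X) - n_val
    return X[:split], X[split:], y[:split], y[split:]

# Toy data where the value doubles as a timestamp
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
X_train, X_val, y_train, y_val = chronological_split(X, y)
print(len(X_train), len(X_val))
```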

---

### 8. ⚙️ Regularization
- ✅ Use `Dropout` or `recurrent_dropout`.
- ✅ Clip gradients if exploding gradients are observed.
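Clipping by global norm rescales all gradients together when their combined norm exceeds a threshold, which preserves their direction. The NumPy sketch below illustrates the idea with hypothetical gradient values; in Keras, passing an argument such as `clipnorm` or `clipvalue` to the optimizer is the usual way to enable clipping.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm

# Hypothetical gradients for two parameter tensors; joint norm is sqrt(9 + 16 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```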

---

### 9. 🧠 Model Complexity
- ❗ Don't over-stack RNNs.
- ✅ Start with 1–2 layers, tune from there.

---

### 10. 🔁 Reproducibility
- ✅ Set random seeds.
- ✅ Log training metrics (TensorBoard, WandB).
- ✅ Save model weights.

---

## 🧠 Sample Use Cases

- Sentiment Analysis on movie reviews.
- Predict next word in a sentence.
- Forecast sales or electricity usage.
- Detect anomalies in system logs.

---



In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

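# 1. Generate Dummy Sequential Data
# The labels are random noise, so validation accuracy should hover around 50%.
np.random.seed(42)
tf.random.set_seed(42)  # also seed TensorFlow so weight initialization and dropout are repeatable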
time_steps = 10
features = 5
samples = 1000

X = np.random.rand(samples, time_steps, features)
y = np.random.randint(0, 2, size=(samples, 1))  # Binary classification

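# 2. Normalize Features
# NOTE: the scaler is fit before the train/validation split, which leaks
# validation statistics; fit it on the training portion only in real use.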
scaler = MinMaxScaler()
X = X.reshape(-1, features)
X = scaler.fit_transform(X)
X = X.reshape(samples, time_steps, features)

# 3. Train/Test Split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Build RNN Model
model = Sequential([
    LSTM(64, return_sequences=False, input_shape=(time_steps, features), dropout=0.2, recurrent_dropout=0.2),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')  # Binary classification
])

# 5. Compile Model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# 6. Callbacks
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
checkpoint = ModelCheckpoint('best_rnn_model.h5', save_best_only=True)  # distinct name so the CNN checkpoint is not overwritten

# 7. Train the Model
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=32,
    callbacks=[early_stop, checkpoint],
    verbose=1
)

# 8. Evaluate
loss, accuracy = model.evaluate(X_val, y_val)
print(f"Validation Loss: {loss:.4f} - Accuracy: {accuracy:.4f}")