
# Deep Learning Notes with TensorFlow/Keras: ANN, RNN, CNN, LSTM, Transformers

This notebook explains **ANN, RNN, CNN, LSTM, and Transformers** with intuition, equations, and **TensorFlow/Keras implementations**.

> Requirements: `tensorflow`, `numpy`, `matplotlib` (install via `pip install tensorflow numpy matplotlib`)



---
## 1. Artificial Neural Network (ANN / MLP)

### Intuition
- Stacks dense (fully connected) layers, each followed by a nonlinear activation function (ReLU, GELU, etc.).
- A good fit for tabular data or for features that are already engineered.
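
To make the dense-layer idea concrete, here is a minimal NumPy sketch of one forward pass, $h = \mathrm{ReLU}(xW_1 + b_1)$ followed by $p = \mathrm{softmax}(hW_2 + b_2)$. The sizes, the random weights, and the names `W1`, `b1`, `W2`, `b2` are assumptions chosen for illustration; nothing is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 4 samples, 20 features, 8 hidden units, 2 classes.
x = rng.normal(size=(4, 20))
W1, b1 = rng.normal(size=(20, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)) * 0.1, np.zeros(2)

h = relu(x @ W1 + b1)         # hidden activations, shape (4, 8)
probs = softmax(h @ W2 + b2)  # class probabilities, shape (4, 2)
print(probs.sum(axis=1))      # each row sums to 1
```

Without the nonlinearity, the two matrix multiplications would collapse into a single linear map, which is why every hidden layer carries an activation.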


In [None]:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

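# Simple ANN: 20 input features -> 2 classes
model_ann = keras.Sequential([
    keras.Input(shape=(20,)),              # explicit input; preferred over input_shape= on the first layer
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),                   # randomly zeroes 20% of activations during training
    layers.Dense(64, activation='relu'),
    layers.Dense(2, activation='softmax')  # class probabilities
])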

model_ann.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_ann.summary()
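
As a sanity check on what `summary()` reports for this model, the parameter counts can be worked out by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper name `dense_params` is made up for this sketch.

```python
def dense_params(n_in, n_out):
    # weight matrix plus one bias per unit
    return n_in * n_out + n_out

layer_sizes = [20, 128, 64, 2]  # input features, then the three Dense layers
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [2688, 8256, 130] 11074
```

Dropout contributes no parameters, so it does not appear in the count.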



---
## 2. Recurrent Neural Network (RNN)

### Intuition
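- Processes a sequence one step at a time, carrying a hidden state from each step to the next.
- Good for sequential data, but gradients tend to vanish or explode over many steps, so long-range dependencies are hard to learn.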
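
The recurrence behind a vanilla RNN can be sketched in a few lines of NumPy: $h_t = \tanh(x_t W_x + h_{t-1} W_h + b)$. The sizes and the names `W_x`, `W_h`, `b` are assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 time steps, 3 input features, 4 hidden units.
T, d_in, d_h = 5, 3, 4
x_seq = rng.normal(size=(T, d_in))
W_x = rng.normal(size=(d_in, d_h)) * 0.5  # input-to-hidden weights
W_h = rng.normal(size=(d_h, d_h)) * 0.5   # hidden-to-hidden weights
b = np.zeros(d_h)

h = np.zeros(d_h)  # initial hidden state
states = []
for x_t in x_seq:
    # The same weights are reused at every time step.
    h = np.tanh(x_t @ W_x + h @ W_h + b)
    states.append(h)
states = np.stack(states)  # shape (T, d_h); the last row is the final state
print(states.shape)
```

Because `W_h` is applied once per step, a gradient flowing back through many steps is multiplied by it repeatedly, which is where the difficulty with long dependencies comes from.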


In [None]:

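# Simple RNN for sequence classification: 100 token ids per sample, vocabulary of 5000
model_rnn = keras.Sequential([
    keras.Input(shape=(100,)),                      # replaces input_length=, which is deprecated in recent Keras
    layers.Embedding(input_dim=5000, output_dim=32),
    layers.SimpleRNN(64, return_sequences=False),   # keep only the final hidden state
    layers.Dense(2, activation='softmax')
])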

model_rnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_rnn.summary()



---
## 3. Long Short-Term Memory (LSTM)

### Intuition
- Adds gates and a separate cell state, which lets it handle long-term dependencies better than vanilla RNNs.
- Commonly used for text and time series.
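
A single LSTM step can be sketched in NumPy as follows. The input, forget, and output gates (`i`, `f`, `o`) decide what enters, stays in, and leaves the cell state `c`, and `g` is the candidate update. The function name `lstm_step`, the packed weight layout, and the gate ordering are assumptions of this sketch; libraries differ in how they arrange these weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)       # four equal blocks of size d_h
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)                    # candidate cell update
    c = f * c_prev + i * g            # forget part of the old state, add new information
    h = o * np.tanh(c)                # expose a gated view of the cell state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                      # hypothetical sizes
W = rng.normal(size=(d_in, 4 * d_h)) * 0.5
U = rng.normal(size=(d_h, 4 * d_h)) * 0.5
b = np.zeros(4 * d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(6, d_in)):  # a random 6-step sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state is updated additively (`f * c_prev + i * g`) rather than being squashed through a fresh nonlinearity at every step, which is the usual explanation for why gradients survive longer than in a vanilla RNN.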


In [None]:

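# LSTM model for sequence classification: 100 token ids per sample, vocabulary of 5000
model_lstm = keras.Sequential([
    keras.Input(shape=(100,)),                      # replaces input_length=, which is deprecated in recent Keras
    layers.Embedding(input_dim=5000, output_dim=64),
    layers.LSTM(128, return_sequences=False),       # keep only the final hidden state
    layers.Dense(2, activation='softmax')
])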

model_lstm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_lstm.summary()
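
The cost of the extra gates shows up in the parameter counts. The sketch below reproduces the totals for the two recurrent models in these notes by hand, assuming the default Keras layout of one bias vector per gate block; the helper names are made up for illustration.

```python
def simple_rnn_params(d_in, units):
    # input weights + recurrent weights + bias
    return units * (d_in + units + 1)

def lstm_params(d_in, units):
    # four blocks (input, forget, candidate, output), each shaped like a SimpleRNN layer
    return 4 * simple_rnn_params(d_in, units)

# Embedding + recurrent layer + Dense(2) for each model
rnn_total = 5000 * 32 + simple_rnn_params(32, 64) + (64 * 2 + 2)
lstm_total = 5000 * 64 + lstm_params(64, 128) + (128 * 2 + 2)
print(rnn_total, lstm_total)  # 166338 419074
```

In both models the embedding table dominates the total, so vocabulary size and embedding width matter more for model size than the choice of recurrent cell.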



---
## 4. Convolutional Neural Network (CNN)

### Intuition
- Uses convolutional layers to capture spatial patterns in images.
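
What a single convolutional filter computes can be sketched directly in NumPy: slide a small kernel over the image and take a weighted sum at each position. The function name `conv2d_valid`, the toy image, and the hand-picked edge kernel are assumptions for illustration; in a trained CNN the kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel cross-correlation with 'valid' padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy image: left half dark (0), right half bright (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (not a learned filter).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response[0])  # [0. 3. 3. 0.]: strong response only where the edge is
```

The same nine weights are applied at every position, so the filter detects the edge wherever it occurs while using far fewer parameters than a dense layer over the whole image.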


In [None]:

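# CNN model for MNIST-like images: 28x28 grayscale input, 10 classes
model_cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                # explicit input; preferred over input_shape= on the first layer
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),                   # halves the spatial resolution
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # feature maps -> vector for the dense head
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')         # one probability per class
])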

model_cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_cnn.summary()
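
The shapes and parameter counts that `summary()` prints for this CNN can be reproduced by hand. This is a sketch assuming the Keras defaults used above: 'valid' padding and stride 1 for the convolutions, and non-overlapping 2x2 pooling. The helper names are made up for illustration.

```python
def conv_out(size, kernel):
    # 'valid' padding, stride 1
    return size - kernel + 1

def pool_out(size, pool):
    # non-overlapping pooling; an incomplete last window is dropped
    return size // pool

s = 28
s = conv_out(s, 3)  # 26
s = pool_out(s, 2)  # 13
s = conv_out(s, 3)  # 11
s = pool_out(s, 2)  # 5
flat = s * s * 64   # 1600 values after Flatten

params = [
    3 * 3 * 1 * 32 + 32,    # Conv2D(32): kernel weights + biases
    3 * 3 * 32 * 64 + 64,   # Conv2D(64)
    flat * 128 + 128,       # Dense(128)
    128 * 10 + 10,          # Dense(10)
]
print(s, flat, params, sum(params))  # 5 1600 [320, 18496, 204928, 1290] 225034
```

Most of the parameters sit in the first dense layer rather than in the convolutions, which is typical for small CNNs that flatten before the classifier head.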



---
## 5. Transformer (Self-Attention)

### Intuition
- Uses self-attention to capture global dependencies in sequences.
- Backbone of models like BERT and GPT.
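
Single-head scaled dot-product self-attention, $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$, can be sketched in NumPy as below. The sizes, the random projection matrices `W_q`, `W_k`, `W_v`, and the function name `self_attention` are assumptions for illustration; a multi-head layer runs several of these in parallel and concatenates the results.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    q, k, v = x @ W_q, x @ W_k, x @ W_v      # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise similarities
    weights = softmax(scores)                # each row is a distribution over positions
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 4                    # hypothetical sizes
x = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (5, 4) (5, 5)
```

Every position attends to every other position in a single step, which is what "global dependencies" refers to; the price is a score matrix that grows quadratically with sequence length.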


In [None]:

# Simple Transformer block using Keras
class TransformerBlock(layers.Layer):
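    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1, **kwargs):
        super().__init__(**kwargs)  # forward standard Layer arguments such as name
        # key_dim is the size of each attention head; embed_dim // num_heads is a common alternative
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)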
        self.ffn = keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

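    def call(self, inputs, training=None):
        # Self-attention: the sequence is both query and value (and key, by default)
        attn_output = self.att(inputs, inputs, training=training)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)  # residual connection + layer norm
        ffn_output = self.ffn(out1)                    # position-wise feed-forward network
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)      # second residual connection + layer norm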

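# Tiny Transformer for classification.
# The input is assumed to be already embedded: 100 time steps x 64 features.
# No positional encoding is added, so this model ignores the order of the steps.
inputs = layers.Input(shape=(100, 64))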
x = TransformerBlock(64, 4, 128)(inputs)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation="softmax")(x)

model_transformer = keras.Model(inputs=inputs, outputs=outputs)
model_transformer.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model_transformer.summary()
