# Deep Learning + NLP

* Goal: build a basic understanding of neural networks and run deep models (inference only, no training).

## Neuron

* An artificial neuron is a simple computational unit that takes some inputs, scales them by weights, adds a bias, and then passes the result through an activation function. It is a highly simplified model of a biological neuron.

* Why it matters in AI/ML:

    - Individual neurons perform simple transformations. When many neurons are connected, the network can represent complex patterns and features. Neurons provide feature extraction and nonlinearity.

    - Weights = the neuron's memory / parameters.

    - Bias = shifts the threshold.

    - Activation = nonlinearity; without an activation, stacked linear layers collapse into a single linear function, which is why the activation is essential.

In [2]:
import numpy as np

# single input vector (e.g. the features of one data point)
x = np.array([0.5, -1.2, 2.0])  # 3 features

# weights and bias (these are the model's parameters)
w = np.array([0.4, -0.6, 0.1])
b = 0.2

# linear combination: z = w.x + b
z = np.dot(w, x) + b

# activation: y = relu(z), where ReLU = max(0, z)
y = max(0, z)

print("z (linear output) =", z)
print("y (after ReLU) =", y)

# - np.dot(w, x) computes the weighted sum of the inputs.
# - ReLU (Rectified Linear Unit), f(x) = max(0, x), adds nonlinearity and helps
#   the model capture complex relationships. It is cheap to compute: it outputs
#   x when x is positive and 0 otherwise.

z (linear output) = 1.3199999999999998
y (after ReLU) = 1.3199999999999998
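
The statement that the bias shifts the threshold can be seen with a small sketch. The helper `neuron` and the input values below are made up for illustration; the weighted sum is fixed at -0.3, so the ReLU neuron only produces a non-zero output once the bias is larger than 0.3.

```python
import numpy as np

def neuron(x, w, b):
    """Single ReLU neuron: max(0, w.x + b)."""
    return max(0.0, float(np.dot(w, x) + b))

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.4])   # w.x = 0.5 - 0.8 = -0.3

# The neuron is active only when w.x + b > 0, i.e. when w.x > -b.
for b in (0.0, 0.2, 0.5):
    print(f"b = {b:.1f} -> output = {neuron(x, w, b):.2f}")
```

With the same inputs and weights, only the largest bias pushes the sum above zero, which is the sense in which the bias moves the point where the neuron starts to respond.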


## Layer

* A layer is a group of neurons that operate together. Input, hidden, and output layers are the common kinds. Each layer computes the weighted sum and activation for several neurons at once.

* Why it matters in AI/ML:

    - Layers build hierarchical features. Early layers detect low-level features (edges, blobs), and later layers detect higher-level concepts (shapes, objects). Stacking layers is what lets the network learn deep features.

* Types:

    - Dense / fully connected layer: every neuron receives all outputs of the previous layer.

    - Convolutional layer: focuses on local patterns (images).

    - Recurrent / Transformer layers: for sequence data.

In [None]:
# batch of 2 samples, each with 4 features

X = np.array([[0.1, 0.2, 0.3, 0.4],
              [1.0, -0.5, 0.0, 0.2]])

# weights shape: (input_dim, units)

w = np.random.randn(4, 3) * 0.1  # 3 neurons in this layer
b = np.zeros(3)  # biases for 3 neurons

# linear combination: Z = X.w + b
Z = np.dot(X, w) + b

# apply activation (sigmoid for example)
Y = 1 / (1 + np.exp(-Z))

print("Z shape:", Z.shape)
print("Y shape:", Y.shape)

# - Each row of Y is the output for one sample.
# - We used matrix multiplication to compute all neurons in parallel.


Z shape: (2, 3)
Y shape: (2, 3)
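
To make "all neurons in parallel" concrete, the sketch below compares the matrix product with explicit loops over samples and neurons. The seed and the names `Z_matrix` and `Z_loop` are arbitrary choices for this illustration; each column of `W` is assumed to hold the weights of one neuron, as in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
X = rng.normal(size=(2, 4))           # 2 samples, 4 features
W = rng.normal(size=(4, 3)) * 0.1     # 3 neurons, one column of weights each
b = np.zeros(3)

# Vectorised: all samples and all neurons in one matrix product
Z_matrix = X @ W + b

# Explicit loops: one sample and one neuron at a time
Z_loop = np.zeros((2, 3))
for i in range(X.shape[0]):           # each sample
    for j in range(W.shape[1]):       # each neuron
        Z_loop[i, j] = np.dot(X[i], W[:, j]) + b[j]

print(np.allclose(Z_matrix, Z_loop))
```

Both versions compute the same numbers; the matrix form is simply the compact and faster way to write the per-neuron weighted sums.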


## Activation functions

* An activation function transforms a neuron's linear output so that the model can learn non-linear patterns.

* Without a nonlinearity between layers, any stack of layers is still just one linear mapping. The nonlinearity is what produces complex decision boundaries.

* Common activations:

    - ReLU (Rectified Linear Unit) = max(0, x). Simple, fast, and gives sparse activations. Very common in hidden layers.

    - Sigmoid = 1 / (1 + e^-x). Output in (0, 1). Use: binary probability output (in hidden layers it can cause the vanishing gradient problem).

    - Tanh = (e^x - e^-x)/(e^x + e^-x). Output in (-1, 1). Zero-centered.

    - Softmax = converts scores into a probability distribution (multi-class output layer).
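
The claim that stacked layers without a nonlinearity remain a single linear mapping can be checked numerically. Algebraically, (x·W1 + b1)·W2 + b2 = x·(W1·W2) + (b1·W2 + b2). The sketch below uses random matrices with arbitrary sizes and a fixed seed; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                     # 5 samples, 4 features

W1, b1 = rng.normal(size=(4, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 3)), rng.normal(size=3)

# Two stacked layers with no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# The equivalent single linear layer
W_eq = W1 @ W2
b_eq = b1 @ W2 + b2
one_layer = x @ W_eq + b_eq

print("linear stack == single layer:", np.allclose(two_layers, one_layer))

# With ReLU between the layers, the single-layer shortcut no longer applies
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
print("ReLU stack   == single layer:", np.allclose(with_relu, one_layer))
```

The first comparison holds up to floating-point tolerance. The second is expected to fail, because ReLU zeroes the negative pre-activations and no single weight matrix can reproduce that.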

In [10]:
def relu(x):
    return np.maximum(0 , x)

def sigmoid(x) :
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    # numerically stable softmax for 2D array (batch)
    ex = np.exp(x - np.max(x, axis=1, keepdims=True))
    return ex / np.sum(ex, axis=1, keepdims=True)

x = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])

print("ReLU:", relu(x))
print("Sigmoid:", sigmoid(x))
print("Tanh:", tanh(x))
print("Softmax:", softmax(x))

# - Softmax is shown for a batch input of shape (1, 5); it converts each row of scores into probabilities that sum to 1.

ReLU: [[0.  0.  0.  0.5 2. ]]
Sigmoid: [[0.11920292 0.37754067 0.5        0.62245933 0.88079708]]
Tanh: [[-0.96402758 -0.46211716  0.          0.46211716  0.96402758]]
Softmax: [[0.01255471 0.0562663  0.09276745 0.15294766 0.68546388]]
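
The vanishing gradient remark about sigmoid can be illustrated with its derivative, sigmoid'(x) = sigmoid(x)·(1 − sigmoid(x)), which never exceeds 0.25. The sketch below is a simplification: it looks only at the activation's derivative and ignores the weight terms that also enter backpropagation. The helper `sigmoid_grad` is introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)          # derivative of the sigmoid

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("sigmoid'(x):", sigmoid_grad(x))

# Backpropagation multiplies one such factor per sigmoid layer. Even in the
# best case (0.25 at x = 0) the product shrinks quickly with depth.
for depth in (1, 5, 10):
    print(f"{depth:>2} layers -> at most {0.25 ** depth:.2e}")
```

For large positive or negative inputs the derivative is close to zero, so the gradient signal reaching early layers can become very small. ReLU avoids this for positive inputs, where its derivative is 1.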


## Deep learning

* Hierarchical feature learning: Deep networks build complex features from simple ones, layer by layer. Example: in an image, the first layer detects edges, the middle layers detect patterns, and the top layers detect objects. This composition is what lets them solve complex tasks.

* Nonlinearity + depth = expressive power: Nonlinear activations let the layers approximate complex functions. More layers let the model learn more complicated mappings.

* Data-driven representation: In classical ML, features had to be designed by hand. Deep learning learns features directly from raw data, so less feature engineering is needed.

* Scale with data and computation: With large datasets and GPU compute, networks can be made more accurate.

* Optimization and regularization tricks: Techniques such as batch normalization, dropout, and good optimizers (Adam) stabilize training. These practical techniques are what make deep learning feasible.

* Transfer learning: Deep models pretrained on one task can be reused for other, similar tasks, so good results are possible even with less training data.
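
Of the regularization tricks listed above, dropout is easy to sketch in NumPy. The function below is a minimal version of the common "inverted dropout" formulation, not the implementation of any particular library; the name `dropout` and its arguments are chosen for this illustration. It also shows why frameworks such as Keras distinguish training from inference: dropout is active only during training.

```python
import numpy as np

def dropout(a, rate, training, rng):
    """Inverted dropout: during training, zero a fraction `rate` of the
    activations and rescale the rest; at inference, return the input as-is."""
    if not training or rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate      # True for units that are kept
    return a * keep / (1.0 - rate)          # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones((2, 8))

print("training :\n", dropout(a, rate=0.5, training=True, rng=rng))
print("inference:\n", dropout(a, rate=0.5, training=False, rng=rng))
```

In training mode each entry becomes either 0 or 2 (kept and rescaled by 1 / 0.5); in inference mode the activations pass through unchanged.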

In [None]:
# keras_inference_demo.py
# Install requirement: pip install tensorflow
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1) Define a simple sequential model
# - Input shape = 20 features
# - Two hidden dense layers with ReLU
# - Output layer with softmax for 3-class classification
model = keras.Sequential([
    layers.Input(shape=(20,), name="input_layer"),
    layers.Dense(64, activation="relu", name="hidden_dense_1"),
    layers.Dense(32, activation="relu", name="hidden_dense_2"),
    layers.Dense(3, activation="softmax", name="output_layer")
])

# 2) Print model summary to see parameters and shapes
model.summary()
# Comments:
# - Dense layers contain weights and biases that are normally learned during training;
#   here they keep their random initial values, so the predictions are not meaningful.
# - ReLU in hidden layers introduces nonlinearity.
# - Softmax in output layer converts scores to probabilities for 3 classes.

# 3) Create random input and run a forward pass (inference)
# Suppose batch of 5 samples, each with 20 features
x_random = np.random.randn(5, 20).astype(np.float32)

# Predict (forward pass). No training involved here.
predictions = model(x_random, training=False)  # or model.predict(x_random)

print("Predictions shape:", predictions.shape)
print("Predictions (probabilities):\n", predictions.numpy())

# Explanation:
# - model(x_random, training=False) performs computation through all layers.
# - Output is a (5,3) array of probabilities for each sample.
# - We did not call compile or fit because we are only demonstrating forward pass.
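
The parameter counts that `model.summary()` reports can be worked out by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch using the layer sizes of the model above (the variable names are illustrative):

```python
# Layer sizes taken from the model above: 20 -> 64 -> 32 -> 3
layer_sizes = [20, 64, 32, 3]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out      # weight matrix + bias vector
    total += params
    print(f"Dense {n_in:>2} -> {n_out:>2}: {params} parameters")

print("Total parameters:", total)
```

This gives 1344, 2080, and 99 parameters for the three dense layers, 3523 in total.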
