# Neural Network — A Simple Perceptron

*Theory + Practical in one Colab notebook.*

---

## Theory — Short Answers

**Q1. What is deep learning, and how is it connected to artificial intelligence?**

Deep learning is a subset of machine learning (which in turn is a subset of AI) that uses multi-layered neural networks to learn hierarchical representations of data. It enables machines to learn complex patterns directly from raw data with minimal feature engineering.



**Q2. What is a neural network, and what are the different types of neural networks?**

A neural network is a set of layers of interconnected nodes (neurons) that transform input data through weighted connections and activation functions to produce outputs. Common types include: feedforward (MLP), convolutional (CNN), recurrent (RNN, LSTM/GRU), autoencoders, and transformers.



**Q3. What is the mathematical structure of a neural network?**

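A neural network is composed of layers, each of which applies an affine transformation followed by a non-linear activation: `z = Wx + b`, `a = φ(z)`, where `x` is the layer input, `W` the weight matrix, `b` the bias vector, and `φ` the activation function. Training adjusts the weights `W` and biases `b` to minimize a loss function over the data.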



**Q4. What is an activation function, and why is it essential in neural networks?**

An activation function introduces non-linearity to allow the network to learn complex relationships. Without activation functions, the network would be equivalent to a single linear transformation regardless of depth.
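The claim that a stack of layers without activations collapses to a single linear transformation can be checked numerically. The sketch below is illustrative only: the layer sizes (3 → 4 → 2) and the random weights are assumptions, not part of the notebook's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with NO activation (assumed sizes: 3 -> 4 -> 2).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Output of the two-layer linear stack.
stacked = W2 @ (W1 @ x + b1) + b2

# The same map written as ONE affine layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
single = W @ x + b

print("Linear stack equals single layer:", np.allclose(stacked, single))

# With a ReLU between the layers, the output is in general no longer
# reproduced by that single affine map.
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print("ReLU stack equals single layer:", np.allclose(with_relu, single))
```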



**Q5. Could you list some common activation functions used in neural networks?**

Common activations: Sigmoid, Tanh, ReLU (Rectified Linear Unit), Leaky ReLU, ELU, Softmax (for multiclass output).
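As a rough reference, the activations listed above can be written in a few lines of NumPy. The function names and the default `alpha` values are choices made for this sketch; library implementations may differ in defaults and numerical details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for negative inputs instead of a hard zero.
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # expm1 on the clamped input avoids overflow for large positive z.
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
activations = {
    "sigmoid": sigmoid, "tanh": np.tanh, "relu": relu,
    "leaky_relu": leaky_relu, "elu": elu, "softmax": softmax,
}
for name, fn in activations.items():
    print(f"{name:>10}: {np.round(fn(z), 4)}")
```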



**Q6. What is a multilayer neural network?**

A multilayer neural network (also called a multilayer perceptron) has one or more hidden layers between input and output. Each hidden layer allows the model to learn higher-level features.



**Q7. What is a loss function, and why is it crucial for neural network training?**

A loss function measures the difference between predicted outputs and true targets. Training aims to minimize this loss using optimization algorithms; the choice of loss depends on the task (e.g., cross-entropy for classification, MSE for regression).



**Q8. What are some common types of loss functions?**

Common losses: Mean Squared Error (MSE), Mean Absolute Error (MAE), Binary Cross-Entropy, Categorical Cross-Entropy, Hinge loss.
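For intuition, these losses can be sketched directly in NumPy. The function names and the `eps` clipping constant are assumptions of this sketch; the hinge loss here expects labels encoded as -1/+1.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the logs stay finite.
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def categorical_cross_entropy(y_onehot, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=-1))

def hinge(y_pm1, scores):
    # Labels are expected in {-1, +1}; scores are raw (unsquashed) outputs.
    return np.mean(np.maximum(0.0, 1.0 - y_pm1 * scores))

print("MSE  :", mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
print("MAE  :", mae(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
print("BCE  :", binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5])))
print("CCE  :", categorical_cross_entropy(np.array([[1.0, 0.0]]), np.array([[0.5, 0.5]])))
print("Hinge:", hinge(np.array([1.0, -1.0]), np.array([2.0, -0.5])))
```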



**Q9. How does a neural network learn?**

It learns through iterative optimization: the forward pass computes predictions and the loss; the backward pass (backpropagation) computes the gradients of the loss with respect to the parameters; and an optimizer uses those gradients to update the parameters.
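The forward/backward/update cycle can be written out by hand for a single sigmoid neuron. This is a minimal sketch on the logical OR data, with an assumed learning rate of 0.5 and 2000 plain gradient-descent steps; it is not how Keras implements training internally.

```python
import numpy as np

# Logical OR dataset.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.5          # assumed learning rate
losses = []

for step in range(2000):
    # Forward pass: prediction and binary cross-entropy loss.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p_safe = np.clip(p, 1e-7, 1.0 - 1e-7)
    losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))

    # Backward pass: for sigmoid + cross-entropy, dL/dz = (p - y) / N.
    dz = (p - y) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()

    # Optimizer step: plain gradient descent.
    w -= lr * grad_w
    b -= lr * grad_b

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print("First loss:", losses[0], "Last loss:", losses[-1])
print("Predictions:", pred)
```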



**Q10. What is an optimizer in neural networks, and why is it necessary?**

An optimizer updates network parameters to minimize loss using gradients. It controls step sizes and can include momentum or adaptive learning rates to improve convergence.



**Q11. Could you briefly describe some common optimizers?**

Common optimizers include SGD (Stochastic Gradient Descent), SGD with momentum, RMSprop, Adagrad, and Adam (adaptive moment estimation). Adam is popular because it adapts the learning rate per parameter and usually performs well with its default settings.
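To make the update rules concrete, here is a hedged sketch of three of them applied to a toy objective, f(θ) = Σ θ². The helper names (`sgd_step`, `momentum_step`, `adam_step`) and the hyperparameters are assumptions for illustration; the Adam step follows the usual textbook form with bias correction.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy objective f(theta) = sum(theta**2).
    return 2.0 * theta

def sgd_step(theta, g, lr=0.1):
    return theta - lr * g

def momentum_step(theta, g, velocity, lr=0.1, beta=0.9):
    # Velocity accumulates an exponentially decaying sum of past gradients.
    velocity = beta * velocity - lr * g
    return theta + velocity, velocity

def adam_step(theta, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta0 = np.array([3.0, -2.0])
steps = 200

theta_sgd = theta0.copy()
for _ in range(steps):
    theta_sgd = sgd_step(theta_sgd, grad(theta_sgd))

theta_mom, velocity = theta0.copy(), np.zeros_like(theta0)
for _ in range(steps):
    theta_mom, velocity = momentum_step(theta_mom, grad(theta_mom), velocity)

theta_adam = theta0.copy()
m, v = np.zeros_like(theta0), np.zeros_like(theta0)
for t in range(1, steps + 1):
    theta_adam, m, v = adam_step(theta_adam, grad(theta_adam), m, v, t)

print("SGD     :", theta_sgd)
print("Momentum:", theta_mom)
print("Adam    :", theta_adam)
```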



**Q12. Can you explain forward and backward propagation in a neural network?**

Forward propagation passes the input through the layers to compute the output. Backward propagation (backprop) computes the gradients of the loss with respect to the parameters using the chain rule, propagating errors from the output back to earlier layers.
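A small hand-written example may help connect the chain rule to code. The sketch below assumes a network with one tanh hidden layer and a sigmoid output trained with binary cross-entropy; the names `forward_pass`, `loss_fn`, and `backward_pass` are hypothetical helpers, and one analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import numpy as np

def forward_pass(X, params):
    W1, b1, W2, b2 = params
    a1 = np.tanh(X @ W1 + b1)                    # hidden activations
    a2 = 1.0 / (1.0 + np.exp(-(a1 @ W2 + b2)))   # sigmoid output
    return a1, a2

def loss_fn(X, y, params):
    _, a2 = forward_pass(X, params)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def backward_pass(X, y, params):
    W1, b1, W2, b2 = params
    a1, a2 = forward_pass(X, params)
    n = X.shape[0]
    dz2 = (a2 - y) / n             # sigmoid + cross-entropy gradient
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ W2.T               # propagate error back to the hidden layer
    dz1 = da1 * (1.0 - a1 ** 2)    # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
params = [rng.normal(scale=0.5, size=(2, 4)), np.zeros(4),
          rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)]

analytic = backward_pass(X, y, params)[0][0, 0]

# Central finite difference on the single entry W1[0, 0].
eps = 1e-6
original = params[0][0, 0]
params[0][0, 0] = original + eps
loss_plus = loss_fn(X, y, params)
params[0][0, 0] = original - eps
loss_minus = loss_fn(X, y, params)
params[0][0, 0] = original
numeric = (loss_plus - loss_minus) / (2 * eps)

print("Analytic gradient:", analytic)
print("Numeric gradient :", numeric)
```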



**Q13. What is weight initialization, and how does it impact training?**

Weight initialization sets initial parameter values. Proper initialization (e.g., Xavier/Glorot, He initialization) helps avoid vanishing/exploding gradients and leads to faster, more stable training.
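The two schemes named above differ mainly in how they scale the weights by layer size. The following is a simplified sketch: `glorot_uniform` and `he_normal` are helper names chosen here, the layer size of 256 is an assumption, and the He variant uses a plain (untruncated) normal distribution, which may differ from a given library's version.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Xavier/Glorot: uniform in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    # He: zero-mean normal with std = sqrt(2 / fan_in), commonly paired with ReLU.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 256   # assumed layer sizes

W_glorot = glorot_uniform(fan_in, fan_out, rng)
W_he = he_normal(fan_in, fan_out, rng)

print("Glorot limit:", np.sqrt(6.0 / (fan_in + fan_out)),
      "max |w|:", np.abs(W_glorot).max())
print("He target std:", np.sqrt(2.0 / fan_in), "empirical std:", W_he.std())
```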



**Q14. What is the vanishing gradient problem in deep learning?**

The vanishing gradient problem occurs when gradients shrink toward zero as they are backpropagated through many layers, so the early layers learn very slowly. This makes deep networks hard to train, particularly with saturating activations such as sigmoid and tanh.
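A toy calculation shows the effect. The sketch below assumes a chain of one-unit sigmoid layers with a shared weight `w = 1.0` and input `x = 0.5` (both arbitrary choices). Because the sigmoid's derivative is at most 0.25, the gradient of the output with respect to the input shrinks at least geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(depth, w=1.0, x=0.5):
    """d(output)/d(input) for a chain of `depth` one-unit sigmoid layers."""
    a = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(w * a)
        # Chain rule: local derivative of sigmoid(w * a_prev) w.r.t. a_prev.
        grad *= w * a * (1.0 - a)
    return grad

for depth in (1, 5, 10, 20):
    print(f"depth={depth:>2}  gradient={input_gradient(depth):.3e}")
```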



**Q15. What is the exploding gradient problem?**

The exploding gradient problem occurs when gradients grow very large during backpropagation, causing unstable updates and possibly divergence. Techniques such as gradient clipping and careful weight initialization help mitigate it.
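Gradient clipping by norm is simple enough to sketch in NumPy. The helper `clip_by_global_norm` below is a hypothetical name for this illustration: it rescales a list of gradient arrays so that their combined L2 norm does not exceed `max_norm`, and leaves them unchanged when the norm is already small.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Combined L2 norm over all gradient arrays.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale down only if the norm exceeds the threshold.
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Deliberately large gradients: norm = sqrt(3^2 + 4^2 + 12^2) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print("Norm before clipping:", norm_before)
print("Norm after clipping :", norm_after)
```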



## Practical — Keras / TensorFlow Examples

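Run these cells in order in Google Colab (ensure `tensorflow` is installed); later cells reuse names such as `X`, `y`, and `Sequential` defined in earlier ones. Each example uses a small or synthetic dataset and only a few epochs so that it runs quickly.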

In [None]:
# Practical 1: Simple perceptron-like model for binary classification (toy example)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Simple dataset (logical OR)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0,1,1,1])

model = Sequential()
model.add(Dense(1, input_dim=2, activation='sigmoid'))
model.compile(optimizer=SGD(learning_rate=0.5), loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, y, epochs=50, verbose=0)
print('Final loss:', history.history['loss'][-1])
print('Accuracy:', history.history['accuracy'][-1])

In [None]:
# Practical 2: MLP with one hidden layer (reuses X, y, and Sequential from Practical 1)
from tensorflow.keras.layers import Dense
model2 = Sequential()
model2.add(Dense(8, activation='relu', input_shape=(2,)))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
h2 = model2.fit(X, y, epochs=50, verbose=0)
print('Accuracy (one hidden layer):', h2.history['accuracy'][-1])

In [None]:
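# Practical 3: Xavier (Glorot) initialization example
# Note: Glorot uniform is already the default kernel_initializer for Dense; it is set explicitly here for illustration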
from tensorflow.keras.initializers import GlorotUniform
model3 = Sequential()
model3.add(Dense(8, activation='relu', kernel_initializer=GlorotUniform(), input_shape=(2,)))
model3.add(Dense(1, activation='sigmoid'))
model3.compile(optimizer='adam', loss='binary_crossentropy')
print('Model with Glorot initialized layers constructed (no training here)')

In [None]:
# Practical 4: Different activation functions in a simple model
model_act = Sequential()
model_act.add(Dense(8, activation='tanh', input_shape=(2,)))
model_act.add(Dense(1, activation='sigmoid'))
model_act.compile(optimizer='adam', loss='binary_crossentropy')
print('Model with tanh activation constructed')

In [None]:
# Practical 5: Dropout example
from tensorflow.keras.layers import Dropout
model_drop = Sequential()
model_drop.add(Dense(16, activation='relu', input_shape=(2,)))
model_drop.add(Dropout(0.5))
model_drop.add(Dense(1, activation='sigmoid'))
model_drop.compile(optimizer='adam', loss='binary_crossentropy')
print('Model with dropout constructed')

In [None]:
# Practical 6: Manual forward propagation for a 2-layer network (numpy)
import numpy as np
def forward(X, W1, b1, W2, b2):
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = 1/(1+np.exp(-z2))  # sigmoid
    return a2

# random weights drawn from a standard normal distribution (seeded for reproducibility)
np.random.seed(0)
W1 = np.random.randn(2,4)
b1 = np.zeros(4)
W2 = np.random.randn(4,1)
b2 = np.zeros(1)
print('Forward output sample:', forward(np.array([[1,0]]), W1, b1, W2, b2))

In [None]:
# Practical 7: BatchNormalization example
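from tensorflow.keras.layers import Activation, BatchNormalization
model_bn = Sequential()
model_bn.add(Dense(16, input_shape=(2,)))  # linear layer; the non-linearity is applied after normalization
model_bn.add(BatchNormalization())
model_bn.add(Activation('relu'))  # without this, the hidden layer would have no non-linearity
model_bn.add(Dense(1, activation='sigmoid'))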
model_bn.compile(optimizer='adam', loss='binary_crossentropy')
print('Model with batch normalization constructed')

In [None]:
# Practical 8: Visualize training (example using model2 history)
import matplotlib.pyplot as plt
h = h2.history
plt.figure(figsize=(8,3))
plt.subplot(1,2,1)
plt.plot(h['loss']); plt.title('Loss')
plt.subplot(1,2,2)
plt.plot(h['accuracy']); plt.title('Accuracy')
plt.tight_layout()
plt.show()  # needed when run as a script; Colab also renders the figure inline
print('Plotted loss and accuracy (run in Colab to display)')

In [None]:
# Practical 9: Gradient clipping using optimizer arguments
from tensorflow.keras.optimizers import Adam
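# clipnorm rescales each gradient tensor so its L2 norm is at most 1.0;
# clipvalue would instead clip every gradient element to a fixed range
opt = Adam(learning_rate=0.01, clipnorm=1.0)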
model_clip = Sequential([Dense(8, activation='relu', input_shape=(2,)), Dense(1, activation='sigmoid')])
model_clip.compile(optimizer=opt, loss='binary_crossentropy')
print('Compiled model with gradient clipping')

In [None]:
# Practical 10: Custom loss function example (mean absolute percentage error variant)
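import tensorflow as tf

def custom_mape(y_true, y_pred):
    # Cast targets to the prediction dtype so integer labels do not break the division
    y_true = tf.cast(y_true, y_pred.dtype)
    # Floor |y_true| at epsilon to avoid division by zero
    denom = tf.maximum(tf.abs(y_true), tf.keras.backend.epsilon())
    return 100.0 * tf.reduce_mean(tf.abs((y_true - y_pred) / denom))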

model_cl = Sequential([Dense(8, activation='relu', input_shape=(2,)), Dense(1)])
model_cl.compile(optimizer='adam', loss=custom_mape)
print('Model compiled with custom loss (custom_mape)')

In [None]:
# Practical 11: Visualize model structure (text-based)
model2.summary()  # prints layer info; for graphical plot use plot_model (requires pydot)

---

*Notebook prepared for a student assignment: theory + practical examples. Run in Google Colab. Few epochs and tiny datasets are used for quick demonstration.*