# Neural Networks Practical Workshop - Part 1: Introduction and Setup

In this workshop, we'll explore neural networks from the ground up. We'll first implement a neural network from scratch using NumPy, then use PyTorch for more efficient implementations.

## Workshop Overview

This workshop is divided into five parts:

1. **Introduction and Setup** (this notebook)
2. **Neural Network from Scratch**: Building a neural network using only NumPy
3. **MNIST Dataset**: Exploring the dataset we'll use for our models
4. **PyTorch Implementation**: Using PyTorch to build and train the same model
5. **Comparison**: Directly comparing the custom and PyTorch implementations

## 1. Setup and Libraries

We'll start by importing the libraries we'll need throughout this workshop.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# For reproducibility
np.random.seed(42)
torch.manual_seed(42)

## 2. Neural Network Fundamentals

### What is a Neural Network?

A neural network is a parameterized function built from layers of simple units (neurons), loosely inspired by how biological neurons connect. By adjusting its parameters on example data, it learns to map inputs to outputs. Neural networks are particularly good at recognizing patterns and are used for classification, regression, and other tasks.

### Key Components

1. **Neurons**: The basic units that process information, applying weights to inputs and passing the result through an activation function.
2. **Layers**: Collections of neurons that process information in stages.
3. **Weights and Biases**: Parameters that are learned during training.
4. **Activation Functions**: Non-linear functions that determine the output of a neuron.
5. **Loss Function**: Measures how well the network's predictions match the target values.
6. **Optimizer**: Algorithm that adjusts the weights and biases to minimize the loss function.
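
To make the first few components concrete, here is a minimal sketch of a single neuron in NumPy. The input values, weights, and bias are made up for illustration, and the sigmoid activation is just one possible choice.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = np.dot(weights, inputs) + bias   # weighted input
    return 1.0 / (1.0 + np.exp(-z))      # activation function (sigmoid)

# Hypothetical values, chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

output = neuron(inputs, weights, bias)
print(output)
```

A layer is many such neurons applied to the same inputs, which is why the weights of a layer are usually stored as a matrix rather than a vector.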

### Forward and Backward Propagation

- **Forward Propagation**: The process of computing outputs by passing inputs through the network's layers.
- **Backward Propagation**: The process of computing the gradient of the loss with respect to every weight and bias by propagating errors backward through the network. The optimizer then uses these gradients to update the parameters.
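
The two passes can be sketched on a toy problem with a single trainable weight. This example assumes PyTorch's autograd handles the backward pass; the data, the starting weight, and the learning rate are hypothetical values picked for illustration.

```python
import torch

# Hypothetical toy problem: fit y = w * x with one trainable weight
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.5, requires_grad=True)

# Forward propagation: compute predictions and the mean squared error
loss = ((w * x - y) ** 2).mean()

# Backward propagation: autograd stores d(loss)/dw in w.grad
loss.backward()

# The same gradient derived by hand: mean(2 * (w*x - y) * x)
manual_grad = (2 * (w.detach() * x - y) * x).mean()

# One gradient descent step (learning rate is an assumed value)
alpha = 0.1
with torch.no_grad():
    w -= alpha * w.grad

new_loss = ((w * x - y) ** 2).mean()
print(loss.item(), new_loss.item())
```

The hand-derived gradient should agree with `w.grad`, which is a quick way to convince yourself that the backward pass is nothing more than the chain rule applied systematically.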

![Neural Network Architecture](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/300px-Colored_neural_network.svg.png)

## 3. Neural Network Mathematics

### Forward Pass

For a simple feedforward neural network with one hidden layer:

1. First layer computation: $z^{[1]} = W^{[1]} \cdot X + b^{[1]}$
2. Apply activation function: $a^{[1]} = g^{[1]}(z^{[1]})$
3. Output layer computation: $z^{[2]} = W^{[2]} \cdot a^{[1]} + b^{[2]}$
4. Apply output activation function: $a^{[2]} = g^{[2]}(z^{[2]})$

Where:
- $X$ is the input
- $W^{[l]}$ is the weight matrix for layer $l$
- $b^{[l]}$ is the bias vector for layer $l$
- $g^{[l]}$ is the activation function for layer $l$
- $z^{[l]}$ is the weighted input to layer $l$
- $a^{[l]}$ is the activation output of layer $l$
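
The four steps above can be sketched in NumPy as follows. The layer sizes, the random initialization, and the choice of ReLU for $g^{[1]}$ and softmax for $g^{[2]}$ are assumptions made for this illustration. Each column of `X` holds one example, which matches the $W \cdot X$ convention used in the equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 3 hidden units, 2 classes, 5 examples
n_x, n_h, n_y, m = 4, 3, 2, 5

X = rng.normal(size=(n_x, m))            # inputs, one example per column
W1 = rng.normal(size=(n_h, n_x)) * 0.1   # hidden-layer weights
b1 = np.zeros((n_h, 1))                  # hidden-layer biases
W2 = rng.normal(size=(n_y, n_h)) * 0.1   # output-layer weights
b2 = np.zeros((n_y, 1))                  # output-layer biases

def softmax(z):
    """Column-wise softmax, shifted by the column max for numerical stability."""
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

z1 = W1 @ X + b1          # step 1: weighted input to the hidden layer
a1 = np.maximum(0, z1)    # step 2: hidden activation (ReLU, assumed)
z2 = W2 @ a1 + b2         # step 3: weighted input to the output layer
a2 = softmax(z2)          # step 4: output activation (softmax, assumed)

print(z1.shape, a1.shape, z2.shape, a2.shape)
```

Note that the bias vectors have shape `(n, 1)` and are broadcast across the columns, so the same bias is added to every example.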

### Backward Pass

The backward pass involves:

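1. Compute the output error: $dz^{[2]} = a^{[2]} - y$ (this simple form holds when the output layer uses softmax with cross-entropy loss, or sigmoid with binary cross-entropy loss)
2. Compute gradients for the output layer: $dW^{[2]} = dz^{[2]} \cdot a^{[1]T}$, $db^{[2]} = \sum_i dz^{[2](i)}$, where $i$ indexes the training examples
3. Propagate error to hidden layer: $dz^{[1]} = (W^{[2]T} \cdot dz^{[2]}) \odot g^{[1]'}(z^{[1]})$, where $\odot$ is the element-wise product
4. Compute gradients for the hidden layer: $dW^{[1]} = dz^{[1]} \cdot X^T$, $db^{[1]} = \sum_i dz^{[1](i)}$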
5. Update weights: $W^{[l]} = W^{[l]} - \alpha \cdot dW^{[l]}$
6. Update biases: $b^{[l]} = b^{[l]} - \alpha \cdot db^{[l]}$

Where:
- $y$ is the target output
- $dW^{[l]}$ is the gradient of the cost function with respect to $W^{[l]}$
- $db^{[l]}$ is the gradient of the cost function with respect to $b^{[l]}$
- $\alpha$ is the learning rate
- $g^{[l]'}$ is the derivative of the activation function for layer $l$
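
As a rough sketch, the backward pass can be written in NumPy and checked against a numerical gradient. This example assumes a tanh hidden layer, a softmax output, and a cross-entropy loss summed over the examples; the layer sizes, random data, and learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 4, 3, 2, 5            # hypothetical sizes

X = rng.normal(size=(n_x, m))            # one example per column
labels = rng.integers(0, n_y, size=m)
Y = np.eye(n_y)[:, labels]               # one-hot targets, shape (n_y, m)

params = {
    "W1": rng.normal(size=(n_h, n_x)) * 0.5, "b1": np.zeros((n_h, 1)),
    "W2": rng.normal(size=(n_y, n_h)) * 0.5, "b2": np.zeros((n_y, 1)),
}

def forward(p, X):
    a1 = np.tanh(p["W1"] @ X + p["b1"])
    z2 = p["W2"] @ a1 + p["b2"]
    e = np.exp(z2 - z2.max(axis=0, keepdims=True))
    return a1, e / e.sum(axis=0, keepdims=True)

def loss(p, X, Y):
    """Cross-entropy summed over all examples."""
    _, a2 = forward(p, X)
    return -np.sum(Y * np.log(a2))

def backward(p, X, Y):
    a1, a2 = forward(p, X)
    dz2 = a2 - Y                                  # step 1
    dW2 = dz2 @ a1.T                              # step 2
    db2 = dz2.sum(axis=1, keepdims=True)
    dz1 = (p["W2"].T @ dz2) * (1 - a1 ** 2)       # step 3, tanh'(z) = 1 - tanh(z)^2
    dW1 = dz1 @ X.T                               # step 4
    db1 = dz1.sum(axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def numerical_grad(p, name, idx, eps=1e-6):
    """Central-difference estimate of d(loss)/d(p[name][idx])."""
    original = p[name][idx]
    p[name][idx] = original + eps
    loss_plus = loss(p, X, Y)
    p[name][idx] = original - eps
    loss_minus = loss(p, X, Y)
    p[name][idx] = original
    return (loss_plus - loss_minus) / (2 * eps)

grads = backward(params, X, Y)
check = numerical_grad(params, "W1", (0, 0))
print(grads["W1"][0, 0], check)          # the two values should nearly match

# Steps 5 and 6: one gradient descent update (alpha is an assumed value)
alpha = 0.01
for name in params:
    params[name] -= alpha * grads[name]
```

Comparing analytic and numerical gradients like this is a common way to catch sign or transpose mistakes before training a from-scratch implementation.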

## 4. Common Activation Functions

Let's implement and visualize some common activation functions:

In [None]:
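def sigmoid(x):
    """Squash inputs into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

def relu(x):
    """Pass positive inputs through unchanged; map negative inputs to 0."""
    return np.maximum(0, x)

def tanh(x):
    """Squash inputs into the range (-1, 1)."""
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope alpha."""
    return np.where(x > 0, x, alpha * x)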

# Plot each activation function in its own panel of a 2x2 grid
x = np.linspace(-5, 5, 100)

activations = [
    ('Sigmoid', sigmoid),
    ('ReLU', relu),
    ('Tanh', tanh),
    ('Leaky ReLU', leaky_relu),
]

plt.figure(figsize=(12, 8))

for i, (name, fn) in enumerate(activations, start=1):
    plt.subplot(2, 2, i)
    plt.plot(x, fn(x))
    plt.title(name)
    plt.grid(True)

plt.tight_layout()
plt.show()
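
The backward pass needs the derivative $g'$ of whichever activation is used. The following is a minimal sketch of those derivatives for the four functions above. The `*_grad` function names are hypothetical, and setting the ReLU derivative to 0 at exactly `x = 0` (where it is not defined) is a convention assumed here.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    """sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))"""
    s = sigmoid(x)
    return s * (1 - s)

def tanh_grad(x):
    """tanh'(x) = 1 - tanh(x)^2"""
    return 1 - np.tanh(x) ** 2

def relu_grad(x):
    """1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    """1 for positive inputs, alpha otherwise."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(sigmoid_grad(x))
print(tanh_grad(x))
print(relu_grad(x))
print(leaky_relu_grad(x))
```

The sigmoid and tanh derivatives shrink toward 0 for inputs far from 0, while the ReLU derivative stays at 1 for every positive input.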

## Next Steps

Now that we understand the basics and have set up our environment, we're ready to implement a neural network from scratch in the next notebook: `02_nn_from_scratch.ipynb`.