# Neural Networks

## What are Neural Networks?

At their core, neural networks can be viewed as stacked logistic regression units that learn hierarchical features. Instead of engineering features by hand (e.g., combining dimensions for house pricing), the network learns useful features during training.
- Computational models inspired by the human brain.
- Consist of interconnected units called *neurons*.
- Can automatically learn and extract useful features from raw data.


### Network Architecture

- **Activation ($a$):** The output of a neuron after applying the activation function (e.g., sigmoid). Reflects how strongly a neuron signals to subsequent layers.
- **Input Layer:** The raw input data (the feature vector).
- **Hidden Layers:** Intermediate layers where feature extraction occurs.
- **Output Layer:** Produces the final prediction.
- **Fully Connected Layers:** Each neuron in a layer receives input from every neuron in the previous layer. This design allows the network to automatically learn which features are important.

> **Key Concept:** *The hidden layer is called "hidden" because its intermediate outputs are not directly observed in the training data.*

### Advantages of Neural Networks

1. **Automatic Feature Learning:**
   - Unlike manual feature engineering, neural networks learn the best representations from the data.
   - **Analogy:** Instead of a chef pre-selecting ingredients and recipes, the network experiments with combinations until it finds what makes the dish (prediction) best.
2. **Flexibility in Architecture:**
   - **Multilayer Perceptron (MLP):** A common type of neural network built from one or more fully connected hidden layers.
   - **Design Decisions:**
     - Number of hidden layers.
     - Number of neurons per layer.
   - These choices can significantly affect the performance of the model.
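
To make these design decisions concrete, the sketch below counts the trainable parameters (weights and biases) of a fully connected network for a given list of layer sizes. The function name `count_parameters` and the example layer sizes are illustrative assumptions, not part of the notes.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input layer first
    and output layer last (hypothetical helper for illustration).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Two illustrative architectures with the same 4 inputs and 1 output.
small = count_parameters([4, 3, 1])       # (4*3 + 3) + (3*1 + 1) = 19
wider = count_parameters([4, 16, 16, 1])  # 80 + 272 + 17 = 369
```

Adding layers or widening them increases the parameter count quickly, which is one reason these choices affect both model capacity and training cost.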

---

## Example: T-Shirt Demand Prediction (Simplest Case)

Suppose we want to predict whether a T-shirt is a top seller (Yes/No) using its price.
- **Input Feature:** Price ($x$).
- **Computation:** 
  - The neuron calculates a weighted sum of the input plus a bias.
  - Passes the result through a sigmoid activation function to produce an output probability.
  
**Mathematical Representation:**

$$
a = \frac{1}{1 + e^{-(w x + b)}}
$$

Here, $w$ is the weight, $b$ is the bias, and $a$ (activation) is the output probability.
- This logistic regression unit is a simplified model of a biological neuron.
- The activation $a$ reflects how strongly the neuron "fires" or sends its output.
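
The formula above can be sketched in a few lines of Python. The values of `w` and `b` below are hand-picked assumptions for illustration, not trained parameters; a negative weight is chosen so that a higher price lowers the predicted probability.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Single neuron: weighted input plus bias, passed through a sigmoid."""
    return sigmoid(w * x + b)


# Hypothetical parameters (not learned from data).
w, b = -0.2, 4.0

cheap = neuron(10.0, w, b)   # z = 2.0, activation above 0.5
pricey = neuron(40.0, w, b)  # z = -4.0, activation close to 0
```

With these assumed parameters, the neuron outputs exactly 0.5 at a price of 20, which acts as the decision boundary between "top seller" and "not a top seller".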

---

## Building a Neural Network with Multiple Features

Now let's expand the example above by adding more features.
- **Input Features:**
  - Price
  - Shipping Costs
  - Marketing Spend
  - Material Quality

- **Key Factors Influencing Sales:**
  1. **Affordability:** A function of price and shipping cost.
  2. **Awareness:** Driven primarily by marketing spend.
  3. **Perceived Quality:** Influenced by material quality and price (since higher price can imply higher quality).

**Input Layer:**
- Contains the feature vector: $\mathbf{x} = [\text{Price}, \text{Shipping Cost}, \text{Marketing}, \text{Material Quality}]$.
  
**Hidden Layer:** Consists of 3 neurons:
- **Neuron 1:** Estimates **Affordability**.
- **Neuron 2:** Estimates **Awareness**.
- **Neuron 3:** Estimates **Perceived Quality**.

> In practice, every neuron in the hidden layer is connected to all inputs. The network learns which inputs are most relevant.

**Output Layer:**
- Contains 1 neuron.
- Takes the 3 activations (affordability, awareness, perceived quality) and computes the final probability that the T-shirt is a top seller.

### Diagram of the Network

| **Layer**       | **Neurons** | **Description**                                                                          |
|-----------------|-------------|------------------------------------------------------------------------------------------|
| **Input Layer** | 4           | Features: Price, Shipping Cost, Marketing, Material Quality                              |
| **Hidden Layer**| 3           | Intermediate estimates: Affordability, Awareness, Perceived Quality (activations)         |
| **Output Layer**| 1           | Final prediction: Probability of being a top seller                                      |
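
A forward pass through this 4-3-1 network might look like the sketch below. All weights, biases, and input values are hand-picked assumptions (a trained network would learn them from data), and the features are assumed to be scaled to roughly the range 0 to 1. Each hidden neuron is connected to all four inputs; a zero weight simply means that input is ignored in this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer; each row of `weights` belongs to one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical scaled features: price, shipping cost, marketing, material quality.
x = [0.3, 0.1, 0.8, 0.9]

# Hypothetical hidden-layer parameters (3 neurons, 4 weights each).
W1 = [
    [-2.0, -2.0, 0.0, 0.0],  # "affordability": penalizes price and shipping cost
    [0.0, 0.0, 3.0, 0.0],    # "awareness": driven by marketing
    [1.0, 0.0, 0.0, 2.0],    # "perceived quality": price and material quality
]
b1 = [1.0, -1.0, -1.0]

# Hypothetical output-layer parameters (1 neuron, 3 weights).
W2 = [[1.5, 1.0, 1.2]]
b2 = [-2.0]

hidden = dense(x, W1, b1)       # 3 activations
output = dense(hidden, W2, b2)  # 1 activation
p_top_seller = output[0]        # probability that the T-shirt is a top seller
```

The output layer only ever sees the three hidden activations, not the raw inputs, which is why the hidden layer can be read as a set of learned features.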

---

## Neural Networks in Computer Vision: Face Recognition

Suppose we want to train a neural network that takes an input image and outputs the identity of the person in that image.

**Image Details:**  
- The image is **1000 × 1000 pixels**.
- Each pixel is represented by an intensity (brightness) value in the range **0–255**.

**Data Representation:**  
- The image is a **1000 × 1000 matrix**.
- "Unrolling" (flattening) this $1000 \times 1000$ grid of pixels gives a single vector $\mathbf{x}$ of **1,000,000 pixel values**.
- The network processes this long vector to extract relevant features for recognizing the face.
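
The unrolling step can be sketched as follows. The helper name `flatten`, the row-by-row ordering, and the all-zero placeholder image are assumptions made for illustration; any fixed ordering works as long as it is applied consistently to every image.

```python
def flatten(image):
    """Unroll a 2-D grid (a list of rows) into one vector, row by row."""
    return [pixel for row in image for pixel in row]


# Tiny 2 x 3 example so the ordering is easy to follow.
tiny = [[0, 128, 255],
        [64, 32, 16]]
tiny_vector = flatten(tiny)  # [0, 128, 255, 64, 32, 16]

# Placeholder 1000 x 1000 image (all pixels 0) standing in for a real photo.
height, width = 1000, 1000
image = [[0] * width for _ in range(height)]
x = flatten(image)  # vector with 1,000,000 entries
```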


### Neural Network Architecture for Face Recognition

**Input Layer:** Contains the pixel intensity vector $\mathbf{x}$ with **1,000,000 elements**.

**Hidden Layers:** Each hidden layer progressively extracts more complex features.

1. **First Hidden Layer:** Detects low-level features such as edges (e.g., vertical or oriented lines).
    - **Window Size:** Small regions of the image.
2. **Second Hidden Layer:** Combines edges to detect facial parts like eyes, noses, and ears.
    - **Window Size:** Larger regions than the first layer.
3. **Third Hidden Layer (if present):** Aggregates facial parts to identify complete face shapes.
    - **Window Size:** Even larger regions.
  
**Output Layer:** Produces the final classification, i.e., the identity of the person. It typically outputs a probability distribution over the possible identities (e.g., via softmax).


| **Layer**         | **Role**                                              | **Learned Features**                        |
|-------------------|-------------------------------------------------------|---------------------------------------------|
| **Input Layer**   | Receives raw pixel intensities                        | N/A                                         |
| **1st Hidden Layer** | Extracts basic features                             | Edges and simple lines (e.g., vertical edges) |
| **2nd Hidden Layer** | Combines edges into facial parts                   | Facial parts (eyes, noses, ears)            |
| **3rd Hidden Layer** | Aggregates parts into full face shapes (if used)    | Complete facial structure                   |
| **Output Layer**  | Classifies the image into a person's identity         | Identity probability (softmax output)       |
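
The softmax output in the last row can be sketched as below. It converts one raw score per identity into probabilities that sum to 1. The identity labels and scores are made-up placeholders; in a real network the scores would come from the final layer.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical identities and raw output-layer scores.
identities = ["person_a", "person_b", "person_c"]
scores = [2.0, 0.5, -1.0]

probs = softmax(scores)
predicted = identities[probs.index(max(probs))]  # identity with the highest probability
```

The predicted identity is simply the one with the largest probability, and the probabilities themselves indicate how confident the network is in that choice.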

> **Tip:** The network learns these feature detectors on its own from the data, without explicit programming to look for edges, eyes, or face shapes.
  
> **Analogy:** Think of the network as a detective who starts by noticing small clues (edges), then pieces together these clues (facial parts), and finally identifies the whole picture (the face).
