# Lecture 1: Neurons, notes and ELI5
22AUG23

[Suggested book](https://www.deeplearningbook.org/contents/linear_algebra.html)  
[Lecture recording](https://uit.instructure.com/courses/30426/files/2455904?module_item_id=820139)  
[Lecture slides](https://uit.instructure.com/courses/30426/files/2455922?module_item_id=820138)

![Screenshot 2023-08-22 at 20.33.01.png](<attachment:Screenshot 2023-08-22 at 20.33.01.png>)

## II Basic Terminology

### 1. Classification vs. Regression

Given a training set $X_l = \{(x_i, y_i)\}$ for $i = 1, 2, \dots, l$:
- $x_i$: Objects, represented as feature vectors in the $n$-dimensional real feature space $\mathbb{R}^n$.
- $y_i$: Responses or target variables corresponding to each $x_i$.

### 2. Classification

Classification is the task where data points are categorized into predefined categories or classes.

- **Output Space**: $Y = \{-1, 1\}$. In binary classification, each data point $x_i$ is assigned to one of two classes, either -1 or +1.
  
- **Algorithm (or model)**:   
  $a(x, w) = \text{sign}(w \cdot x)$  
  The prediction for a data point $x$ is the sign of the dot product between the weight vector $w$ and $x$.

- **Loss Function**:   
  $Q(w; X_l) = \sum_{i=1}^l \text{Indicator}[(w \cdot x_i)\, y_i < 0]$  
  This counts the training objects the model misclassifies: the margin $(w \cdot x_i)\, y_i$ is negative exactly when the predicted sign disagrees with the label $y_i$. The objective is to minimize this loss.
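As a rough illustration, the classifier and its loss can be sketched in NumPy. The function names, the toy data, and the choice to map a score of exactly 0 to $+1$ are assumptions made for this example, not something taken from the lecture.

```python
import numpy as np

def predict_class(X, w):
    """Linear classifier a(x, w) = sign(w . x), applied to each row of X."""
    scores = X @ w
    # Assumption: a score of exactly 0 is assigned to class +1 (arbitrary tie-break).
    return np.where(scores >= 0, 1, -1)

def zero_one_loss(X, y, w):
    """Q(w; X_l): number of objects whose margin (w . x_i) * y_i is negative."""
    margins = (X @ w) * y
    return int(np.sum(margins < 0))

# Toy data, made up for illustration: 4 objects, 2 features, labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 1.0]])
y = np.array([1, 1, -1, 1])
w = np.array([1.0, 1.0])

print(predict_class(X, w).tolist())  # [1, 1, -1, -1]
print(zero_one_loss(X, y, w))        # 1 (only the last object is misclassified)
```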

### 3. Regression

Regression predicts a continuous value, not a class label.

- **Output Space**: $Y = \mathbb{R}$. This means the output is a real number.

- **Algorithm (or model)**:  
  $a(x, w) = w \cdot x$  
  The prediction for a data point $x$ is the dot product between the weight vector $w$ and $x$.

- **Loss Function**:  
  $Q(w; X_l) = \sum_{i=1}^l (w \cdot x_i - y_i)^2$  
  This is the sum of squared differences between the predicted values and the actual values, and the goal is to minimize this loss.
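The regression model and its loss can be sketched the same way. The function names and the toy data below are assumptions made for illustration.

```python
import numpy as np

def predict_value(X, w):
    """Linear regression model a(x, w) = w . x, applied to each row of X."""
    return X @ w

def squared_loss(X, y, w):
    """Q(w; X_l): sum of squared differences between predictions and targets."""
    residuals = predict_value(X, w) - y
    return float(np.sum(residuals ** 2))

# Toy data, made up for illustration: 3 objects, 2 features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 3.0, 4.0])
w = np.array([2.0, 3.0])

print(predict_value(X, w).tolist())  # [2.0, 3.0, 5.0]
print(squared_loss(X, y, w))         # 1.0 (only the third prediction is off, by 1)
```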

**Summary**:  
- **Classification**: Predicts which category or class a data point belongs to.
- **Regression**: Predicts a continuous value for a data point.

---

## ELI5

When the machine is in the learning or training phase:

1. We show the machine an image of the letter "B."
2. The machine might initially guess it's an "A."
3. We then immediately tell the machine, "No, that's a B."
4. The machine then adjusts its internal mechanisms (a combination of weights and biases in the case of neural networks) to try to get closer to the correct answer next time.

Repeat this process thousands or even millions of times, and the machine gets better at identifying A's and B's correctly. Each time it's wrong, it adjusts a little bit based on how wrong it was and in what way.

After this training phase, when the machine sees new images of letters that weren't in the training set, it uses what it has learned to classify them. While it's not perfect and can still make mistakes, the idea is that it has seen enough examples of A's and B's during training that it can make a pretty good guess on new examples.
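The notes do not say which update rule the machine uses to "adjust a little bit". As one simple stand-in, the sketch below uses the classic perceptron rule: whenever the guess is wrong, nudge the weights toward the correct answer. The labels $-1$ and $+1$ stand for "A" and "B", and the function name, learning rate, and toy data are all assumptions made for this example.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Perceptron rule (an assumed stand-in for 'adjusting after a wrong guess')."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A non-positive margin means the guess was wrong (or undecided).
            if y_i * np.dot(w, x_i) <= 0:
                w += lr * y_i * x_i  # move w toward the correct side
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w

# Toy, linearly separable data: -1 plays the role of "A", +1 of "B".
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = train_perceptron(X, y)
print(np.where(X @ w >= 0, 1, -1).tolist())  # [1, 1, -1, -1], matching y
```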

![Screenshot 2023-08-22 at 20.33.10.png](<attachment:Screenshot 2023-08-22 at 20.33.10.png>)

1. **Input Features**: Given $n$ numerical features denoted by $x_j$, where $j$ ranges from 1 to $n$. 

2. **Linear Combination**: 
   The linear combination of these features with their respective weights is represented as:
   $$
   a(x, w) = \sigma(w, x) = \sigma \left( \sum_{j=1}^{n} w_j x_j - w_0 \right)
   $$
   Here:
   - $w_j \in \mathbb{R}$ are the feature weights.
   - $w_0$ is the bias term. Because it is subtracted, it acts as a threshold on the weighted sum.

3. **Activation Function**: 
   The symbol $\sigma(z)$ represents an activation function. Examples include:
   - $\text{sign}(z)$
   - $\frac{1}{1+e^{-z}}$, which is the sigmoid function.

4. **Linear Neuron Model**:
   - Inputs are multiplied by their weights.
   - Weighted inputs are summed.
   - The bias is subtracted.
   - The result passes through an activation function to produce the output $a$.
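The four steps above can be sketched as a single function. The function names, the example numbers, and the tie-break that sends $z = 0$ to $+1$ in the sign activation are assumptions made for illustration.

```python
import numpy as np

def sign_activation(z):
    # Assumption: z == 0 is mapped to +1 (arbitrary tie-break).
    return np.where(z >= 0, 1.0, -1.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, w0, activation):
    """a(x, w) = sigma(sum_j w_j * x_j - w0)."""
    z = np.dot(w, x) - w0  # multiply by weights, sum, subtract the bias
    return activation(z)   # pass the result through the activation function

# Example numbers, made up for illustration.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.5])
w0 = 0.5

# Here z = 0.5 - 2.0 + 1.5 - 0.5 = -0.5.
print(neuron(x, w, w0, sign_activation))  # -1.0
print(neuron(x, w, w0, sigmoid))          # roughly 0.378
```

Swapping the activation changes only the last step: the weighted sum $z$ is identical in both calls.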


![Screenshot 2023-08-22 at 20.59.49.png](<attachment:Screenshot 2023-08-22 at 20.59.49.png>)

### ELI5: Neural Network Concepts

1. **Weights**:
   - Imagine you and your friends are trying to balance on a seesaw. Some of you are heavier than others. The weights in a neural network are like how heavy each of you is. If you change who's on the seesaw or move closer/further from the center, it tilts differently. Similarly, adjusting the weights in the network changes its behavior.

2. **Adders**:
   - Think of an adder like the end of a day at the beach: you and your friends pool all the seashells you collected and count them up. In a neural network, an adder is the spot where all the weighted inputs (the seashells) are added together.

3. **Activation Functions**:
   - Imagine you have a toy that lights up when you put enough coins inside it. If you don't put enough, it stays off. Activation functions work in a similar way. They decide whether the neuron should "fire" (like the toy lighting up) based on the total from the adder (the coins you put in).

### ELI: 2nd-Year Computer Science Student

1. **Weights**:
   - In a neural network, weights are parameters that determine the strength or importance of input features. They're adjusted during training to minimize the error between the predicted and actual output. It's like tuning the dials on a machine until it works just right.

2. **Adders**:
   - These are summation units in the neuron. After each input value is multiplied by its corresponding weight, all these products are summed together in the adder. This cumulative sum then gets passed on to the activation function.

3. **Activation Functions**:
   - Activation functions introduce non-linearity into the network. They determine the neuron's output based on the summed input. Common functions include the sigmoid, tanh, and ReLU. Think of it as the logic gate in a neuron, deciding how much of the signal to pass forward, if any.
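A minimal sketch of the three activation functions named above, plus a quick check of what "non-linearity" means in practice. The function names and sample inputs are assumptions made for illustration; `np.tanh` is NumPy's built-in hyperbolic tangent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any real number into (0, 1)

def relu(z):
    return np.maximum(0.0, z)        # passes positives through, zeroes out negatives

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # values in (0, 1), equal to 0.5 at z = 0
print(np.tanh(z))   # values in (-1, 1), equal to 0 at z = 0
print(relu(z))      # [0. 0. 2.]

# Non-linearity: applying the function to a sum differs from summing its outputs.
a, b = -1.0, 2.0
print(relu(a + b), relu(a) + relu(b))  # 1.0 2.0
```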


![Screenshot 2023-08-22 at 21.00.50.png](<attachment:Screenshot 2023-08-22 at 21.00.50.png>)

### TOREAD: Maths to learn quickly 😰

1. **Linear Algebra**:
    - **Vectors and Matrices**: Understand basic vector and matrix operations like addition, multiplication, and inversion.
    - **Eigenvalues and Eigenvectors**: These concepts often come up in more advanced topics and optimizations.
    - **Matrix Factorizations**: LU, QR, and Singular Value Decomposition (SVD).

2. **Calculus**:
    - **Differential Calculus**: Understand derivatives and the rules for differentiation.
    - **Integral Calculus**: Basics of integration.
    - **Partial Derivatives**: Since neural networks deal with functions of many variables, understanding partial derivatives is crucial.
    - **Chain Rule**: Essential for backpropagation in neural networks.

3. **Probability and Statistics**:
    - **Probability Basics**: Understand discrete and continuous probability distributions.
    - **Bayes' Theorem**: Comes in handy when understanding probabilistic models.
    - **Expectation, Variance, and Standard Deviation**: Essential statistical measures.
    - **Common Distributions**: Normal (Gaussian), Bernoulli, and Binomial distributions.
    - **Central Limit Theorem**: Often used implicitly in deep learning theory.

4. **Optimization**:
    - **Gradient Descent**: The primary optimization technique used in training neural networks.
    - **Convex Optimization**: While not all optimization problems in neural networks are convex, understanding the basics helps.
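To give a flavour of gradient descent, here is a minimal sketch that applies it to the squared loss from the regression section, $Q(w) = \sum_i (w \cdot x_i - y_i)^2$, whose gradient is $\nabla Q(w) = 2 X^\top (Xw - y)$. The function name, learning rate, step count, and toy data are assumptions chosen so that this small example converges. They are not recommended settings.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Minimise the sum of squared errors by repeatedly stepping against the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y)  # gradient of sum_i (w . x_i - y_i)^2
        w -= lr * grad                  # small step in the downhill direction
    return w

# Toy data generated from w = [2, 3], so an exact fit exists.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])

w = gradient_descent(X, y)
print(np.round(w, 4))  # close to [2. 3.]
```

This particular loss is convex, so gradient descent with a small enough step reaches the global minimum. Neural network losses are generally not convex, which is part of why the convex case is useful background.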
