<a id="pytorch-transpose"></a>
## `.T`: PyTorch Transpose in Detail

##### 1. **What Does `weights.T` Do?**

- If `weights` is a matrix (or tensor), `.T` transposes it:
  - Rows become columns, and columns become rows.
  - For example:
    ```python
    import torch

    weights = torch.tensor([[1, 2],
                            [3, 4]])
    weights.T  # Result: tensor([[1, 3],
               #                 [2, 4]])
    ```

- In PyTorch, for a 2D tensor, `.T` is shorthand for `.transpose(0, 1)` (swapping the first and second dimensions).
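
A quick way to check this equivalence is sketched below. The tensor values are arbitrary and chosen only for illustration.

```python
import torch

weights = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])   # shape (2, 3)

print(weights.T)        # rows become [1, 4], [2, 5], [3, 6]
print(weights.T.shape)  # torch.Size([3, 2])

# For a 2D tensor, .T and .transpose(0, 1) give the same result.
print(torch.equal(weights.T, weights.transpose(0, 1)))  # True
```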

---

##### 2. **Why Use `weights.T` Here?**

In the expression `(train_x[0] * weights.T).sum() + bias`, the transpose ensures that the dimensions of `weights` line up with `train_x[0]` for element-wise multiplication. Multiplying element-wise and then summing is equivalent to a dot product.

##### Dimensions Breakdown:
- `train_x[0]`: A single data point (e.g., an image flattened into a vector). Shape: `(784,)` (1D tensor with 784 features).
- `weights`: The learnable parameters for the model. Shape: `(784, 1)` (2D tensor with 784 rows and 1 column).

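If you directly multiply `train_x[0]` (shape `(784,)`) by `weights` (shape `(784, 1)`), PyTorch does not raise an error. Instead, broadcasting expands both operands to shape `(784, 784)`, pairing every feature with every weight, which is not the per-feature product we want.

By using `weights.T`, the shape becomes `(1, 784)`, so each of the 784 features is multiplied by exactly one weight and the result has shape `(1, 784)`.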
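
The shapes can be inspected directly. This is a minimal sketch that uses random tensors as stand-ins for `train_x[0]` and `weights`; only the shapes matter here.

```python
import torch

x = torch.rand(784)           # stand-in for train_x[0], shape (784,)
weights = torch.rand(784, 1)  # shape (784, 1)

# Without the transpose, broadcasting expands both operands to (784, 784).
print((x * weights).shape)    # torch.Size([784, 784])

# With the transpose, the 784 features line up one-to-one with the weights.
print((x * weights.T).shape)  # torch.Size([1, 784])
```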

1. **`train_x[0]`**:
   - A single data point (flattened image). Shape: `(784,)`.

2. **`weights.T`**:
   - Transposed weights. Shape: `(1, 784)`.

3. **`train_x[0] * weights.T`**:
   - Element-wise multiplication between `train_x[0]` and `weights.T`.
   - This computes the weighted contribution of each feature.

4. **`.sum()`**:
   - Sums up all the weighted contributions to produce a single scalar value.

5. **`+ bias`**:
   - Adds the bias term to shift the result.

The final output is the prediction for the input `train_x[0]`.
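
The five steps above can be sketched end to end. The tensors below are random stand-ins (an assumption, since the real `train_x`, `weights`, and `bias` are defined elsewhere), and the comparison with `x @ weights` is included only to show that multiply-then-sum matches a matrix product.

```python
import torch

torch.manual_seed(0)           # fixed seed so the sketch is reproducible
x = torch.rand(784)            # stand-in for train_x[0]
weights = torch.randn(784, 1)  # stand-in for the model weights
bias = torch.randn(1)          # stand-in for the bias

# Element-wise multiply, then sum: the expression from the text.
pred_sum = (x * weights.T).sum() + bias

# The same weighted sum written as a matrix product.
pred_matmul = x @ weights + bias

print(pred_sum.shape, pred_matmul.shape)  # torch.Size([1]) torch.Size([1])
print(torch.allclose(pred_sum, pred_matmul, atol=1e-3))
```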

<a id="sgd-illustrate-with-code"></a>
## SGD Illustrated with Code

For simplicity, let’s assume the image has only 4 pixels instead of 784 (this makes it easier to visualize).
```python
one_image = [0.1, 0.5, 0.3, 0.9]  # Pixel values of the image (scaled between 0 and 1).

# The weights represent how important each pixel is for making a prediction.
# Since the input has 4 pixels, there are 4 weights (one for each pixel).
weights = [[0.2],   # Weight for pixel 1
           [0.4],   # Weight for pixel 2
           [-0.1],  # Weight for pixel 3
           [0.3]]   # Weight for pixel 4

# To align the dimensions for element-wise multiplication, we transpose the weights.
# Transposing converts weights from shape (4, 1) to (1, 4).
weights_transpose = [[row[0] for row in weights]]  # [[0.2, 0.4, -0.1, 0.3]]

# Multiply each pixel value in one_image by the corresponding weight.
element_wise_product = [pixel * weight
                        for pixel, weight in zip(one_image, weights_transpose[0])]
# Approximately [0.02, 0.2, -0.03, 0.27]

# Add up all the products to get a single scalar value.
weighted_sum = sum(element_wise_product)  # 0.02 + 0.2 + (-0.03) + 0.27 ≈ 0.46

# The bias is a single number that shifts the result up or down.
bias = 0.1
prediction = weighted_sum + bias  # 0.46 + 0.1 ≈ 0.56

# Final prediction
one_image_prediction = prediction
```
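
The same 4-pixel walkthrough can be written with PyTorch tensors, which is closer to the expression `(train_x[0] * weights.T).sum() + bias`. This is a sketch using the example values above; floating-point results are approximate.

```python
import torch

one_image = torch.tensor([0.1, 0.5, 0.3, 0.9])         # shape (4,)
weights = torch.tensor([[0.2], [0.4], [-0.1], [0.3]])  # shape (4, 1)
bias = torch.tensor(0.1)

element_wise_product = one_image * weights.T  # shape (1, 4)
prediction = element_wise_product.sum() + bias

print(element_wise_product)  # approximately [[0.02, 0.20, -0.03, 0.27]]
print(prediction)            # approximately 0.56
```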

<a id="linear-transformation-matrix-multiplication"></a>
## Linear Transformation: Matrix Multiplication `batch @ weights + bias`

The expression `batch @ weights + bias` is one of the two fundamental building blocks of a neural network. The other is the **activation function**.
It represents a **linear transformation**, which is used throughout machine learning and especially in neural networks. Here's what each part means:

1. **`batch`**:
   - This is a **matrix** (or tensor) containing multiple input data points. Each row corresponds to one data point, and each column corresponds to a feature (e.g., pixel values, sensor readings, etc.).
   - Shape: Typically `(n, m)`, where:
     - `n` = number of data points in the batch,
     - `m` = number of features per data point.

2. **`weights`**:
   - This is a **matrix** of learnable parameters that the model uses to transform the input data.
   - Shape: Typically `(m, p)`, where:
     - `m` = number of input features (must match the second dimension of `batch`),
     - `p` = number of output features (e.g., neurons in the next layer).

3. **`@`**:
   - The `@` operator performs **matrix multiplication** between `batch` and `weights`.
   - Resulting shape: `(n, p)` (number of data points × number of output features).

4. **`bias`**:
   - This is a **vector** (or 1D tensor) of learnable parameters added to the result of the matrix multiplication.
   - Shape: `(p,)` (must match the number of output features).

5. **`+ bias`**:
   - After the matrix multiplication, the `bias` is added element-wise to each row of the resulting matrix.
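
The shape rules in the list above can be checked with a small sketch. The sizes `n`, `m`, and `p` below are arbitrary values picked for illustration, and the tensors are random.

```python
import torch

n, m, p = 5, 3, 2             # arbitrary sizes for illustration
batch = torch.rand(n, m)      # n data points, m features each
weights = torch.rand(m, p)    # maps m input features to p output features
bias = torch.rand(p)          # one bias per output feature

out = batch @ weights + bias  # bias is broadcast across the n rows
print(out.shape)              # torch.Size([5, 2])
```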

### Why Is This Important?

The operation `batch @ weights + bias` is the core of a **linear layer** in neural networks. It transforms the input data into a new representation by applying a weighted sum of the inputs and adding a bias term. This transformation is fundamental for tasks like classification, regression, and more.
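
For comparison, PyTorch's built-in `nn.Linear` performs the same computation. One detail worth noting in this sketch: `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the manual equivalent uses the transpose. The layer sizes and input are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=3, out_features=2)
batch = torch.rand(4, 3)      # 4 data points, 3 features each

# Manual version of the layer: weight is (2, 3), so transpose it to (3, 2).
manual = batch @ layer.weight.T + layer.bias

print(layer.weight.shape)     # torch.Size([2, 3])
print(torch.allclose(layer(batch), manual, atol=1e-6))
```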



### Calculation Example

#### Step 1: Define the Inputs



>**Side note:** A feature is a descriptive attribute or dimension of your data. For example, if we're working with images, the features could be pixel values.

Let’s say we have:
- A **batch** of 2 data points, each with 3 features:
  ```python
  batch = [[1, 2, 3],    # Data point 1
           [4, 5, 6]]    # Data point 2
  ```
  Shape: `(2, 3)` (2 data points, 3 features).

- A **weights** matrix that maps 3 input features to 2 output features:
  ```python
  weights = [[0.1, 0.2],  # Weights applied to input feature 1
             [0.3, 0.4],  # Weights applied to input feature 2
             [0.5, 0.6]]  # Weights applied to input feature 3
  ```
  Shape: `(3, 2)` (3 input features, 2 output features).

- A **bias** vector for the 2 output features:
  ```python
  bias = [0.1, 0.2]  # Bias for output feature 1 and 2
  ```
  Shape: `(2,)`.




#### Step 2: Perform Matrix Multiplication (`batch @ weights`)

We compute the matrix multiplication between `batch` and `weights`. For each data point, this calculates a weighted sum of the input features.

Mathematically:
$$
\text{result}[i, j] = \sum_{k} \text{batch}[i, k] \cdot \text{weights}[k, j]
$$

For our example:
```python
result = batch @ weights
```

Step-by-step:
1. First data point (`[1, 2, 3]`) multiplied by `weights`:
   $$
   [1 \cdot 0.1 + 2 \cdot 0.3 + 3 \cdot 0.5, \quad 1 \cdot 0.2 + 2 \cdot 0.4 + 3 \cdot 0.6]
   = [2.2, 2.8]
   $$

2. Second data point (`[4, 5, 6]`) multiplied by `weights`:
   $$
   [4 \cdot 0.1 + 5 \cdot 0.3 + 6 \cdot 0.5, \quad 4 \cdot 0.2 + 5 \cdot 0.4 + 6 \cdot 0.6]
   = [4.9, 6.4]
   $$

So, the result of `batch @ weights` is:
```python
[[2.2, 2.8],  # Output for data point 1
 [4.9, 6.4]]  # Output for data point 2
```
Shape: `(2, 2)`.
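
To connect the summation formula to the `@` operator, the sketch below computes the product twice: once with explicit loops that follow the formula term by term, and once with `@`. It uses the example values from Step 1 as float tensors.

```python
import torch

batch = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.]])
weights = torch.tensor([[0.1, 0.2],
                        [0.3, 0.4],
                        [0.5, 0.6]])

n, m = batch.shape
p = weights.shape[1]

# result[i, j] = sum over k of batch[i, k] * weights[k, j]
result_loops = torch.zeros(n, p)
for i in range(n):
    for j in range(p):
        for k in range(m):
            result_loops[i, j] += batch[i, k] * weights[k, j]

result = batch @ weights
print(result)  # approximately [[2.2, 2.8], [4.9, 6.4]]
print(torch.allclose(result, result_loops))
```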



#### Step 3: Add the Bias (`+ bias`)


Now, we add the `bias` vector `[0.1, 0.2]` to each row of the result.

Mathematically:
$$
\text{final}[i, j] = \text{result}[i, j] + \text{bias}[j]
$$

For our example:
```python
final = result + bias
```

Step-by-step:
1. Add bias to the first row:
   $$
   [2.2 + 0.1, \quad 2.8 + 0.2] = [2.3, 3.0]
   $$

2. Add bias to the second row:
   $$
   [4.9 + 0.1, \quad 6.4 + 0.2] = [5.0, 6.6]
   $$

So, the final result is:
```python
[[2.3, 3.0],  # Final output for data point 1
 [5.0, 6.6]]  # Final output for data point 2
```

Shape: `(2, 2)`.
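
The row-wise addition relies on broadcasting. As a minimal sketch, the code below adds the bias to the Step 2 result and compares it with explicitly repeating the bias for every row.

```python
import torch

result = torch.tensor([[2.2, 2.8],
                       [4.9, 6.4]])  # output of Step 2
bias = torch.tensor([0.1, 0.2])

# Broadcasting stretches bias (shape (2,)) across both rows of result (shape (2, 2)).
final = result + bias
print(final)  # approximately [[2.3, 3.0], [5.0, 6.6]]

# Equivalent to explicitly repeating the bias for every row.
explicit = result + bias.expand(2, 2)
print(torch.equal(final, explicit))  # True
```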

#### Final Answer


The operation `batch @ weights + bias` transforms the input data into a new representation by applying a weighted sum and adding a bias. For our example:

```python
# Input
batch = [[1, 2, 3],
         [4, 5, 6]]
weights = [[0.1, 0.2],
           [0.3, 0.4],
           [0.5, 0.6]]
bias = [0.1, 0.2]

# Output of batch @ weights + bias (from Step 3)
# [[2.3, 3.0],
#  [5.0, 6.6]]
```

<a id="pytorch-dataset"></a>
## PyTorch Dataset

In PyTorch, a `Dataset` is a class that helps you manage your data. It provides a way to:
1. Store your input data (`x`) and labels (`y`).
2. Access individual data points (e.g., one image and its label) when needed.

When you’re training a machine learning model, you typically use a `Dataset` to feed data into the model in small batches.

### What Does "Return a Tuple of `(x, y)`" Mean?

In this supervised-learning setup, a `Dataset` is required to return two things for each data point:
- `x`: The input data (e.g., an image or feature vector).
- `y`: The corresponding label (e.g., the category or target value).

These are returned as a **tuple** `(x, y)`.

For example:
- If you have an image of a handwritten digit `3`, then:
  - `x` might be the pixel values of the image.
  - `y` might be the label `1` (the class this example assigns to the digit `3`).

### What Does "When Indexed" Mean?

The phrase **"when indexed"** refers to how you access individual elements from the `Dataset`. In Python, indexing means accessing an element by its position using square brackets (`[]`).

For example:
```python
dataset[0]  # Access the first element in the dataset
```


In PyTorch, when you index a `Dataset` (e.g., `dataset[0]`), it must return a tuple `(x, y)` containing:
- `x`: The input data for that specific index.
- `y`: The corresponding label for that specific index.
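
One way to satisfy this contract is a small custom class. The sketch below is illustrative only: `PairDataset` is a hypothetical name, and the inputs are random stand-ins rather than real images.

```python
import torch
from torch.utils.data import Dataset

class PairDataset(Dataset):
    """Hypothetical minimal dataset that pairs inputs with labels."""

    def __init__(self, x, y):
        assert len(x) == len(y), "inputs and labels must have the same length"
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Indexing returns the (x, y) tuple for one data point.
        return self.x[idx], self.y[idx]

x = torch.rand(3, 784)        # three stand-in "images", flattened
y = torch.tensor([1, 0, 1])   # stand-in labels
dataset = PairDataset(x, y)

sample_x, sample_y = dataset[0]
print(sample_x.shape, sample_y)  # torch.Size([784]) tensor(1)
```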

### Example: How Indexing Works in a Dataset

Let’s say we have a simple `Dataset` with three images and their labels:

| Index | Image (`x`)       | Label (`y`) |
|-------|-------------------|-------------|
| 0     | Image of a `3`    | 1           |
| 1     | Image of a `7`    | 0           |
| 2     | Image of another `3` | 1       |

If you index this dataset:
```python
dataset[0]  # Returns (image_of_3, 1)
dataset[1]  # Returns (image_of_7, 0)
dataset[2]  # Returns (another_image_of_3, 1)
```
Each time you index the dataset, it gives you a tuple `(x, y)` for the corresponding data point.
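
The table can be mimicked with a plain Python list of tuples, which already supports indexing and `len()`. This is a sketch under the assumption that random pixel values stand in for the three images.

```python
import torch

# Stand-in tensors for the three images in the table (random pixels, illustration only).
images = torch.rand(3, 28 * 28)
labels = torch.tensor([1, 0, 1])  # 1 = image of a 3, 0 = image of a 7

# Pair each image with its label.
dataset = list(zip(images, labels))

x0, y0 = dataset[0]
print(x0.shape, y0)  # torch.Size([784]) tensor(1)
print(len(dataset))  # 3
```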

### Why Is This Important?

When training a model, PyTorch uses a `DataLoader` to iterate over the dataset in batches. The `DataLoader` relies on the fact that the `Dataset` returns `(x, y)` when indexed. This ensures that:
1. The input data (`x`) and labels (`y`) are paired correctly.
2. The data can be fed into the model for training or evaluation.
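
A short sketch of this batching behaviour follows. It uses `TensorDataset` as a convenient ready-made dataset and random stand-in data; the batch size of 4 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.rand(10, 784)            # ten stand-in data points
y = torch.randint(0, 2, (10, 1))   # ten stand-in labels (0 or 1)
dataset = TensorDataset(x, y)      # indexing returns (x[i], y[i])

loader = DataLoader(dataset, batch_size=4, shuffle=False)

for xb, yb in loader:
    # Each batch stacks the individual x tensors and the individual y tensors.
    print(xb.shape, yb.shape)
# torch.Size([4, 784]) torch.Size([4, 1])
# torch.Size([4, 784]) torch.Size([4, 1])
# torch.Size([2, 784]) torch.Size([2, 1])
```

The last batch is smaller because 10 is not a multiple of 4.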

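In short: **"When indexed" means accessing a specific data point, and the `Dataset` must return the input data and its label as a tuple `(x, y)`.** 😊

The snippet below is a small scratch example: it stacks two tensors with `torch.cat` and builds a matching column of labels.

```python
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
b = torch.tensor([[74., 7., 8.], [9., 19., 22.]])
cat = torch.cat((a, b))  # stack along dim 0 -> shape (4, 3)
cat.view(-1, 2*3)        # reshape to 6 columns -> shape (2, 6)

# One label per row: 1 for each row of a, 0 for each row of b.
train_y = torch.tensor([1]*len(a) + [0]*len(b)).unsqueeze(1)  # shape (4, 1)
train_y
```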