### Key Concepts

1. **Multiple Linear Regression**

* Instead of a single predictor (X), multiple independent variables are used (*X₁, X₂, …, Xₙ*).
* Each variable has a **weight (w)** that measures its influence on Y, plus a **bias (b)**.
* The equation becomes:

$$
\hat{y} = w_1X_1 + w_2X_2 + \dots + w_nX_n + b
$$

2. **Tensor Representation**

* An input sample is a **(1×D) vector**.
* The weights are a **(D×1) vector**.
* The prediction is calculated as the **dot product** of X and w, plus the bias.
* For multiple samples, use a **matrix X (N×D)** where N = number of samples and D = number of features.

3. **Visualization**

* Each sample can be colored to show how it is transformed.
* Directed graphs (nodes = features, edges = weights) help visualize the computation; the same kind of diagram is used for neural networks.

4. **Using PyTorch**

* Use `nn.Linear(in_features, out_features)`, which directly implements the linear function.
* The parameters (weights and biases) are initialized randomly.
* We can inspect them with `parameters()` or `state_dict()`.
* Input → tensor (samples as rows). Output → tensor with predictions.

5. **Custom Module**

* In PyTorch, you can create custom modules by inheriting from `nn.Module`.
* A custom module implements a constructor (`__init__`) and a `forward` method.
* Although similar to `nn.Linear`, it serves as a foundation for building more complex models (e.g., neural networks).
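
The `nn.Linear` points above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the variable name `layer` and the sample tensor are assumptions chosen for the example. Note that `nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ weight.T + bias`, which is equivalent to the (N×D)·(D×1) product described earlier.

```python
import torch
from torch import nn

torch.manual_seed(1)  # fix the seed so the random initial parameters are reproducible

# 2 input features, 1 output
layer = nn.Linear(in_features=2, out_features=1)

# state_dict() maps parameter names to tensors
for name, tensor in layer.state_dict().items():
    print(name, tuple(tensor.shape))
# weight (1, 2)
# bias (1,)

# Samples as rows: X has shape (N x D) = (3 x 2)
X = torch.tensor([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
yhat = layer(X)
print(yhat.shape)  # torch.Size([3, 1]): one prediction per sample
```

The predicted values themselves depend on the random initialization, so only the shapes are shown here.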

---

### In Summary

* Multiple linear regression is a linear transformation that combines independent variables to predict an output.
* In PyTorch, it's easily managed with `nn.Linear`, which automates weights and biases.
* Creating a custom module is a crucial intermediate step in understanding how neural network building blocks work.

In [None]:
# Import the libraries and set the random seed
from torch import nn
import torch
torch.manual_seed(1)

# In PyTorch, weights and biases for models (nn.Linear, neural networks, etc.) are initialized randomly.
# If you don't specify a seed, the values change every time you run the program → different results.
# If you specify torch.manual_seed(1), you always get the same sequence of random numbers, and therefore the same initial weights.

<torch._C.Generator at 0x2631642acf0>

### **Prediction**

Set weight and bias.

In [6]:
w = torch.tensor([[2.0], [3.0]], requires_grad=True)
b = torch.tensor([[1.]], requires_grad=True)

Define the prediction function. `torch.mm` performs matrix multiplication rather than scalar multiplication.

In [11]:
def forward(x):
    yhat = torch.mm(x, w) + b
    return yhat

$$ \hat{y} = xw + b $$

<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter2/2.6.1_matrix_eq.png" width="600" alt="Matrix Linear Regression">


1. **x** → the input data (features).

* It can be a single sample (vector) or multiple samples (matrix).
* Example: in multiple linear regression, each column is an independent variable $X_1, X_2, \dots, X_D$.

2. **w (weights)** → the model weights.

* These are **trainable parameters** that change during training.
* They indicate the importance of each variable in influencing the output.

3. **b (bias)** → offset term.

* This is not an "error", but a constant that **translates** the line (or hyperplane) with respect to the origin.
* It lets the model produce a non-zero output even when all inputs are zero, so the fit is not forced through the origin.

4. **yhat** → the predicted output.

* This is the linear combination of the inputs with the weights plus the bias.

In [15]:
# Calculate yhat
x = torch.tensor([[1.0, 2.0]])
yhat = forward(x)
print("The result: ", yhat)

The result:  tensor([[9.]], grad_fn=<AddBackward0>)


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/JBkvPoMCCa-PDXCF_4aQfQ/image%20-1-.png" width="300" alt="Linear Regression Matrix Sample One">


Each row of the following tensor represents a sample:

In [16]:
# Sample tensor X

X = torch.tensor([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])

# Make a prediction for every row of X

yhat = forward(X)
print("The result: ", yhat)

The result:  tensor([[ 6.],
        [ 9.],
        [12.]], grad_fn=<AddBackward0>)


In [None]:
# Linear Regression Class

class linear_regression(nn.Module):
    
    # Constructor
    def __init__(self, input_size, output_size):
        # input_size is the number of features; output_size is the number of predicted values
        super(linear_regression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
        
        
    # Prediction Function
    def forward(self, x):
        yhat = self.linear(x)
        return yhat

In [21]:
# Practice: Build a model to predict the following tensor.
X = torch.tensor([[11.0, 12.0, 13, 14], # first row 
                  [11, 12, 13, 14]])  # second row

# It has shape (2, 4) → 2 rows and 4 columns.
# Each row is a sample.
# Each column is a feature (independent variable).


model = linear_regression(4, 1)


In [23]:
# Prediction 
yhat = model(X)
yhat

tensor([[6.2828],
        [6.2828]], grad_fn=<AddmmBackward0>)
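
As a quick check on the model above, the sketch below counts its parameters and confirms that two identical rows receive the same prediction. It uses `nn.Linear(4, 1)` directly as a stand-in for `linear_regression(4, 1)` (the custom class only wraps that layer); this substitution is an assumption made to keep the block self-contained. The predicted values depend on the random initialization, so they are not reproduced here.

```python
import torch
from torch import nn

# Stand-in for linear_regression(4, 1): 4 input features, 1 output
model = nn.Linear(4, 1)

# Total number of trainable values: four weights plus one bias
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 5

X = torch.tensor([[11.0, 12.0, 13.0, 14.0],
                  [11.0, 12.0, 13.0, 14.0]])
yhat = model(X)

# Identical samples give identical predictions
print(torch.allclose(yhat[0], yhat[1]))
```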

### Cost function

1. **Cost function**

* It is used to measure how far the predictions $\hat{y}$ are from the actual values $y$.
* In regression, the **MSE (Mean Squared Error)** is commonly used:

$$
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

2. **Model parameters**

* If you have $d$ inputs (features), the model has **d weights + 1 bias**.
* Ex: 2-dimensional input → 3 parameters (w₁, w₂, b).
* 3-dimensional input → 4 parameters (w₁, w₂, w₃, b).

3. **Gradient descent**

* Calculate the derivative of the loss with respect to weights and biases.
* Update the parameters in the direction that reduces the error:

$$
w := w - \eta \frac{\partial L}{\partial w}, \quad b := b - \eta \frac{\partial L}{\partial b}
$$

where $\eta$ is the learning rate.
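
The MSE and the update rule can be sketched by hand with autograd. This is a minimal illustration under assumptions: the toy data, the zero initialization, and the helper name `mse` are made up for the example and do not come from the notebook.

```python
import torch

# Toy data (assumed for illustration): three samples, two features
X = torch.tensor([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = torch.tensor([[6.0], [9.0], [12.0]])

# Two weights and one bias, all starting at zero
w = torch.zeros(2, 1, requires_grad=True)
b = torch.zeros(1, 1, requires_grad=True)
lr = 0.1  # learning rate (eta)

def mse(yhat, y):
    # Mean of the squared differences between targets and predictions
    return torch.mean((y - yhat) ** 2)

loss_before = mse(torch.mm(X, w) + b, y)
loss_before.backward()  # fills w.grad and b.grad

# One gradient descent step: parameter := parameter - eta * gradient
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()

loss_after = mse(torch.mm(X, w) + b, y)
print(loss_before.item(), loss_after.item())
```

On this toy data a single step lowers the loss; in practice the step is repeated many times, and a learning rate that is too large can make the loss grow instead.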

---

### Implementation in PyTorch

1. **Dataset**

* A `Data2D` class is used (two input variables, one output).
* The data is loaded with a **DataLoader** that returns batches (here `batch_size=2`).

2. **Model**

* Created with `nn.Linear(2,1)` → 2 input features, 1 output.

3. **Criterion (loss function)**

* Use `nn.MSELoss()` to calculate the mean squared error.

4. **Optimizer**

* Ex: `torch.optim.SGD(model.parameters(), lr=0.1)` to update the weights.

5. **Training loop**
For each epoch, and for each batch returned by the DataLoader:

* Make the prediction (`yhat = model(x)`).
* Calculate the loss.
* Set the gradients to zero (`optimizer.zero_grad()`).
* Perform backpropagation (`loss.backward()`).
* Update the parameters (`optimizer.step()`).
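
The steps above can be sketched end to end as follows. The `Data2D` class here is a hypothetical stand-in, since its definition is not shown in these notes: it generates random two-feature inputs with a noisy linear target, and the "true" weights, sample count, and noise level are assumptions made for the example.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(1)

class Data2D(Dataset):
    """Hypothetical stand-in: two input features, one noisy linear target."""

    def __init__(self, n_samples=20):
        self.x = torch.randn(n_samples, 2)
        true_w = torch.tensor([[1.0], [1.0]])  # assumed for illustration
        noise = 0.1 * torch.randn(n_samples, 1)
        self.y = torch.mm(self.x, true_w) + 1.0 + noise

    def __len__(self):
        return self.x.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

dataset = Data2D()
loader = DataLoader(dataset, batch_size=2)
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Loss on the whole dataset before training
with torch.no_grad():
    initial_loss = criterion(model(dataset.x), dataset.y).item()

for epoch in range(100):
    for x, y in loader:
        yhat = model(x)            # make the prediction
        loss = criterion(yhat, y)  # calculate the loss
        optimizer.zero_grad()      # reset the gradients
        loss.backward()            # backpropagation
        optimizer.step()           # update the parameters

# Loss on the whole dataset after training
with torch.no_grad():
    final_loss = criterion(model(dataset.x), dataset.y).item()

print(initial_loss, final_loss)
```

Comparing the loss before and after training is a simple numerical counterpart to watching the fitted plane move toward the data points.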

---

### Result

* Initially, the plane (the learned linear function) **does not fit the data well**.
* After ~100 epochs, the plane adapts and follows the dataset points much better.

---

In summary:
The algorithm is used to **teach the model the optimal values of weights and biases** to reduce the error between predictions and actual data, using **gradient descent** and a **cost function**.