# Multiple Linear Regression (MLR)

- Decision function: $\hat{y} = b + w_1x_1 + \ldots + w_Dx_D$

Let the input $\vec{x}$ be a $D\times 1$ vector, $\vec{w}$ a $D\times 1$ weight vector, $b$ the scalar bias, and $\hat{y}$ the predicted value of the dependent variable.

$$\vec{x} = \begin{bmatrix}
x_1\\
\vdots\\
x_D
\end{bmatrix}$$

$$\vec{w} = \begin{bmatrix}
w_1\\
\vdots\\
w_D
\end{bmatrix}$$

$$\hat{y} = \vec{x}^T\vec{w} + b$$

- Input data matrix: $X$

$$X = \begin{bmatrix}
x_{11} & x_{12} & \cdots &x_{1D}\\
x_{21} & x_{22} & \cdots &x_{2D}\\
\vdots & \vdots & \ddots &\vdots\\
x_{K1} & x_{K2} & \cdots &x_{KD}
\end{bmatrix}$$

- Bias vector: $\vec{b}$

$$\vec{b} = \begin{bmatrix}
b\\
\vdots\\
b
\end{bmatrix}_{K\times 1}$$

where $K$ is the number of samples and $D$ is the number of features of the input data.

$$\hat{Y} = X\vec{w} + \vec{b}$$
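
As a minimal sketch of the matrix form above, the following computes $\hat{Y} = X\vec{w} + \vec{b}$ directly with tensor operations. The values of `w` and `b` are arbitrary illustrative choices, not taken from the text. The scalar bias is broadcast across the $K$ rows, which plays the role of the bias vector.

```python
import torch

# Illustrative values (assumptions, not from the text): K = 3 samples, D = 2 features
X = torch.tensor([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])        # shape (K, D)
w = torch.tensor([[2.0], [-1.0]])     # shape (D, 1)
b = torch.tensor(0.5)                 # scalar bias, broadcast over the K rows

Yhat = X @ w + b                      # shape (K, 1), one prediction per sample
print(Yhat.shape)
print(Yhat)
```

Each row of `Yhat` is the decision function applied to the corresponding row of `X`.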

In [1]:
import torch
import torch.nn as nn
from torch.nn import Linear

torch.manual_seed(1)  # fix the seed so the initial parameters are reproducible

## Model

In [2]:
model = Linear(in_features = 2, out_features = 1) #D = 2

In [3]:
list(model.parameters())

[Parameter containing:
 tensor([[ 0.3643, -0.3121]], requires_grad=True),
 Parameter containing:
 tensor([-0.1371], requires_grad=True)]

### Input vector $\vec{x}$

In [4]:
x = torch.tensor([[1.0, 3.0]]) #K = 1
yhat = model(x)

In [5]:
x

tensor([[1., 3.]])

In [6]:
yhat

tensor([[-0.7090]], grad_fn=<AddmmBackward0>)
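
To connect the layer to the decision function, the sketch below recomputes the prediction by hand from the layer's own parameters. It relies on `Linear` storing its weight with shape `(out_features, in_features)`, so the manual form is `x @ weight.T + bias`. The name `layer` is introduced here only for illustration, and the exact parameter values may differ across PyTorch versions.

```python
import torch
from torch.nn import Linear

torch.manual_seed(1)
layer = Linear(in_features=2, out_features=1)   # D = 2
x = torch.tensor([[1.0, 3.0]])                  # K = 1

# The weight has shape (out_features, in_features), hence the transpose
manual = x @ layer.weight.T + layer.bias

# Compare the layer's output with the hand-computed x^T w + b
print(torch.allclose(layer(x), manual))
```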

### Input matrix $X$

In [7]:
X = torch.tensor([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]) #K = 3
Yhat = model(X)

In [8]:
X

tensor([[1., 1.],
        [1., 2.],
        [1., 3.]])

In [9]:
Yhat

tensor([[-0.0848],
        [-0.3969],
        [-0.7090]], grad_fn=<AddmmBackward0>)

### Cost function

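To simplify the notation, absorb the bias into the weight vector by appending $b$ to $\vec{w}$ and a constant $1$ to $\vec{x}$:

$$\vec{w} = \begin{bmatrix}
w_1\\
\vdots\\
w_D\\
b
\end{bmatrix}$$

and

$$\vec{x} = \begin{bmatrix}
x_1\\
\vdots\\
x_D\\
1
\end{bmatrix}$$

so that $\vec{x}^T\vec{w} = w_1x_1 + \ldots + w_Dx_D + b$.

- Cost function: $l(\vec{w}, b) = \frac{1}{N}\sum_{n=1}^N (y_n - \vec{x}_n^T\vec{w})^2$, where $N$ is the number of samples (written $K$ above).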

- Gradient vector: $\nabla l(\vec{w}, b) =
\begin{bmatrix}
\frac{\partial l(\vec{w}, b)}{\partial w_1}\\
\vdots\\
\frac{\partial l(\vec{w}, b)}{\partial w_D}\\
\frac{\partial l(\vec{w}, b)}{\partial b}
\end{bmatrix}$
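
The cost and its gradient can be checked numerically. The sketch below uses made-up data (an assumption, not from the text), appends a column of ones so that the bias is the last entry of `w`, and compares the gradient from autograd with the closed-form expression $-\frac{2}{N}X^T(\vec{y} - X\vec{w})$, which follows from differentiating the mean squared error.

```python
import torch

# Illustrative data (assumed values): N = 4 samples, D = 2 features
X = torch.tensor([[1.0, 2.0],
                  [2.0, 0.0],
                  [3.0, 1.0],
                  [0.0, 1.0]])
y = torch.tensor([[1.0], [2.0], [0.0], [3.0]])

# Append a column of ones so the bias becomes the last entry of w
X_aug = torch.cat([X, torch.ones(X.shape[0], 1)], dim=1)   # shape (N, D + 1)
w = torch.zeros(X_aug.shape[1], 1, requires_grad=True)     # [w_1, ..., w_D, b]

# Mean squared error cost, differentiated by autograd
loss = ((y - X_aug @ w) ** 2).mean()
loss.backward()

# Closed-form gradient: -(2 / N) * X^T (y - X w)
N = X_aug.shape[0]
grad_manual = -(2.0 / N) * X_aug.T @ (y - X_aug @ w.detach())

print(loss.item())
print(torch.allclose(w.grad, grad_manual))
```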

### Training

$$\vec{w}^{k+1} = \vec{w}^k - \eta\nabla l(\vec{w}^k, b^k)$$

where $\eta$ is the learning rate and $k$ is the iteration index.
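
Before handing the update to an optimizer, it may help to see the rule written out by hand. This is a minimal sketch of full-batch gradient descent on made-up, noise-free data (the values of `X_aug`, `w_true`, and `eta` are assumptions chosen for illustration), with the bias stored as the last entry of `w`.

```python
import torch

# Illustrative, noise-free data (assumed values); the column of ones carries the bias
X_aug = torch.tensor([[1.0, 2.0, 1.0],
                      [2.0, 0.0, 1.0],
                      [3.0, 1.0, 1.0],
                      [0.0, 1.0, 1.0]])
w_true = torch.tensor([[1.0], [-2.0], [0.5]])   # [w_1, w_2, b]
y = X_aug @ w_true

w = torch.zeros(3, 1, requires_grad=True)
eta = 0.1                                       # learning rate
losses = []

for k in range(500):
    loss = ((y - X_aug @ w) ** 2).mean()
    losses.append(loss.item())
    loss.backward()                             # fills w.grad with the gradient at w^k
    with torch.no_grad():
        w -= eta * w.grad                       # w^{k+1} = w^k - eta * gradient
    w.grad.zero_()                              # clear the gradient before the next step

print(losses[0], losses[-1])
print(w.detach().flatten())
```

The `torch.no_grad()` block keeps the parameter update itself out of the autograd graph, and zeroing `w.grad` prevents gradients from accumulating across iterations.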

In [10]:
from torch import nn, optim
import torch
from torch.utils.data import Dataset, DataLoader

In [11]:
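#Model
class LR(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # Single linear layer: input_size features in, output_size values out
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, X):
        # Computes X w + b for every row of X
        out = self.linear(X)
        return out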

In [12]:
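#Data
class Data2D(Dataset):
    def __init__(self):
        # 20 samples with 2 features; both columns hold the same values in [-1, 1)
        self.x = torch.zeros(20, 2)
        self.x[:, 0] = torch.arange(-1, 1, 0.1)
        self.x[:, 1] = torch.arange(-1, 1, 0.1)
        # True parameters used to generate the targets
        self.w = torch.tensor([[1.0], [1.0]])
        self.b = 1
        # Noise-free targets, then targets with Gaussian noise (std 0.1)
        self.f = torch.mm(self.x, self.w) + self.b
        self.y = self.f + 0.1 * torch.randn((self.x.shape[0], 1))
        self.len = self.x.shape[0]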
        
    def __getitem__(self, index):
        return self.x[index], self.y[index]
    
    def __len__(self):
        return self.len

In [13]:
data_set = Data2D()
criterion = nn.MSELoss()
trainloader = DataLoader(dataset = data_set, batch_size = 2)
model = LR(input_size= 2, output_size = 1)
optimizer = optim.SGD(model.parameters(), lr = 0.1)

In [14]:
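for epoch in range(100):
    for x, y in trainloader:
        yhat = model(x)            # forward pass
        loss = criterion(yhat, y)  # mean squared error on the mini-batch
        optimizer.zero_grad()      # clear gradients from the previous step
        loss.backward()            # compute gradients
        optimizer.step()           # update the parameters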

In [15]:
for name, p in model.named_parameters():
    if p.requires_grad:
        print(name, p.data)

linear.weight tensor([[0.4891, 1.5115]])
linear.bias tensor([1.0040])
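
The learned weights differ from the generating weights $[1, 1]$, yet the fit is still good. A likely explanation is that both feature columns in `Data2D` are identical, so the prediction reduces to $(w_1 + w_2)x_1 + b$ and only the sum $w_1 + w_2$ is determined by the data. The sketch below illustrates this by comparing predictions from the generating weights and the printed weights; `w_learned` is copied from the output above, and the bias is left out because it is the same in both cases.

```python
import torch

# Same feature construction as Data2D: both columns hold identical values
x = torch.zeros(20, 2)
x[:, 0] = torch.arange(-1, 1, 0.1)
x[:, 1] = torch.arange(-1, 1, 0.1)

w_true = torch.tensor([[1.0], [1.0]])            # weights used to generate the data
w_learned = torch.tensor([[0.4891], [1.5115]])   # weights printed above

# With x_1 == x_2, only the sum w_1 + w_2 affects the prediction
diff = (x @ w_true - x @ w_learned).abs().max()
print(w_learned.sum().item())   # close to 2, the sum of the generating weights
print(diff.item())              # largest gap between the two sets of predictions
```

Using two distinct feature columns would make the individual weights identifiable.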
