# ODS NLP Course: seminar 2
## LogReg, TF-IDF, Classification


# [1] Linear models

## [1.1] Regression Problem: Linear Regression

### [1.1.1] Problem Statement

Let's predict the behavior of a linear function
$$y = 2x_1 + 5x_2 - 3$$

For the $k$-th observation, the linear equation in several variables looks like this:
$$y^{[k]} = x^{[k]}_1 \cdot w_1 + \dots + x^{[k]}_m \cdot w_m + b$$

The same equation in vector form via **dot product**:
$$y^{[k]} = {\bf x}^{[k]} \cdot {\bf w} + b$$

The same equation in matrix form through **the matrix product**:
$$y^{[k]} = X^{[k]}_{[1;\ m]} \times W_{[m;\ 1]} + b$$

The same equation generalized to all $*$ observations and $t$ different targets:
$${Y}_{[*;\ t]} = {X}_{[*;\ m]} \times {W}_{[m;\ t]} + {b}_{[*;\ t]}$$
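As a quick sanity check of the shapes above, here is a minimal sketch that evaluates the same prediction in scalar, dot-product, and matrix form. The names `x_k` and `X_all` and the sample values are chosen only for illustration, and `b` is kept as a scalar that NumPy broadcasts over all observations.

```python
import numpy as np

# Illustrative setup (assumed): m = 2 features, t = 1 target
w = np.array([2.0, 5.0])      # weights as a vector, shape (m,)
W = w.reshape(-1, 1)          # the same weights as a matrix, shape (m, t)
b = -3.0                      # scalar bias, broadcast over observations

x_k = np.array([18.0, 8.0])   # one observation, shape (m,)

y_scalar = x_k[0] * w[0] + x_k[1] * w[1] + b   # explicit sum over features
y_dot = x_k @ w + b                            # dot product
y_mat = x_k.reshape(1, -1) @ W + b             # matrix product, shape (1, t)

# All observations at once: X_all is (*, m), Y_all is (*, t)
X_all = np.array([[18.0, 8.0], [-6.0, -13.0], [0.0, 18.0]])
Y_all = X_all @ W + b

print(y_scalar, y_dot, y_mat.shape, Y_all.shape)
```

All three single-observation forms give the same number; only the shape of the result differs.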

In [1]:
import numpy as np

In [12]:
# given parameters
W_true = [[2],[5]]
b_true = [[-3]]

# generated observations

np.random.seed(42)
X = np.random.randint(low=-20,high=20,size=(20,len(W_true))) + 0.0
y = X@W_true+b_true
X, y


(array([[ 18.,   8.],
        [ -6., -13.],
        [  0.,  18.],
        [ -2.,   2.],
        [-10., -10.],
        [  3.,  15.],
        [ 19.,   3.],
        [-18.,   1.],
        [-19.,   3.],
        [  9.,  17.],
        [-19.,   0.],
        [ 12.,  -9.],
        [  1.,   4.],
        [  6.,   7.],
        [ -5.,  -6.],
        [-18.,  16.],
        [-14.,   0.],
        [-12.,  18.],
        [ -3., -17.],
        [  4.,  -7.]]),
 array([[ 73.],
        [-80.],
        [ 87.],
        [  3.],
        [-73.],
        [ 78.],
        [ 50.],
        [-34.],
        [-26.],
        [100.],
        [-41.],
        [-24.],
        [ 19.],
        [ 44.],
        [-43.],
        [ 41.],
        [-31.],
        [ 63.],
        [-94.],
        [-30.]]))

Consider how a parameter $p$ affects the loss $L$ through the prediction $\bar{y}$, using the partial derivative and the chain rule:
$$\frac{\Delta L}{\Delta p} \approx
\frac{\partial L}{\partial p} =
\frac{\partial L}{\partial \bar{y}}
\frac{\partial \bar{y}}{\partial p} =
L^{'}_{\bar{y}} \cdot \frac{\partial \bar{y}}{\partial p}
$$

Accordingly, the derivatives with respect to $b$ and $w_i$:
$$\begin{align}
\frac{\partial L}{\partial b} &= L^{'}_{\bar{y}} \cdot 1 \\
\frac{\partial L}{\partial w_i} &= L^{'}_{\bar{y}} \cdot x_i
\end{align}$$

In matrix form, you get:
$$\begin{align}
\frac{\partial L}{\partial b}_{[*;\ t]} &= {L^{'}_{\bar{Y}}}_{[*;\ t]}\\
\frac{\partial L}{\partial W}_{[m;\ t]} &= X^T_{[m;\ *]} \times {L^{'}_{\bar{Y}}}_{[*;\ t]}
\end{align}$$
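The matrix-form gradients can be checked numerically. The sketch below assumes an MSE loss (so $L^{'}_{\bar{Y}} = \frac{2}{N}(\bar{Y} - Y)$) and a single bias row shared by all observations, which is why the bias gradient is summed over the observation axis. The random data and the helper `mse` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # (*, m)
Y = rng.normal(size=(6, 1))   # (*, t)
W = rng.normal(size=(2, 1))   # (m, t)
b = rng.normal(size=(1, 1))   # one bias row, broadcast over observations

def mse(W, b):
    """Mean squared error of the linear model (assumed loss)."""
    return np.mean((X @ W + b - Y) ** 2)

# Analytical gradients from the matrix formulas
L_grad = 2.0 / Y.size * (X @ W + b - Y)       # dL/dY_pred, shape (*, t)
W_grad = X.T @ L_grad                         # shape (m, t)
b_grad = L_grad.sum(axis=0, keepdims=True)    # shared bias: sum over observations

# Numerical gradients by central differences
eps = 1e-6
W_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        dW = np.zeros_like(W)
        dW[i, j] = eps
        W_num[i, j] = (mse(W + dW, b) - mse(W - dW, b)) / (2 * eps)
b_num = (mse(W, b + eps) - mse(W, b - eps)) / (2 * eps)

print(np.allclose(W_grad, W_num, atol=1e-6), np.allclose(b_grad, b_num, atol=1e-6))
```

If both checks agree, the analytical formulas match what small perturbations of the parameters actually do to the loss.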




In [7]:
def equation(w=np.array([2, 5]), b=np.array(3)):
    return 'y = ' + ' + '.join(f'{w[i].item():.2f}*x{i+1}' for i in range(len(w))) + f' + {b.item():.2f}'
equation()

'y = 2.00*x1 + 5.00*x2 + 3.00'

### [1.1.2] NumPy Solution

In [10]:
# generated data
np.random.seed(42)
W_true = np.array([[2], [5]])
b_true = -3
X = np.random.randint(-20, 20, (20, len(W_true))) + 0.0
y = X @ W_true + b_true

def np_train(X, y, lr=0.005, max_iter=1000):
    # parameters to be learned
    W = np.zeros((X.shape[-1], y.shape[-1]))
    b = np.zeros((1, y.shape[-1]))

    for i in range(1, max_iter+1):
        # prediction and loss (MSE)
        y_pred = X @ W + b
        loss = np.sum((y_pred - y) ** 2) / len(y)

        # gradients, following the matrix formulas above
        L_grad = (2/len(y)) * (y_pred - y)
        b_grad = np.sum(L_grad, axis=0, keepdims=True)  # shared bias: sum over observations
        W_grad = X.T @ L_grad                           # shape (m, t)

        # gradient step
        W -= lr * W_grad
        b -= lr * b_grad

        # progress
        if i == 1 or i % 100 == 0:
            print(f"step {i:3}:", equation(W, b), f"loss: {loss.item():.6f}", sep='\t')

np_train(X, y)

step   1:	y = 2.90*x1 + 5.65*x2 + 0.04	loss: 3390.100000
step 100:	y = 2.02*x1 + 4.97*x2 + -1.75	loss: 1.423337
step 200:	y = 2.01*x1 + 4.99*x2 + -2.49	loss: 0.236051
step 300:	y = 2.00*x1 + 5.00*x2 + -2.79	loss: 0.039147
step 400:	y = 2.00*x1 + 5.00*x2 + -2.92	loss: 0.006492
step 500:	y = 2.00*x1 + 5.00*x2 + -2.97	loss: 0.001077
step 600:	y = 2.00*x1 + 5.00*x2 + -2.99	loss: 0.000179
step 700:	y = 2.00*x1 + 5.00*x2 + -2.99	loss: 0.000030
step 800:	y = 2.00*x1 + 5.00*x2 + -3.00	loss: 0.000005
step 900:	y = 2.00*x1 + 5.00*x2 + -3.00	loss: 0.000001
step 1000:	y = 2.00*x1 + 5.00*x2 + -3.00	loss: 0.000000
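As an independent cross-check (not part of the gradient-descent method above), the same parameters can be recovered in closed form with ordinary least squares. This sketch assumes the same seed and data generation as the cell above; the names `A` and `theta` are introduced here for illustration.

```python
import numpy as np

# Regenerate the same data as in the training cell (same seed assumed)
np.random.seed(42)
W_true = np.array([[2], [5]])
b_true = -3
X = np.random.randint(-20, 20, (20, len(W_true))) + 0.0
y = X @ W_true + b_true

# Append a column of ones so that the bias is learned as an extra weight
A = np.hstack([X, np.ones((len(X), 1))])

# Least-squares solution of A @ theta = y
theta, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(theta.ravel())
```

Because the data is noise-free, the least-squares solution should agree with the weights and bias that gradient descent converges to.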


### [1.1.3] Torch Solution

In [11]:
import torch

In [None]:
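X = torch.as_tensor(X, dtype=torch.float32)
W_true = torch.as_tensor(W_true, dtype=torch.float32)  # NumPy array -> tensor so that X @ W_true works
y = X @ W_true + b_true

def torch_train(X, y, lr=0.005, max_iter=1000):
    # parameters to be learned; autograd tracks their gradients
    W = torch.zeros(X.shape[1], y.shape[1], requires_grad=True)
    b = torch.zeros(1, y.shape[1], requires_grad=True)

    for i in range(1, max_iter + 1):
        # prediction and loss (MSE)
        y_pred = X @ W + b
        loss = torch.mean((y_pred - y) ** 2)

        # gradients via autograd
        loss.backward()

        # gradient step, then reset the accumulated gradients
        with torch.no_grad():
            W -= lr * W.grad
            b -= lr * b.grad
        W.grad.zero_()
        b.grad.zero_()

        if i == 1 or i % 100 == 0:
            print(f"step {i:3}:", equation(W, b), f"loss: {loss.item():.6f}", sep='\t')

torch_train(X, y)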


### [2.1.2] Precision & Recall

In some problems we are only interested in predicting the positive label "$+$".

Then two metrics stand out:
- **Precision** - exactness, pinpoint targeting (like a vaccination): how many of the predicted $+$ are really $+$
- **Recall** - completeness, a full solution to the problem (like an antibiotic): how many of the real $+$ were found


$$Precision = \frac{TP}{TP+FP} = \frac{TP}{[y_{pred}^+]} = \frac{[y_{real}^+==y_{pred}^+] }{[y_{pred}^+]} = \frac{correctly\ predicted\ +}{all\ predicted\ +}$$
$$Recall = \frac{TP}{TP+FN} = \frac{TP}{[y_{real}^+]} = \frac{[y_{real}^+==y_{pred}^+] }{[y_{real}^+]} = \frac{correctly\ predicted\ +}{all\ real\ +}$$
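The two formulas can be traced on a small example. The labels below are made up for illustration, with `1` standing for the $+$ class and `0` for the $-$ class.

```python
import numpy as np

# Made-up labels: 1 is the "+" class, 0 is the "-" class
y_real = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_real == 1))   # correctly predicted +
fp = np.sum((y_pred == 1) & (y_real == 0))   # predicted +, really -
fn = np.sum((y_pred == 0) & (y_real == 1))   # predicted -, really +

precision = tp / (tp + fp)   # share of predicted + that are correct
recall = tp / (tp + fn)      # share of real + that were found

print(f"TP={tp} FP={fp} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```

Here 3 of the 5 predicted positives are correct (precision 0.60), and 3 of the 4 real positives are found (recall 0.75).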