# The Model
$
\huge y_{i} 
= \alpha + \beta_1{x_{i1}} + \beta_2{x_{i2}} +...+ \beta_k{x_{ik}} + \epsilon_{i}
$

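* Example: minutes = $\alpha$ + $\beta_1$friends + $\beta_2$work_hours + $\beta_3$phd + $\epsilon$
* An extension of Simple Linear Regression
* There are multiple input variables ($x_{i1},x_{i2},...,x_{ik}$)
* There are multiple regression coefficients as well ($\alpha, \beta_1, \beta_2, ...,\beta_k$)
  + These are the coefficients needed for prediction, i.e. the values that must be learned from the data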


____
##### Simple Linear Regression

  + $y_{i} = \alpha + \beta {x_{i}} + \epsilon_{i} $
  + Regression coefficients: $\alpha$, $\beta$
  + A single input value ($x_i$)
____


## Vector Representation
* Multiple input variables and regression coefficients &rarr; it is convenient to handle them as vectors
  + beta = [alpha, beta_1, beta_2, ..., beta_k]
  + x_i = [1, x_i1, x_i2, ..., x_ik]


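## Prediction
* Use the vector dot product
  - This is why the constant 1 is the first element of the x_i vector: it multiplies $\alpha$, so the intercept is part of the dot product
* The predicted value for a given input vector, once the coefficient vector is known
  - dot(x_i, beta): $\alpha + \beta_1{x_{i1}} + \beta_2{x_{i2}} +...+ \beta_k{x_{ik}}$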

In [1]:
# Change the working directory to the one where the 'scratch' package is installed (so the package can be imported)
import os, sys
os.chdir('..')

from scratch.linear_algebra import dot, Vector

def predict(x: Vector, beta: Vector) -> float:
    """assumes that the first element of x is 1"""
    return dot(x, beta)

In [3]:
# Example - x_i
x_i= [1,    # constant term
      49,   # number of friends
      4,    # work hours per day
      0]    # doesn't have PhD
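
As a rough illustration of the dot-product prediction, the sketch below applies `predict` to the example vector. The values in `beta_example` are made up for illustration (they are not fitted coefficients), and `dot` is re-defined locally as a stand-in so the snippet runs without the `scratch` package.

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Sum of element-wise products (local stand-in for scratch.linear_algebra.dot)."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def predict(x: Vector, beta: Vector) -> float:
    """Assumes the first element of x is 1 (the constant term)."""
    return dot(x, beta)

x_example = [1, 49, 4, 0]              # constant, friends, work hours, PhD
beta_example = [30.0, 1.0, -2.0, 1.0]  # hypothetical coefficients, not fitted

# 30*1 + 1*49 + (-2)*4 + 1*0 = 71
prediction = predict(x_example, beta_example)
print(prediction)
```

Because the leading 1 multiplies the first coefficient, the intercept needs no special handling.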

## Further Assumptions of the Least Squares Model
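* No input variable ($x_{ij}$) can be written as a weighted sum of the other input variables (they are linearly independent)
* The input variables ($x_{ij}$) are uncorrelated with the errors ($\epsilon_{i}$)
* If the two assumptions above do not hold, the results will not be accurate
  + In practice it is not easy to build a model that fully satisfies both assumptions
  + The results will then not be "exact", but they can still be "good enough"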
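
To see why linear independence matters, here is a minimal sketch with made-up data in which the third input is exactly `2*a + b`. The helper `make_row` and both coefficient vectors are hypothetical. Two different coefficient vectors then give identical predictions on every row, so the data cannot tell them apart.

```python
from typing import List

Vector = List[float]

def predict(x: Vector, beta: Vector) -> float:
    """Dot product of the input vector and the coefficient vector."""
    return sum(x_j * b_j for x_j, b_j in zip(x, beta))

def make_row(a: float, b: float) -> Vector:
    """Hypothetical inputs whose last variable is a weighted sum of the others."""
    return [1, a, b, 2 * a + b]

rows = [make_row(a, b) for a in range(5) for b in range(5)]

beta_one = [0, 1, 1, 1]  # a + b + (2a + b) = 3a + 2b
beta_two = [0, 3, 2, 0]  # 3a + 2b

# Both coefficient vectors produce the same prediction for every row
same = all(predict(x, beta_one) == predict(x, beta_two) for x in rows)
print(same)
```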

# Fitting the Model
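* An extension of Simple Linear Regression
* The regression coefficients will be found with the Gradient Descent algorithm
* However, because the inputs and the coefficients are both represented as vectors, some functions are implemented differently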

In [5]:
from typing import List

def error(x: Vector, y: float, beta: Vector) -> float:
    return predict(x, beta) - y

def squared_error(x: Vector, y: float, beta: Vector) -> float:
    return error(x, y, beta) ** 2

x = [1, 2, 3]
y = 30
beta = [4, 4, 4]  # so prediction = 4 + 8 + 12 = 24

assert error(x, y, beta) == -6
assert squared_error(x, y, beta) == 36

### Gradient Function
* [Reminder] In Simple Linear Regression
  + $SSE = \sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$
  + For a single data point, let $e_i = \alpha + \beta x_i - y_i$ be the prediction error (predicted minus actual, as in `error` above), so its squared error $e_i^2$ is the $i$-th term of the SSE
  + $\frac{\partial e_i^2}{\partial \alpha} = 2(\alpha + \beta x_i - y_i) = 2e_i$
  + $\frac{\partial e_i^2}{\partial \beta} = 2(\alpha + \beta x_i - y_i)x_i = 2e_i x_i$
* In Multiple Regression, the model was set up so that $\alpha$ is included in the vector beta
  + beta = [alpha, beta_1, beta_2, ..., beta_k]
    - With $e_i$ = dot(x_i, beta) $- y_i$, every coefficient $\beta_j$ has $\frac{\partial e_i^2}{\partial \beta_j} = 2e_i x_{ij}$
    - Because the first element of the input vector $x_i$ is 1, the derivative for $\alpha$ is the same formula with $x_{i0}=1$: $\frac{\partial e_i^2}{\partial \alpha} = 2e_i \cdot 1 = 2e_i$
  + Therefore a single gradient with respect to the vector beta is all that is needed

In [6]:
def sqerror_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    err = error(x, y, beta)
    return [2 * err * x_i for x_i in x]

assert sqerror_gradient(x, y, beta) == [-12, -24, -36]
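
One way to sanity-check the analytic gradient is to compare it with a central finite-difference estimate. The sketch below is self-contained: it re-defines the squared error and its gradient locally, and `numeric_gradient` with step size `h` is a hypothetical helper that is not part of the `scratch` package.

```python
from typing import List

Vector = List[float]

def squared_error(x: Vector, y: float, beta: Vector) -> float:
    prediction = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return (prediction - y) ** 2

def sqerror_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    err = sum(x_j * b_j for x_j, b_j in zip(x, beta)) - y
    return [2 * err * x_j for x_j in x]

def numeric_gradient(x: Vector, y: float, beta: Vector, h: float = 1e-4) -> Vector:
    """Central difference: nudge one coefficient at a time by +h and -h."""
    grad = []
    for j in range(len(beta)):
        up = [b + h if k == j else b for k, b in enumerate(beta)]
        down = [b - h if k == j else b for k, b in enumerate(beta)]
        grad.append((squared_error(x, y, up) - squared_error(x, y, down)) / (2 * h))
    return grad

x_check, y_check, beta_check = [1, 2, 3], 30, [4, 4, 4]
analytic = sqerror_gradient(x_check, y_check, beta_check)
numeric = numeric_gradient(x_check, y_check, beta_check)

# The two estimates should agree up to floating-point error
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_diff)
```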

### Fitting (Least Squares Fit)
* Use the Gradient Descent algorithm to find the vector $\beta$ that minimizes the sum of squared errors

In [7]:
import random
import tqdm
from scratch.linear_algebra import vector_mean
from scratch.gradient_descent import gradient_step


def least_squares_fit(xs: List[Vector],
                      ys: List[float],
                      learning_rate: float = 0.001,
                      num_steps: int = 1000,
                      batch_size: int = 1) -> Vector:
    """
    Find the beta that minimizes the sum of squared errors
    assuming the model y = dot(x, beta).
    """
    # Start with a random guess
    guess = [random.random() for _ in xs[0]]

    for _ in tqdm.trange(num_steps, desc="least squares fit"):
        for start in range(0, len(xs), batch_size):
            batch_xs = xs[start:start+batch_size]
            batch_ys = ys[start:start+batch_size]

            gradient = vector_mean([sqerror_gradient(x, y, guess)
                                    for x, y in zip(batch_xs, batch_ys)])
            guess = gradient_step(guess, gradient, -learning_rate)

    return guess
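
To check that this kind of minibatch fit recovers known coefficients, the sketch below runs a simplified, self-contained version on synthetic data. `fit_sketch`, `true_beta`, and the noise-free data are assumptions made for illustration. The averaging and update steps are written inline instead of calling `vector_mean` and `gradient_step`, and the progress bar is omitted.

```python
import random
from typing import List

Vector = List[float]

def sqerror_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    err = sum(x_j * b_j for x_j, b_j in zip(x, beta)) - y
    return [2 * err * x_j for x_j in x]

def fit_sketch(xs: List[Vector], ys: List[float],
               learning_rate: float = 0.01,
               num_steps: int = 1000,
               batch_size: int = 4) -> Vector:
    """Minibatch gradient descent on the squared error (simplified stand-in)."""
    guess = [random.random() for _ in xs[0]]
    for _ in range(num_steps):
        for start in range(0, len(xs), batch_size):
            batch_xs = xs[start:start + batch_size]
            batch_ys = ys[start:start + batch_size]
            grads = [sqerror_gradient(x, y, guess)
                     for x, y in zip(batch_xs, batch_ys)]
            # Average the per-point gradients over the batch
            mean_grad = [sum(g[j] for g in grads) / len(grads)
                         for j in range(len(guess))]
            # Step against the gradient to reduce the error
            guess = [g_j - learning_rate * m_j
                     for g_j, m_j in zip(guess, mean_grad)]
    return guess

random.seed(0)
true_beta = [3.0, 2.0]                             # hypothetical "true" model
xs_demo = [[1.0, float(v)] for v in range(-5, 6)]  # constant term plus one input
ys_demo = [true_beta[0] + true_beta[1] * x[1] for x in xs_demo]

beta_hat = fit_sketch(xs_demo, ys_demo)
print(beta_hat)
```

Since the synthetic data has no noise, the fitted values should end up very close to `true_beta`. With real data the learning rate and number of steps usually need tuning.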

In [14]:
# 1 (constant term), num_friends, work_hours, phd (1 = yes, 0 = no)
inputs: List[List[float]] = [[1.,49,4,0],[1,41,9,0],[1,40,8,0],[1,25,6,0],[1,21,1,0],[1,21,0,0],[1,19,3,0],[1,19,0,0],[1,18,9,0],[1,18,8,0],[1,16,4,0],[1,15,3,0],[1,15,0,0],[1,15,2,0],[1,15,7,0],[1,14,0,0],[1,14,1,0],[1,13,1,0],[1,13,7,0],[1,13,4,0],[1,13,2,0],[1,12,5,0],[1,12,0,0],[1,11,9,0],[1,10,9,0],[1,10,1,0],[1,10,1,0],[1,10,7,0],[1,10,9,0],[1,10,1,0],[1,10,6,0],[1,10,6,0],[1,10,8,0],[1,10,10,0],[1,10,6,0],[1,10,0,0],[1,10,5,0],[1,10,3,0],[1,10,4,0],[1,9,9,0],[1,9,9,0],[1,9,0,0],[1,9,0,0],[1,9,6,0],[1,9,10,0],[1,9,8,0],[1,9,5,0],[1,9,2,0],[1,9,9,0],[1,9,10,0],[1,9,7,0],[1,9,2,0],[1,9,0,0],[1,9,4,0],[1,9,6,0],[1,9,4,0],[1,9,7,0],[1,8,3,0],[1,8,2,0],[1,8,4,0],[1,8,9,0],[1,8,2,0],[1,8,3,0],[1,8,5,0],[1,8,8,0],[1,8,0,0],[1,8,9,0],[1,8,10,0],[1,8,5,0],[1,8,5,0],[1,7,5,0],[1,7,5,0],[1,7,0,0],[1,7,2,0],[1,7,8,0],[1,7,10,0],[1,7,5,0],[1,7,3,0],[1,7,3,0],[1,7,6,0],[1,7,7,0],[1,7,7,0],[1,7,9,0],[1,7,3,0],[1,7,8,0],[1,6,4,0],[1,6,6,0],[1,6,4,0],[1,6,9,0],[1,6,0,0],[1,6,1,0],[1,6,4,0],[1,6,1,0],[1,6,0,0],[1,6,7,0],[1,6,0,0],[1,6,8,0],[1,6,4,0],[1,6,2,1],[1,6,1,1],[1,6,3,1],[1,6,6,1],[1,6,4,1],[1,6,4,1],[1,6,1,1],[1,6,3,1],[1,6,4,1],[1,5,1,1],[1,5,9,1],[1,5,4,1],[1,5,6,1],[1,5,4,1],[1,5,4,1],[1,5,10,1],[1,5,5,1],[1,5,2,1],[1,5,4,1],[1,5,4,1],[1,5,9,1],[1,5,3,1],[1,5,10,1],[1,5,2,1],[1,5,2,1],[1,5,9,1],[1,4,8,1],[1,4,6,1],[1,4,0,1],[1,4,10,1],[1,4,5,1],[1,4,10,1],[1,4,9,1],[1,4,1,1],[1,4,4,1],[1,4,4,1],[1,4,0,1],[1,4,3,1],[1,4,1,1],[1,4,3,1],[1,4,2,1],[1,4,4,1],[1,4,4,1],[1,4,8,1],[1,4,2,1],[1,4,4,1],[1,3,2,1],[1,3,6,1],[1,3,4,1],[1,3,7,1],[1,3,4,1],[1,3,1,1],[1,3,10,1],[1,3,3,1],[1,3,4,1],[1,3,7,1],[1,3,5,1],[1,3,6,1],[1,3,1,1],[1,3,6,1],[1,3,10,1],[1,3,2,1],[1,3,4,1],[1,3,2,1],[1,3,1,1],[1,3,5,1],[1,2,4,1],[1,2,2,1],[1,2,8,1],[1,2,3,1],[1,2,1,1],[1,2,9,1],[1,2,10,1],[1,2,9,1],[1,2,4,1],[1,2,5,1],[1,2,0,1],[1,2,9,1],[1,2,9,1],[1,2,0,1],[1,2,1,1],[1,2,1,1],[1,2,4,1],[1,1,0,1],[1,1,2,1],[1,1,2,1],[1,1,5,1],[1,1,3,1],[1,1,10,1],[1,1,6,1],[1,1,0,1],[1,1,8,1],[1,1,6,1],[1,1,4,1
],[1,1,9,1],[1,1,9,1],[1,1,4,1],[1,1,2,1],[1,1,9,1],[1,1,0,1],[1,1,8,1],[1,1,6,1],[1,1,1,1],[1,1,1,1],[1,1,5,1]]

In [13]:
from scratch.statistics import daily_minutes_good
from scratch.gradient_descent import gradient_step

random.seed(0)
# I used trial and error to choose num_steps and learning_rate.
# This will run for a while.
learning_rate = 0.001

beta = least_squares_fit(inputs, daily_minutes_good, learning_rate, 5000, 25)
assert 30.50 < beta[0] < 30.70  # constant
assert  0.96 < beta[1] <  1.00  # num friends
assert -1.89 < beta[2] < -1.85  # work hours per day
assert  0.91 < beta[3] <  0.94  # has PhD

print('Beta', beta)

least squares fit: 100%|██████████████████████████████████████████████████████████| 5000/5000 [00:07<00:00, 634.12it/s]


Beta [30.514795945185586, 0.9748274277323267, -1.8506912934343662, 0.91407780744768]


### Training Result
* Beta: [30.514795945185586, 0.9748274277323267, -1.8506912934343662, 0.91407780744768]
* Final prediction model
  + minutes = 30.51 + 0.975*friends - 1.85*work_hours + 0.91*phd

In [16]:
print("100 friends, 8 work_hours, has PhD ==> Minutes online", predict([1, 100, 8, 1], beta) )

100 friends, 8 work_hours, has PhD ==> Minutes online 114.10608617839101


# Interpreting the Model
* all-else-being-equal
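  + Example: with all other inputs held equal, each additional friend increases the time spent online by about 1 minute
  + In practice, however, the input variables may interact with each other
    - Example: work_hours may differ depending on how many friends a user has
  + Possible remedy: add num_friends x work_hours as an input variable (the product of variables that may interact with or influence each other)
* Does the time spent online keep increasing as the number of friends grows?
  + It may increase up to a point, but beyond a certain level additional friends no longer matter
  + Possible remedy: square such a variable and use the square as an input
  + Squaring can make the input values much larger &rarr; as the input grows, the corresponding coefficient will be adjusted toward a smaller value
    - Because the Gradient Descent algorithm adjusts the coefficients in the direction that brings the error close to 0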
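
The two remedies above (an interaction term and a squared term) could be sketched as follows. `add_derived_features` is a hypothetical helper, and the three sample rows are taken from the `inputs` list. A model fitted on the extended rows would need six coefficients instead of four.

```python
from typing import List

Vector = List[float]

def add_derived_features(row: Vector) -> Vector:
    """row = [1, num_friends, work_hours, phd]; append interaction and squared terms."""
    _, num_friends, work_hours, _ = row
    return row + [num_friends * work_hours,  # interaction term
                  num_friends ** 2]          # squared term

sample_rows = [[1, 49, 4, 0], [1, 41, 9, 0], [1, 6, 2, 1]]
extended = [add_derived_features(row) for row in sample_rows]

for row in extended:
    print(row)
```

The squared column is much larger than the other inputs (2401 for 49 friends), so a smaller learning rate or rescaled inputs may be needed for gradient descent to stay stable.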