
# Linear regression
---
                                                             written by Yang, Soyoung
                                                                      2017.06.29 v1.0
                                                                      2017.08.09 v1.1
                                                                      2017.10.31 v1.2

__위젯(widget)__으로 인풋 샘플들과 매개 변수(hyperparameter, ex. 학습률, epoch의 횟수)를 조정하고

__텐서플로우(tensorflow)__를 사용해 선형회기의 방법과 과정을 살펴본다.



#### STEPS
1. import
2. widgets - get parameters
3. Graphing
4. Session run
5. Result
6. Run with batch step
    

#### References
+ [linear regression by Sung Kim](https://www.youtube.com/watch?v=TxIVr-nk1so&list=PLlMkM4tgfjnLSOjrEJN31gZATbcj_MpUm&index=6)
+ [jupyternote widget lists](http://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html)
+ [TensorFlow](https://www.tensorflow.org/get_started/get_started)
+ [least square regression with train/test set](https://stackoverflow.com/questions/43170017/linear-regression-with-tensorflow)
+ [bokeh-io plotting](http://bokeh.pydata.org/en/latest/_images/notebook_interactors.png) : another plotting method. possible to save or crop the plot image.


---
## 선형회귀 Linear regression 

<img src="./imgs/linreg_line.gif" style="width: 400px;">
- 데이터를 가장 잘 표현하는 선을 긋는 머신러닝 방법이다. 위의 이미지에서 하늘색 선이 linear regression line이다.
- 기존에 있던 데이터를 반영하는 연산식, 가설(Hypothesis)을 만든다. 이후 들어오는 정보를 연산식에 넣어 결과값을 도출하는데 이 결과값(result, 예측값)이 목표값(target)과 가깝게 해서, 전체 에러를 줄이며 더 좋은 가설(연산식)을 만든다.
- 위의 사진과 같이 복잡한 데이터를 간단하게 표현해서 앞으로를 예측한다. 즉 훈련된 연산식을 통해서 "좋은 예측값"을 뽑는 것이 목표이다.

### 1) 비용 cost  
<img src="./imgs/linreg_cost.png" style="width: 700px;">

- Y = AX + B 을 가설, 연산식이라고 보면 A는 Weight, B는 Bias, X는 인풋, Y는 목표값이다. 이상적으로는 AX+b 와 Y가 같아야 하지만 데이터가 많다면 같기 어렵고 항상 차이가 발생한다.
- Y'= AX+B, 즉 Y'를 예측값이라고 본다면 Y와 Y'의 차이(error)를 줄이는 것이 linear regression의 목표이다.
- 즉 데이터를 잘 표현하는 선을 긋기 위해선 선(예측값, Y')과 각 점들(목표값, Y) 사이의 거리(에러)를 줄여야 한다. [L2-norm](http://mathworld.wolfram.com/L2-Norm.html)으로 각각의 거리값을 더한 전체 에러를 줄이도록 A weight와 B bias를 훈련한다.

### 2) 기울기 하강법 Gradient descent  
<img src="./imgs/linreg_convex.png" style="width: 600px;">

- 에러를 최소화 시킬 때 쓰는 방법이다. Weight와 bias의 원래 값에서 각각 예측값에 미친 영향력만큼(편미분)을 빼 Weight와 Bias를 업데이트 시킨다. Linear regression의 에러는 극솟값(local minimum)이 없는 볼록한 모양(convex)이기 때문에 Gradient Descent를 하면 global minima를 찾을 수 있다.


### 3) 정규화 regularization  
<img src="./imgs/linreg_overfit.png" style="width: 500px;">
<br>
<img src="./imgs/linreg_reg.png" style="width: 600px;">

- 하지만 에러가 0이 된다고 해서 항상 좋은 것은 아니다. 트레이닝 데이터에만 너무 최적화되어 있다면, 새로운 데이터를 제대로 처리하지 못할 수 있다. 따라서 cost를 구할 때 정규화 식(regularization form)을 넣어서 연산식(Hypothesis)의 차원을 제한한다. 

## 1. import

In [1]:
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
from ipywidgets import *

## 2. widget
+ Make input samples, controlling noises.
+ Decide learning rate and epoch.
+ every variables are decided by users


In [2]:
## defining widgets
# make samples using widgets
w_sample_size = widgets.Dropdown(options=[50,100,300,1000],value=100,\
                                 description='sample size : ')
w_sample_noises = widgets.FloatSlider(value=0.50, min=0.00, max=2.00,step=0.01,\
                                      description='sample noises : ', readout=True)

# make parameters using widgets
w_lr = widgets.FloatSlider(min=0.000, max=1.000, step=0.001, value=0.001, \
                           description='learning rate : ', readout=True, readout_format='0.3f')
w_epoch = widgets.Dropdown(options=[100,200,500], value=200, description='epoch : ', disabled=False)
w_batch_size = widgets.Dropdown(options=[10,50,100],value=50,\
                                 description='batch size : ')

# toggle button widget for initialization
w_init = widgets.ToggleButton(value=False, description='Initialize', disabled=False,
                             button_style='info',icon='check')

## function to merge widgets
# make data samples by getting parameters from widgets
def merge_widget_for_data_plot(sample_size, noises):
    print('Sample size is {}, Noises is {}'.format(sample_size, noises))
    X = np.linspace(0,10,sample_size)
    Y = 2*X + noises*np.random.randn(sample_size)
    plt.plot(X,Y,'bo', label='origin data')
    plt.show()
    return(X, Y, sample_size)

def merge_widget_for_model_plot(lr, epoch, batch):
    print('Learning rate is {}, Epoch is {}, and Batch size is : {}'.format(lr, epoch,batch))
    return(lr, epoch, batch)

def initialize(value):
    if value: # run when value==true`
        w_sample_size.value = 200
        w_sample_noises.value = 0.50
        w_lr.value=0.001
        w_epoch.value=200

## interactive 
merge_widget_data = interactive(merge_widget_for_data_plot, \
                             sample_size=w_sample_size, noises=w_sample_noises)
merge_widget_model = interactive(merge_widget_for_model_plot, lr=w_lr, epoch=w_epoch,\
                                batch = w_batch_size)
restart = interactive(initialize, value=w_init)

## display interactive widgets
display(merge_widget_data, merge_widget_model, restart)

+ __get data / parameters from preceded cell__
+ __devide data into 2 sets (train set, test set)__

In [3]:
# defining widget outputs to global variable
(input_data, target_data, sample_size) = merge_widget_data.result
(lr, epoch_size, batch_size) = merge_widget_model.result

print('lr is {}, epoch_size is {}, batch is {}'.format(lr, epoch_size,batch_size)) # check


## devide data to 2 sets (for train, tests)
# use when sample size is big enoug
# to_train_data = round(len(input_data)*0.7)
# train_input = input_data[:to_train_data]
# train_target = target_data[:to_train_data]
# test_input = input_data[to_train_data:]
# test_target = target_data[to_train_data:]

lr is 0.001, epoch_size is 200, batch is 50


## 3. Graphing

+ Make graph of tensorflow
+ optimizer can be Gradient Descent optimizer or Adam optimizer

In [4]:
## placeholder for input and target samples
X = tf.placeholder(name='input_data', dtype=tf.float32)
Y = tf.placeholder(name='target_data', dtype=tf.float32)

## if devided train set and test set
# X_train = tf.placeholder(name='train_input', dtype=tf.float32)
# Y_train = tf.placeholder(name='train_target', dtype=tf.float32)
# X_test = tf.placeholder(name='test_input', dtype=tf.float32)
# Y_test = tf.placeholder(name='test_target', dtype=tf.float32)

## Model parameters
w = tf.Variable(name='weight', initial_value=0, dtype=tf.float32)
b = tf.Variable(name='bias', initial_value=1, dtype=tf.float32)

## Model w*X + b
Y_predict = tf.add(tf.multiply(w, X), b)

## loss function, sse
loss = tf.reduce_sum(tf.square(Y-Y_predict))

## optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr)
trainer = optimizer.minimize(loss)
# or
# optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

## initializer
init = tf.global_variables_initializer()

## 4. Session run

+ Session.run the graph we made
+ stack weight and bias to see training process in 5
+ show progress when epoch is times of 10

In [5]:
# stack weights and biases while trainig
w_stack = []
b_stack = []

with tf.Session() as sess:
    sess.run(init) # initialize before start
    print('Initialized')
    # start training
    for epoch in range(epoch_size):
        total_loss = 0
        for step, data in enumerate(zip(input_data, target_data)):
            x, y = data # step-th sample in input_data, target_data
            _, train_loss = sess.run([trainer, loss], feed_dict={X:x, Y:y})
            total_loss += train_loss
            current_w, current_b = sess.run([w, b])
            w_stack.append(current_w)
            b_stack.append(current_b)
            total_loss = total_loss/sample_size
        if epoch%10==0:
            print('After {} epochs: total_loss={:0.4f}, weight={:0.4f}, bias={:0.4f}\
             '.format(epoch, total_loss, current_w, current_b))
    print('\nTraining is DONE.')

Initialized
After 0 epochs: total_loss=0.0033, weight=1.8298, bias=1.4279             
After 10 epochs: total_loss=0.0028, weight=1.8794, bias=0.9677             
After 20 epochs: total_loss=0.0026, weight=1.9110, bias=0.6676             
After 30 epochs: total_loss=0.0025, weight=1.9316, bias=0.4720             
After 40 epochs: total_loss=0.0024, weight=1.9450, bias=0.3445             
After 50 epochs: total_loss=0.0024, weight=1.9537, bias=0.2614             
After 60 epochs: total_loss=0.0023, weight=1.9594, bias=0.2073             
After 70 epochs: total_loss=0.0023, weight=1.9632, bias=0.1720             
After 80 epochs: total_loss=0.0023, weight=1.9656, bias=0.1490             
After 90 epochs: total_loss=0.0023, weight=1.9672, bias=0.1340             
After 100 epochs: total_loss=0.0023, weight=1.9682, bias=0.1242             
After 110 epochs: total_loss=0.0023, weight=1.9689, bias=0.1178             
After 120 epochs: total_loss=0.0023, weight=1.9693, bias=0.1137            

## 5. Results

- Control weight and bias from untraind to trained
- or Drag third slider through the process step

In [6]:
widget_w_stack = widgets.SelectionSlider(options=w_stack, value=w_stack[0], \
                                         description='w : untrained--->trained ', readout=True)
widget_b_stack = widgets.SelectionSlider(options=b_stack, value=b_stack[0], \
                                         description='b : untrained--->trained ',readout=True)

def merge_widgets_plot_results(w_value, b_value):
    plt.plot(input_data,target_data,'bo', label='origin data')
    reg_line = w_value * input_data + b_value
    plt.plot(input_data,reg_line, 'r-')
    plt.show()

show_lsreg = interactive(merge_widgets_plot_results, w_value=widget_w_stack, b_value=widget_b_stack)
display(show_lsreg)

In [7]:
step_lists = list(range(len(w_stack)))
widget_train_process = widgets.SelectionSlider(options=step_lists, value=0,\
                                              description='untrained ---> trained', readout=True)

def show_process_following_widgets(step):
    w_value = w_stack[step]
    b_value = b_stack[step]
    plt.plot(input_data,target_data,'bo', label='origin data')
    reg_line = w_value * input_data + b_value
    plt.plot(input_data,reg_line, 'r-')
    plt.show()

show_process = interactive(show_process_following_widgets, step=widget_train_process)
display(show_process)

---
## 6. Run with batch


### 1) generate batch sample

In [9]:
def generate_batch(x, y, batch_size, shuffle=True):
    data_size = len(x)
    batch_x = []
    batch_y = []
    if shuffle:
        shuffle_indices = np.random.permutation(np.arange(data_size))
        shuffled_x = x[shuffle_indices]
        shuffled_y = y[shuffle_indices]
    else :
        shuffled_x = x
        shuffled_y = y

    num_batches = int((data_size-1)/batch_size)+1
    for batch_num in range(num_batches):
        start_index = batch_num * batch_size
        end_index = min((batch_num+1)*batch_size,data_size)
        x_s = shuffled_x[start_index:end_index]
        batch_x.append(x_s)
        y_s = shuffled_y[start_index:end_index]
        batch_y.append(y_s)
    return(batch_x, batch_y, num_batches)

(batch_input, batch_target, num_batches) = generate_batch(input_data, target_data, batch_size, shuffle=True)

### 2) Graphing

In [10]:
## placeholder for input and target samples
X = tf.placeholder(shape=[None],name='input_data', dtype=tf.float32)
Y = tf.placeholder(shape=[None],name='target_data', dtype=tf.float32)

## Model parameters
w = tf.Variable(name='weight', initial_value=0, dtype=tf.float32)
b = tf.Variable(name='bias', initial_value=1, dtype=tf.float32)

## Model w*X + b
Y_predict = tf.add(tf.multiply(w,X),b)

## loss function, sse
loss = tf.reduce_mean(tf.square(Y-Y_predict))

## optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr)
trainer = optimizer.minimize(loss)
    # or
    # optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

## initializer
init = tf.global_variables_initializer()

### 3) session.run

In [11]:
# stack weights and biases while trainig
w_batch_stack = []
b_batch_stack = []

with tf.Session() as sess:
    sess.run(init) # initialize before start
    print('Initialized')
    # start training
    for epoch in range(epoch_size):
        total_loss = 0
        for step, data in enumerate(zip(batch_input, batch_target)):
            x_batch, y_batch = data # step-th sample in input_data, target_data
            _, train_loss = sess.run([trainer, loss], feed_dict={X:x_batch, Y:y_batch})
            input_size = len(x_batch)
            total_loss += train_loss/input_size
            current_w, current_b = sess.run([w, b])
            w_batch_stack.append(current_w)
            b_batch_stack.append(current_b)
        total_loss = total_loss/num_batches
        if epoch%10==0:
            print('\nAfter {} epochs: total_loss={:0.4f}, weight={:0.4f}, bias={:0.4f}\
             '.format(epoch, total_loss, current_w, current_b))
    print('\nTraining is DONE.')

Initialized

After 0 epochs: total_loss=2.0965, weight=0.2364, bias=1.0344             

After 10 epochs: total_loss=0.1330, weight=1.4125, bias=1.1996             

After 20 epochs: total_loss=0.0179, weight=1.6982, bias=1.2311             

After 30 epochs: total_loss=0.0111, weight=1.7686, bias=1.2304             

After 40 epochs: total_loss=0.0105, weight=1.7868, bias=1.2220             

After 50 epochs: total_loss=0.0104, weight=1.7925, bias=1.2118             

After 60 epochs: total_loss=0.0103, weight=1.7951, bias=1.2012             

After 70 epochs: total_loss=0.0102, weight=1.7969, bias=1.1906             

After 80 epochs: total_loss=0.0100, weight=1.7985, bias=1.1801             

After 90 epochs: total_loss=0.0099, weight=1.8001, bias=1.1697             

After 100 epochs: total_loss=0.0098, weight=1.8017, bias=1.1594             

After 110 epochs: total_loss=0.0097, weight=1.8032, bias=1.1492             

After 120 epochs: total_loss=0.0096, weight=1.8047, bias=1.139

### 4) plot result

In [12]:
step_lists = list(range(len(w_batch_stack)))
widget_train_process = widgets.SelectionSlider(options=step_lists, value=0,\
                                              description='untrained ---> trained', readout=True)

def show_process_following_widgets(step):
    w_value = w_batch_stack[step]
    b_value = b_batch_stack[step]
    plt.plot(input_data,target_data,'bo', label='origin data')
    reg_line = w_value * input_data + b_value
    plt.plot(input_data,reg_line, 'r-')
    plt.show()

show_process = interactive(show_process_following_widgets, step=widget_train_process)
display(show_process)