# Multivariate Linear Regression

---

## Multiple Features

- ##### Notation

  - **n** = number of features
  - **m** = number of training examples
  - **x<sup>(i)</sup>** = input (features) of the i<sup>th</sup> training example
  - **x<sub>j</sub><sup>(i)</sup>** = value of feature j in the i<sup>th</sup> training example

| Size<br>x<sub>1</sub> | Number of<br>bedrooms<br>x<sub>2</sub> | Number of<br>floors<br>x<sub>3</sub> | Age of<br>home<br>x<sub>4</sub> | Price<br>y |
| :-------------------: | :------------------------------------: | :----------------------------------: | :-----------------------------: | :--------: |
|         2104          |                   5                    |                  1                   |               45                |    460     |
|         1416          |                   3                    |                  2                   |               40                |    232     |
|         1534          |                   3                    |                  2                   |               30                |    315     |
|          ...          |                  ...                   |                 ...                  |               ...               |    ...     |

  - E.g.
      - x<sup>(2)</sup> = $ \begin{bmatrix} 1416 \\\ 3 \\\ 2 \\\ 40 \end{bmatrix} \qquad$   x<sup>(3)</sup> = $ \begin{bmatrix} 1534 \\\ 3 \\\ 2 \\\ 30 \end{bmatrix}$
  
  <br><br>
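The notation maps directly onto array indexing. A minimal NumPy sketch, assuming the three visible table rows are stored one training example per row (the names `X`, `y`, `x_2` and `x_3_of_2` are illustrative, and Python indices start at 0 while the notes count from 1):

```python
import numpy as np

# Rows = training examples, columns = features x_1..x_4 (the rows shown in the table).
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30]], dtype=float)
y = np.array([460, 232, 315], dtype=float)   # price column

m, n = X.shape        # m = number of examples, n = number of features
x_2 = X[1]            # x^(2): all features of the 2nd example (0-based row 1)
x_3_of_2 = X[1, 2]    # x_3^(2): feature 3 of the 2nd example
print(m, n, x_2, x_3_of_2)
```
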
  
  
- ### Hypothesis

    - #### H<sub>&theta;</sub>(x) = &theta;<sup>T</sup>x = &theta;<sub>0</sub>x<sub>0</sub> +  &theta;<sub>1</sub>x<sub>1</sub> +  &theta;<sub>2</sub>x<sub>2</sub> + &middot;&middot;&middot; +  &theta;<sub>n</sub>x<sub>n</sub>

    - For convenience of notation, define x<sub>0</sub> = 1 (x<sub>0</sub><sup>(i)</sup> = 1) 

        <br>

        - $ x = \begin{bmatrix} x_0 \\\ x_1 \\\ x_2 \\\ \cdot  \\\ \cdot \\\ \cdot \\\ x_n \end{bmatrix} \qquad \theta =  \begin{bmatrix} \theta_0 \\\ \theta_1 \\\ \theta_2 \\\ \cdot  \\\ \cdot \\\ \cdot \\\ \theta_n \end{bmatrix} $

        <br>

  - H<sub>&theta;</sub>(x) : multivariate linear regression
      - x :  n + 1 dimensional vector
      - &theta; : n + 1 dimensional vector
      <br><br>
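As a quick check of the hypothesis, the inner product &theta;<sup>T</sup>x can be compared with the expanded sum. This is a sketch only: the values in `theta` are made up for illustration and are not fitted parameters.

```python
import numpy as np

x_features = np.array([1416.0, 3.0, 2.0, 40.0])   # x^(2) from the housing table
x = np.concatenate(([1.0], x_features))           # prepend x_0 = 1, so x has n + 1 entries
theta = np.array([80.0, 0.1, 0.01, 3.0, -2.0])    # hypothetical parameters, n + 1 entries

h = theta @ x                                                   # theta^T x
h_expanded = sum(theta[j] * x[j] for j in range(len(theta)))    # theta_0*x_0 + ... + theta_n*x_n
print(h, h_expanded)
```

Adding x<sub>0</sub> = 1 is what lets the intercept &theta;<sub>0</sub> be folded into the same inner product as the other parameters.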


- ### Cost Function

    - $ J(\theta) = {1\over 2m}{\sum_{i=1}^m } $(H<sub>&theta;</sub>(x<sup>(i)</sup>) - y<sup>(i)</sup>)<sup>2</sup> $ \; = \; {1\over 2m}{\sum_{i=1}^m } $( &theta;<sup>T</sup>x<sup> (i)</sup> - y<sup> (i)</sup> )<sup>2</sup>

    $\qquad\quad\;\;  = {1\over 2m}{\sum_{i=1}^m }(( \sum_{j=0}^n$ &theta;<sub>j </sub>x<sub>j</sub><sup>(i)</sup> ) - y<sup>(i)</sup> )<sup>2</sup>
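The cost function translates almost line for line into NumPy. A minimal sketch, assuming the design matrix `X` already contains the x<sub>0</sub> = 1 column; the function name `cost` and the all-zero starting `theta` are choices made for this illustration:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (theta^T x^(i) - y^(i))^2, where X includes x_0 = 1."""
    m = len(y)
    errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example i
    return float(errors @ errors) / (2 * m)

# The three visible rows of the housing table, with a leading column of ones for x_0.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30]], dtype=float)
y = np.array([460, 232, 315], dtype=float)

theta = np.zeros(X.shape[1])          # illustrative starting point
J = cost(theta, X, y)

# Same value written with the explicit double sum over i and j.
J_loop = sum(
    (sum(theta[j] * X[i, j] for j in range(X.shape[1])) - y[i]) ** 2
    for i in range(len(y))
) / (2 * len(y))
print(J, J_loop)
```
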

- ### Gradient Descent
    $\quad$\- **simultaneously update** for every j = 0,1,2, $\cdots$,n

   - $\theta_j := \theta_j - \alpha{\partial\over \partial\theta_j}J(\theta)$<br>
$\quad\;\; = \theta_j - \alpha{1 \over m}{\sum_{i=1}^m } $(H<sub>&theta;</sub>(x<sup>(i)</sup>) - y<sup>(i)</sup>)x<sub>j</sub><sup>(i)</sup>

   - E.g.
        - $\theta_0 := \theta_0 - \alpha{1 \over m}{\sum_{i=1}^m } $(H<sub>&theta;</sub>(x<sup>(i)</sup>) - y<sup>(i)</sup>)x<sub>0</sub><sup>(i)</sup>
        - $\theta_1 := \theta_1 - \alpha{1 \over m}{\sum_{i=1}^m } $(H<sub>&theta;</sub>(x<sup>(i)</sup>) - y<sup>(i)</sup>)x<sub>1</sub><sup>(i)</sup>
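The simultaneous update can be sketched in vectorized form: compute the whole gradient from the current &theta; first, then change every &theta;<sub>j</sub> at once. The toy data, the learning rate `alpha=0.1`, and the helper name `gradient_descent_step` are assumptions made for this example only.

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j (X includes the x_0 = 1 column)."""
    m = len(y)
    errors = X @ theta - y             # h_theta(x^(i)) - y^(i) for every i
    gradient = (X.T @ errors) / m      # entry j: (1/m) * sum_i errors_i * x_j^(i)
    return theta - alpha * gradient    # all theta_j are updated from the same old theta

# Hypothetical toy data generated from y = 1 + 2*x_1 + 3*x_2.
features = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
X = np.column_stack([np.ones(len(features)), features])
y = 1 + 2 * features[:, 0] + 3 * features[:, 1]

theta = np.zeros(3)
for _ in range(2000):
    theta = gradient_descent_step(theta, X, y, alpha=0.1)
print(theta)   # should approach the generating parameters 1, 2, 3
```

Because the gradient is computed before any parameter changes, no &theta;<sub>j</sub> is updated using an already-updated &theta;<sub>k</sub>, which is what "simultaneously" means here.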

#### Example

|  X1  |  X2  |  X3  |  Y   |
| :--: | :--: | :--: | :--: |
|  73  |  80  |  75  | 152  |
|  93  |  88  |  93  | 185  |
|  89  |  91  |  90  | 180  |
|  96  |  98  | 100  | 196  |
|  73  |  66  |  70  | 142  |

### No Matrix

```python
# Written for the TensorFlow 1.x graph API (tf.placeholder / tf.Session).
import tensorflow as tf

# One Python list per feature column of the example table
x1_data = [73., 93., 89., 96., 73.]
x2_data = [80., 88., 91., 98., 66.]
x3_data = [75., 93., 90., 100., 70.]
y_data = [152., 185., 180., 196., 142.]

# Placeholders are fed with the data at run time
x1 = tf.placeholder(tf.float32)
x2 = tf.placeholder(tf.float32)
x3 = tf.placeholder(tf.float32)

Y = tf.placeholder(tf.float32)

# One weight per feature plus a bias, randomly initialized
w1 = tf.Variable(tf.random_normal([1]), name='w1')
w2 = tf.Variable(tf.random_normal([1]), name='w2')
w3 = tf.Variable(tf.random_normal([1]), name='w3')
b = tf.Variable(tf.random_normal([1]), name='b')

# hypothesis
hypothesis = x1 * w1 + x2 * w2 + x3 * w3 + b

# cost function: mean squared error (no 1/2 factor, unlike J(theta) above)
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# gradient descent algorithm
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.00001)
train = optimizer.minimize(cost)

# Training Start

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for step in range(2001):
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train],
        feed_dict={x1: x1_data, x2: x2_data, x3: x3_data, Y: y_data})
    if step % 200 == 0:
        print(step, "\tCost: ", cost_val, "\thypothesis: ", hy_val)
```

```text
0 	Cost:  45084.387 	hypothesis:  [-29.119387 -46.974102 -40.01806  -45.541195 -37.040848]
200 	Cost:  24.283691 	hypothesis:  [158.83186 179.67203 182.91368 197.27237 135.97934]
400 	Cost:  21.810026 	hypothesis:  [158.45894 179.92746 182.79912 197.19174 136.31262]
600 	Cost:  19.590466 	hypothesis:  [158.10564 180.16942 182.69057 197.1155  136.62827]
800 	Cost:  17.598837 	hypothesis:  [157.77092 180.39865 182.5877  197.04335 136.92719]
1000 	Cost:  15.81179 	hypothesis:  [157.45383 180.61581 182.49026 196.9751  137.21028]
1200 	Cost:  14.208174 	hypothesis:  [157.1534  180.82156 182.39792 196.91055 137.4784 ]
1400 	Cost:  12.769322 	hypothesis:  [156.86876 181.01645 182.31041 196.84947 137.73227]
1600 	Cost:  11.478229 	hypothesis:  [156.5991  181.20108 182.22748 196.7917  137.9727 ]
1800 	Cost:  10.319711 	hypothesis:  [156.34363 181.376   182.14891 196.73709 138.2004 ]
2000 	Cost:  9.280136 	hypothesis:  [156.10156 181.54172 182.07446 196.68542 138.41602]
```


### Use Matrix
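The matrix form collapses the three separate products of the "No Matrix" version into a single product `XW + b`. A minimal NumPy sketch of that equivalence, using the example table and made-up weights (`w1`, `w2`, `w3`, `b` are hypothetical values, not trained results):

```python
import numpy as np

# Example table: one row per training example, columns X1, X2, X3.
x_data = np.array([[73, 80, 75],
                   [93, 88, 93],
                   [89, 91, 90],
                   [96, 98, 100],
                   [73, 66, 70]], dtype=float)

w1, w2, w3, b = 0.5, 0.6, 0.9, 1.0        # hypothetical weights and bias

# Per-feature form: one product per column.
per_feature = x_data[:, 0] * w1 + x_data[:, 1] * w2 + x_data[:, 2] * w3 + b

# Matrix form: a (5, 3) matrix times a (3, 1) weight column gives (5, 1) predictions.
W = np.array([[w1], [w2], [w3]])
matrix_form = x_data @ W + b

print(np.allclose(matrix_form[:, 0], per_feature))
```

The matrix form keeps the code the same length no matter how many features are added; only the shape of `W` changes.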

```python
# Written for the TensorFlow 1.x graph API (tf.placeholder / tf.Session).
import tensorflow as tf

# One row per training example, one column per feature
x_data = [[73, 80, 75],
          [93, 88, 93],
          [89, 91, 90],
          [96, 98, 100],
          [73, 66, 70]]

y_data = [[152],
          [185],
          [180],
          [196],
          [142]]

# None lets the number of examples vary; 3 features in, 1 output
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3, 1]), name='Weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# hypothesis
hypothesis = tf.matmul(X, W) + b

# cost function: mean squared error (no 1/2 factor, unlike J(theta) above)
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# gradient descent algorithm
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.00001)
train = optimizer.minimize(cost)

# Training Start

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for step in range(2001):
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_data, Y: y_data})
    if step % 200 == 0:
        print(step, "\tCost: ", cost_val, "\nPrediction:\n", hy_val, "\n")
```

```text
0 	Cost:  32848.746 
Prediction:
 [[-10.408987]
 [ -8.236517]
 [-10.556303]
 [ -8.331167]
 [ -7.879247]] 

200 	Cost:  3.543628 
Prediction:
 [[150.37996]
 [185.05687]
 [179.8811 ]
 [199.03442]
 [139.5775 ]] 

400 	Cost:  3.5007064 
Prediction:
 [[150.36264]
 [185.07048]
 [179.878  ]
 [199.01498]
 [139.6099 ]] 

600 	Cost:  3.4592838 
Prediction:
 [[150.34662]
 [185.08322]
 [179.87532]
 [198.99593]
 [139.64104]] 

800 	Cost:  3.4192185 
Prediction:
 [[150.3318 ]
 [185.09511]
 [179.87297]
 [198.9772 ]
 [139.67099]] 

1000 	Cost:  3.3804092 
Prediction:
 [[150.31812]
 [185.10617]
 [179.87094]
 [198.95882]
 [139.69983]] 

1200 	Cost:  3.3427098 
Prediction:
 [[150.30556]
 [185.11652]
 [179.86926]
 [198.94075]
 [139.72762]] 

1400 	Cost:  3.3060856 
Prediction:
 [[150.294  ]
 [185.1261 ]
 [179.86787]
 [198.92297]
 [139.75436]] 

1600 	Cost:  3.2703812 
Prediction:
 [[150.28346]
 [185.13503]
 [179.86679]
 [198.90547]
 [139.78017]] 

1800 	Cost:  3.2356153 
Prediction:
 [[150.27382]
 [185.14
```

## Feature Scaling

- ### Feature Scaling

  - Goal : speed up gradient descent's convergence to the global optimum

  - Idea : Make sure features are on a **similar scale** (approximately)

    - If all features are on a **similar scale**, gradient descent reaches the optimum more easily

  - Divide each feature by its range : $ x_i := {x_i \over \max(x_i) - \min(x_i)} $

  - E.g.

    - x<sub>1</sub> = size ( 0 - 2000 feet<sup>2</sup>)
    - x<sub>2</sub> = number of bedrooms
    - //todo : draw graph

    

  - range : $ -1 \le x_i \le 1 $

  - $x_i$ does not need to lie exactly between -1 and 1

  - It is enough for all features to be on **approximately the same scale**
  - E.g.
      - $ 0 \le x_i \le 3 $	    ( o )
      - $ -2 \le x_i \le 0.5 $   ( o )
      - $ -100 \le x_i \le 100 $ ( x ) too big
      - $ -0.0001 \le x_i \le 0.0001 $ ( x ) too small
      - According to Professor Andrew Ng, anything between roughly $ -3 \sim 3 $ and $ {-1\over 3} \sim {1\over 3} $ is fine
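Range scaling can be sketched in a few lines of NumPy. The first result divides each feature by its range, as in the formula in these notes. The second result, mean normalization, is a common variant that is not part of these notes and is included only for comparison. The sketch assumes no feature column is constant, since that would make the range zero.

```python
import numpy as np

# The three visible rows of the housing table: size, bedrooms, floors, age.
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30]], dtype=float)

col_range = X.max(axis=0) - X.min(axis=0)   # max - min, computed per feature column

# Formula from the notes: divide each feature by its range.
scaled = X / col_range

# Variant (an addition, not from the notes): subtract the column mean first,
# which also centers every feature around 0.
mean_normalized = (X - X.mean(axis=0)) / col_range

print(scaled)
print(mean_normalized)
```

After dividing by the range, every column spans exactly one unit, so no single feature dominates the gradient simply because of the units it is measured in.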

In [3]:
import tensorflow as tf
import numpy as np