Goals:  
Extend our regression model routines to support multiple features   
Rewrite prediction, cost and gradient routines to support multiple features   
Utilize NumPy np.dot to vectorize their implementations for speed and simplicity  

In [60]:
import copy, math
import numpy as np



# Problem Statement

We will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age), shown in the table below.

| Size (sqft) | Number of Bedrooms | Number of Floors | Age of Home (years) | Price (1000s of dollars) |
| ----------- | ------------------ | ---------------- | ------------------- | ------------------------ |
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  

We will build a linear regression model from these values so that we can predict the price of other houses, for example a 1200 sqft house with 3 bedrooms and 1 floor that is 40 years old.


In [61]:
X_Train = np.array([[2104, 5, 1, 45],[1416, 3, 2, 40],[852, 2, 1, 35]])
y_train = np.array([460, 232, 178])


## Matrix X containing our examples
Examples are stored in a NumPy array `X_Train`. Each row of the matrix represents one example. With $m$ training examples (three in our example) and $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$), that is, $m$ rows and $n$ columns.


$$\mathbf{X} = 
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \cdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$
$m$ = number of rows  
$n$ = number of columns  
`X[row, column]`: rows index examples (houses in our example) and columns index features (size, age, and so on).
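
As a quick check of this layout, the sketch below rebuilds the matrix from the table and inspects its shape. The names `X_demo`, `first_example`, and `size_column` are illustrative and are not used elsewhere in the lab.

```python
import numpy as np

# Same values as the housing table; X_demo is an illustrative name
X_demo = np.array([[2104, 5, 1, 45],
                   [1416, 3, 2, 40],
                   [852, 2, 1, 35]])

m, n = X_demo.shape          # m examples (rows), n features (columns)
first_example = X_demo[0]    # every feature of the first house
size_column = X_demo[:, 0]   # the size feature across all houses

print(m, n)                  # 3 4
print(first_example)
print(size_column)
```

Indexing with a single row number returns one example as a 1-D array of length $n$, which is the form the prediction routines later in this lab expect.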


##  Parameter vector w, b

* $\mathbf{w}$ is a vector with $n$ elements.
  - Each element contains the parameter associated with one feature.
  - In our dataset, $n$ is 4.



* $b$ is a scalar parameter.

* Note: For demonstration, $\mathbf{w}$ and $b$ are loaded with initial values chosen to be near the optimum.

In [62]:
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])


# Model Prediction With Multiple Variables
The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ denotes the vector dot product.



 
The dot product multiplies the values of two vectors element-wise and then sums the results.   
It requires the two vectors to have the same number of elements.   
The result is a single scalar value.
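
The definition above can be checked on a small example. This is a minimal sketch with made-up vectors `u` and `v` (not taken from the dataset) that compares the element-wise computation with `np.dot`.

```python
import numpy as np

# Illustrative vectors, not part of the housing data
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Multiply element-wise, then sum: 1*4 + 2*5 + 3*6
manual = 0
for u_i, v_i in zip(u, v):
    manual += u_i * v_i

vectorized = np.dot(u, v)    # the same computation in one call

print(manual, vectorized)    # 32 32
```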

Note: `np.dot` can take advantage of parallel hardware in your computer, which makes it much more efficient than a for loop that performs the same calculation sequentially. For very large training sets, a vectorized implementation can make a large difference in the running time of your learning algorithm.
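
One way to observe this is to time both approaches on large random vectors. The sketch below is only an illustration: the vector length, the helper `loop_dot`, and the variable names are assumptions, and the measured times depend on the machine and the NumPy build, so no specific numbers are claimed.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(200_000)      # illustrative size, large enough to see a gap
c = rng.random(200_000)

def loop_dot(u, v):
    """Dot product computed one element at a time in pure Python."""
    total = 0.0
    for i in range(u.shape[0]):
        total += u[i] * v[i]
    return total

t0 = time.perf_counter()
loop_result = loop_dot(a, c)
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
dot_result = np.dot(a, c)
dot_time = time.perf_counter() - t0

print(f"loop:   {loop_result:.4f} in {1000 * loop_time:.2f} ms")
print(f"np.dot: {dot_result:.4f} in {1000 * dot_time:.2f} ms")
```

Both calls should agree up to floating-point rounding; only the elapsed time is expected to differ.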


## Using a for loop to make a single prediction
Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension to multiple features is to loop over each element, multiply it by its parameter, accumulate the products, and add the bias parameter at the end.

In [63]:
def prediction_loop(w,b,x):
    n = x.shape[0]
    p=0
    for i in range(n):
        p_i = w[i] * x[i]
        p = p + p_i
    p = p + b
    return p 
    

In [64]:
# get a row from our training data
x_vec = X_Train[0,:]  # slicing: first row, all columns, i.e. the features of the first house
# make a prediction
f_wb = prediction_loop(w_init, b_init, x_vec)
print(f" Prediction: {f_wb}")


 Prediction: 459.9999976194083



## Using vectorization to make a single prediction

We can make use of vector operations to speed up predictions.

Recall from the Python/NumPy lab that `np.dot()` can be used to perform a vector dot product.

In [65]:
def prediction_vectorized(w,b,x):
    p = np.dot(w,x) + b 
    return p
 
    

In [66]:
x_vec = X_Train[0,:]
f_wb = prediction_vectorized(w_init, b_init, x_vec)
print(f"Prediction: {f_wb}")

Prediction: 459.99999761940825
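
The same idea extends from one example to the whole matrix: multiplying $\mathbf{X}$ by $\mathbf{w}$ produces one dot product per row. The sketch below is an illustration rather than part of the lab's routines. It redefines the data locally so it runs on its own, and `x_new` holds the hypothetical house from the problem statement (1200 sqft, 3 bedrooms, 1 floor, 40 years old).

```python
import numpy as np

# Local copies of the lab's data and near-optimal parameters
X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

# One matrix-vector product yields a prediction for every row of X
all_predictions = X @ w + b
print(all_predictions)       # close to the targets 460, 232, 178

# Prediction for the house described in the problem statement
x_new = np.array([1200, 3, 1, 40])
new_prediction = np.dot(w, x_new) + b
print(f"{new_prediction:0.2f}")   # roughly 200.83 (in 1000s of dollars)
```

Because these parameters were fitted to only three examples, the value for `x_new` shows the mechanics of prediction and should not be read as a realistic price estimate.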



# Compute Cost With Multiple Variables
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


In contrast to previous labs, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, so the model supports multiple features and the prediction in equation (4) is a dot product.

In [67]:
def compute_cost(w,b,x,y):
    m = x.shape[0]  # number of rows, i.e. the number of training examples
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(x[i],w) + b 
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2*m)
    return cost
        

In [68]:
cost = compute_cost(w_init, b_init, X_Train, y_train)
print (f" Cost at optimal w : {cost}")

 Cost at optimal w : 1.5578904880036537e-12
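
Equation (3) can also be evaluated without the loop over examples. The sketch below is one possible vectorized variant, not the lab's implementation. The function name `compute_cost_vectorized` is an assumption, and the data are redefined locally so the block is self-contained.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

def compute_cost_vectorized(w, b, X, y):
    """Hypothetical loop-free version of equation (3)."""
    m = X.shape[0]
    errors = X @ w + b - y                    # f_wb(x_i) - y_i for every i, shape (m,)
    return np.dot(errors, errors) / (2 * m)   # sum of squared errors / 2m

cost_vec = compute_cost_vectorized(w, b, X, y)
print(cost_vec)   # very small, since w and b are near the optimum
```

The result should agree with `compute_cost` up to floating-point rounding, since both evaluate the same sum.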



# Gradient Descent With Multiple Variables
Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and  

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* m is the number of training examples in the data set

    
*  $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value


In [69]:
def compute_gradient(w,b,x,y):
    m,n = x.shape  # number of examples, number of features
    dj_dw = np.zeros((n,))  # one gradient entry per feature weight (w0..w3 here)
    dj_db = 0.

    for i in range(m):  # outer loop: visit each training example (house)
        err = (np.dot(x[i],w) + b) - y[i]
        for j in range(n):  # inner loop: accumulate the partial derivative for each w_j
            dj_dw[j] = dj_dw[j] + err * x[i,j]
        dj_db = dj_db + err
        
    dj_dw = dj_dw / m
    dj_db = dj_db / m 

    return dj_dw, dj_db
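
The nested loops in `compute_gradient` follow equations (6) and (7) term by term. As a hedged alternative, the same sums can be written with matrix operations. The name `compute_gradient_vectorized` is an assumption, and the data are redefined locally so the sketch runs on its own.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])

def compute_gradient_vectorized(w, b, X, y):
    """Hypothetical loop-free version of equations (6) and (7)."""
    m = X.shape[0]
    err = X @ w + b - y      # prediction error for each example, shape (m,)
    dj_dw = X.T @ err / m    # entry j sums err_i * x_j^(i) over all examples
    dj_db = np.sum(err) / m
    return dj_dw, dj_db

# Gradient at w = 0, b = 0, the starting point used later in this lab
dj_dw0, dj_db0 = compute_gradient_vectorized(np.zeros(4), 0.0, X, y)
print(dj_dw0, dj_db0)
```

The size feature dominates the gradient at the starting point because its values are much larger than those of the other features.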
             

In [70]:
def gradient_descent(w_in,b_in,x,y,cost_function, gradient_function, alpha, num_iters):
    # A list to store the cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w within the function
    b = b_in

    for i in range (num_iters):
        # Calculate the gradient and update the parameters
        dj_dw, dj_db = gradient_function(w,b,x,y)
        #Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(w,b,x,y))
        # Print the cost at 10 evenly spaced intervals, or at every iteration if num_iters < 10
        if i % math.ceil(num_iters/10) == 0:
            print(f" Iteration: {i:4d} : Cost {J_history[-1]:8.2f} ")
    return w, b, J_history
    

In the next cell you will test the implementation. 

In [75]:
# initialize parameters
initial_w = np.zeros_like(w_init)  # same shape as w_init, with every element set to zero
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final , b_final, J_hist =  gradient_descent(initial_w, initial_b, X_Train, y_train, compute_cost, compute_gradient, alpha, iterations)
print(f" w, b found by gradient descent: {np.round(w_final,2)}, {b_final:0.2f}")
m,_ = X_Train.shape
for i in range (m):
    print(f" Prediction: {np.dot(X_Train[i],w_final) + b_final : 0.2f}, target value: {y_train[i]}")


 Iteration:    0 : Cost  2529.46 
 Iteration:  100 : Cost   695.99 
 Iteration:  200 : Cost   694.92 
 Iteration:  300 : Cost   693.86 
 Iteration:  400 : Cost   692.81 
 Iteration:  500 : Cost   691.77 
 Iteration:  600 : Cost   690.73 
 Iteration:  700 : Cost   689.71 
 Iteration:  800 : Cost   688.70 
 Iteration:  900 : Cost   687.69 
 w, b found by gradient descent: [ 0.2   0.   -0.01 -0.07], -0.00
 Prediction:  426.19, target value: 460
 Prediction:  286.17, target value: 232
 Prediction:  171.47, target value: 178


These results are not inspiring! The cost is still declining and our predictions are not very accurate. In the next lab we will explore how to improve on this.

In this lab we:
- Redeveloped the routines for linear regression, now with multiple variables.
- Utilized NumPy `np.dot` to vectorize the implementations
