## Creating Linear Regression from Scratch

In [6]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Steps:**
1. Define a function for prediction
2. Define the cost function
3. Define a function to calculate the gradients
4. Define a function to run gradient descent
5. Run the prediction function with the weights and bias found by gradient descent

### Step 1:

<a name="toc_15456_3"></a>
# Model Prediction With Multiple Variables
The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
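where $\cdot$ is the vector dot product.

To demonstrate the dot product, the `predict` function below implements (2) with `np.dot`, which computes the same weighted sum as (1) without an explicit loop over the features.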

In [7]:
def predict(w, b, x):
    """
    Calculates predictions for the linear regression model, equation (2)
    Args:
      x (ndarray (n,) or (m,n)) : a single example, or m examples, with n features
      w (ndarray (n,))          : model weights
      b (scalar)                : model bias
    Returns:
      p (scalar or ndarray (m,)): the prediction(s) for the given inputs
    """
    p = np.dot(x, w) + b
    return p
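A small check that (1) and (2) agree: `predict_loop` is a hypothetical helper that adds up $w_jx_j$ one feature at a time, while `predict_dot` uses `np.dot` as `predict` does. The weights and inputs are toy values chosen only for illustration.

```python
import numpy as np

def predict_loop(w, b, x):
    """Equation (1): add up w_j * x_j one feature at a time, then add b.
    Hypothetical helper; x is a single example of shape (n,)."""
    p = 0.0
    for j in range(x.shape[0]):
        p = p + w[j] * x[j]
    return p + b

def predict_dot(w, b, x):
    """Equation (2): the same sum written as a dot product.
    Works for one example (n,) or a matrix of examples (m, n)."""
    return np.dot(x, w) + b

# Toy values, chosen only for illustration
w_demo = np.array([1.0, 2.5, -3.0])
b_demo = 4.0
x_demo = np.array([10.0, 20.0, 30.0])

p1 = predict_loop(w_demo, b_demo, x_demo)
p2 = predict_dot(w_demo, b_demo, x_demo)
print(p1, p2)

# With a 2-D input, np.dot returns one prediction per row
X_demo = np.array([[10.0, 20.0, 30.0], [1.0, 0.0, 2.0]])
print(predict_dot(w_demo, b_demo, X_demo))
```

The 2-D case is why `predict` can later be called on the whole test matrix at once.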

### Step 2: 


<a name="toc_15456_4"></a>
# Compute Cost With Multiple Variables
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


Unlike the single-variable case, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, which lets the model use multiple features.

In [8]:
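def cost(w, b, x, y):
    """
    Computes the cost J(w,b) of equation (3) for data x (m,n), targets y (m,),
    weights w (n,) and bias b (scalar)
    """
    m = len(x)
    total_cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(w, x[i]) + b   # prediction for example i, equation (4)
        total_cost = total_cost + (f_wb_i - y[i])**2
    total_cost = total_cost / (2 * m)
    return total_cost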
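The loop over examples in equation (3) can also be written with whole-array operations. A minimal sketch, assuming the same shapes as `cost` (x is (m, n), y is (m,)); `cost_loop` mirrors the loop version and `cost_vectorized` is a hypothetical vectorized alternative, compared on random toy data.

```python
import numpy as np

def cost_loop(w, b, x, y):
    """Equation (3) with an explicit loop over the m examples, like cost above."""
    m = x.shape[0]
    total = 0.0
    for i in range(m):
        f_wb_i = np.dot(w, x[i]) + b
        total += (f_wb_i - y[i]) ** 2
    return total / (2 * m)

def cost_vectorized(w, b, x, y):
    """The same cost computed for all examples at once."""
    m = x.shape[0]
    errors = x @ w + b - y            # shape (m,): f_wb(x_i) - y_i for every i
    return np.dot(errors, errors) / (2 * m)

# Random toy data, for illustration only
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 3))
y_demo = rng.normal(size=50)
w_demo = np.array([0.5, -1.0, 2.0])
b_demo = 0.3

print(cost_loop(w_demo, b_demo, x_demo, y_demo))
print(cost_vectorized(w_demo, b_demo, x_demo, y_demo))
```

As a hand check: with one feature, x = [1, 2], y = [1, 2], w = [0] and b = 0, the errors are -1 and -2, so J = (1 + 4) / (2 · 2) = 1.25.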

### Steps 3 and 4:

<a name="toc_15456_5"></a>
# Gradient Descent With Multiple Variables

Gradient descent updates the parameters $w_j$ (for $j = 0, \dots, n-1$, where $n$ is the number of features) and $b$ simultaneously, using the partial derivatives of the cost:

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value

In [9]:
def compute_gradient(w,b,x,y):
    """
    Computes the gradient for linear regression 
    Args:
      x (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
    """
    m, n = x.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0

    for i in range(m):
        f_wb_i = np.dot(x[i], w) + b    # prediction for example i
        err = f_wb_i - y[i]             # error term shared by equations (6) and (7)

        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * x[i, j]
        dj_db = dj_db + err
    dj_db = dj_db / m
    dj_dw = dj_dw / m

    # Order matters: dj_db first, matching how gradient_descent unpacks the result
    return dj_db, dj_dw
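One way to gain confidence in equations (6) and (7) is to compare them with a finite-difference estimate of the cost's slope. A sketch under stated assumptions: `compute_gradient_vectorized` is a hypothetical loop-free form of `compute_gradient` (same return order), and `numerical_gradient` is a hypothetical central-difference checker; both run on random toy data.

```python
import numpy as np

def cost_vec(w, b, x, y):
    errors = x @ w + b - y
    return np.dot(errors, errors) / (2 * x.shape[0])

def compute_gradient_vectorized(w, b, x, y):
    """Equations (6) and (7) without explicit loops; returns (dj_db, dj_dw)."""
    m = x.shape[0]
    errors = x @ w + b - y          # shape (m,)
    dj_dw = x.T @ errors / m        # shape (n,): equation (6) for every j at once
    dj_db = errors.sum() / m        # scalar: equation (7)
    return dj_db, dj_dw

def numerical_gradient(w, b, x, y, eps=1e-6):
    """Central finite differences of the cost, used only as a sanity check."""
    dj_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost_vec(w + step, b, x, y) - cost_vec(w - step, b, x, y)) / (2 * eps)
    dj_db = (cost_vec(w, b + eps, x, y) - cost_vec(w, b - eps, x, y)) / (2 * eps)
    return dj_db, dj_dw

rng = np.random.default_rng(1)
x_demo = rng.normal(size=(20, 4))
y_demo = rng.normal(size=20)
w_demo = rng.normal(size=4)
b_demo = 0.7

analytic_db, analytic_dw = compute_gradient_vectorized(w_demo, b_demo, x_demo, y_demo)
numeric_db, numeric_dw = numerical_gradient(w_demo, b_demo, x_demo, y_demo)
print(np.allclose(analytic_dw, numeric_dw, atol=1e-6), np.isclose(analytic_db, numeric_db, atol=1e-6))
```

Because J is quadratic in w and b, central differences are exact up to rounding error, so a tight tolerance is reasonable here.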

Gradient descent for multiple variables, with learning rate $\alpha$:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$
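A single step of update (5) on a tiny example makes the "simultaneous" part concrete: both gradients are computed from the old parameters before either parameter changes. The toy data (y = 2x exactly), `alpha = 0.1`, and the helper names `cost_vec` and `gradients_vec` are illustrative assumptions.

```python
import numpy as np

def cost_vec(w, b, x, y):
    errors = x @ w + b - y
    return np.dot(errors, errors) / (2 * x.shape[0])

def gradients_vec(w, b, x, y):
    errors = x @ w + b - y
    m = x.shape[0]
    return errors.sum() / m, x.T @ errors / m     # (dj_db, dj_dw)

# Toy data where y = 2x exactly
x_demo = np.array([[1.0], [2.0], [3.0]])
y_demo = np.array([2.0, 4.0, 6.0])
w_old, b_old, alpha = np.zeros(1), 0.0, 0.1

# Simultaneous update: both gradients are evaluated at the old (w, b)
dj_db, dj_dw = gradients_vec(w_old, b_old, x_demo, y_demo)
w_new = w_old - alpha * dj_dw
b_new = b_old - alpha * dj_db

print(w_new, b_new)
print(cost_vec(w_old, b_old, x_demo, y_demo), cost_vec(w_new, b_new, x_demo, y_demo))
```

By hand: the errors are -2, -4, -6, so $\partial J/\partial w = -28/3$ and $\partial J/\partial b = -4$, giving $w = 2.8/3 \approx 0.933$ and $b = 0.4$ after one step, and the cost drops from about 9.33 to about 1.88. Updating $b$ with a gradient recomputed after $w$ had already changed would no longer be the update in (5).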

In [10]:
import copy

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost=cost, compute_gradient=compute_gradient):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha

    Args:
      x (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      cost                : function to compute cost (not called inside the loop below)
      compute_gradient    : function to compute the gradient

    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
    """
    w = copy.deepcopy(w_in)   # avoid modifying the caller's array
    b = b_in

    for i in range(num_iters):
        # Both gradients come from the current (w, b) before either is updated
        dj_db, dj_dw = compute_gradient(w, b, x, y)

        w = w - alpha * dj_dw
        b = b - alpha * dj_db

    return w, b
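`gradient_descent` accepts a `cost` function but never calls it. A hypothetical variant, `gradient_descent_with_history`, could use it to record $J(\mathbf{w},b)$ after every step, which is a common way to check that the learning rate is small enough. The sketch below assumes vectorized helpers (`cost_vec`, `gradient_vec`) and noise-free synthetic data; it is not the notebook's own function.

```python
import numpy as np

def cost_vec(w, b, x, y):
    errors = x @ w + b - y
    return np.dot(errors, errors) / (2 * x.shape[0])

def gradient_vec(w, b, x, y):
    m = x.shape[0]
    errors = x @ w + b - y
    return errors.sum() / m, x.T @ errors / m   # (dj_db, dj_dw), same order as compute_gradient

def gradient_descent_with_history(x, y, w_in, b_in, alpha, num_iters,
                                  cost=cost_vec, compute_gradient=gradient_vec):
    """Hypothetical variant of gradient_descent that also returns J after every step."""
    w = np.array(w_in, dtype=float)   # np.array copies, so the caller's array is untouched
    b = float(b_in)
    history = []
    for _ in range(num_iters):
        dj_db, dj_dw = compute_gradient(w, b, x, y)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        history.append(cost(w, b, x, y))
    return w, b, history

# Synthetic, noise-free data with known parameters (illustration only)
rng = np.random.default_rng(2)
x_demo = rng.normal(size=(100, 2))
y_demo = x_demo @ np.array([3.0, -2.0]) + 5.0
w_gd, b_gd, J_hist = gradient_descent_with_history(x_demo, y_demo, np.zeros(2), 0.0,
                                                   alpha=0.1, num_iters=500)
print(w_gd, b_gd, J_hist[0], J_hist[-1])
```

In the notebook, plotting such a history with the already imported `plt` (for example `plt.plot(J_hist)`) would show whether the cost is still falling after 1000 iterations.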


In [11]:
import pandas as pd

In [12]:
df = pd.read_csv('train.csv')

In [13]:
df.head()

|   | trip_duration | distance_traveled | num_of_passengers | fare | tip | miscellaneous_fees | total_fare | surge_applied |
|---|---|---|---|---|---|---|---|---|
| 0 | 748.0 | 2.75 | 1.0 | 75.0 | 24 | 6.3 | 105.3 | 0 |
| 1 | 1187.0 | 3.43 | 1.0 | 105.0 | 24 | 13.2 | 142.2 | 0 |
| 2 | 730.0 | 3.12 | 1.0 | 71.25 | 0 | 26.625 | 97.875 | 1 |
| 3 | 671.0 | 5.63 | 3.0 | 90.0 | 0 | 9.75 | 99.75 | 0 |
| 4 | 329.0 | 2.09 | 1.0 | 45.0 | 12 | 13.2 | 70.2 | 0 |
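In the five rows shown, `total_fare` equals `fare + tip + miscellaneous_fees` exactly. If that holds across the whole file (an assumption, since only these rows are visible here), the target is an exact linear function of three of the features, which would explain the near-perfect scores at the end of the notebook. The check below uses only the values copied from the table above.

```python
import numpy as np

# The five rows printed by df.head() above, copied by hand
fare = np.array([75.0, 105.0, 71.25, 90.0, 45.0])
tip = np.array([24.0, 24.0, 0.0, 0.0, 12.0])
misc_fees = np.array([6.3, 13.2, 26.625, 9.75, 13.2])
total_fare = np.array([105.3, 142.2, 97.875, 99.75, 70.2])

# allclose tolerates the tiny rounding errors of decimal values like 6.3
print(np.allclose(fare + tip + misc_fees, total_fare))
```

On the full DataFrame the same check would be `np.allclose(df['fare'] + df['tip'] + df['miscellaneous_fees'], df['total_fare'])`.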


In [14]:
X = df.drop('total_fare', axis=1).to_numpy()
y = df['total_fare'].to_numpy()

In [15]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X,y, random_state=234, test_size=0.3)

In [16]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
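`StandardScaler` maps each feature to $z = (x - \mu)/\sigma$, with $\mu$ and $\sigma$ estimated on the training split only (`fit_transform`) and then reused on the test split (`transform`), so no test statistics leak into training. A minimal NumPy sketch of the same idea; `standardize` is a hypothetical helper, and it uses the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

def standardize(x_train, x_test):
    """Hypothetical manual equivalent of StandardScaler: z = (x - mean) / std,
    with mean and std taken from the training split only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)              # ddof=0, the population standard deviation
    std = np.where(std == 0, 1.0, std)     # avoid dividing by zero for constant columns
    return (x_train - mean) / std, (x_test - mean) / std

# Toy data, for illustration only
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[4.0, 40.0]])
train_s, test_s = standardize(toy_train, toy_test)
print(train_s.mean(axis=0), train_s.std(axis=0), test_s)
```

Scaled test columns are not guaranteed to have mean 0 and standard deviation 1; only the training columns are.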

In [17]:
m, n = X.shape

In [18]:
w_in = np.zeros((n,))
b_in = 3.23423   # arbitrary starting value for the bias

In [19]:
w_final, b_final = gradient_descent(x_train_scaled, y_train, w_in, b_in, alpha=0.01, num_iters=1000)

In [20]:
w_final, b_final

(array([ 1.96466143e-02, -5.72482812e-04,  6.84564677e-04,  8.64246015e+01,
         2.09789179e+01,  1.27920267e+01, -2.15199371e-01]),
 127.85254068247805)
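Because every column of `x_train_scaled` has mean 0, equation (7) reduces to $\partial J/\partial b = b - \bar{y}$, so the bias follows $b_k = \bar{y} + (1-\alpha)^k (b_0 - \bar{y})$ regardless of $\mathbf{w}$. With $\alpha = 0.01$ and 1000 iterations, $(1-\alpha)^{1000} \approx 4.3 \times 10^{-5}$, so `b_final` should lie very close to the mean of `y_train` (that mean is not printed in the notebook, so this is a prediction rather than a checked result). The sketch below verifies the formula on synthetic standardized data.

```python
import numpy as np

rng = np.random.default_rng(3)
x_demo = rng.normal(size=(200, 3))
x_demo = (x_demo - x_demo.mean(axis=0)) / x_demo.std(axis=0)   # every column now has mean 0
y_demo = x_demo @ np.array([1.0, -2.0, 0.5]) + 100.0 + rng.normal(size=200)

alpha, num_iters = 0.01, 1000
w_demo = np.zeros(3)
b0 = 3.23423            # same starting bias as the notebook
b_demo = b0
for _ in range(num_iters):
    errors = x_demo @ w_demo + b_demo - y_demo
    w_demo = w_demo - alpha * (x_demo.T @ errors) / len(y_demo)
    b_demo = b_demo - alpha * errors.mean()

# Closed-form prediction for the bias after num_iters steps
predicted_b = y_demo.mean() + (1 - alpha) ** num_iters * (b0 - y_demo.mean())
print(b_demo, predicted_b, y_demo.mean())
```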

In [25]:
y_preds_model = predict(w_final, b_final, x_test_scaled)

In [26]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error


print("R2 Score is: ", r2_score(y_test, y_preds_model))
print("Mean Absolute Error Score is: ", mean_absolute_error(y_test, y_preds_model))
print("Mean Squared Error Score is: ", mean_squared_error(y_test, y_preds_model))

R2 Score is:  0.999996876245515
Mean Absolute Error Score is:  0.09662777434687678
Mean Squared Error Score is:  0.029067723008694302
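For reference, the three scores follow textbook formulas: $\text{MAE} = \frac{1}{m}\sum|y^{(i)} - \hat{y}^{(i)}|$, $\text{MSE} = \frac{1}{m}\sum(y^{(i)} - \hat{y}^{(i)})^2$ and $R^2 = 1 - \frac{\sum(y^{(i)} - \hat{y}^{(i)})^2}{\sum(y^{(i)} - \bar{y})^2}$. A minimal NumPy sketch; `regression_metrics` is a hypothetical helper, and the four-point example is toy data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper returning (R2, MAE, MSE) from their textbook formulas."""
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, mse

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])
print(regression_metrics(y_true_demo, y_pred_demo))
```

Since MSE is in squared units, its square root is easier to compare with fares: the MSE of about 0.029 above corresponds to a root-mean-squared error of about 0.17 in the units of `total_fare`.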


In [23]:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

lr.fit(x_train_scaled, y_train)

y_preds = lr.predict(x_test_scaled)

In [24]:
print("R2 Score is: ", r2_score(y_test, y_preds))
print("Mean Absolute Error Score is: ", mean_absolute_error(y_test, y_preds))
print("Mean Squared Error Score is: ", mean_squared_error(y_test, y_preds))

R2 Score is:  1.0
Mean Absolute Error Score is:  9.30521226614631e-14
Mean Squared Error Score is:  1.543381464766739e-26
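scikit-learn's `LinearRegression` fits ordinary least squares directly rather than iterating, so it is not limited by a learning rate or an iteration count; that is a likely reason its errors are at floating-point level while 1000 gradient descent steps leave a small residual error. A sketch of a closed-form fit with `np.linalg.lstsq`, appending a column of ones for the bias; `fit_least_squares` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_least_squares(x, y):
    """Hypothetical closed-form fit: append a column of ones for b and solve
    min ||A @ theta - y||^2 with np.linalg.lstsq."""
    A = np.column_stack([x, np.ones(x.shape[0])])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta[:-1], theta[-1]      # (w, b)

# Synthetic, noise-free data with made-up coefficients (illustration only)
rng = np.random.default_rng(4)
x_demo = rng.normal(size=(100, 3))
y_demo = x_demo @ np.array([2.0, -1.0, 0.5]) + 10.0
w_ls, b_ls = fit_least_squares(x_demo, y_demo)
print(w_ls, b_ls)
```

Applied as `fit_least_squares(x_train_scaled, y_train)`, this would give the exact least-squares parameters that gradient descent approaches as the number of iterations grows.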
