<H1>Linear Regression with One Independent Variable</H1>

In this exercise, we load some (x, y) data and fit a straight line to it. Because there is one independent variable and one dependent variable, the regression equation has the form $y = mx + c$, where $m$ is the slope and $c$ is the y-intercept.

As the first step, let us load some useful packages and the data points. As always, plotting the data is a good way to get started.

In [None]:
# Importing libraries
from IPython.display import clear_output
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8.0, 5.0)

In [None]:
# Load the input data: the first column is x, the second column is y
data = pd.read_csv('wk2_data.csv')
X = data.iloc[:, 0]
Y = data.iloc[:, 1]
plt.scatter(X, Y)
plt.show()

We are going to solve the problem using two different approaches. The first computes the optimal $m$ and $c$ analytically. The second uses gradient descent.

To start, we need to define the optimization objective. Let us minimize the sum of squared errors

\\[ L = \sum_{i=1}^n (y_i - p_i)^2 \\]

where $y_i$ is the true value of the $i^{th}$ sample and $p_i = m x_i + c$ is the corresponding prediction. The loss $L$ depends on $m$ and $c$ through the predictions $p_i$.
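
To make the objective concrete, here is a minimal sketch that evaluates the sum of squared errors for a candidate line. The function name `sum_squared_error` and the three demo points are made up for illustration; they are not part of the exercise data.

```python
import numpy as np

def sum_squared_error(m, c, x, y):
    """Return sum_i (y_i - (m*x_i + c))**2 for a candidate line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = m * x + c                      # predictions for every sample
    return float(np.sum((y - p) ** 2))

# Hypothetical demo points that lie exactly on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0])
y_demo = np.array([1.0, 3.0, 5.0])

print(sum_squared_error(2.0, 1.0, x_demo, y_demo))  # 0.0, the line passes through every point
print(sum_squared_error(0.0, 0.0, x_demo, y_demo))  # 35.0 = 1 + 9 + 25
```

A better line gives a smaller value, which is what both approaches in this exercise try to achieve.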

<H2>Find the optimal solution analytically</H2>

To find the solution analytically, we need to <i>minimize</i> $L$ with respect to $m$ and $c$. Can you derive the equations for $m$ and $c$? (Hint: use calculus. Set the partial derivatives of $L$ with respect to $m$ and $c$ to zero.)

In [None]:
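# Building the model
X_mean = np.mean(X)
Y_mean = np.mean(Y)

# Accumulate the numerator and denominator of the slope
num = 0
den = 0
for i in range(len(X)):
    num += # your code here
    den += # your code here
m = num / den

# The intercept depends on m, so compute it after the slope
c = # your code here

print(m, c)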

In [None]:
# Making predictions
Y_pred = m*X + c

plt.scatter(X, Y) # actual
# Compute the line's endpoints from m and c so a negative slope is drawn correctly
x_line = np.array([min(X), max(X)])
plt.plot(x_line, m*x_line + c, color='red') # predicted
plt.show()
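
One way to sanity-check the analytical result is to compare it with NumPy's built-in least-squares polynomial fit. The sketch below uses made-up synthetic data (a noisy line with slope 3 and intercept 2) rather than `wk2_data.csv`, and the names `x_demo`, `y_demo`, `m_ref` and `c_ref` are hypothetical.

```python
import numpy as np

# Synthetic data for illustration only: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = 3.0 * x_demo + 2.0 + rng.normal(scale=0.5, size=x_demo.size)

# A degree-1 fit returns the coefficients highest power first: [slope, intercept]
m_ref, c_ref = np.polyfit(x_demo, y_demo, 1)
print(m_ref, c_ref)
```

In the notebook, `np.polyfit(X, Y, 1)` should agree with your own `m` and `c` up to floating-point rounding, since both minimize the same sum of squared errors.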

<H3>Discussions:</H3>

<li>Why might we need alternatives to the analytical solution?</li>

<H2>Finding an optimal solution using gradient descent</H2>

Gradient descent updates $m$ and $c$ iteratively, starting from an initial guess. Can you derive the update equations for both $m$ and $c$?
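
As a reminder of the general shape of the update, each parameter takes a small step against its own partial derivative of the loss. Here $\alpha$ stands for the learning rate (a symbol introduced only for this note), and the two partial derivatives are left for you to work out:

\\[ m \leftarrow m - \alpha \frac{\partial L}{\partial m}, \qquad c \leftarrow c - \alpha \frac{\partial L}{\partial c} \\]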

In [None]:
# Building the model
m = 0
c = 0

learning_rate = 0.00001  # The learning rate
epochs = 1000  # The number of iterations to perform gradient descent

n = float(len(X)) # Number of elements in X
x_line = np.array([min(X), max(X)])  # Endpoints for drawing the fitted line

# Performing Gradient Descent
for i in range(epochs):
    clear_output(wait=True)  # Replace the previous plot instead of stacking plots
    Y_pred = m*X + c  # The current predicted value of Y
    D_m = # Your code here - Derivative wrt m
    D_c = # Your code here - Derivative wrt c (hint: the expression starts with -2)
    m = m - learning_rate * D_m  # Update m
    c = c - learning_rate * D_c  # Update c
    plt.scatter(X, Y)
    plt.plot(x_line, m*x_line + c, color='red') # current fitted line
    plt.show()
    #time.sleep(.5)
print(m, c)

<H3>Discussions:</H3>
    <li>Does the answer match the analytical answer?
    <li>Why or why not? If not, could you make it match?
    <li>How about the learning rate? Try increasing it by 10X and reducing it by 10X. What do you observe?
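
To build intuition for the learning-rate question before experimenting on the real data, here is a minimal sketch on a toy one-variable problem, minimizing $f(w) = (w - 3)^2$. This is not the regression loss; the function `descend` and the three learning rates are chosen purely for illustration.

```python
def descend(learning_rate, steps=50, w=0.0):
    """Minimize f(w) = (w - 3)**2 by gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # derivative of (w - 3)**2
        w = w - learning_rate * grad  # step against the gradient
    return w

# Too small, reasonable, and too large step sizes for this toy problem
for lr in (0.001, 0.1, 1.1):
    print(lr, descend(lr))
```

On this toy problem the smallest rate is still far from the minimum at $w = 3$ after 50 steps, the middle rate lands very close to it, and the largest rate overshoots further on every step and diverges. The regression loss may behave differently in detail, so treat this only as a guide for what to look for.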