# Lecture 4.1: Intro to Linear Regression

In this lecture, we are going to implement a univariate linear regression model on a toy example.

**Learning goals:**

- understand how to fit a dataset with a linear regression model
- understand how to calculate the cost (loss) function of a model
- make predictions with a linear regression model

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Section 1
## Dataset


In [None]:
data = pd.read_csv("procrastination.csv", index_col=0)
data.head()

### Let's separate our variables 

💪 **#1: Do it!**: Assign the feature (time spent procrastinating online) to `X` and the target (number of cats seen) to `y`.

In [None]:
# ASSIGN THESE VARIABLES
X = 
y = 

### Visualize our dataset 

In [None]:
def plot_dataset(X, y, theta=None):
    """Scatter-plot the data and, if theta is given, the line theta[0] + theta[1] * x."""
    def add_line(ax, theta_0, theta_1):
        # Draw the line across the full current x-axis range
        x_vals = np.array(ax.get_xlim())
        y_vals = theta_0 + x_vals * theta_1
        ax.plot(x_vals, y_vals, linewidth=2, color='g')

    sns.set()  # apply seaborn's default plot styling
    fig, ax = plt.subplots()
    ax.scatter(X, y)

    # Fix the x-range before drawing the line, so the line spans the final axis
    ax.set_xlim(left=0)

    if theta is not None:
        add_line(ax, theta[0], theta[1])

    ax.set_ylim(bottom=0)
    ax.set_xlabel('Time spent procrastinating online (h)')
    ax.set_ylabel('Number of cats seen')
    ax.set_title('Feline Analysis of Internet Procrastination')

plot_dataset(X, y)

## Modelling with a Regression line 

Our first model will have two parameters: theta0 and theta1.  
These two parameters are used to form a linear regression model with **one** feature variable.  
Together they can be placed into one `theta` vector.
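For reference, the line that `plot_dataset` draws through its `add_line` helper (`theta_0 + x_vals * theta_1`) can be written as the hypothesis below, either with scalars or as the `theta` vector multiplied by the input with an extra constant 1 in front. The numbers in the worked example are arbitrary and only illustrate the arithmetic.

$$
h_\theta(x) = \theta_0 + \theta_1 x = \begin{bmatrix} 1 & x \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}
$$

$$
\theta_0 = 1,\ \theta_1 = 0.5,\ x = 4 \quad\Rightarrow\quad h_\theta(4) = 1 + 0.5 \cdot 4 = 3
$$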

👆 **#2 - Try it out!**: Try different values for `theta0` and `theta1` and find the values that best fit the data.  
🧠 Can you figure out what each of these parameters does?

In [None]:
# PLAY AROUND WITH THESE TWO PARAMETERS
theta0 = 0
theta1 = 0

theta = np.array([theta0, theta1])
plot_dataset(X, y, theta=theta)

# Section 2
## Hypothesis 

💪‍ **#3 - Exercise**: Write the **hypothesis** function for linear regression

In [None]:
def hypothesis(X, theta):
    # YOUR CODE HERE
        
    return y_predicted

Try your hypothesis on different values of X

In [None]:
X =  # TRY A NUMBER

# APPLY THE HYPOTHESIS ON X


💪 **Exercise**: Visualize your hypothesis: assign `X` to an array of arbitrary values, apply your hypothesis to it, and plot the resulting points.

In [None]:
# YOUR CODE HERE
X = 
y_predicted = 

plot_dataset(X, y_predicted)
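
One possible sketch for the exercise above, assuming the hypothesis is the straight line $\theta_0 + \theta_1 x$: generate evenly spaced inputs with `np.linspace` and evaluate the line on all of them at once. The parameter values, the 0 to 10 range, and the names `theta_demo`, `X_grid`, `y_grid` are arbitrary choices for illustration; the resulting arrays could then be passed to `plot_dataset`.

```python
import numpy as np

# Hypothetical parameters [theta0, theta1], chosen only for illustration
theta_demo = np.array([1.0, 0.5])

# 50 evenly spaced inputs from 0 to 10 (an arbitrary range)
X_grid = np.linspace(0, 10, 50)

# Evaluate theta0 + theta1 * x at every input (vectorized, no loop)
y_grid = theta_demo[0] + theta_demo[1] * X_grid

# Every pair of neighbouring points has the same slope, theta1,
# which is why the plotted points fall on a straight line
slopes = np.diff(y_grid) / np.diff(X_grid)
print(np.allclose(slopes, theta_demo[1]))  # True

# In the notebook: plot_dataset(X_grid, y_grid)
```

Checking the slope between neighbouring points is a quick way to confirm that a hypothesis is linear before plotting it.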

# Section 3
## Using a Regression Model as a Python class 

Here is a class that implements part of univariate linear regression: `predict` works, but `fit` is not implemented yet.

In [None]:
class MyUnivariateLinearRegression:
    def __init__(self, theta):
        # Store the parameters as a (2, 1) column vector: [theta0, theta1]
        self.theta = np.asarray(theta).reshape(-1, 1)

    def predict(self, X):
        assert X.ndim == 2 and X.shape[1] == 1, "X needs to have shape (m, 1)"
        m = X.shape[0]
        # Prepend a column of ones so that theta0 acts as the intercept
        x0 = np.ones((m, 1))
        X = np.concatenate([x0, X], axis=1)  # shape (m, 2)
        y_predicted = np.dot(X, self.theta)  # shape (m, 1)
        return y_predicted

    def fit(self, X, y):
        # Left unimplemented in this lecture
        print("This method is not implemented.")
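
The key step in `predict` is prepending a column of ones to `X`, so that a single matrix product applies both parameters at once. The standalone sketch below, on made-up toy inputs, checks that this matrix form gives the same numbers as the scalar formula $\theta_0 + \theta_1 x$. It uses `np.hstack` where the class uses `np.concatenate(..., axis=1)`; for 2-D arrays the two are equivalent.

```python
import numpy as np

# Hypothetical toy inputs: 3 examples, 1 feature, already shaped (m, 1)
X = np.array([[1.0], [2.0], [3.0]])
theta = np.array([[0.5], [2.0]])  # [theta0, theta1] as a (2, 1) column vector

# Prepend a column of ones (the "x0" bias feature), as predict() does
m = X.shape[0]
X_b = np.hstack([np.ones((m, 1)), X])  # shape (3, 2)

# One matrix product computes theta0 * 1 + theta1 * x for every row
y_matrix = X_b @ theta  # shape (3, 1)

# The same values written out with the scalar formula
y_scalar = theta[0, 0] + theta[1, 0] * X  # shape (3, 1)

print(y_matrix.ravel())  # [2.5 4.5 6.5]
```

The bias-column trick keeps the prediction a single matrix product, which is also what makes it easy to extend the class to more than one feature later.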

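💪 **Do it!**: Convert `X` and `y` from the initial dataset into NumPy arrays with the shapes below, where $m$ is the number of examples and $n = 1$ is the number of features (this model is univariate).  
$$X:(m, n) \qquad y:(m, 1)$$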

In [None]:
# YOUR CODE HERE
X =
y =
print(X.shape, y.shape)
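
If the arrays come straight out of a DataFrame, a column selected with `data["col"]` is 1-D, while the class above expects 2-D input. A minimal sketch of two ways to get an (m, 1) shape, on a hypothetical DataFrame whose column names `hours` and `cats` are illustrative only (they are not necessarily the headers in `procrastination.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded dataset; column names are made up
df = pd.DataFrame({"hours": [1.0, 2.5, 4.0], "cats": [3, 7, 12]})

# Option 1: select with a list of columns, which keeps a 2-D result
X_demo = df[["hours"]].to_numpy()  # shape (3, 1)

# Option 2: start from a 1-D array and add an explicit column axis
y_demo = df["cats"].to_numpy().reshape(-1, 1)  # shape (3, 1)

print(X_demo.shape, y_demo.shape)  # (3, 1) (3, 1)
```

`reshape(-1, 1)` lets NumPy infer the number of rows, so the same line works whatever the size of the dataset.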

See if the linear regression model can make predictions when given X.

In [None]:
reg = MyUnivariateLinearRegression(theta)
reg.predict(X)

# Section 4 
## Cost function 

💪‍ **Exercise**: Write the cost function for your linear regression model

$$
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
$$
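
A small worked example, with made-up predictions and targets for $m = 3$ examples, shows how the formula is evaluated term by term:

$$
h_\theta(x^{(i)}) = (2, 4, 6), \qquad y^{(i)} = (1, 5, 6)
$$

$$
J(\theta) = \frac{1}{3}\left[(2-1)^2 + (4-5)^2 + (6-6)^2\right] = \frac{1}{3}(1 + 1 + 0) = \frac{2}{3}
$$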


In [None]:
def cost(y, y_predicted):
    
    # YOUR CODE HERE
    

Use your function to calculate the cost of your current model

In [None]:
y_predicted =
cost(y, y_predicted)
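
One detail worth checking when computing the cost: `predict` returns an array of shape (m, 1), so if `y` is still a 1-D array of shape (m,), NumPy broadcasting silently turns the difference into an (m, m) matrix instead of m residuals. The sketch below, with made-up values and hypothetical names (`y_pred_demo`, `y_true_flat`), shows the problem and one way to avoid it by reshaping first; taking `np.mean` of the squared residuals then matches the $\frac{1}{m}\sum$ form of $J(\theta)$ above.

```python
import numpy as np

# Hypothetical toy values: predictions shaped (m, 1), as predict() returns them
y_pred_demo = np.array([[2.0], [4.0], [6.0]])  # shape (3, 1)
y_true_flat = np.array([1.0, 5.0, 6.0])        # shape (3,), e.g. a raw DataFrame column

# Pitfall: (3, 1) - (3,) broadcasts to a (3, 3) matrix, not 3 residuals
wrong = y_pred_demo - y_true_flat
print(wrong.shape)  # (3, 3)

# Fix: give both arrays the same (m, 1) shape before subtracting
residuals = y_pred_demo - y_true_flat.reshape(-1, 1)  # shape (3, 1)

# Mean of the squared residuals: (1 + 1 + 0) / 3 = 2/3
mse = np.mean(residuals ** 2)
```

Because the broadcast version still returns a number when averaged, this bug does not raise an error, so printing the shape of the residuals is a cheap sanity check.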

# Resources 

## Core Resources

- [Machine Learning on Coursera - Linear Regression](https://www.coursera.org/lecture/machine-learning/model-representation-db3jS)  
Andrew Ng's always excellent course is particularly insightful for this section on Linear Regression.