###Welcome

In this project we will build a linear regression algorithm from scratch while explaining the concepts behind it.

With linear regression we try to find the line of best fit for our data. Doing so lets us predict an outcome based on the relationship between two variables. For example, we can forecast sales based on previous months' performance, predict the price of a property from variables such as the number of square feet and the number of bathrooms or bedrooms, estimate demand for a given product, forecast production capabilities, or understand the relationship between customer complaints and customer service wait times, among many other practical applications.

Linear regression can support business decisions by making them more accurate and effective.

We are trying to find the line of best fit. A line is determined by its slope and its intercept, given y = mx + b

Where y = the value on the y axis for a given x, m = slope and b = intercept

We are therefore trying to find the best m and b for our data

To find the line of best fit we must calculate the loss: the sum of the squared vertical distances between the line and each data point. Our objective is to minimize the loss, or in other words to find the line with the lowest sum of squared errors.
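Written out for N data points (x_i, y_i), the sum of squared errors described above can be sketched as follows. The name SSE and the index i are notation introduced here for illustration, not taken from the original text:

```latex
\mathrm{SSE}(m, b) = \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2
```

Each term is the squared gap between an actual value y_i and the value m x_i + b that the line predicts for the same x_i.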

Let's dive into a quick example

In [10]:
x = [3, 5, 7]
y = [3, 8, 11]

#Let's set the slope and intercept
m1 = 1
b1 = 0

#Now we'll find the y values that a line with the previous weights (m1 and b1) would predict. 
#We do this using a list comprehension and store these values in the variable y_prediction1
y_prediction1 = [m1 * xi + b1 for xi in x]

In [11]:
#Let's define a second set of weights to compare the results
m2 = 0.5
b2 = 1

#And go through the same steps for this second set
y_prediction2 = [m2 * xi + b2 for xi in x]

In [12]:
#Now we find the sum of the squared distances between the predicted y values and the actual y values

#A simple loop will do the trick: it calculates the difference between y and y_prediction1, squares it and adds the result to the variable loss1

loss1 = 0
for i in range(len(y)):
  loss1 += (y[i] - y_prediction1[i]) ** 2

#And do the same for the second set

loss2 = 0
for i in range(len(y)):
  loss2 += (y[i] - y_prediction2[i]) ** 2

In [13]:
#Let's find out which line has the lowest loss
print(loss1, loss2)

25 62.75


###Great, line 1 has the lower sum of squared errors, so of these two candidates it is the better fit
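The two loops above do the same work, so the loss calculation could be wrapped in a small helper. This is a minimal sketch; the function name `sum_squared_error` is hypothetical and not part of the original notebook:

```python
def sum_squared_error(x, y, m, b):
    """Return the sum of squared differences between y and the line m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))


x = [3, 5, 7]
y = [3, 8, 11]

# Same two candidate lines as in the example above.
loss1 = sum_squared_error(x, y, m=1, b=0)
loss2 = sum_squared_error(x, y, m=0.5, b=1)
print(loss1, loss2)  # 25 62.75
```

With a helper like this, comparing any number of candidate lines only takes one call per line.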

On to our next step: gradient descent

The gradient is the slope of the loss curve at the current parameter values. We want to move in the direction that reduces the loss the most, so gradient descent repeatedly updates each parameter to reduce the cost function until it reaches a minimum. At that point the model predicts y from x as well as a straight line can.
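The gradient functions in this section use a factor of -2/N. As a sketch of where that comes from, assume the loss is the mean squared error, that is, the sum of squared errors divided by the number of points N (the name MSE is notation introduced here). Differentiating with respect to each parameter gives:

```latex
\mathrm{MSE}(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial\, \mathrm{MSE}}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)

\frac{\partial\, \mathrm{MSE}}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr)
```

The only difference between the two is the extra factor x_i in the gradient for the slope, which appears because m is multiplied by x_i inside the squared term.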

###To get the gradient for b we use the following function. It is used to find how the loss changes as the intercept of the line changes

In [40]:
#First we take a set of x and y values, a slope and an intercept
def gradient_b(x, y, m, b):
    # Our goal is to go through all x and y values calculating (y - (m * x + b)) for each
    sum_values = 0
    N = len(x)
    for i in range(N):
        y_value = y[i]
        x_value = x[i]
        # The intercept gradient sums the differences only, with no x factor
        sum_values += y_value - ((m * x_value) + b)
    b_gradient = -2/N * sum_values
    return b_gradient
#For reference m = current slope guess, b = current intercept guess, and N = number of points in the dataset

Now let's do the same for the slope. This time each difference is multiplied by its x value

In [41]:
def gradient_m(x, y, m, b):
    sum_values = 0
    N = len(x)
    for i in range(N):
        y_value = y[i]
        x_value = x[i]
        sum_values += x_value * (y_value - ((m * x_value) + b))
    m_gradient = -2/N * sum_values
    return m_gradient
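One way to gain confidence in gradient formulas like these is to compare them with a numerical estimate. The sketch below assumes the loss is the mean squared error and uses central finite differences; the function names `mse`, `analytic_gradients` and `numeric_gradients` and the step size `eps` are hypothetical choices made for this illustration:

```python
def mse(x, y, m, b):
    """Mean squared error of the line m*x + b on the data."""
    n = len(x)
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / n


def analytic_gradients(x, y, m, b):
    """Return (d_loss/d_b, d_loss/d_m) for the mean squared error."""
    n = len(x)
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    grad_b = -2 / n * sum(residuals)
    grad_m = -2 / n * sum(xi * r for xi, r in zip(x, residuals))
    return grad_b, grad_m


def numeric_gradients(x, y, m, b, eps=1e-6):
    """Approximate the same gradients by nudging each parameter up and down."""
    grad_b = (mse(x, y, m, b + eps) - mse(x, y, m, b - eps)) / (2 * eps)
    grad_m = (mse(x, y, m + eps, b) - mse(x, y, m - eps, b)) / (2 * eps)
    return grad_b, grad_m


x = [3, 5, 7]
y = [3, 8, 11]

analytic = analytic_gradients(x, y, m=1, b=0)
numeric = numeric_gradients(x, y, m=1, b=0)
print(analytic)
print(numeric)
```

If the two printed pairs agree to several decimal places, the analytic formulas are very likely implemented correctly.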

###We will now multiply the gradients by a learning rate

In [78]:
#This function will move our values towards a smaller loss each time
def gradients(b_atm, m_atm, x, y, learning_rate):
    # gradient_b and gradient_m expect the slope before the intercept
    b_gradient = gradient_b(x, y, m_atm, b_atm)
    m_gradient = gradient_m(x, y, m_atm, b_atm)
    b = b_atm - (learning_rate * b_gradient)
    m = m_atm - (learning_rate * m_gradient)
    return b, m
#For reference: learning_rate = the step size (for example 0.01), b_atm and m_atm = the current guesses for b and m, and b_gradient and m_gradient = the gradients of the loss curve at the current guess
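To see what a single update does, here is a sketch of one step from the starting guess b = 0, m = 0 on the example data. It assumes mean squared error gradients and a learning rate of 0.01; the helper name `step` is hypothetical and combines both gradients in one function so the block can run on its own:

```python
def step(b, m, x, y, learning_rate):
    """Apply one gradient descent update and return the new (b, m)."""
    n = len(x)
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    b_gradient = -2 / n * sum(residuals)
    m_gradient = -2 / n * sum(xi * r for xi, r in zip(x, residuals))
    # Move against the gradient, scaled by the learning rate.
    return b - learning_rate * b_gradient, m - learning_rate * m_gradient


x = [3, 5, 7]
y = [3, 8, 11]

b, m = step(0, 0, x, y, learning_rate=0.01)
print(round(b, 4), round(m, 4))  # 0.1467 0.84
```

Both gradients are negative at the starting guess, so subtracting them moves b and m upward, towards values that fit the data better.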

###Now how do we know when our model has learned enough or in other words when to stop changing the parameters?

We do this by understanding convergence, which is when the loss changes very slowly or simply stops changing. We therefore want our model to converge at some point.
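One way to make convergence concrete is to stop as soon as the loss changes by less than a small tolerance between two iterations. The following is a minimal sketch under stated assumptions: the loss is the mean squared error, and the names `descend_until_converged`, `tolerance` and `max_iterations`, along with their default values, are hypothetical choices for illustration:

```python
def mse(x, y, m, b):
    """Mean squared error of the line m*x + b on the data."""
    n = len(x)
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / n


def descend_until_converged(x, y, learning_rate, tolerance=1e-9, max_iterations=100_000):
    """Run gradient descent until the loss changes by less than `tolerance`."""
    b, m = 0.0, 0.0
    n = len(x)
    previous_loss = mse(x, y, m, b)
    for iteration in range(1, max_iterations + 1):
        residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
        b_gradient = -2 / n * sum(residuals)
        m_gradient = -2 / n * sum(xi * r for xi, r in zip(x, residuals))
        b -= learning_rate * b_gradient
        m -= learning_rate * m_gradient
        current_loss = mse(x, y, m, b)
        # Converged: the loss has (almost) stopped changing.
        if abs(previous_loss - current_loss) < tolerance:
            return b, m, iteration
        previous_loss = current_loss
    # Safety net: give up after max_iterations even without convergence.
    return b, m, max_iterations


x = [3, 5, 7]
y = [3, 8, 11]

b, m, iterations_used = descend_until_converged(x, y, learning_rate=0.01)
print(round(m, 2), round(b, 2), iterations_used)
```

A smaller tolerance gives values closer to the true minimum at the cost of more iterations.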

To run our model for a given number of iterations, we move the b and m values a small step against their gradients on each iteration, and in doing so approach the best values.

Whether the model converges depends on the learning rate and the number of iterations we define, so we look for an "approximately" best learning rate that allows it to converge.

We will now create the gradient descent function which is the last step of our model

In [82]:
def gradient_descent(x, y, learning_rate, iterations):
    b = 0
    m = 0
    #The loop updates b and m once per iteration; the caller chooses the number of iterations and the learning rate
    for i in range(iterations):
        b, m = gradients(b, m, x, y, learning_rate)
    #Return only after all iterations have run
    return [b, m]
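As a sanity check on the result, gradient descent can be compared with the closed-form least-squares solution for a single input variable. Note that this swaps in a different technique (the ordinary least squares formulas) purely for comparison. The names `fit_closed_form` and `fit_gradient_descent` are hypothetical, and the learning rate and iteration count are illustrative choices for the example data:

```python
def fit_closed_form(x, y):
    """Ordinary least squares for one input variable: return (b, m)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    variance = sum((xi - x_mean) ** 2 for xi in x)
    m = covariance / variance
    b = y_mean - m * x_mean
    return b, m


def fit_gradient_descent(x, y, learning_rate, iterations):
    """Fixed-iteration gradient descent on mean squared error: return (b, m)."""
    b, m = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
        b_gradient = -2 / n * sum(residuals)
        m_gradient = -2 / n * sum(xi * r for xi, r in zip(x, residuals))
        b -= learning_rate * b_gradient
        m -= learning_rate * m_gradient
    return b, m


x = [3, 5, 7]
y = [3, 8, 11]

b_exact, m_exact = fit_closed_form(x, y)
b_gd, m_gd = fit_gradient_descent(x, y, learning_rate=0.01, iterations=20000)
print(round(m_exact, 3), round(b_exact, 3))  # 2.0 -2.667
print(round(m_gd, 3), round(b_gd, 3))
```

With enough iterations the two approaches should agree closely, which suggests the gradient descent loop is working as intended.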

Great, that concludes our linear regression algorithm.

This was a step-by-step project to build the algorithm from scratch. In the future we can import a ready-made implementation from scikit-learn:

from sklearn.linear_model import LinearRegression

###Key takeaways:
    
The objective of linear regression is to minimize loss by finding the line of best fit

We need both the slope (m) and intercept (b) to minimize the loss

The algorithm reaches convergence when loss stops changing or changes very slowly. 

The learning rate determines how much the parameters change on each iteration, and the number of iterations determines how many updates are made.

Linear regression has a wide variety of practical business applications.
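The effect of the learning rate can be seen by running the same number of iterations with different rates. This is a sketch on the example data, assuming mean squared error as the loss; the function name `final_loss` and the three rates are hypothetical choices made only for illustration:

```python
def final_loss(x, y, learning_rate, iterations):
    """Run gradient descent from b = 0, m = 0 and return the final mean squared error."""
    b, m = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        # Residuals are computed once so b and m are updated from the same guess.
        residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
        b -= learning_rate * (-2 / n * sum(residuals))
        m -= learning_rate * (-2 / n * sum(xi * r for xi, r in zip(x, residuals)))
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / n


x = [3, 5, 7]
y = [3, 8, 11]

losses = {rate: final_loss(x, y, rate, iterations=100) for rate in (0.0001, 0.01, 0.05)}
for rate, loss in losses.items():
    print(rate, loss)
```

On this data the smallest rate makes slow progress, the middle rate gets much closer to the minimum, and the largest rate overshoots on every step so the loss grows instead of shrinking.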

###Different regression algorithms suit different problems. The most common are:

Multiple linear regression

Logistic regression 

Polynomial regression 

Ridge regression 

Lasso regression

ElasticNet regression 

Stepwise regression

We will cover some of these in future projects