# Project: Linear Regression

Reggie is a mad scientist who has been hired by the local fast food joint to build their newest ball pit in the play area. To optimize the pit, he is researching the bounciness of different balls. He is running an experiment in which he bounces bouncy balls of different sizes and then fits lines to the data points he records. He has heard of linear regression but needs your help to implement a version of it in Python.

_Linear regression_ is the process of finding a line that approximates a group of points on a graph. A good linear regression algorithm minimizes the _error_, which in this project is the vertical distance from each point to the line, added up over all points. The line with the least total error is the line that fits the data best. We call this a line of _best fit_.
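Written as a formula, the total error that the code in Part 1 computes can be sketched as follows. The symbols $n$, $(x_i, y_i)$, and $E$ are introduced here for illustration; $m$ and $b$ are the slope and intercept used throughout the project, and the error of each point is taken to be its vertical distance from the line.

$$
E(m, b) = \sum_{i=1}^{n} \left| (m x_i + b) - y_i \right|
$$

The line of best fit is then the pair $(m, b)$ for which $E(m, b)$ is smallest.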

We will use loops, lists, and arithmetic to create a function that will find a line of best fit when given a set of data.


## Part 1: Calculating Error

In [2]:
def get_y(m, b, x):
    # y-value of the line y = m*x + b at the given x
    y = m * x + b
    return y

print(get_y(1, 0, 7))
print(get_y(5, 10, 3))


7
25


In [3]:
# Error of a single point: the vertical distance between the point and the line y = m*x + b
def calculate_error(m, b, point):
    x_point, y_point = point
    return abs(get_y(m, b, x_point) - y_point)

In [4]:
# The line is y = x, so (3, 3) lies on it and the error should be 0:
print(calculate_error(1, 0, (3, 3)))
# The point (3, 4) should be 1 unit (measured vertically) from the line y = x:
print(calculate_error(1, 0, (3, 4)))
# The point (3, 3) should be 1 unit from the line y = x - 1:
print(calculate_error(1, -1, (3, 3)))
# The point (3, 3) should be 5 units from the line y = -x + 1:
print(calculate_error(-1, 1, (3, 3)))

0
1
1
5


In [4]:
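# Note: the test cell further down reassigns datapoints, so this list is overwritten before Part 2 runs.
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]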

In [5]:
# Total error of a line: the sum of the individual errors of every point
def calculate_all_error(m, b, points):
    total = 0
    for point in points:
        total += calculate_error(m, b, point)
    return total

In [6]:
#every point in this dataset lies upon y=x, so the total error should be zero:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 0, datapoints))

#every point in this dataset is 1 unit away from y = x + 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 1, datapoints))

#every point in this dataset is 1 unit away from y = x - 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, -1, datapoints))


#the points in this dataset are 1, 5, 9, and 3 units away from y = -x + 1, respectively, so total error should be
# 1 + 5 + 9 + 3 = 18
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(-1, 1, datapoints))

0
4
4
18
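The loop in `calculate_all_error` can also be written with Python's built-in `sum()` and a generator expression. The sketch below is one possible equivalent, not part of the project itself; the names `total_abs_error` and `points_on_y_equals_x` are made up for illustration.

```python
def total_abs_error(m, b, points):
    """Sum of vertical distances between each (x, y) point and the line y = m*x + b."""
    return sum(abs((m * x + b) - y) for x, y in points)


# Same test data as above: every point lies on y = x.
points_on_y_equals_x = [(1, 1), (3, 3), (5, 5), (-1, -1)]

print(total_abs_error(1, 0, points_on_y_equals_x))   # 0
print(total_abs_error(1, 1, points_on_y_equals_x))   # 4
print(total_abs_error(-1, 1, points_on_y_equals_x))  # 18
```

Unpacking each point as `x, y` in the generator avoids the separate indexing step, at the cost of being a little denser to read than the explicit loop.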


## Part 2: Try a bunch of slopes and intercepts!


In [13]:
# Candidate slopes: -1.0 to 1.0 in steps of 0.1 (up to floating-point rounding)
possible_ms = [m * 0.1 for m in range(-10,11)]

In [14]:
# Candidate intercepts: -2.0 to 2.0 in steps of 0.1 (up to floating-point rounding)
possible_bs = [b * 0.1 for b in range(-20,21)]
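Multiplying by `0.1` introduces small floating-point rounding errors, so some grid values are not exactly the decimals they appear to be. As a hedged alternative, dividing the integer by 10 gives the float closest to each intended decimal. The names `ms_scaled`, `ms_divided`, and `bs_divided` below are illustrative only.

```python
# Multiplying by 0.1 leaves a rounding error in some entries.
ms_scaled = [m * 0.1 for m in range(-10, 11)]
print(ms_scaled[13])   # 0.30000000000000004

# Dividing by 10 yields the float nearest to each intended decimal.
ms_divided = [m / 10 for m in range(-10, 11)]
bs_divided = [b / 10 for b in range(-20, 21)]
print(ms_divided[13])  # 0.3

print(len(ms_divided), len(bs_divided))  # 21 41
```

Either grid works for the search; the difference only shows up when the chosen slope and intercept are printed.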

In [15]:
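# At this point datapoints still holds the last test set from Part 1,
# whose four points all lie on y = x, so the search should find m = 1.0 and b = 0.0.
smallest_error = float("inf")
best_m = 0
best_b = 0

# Try every (m, b) pair and keep the one with the smallest total error
for m in possible_ms:
    for b in possible_bs:
        error = calculate_all_error(m, b, datapoints)  # compute once per pair
        if error < smallest_error:
            best_m = m
            best_b = b
            smallest_error = error

print(best_m, best_b, datapoints)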

1.0 0.0 [(1, 1), (3, 3), (5, 5), (-1, -1)]
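To rerun the search on a different dataset without relying on the global `datapoints` variable, the nested loops can be wrapped in a function. The following is a minimal, self-contained sketch under that assumption; `total_error` and `grid_search` are hypothetical names, and the grid is built with `/ 10` rather than `* 0.1` so that the printed values are round.

```python
def total_error(m, b, points):
    """Sum of vertical distances from each point to the line y = m*x + b."""
    return sum(abs((m * x + b) - y) for x, y in points)


def grid_search(points, ms, bs):
    """Return (best_m, best_b, smallest_error) over every (m, b) pair in the grid."""
    best_m, best_b, smallest_error = 0, 0, float("inf")
    for m in ms:
        for b in bs:
            error = total_error(m, b, points)  # compute once per pair
            if error < smallest_error:
                best_m, best_b, smallest_error = m, b, error
    return best_m, best_b, smallest_error


ms = [m / 10 for m in range(-10, 11)]
bs = [b / 10 for b in range(-20, 21)]

# Points that lie exactly on y = 0.5x + 1, so the search can reach zero error.
print(grid_search([(0, 1), (2, 2), (4, 3)], ms, bs))  # (0.5, 1.0, 0.0)
```

Because the comparison is a strict `<`, ties are resolved in favor of the first pair encountered, which matches the behavior of the loop above.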


In [16]:
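# Predicted y at x = 6 for the line y = 0.3x + 1.7
# (these parameters are not the best_m and best_b printed above)
get_y(0.3, 1.7, 6)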

3.5