# Make Predictions with Linear Regression
## Mini-Lab: Linear Regression

Welcome to your next mini-lab! Go ahead and run the following cell to get started. You can do that by clicking on the cell and then clicking `Run` on the top bar, or by pressing `Shift` + `Enter`.

In [1]:
from datascience import *
import numpy as np
import otter

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

grader = otter.Notebook("m10_l1_tests")

For this mini-lab and the next one, we'll look at something a bit lighter than COVID-19 data: the various scores of students who took the SAT in 2014, and whether we can establish a correlation between them. Run the next cell to import this data.

In [2]:
sat = Table().read_table("../datasets/sat2014.csv").select("Critical Reading", "Math", "Writing", "Combined")
sat.show(5)

| Critical Reading | Math | Writing | Combined |
|---|---|---|---|
| 612 | 620 | 584 | 1816 |
| 599 | 616 | 587 | 1802 |
| 605 | 611 | 578 | 1794 |
| 604 | 609 | 579 | 1792 |
| 598 | 610 | 578 | 1786 |


Next, we'll recreate the standard set of statistical functions used for linear regression. First up are the `standard_units` and `correlation` functions. The `standard_units` function converts an array of numbers into...well, standard units! In other words, it expresses each value as the number of standard deviations it lies above or below the mean. The `correlation` function uses the `standard_units` function to find the correlation coefficient between two arrays, the `x_array` and the `y_array`: it converts both arrays to standard units and takes the mean of their products. Implement these functions below!
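
In symbols, and assuming the population standard deviation that `np.std` returns by default, the two functions compute roughly the following, where $\bar{x}$ is the mean of $x$ and $\mathrm{SD}(x)$ its standard deviation:

$$
z^{(x)}_i = \frac{x_i - \bar{x}}{\mathrm{SD}(x)}, \qquad r = \frac{1}{n}\sum_{i=1}^{n} z^{(x)}_i \, z^{(y)}_i
$$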

In [3]:
def standard_units(array):
    return (array - np.mean(array))/np.std(array)


def correlation(x_array, y_array):
    return np.mean(standard_units(x_array)*standard_units(y_array))

In [None]:
grader.check("q1")
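
Before moving on, a quick sanity check can catch mistakes that the autograder might not explain well. The sketch below is a self-contained copy of the two functions (it only needs NumPy, not `datascience`) run on a small made-up array: values in standard units should have mean about 0 and SD about 1, and an exact increasing line should have a correlation of about 1.

```python
import numpy as np

def standard_units(array):
    # Subtract the mean, then divide by the (population) standard deviation
    return (array - np.mean(array)) / np.std(array)

def correlation(x_array, y_array):
    # r is the average product of the two arrays in standard units
    return np.mean(standard_units(x_array) * standard_units(y_array))

# Made-up toy data, not from the SAT table
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = standard_units(x)

print(np.mean(z), np.std(z))        # approximately 0 and 1
print(correlation(x, 2 * x + 3))    # approximately 1 (perfect positive line)
print(correlation(x, -x))           # approximately -1 (perfect negative line)
```

Small floating-point differences (for example `0.9999999999999998` instead of `1.0`) are expected, which is why comparisons like these are best done with `np.isclose`.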

Next up are the `slope` and `intercept` functions, which calculate the slope and intercept of the regression line for predicting values in `y_array` from values in `x_array`. Again, they take the `x_array` and `y_array` as input and use the `correlation` function that you implemented above. Implement these functions in the cell below.
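
Written out, the formulas these two functions are expected to follow are (with $r$ the correlation, and the same $\bar{x}$ and $\mathrm{SD}$ notation as for standard units):

$$
\text{slope} = r \cdot \frac{\mathrm{SD}(y)}{\mathrm{SD}(x)}, \qquad \text{intercept} = \bar{y} - \text{slope} \cdot \bar{x}
$$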

In [4]:
def slope(x_array, y_array):
    r = correlation(x_array, y_array)
    return r*np.std(y_array)/np.std(x_array)


def intercept(x_array, y_array):
    return np.mean(y_array) - slope(x_array, y_array)*np.mean(x_array)

In [None]:
grader.check("q2")
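
One way to convince yourself that these formulas are right is to try them on data where the answer is known. The sketch below (self-contained, NumPy only, with made-up toy data) checks that points lying exactly on $y = 2x + 3$ give a slope of 2 and an intercept of 3, and that on noisier data the slope matches NumPy's least-squares fit from `np.polyfit`, which the regression line is equivalent to.

```python
import numpy as np

def standard_units(array):
    return (array - np.mean(array)) / np.std(array)

def correlation(x_array, y_array):
    return np.mean(standard_units(x_array) * standard_units(y_array))

def slope(x_array, y_array):
    # slope = r * SD(y) / SD(x)
    r = correlation(x_array, y_array)
    return r * np.std(y_array) / np.std(x_array)

def intercept(x_array, y_array):
    # Chosen so the line passes through (mean of x, mean of y)
    return np.mean(y_array) - slope(x_array, y_array) * np.mean(x_array)

# Made-up toy data lying exactly on y = 2x + 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 3
print(slope(x, y), intercept(x, y))   # approximately 2 and 3

# Made-up noisy data: compare with NumPy's degree-1 least-squares fit
y_noisy = np.array([5.1, 6.8, 9.2, 11.1, 12.9])
fitted_slope, fitted_intercept = np.polyfit(x, y_noisy, 1)  # highest power first
print(np.isclose(slope(x, y_noisy), fitted_slope))          # True
print(np.isclose(intercept(x, y_noisy), fitted_intercept))  # True
```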

Finally, we'll put all of this together to predict a y value for a given x! Fill in the missing code for the `regression_line` function. This function may seem a little strange: there's a function within a function! Don't worry too much about how it's structured. As long as `a` (the slope) and `b` (the intercept) are assigned correctly, the rest of the lab should flow smoothly.

*Note*: You may have noticed that we used functions inside functions before, specifically in the bootstrapping and hypothesis testing labs. These are examples of [higher-order functions](https://en.wikipedia.org/wiki/Higher-order_function)!
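
If the nested function still feels odd, here is a minimal sketch of the same pattern stripped of any statistics. `make_line` is a hypothetical helper invented for illustration: it returns a new function that "remembers" the `a` and `b` it was created with (a closure), just as `prediction_function` remembers the slope and intercept.

```python
def make_line(a, b):
    # Hypothetical helper: returns a function that remembers a and b
    def line(x):
        return a * x + b
    return line

double_plus_one = make_line(2, 1)   # a function, not a number
print(double_plus_one(10))          # 21

# Because it is an ordinary function, it can be passed to other functions too
print(list(map(double_plus_one, [0, 1, 2])))  # [1, 3, 5]
```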

In [5]:
def regression_line(x_array, y_array):
    a = slope(x_array, y_array)
    b = intercept(x_array, y_array)
    
    def prediction_function(x_value):
        return (a * x_value) + b
    
    return prediction_function

In [None]:
grader.check("q3")
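
A useful property to check on a finished `regression_line` is that the line always passes through the point of averages: predicting at the mean of x returns the mean of y. The self-contained sketch below (NumPy only, made-up toy data rather than the SAT table) verifies this, and also shows that because `a * x_value + b` is plain NumPy arithmetic, the returned function accepts a whole array of x values at once.

```python
import numpy as np

def standard_units(array):
    return (array - np.mean(array)) / np.std(array)

def correlation(x_array, y_array):
    return np.mean(standard_units(x_array) * standard_units(y_array))

def slope(x_array, y_array):
    return correlation(x_array, y_array) * np.std(y_array) / np.std(x_array)

def intercept(x_array, y_array):
    return np.mean(y_array) - slope(x_array, y_array) * np.mean(x_array)

def regression_line(x_array, y_array):
    a = slope(x_array, y_array)
    b = intercept(x_array, y_array)

    def prediction_function(x_value):
        # a and b are captured from the enclosing call
        return (a * x_value) + b

    return prediction_function

# Made-up toy data, not from the SAT table
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 6.8, 9.2, 11.1, 12.9])
predict_toy = regression_line(x, y)

# The regression line passes through (mean of x, mean of y)
print(np.isclose(predict_toy(np.mean(x)), np.mean(y)))  # True

# An array of x values gives an array of predictions
print(predict_toy(np.array([1.0, 3.0, 5.0])))
```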

Last but not least, we'll set up our regression line so that we can start predicting points. Fill in the cell below with the columns of the `sat` table that interest you; here we use `Critical Reading` as x and `Writing` as y. Then run the cell to set up our prediction function.

In [6]:
x_array = sat.column("Critical Reading")
y_array = sat.column("Writing")

predict = regression_line(x_array, y_array)

Now start predicting! Feel free to change around the columns above as well as the prediction below.

In [7]:
predict(720)

693.5580515908565

Now that we have set up a prediction function, does it have any limits? For example, what happens if we input a score outside the range of the data, or even outside the 200 to 800 range of an SAT section? Does the prediction still mean anything? And what would we do if the output itself were out of range? Linear regression is an amazing and powerful tool, but like everything else in life, it isn't perfect. Nonetheless, it's a cornerstone of data science, and rightfully so. Congratulations on finishing! Run the next cell to make sure that you passed all of the test cases.

In [None]:
grader.check_all()
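
To make the question about out-of-range inputs concrete, here is one possible guard, sketched under the assumption that we simply refuse to extrapolate. `bounded_regression_line` is a hypothetical variant of `regression_line` (not part of the lab or its tests): it uses the same slope and intercept formulas but raises an error when asked to predict at an x value outside the range seen in the data. It handles a single number at a time, and the toy data is made up.

```python
import numpy as np

def bounded_regression_line(x_array, y_array):
    # Hypothetical variant of regression_line with the same formulas
    x_su = (x_array - np.mean(x_array)) / np.std(x_array)
    y_su = (y_array - np.mean(y_array)) / np.std(y_array)
    r = np.mean(x_su * y_su)
    a = r * np.std(y_array) / np.std(x_array)
    b = np.mean(y_array) - a * np.mean(x_array)
    x_min, x_max = np.min(x_array), np.max(x_array)

    def prediction_function(x_value):
        # Refuse to extrapolate beyond the x values we actually observed
        if not (x_min <= x_value <= x_max):
            raise ValueError(f"{x_value} is outside the observed range [{x_min}, {x_max}]")
        return a * x_value + b

    return prediction_function

# Made-up toy scores, not from the SAT table
x = np.array([400.0, 500.0, 600.0, 700.0])
y = np.array([410.0, 490.0, 620.0, 680.0])
predict_bounded = bounded_regression_line(x, y)

print(predict_bounded(550))   # inside the observed range: a prediction
try:
    predict_bounded(900)      # outside the data (and not a valid section score)
except ValueError as err:
    print("Refused:", err)
```

Refusing is only one design choice. For out-of-range outputs, another option is `np.clip(prediction, 200, 800)`, but clipping hides the fact that the line is being used where it was never fit, so it is worth deciding deliberately rather than by default.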