# Assignment 8: Simple Linear Regression in a Class

In this assignment, you will create a class to facilitate Simple Linear Regression (SLR) analyses.  At the end, you will have created a tool you can easily apply to any dataset (with two columns of continuous data) to make predictions and visualize the "best fit" line produced by the SLR analysis.

 * Relevant textbook chapter: [Chapter 8: Classes (Defining New Kinds of Objects)](https://snakebear.science/08-Classes/toctree.html)
 * Also see the linear regression code written during class (link on Moodle)

### Saving Drafts and Submitting

Before you do anything else, execute the cell below. This will prompt you to log in and then save your work via an online submission system.

You can re-run the cell to submit your work as many times as you want before the deadline. We will only grade your final submission.

Any time you want to submit your work, select "Save Notebook" in the File menu (or press the Save icon, or press <kbd>Ctrl+S</kbd>) and then execute the cell again.  The result will contain a link that you can use to check that your assignment has been submitted successfully.

*[Executing this may print some errors saying "Javascript Error: IPython is not defined"; those may safely be ignored.]*

In [None]:
# This cell is just for submitting your work.
# Each time you execute it, a copy of this notebook will be uploaded to the submission system.
from client.api.notebook import Notebook
ok = Notebook('A8.ok')
import os
if not os.path.exists(os.path.expanduser("~/.config/ok/auth_refresh")):
    ok.auth(force=True)
else:
    ok.auth(inline=True)
_ = ok.submit()

### Authorship and Resources Used
* Include your name here
* If you received any assistance from anyone else, state who you consulted and specifically how they helped
* If you used any other resources, state what they were and specifically how they helped, and include links to them. [Markdown links use this formatting.](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#links)

***

## Imports

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

***

## Defining the Class

Define the class here, matching the following specification.  The specification is very detailed; make sure you follow it exactly!

As you develop the class, test each part of it *as you write it*.  Do not wait until the entire class is written before testing; debugging everything at once usually takes far longer than testing small pieces as you go.
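
As a rough sketch of what testing in small pieces can look like, suppose only `__init__()` from the specification below has been written so far. The attribute names (`data`, `predictor`, `outcome`) and the two-row dataframe `tiny` are illustrative assumptions, not names the assignment requires.

```python
import pandas as pd


class SimpleLinearRegression:
    """Partial sketch: only __init__() exists at this point."""

    def __init__(self, data, predictor, outcome):
        # Hypothetical attribute names; any descriptive names would do.
        self.data = data
        self.predictor = predictor
        self.outcome = outcome


# Check this one piece before writing any other method.
tiny = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
model = SimpleLinearRegression(tiny, "x", "y")
print(model.predictor, model.outcome)
print(model.data[model.predictor].tolist())
```

If the printed labels and values match what was passed in, the constructor is doing its job and the next method can be added and checked the same way.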

### Class Specification

- Class name: `SimpleLinearRegression`
- ``__init__()`` method
  - Define ``__init__()`` to take three arguments:
    1. A Pandas DataFrame
    2. The column label of the predictor values (the independent variable)
    3. The column label of the outcome values (the dependent variable)
  - Store the three arguments in object attributes with descriptive names
- ``calc_fit()`` method
  - No arguments
  - No return value
  - Using the simple linear regression formulas presented in class, calculate the best-fit ``slope`` and ``intercept`` values for the columns that were specified when instantiating this object.
  - Store ``slope`` and ``intercept`` as attributes of the object.
- ``predict()`` method
  - One argument: a numeric value
  - Returns: the predicted value, calculated using the ``slope`` and ``intercept`` attributes
- ``plot()`` method
  - No arguments
  - No return value
  - Plots a scatterplot comparing the two columns that were specified when instantiating this object
  - Plots (over the scatterplot) a red line displaying the best-fit prediction (based on ``slope`` and ``intercept``)
  - Labels the x-axis and y-axis with the relevant column labels
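
The formulas presented in class are not reproduced in this notebook. Assuming they are the standard least-squares formulas, they can be written as follows, where $x_i$ are the predictor values, $y_i$ are the outcome values, and $\bar{x}$ and $\bar{y}$ are the column means. If the version from class looks different, defer to that one; it should be algebraically equivalent.

$$
\text{slope} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\text{intercept} = \bar{y} - \text{slope} \cdot \bar{x}
$$

A prediction for a new predictor value $x$ is then $\hat{y} = \text{slope} \cdot x + \text{intercept}$.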

In [None]:
# Define the SimpleLinearRegression class here

***

## Testing on simple data

It's always good to test something on simple cases first.  It's easier to spot what might be going wrong (if anything) when the inputs are simple.

The following cell creates a dataframe with two columns and three rows that will be used for the first test (do not change it).

In [None]:
df = pd.DataFrame([[1,4],[2,5],[3,5]], columns=["A","B"])
df

Using the class defined above, create a `SimpleLinearRegression` object to predict "B" values from "A" values using the data in this dataframe.

Using that object:
1. Calculate the best fit slope and intercept for the given data.
2. Print out the slope and the intercept.
3. Print a prediction for an "A" value of 100.
4. Plot the best fit line over a scatterplot of the data.

In [None]:
# Test SimpleLinearRegression on the dataframe defined above.

The output should look something like this, with the calculated slope and intercept values printed, the prediction for A=100 printed, and the plot shown:

<img src="test1_result.png" alt="Results of first test" width="400"/>
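
One way to sanity-check the printed slope and intercept independently of your class is to compare them against NumPy's least-squares polynomial fit. This is only a sketch, and it assumes NumPy is installed (it is not imported elsewhere in this notebook). The names `check_slope` and `check_intercept` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same three-row dataframe as the first test.
df = pd.DataFrame([[1, 4], [2, 5], [3, 5]], columns=["A", "B"])

# A degree-1 polynomial fit is a least-squares straight line.
# np.polyfit returns coefficients from highest degree to lowest,
# so the slope comes first and the intercept second.
check_slope, check_intercept = np.polyfit(df["A"], df["B"], 1)

print("slope:", check_slope)
print("intercept:", check_intercept)
print("prediction for A=100:", check_slope * 100 + check_intercept)
```

For this data the fit works out to a slope of 0.5 and an intercept of roughly 3.667, so the values from your class should agree with these up to floating-point rounding.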

***

## Testing on real data

If that's working, let's try it out on some real data!

Load the `fandango_score_comparison.csv` file into a dataframe, then calculate and plot regression lines for the following pairs of columns:
- "RottenTomatoes" and "RottenTomatoes_User"
- "Metacritic" and "Metacritic_User"
- "Fandango_Stars" and "Fandango_Ratingvalue"
- "IMDB" and "Fandango_Ratingvalue"

Performing and plotting the regression for each pair should take only three new lines of code.  That's the beauty of defining classes: they can be defined once and reused many times (just like functions, but more powerful).

In [None]:
# Test SimpleLinearRegression on the movie ratings data.

***

## Testing on more real data

Now load another dataset into a new dataframe, then calculate and plot regression lines for at least three different pairs of columns in that data.  Make sure the plots are included in the notebook when you submit it.

You've made a tool you can apply to any two numeric columns you can find!
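
When picking column pairs in an unfamiliar dataset, it can help to list which columns are numeric first. The sketch below uses a small made-up dataframe (`example`, with hypothetical columns `title`, `score`, and `votes`) as a stand-in for whatever data you load.

```python
import pandas as pd

# Hypothetical stand-in for your own dataset; replace with your dataframe.
example = pd.DataFrame({
    "title": ["a", "b", "c"],
    "score": [7.1, 8.3, 6.4],
    "votes": [120, 340, 95],
})

# Keep only the columns whose dtype is numeric, then list their labels.
numeric_columns = example.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```

Any two labels from that list are candidates for the predictor and outcome arguments of your class.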