A few things you should keep in mind when working on assignments:

1. Make sure you fill in every place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then restart the kernel and run all cells (_Restart & Run all_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

# Problem 1. Linear Regression Using Seaborn

In this problem, we will use Seaborn to fit a linear regression model that predicts `AirTime` from `Distance`.

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# tools for testing
from nose.tools import assert_equal, assert_is_instance, assert_is_not
from numpy.testing import assert_array_equal

Suppose we are given the air times (in minutes) and distances traveled (in miles) of 20 different flights, and we want to visualize the relationship between air time and distance by fitting a linear regression model. In the code cells below, `data` is given as a `pandas.DataFrame`.

```python
>>> print(data)
```
```
    AirTime  Distance
0        60       361
1        84       569
2        95       588
3       182      1172
4       337      2565
5       119       861
6        87       665
7       103       787
8        55       228
9        47       197
10      127       978
11      215      1745
12      213      1605
13       59       373
14       31       156
15       57       209
16       88       505
17       42       224
18       45       282
19      102       862
```
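For intuition about what "fitting a linear regression model" computes, here is a minimal sketch in plain Python using the closed-form least-squares formulas for a single predictor. It is not part of the assignment and is not how Seaborn is implemented; the helper name `fit_line` and the list names `distance` and `air_time` are made up for this illustration.

```python
# Illustrative only: ordinary least squares with one predictor, in plain Python.
# The values are copied from the table above.
distance = [361, 569, 588, 1172, 2565, 861, 665, 787, 228, 197,
            978, 1745, 1605, 373, 156, 209, 505, 224, 282, 862]
air_time = [60, 84, 95, 182, 337, 119, 87, 103, 55, 47,
            127, 215, 213, 59, 31, 57, 88, 42, 45, 102]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(distance, air_time)
print(f"AirTime = {slope:.3f} * Distance + {intercept:.1f}")
```

The slope is the estimated number of extra minutes of air time per additional mile of distance.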

In [None]:
data = pd.DataFrame(
    {"AirTime": [60, 84, 95, 182, 337, 119, 87, 103, 55, 47,
        127, 215, 213, 59, 31, 57, 88, 42, 45, 102],
     "Distance": [361, 569, 588, 1172, 2565, 861, 665, 787, 228, 197,
        978, 1745, 1605, 373, 156, 209, 505, 224, 282, 862]}
)

In [None]:
print(data)

## Use Seaborn to plot a linear regression model

- Use [seaborn.regplot](http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.regplot.html) to write a function named `plot_seaborn_linear_regression()` that creates a scatter plot with `Distance` on the $x$-axis and `AirTime` on the $y$-axis. The function should also fit a linear regression model in the same plot.

Here is an example plot. (You don't have to make your plot look exactly like this example. If your plot looks visually OK, and if the test code cell doesn't produce any errors, your solution is correct.)

![](seaborn_linear_regression.png)

Hints:

- The function should return a [matplotlib.Axes](http://matplotlib.org/users/artists.html) instance. Note that `seaborn.regplot()` returns a matplotlib Axes instance, so you can assign its return value to a variable named `ax` and return `ax`.

- Your plot should also have a title and labels for the $x$ and $y$ axes. To do this, use one or more of the following: [ax.set()](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set), [ax.set_title()](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_title), [ax.set_xlabel()](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_xlabel), or [ax.set_ylabel()](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_ylabel).

- If you are not sure how to do this, there is an example of using `seaborn.regplot()` in the [Introduction to Regression](https://github.com/UI-DataScience/accy571-fa16/blob/master/Week5/notebooks/intro2regress.ipynb) notebook.
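As a rough sketch of the title and label hint, the snippet below sets these properties on a plain matplotlib scatter plot. It deliberately uses hypothetical toy data (`hours`, `score`) rather than the flight data, and it does not call `seaborn.regplot()`. The `matplotlib.use("Agg")` line is only there so the sketch runs outside a notebook; inside this notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data, unrelated to the flight data above.
hours = [1, 2, 3, 4]
score = [52, 61, 70, 83]

fig, ax = plt.subplots()
ax.scatter(hours, score)

# ax.set() accepts several properties at once ...
ax.set(title="Exam score vs. study time", xlabel="Study time (hours)")
# ... while the dedicated setters change one property each.
ax.set_ylabel("Exam score (points)")

# Read the text back to confirm it was applied.
title, xlabel, ylabel = ax.get_title(), ax.get_xlabel(), ax.get_ylabel()
print(title, "|", xlabel, "|", ylabel)
plt.close(fig)
```

The same calls work on the Axes object returned by `seaborn.regplot()`, since it is an ordinary matplotlib Axes.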

In [None]:
def plot_seaborn_linear_regression(df):
    """
    Uses Seaborn to create a scatter plot of "AirTime" vs "Distance" columns in "df".
    Also fits a linear regression model in the same plot.
    
    Parameters
    ----------
    df: A pandas.DataFrame. Should have columns named "AirTime" and "Distance".
    
    Returns
    -------
    A matplotlib Axes object
    """
    
    # YOUR CODE HERE

    return ax

In [None]:
ax1 = plot_seaborn_linear_regression(data)

In [None]:
assert_is_instance(
    ax1, mpl.axes.Axes,
    msg="Your function should return a matplotlib.axes.Axes object."
)
assert_equal(len(ax1.lines), 1)
assert_equal(
    len(ax1.collections), 2,
    msg="Your plot doesn't have a regression line."
)
# Compare by value (!=); an identity check ("is not") is unreliable for ints and strings.
assert len(ax1.title.get_text()) != 0, \
    "Your plot doesn't have a title."
assert ax1.xaxis.get_label_text() != "Distance", \
    "Change the x-axis label to something more descriptive."
assert ax1.yaxis.get_label_text() != "AirTime", \
    "Change the y-axis label to something more descriptive."
    
x, y = ax1.collections[0].get_offsets().T
assert_array_equal(x, data["Distance"].values)
assert_array_equal(y, data["AirTime"].values)

# If your function can only plot the flight data above and
# cannot handle other data sets, the following test will fail.
df1 = pd.DataFrame({
    "AirTime": np.random.randint(100, size=100),
    "Distance": np.random.randint(100, size=100)
    })
ax1 = plot_seaborn_linear_regression(df1)

x1data, y1data = ax1.collections[0].get_offsets().T
assert_array_equal(x1data, df1["Distance"].values)
assert_array_equal(y1data, df1["AirTime"].values)

plt.close()
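The test cell above inspects the returned Axes object rather than the rendered image. As a rough sketch of what it reads, assuming plain matplotlib calls in place of `seaborn.regplot()` and hypothetical toy values: scatter points are stored in `ax.collections`, line plots in `ax.lines`, and `get_offsets()` gives back the plotted $(x, y)$ pairs.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [2, 4, 6])  # stored as one entry in ax.collections
ax.plot([1, 3], [2, 6])           # stored as one entry in ax.lines

n_collections = len(ax.collections)
n_lines = len(ax.lines)

# get_offsets() returns an (n, 2) array of points; transposing splits it into x and y.
x, y = ax.collections[0].get_offsets().T

print(n_collections, n_lines)
print(x.tolist(), y.tolist())
plt.close(fig)
```

With `seaborn.regplot()` the confidence band around the fitted line is also drawn as a collection, which is presumably why the test expects two collections and one line.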