# Lecture 1 Exercises
The following exercises are intended to complement the lecture.  They can be used to clarify important steps from the lecture.  These exercises are not graded.  However, you are strongly encouraged to do them.

In the first exercise below, we give you a lot of the necessary code to help familiarize you with `python` and its primary libraries.  You still have to fill in some of the code on your own.

## Exercise 1
The lecture contained an example of the $k$-nearest neighbors algorithm for a prediction problem.  The initial example considered $k=1$ and was applied to the advertising dataset in an attempt to predict sales from TV budgets.  You should reproduce the results from the lecture.  In particular, generate the following figure:

![](fig/Reg_4.png)

### Part 1:  Load Basic `Python` Modules

In [None]:
# Access to pandas dataframes
import pandas as pd # (see https://pandas.pydata.org/ for documentation)

# Access to matplotlib for plotting capabilities
# The inline command below allows you to display plots 
# in the notebook.  Without this command, the plots wouldn't 
# show up
%matplotlib inline
import matplotlib.pyplot as plt # (see https://matplotlib.org/ for documentation)

# Access to numpy for basic numerical algorithms
import numpy as np # (see http://www.numpy.org/ for documentation)

### Part 2:  Load data

In [None]:
df_adv = pd.read_csv('data/Advertising.csv') # Read data

Now get a subset of the data.  The code below is incomplete and will not work until you replace `your_stride` with a slice, i.e. the range of indices to access.  For example, to access indices $2$ through $4$ of an array called `x` (the end index of a slice is excluded), you would write 
```python
x[2:5]
```
Note that if you just wanted to access index $2$, you would write 
```python
x[2]
```
In place of `your_stride` below, you should introduce your own slice.  Keep it small (e.g. 5-10 data points).
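As a quick check of the slicing rules above, here is a minimal sketch on a made-up array (the values are illustrative and not taken from the advertising data).  It shows that a slice `start:stop` includes `start` but excludes `stop`.

```python
import numpy as np

# Made-up array: [0, 10, 20, ..., 90]
x = np.arange(10) * 10

window = x[2:5]  # elements at indices 2, 3 and 4 (index 5 is excluded)
single = x[2]    # the single element at index 2

print(window)
print(single)
```

A slice of 5-10 data points therefore looks like `a:b` with `b - a` between 5 and 10.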

In [None]:
data_x = df_adv.TV[your_stride] # Get the TV budget
data_y = df_adv.sales[your_stride] # Get the sales

### Part 3:  Sort the Data
We need to sort the data in order to apply the KNN algorithm.

In [None]:
# Get the positions that would sort the TV budgets in increasing order
idx = np.argsort(data_x).values

# Reorder the x and y Series using those positions
data_x  = data_x.iloc[idx]
data_y  = data_y.iloc[idx]
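To see what the sorting step does, the following sketch applies the same idea to two small made-up Series (hypothetical values, not the advertising data).  `np.argsort` returns the positions that would sort the first Series, and `.iloc` applies that same ordering to both, so each $x$ stays paired with its $y$.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for data_x and data_y; the index labels mimic a slice 5:8
toy_x = pd.Series([30.0, 10.0, 20.0], index=[5, 6, 7])
toy_y = pd.Series([3.0, 1.0, 2.0], index=[5, 6, 7])

# Positions (not index labels) that sort toy_x in increasing order
order = np.argsort(toy_x.values)

# Apply the same positional ordering to both Series
toy_x_sorted = toy_x.iloc[order]
toy_y_sorted = toy_y.iloc[order]

print(toy_x_sorted.tolist())
print(toy_y_sorted.tolist())
```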

### Part 4:  Write a function to find the nearest neighbor to a point
#### Input:  The array of values and the point you want to check
#### Output:  The index of the nearest neighbor *and* the value of the array at that index
This function must be your own function.  **Do not use any external libraries to do this.**

In [None]:
def find_nearest(array, xi):
    # Your code here
    pass
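Once you have written your function, you may want to check it on a few cases that can be verified by hand.  The helper below is a hypothetical sketch, not part of the lecture code.  It assumes that `find_nearest(array, xi)` returns a tuple `(index, value)` in that order; adjust it if your function returns its outputs differently.

```python
def check_find_nearest(find_nearest):
    """Run a few hand-checked cases against a candidate find_nearest(array, xi).

    Assumption: the candidate returns a tuple (index, value).
    """
    array = [1.0, 4.0, 9.0]  # small sorted toy array, not the advertising data
    cases = [
        (0.0, (0, 1.0)),  # left of every point, so the first element is nearest
        (3.0, (1, 4.0)),  # |3 - 4| = 1 is smaller than |3 - 1| = 2
        (8.0, (2, 9.0)),  # |8 - 9| = 1 is smaller than |8 - 4| = 4
        (4.0, (1, 4.0)),  # exact match
    ]
    for xi, expected in cases:
        idx, value = find_nearest(array, xi)
        assert (idx, value) == expected, (
            f"xi={xi}: got {(idx, value)}, expected {expected}"
        )
    return True
```

Calling `check_find_nearest(find_nearest)` returns `True` if all cases pass and raises an `AssertionError` naming the failing case otherwise.  The cases deliberately avoid ties, since the exercise does not say how ties should be broken.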

### Part 5:  Predict!
Use your `find_nearest()` function to predict missing values.  First, generate an array of $x$ values.  These values correspond to the TV budget, but they might not be in your data array.  You want to predict the sales at each value of $x$ using the nearest neighbor.  

In [None]:
# Your code here
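One possible way to generate the array of $x$ values is `np.linspace`, which returns evenly spaced points between two limits.  The limits and the number of points below are hypothetical placeholders; in the exercise you would pick limits that cover your own `data_x`.

```python
import numpy as np

# Hypothetical limits; replace them with values that cover your own data_x
x_min, x_max = 0.0, 300.0

# 7 evenly spaced query points, with both endpoints included
x = np.linspace(x_min, x_max, 7)

print(x)
```

The prediction step itself, i.e. calling `find_nearest()` for each entry of `x` and collecting the corresponding sales in `y`, is left to you.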

### Part 6:  Plot
Now you will plot your results using `matplotlib`.

In [None]:
plt.plot(x,y, '-.') # Basic plotting with a dash-dot line
plt.plot(data_x, data_y, 'kx') # Plot the data points you selected (black x markers)
plt.title('') # Leave title blank
plt.xlabel('TV budget in $1000') # Label the x-axis
plt.ylabel('Sales in $1000') # Label the y-axis

plt.savefig('nearest-neighbor.png',dpi=300) # save the figure

## Exercise 2

The lecture covered basic linear regression and tried to minimize the loss function given by $$L\left(\beta_{0}, \beta_{1}\right) = \frac{1}{n}\sum_{i=1}^{n}{\left(y_{i} - \left(\beta_{1}x_{i} + \beta_{0}\right)\right)^{2}}.$$

### Part 1:  Derive the result from the lecture that
$$\widehat{\beta}_{1} = \dfrac{\displaystyle\sum_{i=1}^{n}{\left(x_{i} - \overline{x}\right)\left(y_{i} - \overline{y}\right)}}{\displaystyle\sum_{i=1}^{n}{\left(x_{i} - \overline{x}\right)^{2}}}$$ and $$\widehat{\beta}_{0} = \overline{y} - \widehat{\beta}_{1}\overline{x}$$
minimize the loss function, where $$\overline{x} = \frac{1}{n}\sum_{i=1}^{n}{x_{i}} \quad \textrm{and} \quad \overline{y} = \frac{1}{n}\sum_{i=1}^{n}{y_{i}}.$$
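
As a possible starting point for the derivation (a hint, not the full argument), a minimizer of the loss must make both partial derivatives vanish:
$$\frac{\partial L}{\partial \beta_{0}} = -\frac{2}{n}\sum_{i=1}^{n}{\left(y_{i} - \beta_{1}x_{i} - \beta_{0}\right)} = 0 \quad \textrm{and} \quad \frac{\partial L}{\partial \beta_{1}} = -\frac{2}{n}\sum_{i=1}^{n}{x_{i}\left(y_{i} - \beta_{1}x_{i} - \beta_{0}\right)} = 0.$$
Solving the first equation for $\beta_{0}$ and substituting the result into the second is one route to the expressions above.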

### Part 2:  Plots
Fix $\beta_{0}$ and plot $L\left(\beta_{1}\right)$ as a function of $\beta_{1}$ using the TV budget and sales data from Exercise 1.

For reference, using $\beta_{0} = 7.104$, I got the following plot for $L$:

![](fig/Exercise1-2.png)
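
To make the idea of evaluating $L$ for a fixed $\beta_{0}$ concrete, here is a minimal sketch on made-up data that lie exactly on the line $y = 2x + 1$.  The function name `mse_loss`, the toy data, and the grid of $\beta_{1}$ values are all illustrative assumptions; for the exercise you would substitute the advertising data and your own choice of $\beta_{0}$.

```python
import numpy as np

def mse_loss(beta0, beta1, x, y):
    """Mean squared error of the line beta1 * x + beta0 on the data (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((y - (beta1 * x + beta0)) ** 2)

# Toy data lying exactly on y = 2x + 1 (made up, not the advertising data)
x_toy = np.array([0.0, 1.0, 2.0])
y_toy = np.array([1.0, 3.0, 5.0])

# Hold beta0 fixed and evaluate the loss on a grid of beta1 values
beta0_fixed = 1.0
beta1_grid = np.linspace(0.0, 4.0, 41)
loss_values = np.array(
    [mse_loss(beta0_fixed, b1, x_toy, y_toy) for b1 in beta1_grid]
)
```

Passing `beta1_grid` and `loss_values` to `plt.plot` then draws the loss curve.  For this toy data the curve is a parabola in $\beta_{1}$ whose minimum sits at the true slope.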