# Linear Regression
## Template for solution, Christian Igel, 2020
### Load and transform data 

#### Loading data using NumPy
We load the data using [NumPy](https://numpy.org/). First, we import the package:

In [1]:
import numpy as np

Now we load the data from the data file. The columns are separated by tab characters (a format called TSV, tab-separated values):

In [2]:
data = np.genfromtxt('PCB.dt', delimiter='\t')
print(data)

[[ 1.   0.6]
 [ 6.   3.4]
 [ 1.   1.6]
 [ 6.   9.7]
 [ 1.   0.5]
 [ 6.   8.6]
 [ 1.   1.2]
 [ 7.   4. ]
 [ 2.   2. ]
 [ 7.   5.5]
 [ 2.   1.3]
 [ 7.  10.5]
 [ 2.   2.5]
 [ 8.  17.5]
 [ 3.   2.2]
 [ 8.  13.4]
 [ 3.   2.4]
 [ 8.   4.5]
 [ 3.   1.2]
 [ 9.  30.4]
 [ 4.   3.5]
 [11.  12.4]
 [ 4.   4.1]
 [12.  13.4]
 [ 4.   5.1]
 [12.  26.2]
 [ 5.   5.7]
 [12.   7.4]]
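
To see how the tab delimiter is handled without needing the data file, here is a minimal sketch that parses a small in-memory string instead. The name `tsv_text` and the three rows (copied from the output above) are only illustrative:

```python
import io

import numpy as np

# Illustrative stand-in for the file: three tab-separated rows from the output above.
tsv_text = "1\t0.6\n6\t3.4\n1\t1.6\n"

# np.genfromtxt accepts file-like objects as well as file names.
sample = np.genfromtxt(io.StringIO(tsv_text), delimiter='\t')
print(sample.shape)  # one row per line, one column per tab-separated field
print(sample)
```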


Remember, if you would like to know more about a function, for example `np.genfromtxt`, you can run `?np.genfromtxt` in your notebook.

Now we split that data into inputs and labels. We would like the inputs to be represented as a matrix (and not just a vector), therefore we reshape them:

In [None]:
x = data[...,0].reshape(-1, 1)  # Take first column and reshape it into a 2D array with one column
y = data[...,1]  # Take second column
print("x:", x)
print("y:", y)
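
The effect of the reshape is easiest to see by comparing array shapes. The following sketch uses a small hypothetical array (`sample`, holding the first three rows of the data) rather than the full data set:

```python
import numpy as np

# Hypothetical sample: the first three rows of the data printed earlier.
sample = np.array([[1.0, 0.6], [6.0, 3.4], [1.0, 1.6]])

x_vec = sample[..., 0]        # first column as a 1D array, shape (3,)
x_mat = x_vec.reshape(-1, 1)  # 2D array with one column, shape (3, 1);
                              # -1 lets NumPy infer the number of rows
print(x_vec.shape, x_mat.shape)
```

Each row of `x_mat` is then one input point, and each column is one input dimension.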

Note that you are supposed to transform the labels in the assignment.
#### Loading data using pandas
Alternatively, we can load the data using [pandas](https://pandas.pydata.org/). First, we import the package:

In [None]:
import pandas as pd

Now we load the data from the data file into a data frame and give the columns names: 

In [None]:
df = pd.read_csv("PCB.dt", sep='\t', header=None, names=['X', 'Y'])
df

The argument `header=None` indicates that the data file itself does not contain a header line with the column names. With the data in a data frame, we can, for example, sort the rows by the first column:

In [None]:
df = df.sort_values(by='X')

You get NumPy arrays from the data frame like this:

In [None]:
x = df['X'].to_numpy().reshape(-1, 1)
y = df['Y'].to_numpy()
print("x:", x)
print("y:", y)
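
Sorting a data frame moves whole rows, so inputs and labels stay paired. As a minimal sketch on a hypothetical four-row frame (`df_demo`, with values taken from the data above), the optional `reset_index(drop=True)` renumbers the rows after sorting:

```python
import pandas as pd

# Hypothetical four-row frame with values taken from the data set.
df_demo = pd.DataFrame({'X': [6.0, 1.0, 7.0, 2.0],
                        'Y': [3.4, 0.6, 4.0, 2.0]})

# Sort by the input column; each Y value moves together with its X value.
df_sorted = df_demo.sort_values(by='X').reset_index(drop=True)

x_demo = df_sorted['X'].to_numpy().reshape(-1, 1)  # inputs as a column matrix
y_demo = df_sorted['Y'].to_numpy()                 # labels as a 1D array
print(df_sorted)
```

Sorting by the input is mostly a convenience here, for instance when you later want to draw a fitted curve as a line over increasing `x`.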

Note that you are supposed to transform the labels in the assignment.

### Fit regression model
This you have to figure out yourself.

### Plotting results
We plot using [Matplotlib](https://matplotlib.org/):

In [None]:
import matplotlib.pyplot as plt

Here is an example of how you can plot the measurements (note that in the assignment you should plot the logarithm of the PCB concentration):

In [None]:
fig, ax = plt.subplots()
ax.plot(x, y, 'o', label='measurements')
ax.set_xlabel('Age (yrs.)')
ax.set_ylabel('PCB Conc. (ppm)')
ax.legend();

Figures should be shown in your report; it is not sufficient to have them in a notebook. Thus, we save the figure:

In [None]:
fig.savefig('Assignment1_Question6_Plot1.pdf')
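
If the saved figure has clipped axis labels or too much surrounding white space in the report, passing `bbox_inches='tight'` to `savefig` usually helps. The sketch below is self-contained and only illustrative: the names `fig_demo`, `ax_demo`, and `out_path`, the figure size, and the temporary output location are assumptions, and the three plotted points are taken from the data above.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt

# A small figure; the size is given in inches and is an arbitrary choice.
fig_demo, ax_demo = plt.subplots(figsize=(4, 3))
ax_demo.plot([1, 2, 3], [0.6, 2.0, 2.2], 'o', label='measurements')
ax_demo.set_xlabel('Age (yrs.)')
ax_demo.set_ylabel('PCB Conc. (ppm)')
ax_demo.legend()

# Write to a temporary directory so that no file in the working directory is touched.
out_path = os.path.join(tempfile.mkdtemp(), 'demo_plot.pdf')
fig_demo.savefig(out_path, bbox_inches='tight')  # trim surplus margins around the axes
plt.close(fig_demo)
```

The output format is chosen from the file extension, so a `.pdf` name gives a vector graphic that scales cleanly in a report.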