# Introduction

We will use the scikit-learn library (Python) in our lab sessions throughout the term - https://scikit-learn.org/stable/. It is one of the most widely used machine learning libraries. The goal of today's lab is to familiarize yourself with scikit-learn, as well as numpy (a very common library for numerical computing in Python - https://numpy.org/) and matplotlib (for creating plots - https://matplotlib.org/).

### Linear regression

Let's start by exploring the linear regression modelling that we looked at in the Week 1 lecture.

#### Section 1

In [None]:
import numpy

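# Fix the seed so that the same "random" numbers are generated on every run
random_seed = 1
rng = numpy.random.default_rng(random_seed)
# n x d inputs and n x 1 labels, each drawn uniformly from [0, 1)
training_data_x = rng.random((4, 1))
training_labels_y = rng.random((4, 1))

print('Randomly generated n x d input data where n=4 and d=1, i.e. 4 data instances each with 1 feature: \n')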
print(training_data_x)

#### Section 2

In [None]:
%matplotlib inline

from sklearn import linear_model
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Train a linear regression model
lr_model = linear_model.LinearRegression()
lr_model.fit(training_data_x, training_labels_y)
print("The weights:", lr_model.coef_)
print("The bias:", lr_model.intercept_)

# Check the performance of the model on the data used to train it
training_pred_y = lr_model.predict(training_data_x)
print("\n Mean squared error (training error): %.2f " % mean_squared_error(training_labels_y, training_pred_y))


# Plot data and model
plt.scatter(training_data_x, training_labels_y, color="blue")
plt.plot(training_data_x, training_pred_y, color="black", linewidth=3)

plt.xlabel('x', size=20)
plt.ylabel('y', size=20)
plt.title('Linear regression model visualization \n', size=20)

plt.show()
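
As a cross-check on the weight, bias and training error printed in Section 2, the same one-feature least-squares fit can be computed directly with numpy. The following is a minimal sketch rather than part of the original lab: it regenerates the Section 1 data with the same seed and solves the least-squares problem with *numpy.linalg.lstsq*. The names *check_rng*, *x*, *y*, *design_matrix*, *weight*, *bias* and *training_mse* are illustrative and chosen so they do not overwrite the lab's own variables.

```python
import numpy

# Regenerate the Section 1 data (same seed, same order of draws)
check_rng = numpy.random.default_rng(1)
x = check_rng.random((4, 1))
y = check_rng.random((4, 1))

# Append a column of ones so that the bias is learned as an extra weight
design_matrix = numpy.hstack([x, numpy.ones((4, 1))])

# Least-squares solution of design_matrix @ [weight, bias] = y
solution, *_ = numpy.linalg.lstsq(design_matrix, y, rcond=None)
weight, bias = solution.ravel()

# Mean squared error on the training data, computed by hand
pred_y = weight * x + bias
training_mse = numpy.mean((y - pred_y) ** 2)

print("Weight:", weight)
print("Bias:", bias)
print("Training MSE: %.2f" % training_mse)
```

Since *LinearRegression* fits ordinary least squares, these values should agree with *lr_model.coef_*, *lr_model.intercept_* and the training error from Section 2 up to floating-point error.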

#### Section 3

In [None]:
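# Check the performance of the model on test data not seen by the model in training.
# Use a different seed from Section 1: re-using seed 1 would replay the same random
# stream, so the test input would equal the first training input.
random_seed = 2
rng = numpy.random.default_rng(random_seed)
test_data_x = rng.random((1, 1))
test_labels_y = rng.random((1, 1))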
test_pred_y = lr_model.predict(test_data_x)
print("\n Mean squared error (test error): %.2f " % mean_squared_error(test_labels_y, test_pred_y))


# Plot data and model
plt.scatter(training_data_x, training_labels_y, color="blue")
plt.scatter(test_data_x, test_labels_y, color="red")
plt.plot(training_data_x, training_pred_y, color="black", linewidth=3)

plt.xlabel('x', size=20)
plt.ylabel('y', size=20)
plt.title('Linear regression model visualization \n', size=20)

plt.show()

### Try these out yourself

#### A. REPRODUCIBILITY
Try running Section 1 without the random seed, i.e. comment out the line *random_seed = 1* and replace *rng = numpy.random.default_rng(random_seed)* with *rng = numpy.random.default_rng()*:

```python
#random_seed = 1
rng = numpy.random.default_rng()
```

1. Run it twice and note the values of *training_data_x* each time. What do you notice? What does that tell you?
2. Now put the random seed back (i.e. as it was originally). Run it twice again and note the values of *training_data_x* each time. What do you notice? What does that tell you?
3. Change the random seed from 1 to any other integer value and run again, noting the values of *training_data_x*.
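
Once you have tried the steps above by hand, the same comparison can be made programmatically. This is a minimal sketch that assumes only numpy; the helper name *draw_training_data* is hypothetical and not part of the lab code.

```python
import numpy

def draw_training_data(seed=None):
    """Draw a 4 x 1 array the same way Section 1 does; seed=None means unseeded."""
    rng = numpy.random.default_rng(seed)
    return rng.random((4, 1))

# Same seed: compare two independent draws
seeded_first = draw_training_data(seed=1)
seeded_second = draw_training_data(seed=1)
print("Seed 1 vs seed 1, equal:", numpy.array_equal(seeded_first, seeded_second))

# Different seed: compare against the seed-1 draw
other_seed = draw_training_data(seed=2)
print("Seed 1 vs seed 2, equal:", numpy.array_equal(seeded_first, other_seed))

# No seed: numpy initialises the generator from fresh operating-system entropy
unseeded_first = draw_training_data()
unseeded_second = draw_training_data()
print("Unseeded vs unseeded, equal:", numpy.array_equal(unseeded_first, unseeded_second))
```

Using *numpy.array_equal* avoids having to compare the printed values by eye.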


#### B. REGULARIZATION - RIDGE REGRESSION
Replace the linear regression model (*linear_model.LinearRegression()*) used in Section 2 with a ridge regression model (*linear_model.Ridge()*) - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge:

```python
#lr_model = linear_model.LinearRegression()
alpha = 1.0
lr_model = linear_model.Ridge(alpha=alpha)
```

1. Now run Sections 1 and 2 again. What do you notice?
2. Change *alpha* to a new non-negative value and run again. What do you notice?
3. Try a different alpha.
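
One way to organise parts 2 and 3 is to loop over several values of *alpha* and print the fitted weight and bias for each. The following is a minimal sketch that assumes the same data as Section 1; the list *alphas* and its values are arbitrary illustrative choices, and the names *x*, *y*, *ridge_model* and *ridge_weights* are not part of the lab code.

```python
import numpy
from sklearn import linear_model

# Same data as Section 1 (same seed, same order of draws)
rng = numpy.random.default_rng(1)
x = rng.random((4, 1))
y = rng.random((4, 1))

alphas = [0.01, 0.1, 1.0, 10.0]  # arbitrary values to compare
ridge_weights = []
for alpha in alphas:
    ridge_model = linear_model.Ridge(alpha=alpha)
    ridge_model.fit(x, y)
    # ravel() copes with coef_/intercept_ being stored as small arrays
    weight = float(numpy.ravel(ridge_model.coef_)[0])
    bias = float(numpy.ravel(ridge_model.intercept_)[0])
    ridge_weights.append(weight)
    print("alpha=%5.2f  weight=%.4f  bias=%.4f" % (alpha, weight, bias))
```

Compare how the magnitude of the weight changes as *alpha* grows, and relate it to the slope of the black line in the Section 2 plot.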


#### C. REGULARIZATION - LASSO REGRESSION
Replace the linear regression model (*linear_model.LinearRegression()*) used in Section 2 with a lasso regression model (*linear_model.Lasso()*) - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso:

```python
#lr_model = linear_model.LinearRegression()
alpha = 1.0
lr_model = linear_model.Lasso(alpha=alpha)
```

1. Now run Sections 1 and 2 again. What do you notice?
2. Change *alpha* to a new positive value and run again (the scikit-learn documentation advises against using *alpha = 0* with *Lasso*). What do you notice?
3. Try a different alpha.
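
As with ridge regression, parts 2 and 3 can be organised as a loop over several values of *alpha*. The following is a minimal sketch that assumes the same data as Section 1; the list *alphas* and its values are arbitrary illustrative choices, and the names *x*, *y*, *lasso_model* and *lasso_weights* are not part of the lab code. Unlike ridge, lasso can set a weight to exactly zero when *alpha* is large enough, so the sketch also reports whether that has happened.

```python
import numpy
from sklearn import linear_model

# Same data as Section 1 (same seed, same order of draws)
rng = numpy.random.default_rng(1)
x = rng.random((4, 1))
y = rng.random((4, 1)).ravel()  # 1-D labels keep the output shapes simple

alphas = [0.001, 0.01, 0.1, 1.0]  # arbitrary positive values to compare
lasso_weights = []
for alpha in alphas:
    lasso_model = linear_model.Lasso(alpha=alpha)
    lasso_model.fit(x, y)
    weight = float(numpy.ravel(lasso_model.coef_)[0])
    bias = float(numpy.ravel(lasso_model.intercept_)[0])
    lasso_weights.append(weight)
    print("alpha=%6.3f  weight=%.4f  bias=%.4f  weight is zero: %s"
          % (alpha, weight, bias, weight == 0.0))
```

When the weight is zero, the fitted model predicts the same value for every input, which shows up as a horizontal line in the Section 2 plot.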
