## Week 1: Introduction

Instructor: Cornelia Ilin <br>
Email: cilin@ischool.berkeley.edu <br>

#### ``Objectives``

1. Introduce you to a typical workflow for using Machine Learning in predictive modeling.

2. Read through the commands, try making changes, and make sure you understand how the functions work.

3. Focus on making your code as organized and readable as possible. Use lots of comments!!

#### ``Motivation``

1. Machine learning is an exciting field that can help you turn massive data into knowledge.

2. Use powerful algorithms to learn patterns from data and make predictions about future events.

3. Easy to break into the field, thanks to the many open-source libraries (sklearn, tensorflow, etc.)

#### ``Data``

1. Generated using numpy
2. Size = (50, 1)

---

### Step 1: Import packages

In [2]:
# standard 
import numpy as np

# plots
import matplotlib.pyplot as plt
import seaborn as sb

# images
from IPython.display import Image

# prediction
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

# This tells matplotlib not to try opening a new window for each plot
%matplotlib inline

# silence warnings
import warnings
warnings.filterwarnings('ignore')

# set working directory (CHANGE HERE)
import os
os.chdir('/Users/cilin/Postdoc/teaching/cilin-coursework2/Live_Sessions/week01/')

### Step 2: Set working directories

In [None]:
# ADD HERE

### Step 3: Define functions

In [None]:
# ADD HERE

### Step 4: Read data

This time we will generate our own data, y and X, by using a random number generator.

generate X:

In [None]:
# set a random seed (this ensures the results are the same each time)
np.random.seed(100)

# set len(X)
len_X = 50

# generate len_X evenly spaced X values in [0, 1]
X = np.linspace(0, 1, len_X)

In [None]:
print("Question 1: What is the data type of X?")
print(type(X))
print(X)

generate y:

In [None]:
# create a "true" function (a piece of a cosine curve) that we will try to approximate with a model
true_function = lambda x: np.cos(1.5 * np.pi * x)

# try this function out. Notice that you can apply it to a scalar or an array, and even to a pandas Series
print(true_function(0))
print(true_function(0.5))
print(true_function(np.array([0, 0.5])))

In [None]:
# generate true y values
y = true_function(X)

# print the values of y to the nearest hundredth
print (['%.2f' %i for i in y])

In [None]:
# add random noise to y
# the randn function samples random numbers from the standard Normal distribution
# multiplying adjusts the standard deviation of the distribution
noise = np.random.randn(len_X) * 0.2
y += noise

# print the noise-added values of y for comparison.
print (['%.2f' %i for i in y])
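
To see why multiplying by 0.2 sets the noise level, the following is a minimal sketch that compares the empirical standard deviation before and after scaling. The sample size of 100,000 and the names `rng`, `big_sample`, and `scaled_sample` are illustrative choices, not part of the notebook. A separate generator is used so that the notebook's seeded global random state is left untouched.

```python
import numpy as np

# separate generator (illustrative seed) so the global np.random state is not disturbed
rng = np.random.default_rng(0)

# draw many samples from the standard Normal distribution (mean 0, std 1)
big_sample = rng.standard_normal(100_000)

# multiplying by a constant scales the standard deviation by that constant
scaled_sample = big_sample * 0.2

print('std before scaling: %.3f' % big_sample.std())
print('std after scaling:  %.3f' % scaled_sample.std())
```

With this many samples, the first value should be close to 1 and the second close to 0.2.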

Next, we want to predict y, using the feature vector X. 

In this course, our outputs (y) will always be 1-dimensional. Our inputs (X) will usually have more than 1 dimension. Today, for simplicity, we have just a single feature. 

Since the machine learning classes in sklearn expect input feature vectors, we need to turn each input x in X into a feature vector [x].

### Step 5: Preprocess data

``labels and features``

y -> labels <br>
X -> features (1 in this case)

In [None]:
# turn X into a column of feature vectors, shape (len_X, 1)
# (an alternative command is X = X.reshape(-1, 1); np.transpose does not change a 1-D array)
X = X[:, np.newaxis]
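
As a small illustration of this reshaping step, the sketch below uses a hypothetical 5-element array (`X_demo` and the other names are illustrative only) to show how the shape changes, and that transposing a 1-D array leaves its shape unchanged.

```python
import numpy as np

# a short 1-D array standing in for X, shape (5,)
X_demo = np.linspace(0, 1, 5)

# add a second axis so that each value becomes its own feature vector [x]
X_col = X_demo[:, np.newaxis]

# reshape(-1, 1) produces the same result
X_col_alt = X_demo.reshape(-1, 1)

print('original shape:     ', X_demo.shape)
print('with np.newaxis:    ', X_col.shape)
print('with reshape(-1, 1):', X_col_alt.shape)

# transposing a 1-D array is a no-op, so it cannot be used for this step
print('after np.transpose: ', np.transpose(X_demo).shape)
```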

``create training and test sets``

In [None]:
# split data
X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.30
)

# print size
print('Size of X_train', X_train.shape)
print('Size of y_train', y_train.shape)

print('Size of X_test', X_test.shape)
print('Size of y_test', y_test.shape)
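
To make the effect of `test_size=0.30` concrete, here is a minimal sketch on 50 evenly spaced points. The names `X_demo`, `y_demo`, `X_tr`, `X_te` and the value `random_state=0` are assumptions made for the example. It checks that every observation lands in exactly one of the two sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 inputs as a (50, 1) feature matrix, with noise-free cosine labels
X_demo = np.linspace(0, 1, 50)[:, np.newaxis]
y_demo = np.cos(1.5 * np.pi * X_demo[:, 0])

# hold out 30% of the rows for testing (random_state is an illustrative choice)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0
)

print('train:', X_tr.shape, y_tr.shape)
print('test: ', X_te.shape, y_te.shape)

# the two sets are disjoint and together cover all 50 inputs
all_inputs = set(X_tr[:, 0]) | set(X_te[:, 0])
print('distinct inputs across both sets:', len(all_inputs))
```

With 50 rows, a 30% test fraction gives 15 test rows and 35 training rows.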

### Step 6: Learning Model

###### Linear model

In [None]:
# model fit
lm = LinearRegression(fit_intercept = True)
lm.fit(X_train, y_train)
print ('Estimated function: y = %.2f + %.2fx' %(lm.intercept_, lm.coef_[0]))

Approximating a cosine function with a linear model doesn't work so well. By adding polynomial transformations of our feature(s), we can fit more complex functions. This is often called polynomial (nonlinear) regression. 
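
To show what a polynomial transformation does to a single feature, the sketch below applies `PolynomialFeatures` with degree 3 to two hand-picked inputs (`x_demo` and `poly_demo` are illustrative names). Each row `[x]` becomes `[x, x^2, x^3]`. The regression stays linear in its coefficients, only the features are nonlinear in x.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# two example inputs, each already a feature vector [x]
x_demo = np.array([[0.5], [2.0]])

# include_bias=False omits the constant column, since LinearRegression fits its own intercept
poly_demo = PolynomialFeatures(degree=3, include_bias=False)
x_demo_poly = poly_demo.fit_transform(x_demo)

# rows are [x, x^2, x^3]: [0.5, 0.25, 0.125] and [2, 4, 8]
print(x_demo_poly)
```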

###### Nonlinear model (poly degree==4)

In [None]:
# create polynomial transformations
poly = PolynomialFeatures(degree=4, include_bias=False)
X4_train = poly.fit_transform(X_train)
print(X4_train[0:10])

In [None]:
# model fit
lm4 = LinearRegression(fit_intercept=True)
lm4.fit(X4_train, y_train)

print ('Estimated function: y = %.2f + %.2fx + %.2fx^2 + %.2fx^3 + %.2fx^4' %(lm4.intercept_, lm4.coef_[0], lm4.coef_[1], lm4.coef_[2], lm4.coef_[3]))

###### Nonlinear model (poly degree==15)

In [None]:
# create polynomial transformations
poly = PolynomialFeatures(degree=15, include_bias=False)
X15_train = poly.fit_transform(X_train)
print(X15_train[0:3])

In [None]:
# model fit
lm15 = LinearRegression(fit_intercept=True)
lm15.fit(X15_train, y_train)
print('Print intercept:', lm15.intercept_)
print('\nPrint slope coefficients:', lm15.coef_)

<span style="color:orange">What is the estimated function?</span>

### Step 7: Evaluation

In [None]:
degrees = [1, 4, 15]

# Initialize a new plot and set plot size
plt.figure(figsize=(14, 4)) 

for i in range(len(degrees)):
    # create subplots that are all on the same row
    ax = plt.subplot(1, len(degrees), i+1)
    
    # create the polynomial feature vector (or matrix)
    poly = PolynomialFeatures(degree = degrees[i], include_bias = False)
    temp_X_train = poly.fit_transform(X_train)
    # reuse the transformer fitted on the training data (do not refit on test data)
    temp_X_test = poly.transform(X_test)
    
    # model fit
    lm = LinearRegression()
    lm.fit(temp_X_train, y_train)
    lm_yhat_train = lm.predict(temp_X_train)
    lm_yhat_test = lm.predict(temp_X_test)
    
    
    # plot the true function
    #sb.lineplot(np.squeeze(X_train), np.squeeze(true_function(X_train)), label="True function");
    
    # plot the noisy training observations (recent seaborn versions require x= and y= keywords)
    sb.scatterplot(x=np.squeeze(X_train), y=y_train, label="Function with noise");

    # show the fitted function evaluated on the training data
    sb.lineplot(x=np.squeeze(X_train), y=lm_yhat_train, color='black', label='Evaluation, train data')
    
    # show the fitted function evaluated on the test data
    sb.lineplot(x=np.squeeze(X_test), y=lm_yhat_test, color='red', label='Evaluation, test data')

    
    # Add labels, title, legend to the plot
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((-.05, 1.05))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree %d" %degrees[i])
   

### Conclusions

The lesson here is that we are interested in the model that generalizes well. 

Clearly, the degree 1 model, while very small (only 2 parameters), doesn't fit the observed training data well. The degree 15 model fits the observed training data extremely well, but is unlikely to generalize to new (test) data.

This is a case of "over-fitting", which often happens when we try to estimate too many parameters from just a few examples. The degree 4 model appears to be a good blend of small model size and good generalization.
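
One way to put numbers on this comparison is to compute the mean squared error (MSE) on the training and test sets for each degree. The following is a minimal sketch under stated assumptions: it regenerates data in the same way as the notebook, but with a local `RandomState` so the values need not match the cells above. It chains the polynomial transformation and the regression in a `Pipeline`, so the same fitted transformation is applied to both sets. The names `X_demo`, `y_demo`, and `mse` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# local generator (illustrative seed), so the global np.random state is untouched
rng = np.random.RandomState(100)

# noisy cosine data: 50 inputs in [0, 1], noise with standard deviation 0.2
X_demo = np.linspace(0, 1, 50)[:, np.newaxis]
y_demo = np.cos(1.5 * np.pi * X_demo[:, 0]) + rng.randn(50) * 0.2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=rng
)

mse = {}
for degree in [1, 4, 15]:
    # the pipeline fits the transformer on training data only and reuses it in predict
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=degree, include_bias=False)),
        ('lm', LinearRegression()),
    ])
    model.fit(X_tr, y_tr)

    # store (train MSE, test MSE) for this degree
    mse[degree] = (
        mean_squared_error(y_tr, model.predict(X_tr)),
        mean_squared_error(y_te, model.predict(X_te)),
    )
    print('Degree %2d: train MSE = %.4f, test MSE = %.4f' % (degree, *mse[degree]))
```

A model that generalizes well has a low test MSE that stays close to its training MSE. A large gap between the two is a typical sign of over-fitting.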

---
Explain what we did in this notebook by using the "Roadmap for building ML systems".

In [None]:
Image(filename='./images/roadmap_ml_systems.png', width = 600)