# Machine Learning Algorithms - Predict Solutions

Complete the following functions using the Machine Learning techniques you have covered in the training notebooks.

## Pre-processing

### Import Data

In [40]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
% matplotlib inline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support as score

df = pd.read_csv('data.csv').drop('Unnamed: 0', axis=1)

In [41]:
matplotlib.__version__

'2.1.0'

### Pre-process Data

In [42]:
# Regression labels
y_r = df['target_return']

# Classification labels
y_c = df['target_return'].apply(lambda x: 1 if x > 0 else 0)

# Features
X = df.drop(['Date', 'company', 'target_return'], axis=1)

In [43]:
# Standardize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_standardize = pd.DataFrame(X_scaled,columns=X.columns)

In [44]:
# Regression train/test split
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_standardize, y_r, test_size=0.3, random_state=101)

# Classification train/test split
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_standardize, y_c, test_size=0.3, random_state=101)

## Function 1

Write a function to return the intercept as a float (rounded to the nearest 3 integers) of a linear regression model

* Given the training features (X_train) and labels (y_train)

In [45]:
def lin_reg_intercept(X_train, y_train):
    
    "Returns intercept (float) of linear regression model"

    # Your code here
    lm = LinearRegression()
    lm.fit(X_train,y_train)
    intercept = lm.intercept_
    return intercept.round(3)
    

In [46]:
lin_reg_intercept(X_train_r, y_train_r)

0.027

## Function 2

Write a function to return the number of coefficients greater than 0 in a lasso model (as an integer)

* Given the training features (X_train) and labels (y_train)
* For a specific value of the regularisation parameter (alpha)

In [47]:
def lasso_predictors(X_train, y_train, alpha):
    
    "Returns number (integer) of coefficients in lasso model that are greater than 0"
    
    # Your code here
    lm = LinearRegression()
    lm.fit(X_train,y_train)
    lasso= Lasso(alpha=alpha)
    lasso.fit(X_train,y_train)
    lasso_coef = pd.DataFrame(lasso.coef_,index=X.columns,columns=['Lasso'])
    
        
    return len(lasso_coef[lasso_coef['Lasso']>0]) 
    

In [48]:
lasso_predictors(X_train_r, y_train_r, 0.005)

2

## Function 3

Write a function to return the mean squared error as a float (rounded to the nearest 3 integers) of a linear regression model 

* Given the training features (X_train) training labels (y_train), testing features (X_test) and testing labels (y_test)

In [58]:
def lnr_mse(X_train, y_train, X_test, y_test):
    
    "Returns the MSE (float) of a linear regression model"
     
    
    # Your code here
    lm = LinearRegression()
    lm.fit(X_train,y_train)
    predictions = lm.predict(X_test)
    
    return round(float(mean_squared_error(y_test, predictions)), 3)

In [60]:
lnr_mse(X_train_r, y_train_r, X_test_r, y_test_r)

0.032

## Function 4

Write a function to return the mean absolute error as a float (rounded to the nearest 3 integers) of a ridge regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)
* For a specific value of the regularisation parameter (alpha)

In [64]:
def ridge_mae(X_train, y_train, X_test, y_test, alpha):
    
    "Returns the MAE (float) of the ridge regression model"
    
    # Your code here;
    
    lr = Ridge(alpha=alpha)
    lr.fit(X_train, y_train)
    pred = lr.predict(X_test)
    
    return round(float(mean_absolute_error(y_test, pred)),3)

In [65]:
ridge_mae(X_train_r, y_train_r, X_test_r, y_test_r, 1)

0.096

## Function 5

Write a function to return the root mean squared error as a float (rounded to the nearest 3 integers) of a linear regression model

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [62]:
def lnr_rmse(X_train, y_train, X_test, y_test):
    
    "Returns the root mean squared error (float) of a linear regression model"
    
    # Your code here
    lasso = LinearRegression()
    lasso.fit(X_train,y_train)
    pred_lasso= lasso.predict(X_test)
    return round(float(mean_squared_error(y_test, pred_lasso)**0.5), 3)
    

In [63]:
lnr_rmse(X_train_c, y_train_c, X_test_c, y_test_c)

1.106

## Function 6

Write a function to return the highest coefficient in a logistic regression model as a float (rounded to the nearest 3 integers)

* Given the training features (X_train) and labels (y_train)

In [18]:
def highest_coef(X_train, y_train):
    
    "Returns the highest coefficient in a logistic regression model as a float (rounded to the nearest 3 integers)"
    
    # Your code here
    log_reg = LogisticRegression()
    log_reg.fit(X_train,y_train)
    log_reg_hg = [list(x) for x in list(log_reg.coef_)]
    for y in log_reg_hg:
        log_reg_hgh = max(y)
    
    return round(float(log_reg_hgh),3)
    

In [19]:
highest_coef(X_train_c, y_train_c)

0.977

## Function 7

Write a function to return the number of true positives (as an integer) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [68]:
def log_reg_tp(X_train, y_train, X_test, y_test):
    
    "Returns the number (integer) of true positives for a logistic regression model"
    
    # Your code here
    lr = LogisticRegression()
    lr.fit(X_train,y_train)
    lr_pred = lr.predict(X_test)
    return confusion_matrix(y_test,lr_pred)[0][0]
    

In [69]:
log_reg_tp(X_train_c, y_train_c, X_test_c, y_test_c)

16

## Function 8

Write a function to return the precision as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [70]:
def lgr_precision(X_train, y_train, X_test, y_test):
    
    "Returns the precision (float) for a logistic regression model"
    
    # Your code here
    lr= LogisticRegression()
    lr.fit(X_train,y_train)
    lr_pred = lr.predict(X_test)
    return round(float(score(y_test,lr_pred,average='weighted')[0]),3)
    
    

In [71]:
lgr_precision(X_train_c, y_train_c, X_test_c, y_test_c)

0.608

## Function 9

Write a function to return the f1-score as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [72]:
def lgr_f1_score(X_train, y_train, X_test, y_test):
    
    "Returns the f1-score (float) for the logistic regression model"
     
    # Your code here
    lr = LogisticRegression()
    lr.fit(X_train,y_train)
    lr_pred = lr.predict(X_test)
    return round(float(score(y_test,lr_pred,average='weighted')[2]),3)
    

In [73]:
lgr_f1_score(X_train_c, y_train_c, X_test_c, y_test_c)

0.577

## Function 10

Write a function to return a specific metric (precision, recall or f1-score) as a float (rounded to the nearest 3 integers) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [77]:
def lgr_metric_output(X_train, y_train, X_test, y_test, metric):
    
    "Returns the chosen metric (float) for the logistic regression model"
    
    # Your code here
    if metric == 'F1_score':
        metric = 2
    elif metric == 'Precision':
        metric = 0
    elif metric == 'Recall':
        metric = 1
        
    lr= LogisticRegression()
    lr.fit(X_train,y_train)
    lr_pred = lr.predict(X_test)
    return round(float(score(y_test,lr_pred,average='weighted')[metric]),3)
    
    

In [78]:
lgr_metric_output(X_train_c, y_train_c, X_test_c, y_test_c, 'F1_score')

0.577