# Machine Learning Algorithms - Predict Solutions

Complete the following functions using the Machine Learning techniques you have covered in the training notebooks.

## Pre-processing

### Import Data

In [808]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support as score

df = pd.read_csv('C:/Users/JF51/Desktop/data.csv').drop('Unnamed: 0', axis=1)

### Pre-process Data

In [809]:
# Regression labels
y_r = df['target_return']

# Classification labels
y_c = df['target_return'].apply(lambda x: 1 if x > 0 else 0)

# Features
X = df.drop(['Date', 'company', 'target_return'], axis=1)

In [810]:
# Standardize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_standardize = pd.DataFrame(X_scaled,columns=X.columns)

In [811]:
# Regression train/test split
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_standardize, y_r, test_size=0.3, random_state=101)

# Classification train/test split
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_standardize, y_c, test_size=0.3, random_state=101)
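
The cells above fit the scaler on the full feature matrix before splitting, so the test rows contribute to the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on synthetic data; `X_demo`, `y_demo` and `demo_scaler` are hypothetical names and are not part of the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (assumption: not the notebook's data.csv)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.normal(size=200)

# Split first, then scale
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)  # learn mean/std from training rows only
X_te_scaled = demo_scaler.transform(X_te)      # reuse those statistics on the test rows

# Training columns are centred with unit variance; test columns are only approximately so
print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
print(X_te_scaled.mean(axis=0).round(3), X_te_scaled.std(axis=0).round(3))
```

Whether this changes the graded outputs depends on the dataset, so the expected values in this notebook still refer to the scale-then-split order used above.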

## Function 1

Write a function to return the intercept of a linear regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train) and labels (y_train)

In [812]:
def lin_reg_intercept(X_train, y_train):
    
    "Returns intercept (float) of linear regression model"

    # Your code here
    
    # Create linear regression object
    lm = LinearRegression()

    # Train the model using the training sets
    lm.fit(X_train, y_train)

    # Extract the fitted intercept, rounded to 3 decimal places
    int_lm = float(round(lm.intercept_,3))
    
    return int_lm
    
    

In [813]:
lin_reg_intercept(X_train_r, y_train_r)

0.027

## Function 2

Write a function to return the number of coefficients greater than 0 in a lasso model (as an integer)

* Given the training features (X_train) and labels (y_train)
* For a specific value of the regularisation parameter (alpha)

In [814]:
def lasso_predictors(X_train, y_train, alpha):
    
    "Returns number (integer) of coefficients in lasso model that are greater than 0"
    
    # Your code here
    
    # Create lasso object
    lasso = Lasso(alpha=alpha)
    
    # Train the model using the training sets
    lasso.fit(X_train,y_train)
    
    coef_lm = int(np.sum(lasso.coef_ > 0))
        
    return coef_lm
 

In [815]:
lasso_predictors(X_train_r, y_train_r, 0.005)

2
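
`lasso_predictors` counts coefficients that are strictly greater than 0, as the task asks. That is not the same as the number of predictors the lasso keeps, because a retained coefficient can be negative. A minimal sketch of the difference, assuming synthetic data with one positive and one negative true effect (all names here are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption: not the notebook's data.csv)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

n_positive = int(np.sum(lasso_demo.coef_ > 0))   # the count lasso_predictors returns
n_nonzero = int(np.sum(lasso_demo.coef_ != 0))   # predictors the lasso kept at all

print(lasso_demo.coef_.round(3), n_positive, n_nonzero)
```

If the intent is "how many predictors survive regularisation", the `!= 0` count is the one to report.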

## Function 3

Write a function to return the mean squared error of a linear regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [816]:
def lnr_mse(X_train, y_train, X_test, y_test):
    
    "Returns the MSE (float) of a linear regression model"
    
    
    # Your code here
    
    # Create linear regression object
    lm = LinearRegression()
    
    # Train the model using the training sets
    lm.fit(X_train, y_train)
    
    # Predict on the test features
    pred_lm = lm.predict(X_test)
    
    mse = float(round(mean_squared_error(y_test, pred_lm),3))
    
    return mse
    
    

In [817]:
lnr_mse(X_train_r, y_train_r, X_test_r, y_test_r)

0.032

## Function 4

Write a function to return the mean absolute error of a ridge regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)
* For a specific value of the regularisation parameter (alpha)

In [818]:
def ridge_mae(X_train, y_train, X_test, y_test, alpha):
    
    "Returns the MAE (float) of the ridge regression model"
    
    # Your code here
    
    # create ridge object
    ridge = Ridge(alpha=alpha)
    
    # Train the model using the training sets
    ridge.fit(X_train,y_train)
    
    # Create prediction object
    pred_rd = ridge.predict(X_test)
    
    # the MAE of the ridge regression model
    mae = float(round(mean_absolute_error(y_test, pred_rd),3))
    
    return mae

In [819]:
ridge_mae(X_train_r, y_train_r, X_test_r, y_test_r, 1)

0.096

## Function 5

Write a function to return the root mean squared error of a linear regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [820]:
def lnr_rmse(X_train, y_train, X_test, y_test):
    
    "Returns the root mean squared error (float) of a linear regression model"
    
    # Your code here
    
    # Create linear regression object
    lm  = LinearRegression()
    
    # Train the model 
    lm.fit(X_train,y_train)
    
    # Create prediction object
    pred_lm = lm.predict(X_test)
    
    # set the root mean squared error of a linear regression model
    rmse = float(round(np.sqrt(mean_squared_error(y_test, pred_lm)),3))
    
    return rmse
    

In [821]:
lnr_rmse(X_train_c, y_train_c, X_test_c, y_test_c)

1.106
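
The three regression metrics used in Functions 3 to 5 are closely related: RMSE is the square root of MSE, and both weight large errors more heavily than MAE does. A small worked example with hand-picked numbers (hypothetical values, chosen so the arithmetic is easy to check):

```python
import numpy as np

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 5.0, 4.0])   # a single error of size 2

errors = y_true_demo - y_pred_demo             # [0, 0, -2, 0]

mse_demo = np.mean(errors ** 2)                # (0 + 0 + 4 + 0) / 4 = 1.0
rmse_demo = np.sqrt(mse_demo)                  # sqrt(1.0) = 1.0
mae_demo = np.mean(np.abs(errors))             # (0 + 0 + 2 + 0) / 4 = 0.5

print(mse_demo, rmse_demo, mae_demo)
```

RMSE is in the same units as the labels, which makes it easier to compare with MAE than the raw MSE.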

## Function 6

Write a function to return the highest coefficient in a logistic regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train) and labels (y_train)

In [822]:
def highest_coef(X_train, y_train):
    
    "Returns the highest coefficient in a logistic regression model as a float (rounded to 3 decimal places)"
    
    # Your code here
    
    # Create logistic regression object
    log_model = LogisticRegression()
    
    # Train the model 
    log_model.fit(X_train, y_train)
    
    # coef_ has shape (1, n_features) for a binary problem; take its largest entry
    highest_lrm_coef = np.max(log_model.coef_)
    
    coef_logmodel = float(round(highest_lrm_coef,3))
    
    return coef_logmodel
    
    

In [823]:
highest_coef(X_train_c, y_train_c)

0.977

## Function 7

Write a function to return the number of true positives (as an integer) of a logistic regression model 

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [824]:
def log_reg_tp(X_train, y_train, X_test, y_test):
    
    "Returns the number (integer) of true positives for a logistic regression model"
  
    # Your code here
    
    # Create logistic regression object
    log_model = LogisticRegression()
    
    # Train the model
    ls = log_model.fit(X_train, y_train) 
    
    # Create prediction object
    pred_lm = ls.predict(X_test)
    
    # ravel() on a binary confusion matrix returns (tn, fp, fn, tp)
    tn, fp, fn, tp = confusion_matrix(y_test,pred_lm).ravel()
    
    int_logmodel = int(tp)
    
    return int_logmodel
    

In [825]:
log_reg_tp(X_train_c, y_train_c, X_test_c, y_test_c)

50
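
For binary labels 0 and 1, scikit-learn's `confusion_matrix` puts true labels on the rows and predictions on the columns, so flattening it with `ravel()` yields the counts in the order true negatives, false positives, false negatives, true positives. The toy labels below are hypothetical and only serve to make that order visible:

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels (assumption: illustrative only)
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0, 1, 1]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

# Layout of cm_demo:
# [[tn, fp],
#  [fn, tp]]
print(cm_demo)
print("tn:", tn, "fp:", fp, "fn:", fn, "tp:", tp)
```

Unpacking into named variables avoids relying on a positional index such as `[0]` or `[3]`.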

## Function 8

Write a function to return the precision of a logistic regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [843]:
def lgr_precision(X_train, y_train, X_test, y_test):
    
    "Returns the precision (float) for a logistic regression model"
    
    # Your code here
    
    # Create logistic regression object
    lm = LogisticRegression()
    
    # Train the model 
    lm_train = lm.fit(X_train, y_train)
    
    #Predict Output
    predicted = lm_train.predict(X_test)
    
    # Set the precision for a logistic regression model
    prec_lrm =  float(round(score(y_test, predicted,average='weighted')[0],3))
    
    print(score(y_test, predicted,average='weighted'))
    
    return prec_lrm
    

In [844]:
lgr_precision(X_train_c, y_train_c, X_test_c, y_test_c)

(0.6082424263036621, 0.6055045871559633, 0.5767404460982443, None)


0.608
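
`lgr_precision` passes `average='weighted'`, which averages the per-class precisions weighted by each class's support. With `average='binary'` the same function reports precision for the positive class only, so the two numbers generally differ. A short sketch on hypothetical toy labels:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hand-made labels (assumption: illustrative only)
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0, 1, 1]

# Positive-class precision: tp / (tp + fp) = 3 / 5
p_binary = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average='binary'
)[0]

# Support-weighted mean of class-0 precision (2/3) and class-1 precision (3/5)
p_weighted = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average='weighted'
)[0]

print(round(p_binary, 3), round(p_weighted, 3))
```

Which averaging is appropriate depends on what the exercise expects; the notebook's reported values use the weighted form.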

## Function 9

Write a function to return the f1-score of a logistic regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [828]:
def lgr_f1_score(X_train, y_train, X_test, y_test):
    #from sklearn.metrics import f1_score
    
    "Returns the f1-score (float) for the logistic regression model"
     
    # Your code here
    
    # Create logistic regression object
    lm = LogisticRegression()
    
    # Train the model 
    lm_train = lm.fit(X_train, y_train)
    
    #Predict Output
    predicted = lm_train.predict(X_test)
    
    # Set the weighted f1-score for a logistic regression model
    f1_lrm =  float(round(score(y_test, predicted,average='weighted')[2],3))
    
    return f1_lrm

In [829]:
lgr_f1_score(X_train_c, y_train_c, X_test_c, y_test_c)

0.577

## Function 10

Write a function to return a specific metric (precision, recall or f1-score) of a logistic regression model as a float (rounded to 3 decimal places)

* Given the training features (X_train), training labels (y_train), testing features (X_test) and testing labels (y_test)

In [841]:
def lgr_metric_output(X_train, y_train, X_test, y_test, metric):
    
    
    "Returns the chosen metric (float) for the logistic regression model"
     
    # Your code here
    
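    # Map the metric name to its position in the (precision, recall, fscore, support) tuple
    metric_index = {'precision': 0, 'recall': 1, 'F1_score': 2}
    
    # Fail loudly on an unknown name instead of silently returning None
    if metric not in metric_index:
        raise ValueError("metric must be 'precision', 'recall' or 'F1_score'")
    
    metric = metric_index[metric]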
       
             
    # Create logistic regression object
    lm = LogisticRegression()
    
    # Train the model 
    lm_train = lm.fit(X_train, y_train)
    
    #Predict Output
    predicted = lm_train.predict(X_test)
    
    # Compute the chosen metric (weighted average across classes)
    prec_lrm = float(round(score(y_test, predicted,average="weighted")[metric],3))


    return prec_lrm
    
    

In [842]:
lgr_metric_output(X_train_c, y_train_c, X_test_c, y_test_c, 'F1_score')

0.577