## Regression models

Use the following datasets:
1. [CPU Computer Hardware](https://archive.ics.uci.edu/ml/datasets/Computer+Hardware); exclude the columns vendor name, model name, and estimated relative performance; the "published relative performance" column is the target to be estimated.
1. [Boston Housing](http://archive.ics.uci.edu/ml/machine-learning-databases/housing/)
1. [Wisconsin Breast Cancer](http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html); search the left panel for Wisconsin Breast Cancer and follow the steps in "My personal Notes".
1. [Communities and Crime](http://archive.ics.uci.edu/ml/datasets/communities+and+crime); delete the first 5 columns and the features with missing values.
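
The cleaning step for Communities and Crime can be sketched as follows. This is a minimal illustration, not the required solution: the two rows are an abbreviated excerpt with only three of the feature columns, and reading the `?` markers through `na_values` is one possible choice among several.

```python
import io

import pandas as pd

# Abbreviated stand-in for communities.data: 5 leading identifier columns,
# then a few feature columns; '?' marks a missing value in the raw file.
RAW = """8,?,?,Lakewoodcity,1,0.19,0.06,0.2
53,?,?,Tukwilacity,1,0.0,?,0.67
"""
names = ['state', 'county', 'community', 'communityname', 'fold',
         'population', 'PolicCars', 'ViolentCrimesPerPop']

# na_values turns '?' into NaN while parsing, so numeric columns stay numeric.
frame = pd.read_csv(io.StringIO(RAW), names=names, na_values='?')

# Drop the first 5 columns, then every column that still has a missing value.
cleaned = frame.iloc[:, 5:].dropna(axis=1)
print(list(cleaned.columns))
```

Here `PolicCars` is dropped because one of its entries is missing, while `population` and `ViolentCrimesPerPop` survive.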

For each dataset, apply at least 5 regression models from scikit-learn. For each model, report the mean absolute error, mean squared error, and median absolute error (see [sklearn.metrics](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)) using 5-fold cross-validation. Hyperparameter values must be searched with both grid search (cv=3) and random search (with `n_iter` chosen by you). The metric optimized during the hyperparameter search is the mean squared error. Report the average results for both the training folds and the test folds. Hint: you can use the `cross_validate` function with the parameter `return_train_score=True`, passing an object of type `GridSearchCV` or `RandomizedSearchCV` as the model.
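
A minimal sketch of the hint, under stated assumptions: the data is synthetic (`make_regression`), the `alpha` grid is arbitrary, and the short keys `mae`, `mse`, `medae` are names chosen here for illustration. The inner search (cv=3) optimizes mean squared error, while the outer `cross_validate` (cv=5) reports all three metrics on the training and test folds in one call.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_validate

# Synthetic stand-in for one of the datasets
X, y = make_regression(n_samples=120, n_features=6, noise=10.0, random_state=0)

# Inner loop: hyperparameter search with 3-fold CV, optimizing MSE
search = GridSearchCV(
    estimator=Ridge(),
    param_grid={'alpha': [0.1, 1.0, 10.0]},
    scoring='neg_mean_squared_error',
    cv=3,
)

# Outer loop: 5-fold CV that evaluates the tuned model on several metrics
scoring = {
    'mae': 'neg_mean_absolute_error',
    'mse': 'neg_mean_squared_error',
    'medae': 'neg_median_absolute_error',
}
results = cross_validate(search, X, y, cv=5, scoring=scoring,
                         return_train_score=True)

# Average over the folds and flip the sign so that errors are positive
summary = {
    key: -values.mean()
    for key, values in results.items()
    if key.startswith(('train_', 'test_'))
}
for key, value in sorted(summary.items()):
    print(f'{key}: {value:.3f}')
```

Passing a dictionary to `scoring` avoids repeating the whole search once per metric; replacing `GridSearchCV` with `RandomizedSearchCV` leaves the outer call unchanged.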

The results will be collected in a dataframe. In the intermediate state the values carry a minus sign: scikit-learn scorers follow a higher-is-better convention, so error metrics are reported as negative numbers (for example `neg_mean_squared_error`); see the image below:

![intermediate report](./images/cpu_intermediate_blurred.png)


The values will then be converted to positive numbers, and the maximum and minimum values in each column will be highlighted. The image below, which shows the dataframe as displayed in the notebook, can serve as a guide; you can use other styling variants on the dataframe, as described at https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#.
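
One way to sketch this step with pandas alone is shown here. The helper names `to_positive` and `highlight_extremes` are hypothetical, and restricting the highlighting to numeric columns through `subset` is an assumption made for this example; the two rows reuse values from the CPU results table.

```python
import pandas as pd


def to_positive(report: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with positive numeric values and '_neg' removed from column names."""
    positive = report.copy()
    numeric_cols = positive.select_dtypes(include='number').columns
    positive[numeric_cols] = positive[numeric_cols].abs()
    positive.columns = [name.replace('_neg', '') for name in positive.columns]
    return positive


def highlight_extremes(report: pd.DataFrame):
    """Return a Styler marking the per-column maximum (green) and minimum (red)."""
    numeric_cols = list(report.select_dtypes(include='number').columns)
    return (report.style
            .highlight_max(subset=numeric_cols, color='green', axis=0)
            .highlight_min(subset=numeric_cols, color='red', axis=0))


raw = pd.DataFrame({
    'Model_name': ['Ridge', 'Lasso'],
    'test_neg_mean_absolute_error': [-43.373745, -43.218771],
    'test_neg_mean_squared_error': [-6382.840842, -6350.585413],
})
positive = to_positive(raw)
print(positive)
```

In a notebook, evaluating `highlight_extremes(positive)` as the last expression of a cell displays the colored table. Without `subset`, the text columns would also be compared and highlighted.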

A final report will be created in HTML or PDF format, as separate file(s). The report must contain at least the name of the dataset and the dataframe; preferably, keep the color highlighting from the notebook.

![report](./images/cpu_results_blurred.png)
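
A minimal sketch of the HTML export, assuming the styled table is already available as an HTML fragment (for instance from the Styler's `to_html()` method). It uses plain string formatting from the standard library rather than an HTML-building package; the function name `build_report`, the placeholder table, and the output file name are hypothetical.

```python
import html
import tempfile
from pathlib import Path


def build_report(title: str, table_html: str) -> str:
    """Wrap a dataset title and an HTML table fragment into a standalone page."""
    safe_title = html.escape(title)
    return (
        '<!DOCTYPE html>\n'
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f'<title>{safe_title}</title>\n</head>\n<body>\n'
        f'<h1>{safe_title}</h1>\n'
        f'{table_html}\n'
        '</body>\n</html>\n'
    )


# Placeholder fragment; in the notebook this would be the styled table's HTML
table_fragment = '<table><tr><td>42.93</td></tr></table>'
page = build_report('CPU Computer Hardware', table_fragment)

# Written to the temp directory here; any path works
out_path = Path(tempfile.gettempdir()) / 'cpu_report.html'
out_path.write_text(page, encoding='utf-8')
```

Because the Styler output carries its own `<style>` element, the color marking from the notebook is kept when that fragment is passed as `table_html`.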

Rating:
1. Default points are awarded ex officio.
1. Optimization and quantification of model performance: 3 points for each dataset + model combination = 60 points.
1. Model documentation: 2 points per model = 10 points. Document each of the models used in the Jupyter notebook, in Romanian. You can make a separate section with the documentation of the algorithms. Each model must have a description of at least 20 lines, at least one associated image, and at least 2 bibliographic references.
1. 10 points: export in HTML or PDF format.

In [4]:
import numpy as np
import pandas as pd
from typing import List, Dict
from dominate.tags import *
from dominate.util import raw
from sklearn.datasets import fetch_openml
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import BayesianRidge

In [13]:
def lin_reg_params() -> Dict[str,List[bool]]:
    """
    Create a dictionary with parameter names as keys, and as values,
    lists of possible values; for Linear Regression
    return: the dictionary
    """
    copy_X:List[bool] = [True,False]
    fit_intercept:List[bool] = [True,False]
    positive:List[bool] = [True,False]
    
    return dict(copy_X=copy_X, fit_intercept=fit_intercept, positive=positive)

def lasso_ridge_elastic_params() -> Dict[str,List]:
    """
    Create a dictionary with parameter names as keys, and as values,
    lists of possible values; for Lasso, Ridge and Elastic Net
    return: the dictionary
    """
    alpha:List[float] = [1.0,1.1,1.2,1.3,1.4,1.5]
    fit_intercept:List[bool] = [True,False]
    copy_X:List[bool] = [True,False]
    max_iter:List[int] = [1000,1100,1200,1300,1400,1500]
    
    return dict(alpha=alpha, fit_intercept=fit_intercept, copy_X=copy_X, max_iter=max_iter)

def bayesian_ridge_params() -> Dict[str,List]:
    """
    Create a dictionary with parameter names as keys, and as values,
    lists of possible values; for Bayesian Ridge
    return: the dictionary
    """
    alpha_1:List[float] = [1.e-5,1.e-6]
    alpha_2:List[float] = [1.e-5,1.e-6]
    lambda_1:List[float] = [1.e-5,1.e-6]
    lambda_2:List[float] = [1.e-5,1.e-6]
    
    return dict(alpha_1=alpha_1, alpha_2=alpha_2, lambda_1=lambda_1, lambda_2=lambda_2)

def get_errors(model, data:np.ndarray, target:np.ndarray) -> None:
    """
    Displays mean absolute error, mean squared error, median absolute error for a specific regression model,
    for a given data set (scikit-learn reports them as negative scores)
    param model: the regression model
    param data: the features of the dataset
    param target: the dataset target
    """
    neg_mean_abs_err:np.ndarray = cross_val_score(model, data, target, scoring='neg_mean_absolute_error', cv=5)
    neg_mean_sqr_err:np.ndarray = cross_val_score(model, data, target, scoring='neg_mean_squared_error', cv=5)
    neg_median_abs_err:np.ndarray = cross_val_score(model, data, target, scoring='neg_median_absolute_error', cv=5)
    print(f'Negative mean absolute errors for {model} are {neg_mean_abs_err}')
    print(f'Negative mean squared errors for {model} are {neg_mean_sqr_err}')
    print(f'Negative median absolute errors for {model} are {neg_median_abs_err}')

def grid_search(model, param_grid:Dict[str,List], data:np.ndarray, target:np.ndarray) -> GridSearchCV:
    """
    Searches for the optimal hyperparameter values of a chosen regression model, for a given data set
    param model: the regression model
    param param_grid: dictionary with parameter names as keys and lists of candidate values as values
    param data: the features of the dataset
    param target: the dataset target
    return: the fitted GridSearchCV object
    """
    grid_src = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=3, return_train_score=True)
    grid_src.fit(data, target)
    
    return grid_src
    
def randomized_search(model, param_distributions:Dict[str,List], data:np.ndarray, target:np.ndarray) -> RandomizedSearchCV:
    """
    Searches for the optimal hyperparameter values of a chosen regression model, for a given data set
    param model: the regression model
    param param_distributions: dictionary with parameter names as keys and lists of candidate values as values
    param data: the features of the dataset
    param target: the dataset target
    return: the fitted RandomizedSearchCV object
    """
    randomized_src = RandomizedSearchCV(estimator=model, param_distributions=param_distributions, n_iter=6, scoring='neg_mean_squared_error', cv=3, return_train_score=True)
    randomized_src.fit(data, target)
    
    return randomized_src

def get_mean_values(model, data:np.ndarray, target:np.ndarray) -> None:
    """
    Displays the averages of the results for both training and testing folds
    (scikit-learn reports them as negative scores)
    param model: the chosen model
    param data: the features of the dataset
    param target: the dataset target
    """
    res_mean_abs_err:Dict[str,np.ndarray] = cross_validate(model, data, target, cv=5, return_train_score=True, scoring='neg_mean_absolute_error')
    res_mean_sqr_err:Dict[str,np.ndarray] = cross_validate(model, data, target, cv=5, return_train_score=True, scoring='neg_mean_squared_error')
    res_median_abs_err:Dict[str,np.ndarray] = cross_validate(model, data, target, cv=5, return_train_score=True, scoring='neg_median_absolute_error')
    print('The mean absolute error for train scores is ', res_mean_abs_err['train_score'].mean())
    print('The mean absolute error for test scores is ', res_mean_abs_err['test_score'].mean())
    print('The mean squared error for train scores is ', res_mean_sqr_err['train_score'].mean())
    print('The mean squared error for test scores is ', res_mean_sqr_err['test_score'].mean())
    print('The median absolute error for train scores is ', res_median_abs_err['train_score'].mean())
    print('The median absolute error for test scores is ', res_median_abs_err['test_score'].mean())
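
The grids in the cell above list a narrow, linear range for `alpha` (1.0 to 1.5). As an alternative sketch, `RandomizedSearchCV` also accepts distributions, so `alpha` can be sampled on a log scale. The data here is synthetic and the bounds `1e-3` and `1e2` are arbitrary assumptions; `copy_X` is left out because it only controls whether the input array is copied.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for one of the datasets
X, y = make_regression(n_samples=120, n_features=6, noise=10.0, random_state=0)

# A distribution for alpha (sampled on a log scale) and a list for the flag
param_distributions = {
    'alpha': loguniform(1e-3, 1e2),
    'fit_intercept': [True, False],
}

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions=param_distributions,
    n_iter=6,                       # number of sampled candidates
    scoring='neg_mean_squared_error',
    cv=3,
    random_state=0,                 # makes the sampled candidates reproducible
)
search.fit(X, y)

best_alpha = search.best_params_['alpha']
print(search.best_params_, -search.best_score_)
```

With lists only, random search draws from the same finite set as grid search; a continuous distribution lets the 6 iterations explore values the grid would never contain.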
    

In [14]:
linear_reg = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=1.0)
bayesian_ridge = BayesianRidge()
elastic_net = ElasticNet(random_state=0)

In [15]:
computer_hardware:str = './data/machine.data'
names_computer_hardware:List[str] = ['vendor_name', 'model_name', 'myct', 'mmin', 'mmax', 'cach', 'chmin', 'chmax', 'prp', 'erp']
computer_hardware_data = pd.read_csv(computer_hardware, names=names_computer_hardware)
computer_hardware_data = computer_hardware_data.values[:, 2:-1]

X1:np.ndarray = computer_hardware_data[:, :-1]
y1:np.ndarray = computer_hardware_data[:, -1]

linear_reg.fit(X1, y1)
ridge.fit(X1, y1)
lasso.fit(X1,y1)
bayesian_ridge.fit(X1,y1)
elastic_net.fit(X1,y1)

get_errors(ridge, X1, y1)
grid_search(ridge, lasso_ridge_elastic_params(), X1, y1)
randomized_search(ridge, lasso_ridge_elastic_params(), X1, y1)
get_mean_values(ridge, X1, y1)

Negative mean absolute errors for Ridge() are [-61.3399541  -31.95820468 -28.01434149 -35.29783247 -60.26591349]
Negative mean squared errors for Ridge() are [ -7132.22717357  -2313.54305212  -1501.78912702  -2325.37181398
 -18642.63089465]
Negative median absolute errors for Ridge() are [-42.57144874 -22.57109952 -22.08688641 -23.87018445 -24.16193043]
The mean absolute error for train scores is  -36.69557755818259
The mean absolute error for test scores is  -43.37524924729828
The mean squared error for train scores is  -3243.698631843855
The mean squared error for test scores is  -6383.112412268875
The median absolute error for train scores is  -25.581468311412646
The median absolute error for test scores is  -27.052309909141457


In [32]:
print(y2)

['N', 'P', 'N', 'N', 'N', ..., 'P', 'P', 'N', 'P', 'P']
Length: 506
Categories (2, object): ['N', 'P']


In [34]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
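# version=2 of 'boston' on OpenML has a binarized target ('N'/'P'), so it is label-encoded to 0/1 below
boston_housing = fetch_openml(name='boston', version=2, parser='auto')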
X2:np.ndarray = boston_housing.data.values
y2_not_encoded:np.ndarray = boston_housing.target.values
y2 = label_encoder.fit_transform(y2_not_encoded)
    
linear_reg.fit(X2, y2)
ridge.fit(X2, y2)
lasso.fit(X2,y2)
bayesian_ridge.fit(X2,y2)
elastic_net.fit(X2,y2)

get_errors(linear_reg, X2, y2)
grid_search(linear_reg, lin_reg_params(), X2, y2)
randomized_search(linear_reg, lin_reg_params(), X2, y2)
get_mean_values(linear_reg, X2, y2)

Negative mean absolute errors for LinearRegression() are [-0.32463162 -0.22774819 -0.29398559 -0.36434586 -0.32803842]
Negative mean squared errors for LinearRegression() are [-0.14117549 -0.08174485 -0.13988585 -0.20216243 -0.13367833]
Negative median absolute errors for LinearRegression() are [-0.29534526 -0.18243383 -0.255877   -0.35681389 -0.32967472]
The mean absolute error for train scores is  -0.2734108950722879
The mean absolute error for test scores is  -0.3077499343998152
The mean squared error for train scores is  -0.11105430219797721
The mean squared error for test scores is  -0.13972938930803822
The median absolute error for train scores is  -0.24264357312890433
The median absolute error for test scores is  -0.2840289409390695


In [40]:
breast_cancer = load_breast_cancer()
X3:np.ndarray = breast_cancer.data
y3:np.ndarray = breast_cancer.target

linear_reg.fit(X3, y3)
ridge.fit(X3, y3)
lasso.fit(X3,y3)
bayesian_ridge.fit(X3,y3)
elastic_net.fit(X3,y3)

get_errors(lasso, X3, y3)
grid_search(lasso, lasso_ridge_elastic_params(), X3, y3)
randomized_search(lasso, lasso_ridge_elastic_params(), X3, y3)
get_mean_values(lasso, X3, y3)

Negative mean absolute errors for Lasso() are [-0.33221043 -0.28178556 -0.22624776 -0.24051973 -0.23382699]
Negative mean squared errors for Lasso() are [-0.1791947  -0.12221632 -0.08211334 -0.08168521 -0.09210719]
Negative median absolute errors for Lasso() are [-0.22588341 -0.21865268 -0.18394417 -0.20790057 -0.19130003]
The mean absolute error for train scores is  -0.25245518215654095
The mean absolute error for test scores is  -0.2629180933980965
The mean squared error for train scores is  -0.1025975088287813
The mean squared error for test scores is  -0.111463351408519
The median absolute error for train scores is  -0.20084023571611875
The median absolute error for test scores is  -0.2055361744872072


In [41]:
attributes = pd.read_csv('./data/attributes.csv', sep=r'\s+')
communities_crimes = pd.read_csv('./data/communities.data', names = attributes['attributes'])
communities_crimes.head()

Unnamed: 0,state,county,community,communityname,fold,population,householdsize,racepctblack,racePctWhite,racePctAsian,...,LandArea,PopDens,PctUsePubTrans,PolicCars,PolicOperBudg,LemasPctPolicOnPatr,LemasGangUnitDeploy,LemasPctOfficDrugUn,PolicBudgPerPop,ViolentCrimesPerPop
0,8,?,?,Lakewoodcity,1,0.19,0.33,0.02,0.9,0.12,...,0.12,0.26,0.2,0.06,0.04,0.9,0.5,0.32,0.14,0.2
1,53,?,?,Tukwilacity,1,0.0,0.16,0.12,0.74,0.45,...,0.02,0.12,0.45,?,?,?,?,0.0,?,0.67
2,24,?,?,Aberdeentown,1,0.0,0.42,0.49,0.56,0.17,...,0.01,0.21,0.02,?,?,?,?,0.0,?,0.43
3,34,5,81440,Willingborotownship,1,0.04,0.77,1.0,0.08,0.12,...,0.02,0.39,0.28,?,?,?,?,0.0,?,0.12
4,42,95,6096,Bethlehemtownship,1,0.01,0.55,0.02,0.95,0.09,...,0.04,0.09,0.02,?,?,?,?,0.0,?,0.03


In [42]:
communities_crimes = communities_crimes.drop(columns=['state','county','community','communityname','fold'])
communities_crimes = communities_crimes.replace('?', np.nan)
communities_crimes = communities_crimes.dropna(axis=1)
communities_crimes.head()

Unnamed: 0,population,householdsize,racepctblack,racePctWhite,racePctAsian,racePctHisp,agePct12t21,agePct12t29,agePct16t24,agePct65up,...,PctForeignBorn,PctBornSameState,PctSameHouse85,PctSameCity85,PctSameState85,LandArea,PopDens,PctUsePubTrans,LemasPctOfficDrugUn,ViolentCrimesPerPop
0,0.19,0.33,0.02,0.9,0.12,0.17,0.34,0.47,0.29,0.32,...,0.12,0.42,0.5,0.51,0.64,0.12,0.26,0.2,0.32,0.2
1,0.0,0.16,0.12,0.74,0.45,0.07,0.26,0.59,0.35,0.27,...,0.21,0.5,0.34,0.6,0.52,0.02,0.12,0.45,0.0,0.67
2,0.0,0.42,0.49,0.56,0.17,0.04,0.39,0.47,0.28,0.32,...,0.14,0.49,0.54,0.67,0.56,0.01,0.21,0.02,0.0,0.43
3,0.04,0.77,1.0,0.08,0.12,0.1,0.51,0.5,0.34,0.21,...,0.19,0.3,0.73,0.64,0.65,0.02,0.39,0.28,0.0,0.12
4,0.01,0.55,0.02,0.95,0.09,0.05,0.38,0.38,0.23,0.36,...,0.11,0.72,0.64,0.61,0.53,0.04,0.09,0.02,0.0,0.03


In [43]:
X4:pd.DataFrame = communities_crimes.drop(columns=['ViolentCrimesPerPop'])
y4:pd.Series = communities_crimes['ViolentCrimesPerPop']

linear_reg.fit(X4, y4)
ridge.fit(X4, y4)
lasso.fit(X4,y4)
bayesian_ridge.fit(X4,y4)
elastic_net.fit(X4,y4)

get_errors(bayesian_ridge, X4, y4)
grid_search(bayesian_ridge, bayesian_ridge_params(), X4, y4)
randomized_search(bayesian_ridge, bayesian_ridge_params(), X4, y4)
get_mean_values(bayesian_ridge, X4, y4)

Negative mean absolute errors for BayesianRidge() are [-0.09631268 -0.10385865 -0.08741068 -0.09242159 -0.09237512]
Negative mean squared errors for BayesianRidge() are [-0.02025571 -0.02279832 -0.01650001 -0.01613057 -0.01740948]
Negative median absolute errors for BayesianRidge() are [-0.06664378 -0.06785932 -0.05817873 -0.06864403 -0.06217283]
The mean absolute error for train scores is  -0.09123194984053748
The mean absolute error for test scores is  -0.09447574447653888
The mean squared error for train scores is  -0.017128726010436517
The mean squared error for test scores is  -0.018618818758893846
The median absolute error for train scores is  -0.06378250172302721
The median absolute error for test scores is  -0.06469973943338136


In [45]:
def grid_calculate(model, params:Dict[str,List], data:np.ndarray, target:np.ndarray, score_type:str, error_type:str) -> float:
    """
    It reports the average result over the training or the test folds, using GridSearchCV as the model
    param model: the regression model
    param params: dictionary with parameter names as keys and lists of candidate values as values
    param data: the features of the dataset
    param target: the dataset target
    param score_type: which scores to average, 'train_score' or 'test_score'
    param error_type: the scoring name, e.g. 'neg_mean_squared_error'
    return: average
    """
    grd_src = grid_search(model, params, data, target)
    # cross_validate clones and refits the search object on every outer training fold
    result:Dict[str,np.ndarray] = cross_validate(grd_src, data, target, cv=5, return_train_score=True, scoring=error_type)
    
    return result[score_type].mean()

In [46]:
def randomize_calculate(model, params:Dict[str,List], data:np.ndarray, target:np.ndarray, score_type:str, error_type:str) -> float:
    """
    It reports the average result over the training or the test folds, using RandomizedSearchCV as the model
    param model: the regression model
    param params: dictionary with parameter names as keys and lists of candidate values as values
    param data: the features of the dataset
    param target: the dataset target
    param score_type: which scores to average, 'train_score' or 'test_score'
    param error_type: the scoring name, e.g. 'neg_mean_squared_error'
    return: average
    """
    rand_src = randomized_search(model, params, data, target)
    # cross_validate clones and refits the search object on every outer training fold
    result:Dict[str,np.ndarray] = cross_validate(rand_src, data, target, cv=5, return_train_score=True, scoring=error_type)
    
    return result[score_type].mean()

In [47]:
def get_errors_score(score_type: str, error_type: str, data:np.ndarray, target:np.ndarray) -> List[float]:
    """
    Build a list of the averaged results over the training folds or the test folds,
    one entry per model and search strategy, for a data set given as a parameter
    param score_type: which scores to average, 'train_score' or 'test_score'
    param error_type: the scoring name, e.g. 'neg_mean_squared_error'
    param data: the features of the dataset
    param target: the dataset target
    return: list of averages
    """
    column:List[float] = []
    column.append(grid_calculate(linear_reg, lin_reg_params(), data, target, score_type, error_type))
    column.append(randomize_calculate(linear_reg, lin_reg_params(), data, target, score_type, error_type))
    column.append(grid_calculate(ridge, lasso_ridge_elastic_params(), data, target, score_type, error_type))  
    column.append(randomize_calculate(ridge, lasso_ridge_elastic_params(), data, target, score_type, error_type))
    column.append(grid_calculate(lasso, lasso_ridge_elastic_params(), data, target, score_type, error_type))   
    column.append(randomize_calculate(lasso, lasso_ridge_elastic_params(), data, target, score_type, error_type))
    column.append(grid_calculate(elastic_net, lasso_ridge_elastic_params(), data, target, score_type, error_type))   
    column.append(randomize_calculate(elastic_net, lasso_ridge_elastic_params(), data, target, score_type, error_type))
    column.append(grid_calculate(bayesian_ridge, bayesian_ridge_params(), data, target, score_type, error_type))
    column.append(randomize_calculate(bayesian_ridge, bayesian_ridge_params(), data, target, score_type, error_type))
    
    return column

In [48]:
def get_data_frame(data:np.ndarray, target:np.ndarray) -> Dict[str,list]:
    """
    It builds a dictionary from the columns made up of the averages of the results for the training folds,
    as well as for the test ones, for a data set given as a parameter
    param data: the features of the dataset
    param target: the dataset target
    return: the dictionary
    """
    test_neg_mean_absolute_error:List[float] = get_errors_score('test_score', 'neg_mean_absolute_error', data, target)    
    test_neg_mean_squared_error:List[float] = get_errors_score('test_score', 'neg_mean_squared_error', data, target)
    test_neg_median_absolute_error:List[float] = get_errors_score('test_score', 'neg_median_absolute_error', data, target)
    train_neg_mean_absolute_error:List[float] = get_errors_score('train_score', 'neg_mean_absolute_error', data, target)
    train_neg_mean_squared_error:List[float] = get_errors_score('train_score', 'neg_mean_squared_error', data, target)
    train_neg_median_absolute_error:List[float] = get_errors_score('train_score', 'neg_median_absolute_error', data, target)

    data_frame:Dict[str,list] = {
            'Model_name': ['Linear Regression', 'Linear Regression', 'Ridge', 'Ridge', 'Lasso', 'Lasso', 'Elastic Net', 'Elastic Net', 'Bayesian Ridge', 'Bayesian Ridge'],
            'Search_strategy': ['GridSearchCV', 'RandomizedSearchCV', 'GridSearchCV', 'RandomizedSearchCV', 'GridSearchCV', 'RandomizedSearchCV','GridSearchCV', 'RandomizedSearchCV', 'GridSearchCV', 'RandomizedSearchCV'],
            'test_neg_mean_absolute_error': test_neg_mean_absolute_error,
            'test_neg_mean_squared_error': test_neg_mean_squared_error,
            'test_neg_median_absolute_error': test_neg_median_absolute_error,
            'train_neg_mean_absolute_error': train_neg_mean_absolute_error,
            'train_neg_mean_squared_error': train_neg_mean_squared_error,
            'train_neg_median_absolute_error': train_neg_median_absolute_error,
        }
    
    return data_frame

In [49]:
data_frame = get_data_frame(X1, y1)
df = pd.DataFrame(data_frame)
display(df)

Unnamed: 0,Model_name,Search_strategy,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_neg_median_absolute_error,train_neg_mean_absolute_error,train_neg_mean_squared_error,train_neg_median_absolute_error
0,Linear Regression,GridSearchCV,-42.934476,-6257.865289,-27.414068,-36.772557,-3259.057133,-25.252351
1,Linear Regression,RandomizedSearchCV,-42.934476,-6257.865289,-27.414068,-36.772557,-3259.057133,-25.203523
2,Ridge,GridSearchCV,-43.373745,-6382.840842,-27.05157,-36.695529,-3243.698658,-25.581415
3,Ridge,RandomizedSearchCV,-43.374611,-6566.024479,-27.051601,-36.695529,-3285.893393,-25.581402
4,Lasso,GridSearchCV,-43.218771,-6350.585413,-26.929644,-36.679619,-3243.821918,-25.639825
5,Lasso,RandomizedSearchCV,-43.220251,-7185.831897,-26.925602,-36.679731,-3243.814645,-25.644941
6,Elastic Net,GridSearchCV,-42.9932,-6320.474251,-26.840256,-36.681003,-3244.065986,-25.629326
7,Elastic Net,RandomizedSearchCV,-43.04097,-6317.699927,-26.83653,-36.681738,-3244.098581,-25.620818
8,Bayesian Ridge,GridSearchCV,-42.165144,-6109.277929,-26.74385,-36.717201,-3281.178498,-25.581158
9,Bayesian Ridge,RandomizedSearchCV,-42.165144,-6109.277929,-26.743849,-36.717201,-3281.178497,-25.581158


In [50]:
def get_positive_data_frame(data_frame: Dict[str,List]) -> Dict[str,List]:
    """
    It builds a dictionary with positive values from the columns made up of the averages of the results
    for the training folds, as well as for the testing ones, and removes '_neg' from the column names
    param data_frame: the dictionary of columns holding the negative scores
    return: the dictionary
    """
    pos_data_frame:Dict[str,List] = {}
    lst:List = []
    for key in data_frame:
        for value in data_frame[key]:
            if isinstance(value, float):
                lst.append(abs(value))
            else:
                lst.append(value)
        pos_data_frame[key.replace("_neg","")] = lst
        lst = []
        
    return pos_data_frame

In [51]:
positive_data_frame:Dict[str,List] = get_positive_data_frame(data_frame)
pos_df = pd.DataFrame(positive_data_frame)
display(pos_df)

Unnamed: 0,Model_name,Search_strategy,test_mean_absolute_error,test_mean_squared_error,test_median_absolute_error,train_mean_absolute_error,train_mean_squared_error,train_median_absolute_error
0,Linear Regression,GridSearchCV,42.934476,6257.865289,27.414068,36.772557,3259.057133,25.252351
1,Linear Regression,RandomizedSearchCV,42.934476,6257.865289,27.414068,36.772557,3259.057133,25.203523
2,Ridge,GridSearchCV,43.373745,6382.840842,27.05157,36.695529,3243.698658,25.581415
3,Ridge,RandomizedSearchCV,43.374611,6566.024479,27.051601,36.695529,3285.893393,25.581402
4,Lasso,GridSearchCV,43.218771,6350.585413,26.929644,36.679619,3243.821918,25.639825
5,Lasso,RandomizedSearchCV,43.220251,7185.831897,26.925602,36.679731,3243.814645,25.644941
6,Elastic Net,GridSearchCV,42.9932,6320.474251,26.840256,36.681003,3244.065986,25.629326
7,Elastic Net,RandomizedSearchCV,43.04097,6317.699927,26.83653,36.681738,3244.098581,25.620818
8,Bayesian Ridge,GridSearchCV,42.165144,6109.277929,26.74385,36.717201,3281.178498,25.581158
9,Bayesian Ridge,RandomizedSearchCV,42.165144,6109.277929,26.743849,36.717201,3281.178497,25.581158


In [52]:
style = pos_df.style.\
    highlight_max(color = 'green', axis = 0).\
    highlight_min(color = 'red', axis = 0)

style

Unnamed: 0,Model_name,Search_strategy,test_mean_absolute_error,test_mean_squared_error,test_median_absolute_error,train_mean_absolute_error,train_mean_squared_error,train_median_absolute_error
0,Linear Regression,GridSearchCV,42.934476,6257.865289,27.414068,36.772557,3259.057133,25.252351
1,Linear Regression,RandomizedSearchCV,42.934476,6257.865289,27.414068,36.772557,3259.057133,25.203523
2,Ridge,GridSearchCV,43.373745,6382.840842,27.05157,36.695529,3243.698658,25.581415
3,Ridge,RandomizedSearchCV,43.374611,6566.024479,27.051601,36.695529,3285.893393,25.581402
4,Lasso,GridSearchCV,43.218771,6350.585413,26.929644,36.679619,3243.821918,25.639825
5,Lasso,RandomizedSearchCV,43.220251,7185.831897,26.925602,36.679731,3243.814645,25.644941
6,Elastic Net,GridSearchCV,42.9932,6320.474251,26.840256,36.681003,3244.065986,25.629326
7,Elastic Net,RandomizedSearchCV,43.04097,6317.699927,26.83653,36.681738,3244.098581,25.620818
8,Bayesian Ridge,GridSearchCV,42.165144,6109.277929,26.74385,36.717201,3281.178498,25.581158
9,Bayesian Ridge,RandomizedSearchCV,42.165144,6109.277929,26.743849,36.717201,3281.178497,25.581158


In [66]:
head_title = 'CPU Computer Hardware'
print(style.to_html())

<style type="text/css">
#T_77134_row0_col1, #T_77134_row1_col7, #T_77134_row2_col1, #T_77134_row2_col6, #T_77134_row4_col1, #T_77134_row4_col5, #T_77134_row6_col1, #T_77134_row8_col0, #T_77134_row8_col1, #T_77134_row8_col2, #T_77134_row8_col3, #T_77134_row9_col0, #T_77134_row9_col3, #T_77134_row9_col4 {
  background-color: red;
}
#T_77134_row0_col4, #T_77134_row0_col5, #T_77134_row1_col1, #T_77134_row1_col4, #T_77134_row1_col5, #T_77134_row2_col0, #T_77134_row3_col0, #T_77134_row3_col1, #T_77134_row3_col2, #T_77134_row3_col6, #T_77134_row5_col1, #T_77134_row5_col3, #T_77134_row5_col7, #T_77134_row7_col1, #T_77134_row9_col1 {
  background-color: green;
}
</style>
<table id="T_77134">
  <thead>
    <tr>
      <th class="blank level0" >&nbsp;</th>
      <th id="T_77134_level0_col0" class="col_heading level0 col0" >Model_name</th>
      <th id="T_77134_level0_col1" class="col_heading level0 col1" >Search_strategy</th>
      <th id="T_77134_level0_col2" class="col_heading level0 col2" >test