## Regression Problems

<div style="border-bottom: 3px solid black; margin-bottom:5px"></div>
<div style="border-bottom: 3px solid black"></div>

# 1. Importing Required Libraries

The code block below will import all the libraries and dependencies for regression problems.

**Run the code cell below** 

In [1]:
from scipy.io import arff
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.svm               # For SVR
import sklearn.metrics           # For mean_squared_error and r2_score
import sklearn.model_selection   # For GridSearchCV and RandomizedSearchCV
import scipy
import scipy.stats               # For reciprocal distribution
import warnings
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor,RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error
warnings.filterwarnings("ignore", category=DeprecationWarning)  # Ignore sklearn deprecation warnings
warnings.filterwarnings("ignore", category=FutureWarning)       # Ignore sklearn future-behavior warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)

<div style="border-bottom: 3px solid black; margin-bottom:5px"></div>
<div style="border-bottom: 3px solid black"></div>

# 2. Loading All Datasets

The code block below will load all the datasets for regression problems.


**Dataset Mapper**
1. Wine Quality -> RP_1
2. Communities and Crime -> RP_2
3. QSAR aquatic toxicity -> RP_3
4. Parkinson Speech -> RP_4
5. Facebook metrics -> RP_5
6. Bike Sharing (use hour data) -> RP_6
7. Student Performance (use just student-por.csv if you do not know how to merge the math grades) -> RP_7
8. Concrete Compressive Strength -> RP_8
9. SGEMM GPU kernel performance -> RP_9
10. Merck Molecular Activity Challenge (from Kaggle) -> RP_10

In [2]:
np.random.seed(23)
M1Tr=[]
M1Te=[]
M2Tr=[]
M2Te=[]
MList=['SVM with RBF Kernel','SVM with Linear Kernel','Decision Tree Regressor','Random Forest Regressor','AdaBoost Regressor','Gaussian Process Regressor','Linear Regressor','Ridge Regressor']

"""
Splits the data into Features (X) and Labels (y)
"""
def splitData(data):
    X = data.iloc[:,0:len(data.columns)-1]
    y = data.iloc[:,-1]
    return X,y

"""
Splits data into Training Set and Testing Set. 
Size Ratio of Train:Test is 70:30 
"""
def getTrainTestData(data):
    X,y = splitData(data)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X,y,test_size=0.3,random_state=23)
    return X_train, X_test, y_train, y_test
    

"""
Converts categorical features by encoding
"""
def convertCategorical(df):
    categorical_feature_mask = df.dtypes==object
    categorical_cols = df.columns[categorical_feature_mask].tolist()
    le = LabelEncoder()
    df[categorical_cols] = df[categorical_cols].apply(lambda col: le.fit_transform(col))
    return df

def minMax(x):
    return pd.Series(index=['min','max'],data=[x.min(),x.max()])
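
To make the behavior of `convertCategorical` and `minMax` concrete, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and do not come from any of the datasets. `LabelEncoder` assigns integer codes in sorted label order, so the encoding imposes an arbitrary ordering on the categories.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: one categorical (object) column and one numeric column.
toy = pd.DataFrame({
    "weekday": pd.Series(["mon", "tue", "mon", "sun"], dtype=object),
    "likes": [10, 25, 7, 40],
})

# Same idea as convertCategorical: label-encode every object-typed column.
categorical_cols = toy.columns[toy.dtypes == object].tolist()
for col in categorical_cols:
    toy[col] = LabelEncoder().fit_transform(toy[col])

# Same idea as minMax: per-column minimum and maximum.
summary = toy.apply(lambda x: pd.Series(index=["min", "max"], data=[x.min(), x.max()]))

print(toy)
print(summary)
```

The `summary` frame is what the commented-out `print(RP_n.apply(minMax))` lines in the loading cell would show for each dataset.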


In [3]:
# Wine Quality Data | 11 Features | 6497 Samples
RP_1_red = pd.read_csv("RP_Data/winequality-red.csv", sep=";",header=None,skiprows=1)
RP_1_white = pd.read_csv("RP_Data/winequality-white.csv", sep=";",header=None,skiprows=1)
RP_1 = pd.concat([RP_1_red,RP_1_white])
# print(RP_1.apply(minMax))
RP_1_X_train, RP_1_X_test, RP_1_y_train, RP_1_y_test = getTrainTestData(RP_1)
scaler = StandardScaler().fit(RP_1_X_train)
RP_1_X_train = scaler.transform(RP_1_X_train)
RP_1_X_test = scaler.transform(RP_1_X_test)

# Communities and Crime Data| 127 Features | 1994 Samples 
RP_2 = pd.read_csv('RP_Data/communities.data',header=None)
RP_2 = RP_2.drop([0, 1, 2, 3, 4], axis=1) # removed first 5 columns. non predictive
RP_2 = RP_2.replace('?',0)
# print(RP_2.apply(minMax))
RP_2_X_train, RP_2_X_test, RP_2_y_train, RP_2_y_test = getTrainTestData(RP_2)


# QSAR Aquatic Toxicity Data | 8 Features | 546 Samples
RP_3 = pd.read_csv("RP_Data/qsar_aquatic_toxicity.csv", sep=";",header=None)
# print(RP_3.apply(minMax))
RP_3_X_train, RP_3_X_test, RP_3_y_train, RP_3_y_test = getTrainTestData(RP_3)
scaler = StandardScaler().fit(RP_3_X_train)
RP_3_X_train = scaler.transform(RP_3_X_train)
RP_3_X_test = scaler.transform(RP_3_X_test)


# Parkinson Speech Data | 14 Features | 690 Samples
RP_4 = pd.read_csv("RP_Data/parkinsons_train_data.txt", sep=",",header=None)
RP_4 = RP_4.drop([28], axis=1) # removed last column. categorical output
# print(RP_4_train.shape)
# print(RP_4_test.shape)
# print(RP_4_train[28].nunique())
# print(RP_4_test[27].nunique())
RP_4_X_train, RP_4_X_test, RP_4_y_train, RP_4_y_test = getTrainTestData(RP_4)
scaler = StandardScaler().fit(RP_4_X_train)
RP_4_X_train = scaler.transform(RP_4_X_train)
RP_4_X_test = scaler.transform(RP_4_X_test)




# Facebook Data | 18 Features | 500 Samples 
RP_5 = pd.read_csv("RP_Data/dataset_Facebook.csv", sep=";")
RP_5 = convertCategorical(RP_5)
RP_5  = RP_5.fillna(RP_5.mean())
# print(RP_5.apply(minMax))
RP_5_X_train, RP_5_X_test, RP_5_y_train, RP_5_y_test = getTrainTestData(RP_5)
scaler = StandardScaler().fit(RP_5_X_train)
RP_5_X_train = scaler.transform(RP_5_X_train)
RP_5_X_test = scaler.transform(RP_5_X_test)


# Bike Sharing Hours Data | 16 Features | 17379 Samples 
RP_6 = pd.read_csv("RP_Data/hour.csv", sep=",")
RP_6 = RP_6.drop(['instant'], axis=1) # removed the record index column. non predictive
RP_6 = convertCategorical(RP_6)
# print(RP_6.apply(minMax))
RP_6_X_train, RP_6_X_test, RP_6_y_train, RP_6_y_test = getTrainTestData(RP_6)
scaler = StandardScaler().fit(RP_6_X_train)
RP_6_X_train = scaler.transform(RP_6_X_train)
RP_6_X_test = scaler.transform(RP_6_X_test)


# Student Performance Data | 32 Features | 649 Samples
RP_7 = pd.read_csv("RP_Data/student-por.csv", sep=";")
RP_7 = convertCategorical(RP_7)
# print(RP_7.apply(minMax))
RP_7_X_train, RP_7_X_test, RP_7_y_train, RP_7_y_test = getTrainTestData(RP_7)


# Concrete Data | 8 Features | 1030 Samples
RP_8 =  pd.read_excel (r'RP_Data/Concrete_Data.xls',skiprows=1,header=None)
# print(RP_8.apply(minMax))
RP_8_X_train, RP_8_X_test, RP_8_y_train, RP_8_y_test = getTrainTestData(RP_8)
scaler = StandardScaler().fit(RP_8_X_train)
RP_8_X_train = scaler.transform(RP_8_X_train)
RP_8_X_test = scaler.transform(RP_8_X_test)


# SGEMM GPU kernel performance Data | 14 Features | 241600 Samples 
RP_9 = pd.read_csv("RP_Data/sgemm_product.csv", sep=",")
RP_9['Run (ms)'] = RP_9.iloc[:, -4:].sum(axis=1)/4 #add column of average run of 4 runs
RP_9 = RP_9.drop(['Run1 (ms)','Run2 (ms)','Run3 (ms)','Run4 (ms)'], axis = 1)  #drop 4 runs column
# print(RP_9.apply(minMax))
RP_9_X_train, RP_9_X_test, RP_9_y_train, RP_9_y_test = getTrainTestData(RP_9)
scaler = StandardScaler().fit(RP_9_X_train)
RP_9_X_train = scaler.transform(RP_9_X_train)
RP_9_X_test = scaler.transform(RP_9_X_test)


#Merck Molecular Dataset 1 | 5877 Features | 8716 Samples 
npzfile = np.load('RP_Data/File1.npz')
RP_101_X = npzfile['arr_0']
RP_101_y = npzfile['arr_1']
RP_101_X_train, RP_101_X_test, RP_101_y_train, RP_101_y_test = sklearn.model_selection.train_test_split(RP_101_X,RP_101_y,test_size=0.3,random_state=23)


#Merck Molecular Dataset 2 | 4306 Features | w Samples 
npzfile = np.load('RP_Data/File2.npz')
RP_102_X = npzfile['arr_0']
RP_102_y = npzfile['arr_1']
RP_102_X_train, RP_102_X_test, RP_102_y_train, RP_102_y_test = sklearn.model_selection.train_test_split(RP_102_X,RP_102_y,test_size=0.3,random_state=23)

print('Regression Data Loaded Successfully.')

Regression Data Loaded Successfully.
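
Most datasets above follow the same pattern: split 70:30, fit a `StandardScaler` on the training split only, then apply it to both splits. The sketch below repeats that pattern on synthetic data (the arrays are made up for illustration). Fitting the scaler on the training split keeps test-set statistics out of preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a dataset: 200 samples, 3 features.
rng = np.random.default_rng(23)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = 0.5 * X[:, 0] + rng.normal(size=200)

# 70:30 split with the same seed the notebook uses.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

# Fit the scaler on the training split only, then transform both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
print("train mean:", X_train_scaled.mean(axis=0).round(6))
print("test mean: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training features have zero mean and unit variance by construction; the scaled test features are only approximately standardized, which is expected.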


In [4]:
# # Merck Molecular Activity Challenge | RAW FILE LOADING AND CACHING PROCESS | Given By The Professor
# with open("ACT2_competition_training.csv") as f:
#     cols = f.readline().rstrip('\n').split(',') # Read the header line and get list of column names

# # Load the actual data, ignoring first column and using second column as targets.
# RP_101_X = np.loadtxt("ACT2_competition_training.csv", delimiter=',', usecols=range(2, len(cols)), skiprows=1, dtype=np.uint8)
# RP_101_y = np.loadtxt("ACT2_competition_training.csv", delimiter=',', usecols=[1], skiprows=1)
# np.savez('File1',RP_101_X,RP_101_y)

# with open("ACT4_competition_training.csv") as f:
#     cols = f.readline().rstrip('\n').split(',') # Read the header line and get list of column names

# # Load the actual data, ignoring first column and using second column as targets.
# RP_102_X = np.loadtxt("ACT4_competition_training.csv", delimiter=',', usecols=range(2, len(cols)), skiprows=1, dtype=np.uint8)
# RP_102_y = np.loadtxt("ACT4_competition_training.csv", delimiter=',', usecols=[1], skiprows=1)
# np.savez('File2',RP_102_X,RP_102_y)

# print('Done')
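
The caching step above relies on `np.savez` storing positional arrays under the keys `arr_0`, `arr_1`, and so on, which is why the loading cell reads `npzfile['arr_0']` and `npzfile['arr_1']`. A minimal round-trip sketch with small made-up arrays and a temporary file (the file name `cache_demo` is hypothetical):

```python
import os
import tempfile
import numpy as np

# Small stand-ins for the Merck feature matrix and target vector.
X = np.arange(12, dtype=np.uint8).reshape(4, 3)
y = np.array([0.1, 0.2, 0.3, 0.4])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache_demo")
    np.savez(path, X, y)                 # writes cache_demo.npz
    with np.load(path + ".npz") as npzfile:
        keys = sorted(npzfile.files)     # positional arrays get arr_0, arr_1, ...
        X_loaded = npzfile["arr_0"]
        y_loaded = npzfile["arr_1"]

print(keys, X_loaded.dtype, y_loaded.dtype)
```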

<div style="border-bottom: 3px solid black; margin-bottom:5px"></div>
<div style="border-bottom: 3px solid black"></div>

# 3. Regressors and Hyper Parameter Search Helper Methods

<div style="border-bottom: 3px solid black"></div>

## Hyper Parameter Search Helper Methods

##### Support Vector Regressor Parameter Search

In [5]:
def gridSearchSVR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        'C': [0.001, 0.01, 0.1, 1, 10],
        'gamma' : [0.001, 0.01, 0.1, 1]
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_

def randomSearchSVR(model,X_train,y_train):
    print('Randomized Search')
    param_distributions = {
        'C'     : scipy.stats.reciprocal(1.0, 1000.),
        'gamma' : scipy.stats.reciprocal(0.01, 10.),
    }
    random_search = sklearn.model_selection.RandomizedSearchCV(model, param_distributions,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(random_search.best_params_)
    return random_search.best_estimator_
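
The randomized search draws `C` and `gamma` from `scipy.stats.reciprocal`, a log-uniform distribution, so each order of magnitude is sampled about equally often. The sketch below checks this empirically for the `C` range used above; the sample size of 10000 is an arbitrary choice for illustration.

```python
import numpy as np
import scipy.stats

# Same distribution as the 'C' entry in randomSearchSVR.
dist = scipy.stats.reciprocal(1.0, 1000.0)
samples = dist.rvs(size=10000, random_state=23)

# Count how many samples fall in each decade of the range.
decades = [(1, 10), (10, 100), (100, 1000)]
decade_counts = [int(np.sum((samples >= lo) & (samples < hi))) for lo, hi in decades]

for (lo, hi), count in zip(decades, decade_counts):
    print("[%d, %d): %d samples" % (lo, hi, count))
```

Each decade receives roughly a third of the draws, whereas a uniform distribution over the same range would put about 90% of them in the top decade.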

##### Decision Tree Regressor Parameter Search

In [6]:
def gridSearchDTR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
        'max_features': ['auto', 'sqrt'],
        'min_samples_leaf': [1, 2, 4],
        'min_samples_split': [2, 5, 10],
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_
    
def randomSearchDTR(model,X_train,y_train):
    print('Randomized Search')
    param_grid = {
        'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
        'max_features': ['auto', 'sqrt'],
        'min_samples_leaf': [1, 2, 4],
        'min_samples_split': [2, 5, 10],
    }
    random_search = sklearn.model_selection.RandomizedSearchCV(model, param_grid,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(random_search.best_params_)
    return random_search.best_estimator_

##### Random Forest Regressor Parameter Search

In [7]:

def gridSearchRFR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        'bootstrap': [True, False],
        'max_depth': [10, 20, None],
        'max_features': ['auto', 'sqrt'],
        'min_samples_leaf': [1, 2, 4],
        'min_samples_split': [2, 5, 10],
        'n_estimators': [10,20,50,100,150,200,250]
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_
    

def randomSearchRFR(model,X_train,y_train):
    print('Randomized Search')
    param_grid = {
        'bootstrap': [True, False],
        'max_depth': [10, 20, None],
        'max_features': ['auto', 'sqrt'],
        'min_samples_leaf': [1, 2, 4],
        'min_samples_split': [2, 5, 10],
        'n_estimators': [10,20,50,100,150,200,250]
    }
    random_search = sklearn.model_selection.RandomizedSearchCV(model, param_grid,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(random_search.best_params_)
    return random_search.best_estimator_
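
To see why the randomized variant is much cheaper than the exhaustive one for this grid, the candidate count can be computed with `ParameterGrid`. This is a sketch that only counts model fits, excluding the final refit on the full training set; it does not train anything.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as gridSearchRFR / randomSearchRFR above.
param_grid = {
    'bootstrap': [True, False],
    'max_depth': [10, 20, None],
    'max_features': ['auto', 'sqrt'],
    'min_samples_leaf': [1, 2, 4],
    'min_samples_split': [2, 5, 10],
    'n_estimators': [10, 20, 50, 100, 150, 200, 250],
}

cv = 5
n_candidates = len(ParameterGrid(param_grid))   # 2 * 3 * 2 * 3 * 3 * 7
grid_fits = n_candidates * cv                   # exhaustive search
random_fits = 30 * cv                           # n_iter=30 in the randomized search

print(n_candidates, grid_fits, random_fits)
```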

##### AdaBoost Regressor Parameter Search

In [8]:
def gridSearchABR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        'n_estimators': [10,20,50,100,150,200,500],
        'learning_rate': [0.001,0.01,0.05,0.1,0.2,0.5,0.6,0.9,1,1.2,1.5,2],
        'loss' : ['linear', 'square', 'exponential']
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_
    
def randomSearchABR(model,X_train,y_train):
    print('Randomized Search')
    param_distributions = {
        'n_estimators': scipy.stats.randint(10,500),
        'learning_rate': scipy.stats.reciprocal(0.001, 1.5),
         'loss' : ['linear', 'square', 'exponential']
    }
    random_search = sklearn.model_selection.RandomizedSearchCV(model, param_distributions,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(random_search.best_params_)
    return random_search.best_estimator_
    

##### Gaussian Process Regressor Parameter Search

In [9]:
def gridSearchGPR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        "alpha": np.logspace(-5, 5, 50)
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_
    
def randomSearchGPR(model,X_train,y_train):
    print('Randomized Search')
    param_distributions = {
        "alpha": np.logspace(-5, 5, 50)
    }
    random_search = sklearn.model_selection.RandomizedSearchCV(model, param_distributions,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(random_search.best_params_)
    return random_search.best_estimator_
    

##### Ridge Regressor Parameter Search

In [10]:
def gridSearchRIR(model,X_train,y_train):
    print('Grid Search')
    param_grid = {
        'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg'],
        'alpha': np.logspace(-5, 5, 50),
        'normalize' : [True, False]
    }
    gridcv = GridSearchCV(model, param_grid, cv=5).fit(X_train,y_train)
    print(gridcv.best_params_)
    return gridcv.best_estimator_
    
def randomSearchRIR(model,X_train,y_train):
    print('Randomized Search')
    param_distributions = {
        'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg'],
        'alpha': scipy.stats.reciprocal(0.00001, 100.),
        'normalize' : [True, False]
    
    }
    randcv = sklearn.model_selection.RandomizedSearchCV(model, param_distributions,cv=5, n_iter=30, random_state=23).fit(X_train,y_train)
    print(randcv.best_params_)
    return randcv.best_estimator_
    

In [11]:
def scoreHelper(rgr, X_train, X_test, y_train, y_test, parity):
    if parity == 1:
        print('Pearson Correlation Calculated and Added')
        M1Tr.append(pearsonr(y_train,rgr.predict(X_train))[0]**2)
        M1Te.append(pearsonr(y_test,rgr.predict(X_test))[0]**2)
    elif parity == 2:
        print('Pearson Correlation Calculated and Added')
        M2Tr.append(pearsonr(y_train,rgr.predict(X_train))[0]**2)
        M2Te.append(pearsonr(y_test,rgr.predict(X_test))[0]**2)
    elif parity == 3:
        print('Training Mean Squared Error ',mean_squared_error(y_train,rgr.predict(X_train)))
        print('Testing Mean Squared Error ',mean_squared_error(y_test,rgr.predict(X_test)))
    else:
        print('Training R2 Score ',rgr.score(X_train, y_train))
        print('Testing R2 Score ',rgr.score(X_test, y_test))
        
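def merckScore(x,y):
    # Average the squared Pearson correlations of the two Merck datasets, model by model.
    # Assumes hs=False, so each model in MList contributed exactly one score per list.
    for i in range(len(x)):
        print('Evaluation Using ',MList[i],' : ')
        print((x[i]+y[i])/2)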
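
`scoreHelper` reports two different quantities: the squared Pearson correlation (parity 1 and 2, used for Merck) and the coefficient of determination returned by `rgr.score` (the default branch). The two can disagree sharply. The sketch below uses made-up predictions that are perfectly correlated with the targets but badly offset.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 2.0 * y_true + 10.0          # perfectly correlated, but biased and rescaled

r, _ = pearsonr(y_true, y_pred)
pearson_r_squared = r ** 2            # what scoreHelper stores for parity 1 and 2
r2 = r2_score(y_true, y_pred)         # what rgr.score reports in the default branch

print("squared Pearson correlation:", pearson_r_squared)
print("coefficient of determination:", r2)
```

The squared correlation only measures linear association, so it ignores the offset and scale error that make the coefficient of determination strongly negative here.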

<div style="border-bottom: 3px solid black"></div>

## Regressors

#### Support Vector Regressor with RBF kernel

In [12]:
def rbfSVR(X_train, X_test, y_train, y_test, hs, parity, C=1, gamma='scale'):
    print('\nResult for RBF Support Vector Regression')
    rgr = SVR(C=C, gamma=gamma)
    if hs:
        rgr = gridSearchSVR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchSVR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Support Vector Regressor with Linear Kernel

In [13]:
def linearSVR(X_train, X_test, y_train, y_test, hs, parity, C=1, gamma='scale'):
    print('\nResult for Linear Support Vector Regression')
    rgr = SVR(kernel='linear', C=C, gamma=gamma)
    if hs:
        rgr = gridSearchSVR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchSVR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Decision Tree Regressor 

In [14]:
def DTR(X_train, X_test, y_train, y_test, hs, parity,max_depth=None, max_features=None,min_samples_leaf=1,min_samples_split=2):
    print('\nResult for Decision Tree Regression')
    rgr = DecisionTreeRegressor(max_depth=max_depth, max_features=max_features,min_samples_leaf=min_samples_leaf,min_samples_split=min_samples_split, random_state=23)       
    if hs:
        rgr = gridSearchDTR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchDTR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Random Forest Regressor 

In [15]:
def RFR(X_train, X_test, y_train, y_test, hs, parity, max_depth=None, max_features='auto',min_samples_leaf=1,min_samples_split=2, bootstrap=True,n_estimators=100):
    print('\nResult for Random Forest Regression')
    rgr = RandomForestRegressor(max_depth=max_depth, max_features=max_features,min_samples_leaf=min_samples_leaf,min_samples_split=min_samples_split, bootstrap=bootstrap, n_estimators=n_estimators, random_state=23)
    if hs:
        rgr = gridSearchRFR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchRFR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### AdaBoost Regressor

In [16]:
def ABR(X_train, X_test, y_train, y_test, hs, parity,n_estimators=50, learning_rate=1, loss='linear'):
    print('\nResult for AdaBoost Regression')
    rgr = AdaBoostRegressor(random_state=23,n_estimators=n_estimators, learning_rate=learning_rate, loss=loss)
    if hs:
        rgr = gridSearchABR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchABR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Gaussian Process Regressor

In [17]:
def GPR(X_train, X_test, y_train, y_test, hs, parity, alpha=1e-10):
    print('\nResult for Gaussian Process Regression')
    rgr = GaussianProcessRegressor(alpha=alpha)
    if hs:
        rgr = gridSearchGPR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchGPR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Linear Regressor

In [18]:
def LIR(X_train, X_test, y_train, y_test, hs, parity):
    print('\nResult for Linear Regression')
    rgr = LinearRegression().fit(X_train, y_train)
    scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

#### Ridge Regressor

In [19]:
def RIR(X_train, X_test, y_train, y_test, hs, parity, solver='auto',alpha=1,normalize=False):
    print('\nResult for Ridge Regression')
    rgr = Ridge(solver=solver,alpha=alpha,normalize=normalize)
    if hs:
        rgr = gridSearchRIR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
        rgr = randomSearchRIR(rgr,X_train,y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)
    else:
        rgr = rgr.fit(X_train, y_train)
        scoreHelper(rgr, X_train, X_test, y_train, y_test, parity)

<div style="border-bottom: 3px solid black; margin-bottom:5px"></div>
<div style="border-bottom: 3px solid black"></div>

# 4. Working with Datasets

In [22]:
def regress(X_train, X_test, y_train, y_test,hs=False, parity=0):
    rbfSVR(X_train, X_test, y_train, y_test, hs, parity)
    linearSVR(X_train, X_test, y_train, y_test, hs, parity)
    DTR(X_train, X_test, y_train, y_test, hs, parity)
    RFR(X_train, X_test, y_train, y_test, hs, parity)
    ABR(X_train, X_test, y_train, y_test, hs, parity)
    GPR(X_train, X_test, y_train, y_test, hs, parity)
    LIR(X_train, X_test, y_train, y_test, hs, parity)
    RIR(X_train, X_test, y_train, y_test, hs, parity)
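
`regress` calls each wrapper in turn on the same splits. The same idea can be sketched as a loop over a dictionary of estimators. This is an illustrative alternative on synthetic data, not the notebook's implementation; the `models` dictionary and the `make_regression` data are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for one of the RP_n datasets.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

# Hypothetical subset of the models the notebook evaluates.
models = {
    "Linear Regressor": LinearRegression(),
    "Ridge Regressor": Ridge(alpha=1),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=23),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store (training R2, testing R2), as the default branch of scoreHelper prints.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print("%s: train R2 %.3f, test R2 %.3f" % (name, scores[name][0], scores[name][1]))
```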

In [3]:
print('Wine Quality Dataset')
regress(RP_1_X_train, RP_1_X_test, RP_1_y_train, RP_1_y_test, hs = False, parity=0)
print('-------------------------------------------------------')
print('\n\nCommunities and Crimes Dataset ')
regress(RP_2_X_train, RP_2_X_test, RP_2_y_train, RP_2_y_test, hs = False, parity=0)
print('-------------------------------------------------------')
print('\n\nQSAR Aquatic Toxicity Dataset')
regress(RP_3_X_train, RP_3_X_test, RP_3_y_train, RP_3_y_test, hs = False, parity=0)
print('-------------------------------------------------------')
print('\n\nParkinsons Dataset')
regress(RP_4_X_train, RP_4_X_test, RP_4_y_train, RP_4_y_test, hs = False, parity=0)
print('-------------------------------------------------------')
print('\n\nFacebook Metrics Dataset')
regress(RP_5_X_train, RP_5_X_test, RP_5_y_train, RP_5_y_test, hs=False, parity=0)
print('-------------------------------------------------------')
print('\n\nBike Sharing Dataset')
regress(RP_6_X_train, RP_6_X_test, RP_6_y_train, RP_6_y_test, hs=False, parity=0)
print('-------------------------------------------------------')
print('\n\nStudent Performance Dataset')
regress(RP_7_X_train, RP_7_X_test, RP_7_y_train, RP_7_y_test, hs=False, parity=0)
print('-------------------------------------------------------')
print('\n\nConcrete Compressive Strength Dataset')
regress(RP_8_X_train, RP_8_X_test, RP_8_y_train, RP_8_y_test, hs=False, parity=0)
print('-------------------------------------------------------')
print('\n\nSGEMM GPU Performance Dataset')
print('Might Crash in Gaussian Process Regression due to lack of memory.')
regress(RP_9_X_train, RP_9_X_test, RP_9_y_train, RP_9_y_test, hs=False, parity=0)
print('-------------------------------------------------------')
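
The memory warning for the SGEMM dataset follows from simple arithmetic: Gaussian process regression builds a dense kernel matrix with one row and one column per training sample. A rough lower bound, assuming 64-bit floats and a single copy of the matrix (a real fit needs more):

```python
n_samples = 241600                 # SGEMM sample count quoted in the loading cell
n_test = n_samples * 3 // 10       # 30% held out for testing
n_train = n_samples - n_test       # 70% used for training

bytes_per_float64 = 8
kernel_bytes = n_train * n_train * bytes_per_float64
kernel_gib = kernel_bytes / 2**30

print(n_train, "training samples ->", round(kernel_gib, 1), "GiB for one dense kernel matrix")
```

That is over 200 GiB for the kernel matrix alone, so subsampling the training set before calling `GPR` is likely necessary on ordinary hardware.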

In [2]:
print('Initiating Merck Sequence. Please Be Patient.')
print('Merck Results can be found in RP_Data/Hyperparameter Search Results/RP_10 Merck Result')
print('Merck Molecular Dataset 1')
regress(RP_101_X_train, RP_101_X_test, RP_101_y_train, RP_101_y_test, parity=1)
print('-------------------------------------------------------')
print('Merck Molecular Dataset 2')
regress(RP_102_X_train, RP_102_X_test, RP_102_y_train, RP_102_y_test, parity=2)
print('Training Complete')
print('Merck Score Training')
merckScore(M1Tr,M2Tr)
print('Merck Score Testing')
merckScore(M1Te,M2Te)

<div style="border-bottom: 3px solid black; margin-bottom:5px"></div>
<div style="border-bottom: 3px solid black"></div>


## For Best Models

Use the code cell below:

1. Find the dataset and its best parameters in the Configs file
2. Pass the dataset splits and those parameters to the model function
3. Run the cell to reproduce the best result

Replace the placeholder arguments with the `RP_#_` variables of the dataset you need, for example `RP_3_X_train, RP_3_X_test, ...`

**Dataset Mapper**
1. Wine Quality -> RP_1
2. Communities and Crime -> RP_2
3. QSAR aquatic toxicity -> RP_3
4. Parkinson Speech -> RP_4
5. Facebook metrics -> RP_5
6. Bike Sharing (use hour data) -> RP_6
7. Student Performance (use just student-por.csv if you do not know how to merge the math grades) -> RP_7
8. Concrete Compressive Strength -> RP_8
9. SGEMM GPU kernel performance -> RP_9
10. Merck Molecular Activity Challenge (from Kaggle) -> RP_10

**Model Mapper**
1. Support Vector Regressor with RBF kernel -> rbfSVR()
2. Support Vector Regressor with Linear kernel -> linearSVR()
3. Decision Tree Regressor -> DTR()
4. Random Forest Regressor -> RFR()
5. AdaBoost Regressor -> ABR()
6. Gaussian Process Regressor -> GPR()
7. Linear Regressor -> LIR()
8. Ridge Regressor -> RIR()

In [None]:
# rbfSVR(X_train, X_test, y_train, y_test, hs=False, parity=0, C=0.1, gamma=1)
# linearSVR(X_train, X_test, y_train, y_test, hs=False, parity=0, C=0.1, gamma =1)
# DTR(X_train, X_test, y_train, y_test, hs=False, parity=0, max_depth=10, max_features='sqrt',min_samples_leaf=1,min_samples_split=2)
# RFR(X_train, X_test, y_train, y_test, hs=False, parity=0, max_depth=10, max_features='sqrt',min_samples_leaf=1,min_samples_split=2, bootstrap=False,n_estimators=100)
# ABR(RP_1_X_train, RP_1_X_test, RP_1_y_train, RP_1_y_test, hs=False, parity=0, n_estimators=150, learning_rate=0.1, loss='exponential')
# GPR(X_train, X_test, y_train, y_test, hs=False, parity=0, alpha=1e-10)
# LIR(X_train, X_test, y_train, y_test, hs=False, parity=0)
# RIR(X_train, X_test, y_train, y_test, hs=False, parity=0, solver='svd',alpha=0.5,normalize=True)
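
The workflow above (search once, record the best parameters, then refit with them) can be sketched end to end on synthetic data. The data, the small `alpha` grid, and the variable names are assumptions made for illustration; the actual best parameters for each dataset live in the Configs file.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for one of the RP_n datasets.
X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

# Step 1: search once and record the winning parameters.
alphas = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=5).fit(X_train, y_train)
best_params = search.best_params_

# Steps 2 and 3: rebuild the model from the recorded parameters and evaluate it.
final_model = Ridge(**best_params).fit(X_train, y_train)
final_score = final_model.score(X_test, y_test)

print(best_params, round(final_score, 3))
```

Because `GridSearchCV` refits the best candidate on the full training split by default, the rebuilt model scores the same as `search.best_estimator_`.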