The XGBoost random forest model.


In [25]:
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate, GridSearchCV, RandomizedSearchCV, train_test_split, StratifiedKFold
import xgboost as gb

import utility as util

from gensim.models import Word2Vec
from nltk import word_tokenize

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
reviews = pd.read_csv("../Datasets/processedAnimeReviews.csv",index_col = 'id')
reviews.head(10)

id,workName,overallRating,review,sentiment
8121,Cowboy_Bebop,10,cowboy bebop episodic series episodic mean one...,1
63480,Utawarerumono,8,utawarerumono manages one harem anime anyone p...,1
8452,Hajime_no_Ippo,10,first let say fan boxing fact pretty much hate...,1
66544,Gensoumaden_Saiyuuki,9,saiyuki one anime grab first episode let go ev...,1
55936,Ranma_½,7,comedy romance based manga rumiko takahashi ra...,1
22039,Kino_no_Tabi__The_Beautiful_World,9,say anime traveler journeying different countr...,1
68626,Kareshi_Kanojo_no_Jijou,8,kare kano romance anime could become incredibl...,1
18797,Hunter_x_Hunter,10,overall best anime actually seen anything else...,1
43899,Golden_Boy,10,overall honestly really care others opinion an...,1
18796,Hunter_x_Hunter,10,think hear anime people killing poor cute anim...,1


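Next, convert each review into the mean of its Word2Vec word vectors and split the data into training and test sets.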

In [3]:
w2v_model = Word2Vec.load('../Models/w2vmodel.bin')
# Get mean feature vector of all words in a sentence
def meanFeatureVec(sentence, word_vectors):
    word_vecs = [word_vectors[word] for word in word_tokenize(sentence)]
    mean_vec = np.asarray(word_vecs).mean(axis=0)
    return mean_vec

# Takes a Series of review texts and returns a DataFrame with one row of mean word embeddings per review
def reviewToVectors(sentences, word_vectors):
    sent_vecs = [meanFeatureVec(sentence, word_vectors) for sentence in sentences]
    df = pd.DataFrame(sent_vecs, index=sentences.index)
    return df
# Convert reviews to word embeddings
X_vectors = reviewToVectors(reviews['review'], w2v_model.wv)
# Split into train and test
y = reviews['sentiment']

X_train, X_test, y_train, y_test = train_test_split(X_vectors, y, test_size=0.1, random_state=2)
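
`meanFeatureVec` looks up every token in `word_vectors`, so a token missing from the Word2Vec vocabulary raises a `KeyError`, and an empty review has no vectors to average. The following is a minimal sketch of a guarded variant, not the code used for the results in this notebook. It assumes a plain dict in place of `w2v_model.wv`, whitespace splitting in place of `word_tokenize`, and a zero vector as the fallback; the name `mean_feature_vec_safe` and the toy vectors are hypothetical.

```python
import numpy as np

def mean_feature_vec_safe(sentence, word_vectors, vector_size):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    # Whitespace split stands in for nltk's word_tokenize (assumption).
    tokens = sentence.split()
    # Keep only tokens that have a vector, so unknown words are skipped.
    known = [word_vectors[tok] for tok in tokens if tok in word_vectors]
    if not known:
        # Assumed fallback: an all-zero vector of the embedding size.
        return np.zeros(vector_size)
    return np.asarray(known).mean(axis=0)

# Toy 2-dimensional "embeddings" used only for illustration.
toy_vectors = {
    "great": np.array([1.0, 0.0]),
    "anime": np.array([0.0, 1.0]),
}

print(mean_feature_vec_safe("great anime", toy_vectors, 2))
print(mean_feature_vec_safe("great unknownword", toy_vectors, 2))
print(mean_feature_vec_safe("", toy_vectors, 2))
```

Whether a zero vector is a sensible fallback depends on the model; dropping such reviews before training is another option.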

# Baseline
   The main parameters for xgboost.XGBClassifier are:
   ----------
   max_depth : int, in [0, infinity)
   
       Maximum tree depth for base learners.

   learning_rate : float
   
       Boosting learning rate (xgb's "eta")

   n_estimators : int
   
       Number of trees to fit.


   objective : string or callable
       Specify the learning task and the corresponding learning objective or
       a custom objective function to be used.

   gamma : float
       Minimum loss reduction required to make a further partition on a leaf node of the tree.

   min_child_weight : int
       Minimum sum of instance weight(hessian) needed in a child.

   max_delta_step : int
       Maximum delta step we allow each tree's weight estimation to be.

   subsample : float
       Subsample ratio of the training instance.



   reg_alpha : float (xgb's alpha)
       L1 regularization term on weights

   reg_lambda : float (xgb's lambda)
       L2 regularization term on weights

   scale_pos_weight : float
       Balancing of positive and negative weights.

   base_score:
      The initial prediction score of all instances, global bias.
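
To make the roles of `gamma`, `reg_lambda`, and `min_child_weight` concrete, here is a sketch of the split-gain formula from the XGBoost paper, $\text{Gain} = \tfrac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$, where $G$ and $H$ are the sums of gradients and hessians in each child. It assumes `reg_alpha = 0` and ignores `max_delta_step`; the function name `split_gain` and the numbers are illustrative only, and this is not XGBoost's internal code.

```python
def split_gain(g_left, h_left, g_right, h_right,
               reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Gain of a candidate split, or None if the split is not allowed."""
    # min_child_weight: each child needs at least this much hessian mass.
    if h_left < min_child_weight or h_right < min_child_weight:
        return None

    def score(g, h):
        # Structure score of a leaf with L2 regularisation on its weight.
        return g * g / (h + reg_lambda)

    gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))
    # gamma: a fixed penalty for adding one more leaf.
    return gain - gamma

print(split_gain(-4.0, 3.0, 5.0, 4.0))             # split is worthwhile
print(split_gain(-4.0, 3.0, 5.0, 4.0, gamma=5.0))  # penalty outweighs the gain
print(split_gain(-4.0, 0.5, 5.0, 4.0))             # left child too light
```

A split is only worth keeping when this value is positive, so raising `gamma` or `reg_lambda` makes the trees more conservative.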

Not all of these parameters affect the accuracy of the model. The relevant ones are:

Also note that we do not subsample the feature columns, so:

   colsample_bytree : float
       Subsample ratio of columns when constructing each tree.

   colsample_bylevel : float
       Subsample ratio of columns for each level.

   colsample_bynode : float
       Subsample ratio of columns for each split.

are all left at 1, which is their default value.

In [4]:
metrics = ['accuracy', 'recall', 'precision', 'f1', 'roc_auc']
#RFModel = gb.XGBRFClassifier()
#util.cross_validate_scores(RFModel, X_train, y_train, cv=5, metrics=metrics)
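
The scores reported in this notebook appear in the form `metric: value (value)`. `util.cross_validate_scores` is a project helper whose source is not shown here, so the following is only a guess at that format: it assumes the first number is the mean across folds and the second is the standard deviation. The function name `summarize_scores` and the fold scores are hypothetical.

```python
from statistics import mean, pstdev

def summarize_scores(fold_scores):
    """Format per-fold scores as 'metric: mean (std)' lines."""
    lines = []
    for metric, values in fold_scores.items():
        # Population standard deviation, the same convention as numpy's np.std default.
        lines.append(f"{metric}: {mean(values):.4f} ({pstdev(values):.4f})")
    return lines

# Hypothetical validation scores from a 5-fold run.
example = {
    "accuracy": [0.84, 0.85, 0.84, 0.85, 0.84],
    "roc_auc": [0.83, 0.84, 0.85, 0.84, 0.84],
}
for line in summarize_scores(example):
    print(line)
```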


First we will test different objective functions for the XGBClassifier. There are 12 objective functions
we can set. We break this report down into 12 separate sections and tune the parameters for each of them.
The 12 are:

reg:squarederror: regression with squared loss. This is the default for the core XGBoost library; XGBClassifier itself defaults to binary:logistic.

reg:logistic: logistic regression

binary:logistic: logistic regression for binary classification

binary:hinge: hinge loss for binary classification. 

count:poisson: Poisson regression for count data

survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).

multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)

rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized

rank:ndcg: Use LambdaMART to perform list-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized

rank:map: Use LambdaMART to perform list-wise ranking where Mean Average Precision (MAP) is maximized

reg:gamma: gamma regression with log-link. 

reg:tweedie: Tweedie regression with log-link. 



First we will test the baseline for each objective. Then we will break down each objective and do further testing.


In [10]:
SquaredError = gb.XGBClassifier(objective = "reg:squarederror") 
Logistic = gb.XGBClassifier(objective = "reg:logistic") 
BinaryLogistic = gb.XGBClassifier(objective = "binary:logistic") 
BinaryHinge = gb.XGBClassifier(objective = "binary:hinge") 
CountPoisson = gb.XGBClassifier(objective = "count:poisson") 
SurvivalCox = gb.XGBClassifier(objective = "survival:cox") 
MultiSoftMax = gb.XGBClassifier(objective = "multi:softmax", num_class = 2) 
RankPairWise = gb.XGBClassifier(objective = "rank:pairwise") 
RankNDCG = gb.XGBClassifier(objective = "rank:ndcg") 
RankMap = gb.XGBClassifier(objective = "rank:map") 
RegGamma = gb.XGBClassifier(objective = "reg:gamma") 
RegTweedie = gb.XGBClassifier(objective = "reg:tweedie") 

print("Regression Squared Error")
util.cross_validate_scores(SquaredError, X_train, y_train, cv=5, metrics=metrics)

print("Logistical regression")
util.cross_validate_scores(Logistic, X_train, y_train, cv=5, metrics=metrics)

print("Logistical regression for binary classification")
util.cross_validate_scores(BinaryLogistic, X_train, y_train, cv=5, metrics=metrics)

print("Binary classification using hinge loss optimization")
util.cross_validate_scores(BinaryHinge, X_train, y_train, cv=5, metrics=metrics)

print("Poisson regression for count data")
util.cross_validate_scores(CountPoisson, X_train, y_train, cv=5, metrics=metrics)

print("Survival Cox Regression")
util.cross_validate_scores(SurvivalCox, X_train, y_train, cv=5, metrics=metrics)

print("Classification by optimizing softmax objective")
util.cross_validate_scores(MultiSoftMax, X_train, y_train, cv=5, metrics=metrics)

print("Rank Pairwise")
util.cross_validate_scores(RankPairWise, X_train, y_train, cv=5, metrics=metrics)

print("Rank ndcg")
util.cross_validate_scores(RankNDCG, X_train, y_train, cv=5, metrics=metrics)

print("Rank map")
util.cross_validate_scores(RankMap, X_train, y_train, cv=5, metrics=metrics)

print("Regression gamma")
util.cross_validate_scores(RegGamma, X_train, y_train, cv=5, metrics=metrics)

print("Regression tweedie")
util.cross_validate_scores(RegTweedie, X_train, y_train, cv=5, metrics=metrics)


Regression Squared Error


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.4min remaining:  3.7min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.5min finished


Training scores
accuracy: 0.8473 (0.0002)
recall: 0.9784 (0.0002)
precision: 0.8541 (0.0002)
f1: 0.9121 (0.0001)
roc_auc: 0.8560 (0.0004)

Validation Scores
accuracy: 0.8436 (0.0013)
recall: 0.9772 (0.0009)
precision: 0.8514 (0.0009)
f1: 0.9100 (0.0008)
roc_auc: 0.8429 (0.0010)
Logistical regression


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.5min remaining:  3.7min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8509 (0.0002)
recall: 0.9723 (0.0002)
precision: 0.8613 (0.0002)
f1: 0.9134 (0.0001)
roc_auc: 0.8592 (0.0004)

Validation Scores
accuracy: 0.8463 (0.0012)
recall: 0.9700 (0.0014)
precision: 0.8584 (0.0005)
f1: 0.9108 (0.0007)
roc_auc: 0.8468 (0.0014)
Logistical regression for binary classification


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.6min remaining:  3.9min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8509 (0.0002)
recall: 0.9723 (0.0002)
precision: 0.8613 (0.0002)
f1: 0.9134 (0.0001)
roc_auc: 0.8592 (0.0004)

Validation Scores
accuracy: 0.8463 (0.0012)
recall: 0.9700 (0.0014)
precision: 0.8584 (0.0005)
f1: 0.9108 (0.0007)
roc_auc: 0.8468 (0.0014)
Binary classification using hinge loss optimization


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.5min remaining:  3.7min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8245 (0.0007)
recall: 0.9977 (0.0001)
precision: 0.8230 (0.0006)
f1: 0.9020 (0.0003)
roc_auc: 0.5443 (0.0019)

Validation Scores
accuracy: 0.8229 (0.0014)
recall: 0.9972 (0.0006)
precision: 0.8219 (0.0013)
f1: 0.9011 (0.0007)
roc_auc: 0.5409 (0.0039)
Poisson regression for count data


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.5min remaining:  3.8min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8457 (0.0003)
recall: 0.9789 (0.0002)
precision: 0.8523 (0.0003)
f1: 0.9112 (0.0001)
roc_auc: 0.8521 (0.0002)

Validation Scores
accuracy: 0.8425 (0.0012)
recall: 0.9776 (0.0011)
precision: 0.8502 (0.0008)
f1: 0.9094 (0.0007)
roc_auc: 0.8391 (0.0017)
Survival Cox Regression


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.3min remaining:  3.5min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.4min finished


Training scores
accuracy: 0.8090 (0.0000)
recall: 1.0000 (0.0000)
precision: 0.8090 (0.0000)
f1: 0.8944 (0.0000)
roc_auc: 0.5111 (0.0107)

Validation Scores
accuracy: 0.8090 (0.0000)
recall: 1.0000 (0.0000)
precision: 0.8090 (0.0000)
f1: 0.8944 (0.0000)
roc_auc: 0.5164 (0.0140)
Classification by optimizing softmax objective


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  5.0min remaining:  7.4min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  5.0min finished


Training scores
accuracy: 0.8509 (0.0002)
recall: 0.9722 (0.0001)
precision: 0.8613 (0.0002)
f1: 0.9134 (0.0001)
roc_auc: 0.6545 (0.0007)

Validation Scores
accuracy: 0.8462 (0.0014)
recall: 0.9698 (0.0013)
precision: 0.8584 (0.0007)
f1: 0.9107 (0.0008)
roc_auc: 0.6462 (0.0020)
Rank Pairwise


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.6min remaining:  3.9min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.7min finished


Training scores
accuracy: 0.7469 (0.0007)
recall: 0.7395 (0.0010)
precision: 0.9339 (0.0002)
f1: 0.8254 (0.0006)
roc_auc: 0.8437 (0.0003)

Validation Scores
accuracy: 0.7409 (0.0014)
recall: 0.7355 (0.0015)
precision: 0.9295 (0.0009)
f1: 0.8212 (0.0010)
roc_auc: 0.8343 (0.0012)
Rank ndcg


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:   48.6s remaining:  1.2min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:   50.1s finished


Training scores
accuracy: 0.1910 (0.0000)
recall: 0.0000 (0.0000)
precision: 0.0000 (0.0000)
f1: 0.0000 (0.0000)
roc_auc: 0.5000 (0.0000)

Validation Scores
accuracy: 0.1910 (0.0000)
recall: 0.0000 (0.0000)
precision: 0.0000 (0.0000)
f1: 0.0000 (0.0000)
roc_auc: 0.5000 (0.0000)
Rank map


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:   44.5s remaining:  1.1min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:   45.7s finished


Training scores
accuracy: 0.1910 (0.0000)
recall: 0.0000 (0.0000)
precision: 0.0000 (0.0000)
f1: 0.0000 (0.0000)
roc_auc: 0.5000 (0.0000)

Validation Scores
accuracy: 0.1910 (0.0000)
recall: 0.0000 (0.0000)
precision: 0.0000 (0.0000)
f1: 0.0000 (0.0000)
roc_auc: 0.5000 (0.0000)
Regression gamma


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.5min remaining:  3.7min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8498 (0.0004)
recall: 0.9658 (0.0007)
precision: 0.8644 (0.0003)
f1: 0.9123 (0.0003)
roc_auc: 0.8482 (0.0002)

Validation Scores
accuracy: 0.8442 (0.0010)
recall: 0.9626 (0.0015)
precision: 0.8612 (0.0007)
f1: 0.9091 (0.0006)
roc_auc: 0.8370 (0.0020)
Regression tweedie


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   5 | elapsed:  2.5min remaining:  3.7min
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  2.6min finished


Training scores
accuracy: 0.8479 (0.0002)
recall: 0.9730 (0.0004)
precision: 0.8580 (0.0002)
f1: 0.9119 (0.0001)
roc_auc: 0.8505 (0.0004)

Validation Scores
accuracy: 0.8438 (0.0012)
recall: 0.9711 (0.0014)
precision: 0.8554 (0.0006)
f1: 0.9096 (0.0007)
roc_auc: 0.8386 (0.0014)
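
Several baselines above collapse to a constant prediction. The `survival:cox` scores (accuracy 0.8090, recall 1.0000, precision 0.8090) are exactly what a model that labels every review positive would score if 80.9% of the labels are positive, and the `rank:ndcg` and `rank:map` scores (accuracy 0.1910, recall 0) match a model that labels every review negative. The sketch below checks this arithmetic. The 10,000-row sample size is an arbitrary assumption, and the 80.9% positive rate is inferred from the reported scores rather than read from the dataset.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Define undefined ratios as 0.0 when the denominator is zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

n_pos, n_neg = 8090, 1910  # assumed counts giving an 80.9% positive rate

# Predict "positive" for every row: all positives are hits, all negatives are false alarms.
all_positive = binary_metrics(tp=n_pos, fp=n_neg, fn=0, tn=0)
# Predict "negative" for every row: every positive is missed.
all_negative = binary_metrics(tp=0, fp=0, fn=n_pos, tn=n_neg)

print({k: round(v, 4) for k, v in all_positive.items()})
print({k: round(v, 4) for k, v in all_negative.items()})
```

With this much class imbalance, accuracy and F1 alone can make a constant predictor look reasonable, which is one reason to compare objectives on roc_auc as well.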


In [27]:
AbsoluteBaseline = gb.XGBClassifier() 

# Mean Squared Error Objective 1



In [28]:
parametersMSE = {'objective':['reg:squarederror'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfMSE = RandomizedSearchCV(AbsoluteBaseline, parametersMSE, n_jobs=5, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfMSE.fit( X_train,y_train)

print(clfMSE.cv_results_['params'][clfMSE.best_index_])

[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.


Fitting 3 folds for each of 10 candidates, totalling 30 fits


[Parallel(n_jobs=5)]: Done  30 out of  30 | elapsed: 25.8min finished


RandomizedSearchCV(cv=StratifiedKFold(n_splits=3, random_state=None, shuffle=False),
                   error_score='raise-deprecating',
                   estimator=XGBClassifier(base_score=0.5, booster='gbtree',
                                           colsample_bylevel=1,
                                           colsample_bynode=1,
                                           colsample_bytree=1, gamma=0,
                                           learning_rate=0.1, max_delta_step=0,
                                           max_depth=3, min_child_weight=1,
                                           missing=None, n_estimators=100,
                                           n_jobs=1, nthread=N...
                                                          0.8, 0.9],
                                        'max_delta_step': [0],
                                        'max_depth': [1, 3, 5, 7, 9, 11],
                                        'min_child_weight': [0, 5, 10, 15],
        

{'scale_pos_weight': 0.3, 'reg_lambda': 0.6, 'reg_alpha': 0.001, 'objective': 'reg:squarederror', 'n_estimators': 100, 'min_child_weight': 15, 'max_depth': 3, 'max_delta_step': 0, 'learning_rate': 0.1, 'gamma': 15}
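
Every search in this report uses the same grid and changes only the objective (plus `num_class` for `multi:softmax`). A small helper can build these dictionaries and avoid copy-paste slips; `make_param_grid` and `grid_size` are hypothetical names, not something the notebook defines. The sketch also counts the grid's combinations, which shows how small a share of the grid the 10 sampled candidates cover.

```python
from math import prod

def make_param_grid(objective, **extra):
    """Build the shared search grid for one objective."""
    grid = {
        'objective': [objective],
        'learning_rate': [x * 0.1 for x in range(0, 10)],  # xgb's `eta`
        'max_depth': list(range(1, 12, 2)),
        'min_child_weight': list(range(0, 20, 5)),
        'n_estimators': [100],
        'gamma': list(range(0, 20, 5)),
        'max_delta_step': [0],
        'reg_alpha': [0, 0.001, 0.005, 0.01, 0.05],
        'reg_lambda': [0, 0.1, 0.3, 0.6, 9],
        'scale_pos_weight': [0, 0.1, 0.3, 0.6, 9],
    }
    # Objective-specific additions, e.g. num_class=[2] for multi:softmax.
    grid.update(extra)
    return grid

def grid_size(grid):
    """Number of distinct parameter combinations in the grid."""
    return prod(len(values) for values in grid.values())

params_lrb = make_param_grid('binary:logistic')
params_msm = make_param_grid('multi:softmax', num_class=[2])

print(grid_size(params_lrb))       # total combinations in the grid
print(10 / grid_size(params_lrb))  # fraction covered by 10 random candidates
```

The grid holds 120,000 combinations, so a 10-candidate random search explores well under 0.01% of it and the best parameters it reports can vary a lot between runs.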


# Logistic regression for binary classification Objective 3
This objective gives exactly the same baseline scores as reg:logistic (objective 2).

In [30]:
parametersLRB = {'objective':['binary:logistic'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfLRB = RandomizedSearchCV(AbsoluteBaseline, parametersLRB, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfLRB.fit( X_train,y_train)

print(clfLRB.cv_results_['params'][clfLRB.best_index_])

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 3 folds for each of 10 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed: 15.6min finished


RandomizedSearchCV(cv=StratifiedKFold(n_splits=3, random_state=None, shuffle=False),
                   error_score='raise-deprecating',
                   estimator=XGBClassifier(base_score=0.5, booster='gbtree',
                                           colsample_bylevel=1,
                                           colsample_bynode=1,
                                           colsample_bytree=1, gamma=0,
                                           learning_rate=0.1, max_delta_step=0,
                                           max_depth=3, min_child_weight=1,
                                           missing=None, n_estimators=100,
                                           n_jobs=1, nthread=N...
                                                          0.8, 0.9],
                                        'max_delta_step': [0],
                                        'max_depth': [1, 3, 5, 7, 9, 11],
                                        'min_child_weight': [0, 5, 10, 15],
        

{'scale_pos_weight': 0.3, 'reg_lambda': 0, 'reg_alpha': 0.005, 'objective': 'binary:logistic', 'n_estimators': 100, 'min_child_weight': 15, 'max_depth': 3, 'max_delta_step': 0, 'learning_rate': 0.2, 'gamma': 5}


# Hinge loss for binary classification Objective 4

$\ell(y) = \max(0, 1-t \cdot y)$
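
In this formula $t \in \{-1, +1\}$ is the true label and $y$ is the raw model score, so the loss is zero only when the score has the right sign and a margin of at least 1. Below is a minimal sketch of the formula itself, not XGBoost's internal implementation; mapping this notebook's 0/1 `sentiment` labels to $\pm 1$ is an assumption made for the example.

```python
def hinge_loss(label01, score):
    """Hinge loss max(0, 1 - t*y) for a 0/1 label and a raw score y."""
    # Map the 0/1 sentiment label to t in {-1, +1}.
    t = 1 if label01 == 1 else -1
    return max(0.0, 1.0 - t * score)

print(hinge_loss(1, 2.5))   # confident and correct: no loss
print(hinge_loss(1, 0.3))   # correct sign, but inside the margin
print(hinge_loss(0, 0.3))   # wrong sign: loss greater than 1
```

XGBoost's documentation notes that `binary:hinge` predicts 0 or 1 rather than probabilities, which is consistent with its baseline roc_auc of about 0.54 despite an accuracy above 0.82.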

In [None]:
parametersHL = {'objective':['binary:hinge'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfHL = RandomizedSearchCV(AbsoluteBaseline, parametersHL, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfHL.fit( X_train,y_train)

print(clfHL.cv_results_['params'][clfHL.best_index_])

# Poisson regression for count data

In [None]:
parametersPRC = {'objective':['count:poisson'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfPRC = RandomizedSearchCV(AbsoluteBaseline, parametersPRC, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfPRC.fit( X_train,y_train)

print(clfPRC.cv_results_['params'][clfPRC.best_index_])

# Survival Cox

In [None]:
parametersSC = {'objective':['survival:cox'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfSC = RandomizedSearchCV(AbsoluteBaseline, parametersSC, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfSC.fit( X_train,y_train)

print(clfSC.cv_results_['params'][clfSC.best_index_])

# multi:softmax

In [None]:
parametersMSM = {'objective':['multi:softmax'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9],
              'num_class':[2]}
clfMSM = RandomizedSearchCV(AbsoluteBaseline, parametersMSM, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfMSM.fit( X_train,y_train)

print(clfMSM.cv_results_['params'][clfMSM.best_index_])

# rank pairwise

In [None]:


parametersRPW = {'objective':['rank:pairwise'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfRPW = RandomizedSearchCV(AbsoluteBaseline, parametersRPW, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfRPW.fit( X_train,y_train)

print(clfRPW.cv_results_['params'][clfRPW.best_index_])

# rank ndcg

In [None]:

parametersNDCG = {'objective':['rank:ndcg'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfNDCG = RandomizedSearchCV(AbsoluteBaseline, parametersNDCG, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfNDCG.fit( X_train,y_train)

print(clfNDCG.cv_results_['params'][clfNDCG.best_index_])

# rank map

In [None]:
parametersMAP = {'objective':['rank:map'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfMAP = RandomizedSearchCV(AbsoluteBaseline, parametersMAP, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfMAP.fit( X_train,y_train)

print(clfMAP.cv_results_['params'][clfMAP.best_index_])

print(clfMAP)

# reg gamma

In [None]:
parametersGam = {'objective':['reg:gamma'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfGam = RandomizedSearchCV(AbsoluteBaseline, parametersGam, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfGam.fit( X_train,y_train)

print(clfGam.cv_results_['params'][clfGam.best_index_])

# reg tweedie

In [None]:
parametersTwd = {'objective':['reg:tweedie'],
              'learning_rate': [x * 0.1 for x in range(0, 10)], #so called `eta` value
              'max_depth': [i for i in range(1,12,2)],
              'min_child_weight': [i for i in range(0,20,5)],
              'n_estimators': [100],
              'gamma' : [i for i in range(0,20,5)],
              'max_delta_step':[0],
              'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05],
              'reg_lambda':[0, 0.1, 0.3, 0.6, 9],
              'scale_pos_weight':[0, 0.1, 0.3, 0.6, 9]
             }
clfTwd = RandomizedSearchCV(AbsoluteBaseline, parametersTwd, n_jobs=-1, 
                   cv=StratifiedKFold(), 
                   scoring='roc_auc',
                   verbose=2, refit=True)

clfTwd.fit( X_train,y_train)

print(clfTwd.cv_results_['params'][clfTwd.best_index_])