<font color="blue" size="5px"> This is the modelling component of the final project for the internship with Data Glacier. We will explore various models. ROC-AUC will be used as the main scoring metric, since the classes in the target are somewhat imbalanced </font>. 

<font color="red" size="5px"> Let's begin by importing each dataset, shuffling and defining new dataframes: </font>

Import, shuffle and reset index:

In [1]:
import pandas as pd
import numpy as np
df_modified=pd.read_csv(r'df_modified.csv', engine='python').sample(frac=1).reset_index(drop=True) #cleaned dataset, with some rows (with outliers) removed
df_modified2= pd.read_csv(r'df_modified_all_rows.csv', engine='python').sample(frac=1).reset_index(drop=True) #cleaned dataset but without rows (with outliers) removed - has been created for the modelling.

In [2]:
print(df_modified.shape,df_modified2.shape)

(1215, 115) (3424, 115)


Let's create a dataframes with the 6 best features (found in weeks 8-9) and the target variable:

In [3]:
data_best = [df_modified["Dexa_During_Rx_Y"] , df_modified["Comorb_Encounter_For_Screening_For_Malignant_Neoplasms_Y"], df_modified["Comorb_Encounter_For_Immunization_Y"] , df_modified["Comorb_Encntr_For_General_Exam_W_O_Complaint,_Susp_Or_Reprtd_Dx_Y"] , df_modified["Comorb_Long_Term_Current_Drug_Therapy_Y"] , df_modified["Concom_Viral_Vaccines_Y"], df_modified["Persistency_Flag_Persistent"]]
headers = ["Dexa_During_Rx_Y" , "Comorb_Encounter_For_Screening_For_Malignant_Neoplasms_Y", "Comorb_Encounter_For_Immunization_Y" ,"Comorb_Encntr_For_General_Exam_W_O_Complaint,_Susp_Or_Reprtd_Dx_Y" , "Comorb_Long_Term_Current_Drug_Therapy_Y" , "Concom_Viral_Vaccines_Y", "Persistency_Flag_Persistent"]
df_best=pd.concat(data_best, axis=1, keys=headers)

data_best2 = [df_modified2["Dexa_During_Rx_Y"] , df_modified2["Comorb_Encounter_For_Screening_For_Malignant_Neoplasms_Y"], df_modified2["Comorb_Encounter_For_Immunization_Y"] , df_modified2["Comorb_Encntr_For_General_Exam_W_O_Complaint,_Susp_Or_Reprtd_Dx_Y"] , df_modified2["Comorb_Long_Term_Current_Drug_Therapy_Y"] , df_modified2["Concom_Viral_Vaccines_Y"], df_modified2["Persistency_Flag_Persistent"]]
headers = ["Dexa_During_Rx_Y" , "Comorb_Encounter_For_Screening_For_Malignant_Neoplasms_Y", "Comorb_Encounter_For_Immunization_Y" ,"Comorb_Encntr_For_General_Exam_W_O_Complaint,_Susp_Or_Reprtd_Dx_Y" , "Comorb_Long_Term_Current_Drug_Therapy_Y" , "Concom_Viral_Vaccines_Y", "Persistency_Flag_Persistent"]
df_best2=pd.concat(data_best2, axis=1, keys=headers)

Let's create a way to quickly evaluate a model:

In [4]:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import cross_val_score

def evaluate():
    print('Precision score is {}'.format(precision_score(y_test,preds, average='macro'))) #use macro scoring since classes are imbalanced
    print('Recall score is {}'.format(recall_score(y_test,preds, average='macro'))) #use macro scoring since classes are imbalanced
    print('F1 score is {}'.format(f1_score(y_test,preds, average='macro'))) #use macro scoring since classes are imbalanced
    print('Accuracy score is {}'.format(accuracy_score(y_test,preds))) 
    print('ROC_AUC score is {}'.format(roc_auc_score(y_test,preds))) 


<font color="red" size="5px"> We will test each dataset on a basic logistic regression model and choose the best dataset for upcoming modelling </font>

First let's try using the cleaned data, without outliers removed, and with all of the features with a basic model:

In [5]:
from sklearn.model_selection import train_test_split
#let the feature dataframe contain every column of df, except the value we are predicting,Persistency_Flag_Persistent
X=df_modified2.loc[:,df_modified2.columns!="Persistency_Flag_Persistent"]
#let the target array contain only the value we are predicting, Persistency_Flag_Persistent
# y=df.loc[:,df.columns=="heart_disease"].values.ravel()
y=df_modified2.loc[:,df_modified2.columns=="Persistency_Flag_Persistent"].values.ravel()
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.30,random_state=123)  


Let's create an evaluate a simple logistic regression model:

In [6]:
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression(max_iter=1000000)
logreg.fit(X_train,y_train)
preds=logreg.predict(X_test)
evaluate()

Precision score is 0.8039320049927572
Recall score is 0.7944083203933747
F1 score is 0.7984965338177266
Accuracy score is 0.8142023346303502
ROC_AUC score is 0.7944083203933748


Let's try again, but with outliers removed and still using all of the features:

In [7]:
X=df_modified.loc[:,df_modified.columns!="Persistency_Flag_Persistent"]
y=df_modified.loc[:,df_modified.columns=="Persistency_Flag_Persistent"].values.ravel()
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.30,random_state=123)   
logreg=LogisticRegression(max_iter=1000000)
logreg.fit(X_train,y_train)
preds=logreg.predict(X_test)
evaluate()

Precision score is 0.8377358490566038
Recall score is 0.8142997612024161
F1 score is 0.8243296010751808
Accuracy score is 0.8547945205479452
ROC_AUC score is 0.8142997612024161


We used SelectKBest in weeks 8-9 to find the best 6 features. Let's test the dataframe with just the 6 best features and target, and outliers removed:

In [8]:
X=df_best.loc[:,df_best.columns!="Persistency_Flag_Persistent"]
y=df_best.loc[:,df_best.columns=="Persistency_Flag_Persistent"].values.ravel()
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.3,random_state=123)   
logreg=LogisticRegression(max_iter=1000000)
logreg.fit(X_train,y_train)
preds=logreg.predict(X_test)
evaluate()

Precision score is 0.8393908125047772
Recall score is 0.8118591094254811
F1 score is 0.8233478526879069
Accuracy score is 0.8547945205479452
ROC_AUC score is 0.8118591094254812


Let's test the dataframe with just the 6 best features and target, and outliers included:

In [9]:
X=df_best2.loc[:,df_best2.columns!="Persistency_Flag_Persistent"]
y=df_best2.loc[:,df_best2.columns=="Persistency_Flag_Persistent"].values.ravel()
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.30,random_state=123) 

logreg.fit(X_train,y_train)
preds=logreg.predict(X_test)
evaluate()

Precision score is 0.7926569198160438
Recall score is 0.7730250388198758
F1 score is 0.7802078607447734
Accuracy score is 0.8005836575875487
ROC_AUC score is 0.7730250388198757


The df_best dataset seemed to be the best overall to me, based on the evaluation metrics (e.g., high accuracy), and also, there are only 6 features and lower number of rows, so using this data would be computationally efficient, especially since we will be doing some grid searching.We will use this dataset in all future modelling. Also, only needing 6 features to predict persistency may be very useful for the company.

In [10]:
df_best.head()

Unnamed: 0,Dexa_During_Rx_Y,Comorb_Encounter_For_Screening_For_Malignant_Neoplasms_Y,Comorb_Encounter_For_Immunization_Y,"Comorb_Encntr_For_General_Exam_W_O_Complaint,_Susp_Or_Reprtd_Dx_Y",Comorb_Long_Term_Current_Drug_Therapy_Y,Concom_Viral_Vaccines_Y,Persistency_Flag_Persistent
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,1.0,1.0,0.0,0.0,1.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,1.0,1.0,1.0,1.0,0.0,0.0,1.0
4,0.0,1.0,1.0,0.0,0.0,0.0,0.0


<font color="red" size="5px"> Now that we have chosen the best dataset to use, let's try to optimise our logistic regression model    </font>

Let's check the base model's cross-val scores:

In [11]:
X=df_best.loc[:,df_best.columns!="Persistency_Flag_Persistent"]
y=df_best.loc[:,df_best.columns=="Persistency_Flag_Persistent"].values.ravel()
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.3,random_state=123)  

logreg=LogisticRegression(max_iter=1000000)
scores = cross_val_score(logreg, X, y, cv=10, scoring='roc_auc')#
print('Cross-validation scores are: {}'.format(scores))
scores_mean=np.mean(scores)
print('The mean cross-validation score is:{}'.format(scores_mean))

Cross-validation scores are: [0.86915739 0.83767886 0.85961844 0.89856916 0.84680451 0.83526384
 0.84604247 0.90315315 0.87467825 0.82915058]
The mean cross-validation score is:0.8600116646943272


It looks like the base model is good at generalising to new data.

Let's optimise the logistic regression model on df_best (uncomment to run):

In [12]:
# from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
# from sklearn.model_selection import StratifiedKFold

# logreg = LogisticRegression(max_iter=1000000,
#                      n_jobs=-1)

# params={"C":[0.001, 0.01, 0.1, 1, 10, 100, 1000] }




# skf = StratifiedKFold(n_splits=10, shuffle = False) # make sure class balance in each fold is the same as in the orginal dataset

# grid_search=GridSearchCV(logreg,params,cv=skf.split(X_train,y_train),scoring='roc_auc')


# grid_search.fit(X_train, y_train)

# #print(grid_search.cv_results_)
# print(grid_search.best_score_, grid_search.best_estimator_)


We now have a logistic regression model with optimal parameters:

In [13]:
logreg_new=LogisticRegression(C=1, max_iter=1000000, n_jobs=-1)

Let's use the best parameters (i.e., the ones yielding the highest average cross-validation score, 0.867) from grid search:

In [14]:
logreg_new=LogisticRegression(C=1, max_iter=1000000, n_jobs=-1).fit(X_train,y_train)
preds=logreg_new.predict(X_test)
evaluate()

Precision score is 0.8393908125047772
Recall score is 0.8118591094254811
F1 score is 0.8233478526879069
Accuracy score is 0.8547945205479452
ROC_AUC score is 0.8118591094254812


<font color="red" size="5px"> Let's create an XGB classifier and find the best hyperparameters: </font>

Let's create an XGB Classifier with mostly default paramters:

In [15]:
from xgboost import XGBClassifier
xgb=XGBClassifier(eval_metric='mlogloss',use_label_encoder=False)
xgb.fit(X_train,y_train)

#check cross val scores:

scores = cross_val_score(xgb, X, y, cv=10, scoring='roc_auc')#
print('Cross-validation scores are: {}'.format(scores))
scores_mean=np.mean(scores)
print('The mean cross-validation score is:{}'.format(scores_mean))

preds=xgb.predict(X_test)
evaluate()

Cross-validation scores are: [0.84721781 0.83926868 0.84213037 0.87408585 0.87969925 0.8465251
 0.85505148 0.89993565 0.88143501 0.83011583]
The mean cross-validation score is:0.8595465013886067
Precision score is 0.8207868773562204
Recall score is 0.7808856580980474
F1 score is 0.7959069559382689
Accuracy score is 0.8356164383561644
ROC_AUC score is 0.7808856580980477


We ran grid searches with various parameters ranges and attempted to find the best set of parameters (uncomment to run):

In [16]:
# from sklearn.model_selection import GridSearchCV
# from sklearn.model_selection import StratifiedKFold

# xgb = XGBClassifier(#learning_rate=0.03, 
#     #n_estimators=1000, objective='binary:logistic',
#                      n_jobs=-1, use_label_encoder=False)

# params = {
#              'learning_rate':[0.3],
#              'n_estimators':[50,100], 
#              'min_child_weight': [1, 5, 10],
#              'gamma': [0, 1, 2],
#              'subsample': [0.6, 0.8, 1.0],
#              'colsample_bytree': [0.6, 0.8, 1.0],
#              'max_depth': [3,4]
#         }



# skf = StratifiedKFold(n_splits=10, shuffle = False)

# grid_search = GridSearchCV(xgb, param_grid=params,  
#                              scoring='roc_auc', n_jobs=-1, cv=skf.split(X_train,y_train), verbose=3
#                             )
# grid_search.fit(X_train, y_train)

# #print(grid_search.cv_results_)

# print(grid_search.best_score_, grid_search.best_estimator_)

We shall use the best parameters (those yielding the highest average cross val score(0.870)) from grid search:

In [17]:
xgb_new=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.8, gamma=2, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.3, max_delta_step=0, max_depth=4,
              min_child_weight=5, #missing=nan,
              monotone_constraints='()',
              n_estimators=100, n_jobs=-1, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=0.6,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=None,
                     eval_metric='mlogloss' #specify eval metric to avoid warnings 
                     )

xgb_new.fit(X_train,y_train)

#test on test data now:
preds=xgb_new.predict(X_test)
evaluate()

Precision score is 0.8420103288524341
Recall score is 0.8162838881865431
F1 score is 0.8271654403729876
Accuracy score is 0.8575342465753425
ROC_AUC score is 0.8162838881865431


<font color="red" size="5px">  Let's create a neural network and optimise its hyperparameters: </font>

Import necessary packages:

In [42]:
from keras.models import Sequential
from keras.layers import Dense
from keras import metrics
import tensorflow
from keras.wrappers.scikit_learn import KerasClassifier

Define the network architecture and wrapper for KerasClassifier:

In [43]:
def create_nn(optimizer='uniform', init='adam'):
    nn = Sequential()
    nn.add(Dense(5, input_dim=X_train.shape[1], activation='relu')) #let's use 2/3 size of input layer + size of output layer for number of nodes
    nn.add(Dense(5, activation='relu'))
    nn.add(Dense(1, activation='relu'))
    nn.compile(loss='binary_crossentropy', optimizer='adam', 
               #metrics='roc-auc'
              )
    return nn

In [44]:
nn=KerasClassifier(build_fn=create_nn, verbose=0)

We used grid search to search for optimal parameters (we used various ranges and narrowed the parameters down). Uncomment to run:

In [45]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
optimizers = [#'rmsprop',
    'adam'
]
init = [#'glorot_uniform'
    #'normal',
         'uniform'
]
epochs = [150]
batches = [5,10]

params = dict(optimizer=optimizers, epochs=epochs, 
                  batch_size=batches,
                  init=init)

skf = StratifiedKFold(n_splits=10, shuffle = False)

grid_search = GridSearchCV(estimator=nn, param_grid=params, n_jobs=-1, cv=skf.split(X_train,y_train))
grid_search = grid_search.fit(X_train.values, y_train)



Let's use the params which acheived the best average cross-validation score:

In [51]:
nn_new=KerasClassifier(build_fn=create_nn, verbose=0,batch_size=5, epochs=150, init='uniform', optimizer='adam')
nn_new._estimator_type="classifier"

Train the model:

In [52]:
nn_new.fit(X_train,y_train)

<keras.callbacks.History at 0x2ce26203340>

In [53]:
import keras

In [56]:
nn_new.save('neural_network_best_model.h5')

AttributeError: 'KerasClassifier' object has no attribute 'save'

Predict and evaluate:

In [25]:
preds = nn_new.predict(X_test)
evaluate()



Precision score is 0.8478020240354207
Recall score is 0.8089619328557381
F1 score is 0.8241680871563033
Accuracy score is 0.8575342465753425
ROC_AUC score is 0.8089619328557381


Best: 0.793230 using {'batch_size': 20, 'epochs': 100, 'init': 'uniform', 'optimizer': 'adam'}

<font color="red" size="5px"> Let's create a random forest and find the best hyperparameters: </font>



Let's create a random forest classifier with default settings and check the cross val scores:

In [26]:
from sklearn.ensemble import RandomForestClassifier

rf=RandomForestClassifier()

scores = cross_val_score(rf, X, y, cv=10, scoring='roc_auc')#
print('Cross-validation scores are: {}'.format(scores))
scores_mean=np.mean(scores)
print('The mean cross-validation score is:{}'.format(scores_mean))

Cross-validation scores are: [0.83736089 0.83958665 0.83863275 0.86677266 0.87750627 0.8503861
 0.85456885 0.90250965 0.88240026 0.83011583]
The mean cross-validation score is:0.8579839901821327


Train, predict and evaluate:

In [27]:
rf.fit(X_train,y_train)
preds=rf.predict(X_test)
evaluate()

Precision score is 0.8207868773562204
Recall score is 0.7808856580980474
F1 score is 0.7959069559382689
Accuracy score is 0.8356164383561644
ROC_AUC score is 0.7808856580980477


In [28]:
# # from sklearn.model_selection import GridSearchCV
# # from sklearn.model_selection import StratifiedKFold


# params= {'bootstrap': [True],
#  'max_depth': [11],
#  'max_features': ['auto'],
#  'min_samples_leaf': [7,8],
#  'min_samples_split': [9,10],
#  'n_estimators': [1000]}

# skf = StratifiedKFold(n_splits=10, shuffle = False)
# grid_search = GridSearchCV(estimator=rf, param_grid=params, n_jobs=-1, scoring='roc_auc' ,cv=skf.split(X_train,y_train))
# grid_search = grid_search.fit(X_train, y_train)

# print(grid_search.best_score_,grid_search.best_params_)
# #print(grid_search.cv_results_)
# #grid_search.best_estimator_

The best average cross-val score was 0.864. Let's use the estimator which achieved this score:

In [29]:
rf_new=RandomForestClassifier(**{'bootstrap': True, 'max_depth': 11, 'max_features': 'auto', 'min_samples_leaf': 8, 'min_samples_split': 9, 'n_estimators': 1000})


Train, predict and evaluate:

In [30]:
rf_new.fit(X_train,y_train)
preds=rf_new.predict(X_test)
evaluate()

Precision score is 0.8426470588235294
Recall score is 0.7863815142576205
F1 score is 0.8059542796384902
Accuracy score is 0.8465753424657534
ROC_AUC score is 0.7863815142576204


<font color="red" size="5px">Let's create a K nearest neighbors classifier and find the best value for K: </font>

Let's define a KNN with default parameters, and check the cross-validation scores:

In [31]:
from sklearn.neighbors import  KNeighborsClassifier
knn=KNeighborsClassifier()

scores = cross_val_score(knn, X, y, cv=10, scoring='roc_auc')#
print('Cross-validation scores are: {}'.format(scores))
scores_mean=np.mean(scores)
print('The mean cross-validation score is:{}'.format(scores_mean))

Cross-validation scores are: [0.83100159 0.76852146 0.82670906 0.84244833 0.85714286 0.83542471
 0.82239382 0.85987773 0.86985199 0.79456242]
The mean cross-validation score is:0.8307933984404572


cross-val scores are quite close together, except for the 0.63 score.

let's train the knn and predict and evaluate on the test data:

In [32]:
knn.fit(X_train,y_train)
preds=knn.predict(X_test)
acc_score=accuracy_score(y_test,preds)
evaluate()

Precision score is 0.7877799882110227
Recall score is 0.7743187245399634
F1 score is 0.7803199755661555
Accuracy score is 0.8164383561643835
ROC_AUC score is 0.7743187245399634


Let's create a grid search to find the best value of K (uncomment to run)

In [33]:
# # from sklearn.model_selection import GridSearchCV
# # from sklearn.model_selection import StratifiedKFold

# params = {
#     'n_neighbors':n_neigh
# }

# # skf = StratifiedKFold(n_splits=10, shuffle = False)
#n_neigh=[i for i in range(1,33) if i%2!=0]
# grid_search = GridSearchCV(estimator=knn, param_grid=params, n_jobs=-1, scoring='roc_auc' ,cv=skf.split(X_train,y_train))
# grid_search = grid_search.fit(X_train, y_train)

# print(grid_search.best_score_,grid_search.best_params_)
# #print(grid_search.cv_results_)
# #grid_search.best_estimator_

let's use the parameters with the highest average cross-val score (0.853) and predict and evalute on the test data:

In [34]:
knn_new=KNeighborsClassifier(n_neighbors=31)
preds=knn.predict(X_test)
evaluate()

Precision score is 0.7877799882110227
Recall score is 0.7743187245399634
F1 score is 0.7803199755661555
Accuracy score is 0.8164383561643835
ROC_AUC score is 0.7743187245399634


<font color="red" size="5px">Let's use stacking: </font>

Let's use the top 3 estimators (from the code blocks above):

In [35]:
xgb_new=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.8, gamma=2, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.3, max_delta_step=0, max_depth=4,
              min_child_weight=5, #missing=nan,
              monotone_constraints='()',
              n_estimators=100, n_jobs=-1, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=0.6,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=None,
                     eval_metric='mlogloss' #specify eval metric to avoid warnings 
                     )

nn_new=KerasClassifier(build_fn=create_nn, verbose=0,batch_size=5, epochs=150, init='uniform', optimizer='adam')
nn_new._estimator_type="classifier"

rf_new=RandomForestClassifier(**{'bootstrap': True, 'max_depth': 11, 'max_features': 'auto', 'min_samples_leaf': 8, 'min_samples_split': 9, 'n_estimators': 1000})

Define the splitting method and declare the estimators:

In [36]:
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle = False)
estimators=[('xgb_new', xgb_new), ('nn_new', nn_new), ('rf_new',rf_new)]

Define the stacking model:

In [37]:
sc = StackingClassifier(
    estimators=estimators,
    cv=skf.split(X_train,y_train))    

Fit to the training data:

In [38]:
sc.fit(X_train,y_train)

KeyboardInterrupt: 

Predict and evaluate:

In [None]:
preds=sc.predict(X_test)
evaluate()

<font color="red" size="5px">Let's use voting: </font>

Let's define the voting classifier model. The neural network performed well so we will give it a relatively high weight:

In [None]:
from sklearn.ensemble import VotingClassifier
vc = VotingClassifier(estimators= [('xgb_new', xgb_new),
                                   ('nn_new', nn_new), 
                                   ('rf_new',rf_new)], 
                      voting='soft', #vote based on probabilities rather than majority vote of classes
                      weights=[1,2,1] #assign weights: xgboost performed very well so we'll give it a relatively high weight
                     )

Train the model:

In [None]:
vc.fit(X_train,y_train)

Predict and evaluate:

In [None]:
preds=vc.predict(X_test)
evaluate()

<font color="green" size="5px">Which model is best? </font>

The XGB classifier (xgb_new) was definitely a good model, with solid scores overall, and some relatively high precision and recall scores. The default model for xgb boost had a good cross-validation score and this wasn't found to change significantly after parameter tuning.  The neural network (nn_new) performed slightly better (or the same) compared to xgb_new in general. The random forest (rf_new) also performed well but was slighly worse than the other models. Both the voting classifer (vc) and stacking classifer (sc) performed well in general but weren't better than nn_new. Overall the best model is nn_new.

These are the evaluation metrics for nn_new on the test data when I ran the code (we found these earlier):    

In [None]:
print("Precision score 0.8175742386268702")
print("Recall score is 0.777711363485422")
print("F1 score is 0.7911991199119912")
print("Accuracy score is 0.821917808219178")
print("ROC_AUC score is 0.777711363485422")

In [None]:
X_train.shape

In [None]:
df_best.head()