# Predict new locations for UBS wealth management branch offices.

Union Bank of Switzerland (UBS) wants to use machine learning to predict which US zip codes are the most likely locations for new UBS wealth management branches.

### keywords: data wrangling, PCA, under-sampling, cross-validation, cross-prediction, LogisticRegression, Naive Bayes, KNN, ensemble learning 

In [28]:
#Import libraries
import numpy as np
import pandas as pd
import random
import copy
from sklearn.model_selection import KFold, cross_val_score,cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier


# Data wrangling

In [29]:
#Import original dataset
original_dataset=pd.read_csv('UBS_original_data.csv')
dataset=copy.deepcopy(original_dataset)
dataset.head()

Unnamed: 0,ZipCode,Population,HouseholdsPerZipCode,WhitePopulation,BlackPopulation,HispanicPopulation,AsianPopulation,HawaiianPopulation,IndianPopulation,OtherPopulation,...,DeliveryTotal,PopulationEstimate,LandArea,WaterArea,BoxCount,SFDU,MFDU,CityDeliveryIndicator,MedicareCBSAType,MarketRatingAreaID
0,501,0,0,0,0,0,0,0,0,0,...,1,0,0.0,0.0,0,0,0,N,Metro,8
1,501,0,0,0,0,0,0,0,0,0,...,1,0,0.0,0.0,0,0,0,N,Metro,8
2,544,0,0,0,0,0,0,0,0,0,...,0,0,0.0,0.0,0,0,0,N,Metro,8
3,544,0,0,0,0,0,0,0,0,0,...,0,0,0.0,0.0,0,0,0,N,Metro,8
4,601,18570,6525,17479,663,18486,7,10,113,558,...,5074,11342,64.348,0.309,831,2376,1206,Y,Micro,1


In [30]:
#Remove the rows where the population is 0
dataset = dataset[dataset.Population != 0]
dataset.head()

Unnamed: 0,ZipCode,Population,HouseholdsPerZipCode,WhitePopulation,BlackPopulation,HispanicPopulation,AsianPopulation,HawaiianPopulation,IndianPopulation,OtherPopulation,...,DeliveryTotal,PopulationEstimate,LandArea,WaterArea,BoxCount,SFDU,MFDU,CityDeliveryIndicator,MedicareCBSAType,MarketRatingAreaID
4,601,18570,6525,17479,663,18486,7,10,113,558,...,5074,11342,64.348,0.309,831,2376,1206,Y,Micro,1
5,601,18570,6525,17479,663,18486,7,10,113,558,...,5074,11342,64.348,0.309,831,2376,1206,Y,Micro,1
6,601,18570,6525,17479,663,18486,7,10,113,558,...,5074,11342,64.348,0.309,831,2376,1206,Y,Micro,1
7,601,18570,6525,17479,663,18486,7,10,113,558,...,5074,11342,64.348,0.309,831,2376,1206,Y,Micro,1
8,602,41520,15002,36828,2860,41265,42,32,291,2634,...,11165,24000,30.613,1.717,1502,5420,821,Y,Metro,1


In [31]:
#Reset the row index for later use
dataset.index=range(len(dataset))
#Replace empty or whitespace-only cells with NaN
#(the anchored pattern leaves values that merely contain a space untouched)
dataset=dataset.replace(r'^\s*$', np.nan, regex=True)
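
The pattern used to detect empty cells matters. When the replacement value is not a string (here `np.nan`), pandas is understood to replace the whole cell whenever the pattern is found anywhere in it, so an unanchored `\s+` would also blank out values that merely contain a space. The standard-library sketch below illustrates the difference between searching with `\s+` and with the fully anchored `^\s*$`; the sample strings are made up for illustration.

```python
import re

# Hypothetical cell values: two effectively empty cells and two real values.
cells = ["", "   ", "Metro", "New York"]

unanchored = re.compile(r"\s+")   # matches whitespace anywhere in the string
anchored = re.compile(r"^\s*$")   # matches only empty or whitespace-only strings

flagged_unanchored = [c for c in cells if unanchored.search(c)]
flagged_anchored = [c for c in cells if anchored.search(c)]

print(flagged_unanchored)
print(flagged_anchored)
```

Only the anchored pattern flags exactly the two empty cells; the unanchored one misses the empty string and flags "New York".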

In [32]:
#Checking for missing values(NaN) in each feature
feature_nan_value=dataset.isnull().sum()
print(feature_nan_value)

ZipCode                            0
Population                         0
HouseholdsPerZipCode               0
WhitePopulation                    0
BlackPopulation                    0
HispanicPopulation                 0
AsianPopulation                    0
HawaiianPopulation                 0
IndianPopulation                   0
OtherPopulation                    0
MalePopulation                     0
FemalePopulation                   0
PersonsPerHousehold                0
AverageHouseValue                  0
IncomePerHousehold                 0
MedianAge                          0
MedianAgeMale                      0
MedianAgeFemale                    0
CityType                           0
NumberOfBusinesses                 0
NumberOfEmployees                  0
BusinessFirstQuarterPayroll        0
BusinessAnnualPayroll              0
BusinessEmploymentFlag         62853
GrowthRank                         0
GrowingCountiesA                   0
GrowingCountiesB                   0
G

In [33]:
#Drop those features with over 50% of the values being NaN 
dataset.drop(['BusinessEmploymentFlag'],axis=1,inplace=True)

In [34]:
#Fill each NaN in the feature "MedicareCBSAType" with the next valid value (backward fill)
dataset['MedicareCBSAType']=dataset['MedicareCBSAType'].bfill()

In [35]:
#Encode categorical features with dummy variables
dataset=pd.get_dummies(dataset)  

In [36]:
#Group by zip code, and select max value of each group
dataset=dataset.groupby('ZipCode',as_index=False).max()

In [37]:
#Load the label/target data
#Zip codes that already have a branch are labeled "1"; all others are treated as "0"
dataset_label=pd.read_csv('UBS_data_label.csv')
dataset_label.head()

Unnamed: 0,ZipCode,UBS_Open_Branch
0,1144,1
1,1608,1
2,1945,1
3,1960,1
4,2109,1


In [38]:
#Final clean data with the label
#Final_dataset=pd.merge(dataset,dataset_label.drop(['UBS_Branches'],axis=1), on='ZipCode',how='left')
Final_dataset=pd.merge(dataset,dataset_label, on='ZipCode',how='left')
#Zip codes missing from the label file have no branch
Final_dataset['UBS_Open_Branch']=Final_dataset['UBS_Open_Branch'].fillna(0)
Final_dataset.to_csv('UBS_clean_data.csv',index=False)

In [39]:
# Import clean dataset
original_dataset = pd.read_csv('UBS_clean_data.csv')
original_dataset.head()

Unnamed: 0,ZipCode,Population,HouseholdsPerZipCode,WhitePopulation,BlackPopulation,HispanicPopulation,AsianPopulation,HawaiianPopulation,IndianPopulation,OtherPopulation,...,CityType_C,CityType_N,CityType_P,CityType_U,CityType_Z,CityDeliveryIndicator_N,CityDeliveryIndicator_Y,MedicareCBSAType_Metro,MedicareCBSAType_Micro,UBS_Open_Branch
0,601,18570,6525,17479,663,18486,7,10,113,558,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0
1,602,41520,15002,36828,2860,41265,42,32,291,2634,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
2,603,54689,21161,46501,5042,53877,135,35,313,4177,...,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
3,606,6615,2404,5979,371,6575,3,9,35,323,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0
4,610,29016,10836,24510,2654,28789,57,31,200,2494,...,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0


## Principal Component Analysis (PCA)

In [40]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler 

In [41]:
# Scale the features and set the values to a new variable
scaler = StandardScaler() 
original_dataset_features = scaler.fit_transform(original_dataset.drop(['ZipCode','UBS_Open_Branch'],axis=1))
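
`StandardScaler` centers each feature to zero mean and scales it to unit variance. This matters for PCA because features measured on large scales (such as population counts) would otherwise dominate the components. A minimal standard-library sketch of the same transformation for a single column, using the five `Population` values shown in the preview above; the helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mu = fmean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Population values of the first five zip codes in the clean dataset preview.
population = [18570, 41520, 54689, 6615, 29016]
z = standardize(population)
print([round(v, 3) for v in z])
```

After the transformation the column has mean 0 and standard deviation 1, so it is on the same footing as every other scaled feature.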


In [42]:
# Import PCA class
from sklearn.decomposition import PCA

In [43]:
#Fit PCA on all features to find a suitable number of components
pca = PCA(random_state=7)
pca.fit(original_dataset_features)
#Get explained variance ratios from PCA using all features
exp_variance = pca.explained_variance_ratio_

In [17]:
# Calculate the cumulative explained variance
cum_exp_variance = np.cumsum(exp_variance)

In [18]:
#Count the components whose cumulative explained variance
#is still below 90%
n_components = len(cum_exp_variance[cum_exp_variance<0.9])
print(n_components)

19
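
Counting the entries of the cumulative curve that lie below 0.9 gives the number of components whose cumulative explained variance is still under 90%; the smallest number of components that actually reaches 90% is one more. The sketch below shows the difference on a made-up set of explained-variance ratios (these are not the ratios of this dataset).

```python
from itertools import accumulate

# Hypothetical explained-variance ratios, in decreasing order as PCA returns them.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
cumulative = list(accumulate(ratios))

threshold = 0.9
# Number of components whose cumulative variance is still below the threshold.
n_below = sum(1 for c in cumulative if c < threshold)
# Smallest number of components whose cumulative variance reaches the threshold.
n_reaching = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)

print(n_below, n_reaching)
```

Either choice is defensible as long as the comment and the reported figure agree on which one was used.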


In [19]:
#Perform PCA with the chosen number of components and project data onto components
pca = PCA(n_components, random_state=10)
pca.fit(original_dataset_features)
PCA_dataset_features= pca.transform(original_dataset_features)
PCA_dataset_features=pd.DataFrame(PCA_dataset_features)

In [20]:
#Add label and zip code index
PCA_dataset=PCA_dataset_features
PCA_dataset['UBS_Open_Branch']=original_dataset['UBS_Open_Branch']
PCA_dataset['ZipCode']=original_dataset['ZipCode']
#Add prediction column
PCA_dataset['y_pred_prob']=0.0

In [21]:
#Number of records in class 0 and class 1
print('The size of class 0: {}'.format(len(PCA_dataset[PCA_dataset.UBS_Open_Branch == 0])))
print('The size of class 1: {}'.format(len(PCA_dataset[PCA_dataset.UBS_Open_Branch != 0]))) 

The size of class 0: 32695
The size of class 1: 264


## Binary classification problem on a highly imbalanced dataset
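
With 32,695 zip codes without a branch and 264 with one, plain accuracy on the full dataset is a poor yardstick: a model that always predicts "no branch" would already be right more than 99% of the time. The arithmetic, as a short sketch:

```python
n_no_branch = 32695   # zip codes labeled 0
n_branch = 264        # zip codes labeled 1

total = n_no_branch + n_branch
minority_share = n_branch / total
always_zero_accuracy = n_no_branch / total
imbalance_ratio = n_no_branch / n_branch

print(f"minority share: {minority_share:.4%}")
print(f"accuracy of always predicting 0: {always_zero_accuracy:.4%}")
print(f"majority/minority ratio: {imbalance_ratio:.1f}")
```

This is the motivation for training on balanced subsets rather than on the full dataset.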

In [22]:
# Minority dataset
minority_dataset=PCA_dataset[PCA_dataset.UBS_Open_Branch != 0]
# Majority dataset
majority_dataset = PCA_dataset[PCA_dataset.UBS_Open_Branch == 0]

In [23]:
#Shuffle the indices of the majority dataset
majority_index=list(majority_dataset.index)
random.seed(2337)
random.shuffle(majority_index)

## Undersampling & cross-prediction 

This problem has an unusual constraint: the zip codes in the majority class are required for model training, yet any of them may also be a candidate for a new branch and therefore needs a prediction.
A regular train-then-predict process would lead to "data leakage", a mistake in which information is accidentally shared between the training and test sets: the model memorizes the training data and can then easily output the correct labels for test examples it has already seen.
To avoid data leakage, we employ an undersampling and cross-prediction scheme.

Divide the majority dataset into m=122 subsets of size n=268 (the last subset holds 267 zip codes, since the majority class has 32,695). Randomly select one subset, combine it with the minority set to form the training set, and train the model with cross-validation. The cross-validation score is recorded as the weight used in the final stage.
Keep the remaining part of the majority dataset as the test set, and predict it with the trained model.
<img src="cross_prediction2-1.png">
Repeat the above process until all m=122 subsets of the majority dataset have been used for model training.
Finally, calculate the weighted average of the predictions for each zip code.
The zip codes with the highest predictions are the most likely locations for new branches.
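
The final aggregation step can be sketched as follows. Each round contributes its predicted probability for every zip code in that round's test set, weighted by the round's cross-validation score. Dividing by the sum of the weights is an assumption of this sketch, made so that the result stays in [0, 1]; the zip codes, probabilities, and scores are made up.

```python
from collections import defaultdict

# Hypothetical per-round results: (cross-validation score, {zip code: predicted probability}).
rounds = [
    (0.80, {"10001": 0.90, "10002": 0.20}),
    (0.75, {"10001": 0.70, "10003": 0.40}),
    (0.85, {"10002": 0.30, "10003": 0.60}),
]

weighted_sum = defaultdict(float)
weight_total = defaultdict(float)
for cv_score, predictions in rounds:
    for zip_code, prob in predictions.items():
        weighted_sum[zip_code] += cv_score * prob
        weight_total[zip_code] += cv_score

# Weighted average per zip code, then rank from most to least likely.
final_score = {z: weighted_sum[z] / weight_total[z] for z in weighted_sum}
ranking = sorted(final_score, key=final_score.get, reverse=True)
print(ranking)
```

A zip code only receives a contribution from the rounds in which it was part of the test set, which is what keeps its own label out of the model that scores it.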

In [24]:
#Parameters initialization

simulation_rounds=122

#Size of each subset in majority set
subsetsize=268

#Indices of elements in each subset of majority set
majority_subset_index=[]

#Cross-validation score initialization
Model_performance=pd.DataFrame(np.zeros((simulation_rounds,1)),columns=['Accuracy'])

#Obtain indices of elements in each subset of majority set
for i in range(simulation_rounds):
    majority_subset_index.append(majority_index[i*subsetsize:(i+1)*subsetsize])
    
#Initialize the final weighted prediction of each zipcode
Final_All_ZipCode_prediction=PCA_dataset.loc[:,['ZipCode','y_pred_prob']]

#Initialize the prediction of each zipcode in a single round cross-prediction
temp_single_round_prediction=PCA_dataset.loc[:,['ZipCode','y_pred_prob']] 
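
Slicing the shuffled index list in steps of `subsetsize` yields disjoint subsets that together cover the whole majority class. A standard-library sketch with the sizes reported above (32,695 majority zip codes, subsets of 268); the function name `split_into_subsets` and the stand-in index list are hypothetical.

```python
import random

def split_into_subsets(indices, subset_size):
    """Cut a list into consecutive chunks of subset_size; the last chunk may be shorter."""
    return [indices[i:i + subset_size] for i in range(0, len(indices), subset_size)]

indices = list(range(32695))   # stand-in for the majority-class row indices
random.seed(2337)
random.shuffle(indices)

subsets = split_into_subsets(indices, 268)
print(len(subsets), len(subsets[0]), len(subsets[-1]))

# Round j trains on subsets[j] and predicts every other majority zip code.
j = 0
train_part = set(subsets[j])
test_part = [i for i in indices if i not in train_part]
print(len(test_part))
```

Because 122 × 268 is one more than 32,695, the last subset is one element short, which the slicing handles without any special case.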

In [25]:
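#Train model with K-Fold cross validation
#(no random_state: it has no effect without shuffle=True, and recent scikit-learn versions raise an error if it is set)
kf = KFold(n_splits=10)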

#Classification model ensembling

#Logistic regression
logreg= LogisticRegression(random_state=10,solver='liblinear')
#Naive bayes
NB = GaussianNB()
# Decision tree
#tree= DecisionTreeClassifier(criterion = 'entropy', random_state = 10) 
#KNN
KNN = KNeighborsClassifier(n_neighbors =15, metric = 'minkowski', p = 2)
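
The three classifiers are combined in the cross-prediction loop by averaging their predicted class probabilities, a scheme often called soft voting. A small sketch with made-up class-1 probabilities for three zip codes:

```python
# Hypothetical class-1 probabilities from each model for three zip codes.
p_logreg = [0.75, 0.25, 0.5]
p_nb = [0.5, 0.25, 1.0]
p_knn = [0.25, 0.25, 0.75]

# Unweighted average across the three models, one value per zip code.
p_ensemble = [(a + b + c) / 3 for a, b, c in zip(p_logreg, p_nb, p_knn)]
print(p_ensemble)
```

Averaging probabilities rather than hard votes keeps the ranking information that is needed to pick the top zip codes at the end.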

In [26]:
#Cross prediction
NB_count=0
for j in range(simulation_rounds):
    
    #A subset of the majority dataset is combined with the minority set to form the train set
    majority_datasubset=majority_dataset.loc[majority_subset_index[j],:]
    #pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    dataset_train=pd.concat([majority_datasubset,minority_dataset],ignore_index=False)
    #Features: the slice 1:-3 skips the first column (principal component 0)
    #and the last three columns (label, zip code, prediction)
    X_train= dataset_train.iloc[:,1:-3].values
    #Labels
    y_train = dataset_train.UBS_Open_Branch.values
      
    
    #Train models using KFold cv
    #return_train_score=True is required because the train scores are printed below
    #logit_score =cross_val_score(logreg, X_train, y_train, cv=kf,scoring='accuracy')
    #NB_score = cross_val_score(NB, X_train, y_train, cv=kf,scoring='accuracy')
    logit_score = cross_validate(logreg, X_train, y_train,  scoring='accuracy', cv=kf, return_train_score=True)
    NB_score=cross_validate(NB, X_train, y_train,  scoring='accuracy', cv=kf, return_train_score=True)
    #tree_score=cross_validate(tree, X_train, y_train,  scoring='accuracy', cv=kf)
    KNN_score=cross_validate(KNN, X_train, y_train,  scoring='accuracy', cv=kf, return_train_score=True)
    
    #Average the accuracy scores of the different models
    cross_validation_score = (np.mean(NB_score['test_score'])+np.mean(logit_score['test_score'])+np.mean(KNN_score['test_score']))/3
    
    #Predict the test set
    #The remaining part of the majority dataset which is not used in training process
    #is the test set   
    X_test=majority_dataset.drop(majority_subset_index[j],axis=0).iloc[:,1:-3].values
    X_test_ZipCode=majority_dataset.drop(majority_subset_index[j],axis=0).loc[:,['ZipCode','y_pred_prob']]
     

    #Predict the test set with different models   

    NB.fit(X_train, y_train)           
    y_probas_NB= NB.predict_proba(X_test)

    logreg.fit(X_train, y_train) 
    y_probas_logit= logreg.predict_proba(X_test)
    
    #tree.fit(X_train, y_train) 
    #y_probas_tree= tree.predict_proba(X_test)
    
    KNN.fit(X_train, y_train) 
    y_probas_KNN= KNN.predict_proba(X_test)
    
    # Get the average predicted probability of the test set across the different models
    y_probas=(y_probas_NB+y_probas_logit+y_probas_KNN)/3
    
    X_test_ZipCode['y_pred_prob']=y_probas[:,1]
    
    #Merge the predictions on Zipcode
    single_round_predictions=pd.merge(temp_single_round_prediction,X_test_ZipCode, on='ZipCode',how='left')
    
    #Replace nan with 0 (zip codes that are not in this round's test set)
    round_prediction=single_round_predictions.iloc[:,-1].fillna(0)
    
    
    #Weight the predicted probability by this round's
    #average cross-validation score and accumulate it
    Final_All_ZipCode_prediction['y_pred_prob']+=round_prediction*cross_validation_score 
    Model_performance.iloc[j,:]=cross_validation_score
    print('Cross-validation for {}-th subset using LogRg: train_score mean is {}, test_score mean is {}'.format(j,np.mean(logit_score['train_score']),np.mean(logit_score['test_score'])))
    print('Cross-validation for {}-th subset using    NB: train_score mean is {}, test_score mean is {}'.format(j,np.mean(NB_score['train_score']),np.mean(NB_score['test_score'])))
    #print('Cross-validation for {}-th subset using tree: train_score mean is {}, test_score mean is {}'.format(j,np.mean(tree_score['train_score']),np.mean(tree_score['test_score'])))
    print('Cross-validation for {}-th subset using   KNN: train_score mean is {}, test_score mean is {}'.format(j,np.mean(KNN_score['train_score']),np.mean(KNN_score['test_score'])))



Cross-validation for 0-th subset using LogRg: train_score mean is 0.8926455044941954, test_score mean is 0.836408106219427
Cross-validation for 0-th subset using    NB: train_score mean is 0.7456058210532752, test_score mean is 0.689203354297694
Cross-validation for 0-th subset using   KNN: train_score mean is 0.9012089342336284, test_score mean is 0.8365478686233402




Cross-validation for 1-th subset using LogRg: train_score mean is 0.898075663210489, test_score mean is 0.8494060097833683
Cross-validation for 1-th subset using    NB: train_score mean is 0.7529166411893676, test_score mean is 0.7154088050314465
Cross-validation for 1-th subset using   KNN: train_score mean is 0.8978686419580543, test_score mean is 0.8288259958071279




Cross-validation for 2-th subset using LogRg: train_score mean is 0.8842943370515632, test_score mean is 0.787351502445842
Cross-validation for 2-th subset using    NB: train_score mean is 0.7172041648832558, test_score mean is 0.6684136967155835
Cross-validation for 2-th subset using   KNN: train_score mean is 0.8878438343480578, test_score mean is 0.7930468204053109




Cross-validation for 3-th subset using LogRg: train_score mean is 0.8955700072501113, test_score mean is 0.839937106918239
Cross-validation for 3-th subset using    NB: train_score mean is 0.741217319904613, test_score mean is 0.7078965758211041
Cross-validation for 3-th subset using   KNN: train_score mean is 0.9091478061861796, test_score mean is 0.8232704402515724




Cross-validation for 4-th subset using LogRg: train_score mean is 0.9076851180545245, test_score mean is 0.8419287211740041
Cross-validation for 4-th subset using    NB: train_score mean is 0.7387121006979325, test_score mean is 0.7042627533193571
Cross-validation for 4-th subset using   KNN: train_score mean is 0.9053851730854902, test_score mean is 0.8306778476589798




Cross-validation for 5-th subset using LogRg: train_score mean is 0.8926489985237727, test_score mean is 0.8307826694619148
Cross-validation for 5-th subset using    NB: train_score mean is 0.7253452537975734, test_score mean is 0.6795946890286513
Cross-validation for 5-th subset using   KNN: train_score mean is 0.8951555279915443, test_score mean is 0.8044025157232703




Cross-validation for 6-th subset using LogRg: train_score mean is 0.8799053991492037, test_score mean is 0.8306079664570231
Cross-validation for 6-th subset using    NB: train_score mean is 0.7326582577021515, test_score mean is 0.6909503843466107
Cross-validation for 6-th subset using   KNN: train_score mean is 0.893691092845101, test_score mean is 0.8193221523410201




Cross-validation for 7-th subset using LogRg: train_score mean is 0.8895148539932389, test_score mean is 0.8155136268343816
Cross-validation for 7-th subset using    NB: train_score mean is 0.7151191027332047, test_score mean is 0.6722222222222223
Cross-validation for 7-th subset using   KNN: train_score mean is 0.8970344423965549, test_score mean is 0.840041928721174




Cross-validation for 8-th subset using LogRg: train_score mean is 0.8897249325215538, test_score mean is 0.826869322152341
Cross-validation for 8-th subset using    NB: train_score mean is 0.7161633808230187, test_score mean is 0.6780573025856045
Cross-validation for 8-th subset using   KNN: train_score mean is 0.8857539679073383, test_score mean is 0.7950733752620545




Cross-validation for 9-th subset using LogRg: train_score mean is 0.908520627877115, test_score mean is 0.8588749126484976
Cross-validation for 9-th subset using    NB: train_score mean is 0.7499956324630288, test_score mean is 0.7212788259958072
Cross-validation for 9-th subset using   KNN: train_score mean is 0.9047623623133971, test_score mean is 0.8363731656184485




Cross-validation for 10-th subset using LogRg: train_score mean is 0.896616032354714, test_score mean is 0.8344863731656185
Cross-validation for 10-th subset using    NB: train_score mean is 0.7472803347280335, test_score mean is 0.7155835080363382
Cross-validation for 10-th subset using   KNN: train_score mean is 0.8966182161231997, test_score mean is 0.8270440251572329




Cross-validation for 11-th subset using LogRg: train_score mean is 0.8826228806526848, test_score mean is 0.813801537386443
Cross-validation for 11-th subset using    NB: train_score mean is 0.7205457674199212, test_score mean is 0.6834730957372467
Cross-validation for 11-th subset using   KNN: train_score mean is 0.8740607611743434, test_score mean is 0.7950034940600978




Cross-validation for 12-th subset using LogRg: train_score mean is 0.9018387330648754, test_score mean is 0.8402166317260656
Cross-validation for 12-th subset using    NB: train_score mean is 0.756468759008045, test_score mean is 0.708001397624039
Cross-validation for 12-th subset using   KNN: train_score mean is 0.9055969986285934, test_score mean is 0.8364430468204052




Cross-validation for 13-th subset using LogRg: train_score mean is 0.8824171696613412, test_score mean is 0.8025506638714186
Cross-validation for 13-th subset using    NB: train_score mean is 0.7088512504258349, test_score mean is 0.6496156533892383
Cross-validation for 13-th subset using   KNN: train_score mean is 0.8872153457779021, test_score mean is 0.8083508036338225




Cross-validation for 14-th subset using LogRg: train_score mean is 0.8974511054236075, test_score mean is 0.8212089447938504
Cross-validation for 14-th subset using    NB: train_score mean is 0.7328670259693749, test_score mean is 0.6925925925925925
Cross-validation for 14-th subset using   KNN: train_score mean is 0.9024641643591513, test_score mean is 0.8439552760307478




Cross-validation for 15-th subset using LogRg: train_score mean is 0.8742690926878696, test_score mean is 0.8156533892382949
Cross-validation for 15-th subset using    NB: train_score mean is 0.7080096260514844, test_score mean is 0.6645003494060099
Cross-validation for 15-th subset using   KNN: train_score mean is 0.8943200181689539, test_score mean is 0.8251572327044026




Cross-validation for 16-th subset using LogRg: train_score mean is 0.8907683371039739, test_score mean is 0.8342767295597483
Cross-validation for 16-th subset using    NB: train_score mean is 0.7364191437880523, test_score mean is 0.6986023759608665
Cross-validation for 16-th subset using   KNN: train_score mean is 0.8849206418532333, test_score mean is 0.7968553459119496




Cross-validation for 17-th subset using LogRg: train_score mean is 0.9120727456957922, test_score mean is 0.8550663871418589
Cross-validation for 17-th subset using    NB: train_score mean is 0.7518727998532507, test_score mean is 0.7191474493361285
Cross-validation for 17-th subset using   KNN: train_score mean is 0.9070614337750369, test_score mean is 0.843990216631726




Cross-validation for 18-th subset using LogRg: train_score mean is 0.9026716223652833, test_score mean is 0.8266946191474493
Cross-validation for 18-th subset using    NB: train_score mean is 0.7274381775141727, test_score mean is 0.6833333333333333
Cross-validation for 18-th subset using   KNN: train_score mean is 0.9139507865934086, test_score mean is 0.8288958770090845




Cross-validation for 19-th subset using LogRg: train_score mean is 0.891602536665473, test_score mean is 0.8080712788259957
Cross-validation for 19-th subset using    NB: train_score mean is 0.7157419135052978, test_score mean is 0.6740391334730957
Cross-validation for 19-th subset using   KNN: train_score mean is 0.8788633048278754, test_score mean is 0.8044025157232705




Cross-validation for 20-th subset using LogRg: train_score mean is 0.8932704990347743, test_score mean is 0.8249475890985325
Cross-validation for 20-th subset using    NB: train_score mean is 0.7272254784636754, test_score mean is 0.6833682739343118
Cross-validation for 20-th subset using   KNN: train_score mean is 0.8978690787117513, test_score mean is 0.8157232704402515




Cross-validation for 21-th subset using LogRg: train_score mean is 0.8819987596195002, test_score mean is 0.8083857442348009
Cross-validation for 21-th subset using    NB: train_score mean is 0.7349608231933684, test_score mean is 0.6966457023060797
Cross-validation for 21-th subset using   KNN: train_score mean is 0.8842982678348372, test_score mean is 0.8083857442348009




Cross-validation for 22-th subset using LogRg: train_score mean is 0.8961950017906902, test_score mean is 0.8305380852550662
Cross-validation for 22-th subset using    NB: train_score mean is 0.7224203142879606, test_score mean is 0.6945842068483578
Cross-validation for 22-th subset using   KNN: train_score mean is 0.8857552781684298, test_score mean is 0.8005939902166317




Cross-validation for 23-th subset using LogRg: train_score mean is 0.8811593190136355, test_score mean is 0.8154437456324249
Cross-validation for 23-th subset using    NB: train_score mean is 0.7084249788174457, test_score mean is 0.6569182389937108
Cross-validation for 23-th subset using   KNN: train_score mean is 0.8878420873332693, test_score mean is 0.8024458420684836




Cross-validation for 24-th subset using LogRg: train_score mean is 0.8861776189935447, test_score mean is 0.8081062194269742
Cross-validation for 24-th subset using    NB: train_score mean is 0.7320363204374526, test_score mean is 0.6780573025856046
Cross-validation for 24-th subset using   KNN: train_score mean is 0.9014229435452172, test_score mean is 0.8120195667365477




Cross-validation for 25-th subset using LogRg: train_score mean is 0.8993330771044977, test_score mean is 0.8306079664570231
Cross-validation for 25-th subset using    NB: train_score mean is 0.7424786645818957, test_score mean is 0.6949336128581411
Cross-validation for 25-th subset using   KNN: train_score mean is 0.9001694604344828, test_score mean is 0.8233053808525506




Cross-validation for 26-th subset using LogRg: train_score mean is 0.8924371729806693, test_score mean is 0.8118448637316561
Cross-validation for 26-th subset using    NB: train_score mean is 0.7119823376804885, test_score mean is 0.679769392033543
Cross-validation for 26-th subset using   KNN: train_score mean is 0.8966177793695025, test_score mean is 0.8252271139063593




Cross-validation for 27-th subset using LogRg: train_score mean is 0.8751054760178546, test_score mean is 0.8004891684136967
Cross-validation for 27-th subset using    NB: train_score mean is 0.7232615019086136, test_score mean is 0.6758909853249475
Cross-validation for 27-th subset using   KNN: train_score mean is 0.8868008665193351, test_score mean is 0.8043675751222921




Cross-validation for 28-th subset using LogRg: train_score mean is 0.8886828382002253, test_score mean is 0.8155485674353598
Cross-validation for 28-th subset using    NB: train_score mean is 0.7351639136625292, test_score mean is 0.6946890286512928
Cross-validation for 28-th subset using   KNN: train_score mean is 0.8995400983569327, test_score mean is 0.8419636617749827




Cross-validation for 29-th subset using LogRg: train_score mean is 0.8972414636489898, test_score mean is 0.8363731656184488
Cross-validation for 29-th subset using    NB: train_score mean is 0.7641979891859785, test_score mean is 0.696575821104123
Cross-validation for 29-th subset using   KNN: train_score mean is 0.8880539128763726, test_score mean is 0.7949336128581412




Cross-validation for 30-th subset using LogRg: train_score mean is 0.8836662852351045, test_score mean is 0.7950034940600978
Cross-validation for 30-th subset using    NB: train_score mean is 0.7167809505507463, test_score mean is 0.6664220824598183
Cross-validation for 30-th subset using   KNN: train_score mean is 0.8803264297132276, test_score mean is 0.8007686932215232




Cross-validation for 31-th subset using LogRg: train_score mean is 0.8976576899223453, test_score mean is 0.8438155136268344
Cross-validation for 31-th subset using    NB: train_score mean is 0.7395489207816144, test_score mean is 0.6966107617051013
Cross-validation for 31-th subset using   KNN: train_score mean is 0.897449358408819, test_score mean is 0.8213487071977637




Cross-validation for 32-th subset using LogRg: train_score mean is 0.911234615351019, test_score mean is 0.8567784765897972
Cross-validation for 32-th subset using    NB: train_score mean is 0.749989081157572, test_score mean is 0.709853249475891
Cross-validation for 32-th subset using   KNN: train_score mean is 0.908309239087709, test_score mean is 0.839937106918239




Cross-validation for 33-th subset using LogRg: train_score mean is 0.8957805225321234, test_score mean is 0.8325296995108316
Cross-validation for 33-th subset using    NB: train_score mean is 0.7453996733082346, test_score mean is 0.6948287910552061
Cross-validation for 33-th subset using   KNN: train_score mean is 0.903511062971148, test_score mean is 0.8250873515024459




Cross-validation for 34-th subset using LogRg: train_score mean is 0.8857544046610354, test_score mean is 0.8269042627533194
Cross-validation for 34-th subset using    NB: train_score mean is 0.7159467509892471, test_score mean is 0.6759259259259259
Cross-validation for 34-th subset using   KNN: train_score mean is 0.9012137385242965, test_score mean is 0.8156883298392732




Cross-validation for 35-th subset using LogRg: train_score mean is 0.8957787755173348, test_score mean is 0.8419986023759607
Cross-validation for 35-th subset using    NB: train_score mean is 0.7458207038722583, test_score mean is 0.7098881900768694
Cross-validation for 35-th subset using   KNN: train_score mean is 0.9020461910710076, test_score mean is 0.8288958770090844




Cross-validation for 36-th subset using LogRg: train_score mean is 0.9051798988478439, test_score mean is 0.8455974842767295
Cross-validation for 36-th subset using    NB: train_score mean is 0.7631528375887702, test_score mean is 0.7209643605870022
Cross-validation for 36-th subset using   KNN: train_score mean is 0.901419012761943, test_score mean is 0.8175401816911251




Cross-validation for 37-th subset using LogRg: train_score mean is 0.8970331321354635, test_score mean is 0.8420335429769393
Cross-validation for 37-th subset using    NB: train_score mean is 0.7355831972117645, test_score mean is 0.6930817610062893
Cross-validation for 37-th subset using   KNN: train_score mean is 0.8868013032730321, test_score mean is 0.8156883298392732




Cross-validation for 38-th subset using LogRg: train_score mean is 0.8922314619893259, test_score mean is 0.8307127882599581
Cross-validation for 38-th subset using    NB: train_score mean is 0.7291096339130512, test_score mean is 0.6891334730957372
Cross-validation for 38-th subset using   KNN: train_score mean is 0.8918108681789991, test_score mean is 0.8176450034940601




Cross-validation for 39-th subset using LogRg: train_score mean is 0.8870065775106786, test_score mean is 0.8250174703004893
Cross-validation for 39-th subset using    NB: train_score mean is 0.7385085734750745, test_score mean is 0.7045422781271837
Cross-validation for 39-th subset using   KNN: train_score mean is 0.8865886042225348, test_score mean is 0.7800838574423479




Cross-validation for 40-th subset using LogRg: train_score mean is 0.8945287864361774, test_score mean is 0.836408106219427
Cross-validation for 40-th subset using    NB: train_score mean is 0.739761619832112, test_score mean is 0.7004192872117401
Cross-validation for 40-th subset using   KNN: train_score mean is 0.8861728147028763, test_score mean is 0.7892732354996504




Cross-validation for 41-th subset using LogRg: train_score mean is 0.8822044706108437, test_score mean is 0.8005939902166318
Cross-validation for 41-th subset using    NB: train_score mean is 0.6856605026161546, test_score mean is 0.6512928022361985
Cross-validation for 41-th subset using   KNN: train_score mean is 0.9026738061337689, test_score mean is 0.8419986023759609




Cross-validation for 42-th subset using LogRg: train_score mean is 0.8959879805382555, test_score mean is 0.8324598183088749
Cross-validation for 42-th subset using    NB: train_score mean is 0.7299455804893389, test_score mean is 0.6873864430468204
Cross-validation for 42-th subset using   KNN: train_score mean is 0.8968252373756342, test_score mean is 0.8250873515024459




Cross-validation for 43-th subset using LogRg: train_score mean is 0.9039242319686236, test_score mean is 0.8438504542278128
Cross-validation for 43-th subset using    NB: train_score mean is 0.7293131611359092, test_score mean is 0.6926974143955277
Cross-validation for 43-th subset using   KNN: train_score mean is 0.893068718826705, test_score mean is 0.823130677847659




Cross-validation for 44-th subset using LogRg: train_score mean is 0.8859644831893501, test_score mean is 0.813906359189378
Cross-validation for 44-th subset using    NB: train_score mean is 0.7195001790690159, test_score mean is 0.6647449336128581
Cross-validation for 44-th subset using   KNN: train_score mean is 0.8707161013617981, test_score mean is 0.7707547169811321




Cross-validation for 45-th subset using LogRg: train_score mean is 0.8868008665193351, test_score mean is 0.8214884696016771
Cross-validation for 45-th subset using    NB: train_score mean is 0.7247211327643889, test_score mean is 0.6513626834381551
Cross-validation for 45-th subset using   KNN: train_score mean is 0.8907709576261562, test_score mean is 0.8008735150244585




Cross-validation for 46-th subset using LogRg: train_score mean is 0.8995383513421441, test_score mean is 0.8475192173305383
Cross-validation for 46-th subset using    NB: train_score mean is 0.7385072632139832, test_score mean is 0.6929070580013976
Cross-validation for 46-th subset using   KNN: train_score mean is 0.9039242319686236, test_score mean is 0.8287910552061497




Cross-validation for 47-th subset using LogRg: train_score mean is 0.8690481389924966, test_score mean is 0.7949685534591195
Cross-validation for 47-th subset using    NB: train_score mean is 0.7065513054568008, test_score mean is 0.653424178895877
Cross-validation for 47-th subset using   KNN: train_score mean is 0.8824154226465527, test_score mean is 0.7838923829489868




Cross-validation for 48-th subset using LogRg: train_score mean is 0.8907661533354879, test_score mean is 0.8137665967854646
Cross-validation for 48-th subset using    NB: train_score mean is 0.731404774591417, test_score mean is 0.6910552061495457
Cross-validation for 48-th subset using   KNN: train_score mean is 0.9022567063530195, test_score mean is 0.8440600978336829




Cross-validation for 49-th subset using LogRg: train_score mean is 0.8771892279068142, test_score mean is 0.796785464709993
Cross-validation for 49-th subset using    NB: train_score mean is 0.6919239873865533, test_score mean is 0.6380852550663871
Cross-validation for 49-th subset using   KNN: train_score mean is 0.8918117416863932, test_score mean is 0.804262753319357




Cross-validation for 50-th subset using LogRg: train_score mean is 0.878232632489234, test_score mean is 0.7985674353598882
Cross-validation for 50-th subset using    NB: train_score mean is 0.6925612110306514, test_score mean is 0.6515723270440252
Cross-validation for 50-th subset using   KNN: train_score mean is 0.877610258470838, test_score mean is 0.785779175401817




Cross-validation for 51-th subset using LogRg: train_score mean is 0.8872184030537819, test_score mean is 0.823025856044724
Cross-validation for 51-th subset using    NB: train_score mean is 0.7366252915330929, test_score mean is 0.675925925925926
Cross-validation for 51-th subset using   KNN: train_score mean is 0.8916051571876557, test_score mean is 0.8174703004891685




Cross-validation for 52-th subset using LogRg: train_score mean is 0.87426297813611, test_score mean is 0.8097833682739344
Cross-validation for 52-th subset using    NB: train_score mean is 0.7232562608642482, test_score mean is 0.6910552061495456
Cross-validation for 52-th subset using   KNN: train_score mean is 0.8807382884496118, test_score mean is 0.7778476589797344




Cross-validation for 53-th subset using LogRg: train_score mean is 0.8976572531686481, test_score mean is 0.8306429070580015
Cross-validation for 53-th subset using    NB: train_score mean is 0.7328648422008893, test_score mean is 0.6946890286512929
Cross-validation for 53-th subset using   KNN: train_score mean is 0.8926481250163782, test_score mean is 0.813801537386443




Cross-validation for 54-th subset using LogRg: train_score mean is 0.8913928948908552, test_score mean is 0.8285464709993011
Cross-validation for 54-th subset using    NB: train_score mean is 0.701535189245377, test_score mean is 0.6553109713487072
Cross-validation for 54-th subset using   KNN: train_score mean is 0.8882604973751104, test_score mean is 0.8324598183088749




Cross-validation for 55-th subset using LogRg: train_score mean is 0.8886784706632541, test_score mean is 0.8306079664570231
Cross-validation for 55-th subset using    NB: train_score mean is 0.7439321808859113, test_score mean is 0.7116352201257862
Cross-validation for 55-th subset using   KNN: train_score mean is 0.8951555279915444, test_score mean is 0.8192872117400419




Cross-validation for 56-th subset using LogRg: train_score mean is 0.8964072640874905, test_score mean is 0.836198462613557
Cross-validation for 56-th subset using    NB: train_score mean is 0.7535416357299465, test_score mean is 0.7135918937805731
Cross-validation for 56-th subset using   KNN: train_score mean is 0.9045562145683563, test_score mean is 0.8175751222921035




Cross-validation for 57-th subset using LogRg: train_score mean is 0.8918104314253019, test_score mean is 0.8380503144654089
Cross-validation for 57-th subset using    NB: train_score mean is 0.7472759671910623, test_score mean is 0.7174353598881901
Cross-validation for 57-th subset using   KNN: train_score mean is 0.9037198312383715, test_score mean is 0.8270440251572329




Cross-validation for 58-th subset using LogRg: train_score mean is 0.8780282317589819, test_score mean is 0.7968553459119498
Cross-validation for 58-th subset using    NB: train_score mean is 0.7243070902595192, test_score mean is 0.6911250873515025
Cross-validation for 58-th subset using   KNN: train_score mean is 0.8895148539932389, test_score mean is 0.8062893081761008




Cross-validation for 59-th subset using LogRg: train_score mean is 0.8903516740769211, test_score mean is 0.817330538085255
Cross-validation for 59-th subset using    NB: train_score mean is 0.7236764179208779, test_score mean is 0.6721523410202657
Cross-validation for 59-th subset using   KNN: train_score mean is 0.8905608790978417, test_score mean is 0.8044723969252271




Cross-validation for 60-th subset using LogRg: train_score mean is 0.9062233034302636, test_score mean is 0.8493710691823899
Cross-validation for 60-th subset using    NB: train_score mean is 0.7493671438928731, test_score mean is 0.723025856044724
Cross-validation for 60-th subset using   KNN: train_score mean is 0.8968226168534518, test_score mean is 0.8250174703004891




Cross-validation for 61-th subset using LogRg: train_score mean is 0.8888885491915689, test_score mean is 0.8306079664570231
Cross-validation for 61-th subset using    NB: train_score mean is 0.7192852962500329, test_score mean is 0.6775681341719078
Cross-validation for 61-th subset using   KNN: train_score mean is 0.883042600955617, test_score mean is 0.7930468204053109




Cross-validation for 62-th subset using LogRg: train_score mean is 0.8949419554336527, test_score mean is 0.8247728860936409
Cross-validation for 62-th subset using    NB: train_score mean is 0.7322398476603105, test_score mean is 0.6872816212438855
Cross-validation for 62-th subset using   KNN: train_score mean is 0.9001650928975113, test_score mean is 0.8229909154437456




Cross-validation for 63-th subset using LogRg: train_score mean is 0.8759405490867481, test_score mean is 0.8120894479385046
Cross-validation for 63-th subset using    NB: train_score mean is 0.7161590132860475, test_score mean is 0.6607617051013278
Cross-validation for 63-th subset using   KNN: train_score mean is 0.8932753033254428, test_score mean is 0.8175401816911251




Cross-validation for 64-th subset using LogRg: train_score mean is 0.8926520557996522, test_score mean is 0.8345213137665967
Cross-validation for 64-th subset using    NB: train_score mean is 0.7084345873987823, test_score mean is 0.670440251572327
Cross-validation for 64-th subset using   KNN: train_score mean is 0.8913963889204324, test_score mean is 0.8252271139063593




Cross-validation for 65-th subset using LogRg: train_score mean is 0.8886819646928311, test_score mean is 0.8287910552061495
Cross-validation for 65-th subset using    NB: train_score mean is 0.7205462041736184, test_score mean is 0.6758211041229909
Cross-validation for 65-th subset using   KNN: train_score mean is 0.8907687738576706, test_score mean is 0.8119147449336129




Cross-validation for 66-th subset using LogRg: train_score mean is 0.9137415815724881, test_score mean is 0.860587002096436
Cross-validation for 66-th subset using    NB: train_score mean is 0.7487417125985971, test_score mean is 0.7193221523410203
Cross-validation for 66-th subset using   KNN: train_score mean is 0.9026720591189805, test_score mean is 0.8420335429769391




Cross-validation for 67-th subset using LogRg: train_score mean is 0.9047593050375171, test_score mean is 0.845632424877708
Cross-validation for 67-th subset using    NB: train_score mean is 0.7322367903844307, test_score mean is 0.696575821104123
Cross-validation for 67-th subset using   KNN: train_score mean is 0.9099806954865874, test_score mean is 0.8513277428371768




Cross-validation for 68-th subset using LogRg: train_score mean is 0.8918147989622731, test_score mean is 0.8456673654786864
Cross-validation for 68-th subset using    NB: train_score mean is 0.7428940173478569, test_score mean is 0.7042976939203355
Cross-validation for 68-th subset using   KNN: train_score mean is 0.8853386151413772, test_score mean is 0.7932564640111809




Cross-validation for 69-th subset using LogRg: train_score mean is 0.8888854919156891, test_score mean is 0.8418588399720475
Cross-validation for 69-th subset using    NB: train_score mean is 0.7531236624418025, test_score mean is 0.7248427672955976
Cross-validation for 69-th subset using   KNN: train_score mean is 0.8968243638682403, test_score mean is 0.8174703004891685




Cross-validation for 70-th subset using LogRg: train_score mean is 0.8855469466549035, test_score mean is 0.8288259958071279
Cross-validation for 70-th subset using    NB: train_score mean is 0.7332880565333987, test_score mean is 0.6929419986023759
Cross-validation for 70-th subset using   KNN: train_score mean is 0.8730186668530149, test_score mean is 0.7989867225716283




Cross-validation for 71-th subset using LogRg: train_score mean is 0.8868008665193351, test_score mean is 0.8251572327044026
Cross-validation for 71-th subset using    NB: train_score mean is 0.7132345105301316, test_score mean is 0.6588749126484974
Cross-validation for 71-th subset using   KNN: train_score mean is 0.8978708257265398, test_score mean is 0.8345213137665969




Cross-validation for 72-th subset using LogRg: train_score mean is 0.8880508556004927, test_score mean is 0.8138015373864432
Cross-validation for 72-th subset using    NB: train_score mean is 0.7088451358740752, test_score mean is 0.6589797344514327
Cross-validation for 72-th subset using   KNN: train_score mean is 0.8807413457254916, test_score mean is 0.8100978336827394




Cross-validation for 73-th subset using LogRg: train_score mean is 0.8995370410810528, test_score mean is 0.8511879804332635
Cross-validation for 73-th subset using    NB: train_score mean is 0.7723443191446615, test_score mean is 0.7362334032145352
Cross-validation for 73-th subset using   KNN: train_score mean is 0.8961984958202672, test_score mean is 0.8250873515024459




Cross-validation for 74-th subset using LogRg: train_score mean is 0.9060140984093431, test_score mean is 0.8381900768693221
Cross-validation for 74-th subset using    NB: train_score mean is 0.7412221241952814, test_score mean is 0.709853249475891
Cross-validation for 74-th subset using   KNN: train_score mean is 0.8941068823647592, test_score mean is 0.8119846261355695




Cross-validation for 75-th subset using LogRg: train_score mean is 0.9041356207580298, test_score mean is 0.8532494758909854
Cross-validation for 75-th subset using    NB: train_score mean is 0.7619015382465213, test_score mean is 0.7116352201257862
Cross-validation for 75-th subset using   KNN: train_score mean is 0.8959884172919523, test_score mean is 0.8383647798742139




Cross-validation for 76-th subset using LogRg: train_score mean is 0.8897244957678566, test_score mean is 0.8212438853948288
Cross-validation for 76-th subset using    NB: train_score mean is 0.7334963880469249, test_score mean is 0.6778127183787561
Cross-validation for 76-th subset using   KNN: train_score mean is 0.8765637966125384, test_score mean is 0.7780922431865827




Cross-validation for 77-th subset using LogRg: train_score mean is 0.9072693285348661, test_score mean is 0.8381201956673655
Cross-validation for 77-th subset using    NB: train_score mean is 0.7535438194984321, test_score mean is 0.7060447239692522
Cross-validation for 77-th subset using   KNN: train_score mean is 0.8980769734715806, test_score mean is 0.8365478686233404




Cross-validation for 78-th subset using LogRg: train_score mean is 0.8888868021767804, test_score mean is 0.834346610761705
Cross-validation for 78-th subset using    NB: train_score mean is 0.7575121635904648, test_score mean is 0.713591893780573
Cross-validation for 78-th subset using   KNN: train_score mean is 0.897034879150252, test_score mean is 0.8307477288609364




Cross-validation for 79-th subset using LogRg: train_score mean is 0.8953590552144023, test_score mean is 0.8362334032145352
Cross-validation for 79-th subset using    NB: train_score mean is 0.7130191909574515, test_score mean is 0.6645702306079665
Cross-validation for 79-th subset using   KNN: train_score mean is 0.8851259160908796, test_score mean is 0.8117749825296995




Cross-validation for 80-th subset using LogRg: train_score mean is 0.8757335278343131, test_score mean is 0.7931167016072675
Cross-validation for 80-th subset using    NB: train_score mean is 0.6998676636297726, test_score mean is 0.6609364081062195
Cross-validation for 80-th subset using   KNN: train_score mean is 0.8849206418532333, test_score mean is 0.8008036338225016




Cross-validation for 81-th subset using LogRg: train_score mean is 0.9032961801521651, test_score mean is 0.8230607966457024
Cross-validation for 81-th subset using    NB: train_score mean is 0.7232601916475222, test_score mean is 0.6894479385045422
Cross-validation for 81-th subset using   KNN: train_score mean is 0.9028817008935981, test_score mean is 0.8343815513626834




Cross-validation for 82-th subset using LogRg: train_score mean is 0.8855434526253265, test_score mean is 0.823025856044724
Cross-validation for 82-th subset using    NB: train_score mean is 0.7276469457813961, test_score mean is 0.6911949685534591
Cross-validation for 82-th subset using   KNN: train_score mean is 0.8805360714878452, test_score mean is 0.810272536687631




Cross-validation for 83-th subset using LogRg: train_score mean is 0.8945266026676915, test_score mean is 0.8325995807127882
Cross-validation for 83-th subset using    NB: train_score mean is 0.7318205641110752, test_score mean is 0.6816561844863732
Cross-validation for 83-th subset using   KNN: train_score mean is 0.9005848132004438, test_score mean is 0.8327044025157233




Cross-validation for 84-th subset using LogRg: train_score mean is 0.8673827971453779, test_score mean is 0.8044723969252272
Cross-validation for 84-th subset using    NB: train_score mean is 0.6936107301648307, test_score mean is 0.6269042627533195
Cross-validation for 84-th subset using   KNN: train_score mean is 0.8911893676679974, test_score mean is 0.8158280922431868




Cross-validation for 85-th subset using LogRg: train_score mean is 0.9070561927306715, test_score mean is 0.860587002096436
Cross-validation for 85-th subset using    NB: train_score mean is 0.7625221652501288, test_score mean is 0.7493710691823898
Cross-validation for 85-th subset using   KNN: train_score mean is 0.9156218062385898, test_score mean is 0.8361635220125786




Cross-validation for 86-th subset using LogRg: train_score mean is 0.8945252924066004, test_score mean is 0.8418238993710692
Cross-validation for 86-th subset using    NB: train_score mean is 0.7343292773473328, test_score mean is 0.7080712788259957
Cross-validation for 86-th subset using   KNN: train_score mean is 0.8999571981376823, test_score mean is 0.8137665967854646




Cross-validation for 87-th subset using LogRg: train_score mean is 0.9014177025008516, test_score mean is 0.8306429070580015
Cross-validation for 87-th subset using    NB: train_score mean is 0.7151125514277478, test_score mean is 0.6588399720475192
Cross-validation for 87-th subset using   KNN: train_score mean is 0.9005835029393523, test_score mean is 0.840146750524109




Cross-validation for 88-th subset using LogRg: train_score mean is 0.8970318218743722, test_score mean is 0.830503144654088
Cross-validation for 88-th subset using    NB: train_score mean is 0.7161577030249561, test_score mean is 0.690985324947589
Cross-validation for 88-th subset using   KNN: train_score mean is 0.8882618076362017, test_score mean is 0.8119496855345913




Cross-validation for 89-th subset using LogRg: train_score mean is 0.8761466968317887, test_score mean is 0.8042976939203355
Cross-validation for 89-th subset using    NB: train_score mean is 0.7259746158751235, test_score mean is 0.6853249475890986
Cross-validation for 89-th subset using   KNN: train_score mean is 0.8855482569159948, test_score mean is 0.7912299091544376




Cross-validation for 90-th subset using LogRg: train_score mean is 0.900374297918432, test_score mean is 0.8249126484975543
Cross-validation for 90-th subset using    NB: train_score mean is 0.7598099247910134, test_score mean is 0.696575821104123
Cross-validation for 90-th subset using   KNN: train_score mean is 0.8920209467073139, test_score mean is 0.8174703004891685




Cross-validation for 91-th subset using LogRg: train_score mean is 0.897449795162516, test_score mean is 0.8286862334032146
Cross-validation for 91-th subset using    NB: train_score mean is 0.7328661524619806, test_score mean is 0.6948637316561845
Cross-validation for 91-th subset using   KNN: train_score mean is 0.9074767865409982, test_score mean is 0.7950733752620545




Cross-validation for 92-th subset using LogRg: train_score mean is 0.8980774102252773, test_score mean is 0.8399720475192174
Cross-validation for 92-th subset using    NB: train_score mean is 0.7460268516172989, test_score mean is 0.6948986722571628
Cross-validation for 92-th subset using   KNN: train_score mean is 0.8997506136389445, test_score mean is 0.8289308176100629




Cross-validation for 93-th subset using LogRg: train_score mean is 0.8849219521143248, test_score mean is 0.8214185883997205
Cross-validation for 93-th subset using    NB: train_score mean is 0.7303600597479059, test_score mean is 0.6889937106918239
Cross-validation for 93-th subset using   KNN: train_score mean is 0.8901424690560006, test_score mean is 0.8063591893780574




Cross-validation for 94-th subset using LogRg: train_score mean is 0.8734313990967933, test_score mean is 0.7947938504542277
Cross-validation for 94-th subset using    NB: train_score mean is 0.7048767917820424, test_score mean is 0.6549266247379455
Cross-validation for 94-th subset using   KNN: train_score mean is 0.866543356539513, test_score mean is 0.7838574423480085




Cross-validation for 95-th subset using LogRg: train_score mean is 0.8872184030537819, test_score mean is 0.8212438853948288
Cross-validation for 95-th subset using    NB: train_score mean is 0.7272280989858579, test_score mean is 0.6985324947589098
Cross-validation for 95-th subset using   KNN: train_score mean is 0.8834579537215783, test_score mean is 0.8102375960866528




Cross-validation for 96-th subset using LogRg: train_score mean is 0.8882587503603219, test_score mean is 0.8117749825296995
Cross-validation for 96-th subset using    NB: train_score mean is 0.7499860238816921, test_score mean is 0.7078965758211042
Cross-validation for 96-th subset using   KNN: train_score mean is 0.8874258610599138, test_score mean is 0.8004542278127185




Cross-validation for 97-th subset using LogRg: train_score mean is 0.9103991055284283, test_score mean is 0.8568134171907756
Cross-validation for 97-th subset using    NB: train_score mean is 0.7875883334352426, test_score mean is 0.7474493361285814
Cross-validation for 97-th subset using   KNN: train_score mean is 0.9068509184930251, test_score mean is 0.8437106918238992




Cross-validation for 98-th subset using LogRg: train_score mean is 0.9018387330648755, test_score mean is 0.8476240391334731
Cross-validation for 98-th subset using    NB: train_score mean is 0.7395519780574944, test_score mean is 0.7155485674353599
Cross-validation for 98-th subset using   KNN: train_score mean is 0.8922301517282344, test_score mean is 0.8195667365478686




Cross-validation for 99-th subset using LogRg: train_score mean is 0.8780243009757077, test_score mean is 0.8192522711390635
Cross-validation for 99-th subset using    NB: train_score mean is 0.7276434517518192, test_score mean is 0.6757861635220126
Cross-validation for 99-th subset using   KNN: train_score mean is 0.8838759270097221, test_score mean is 0.8060796645702306




Cross-validation for 100-th subset using LogRg: train_score mean is 0.8579777430315948, test_score mean is 0.7608665269042627
Cross-validation for 100-th subset using    NB: train_score mean is 0.6641508197866894, test_score mean is 0.6212788259958072
Cross-validation for 100-th subset using   KNN: train_score mean is 0.8830408539408285, test_score mean is 0.8004891684136968




Cross-validation for 101-th subset using LogRg: train_score mean is 0.9007913976991816, test_score mean is 0.8399021663172608
Cross-validation for 101-th subset using    NB: train_score mean is 0.7468610511787983, test_score mean is 0.7174004192872118
Cross-validation for 101-th subset using   KNN: train_score mean is 0.9124889719691479, test_score mean is 0.8418937805730259




Cross-validation for 102-th subset using LogRg: train_score mean is 0.8922292782208402, test_score mean is 0.8363382250174703
Cross-validation for 102-th subset using    NB: train_score mean is 0.7426773875140853, test_score mean is 0.704053109713487
Cross-validation for 102-th subset using   KNN: train_score mean is 0.8899319537739887, test_score mean is 0.8137316561844864




Cross-validation for 103-th subset using LogRg: train_score mean is 0.8999576348913795, test_score mean is 0.840146750524109
Cross-validation for 103-th subset using    NB: train_score mean is 0.7410137926817552, test_score mean is 0.6983927323549965
Cross-validation for 103-th subset using   KNN: train_score mean is 0.8995409718643268, test_score mean is 0.8269392033542978




Cross-validation for 104-th subset using LogRg: train_score mean is 0.883668032249893, test_score mean is 0.8212438853948288
Cross-validation for 104-th subset using    NB: train_score mean is 0.7326586944558486, test_score mean is 0.7040181691125087
Cross-validation for 104-th subset using   KNN: train_score mean is 0.8826259379285645, test_score mean is 0.8045422781271837




Cross-validation for 105-th subset using LogRg: train_score mean is 0.8999563246302881, test_score mean is 0.8512928022361985
Cross-validation for 105-th subset using    NB: train_score mean is 0.7169958333697295, test_score mean is 0.685464709993012
Cross-validation for 105-th subset using   KNN: train_score mean is 0.8899358845572628, test_score mean is 0.8176450034940601




Cross-validation for 106-th subset using LogRg: train_score mean is 0.8878433975943606, test_score mean is 0.830817610062893
Cross-validation for 106-th subset using    NB: train_score mean is 0.7518749836217364, test_score mean is 0.6948287910552061
Cross-validation for 106-th subset using   KNN: train_score mean is 0.8982879255072893, test_score mean is 0.827078965758211




Cross-validation for 107-th subset using LogRg: train_score mean is 0.9039229217075322, test_score mean is 0.847414395527603
Cross-validation for 107-th subset using    NB: train_score mean is 0.7357893449568051, test_score mean is 0.7022711390635918
Cross-validation for 107-th subset using   KNN: train_score mean is 0.9051798988478439, test_score mean is 0.8419287211740041




Cross-validation for 108-th subset using LogRg: train_score mean is 0.893480140809392, test_score mean is 0.823235499650594
Cross-validation for 108-th subset using    NB: train_score mean is 0.7316170368882173, test_score mean is 0.6930817610062893
Cross-validation for 108-th subset using   KNN: train_score mean is 0.8865872939614434, test_score mean is 0.7970300489168414




Cross-validation for 109-th subset using LogRg: train_score mean is 0.8959919113215292, test_score mean is 0.8531795946890286
Cross-validation for 109-th subset using    NB: train_score mean is 0.7527078729221442, test_score mean is 0.7134521313766597
Cross-validation for 109-th subset using   KNN: train_score mean is 0.8959871070308612, test_score mean is 0.8306429070580015




Cross-validation for 110-th subset using LogRg: train_score mean is 0.884712310339707, test_score mean is 0.81743535988819
Cross-validation for 110-th subset using    NB: train_score mean is 0.7226334500921551, test_score mean is 0.6948986722571628
Cross-validation for 110-th subset using   KNN: train_score mean is 0.8882609341288074, test_score mean is 0.8175401816911251




Cross-validation for 111-th subset using LogRg: train_score mean is 0.8771918484289969, test_score mean is 0.8079315164220825
Cross-validation for 111-th subset using    NB: train_score mean is 0.7063403534210917, test_score mean is 0.6494060097833683
Cross-validation for 111-th subset using   KNN: train_score mean is 0.890977542124894, test_score mean is 0.8175052410901469




Cross-validation for 112-th subset using LogRg: train_score mean is 0.8918104314253019, test_score mean is 0.8325296995108318
Cross-validation for 112-th subset using    NB: train_score mean is 0.7431014753539888, test_score mean is 0.6945143256464011
Cross-validation for 112-th subset using   KNN: train_score mean is 0.887636813095623, test_score mean is 0.8250873515024459




Cross-validation for 113-th subset using LogRg: train_score mean is 0.9037159004550974, test_score mean is 0.8361635220125787
Cross-validation for 113-th subset using    NB: train_score mean is 0.7199185891108568, test_score mean is 0.6758560447239692
Cross-validation for 113-th subset using   KNN: train_score mean is 0.8897223119993709, test_score mean is 0.8063591893780574




Cross-validation for 114-th subset using LogRg: train_score mean is 0.9043422052567675, test_score mean is 0.8549965059399021
Cross-validation for 114-th subset using    NB: train_score mean is 0.7518754203754335, test_score mean is 0.7175751222921034
Cross-validation for 114-th subset using   KNN: train_score mean is 0.904554467553568, test_score mean is 0.8439203354297694




Cross-validation for 115-th subset using LogRg: train_score mean is 0.8966147220936227, test_score mean is 0.8343815513626834
Cross-validation for 115-th subset using    NB: train_score mean is 0.7550069443837841, test_score mean is 0.6986023759608665
Cross-validation for 115-th subset using   KNN: train_score mean is 0.8847105633249186, test_score mean is 0.8081761006289309




Cross-validation for 116-th subset using LogRg: train_score mean is 0.8959888540456497, test_score mean is 0.8475541579315164
Cross-validation for 116-th subset using    NB: train_score mean is 0.7387173417422979, test_score mean is 0.7042627533193571
Cross-validation for 116-th subset using   KNN: train_score mean is 0.9028808273862039, test_score mean is 0.8175401816911251




Cross-validation for 117-th subset using LogRg: train_score mean is 0.899122998576183, test_score mean is 0.834346610761705
Cross-validation for 117-th subset using    NB: train_score mean is 0.7341191988190181, test_score mean is 0.7004891684136967
Cross-validation for 117-th subset using   KNN: train_score mean is 0.8987028415195537, test_score mean is 0.817330538085255




Cross-validation for 118-th subset using LogRg: train_score mean is 0.8801146041701242, test_score mean is 0.808211041229909
Cross-validation for 118-th subset using    NB: train_score mean is 0.7132332002690402, test_score mean is 0.6494409503843466
Cross-validation for 118-th subset using   KNN: train_score mean is 0.8897240590141596, test_score mean is 0.7932564640111811




Cross-validation for 119-th subset using LogRg: train_score mean is 0.8859640464356531, test_score mean is 0.8137316561844864
Cross-validation for 119-th subset using    NB: train_score mean is 0.7788235602414374, test_score mean is 0.7100279524807827
Cross-validation for 119-th subset using   KNN: train_score mean is 0.8945252924066004, test_score mean is 0.8174703004891685




Cross-validation for 120-th subset using LogRg: train_score mean is 0.9101916475222964, test_score mean is 0.8550663871418589
Cross-validation for 120-th subset using    NB: train_score mean is 0.7817380176623194, test_score mean is 0.7419986023759609
Cross-validation for 120-th subset using   KNN: train_score mean is 0.9122819507167129, test_score mean is 0.8570929419986024
Cross-validation for 121-th subset using LogRg: train_score mean is 0.8821943282194328, test_score mean is 0.8117051013277429
Cross-validation for 121-th subset using    NB: train_score mean is 0.7225344947062797, test_score mean is 0.6737945492662474
Cross-validation for 121-th subset using   KNN: train_score mean is 0.8821930124645843, test_score mean is 0.8174004192872119




In [27]:
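#Rank ZIP codes by predicted probability, highest first
Final_All_ZipCode_prediction.sort_values('y_pred_prob',inplace=True,ascending=False)
#Normalize by the sum of the first column of Model_performance.
#Run this cell only once: re-running it divides y_pred_prob again.
Final_All_ZipCode_prediction['y_pred_prob']=Final_All_ZipCode_prediction['y_pred_prob']/Model_performance.iloc[:,0].sum()
#Print the 20 highest-ranked ZIP codes
print(Final_All_ZipCode_prediction.head(20).to_string(index=False))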

ZipCode  y_pred_prob
  10017     0.988455
  10036     0.988185
  60611     0.987978
  43215     0.987709
  92121     0.987674
  89109     0.987522
  94107     0.987456
  94103     0.987425
  15222     0.987300
  98101     0.987243
  10018     0.987083
  77030     0.986990
  80112     0.986934
  94111     0.986803
  60601     0.986781
   2210     0.986490
  10001     0.986470
   2110     0.986403
  10016     0.986232
  30339     0.986153
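
The division by `Model_performance.iloc[:,0].sum()` in the cell above reads like the last step of a weighted average: each model's predicted probability is multiplied by a weight, the products are summed, and the total is divided by the sum of the weights. The sketch below illustrates that idea on toy data only. The weights, the probabilities, and the ZIP code labels are all hypothetical, and it is an assumption (not confirmed by the cells shown here) that the first column of `Model_performance` holds per-model scores used as weights.

```python
import pandas as pd

# Hypothetical per-model weights (for example, mean cross-validation test scores).
weights = pd.Series([0.84, 0.69, 0.82], index=["LogRg", "NB", "KNN"])

# Hypothetical predicted probabilities for two made-up ZIP codes, one column per model.
probs = pd.DataFrame(
    {"LogRg": [0.2, 0.9], "NB": [0.4, 0.8], "KNN": [0.0, 1.0]},
    index=pd.Index([11111, 22222], name="ZipCode"),
)

# Multiply each model's column by its weight and sum across models.
weighted_sum = probs.mul(weights, axis=1).sum(axis=1)

# Divide by the total weight so the result stays in [0, 1], then rank.
ensemble = (weighted_sum / weights.sum()).sort_values(ascending=False)
print(ensemble)
```

Because the divisor is a single positive constant, normalizing before or after sorting gives the same ranking; only the scale of `y_pred_prob` changes.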
