#  <span style="color:orange">Binary Classification Tutorial (CLF102) - Level Intermediate</span>

**Created using: PyCaret 2.2** <br />
**Date Updated: November 20, 2020**

# 1.0 Tutorial Objective
Welcome to the Binary Classification Tutorial **(CLF102)** - Level Intermediate. This tutorial assumes that you have completed __[Binary Classification Tutorial (CLF101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial to understand the basics of working in PyCaret.

In this tutorial we will use the `pycaret.classification` module to learn:

* **Normalization:**  How to normalize and scale the dataset
* **Transformation:**  How to apply transformations that make the data linear and approximately normal
* **Ignore Low Variance:**  How to remove features with statistically insignificant variances to make the experiment more efficient
* **Remove Multi-collinearity:**  How to remove multi-collinearity from the dataset to boost performance of Linear algorithms
* **Group Features:**  How to extract statistical information from related features in the dataset
* **Bin Numeric Variables:**  How to bin numeric variables and transform numeric features into categorical ones using Sturges' rule
* **Model Ensembling and Stacking:**  How to boost model performance using several ensembling techniques such as Bagging, Boosting, Soft/hard Voting and Generalized Stacking
* **Model Calibration:**  How to calibrate probabilities of a classification model
* **Experiment Logging:**  How to log experiments in PyCaret using the MLflow backend

Read Time : Approx 60 Minutes


## 1.1 Installing PyCaret
If you haven't installed PyCaret yet, please follow the link to __[Beginner's Tutorial](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__ for instructions on how to install.

## 1.2 Pre-Requisites
- Python 3.6 or greater
- PyCaret 2.0 or greater
- Internet connection to load data from pycaret's repository
- Completion of Binary Classification Tutorial (CLF101) - Level Beginner

## 1.3 For Google Colab users:
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.<br/>
<br/>
`from pycaret.utils import enable_colab` <br/>
`enable_colab()`

## 1.4 See also:
- __[Binary Classification Tutorial (CLF101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__
- __[Binary Classification Tutorial (CLF103) - Level Expert](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Expert%20-%20CLF103.ipynb)__

# 2.0 Brief overview of techniques covered in this tutorial
Before we get into the practical execution of the techniques mentioned in Section 1, it is important to understand what these techniques are and when to use them. More often than not, these techniques help linear and parametric algorithms; however, it is not surprising to also see performance gains in tree-based models. The explanations below are brief, and we recommend that you do extra reading to get a more thorough understanding of these techniques.

- **Normalization:** Normalization / Scaling (often used interchangeably with standardization) is used to transform the actual values of numeric variables in a way that provides helpful properties for machine learning. Many algorithms such as Logistic Regression, Support Vector Machine, K Nearest Neighbors and Naive Bayes assume that all features are centered around zero and have variances of the same order of magnitude. If a particular feature in a dataset has a variance that is orders of magnitude larger than that of other features, the model may not learn from all features correctly and could perform poorly. For instance, in the dataset we are using for this example the `AGE` feature ranges from 21 to 79 while other numeric features range from 10,000 to 1,000,000. __[Read more](https://sebastianraschka.com/Articles/2014_about_feature_scaling.html#z-score-standardization-or-min-max-scaling)__ <br/>
<br/>
- **Transformation:** While normalization transforms the range of data to remove the impact of magnitude in variance, transformation is a more radical technique as it changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. In general, you should transform the data if using algorithms that assume normality or a Gaussian distribution. Examples of such models are Logistic Regression, Linear Discriminant Analysis (LDA) and Gaussian Naive Bayes. (Pro tip: any method with “Gaussian” in the name probably assumes normality.) __[Read more](https://en.wikipedia.org/wiki/Power_transform)__<br/>
<br/>
- **Ignore Low Variance:** Datasets can sometimes contain categorical features that have a single unique value or a small number of values across samples. Such features are not only non-informative but can also be harmful for some algorithms. Features with only one unique value or a few dominant values across samples can be removed from the dataset by using the ignore low variance feature in PyCaret. <br/>
<br/>
- **Multi-collinearity:** Multi-collinearity is a state of very high intercorrelations or inter-associations among the independent features in the dataset. It is a type of disturbance in the data that is not handled well by machine learning models (mostly linear algorithms). Multi-collinearity can make the estimated coefficients of a model unstable and inflate their variance. This can lead to overfitting, where the model does well on a known training set but fails on an unknown testing set. __[Read more](https://towardsdatascience.com/multicollinearity-in-data-science-c5f6c0fe6edf)__<br/>
<br/>
- **Group Features:** Sometimes datasets may contain features that are related at a sample level. For example in the `credit` dataset there are features called `BILL_AMT1 .. BILL_AMT6` which are related in such a way that `BILL_AMT1` is the amount of the bill 1 month ago and `BILL_AMT6` is the amount of the bill 6 months ago. Such features can be used to extract additional features based on the statistical properties of the distribution such as mean, median, variance, standard deviation etc. <br/>
<br/>
- **Bin Numeric Variables:** Binning or discretization is the process of transforming numerical variables into categorical features. An example would be the Age variable, which is a continuous distribution of numeric values that can be discretized into intervals (10-20 years, 21-30 etc.). Binning may improve the accuracy of a predictive model by reducing the noise or non-linearity in the data. PyCaret automatically determines the number and size of bins using Sturges' rule.  __[Read more](https://www.vosesoftware.com/riskwiki/Sturgesrule.php)__<br/>
<br/>
- **Model Ensembling and Stacking:** Ensemble modeling is a process where multiple diverse models are created to predict an outcome. This is achieved either by using many different modeling algorithms or using different samples of training data sets. The ensemble model then aggregates the predictions of each base model resulting in one final prediction for the unseen data. The motivation for using ensemble models is to reduce the generalization error of the prediction. As long as the base models are diverse and independent, the prediction error of the model decreases when the ensemble approach is used. The two most common methods in ensemble learning are `Bagging` and `Boosting`. Stacking is also a type of ensemble learning where predictions from multiple models are used as input features for a meta model that predicts the final outcome. __[Read more](https://blog.statsbot.co/ensemble-learning-d1dcd548e936)__<br/>
<br/>
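To make the **Normalization** idea above concrete, here is a minimal sketch of z-score scaling written with the Python standard library only. The `zscore` helper and the sample values are our own illustration (they are not taken from the dataset or from PyCaret's internals); it simply shows how two features with very different scales end up centered on zero with unit spread.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Illustrative values only (assumed for this sketch, not read from the data).
age = [21, 35, 48, 60, 79]
bill_amt = [10_000, 50_000, 120_000, 400_000, 1_000_000]

age_scaled = zscore(age)
bill_scaled = zscore(bill_amt)

# Before scaling, the spreads differ by several orders of magnitude.
print(round(pstdev(age), 1), round(pstdev(bill_amt), 1))

# After scaling, both features have mean ~0 and standard deviation ~1.
print(round(pstdev(age_scaled), 6), round(pstdev(bill_scaled), 6))
```

In the experiment itself you would not write this by hand; the sketch is only meant to show what happens to the values when scaling is switched on.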

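The **Group Features** idea described above can also be sketched in a few lines. The bill amounts, the `group_stats` helper and the derived column names (`BILL_AMT_mean` and so on) are hypothetical and chosen for illustration; the names PyCaret generates may differ.

```python
from statistics import mean, median, pstdev

bill_cols = [f"BILL_AMT{i}" for i in range(1, 7)]

# Two made-up customers with six months of bill amounts each.
rows = [
    dict(zip(bill_cols, [1200, 1100, 900, 950, 800, 750])),
    dict(zip(bill_cols, [50000, 52000, 51000, 49500, 50500, 50000])),
]

def group_stats(row, cols, prefix):
    """Summarize a group of related columns into row-level statistics."""
    values = [row[c] for c in cols]
    return {
        f"{prefix}_min": min(values),
        f"{prefix}_max": max(values),
        f"{prefix}_mean": mean(values),
        f"{prefix}_median": median(values),
        f"{prefix}_std": pstdev(values),
    }

for row in rows:
    # Add the derived features next to the original columns.
    row.update(group_stats(row, bill_cols, "BILL_AMT"))
    print(row["BILL_AMT_mean"], row["BILL_AMT_median"])
```

Each customer gets a handful of new features that describe the whole six-month history rather than a single month.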
# 3.0 Dataset for the Tutorial

For this tutorial we will be using the same dataset that was used in __[Binary Classification Tutorial (CLF101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__

#### Dataset Acknowledgements:
Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

The original dataset and data dictionary can be __[found here](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)__ at the UCI Machine Learning Repository.

# 4.0 Getting the Data

You can download the data from the original source __[found here](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)__ and load it using the pandas `read_csv` function, or you can use PyCaret's data repository to load the data using the `get_data` function (this requires an internet connection).

In [1]:
# from pycaret.datasets import get_data
# dataset = get_data('credit', profile=True)
import pandas as pd

dataset = pd.read_excel('C:/Users/Stavros/Desktop/0. Dummy/0_Dummy.xlsx')

Notice that when the `profile` parameter is set to `True`, it displays a data profile for exploratory data analysis. Several pre-processing steps discussed in section 2 above will be performed in this experiment based on this analysis. Let's summarize how the profile has helped make critical pre-processing choices with the data.

- **Missing Values:** There are no missing values in the data. However, we still need imputers in our pipeline just in case the new unseen data has missing values (not applicable in this case). When you execute the `setup()` function, imputers are created and stored in the pipeline automatically. By default, it uses a mean imputer for numeric values and a constant imputer for categorical. This can be changed using the `numeric_imputation` and `categorical_imputation` parameters in `setup()`. <br/>
<br/>
- **Multicollinearity:** There are high correlations between `BILL_AMT1 ... BILL_AMT6`, which introduces multicollinearity into the data. We will remove multi-collinearity by using the `remove_multicollinearity` and `multicollinearity_threshold` parameters in setup. <br/>
<br/>
- **Data Scale / Range:** Notice how the scale / range of the numeric features differs. For example, the `AGE` feature ranges from 21 to 79 and `BILL_AMT1` ranges from -165,580 to 964,511. This may cause problems for algorithms that assume all features have variance within the same order. In this case, the order of magnitude of `BILL_AMT1` is very different from that of `AGE`. We will deal with this problem by using the `normalize` parameter in setup. <br/>
<br/>
- **Distribution of Feature Space:** Numeric features are not normally distributed. Look at the distributions of `LIMIT_BAL`, `BILL_AMT1` and `PAY_AMT1 ... PAY_AMT6`. A few features are also highly skewed such as `PAY_AMT1`. This may cause problems for algorithms that assume normal or approximate normal distributions of the data. Examples include Logistic Regression, Linear Discriminant Analysis (LDA) and Naive Bayes.  We will deal with this problem by using the `transformation` parameter in setup. <br/>
<br/>
- **Group Features:** From the data description we know that certain features are related with each other such as `BILL_AMT1 ... BILL_AMT6` and `PAY_AMT1 ... PAY_AMT6`. We will use the `group_features` parameter in setup to extract statistical information from these features.  <br/>
<br/>
- **Bin Numeric Features:** When looking at the correlations between the numeric features and the target variable, we see that the correlations for `AGE` and `LIMIT_BAL` are weak. We will use the `bin_numeric_features` parameter to remove the noise from these variables, which may help linear algorithms. <br/>
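As a rough illustration of the **Bin Numeric Features** step, the sketch below applies the textbook form of Sturges' rule, `ceil(log2(n)) + 1`, to pick the number of bins and then assigns each value to an equal-width bin. The helper names, the sample ages and the equal-width assignment are assumptions made for this example; PyCaret's internal binning may differ in detail.

```python
import math

def sturges_bins(n_samples):
    """Number of bins suggested by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n_samples)) + 1

def equal_width_bin(value, lo, hi, n_bins):
    """Return the 0-based index of the equal-width bin containing value."""
    width = (hi - lo) / n_bins
    # Clamp so that value == hi falls into the last bin.
    return min(int((value - lo) / width), n_bins - 1)

ages = [21, 25, 33, 38, 41, 47, 52, 60, 68, 79]  # illustrative values only
k = sturges_bins(len(ages))  # ceil(log2(10)) + 1 = 5
binned = [equal_width_bin(a, min(ages), max(ages), k) for a in ages]
print(k, binned)
```

The numeric feature is replaced by a small set of ordered categories, which is what lets binning smooth out noise in weakly correlated variables.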

In [2]:
#check the shape of data
dataset.shape

(7379, 208)

In order to demonstrate the `predict_model()` function on unseen data, a sample of 5% of the rows (369 records) has been withheld from the original dataset to be used for predictions. This should not be confused with a train/test split, as this particular split is performed to simulate a real-life scenario. Another way to think about this is that these 369 records were not available at the time when the machine learning experiment was performed.

In [3]:
data = dataset.sample(frac=0.95, random_state=786)
data_unseen = dataset.drop(data.index)

data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions ' + str(data_unseen.shape))

Data for Modeling: (7010, 208)
Unseen Data For Predictions (369, 208)
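The row counts printed above follow directly from `frac=0.95`. The short check below reproduces them, assuming the sample size is the rounded product of the fraction and the number of rows (variable names are ours):

```python
n_rows = 7379  # rows reported by dataset.shape
frac = 0.95    # fraction passed to dataset.sample()

n_model = round(n_rows * frac)  # rows kept for modeling
n_unseen = n_rows - n_model     # rows withheld as unseen data
print(n_model, n_unseen)
```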


# 5.0 Setting up Environment in PyCaret

In the previous tutorial __[Binary Classification (CLF101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__ we learned how to initialize the environment in PyCaret using `setup()`. No additional parameters were passed in our last example as we did not perform any pre-processing steps. In this example we will take it to the next level by customizing the pre-processing pipeline using `setup()`. Let's look at how to implement all the steps discussed in section 4 above.

In [4]:
from pycaret.classification import *

In [5]:
data.head(10)

Unnamed: 0.1,Unnamed: 0,gmDate,teamAbbr,teamConf,teamDiv,teamRslt,teamDayOff,opptAbbr,opptConf,opptDiv,opptDayOff,seasonID,oppt2P%_cum_mean,opptAway2P%_cum_mean,oppt2PA_cum_mean,opptAway2PA_cum_mean,oppt2PM_cum_mean,opptAway2PM_cum_mean,oppt3P%_cum_mean,opptAway3P%_cum_mean,oppt3PA_cum_mean,opptAway3PA_cum_mean,oppt3PM_cum_mean,opptAway3PM_cum_mean,opptAR_cum_mean,opptAwayAR_cum_mean,opptASST%_cum_mean,opptAwayASST%_cum_mean,opptAST_cum_mean,opptAwayAST_cum_mean,opptAST/TO_cum_mean,opptAwayAST/TO_cum_mean,opptBLK_cum_mean,opptAwayBLK_cum_mean,opptBLK%_cum_mean,opptAwayBLK%_cum_mean,opptBLKR_cum_mean,opptAwayBLKR_cum_mean,opptDRB_cum_mean,opptAwayDRB_cum_mean,opptDREB%_cum_mean,opptAwayDREB%_cum_mean,opptDrtg_cum_mean,opptAwayDrtg_cum_mean,opptEDiff_cum_mean,opptAwayEDiff_cum_mean,opptEFG%_cum_mean,opptAwayEFG%_cum_mean,opptFG%_cum_mean,opptAwayFG%_cum_mean,opptFGA_cum_mean,opptAwayFGA_cum_mean,opptFGM_cum_mean,opptAwayFGM_cum_mean,opptFIC_cum_mean,opptAwayFIC_cum_mean,opptFIC40_cum_mean,opptAwayFIC40_cum_mean,opptFT%_cum_mean,opptAwayFT%_cum_mean,opptFTA_cum_mean,opptAwayFTA_cum_mean,opptFTM_cum_mean,opptAwayFTM_cum_mean,opptORB_cum_mean,opptAwayORB_cum_mean,opptOREB%_cum_mean,opptAwayOREB%_cum_mean,opptOrtg_cum_mean,opptAwayOrtg_cum_mean,opptPF_cum_mean,opptAwayPF_cum_mean,opptPPS_cum_mean,opptAwayPPS_cum_mean,opptPTS_cum_mean,opptAwayPTS_cum_mean,opptPTS1_cum_mean,opptAwayPTS1_cum_mean,opptPTS2_cum_mean,opptAwayPTS2_cum_mean,opptPTS3_cum_mean,opptAwayPTS3_cum_mean,opptPTS4_cum_mean,opptAwayPTS4_cum_mean,opptPTS5_cum_mean,opptAwayPTS5_cum_mean,opptPTS6_cum_mean,opptAwayPTS6_cum_mean,opptPTS7_cum_mean,opptAwayPTS7_cum_mean,opptPTS8_cum_mean,opptAwayPTS8_cum_mean,opptPlay%_cum_mean,opptAwayPlay%_cum_mean,opptSTL_cum_mean,opptAwaySTL_cum_mean,opptSTL%_cum_mean,opptAwaySTL%_cum_mean,opptSTL/TO_cum_mean,opptAwaySTL/TO_cum_mean,opptTO_cum_mean,opptAwayTO_cum_mean,opptTO%_cum_mean,opptAwayTO%_cum_mean,opptTRB_cum_mean,opptAwayTRB_cum_mean,opptTREB%_cum_mean,opptAwayTREB%_
cum_mean,opptTS%_cum_mean,opptAwayTS%_cum_mean,team2P%_cum_mean,teamHome2P%_cum_mean,team2PA_cum_mean,teamHome2PA_cum_mean,team2PM_cum_mean,teamHome2PM_cum_mean,team3P%_cum_mean,teamHome3P%_cum_mean,team3PA_cum_mean,teamHome3PA_cum_mean,team3PM_cum_mean,teamHome3PM_cum_mean,teamAR_cum_mean,teamHomeAR_cum_mean,teamASST%_cum_mean,teamHomeASST%_cum_mean,teamAST_cum_mean,teamHomeAST_cum_mean,teamAST/TO_cum_mean,teamHomeAST/TO_cum_mean,teamBLK_cum_mean,teamHomeBLK_cum_mean,teamBLK%_cum_mean,teamHomeBLK%_cum_mean,teamBLKR_cum_mean,teamHomeBLKR_cum_mean,teamDRB_cum_mean,teamHomeDRB_cum_mean,teamDREB%_cum_mean,teamHomeDREB%_cum_mean,teamDrtg_cum_mean,teamHomeDrtg_cum_mean,teamEDiff_cum_mean,teamHomeEDiff_cum_mean,teamEFG%_cum_mean,teamHomeEFG%_cum_mean,teamFG%_cum_mean,teamHomeFG%_cum_mean,teamFGA_cum_mean,teamHomeFGA_cum_mean,teamFGM_cum_mean,teamHomeFGM_cum_mean,teamFIC_cum_mean,teamHomeFIC_cum_mean,teamFIC40_cum_mean,teamHomeFIC40_cum_mean,teamFT%_cum_mean,teamHomeFT%_cum_mean,teamFTA_cum_mean,teamHomeFTA_cum_mean,teamFTM_cum_mean,teamHomeFTM_cum_mean,teamORB_cum_mean,teamHomeORB_cum_mean,teamOREB%_cum_mean,teamHomeOREB%_cum_mean,teamOrtg_cum_mean,teamHomeOrtg_cum_mean,teamPF_cum_mean,teamHomePF_cum_mean,teamPPS_cum_mean,teamHomePPS_cum_mean,teamPTS_cum_mean,teamHomePTS_cum_mean,teamPTS1_cum_mean,teamHomePTS1_cum_mean,teamPTS2_cum_mean,teamHomePTS2_cum_mean,teamPTS3_cum_mean,teamHomePTS3_cum_mean,teamPTS4_cum_mean,teamHomePTS4_cum_mean,teamPTS5_cum_mean,teamHomePTS5_cum_mean,teamPTS6_cum_mean,teamHomePTS6_cum_mean,teamPTS7_cum_mean,teamHomePTS7_cum_mean,teamPTS8_cum_mean,teamHomePTS8_cum_mean,teamPlay%_cum_mean,teamHomePlay%_cum_mean,teamSTL_cum_mean,teamHomeSTL_cum_mean,teamSTL%_cum_mean,teamHomeSTL%_cum_mean,teamSTL/TO_cum_mean,teamHomeSTL/TO_cum_mean,teamTO_cum_mean,teamHomeTO_cum_mean,teamTO%_cum_mean,teamHomeTO%_cum_mean,teamTRB_cum_mean,teamHomeTRB_cum_mean,teamTREB%_cum_mean,teamHomeTREB%_cum_mean,teamTS%_cum_mean,teamHomeTS%_cum_mean
0,7284,2018-03-30,LAL,West,Pacific,0,2,MIL,East,Central,1,2017-2018,0.532839,0.526911,57.626667,57.918919,30.706667,30.486486,0.359399,0.36203,24.6,25.081081,8.786667,9.054054,17.70356,17.289803,58.062136,56.611346,23.04,22.594595,1.781512,1.714849,5.48,5.027027,5.712575,5.201716,9.589608,8.728824,31.2,31.621622,77.005188,79.182773,110.0924,109.688932,-0.21744,-1.233751,0.533875,0.5306,0.480481,0.476162,82.226667,83.0,39.493333,39.540541,79.846667,78.280405,66.171797,64.671541,0.780212,0.774722,23.44,22.189189,18.266667,17.108108,8.213333,8.459459,20.335789,20.817227,109.87496,108.455181,21.333333,21.675676,1.293247,1.270114,106.04,105.243243,26.64,26.918919,27.226667,27.162162,26.426667,25.540541,25.213333,24.72973,0.533333,0.891892,0.0,0.0,0.0,0.0,0.0,0.0,0.449292,0.444227,8.68,9.027027,8.971231,9.267422,65.773107,67.073489,13.88,14.378378,13.022292,13.399392,39.413333,40.081081,48.386076,49.026557,0.573208,0.567503,0.519139,0.524151,59.72973,60.371429,30.932432,31.542857,0.338799,0.326191,29.040541,29.057143,10.0,9.542857,17.124534,17.603383,58.080454,59.725294,23.797297,24.6,1.630381,1.739494,4.77027,4.971429,4.684493,4.865029,8.113081,8.364314,35.513514,37.4,77.873953,79.529594,108.075846,103.964083,-1.368905,2.006906,0.51808,0.513434,0.461512,0.459654,88.77027,89.428571,40.932432,41.085714,80.525338,83.110714,66.492428,68.705709,0.711828,0.709043,23.256757,22.2,16.486486,15.742857,10.756757,10.857143,23.508808,23.393957,106.706941,105.970989,21.189189,20.171429,1.223146,1.204049,108.351351,107.457143,26.756757,25.571429,28.189189,27.714286,27.054054,27.571429,25.297297,25.571429,0.932432,1.028571,0.121622,0.0,0.0,0.0,0.0,0.0,0.436597,0.437114,7.824324,8.114286,7.711151,7.976983,53.007884,56.79358,15.716216,15.457143,13.713054,13.501186,46.27027,48.257143,50.718238,51.915451,0.547909,0.542386
1,3511,2015-03-24,DET,East,Central,1,2,TOR,East,Atlantic,2,2014-2015,0.497651,0.493124,58.7,61.205882,29.171429,30.147059,0.349719,0.340009,25.1,24.941176,8.814286,8.5,16.167671,15.307447,55.058577,51.506665,20.842857,19.852941,1.805083,1.910071,4.4,4.352941,4.658199,4.579856,7.5905,7.226156,30.714286,29.323529,74.035896,72.120124,107.614543,110.098006,3.386534,1.594165,0.50642,0.498465,0.453734,0.448856,83.8,86.147059,37.985714,38.647059,75.491071,74.180147,62.368884,60.904035,0.788433,0.793794,24.842857,25.441176,19.557143,20.0,10.8,11.352941,26.306204,27.879876,111.001077,111.692171,21.014286,22.205882,1.249699,1.233335,104.342857,105.794118,25.357143,25.441176,26.671429,27.911765,25.685714,25.617647,25.671429,25.147059,0.957143,1.676471,0.0,0.0,0.0,0.0,0.0,0.0,0.442913,0.446221,7.614286,8.058824,8.063687,8.466729,64.138899,73.399297,12.871429,11.970588,11.943071,10.927229,41.514286,40.676471,49.302717,47.786709,0.551499,0.544335,0.465929,0.472729,61.171429,60.114286,28.357143,28.257143,0.330859,0.341337,25.071429,25.6,8.414286,8.857143,16.177067,17.13808,57.847986,60.95426,21.242857,22.6,1.766291,2.043283,4.614286,4.428571,4.891464,4.701123,7.67,7.476949,32.471429,31.857143,73.663371,76.3898,106.03466,107.932146,-1.725946,-0.858297,0.476047,0.485703,0.427067,0.433909,86.242857,85.714286,36.771429,37.114286,72.580357,75.871429,59.985767,62.748206,0.708637,0.716183,22.7,24.514286,16.057143,17.371429,12.971429,12.342857,27.7188,26.374543,104.308714,107.073849,19.042857,18.628571,1.139817,1.175554,98.014286,100.457143,24.428571,25.857143,24.042857,23.857143,24.514286,25.885714,24.085714,24.257143,0.8,0.314286,0.142857,0.285714,0.0,0.0,0.0,0.0,0.423776,0.43244,7.642857,7.428571,8.128647,7.903023,61.208199,65.584129,13.542857,12.485714,12.311263,11.405577,45.442857,44.2,50.475293,50.056103,0.510054,0.521537
2,4645,2016-03-09,SAC,West,Pacific,0,2,CLE,East,Central,2,2015-2016,0.509876,0.496941,55.612903,54.965517,28.306452,27.310345,0.356418,0.349303,28.16129,28.310345,10.064516,9.862069,17.20759,15.7873,58.310524,54.254062,22.419355,20.137931,1.803202,1.516707,3.66129,3.310345,3.875524,3.491979,6.717771,6.304186,34.032258,34.413793,77.881516,77.201876,103.52294,101.813407,6.525134,4.270234,0.519965,0.508031,0.459484,0.4483,83.774194,83.275862,38.370968,37.172414,77.443548,71.5,64.062237,58.569845,0.744721,0.73001,22.193548,21.724138,16.419355,15.862069,10.774194,10.344828,25.012842,22.798124,110.048074,106.083641,20.5,21.068966,1.238871,1.208131,103.225806,100.068966,25.66129,25.275862,26.129032,25.724138,25.677419,24.413793,24.951613,22.931034,0.66129,1.413793,0.145161,0.310345,0.0,0.0,0.0,0.0,0.444166,0.428766,6.870968,6.896552,7.286603,7.294497,55.163208,52.801862,13.629032,13.931034,12.69589,13.065866,44.806452,44.758621,52.242139,51.334,0.553513,0.540931,0.503202,0.503037,63.790323,63.0,31.983871,31.6,0.357247,0.352383,22.645161,22.6,8.080645,7.966667,17.808184,17.22381,62.364161,61.0086,24.903226,24.0,1.575469,1.490483,4.403226,3.8,4.345981,3.731053,6.977202,6.12505,34.209677,34.3,76.931065,77.05902,108.3325,107.092967,-2.470353,-1.374403,0.511642,0.510043,0.464379,0.462677,86.435484,85.6,40.064516,39.566667,79.885081,78.033333,66.013419,64.54167,0.717521,0.713373,25.854839,27.133333,18.677419,19.533333,10.741935,11.066667,23.99731,24.85962,105.862147,105.718563,20.66129,19.9,1.244073,1.253583,106.887097,106.633333,27.048387,26.966667,26.516129,25.766667,26.048387,26.933333,26.612903,26.533333,0.403226,0.3,0.258065,0.133333,0.0,0.0,0.0,0.0,0.43405,0.43121,8.822581,8.3,8.708532,8.208563,55.611427,50.782187,16.677419,17.166667,14.581548,15.008823,44.951613,45.366667,50.320532,51.052507,0.547527,0.54752
3,4741,2016-03-22,NO,West,Southwest,0,2,MIA,East,Southeast,3,2015-2016,0.505893,0.489188,62.971014,64.060606,31.753623,31.333333,0.332657,0.321588,18.086957,17.454545,6.101449,5.848485,16.419051,16.866042,54.965113,57.892394,20.855072,21.575758,1.602801,1.597282,6.623188,6.575758,7.009442,7.010855,10.701436,10.426494,34.608696,34.69697,78.712804,78.644948,104.043552,104.229012,1.365926,-1.891721,0.50692,0.493079,0.4687,0.456706,81.057971,81.515152,37.855072,37.181818,76.190217,73.742424,63.100761,61.150658,0.750254,0.757403,23.463768,20.909091,17.507246,15.848485,9.623188,9.454545,22.117655,21.355052,105.409478,102.337291,18.115942,17.151515,1.231814,1.182618,99.318841,96.060606,24.695652,23.787879,24.782609,24.666667,23.956522,23.242424,25.144928,23.939394,0.73913,0.424242,0.0,0.0,0.0,0.0,0.0,0.0,0.4424,0.428597,6.637681,6.333333,7.011667,6.690876,51.886933,46.984639,14.173913,14.727273,13.45993,13.985506,44.231884,44.151515,51.657306,51.436348,0.545329,0.530139,0.48443,0.489441,62.043478,61.882353,30.0,30.235294,0.352138,0.389303,24.275362,24.735294,8.608696,9.617647,16.657975,17.958847,56.829516,60.039738,21.913043,23.823529,1.87931,2.101485,4.565217,5.088235,4.705257,5.272385,7.535039,8.370868,33.231884,32.794118,79.15492,79.449609,109.12008,109.669497,-3.194074,0.398947,0.498042,0.516785,0.448103,0.461074,86.318841,86.617647,38.608696,39.852941,75.057971,82.238971,62.212048,68.335062,0.784093,0.788612,22.347826,22.264706,17.449275,17.352941,9.463768,9.794118,21.534184,21.948868,105.926006,110.068444,21.086957,20.441176,1.202641,1.237138,103.275362,106.676471,26.086957,27.117647,25.637681,26.558824,25.275362,26.117647,25.637681,26.617647,0.637681,0.264706,0.0,0.0,0.0,0.0,0.0,0.0,0.429439,0.447379,7.826087,8.205882,7.992806,8.397894,64.747119,68.424397,13.246377,12.411765,12.060003,11.348015,42.695652,42.588235,48.876839,49.429091,0.537859,0.553921
4,3463,2015-03-18,PHI,East,Atlantic,1,2,DET,East,Central,1,2014-2015,0.466933,0.462261,61.089552,62.212121,28.373134,28.636364,0.329599,0.319982,25.029851,24.454545,8.373134,7.939394,16.112315,15.2074,57.643509,54.313709,21.149254,19.818182,1.70039,1.497309,4.477612,4.575758,4.753779,4.865873,7.468085,7.51647,32.19403,32.727273,73.563646,70.840303,106.356039,104.49937,-2.001124,-2.618439,0.476261,0.468482,0.427434,0.42243,86.119403,86.666667,36.746269,36.575758,72.186567,69.098485,59.736521,57.238424,0.706118,0.698288,22.835821,20.666667,16.134328,14.575758,13.029851,13.545455,27.931724,29.159697,104.354915,101.88093,19.014925,19.454545,1.141422,1.106661,98.0,95.666667,24.373134,22.969697,24.41791,24.69697,24.358209,23.090909,24.119403,24.060606,0.58209,0.848485,0.149254,0.0,0.0,0.0,0.0,0.0,0.424043,0.417439,7.656716,7.939394,8.141975,8.44913,59.394257,57.847597,13.61194,14.515152,12.387096,13.175915,45.223881,46.272727,50.452207,50.841591,0.510339,0.50017,0.449291,0.465006,56.283582,56.323529,25.328358,26.205882,0.309878,0.319197,25.462687,25.941176,8.044776,8.382353,15.6359,17.156247,61.419581,66.244953,20.58209,22.911765,1.209769,1.384871,6.0,6.588235,6.237173,6.840541,10.869669,12.046803,31.298507,32.147059,72.954388,74.079818,104.418143,102.808094,-10.176487,-4.970176,0.458775,0.474074,0.409272,0.422718,81.746269,82.264706,33.373134,34.588235,63.229478,69.878676,52.309442,57.877015,0.677846,0.705556,23.985075,24.205882,16.164179,17.058824,11.641791,11.382353,26.38287,24.614191,94.241657,97.837918,21.313433,21.852941,1.11721,1.159753,90.955224,94.617647,22.238806,24.176471,22.656716,23.058824,23.208955,23.029412,22.477612,24.088235,0.373134,0.264706,0.0,0.0,0.0,0.0,0.0,0.0,0.378434,0.391409,9.746269,9.647059,10.103976,9.960015,57.157112,58.683779,18.208955,17.676471,16.485782,16.017844,42.940299,43.529412,48.169725,49.048065,0.494037,0.511841
5,661,2013-01-28,CHI,East,Central,1,2,CHA,East,Southeast,2,2012-2013,0.441914,0.437247,65.581395,65.473684,29.093023,28.789474,0.351256,0.335158,16.232558,16.631579,5.697674,5.578947,14.963977,13.791289,54.774965,50.485695,19.069767,17.263158,1.388535,1.335695,6.534884,5.736842,7.020091,6.234768,10.079186,8.840232,29.55814,29.105263,71.070288,69.851747,111.202621,113.925926,-8.818902,-12.052489,0.460074,0.452658,0.425321,0.418679,81.813953,82.105263,34.790698,34.368421,66.494186,63.546053,54.772702,52.135484,0.753621,0.7291,26.418605,26.157895,19.837209,18.894737,11.581395,12.631579,27.435342,30.148253,102.383719,101.873437,19.697674,19.894737,1.164716,1.137058,95.116279,93.210526,22.302326,21.578947,24.604651,24.947368,23.651163,21.789474,23.44186,23.421053,0.906977,1.0,0.209302,0.473684,0.0,0.0,0.0,0.0,0.412195,0.413837,7.348837,7.842105,7.857444,8.494905,52.799798,58.962458,14.209302,13.631579,13.201916,12.760911,41.139535,41.736842,48.050007,47.661716,0.509312,0.497847,0.457102,0.434312,67.44186,68.96,30.627907,29.84,0.359842,0.389344,13.302326,13.08,4.744186,5.08,17.658721,17.660368,64.139258,65.370532,22.72093,22.84,1.642202,1.68424,5.325581,6.44,5.932556,7.211396,7.934065,9.430368,31.534884,31.8,73.318923,72.960956,100.69803,98.355844,2.519251,3.845352,0.470202,0.458444,0.440756,0.42754,80.744186,82.04,35.372093,34.92,70.645349,71.26,58.361947,59.200556,0.781044,0.77492,22.837209,21.76,17.860465,16.92,12.395349,13.08,28.820205,30.718344,103.217281,102.201196,19.813953,19.4,1.165814,1.128028,93.348837,91.84,24.209302,23.2,22.44186,22.64,23.186047,23.0,22.651163,22.76,0.860465,0.24,0.0,0.0,0.0,0.0,0.0,0.0,0.42566,0.418644,7.139535,7.16,7.885453,7.931756,51.33967,52.91204,14.906977,14.56,14.123091,13.720048,43.930233,44.88,51.815,52.053256,0.516042,0.503152
6,6704,2018-01-03,WAS,East,Southeast,1,3,NY,East,Atlantic,0,2017-2018,0.503724,0.48644,64.324324,67.066667,32.162162,32.533333,0.352746,0.308627,21.675676,20.733333,7.675676,6.6,17.055051,17.892467,57.010843,61.80848,22.756757,24.266667,1.587608,1.73094,4.891892,4.866667,5.089073,4.997227,7.779584,7.43742,33.756757,33.066667,76.059278,75.653653,107.6336,110.263893,-0.156903,-9.31756,0.511243,0.484327,0.465778,0.4466,86.0,87.8,39.837838,39.133333,76.422297,71.9,63.470597,59.849127,0.798819,0.799193,19.972973,17.2,15.918919,13.6,10.918919,10.6,25.930727,24.346347,107.476697,100.946333,20.783784,19.733333,1.212308,1.12424,103.27027,98.466667,26.216216,25.8,25.243243,23.866667,25.081081,23.866667,26.351351,24.933333,0.378378,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.441408,0.421867,6.972973,6.6,7.2491,6.779093,47.981405,47.67908,15.324324,15.533333,13.945868,13.99842,44.675676,43.666667,52.058227,50.86804,0.547668,0.517267,0.502278,0.535094,59.621622,57.944444,29.783784,30.944444,0.364835,0.363517,26.594595,27.944444,9.783784,10.277778,17.002195,17.393044,57.120114,56.645356,22.702703,23.333333,1.765722,1.690589,4.324324,5.166667,4.384595,5.216906,7.267124,8.879267,33.405405,34.722222,77.821746,79.325689,106.385008,105.30875,2.299824,6.387011,0.516616,0.540561,0.459708,0.480872,86.216216,85.888889,39.567568,41.222222,78.307432,84.25,64.849378,70.125617,0.756865,0.771572,22.783784,22.555556,17.351351,17.333333,10.0,10.166667,24.03017,24.481028,108.684832,111.695761,21.72973,21.722222,1.23663,1.285628,106.27027,110.055556,27.405405,28.777778,26.405405,26.722222,26.027027,27.555556,25.972973,27.0,0.459459,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.439586,0.455561,7.810811,8.0,7.974765,8.103256,59.549573,55.737111,13.783784,14.777778,12.527989,13.358417,43.405405,44.888889,49.958903,52.573828,0.553059,0.575172
7,5549,2017-01-18,HOU,West,Southwest,1,1,MIL,East,Central,2,2016-2017,0.519593,0.511756,60.25,59.277778,31.175,30.333333,0.376245,0.365161,23.125,23.833333,8.575,8.5,18.792552,19.009289,62.32562,63.761694,24.85,24.888889,1.960425,1.985689,5.5,5.055556,5.751752,5.413061,9.408658,8.754056,33.025,33.111111,77.311355,76.913356,108.452825,108.910811,1.632127,-1.466367,0.530128,0.520778,0.478208,0.469089,83.375,83.111111,39.75,38.833333,83.0625,79.208333,68.89,65.756511,0.756607,0.757872,22.4,19.222222,17.075,14.555556,9.4,9.833333,22.671482,23.086644,110.084952,107.444444,20.0,19.5,1.268535,1.218406,105.15,100.722222,27.925,27.277778,25.5,23.777778,26.45,25.611111,25.05,24.0,0.225,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.454213,0.4462,8.125,7.388889,8.452535,7.855461,61.198413,56.724,13.8,14.222222,12.852635,13.352217,42.425,42.944444,49.990808,49.412067,0.56511,0.55125,0.550611,0.559385,46.840909,45.95,25.704545,25.55,0.371961,0.37069,39.954545,40.75,14.840909,15.1,18.384789,18.669185,63.015736,64.00513,25.568182,26.1,1.8235,1.94741,4.477273,4.6,4.513236,4.63221,9.63985,10.21345,33.454545,34.6,76.482757,77.98528,108.433736,105.87865,6.865891,10.688245,0.55407,0.556625,0.468143,0.46924,86.795455,86.7,40.545455,40.65,89.09375,93.0875,73.745666,77.46233,0.757305,0.76341,24.613636,25.5,18.659091,19.5,10.977273,11.25,25.214736,25.749205,115.299627,116.566895,19.704545,18.25,1.325264,1.340335,114.590909,115.9,30.681818,31.25,28.795455,28.45,28.659091,30.5,25.659091,25.7,0.590909,0.0,0.204545,0.0,0.0,0.0,0.0,0.0,0.444559,0.44859,8.090909,8.1,8.064159,8.06771,55.801355,59.03962,15.522727,15.1,13.703895,13.341815,44.431818,45.85,50.988377,51.92315,0.588032,0.59251
8,5889,2017-03-10,SAC,West,Pacific,0,2,WAS,East,Southeast,2,2016-2017,0.515227,0.514846,62.587302,63.714286,32.063492,32.5,0.37816,0.342875,24.063492,24.785714,9.063492,8.428571,17.697303,17.236561,57.810559,56.517829,23.809524,23.25,1.819829,1.834543,4.253968,4.035714,4.337321,4.08415,6.907533,6.432893,32.619048,33.357143,75.774137,77.175114,108.227108,108.443729,2.616308,-0.553018,0.529014,0.511443,0.476532,0.463825,86.650794,88.5,41.126984,40.928571,81.968254,78.991071,67.752727,65.134946,0.788751,0.768843,21.904762,21.214286,17.253968,16.357143,10.206349,10.0,23.836884,22.824886,110.843416,107.890711,21.222222,21.321429,1.258392,1.208082,108.571429,106.642857,28.31746,28.571429,27.746032,28.535714,26.126984,24.285714,25.555556,24.25,0.809524,0.964286,0.0,0.0,0.0,0.0,0.0,0.0,0.45436,0.445111,8.698413,8.571429,8.841171,8.601318,65.192516,67.382754,14.222222,13.535714,12.881657,12.166664,42.825397,43.357143,49.766876,49.244668,0.565827,0.546332,0.496437,0.503253,57.375,56.25,28.40625,28.3125,0.369316,0.387828,24.75,24.78125,9.09375,9.53125,17.217333,17.567103,59.944002,60.762519,22.390625,22.875,1.627545,1.600497,4.0,3.90625,4.187939,4.070725,7.092273,7.102237,32.03125,31.96875,77.146666,77.376231,110.726453,110.503022,-3.330891,-1.477566,0.513633,0.526372,0.4577,0.467134,82.125,81.03125,37.5,37.84375,74.474609,75.945312,61.379762,62.883441,0.777817,0.806466,23.734375,23.9375,18.53125,19.375,8.984375,8.28125,21.856256,20.629613,107.395562,109.025456,20.765625,20.9375,1.256019,1.295669,102.625,104.59375,25.34375,25.375,25.421875,25.84375,25.25,25.5625,25.609375,27.09375,1.0,0.71875,0.0,0.0,0.0,0.0,0.0,0.0,0.427005,0.428672,8.015625,8.0,8.329441,8.265481,56.811995,53.08575,14.6875,15.46875,13.664088,14.404266,41.015625,40.25,49.151536,49.444628,0.555675,0.571653
9,1782,2014-01-13,TOR,East,Atlantic,1,2,MIL,East,Central,2,2013-2014,0.448286,0.462105,61.861111,61.526316,27.555556,28.210526,0.343853,0.334689,20.638889,21.421053,7.111111,7.157895,15.936792,15.662947,58.742692,56.7629,20.416667,20.157895,1.394786,1.432537,5.444444,5.684211,5.874464,6.092358,8.739444,9.099726,30.138889,29.684211,71.554911,74.2258,107.332364,107.986168,-9.056717,-8.346421,0.465325,0.4717,0.42205,0.428568,82.5,82.947368,34.666667,35.368421,62.777778,64.506579,51.2539,52.430389,0.760908,0.795068,19.722222,20.263158,15.166667,16.210526,11.111111,10.473684,25.914036,25.7742,98.275647,99.639747,20.888889,20.631579,1.116747,1.140805,91.611111,94.105263,22.666667,22.684211,22.305556,20.526316,21.777778,23.631579,23.333333,25.368421,1.333333,1.894737,0.194444,0.0,0.0,0.0,0.0,0.0,0.398403,0.402521,7.055556,7.421053,7.562619,7.888979,49.566747,55.708058,15.75,15.473684,14.741097,14.519663,41.25,40.157895,47.238867,47.204784,0.503733,0.514058,0.466191,0.46485,60.2,58.6875,27.942857,27.375,0.35058,0.33895,22.114286,23.625,7.8,8.1875,15.33378,15.480956,54.979877,57.048687,19.542857,20.1875,1.385031,1.297494,4.8,4.75,5.119971,5.008081,8.096183,8.205781,31.171429,31.625,75.530357,76.939294,102.776231,103.031256,2.513531,2.754812,0.483469,0.482181,0.435651,0.43185,82.314286,82.3125,35.742857,35.5625,68.335714,70.320312,56.271257,57.930181,0.778069,0.784756,24.971429,26.4375,19.457143,20.75,11.714286,12.75,27.379689,29.426431,105.289763,105.786069,22.457143,21.75,1.207269,1.218869,98.742857,100.0625,25.485714,26.4375,23.257143,22.6875,24.457143,24.1875,24.4,25.8125,0.885714,0.9375,0.257143,0.0,0.0,0.0,0.0,0.0,0.42006,0.41625,7.085714,7.3125,7.516826,7.726788,50.13022,46.714575,14.6,15.9375,13.561949,14.522075,42.885714,44.375,50.420677,52.55195,0.531074,0.533031


In [6]:
data.columns.to_list()

['Unnamed: 0',
 'gmDate',
 'teamAbbr',
 'teamConf',
 'teamDiv',
 'teamRslt',
 'teamDayOff',
 'opptAbbr',
 'opptConf',
 'opptDiv',
 'opptDayOff',
 'seasonID',
 'oppt2P%_cum_mean',
 'opptAway2P%_cum_mean',
 'oppt2PA_cum_mean',
 'opptAway2PA_cum_mean',
 'oppt2PM_cum_mean',
 'opptAway2PM_cum_mean',
 'oppt3P%_cum_mean',
 'opptAway3P%_cum_mean',
 'oppt3PA_cum_mean',
 'opptAway3PA_cum_mean',
 'oppt3PM_cum_mean',
 'opptAway3PM_cum_mean',
 'opptAR_cum_mean',
 'opptAwayAR_cum_mean',
 'opptASST%_cum_mean',
 'opptAwayASST%_cum_mean',
 'opptAST_cum_mean',
 'opptAwayAST_cum_mean',
 'opptAST/TO_cum_mean',
 'opptAwayAST/TO_cum_mean',
 'opptBLK_cum_mean',
 'opptAwayBLK_cum_mean',
 'opptBLK%_cum_mean',
 'opptAwayBLK%_cum_mean',
 'opptBLKR_cum_mean',
 'opptAwayBLKR_cum_mean',
 'opptDRB_cum_mean',
 'opptAwayDRB_cum_mean',
 'opptDREB%_cum_mean',
 'opptAwayDREB%_cum_mean',
 'opptDrtg_cum_mean',
 'opptAwayDrtg_cum_mean',
 'opptEDiff_cum_mean',
 'opptAwayEDiff_cum_mean',
 'opptEFG%_cum_mean',
 'opptAwayEFG%_c

In [7]:
data.dtypes.unique()

array([dtype('int64'), dtype('O'), dtype('float64')], dtype=object)

In [8]:
# Every column except the target and the categorical/date columns is treated as numeric.
non_numeric_cols = ['teamRslt', 'teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',
                    'teamDiv', 'opptDiv', 'seasonID', 'gmDate']
num_col_list = [col for col in data.columns if col not in non_numeric_cols]

In [26]:
# from mlflow import log_metric, log_param, log_artifacts

In [27]:
# data['gmDate'] = data['gmDate'].astype('string')

In [28]:
# list_season=data["seasonID"].unique()
# list_season = sorted(list_season)
# list_date =data["gmDate"].unique()
# list_date = sorted(list_date)

In [29]:
# ordinal_dict = {"seasonID" : list_season, "gmDate" : list_date}

In [30]:
# # col_list = dataframe.columns.to_list()
# group_list1 = [ ["team"+x[4:],"oppt"+x[4:]] for x in num_col_list if x[:4] == "team" and x[4:8]!="Home"]
# group_list2 = [ ["team"+"Home"+x[8:],"oppt"+"Away"+x[8:]] for x in num_col_list if x[:4] == "team" and x[4:8]=="Home"]
# group_list = group_list1 + group_list2

In [31]:
# exp_clf102 = setup(data = data, target = 'teamRslt', session_id=123,
#                   normalize = True, 
#                   transformation = False, 
#                   ignore_low_variance = False,
#                   remove_multicollinearity = True, multicollinearity_threshold = 0.95,
#                   numeric_features = num_col_list,
#                   categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
#                    data_split_stratify = True,
#                    pca = False,
#                    feature_selection = True,
#                    fix_imbalance = False)
#                    #group_features = group_list)
# #                   log_experiment = True, experiment_name = 'test1')

In [32]:
# exp_clf102

# Feature Selection Only (Starting Point)

In [9]:
exp_clf102 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = False, 
                  transformation = False, 
                  ignore_low_variance = False,
                  remove_multicollinearity = False, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = False,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,"0: 0, 1: 1"
4,Original Data,"(7010, 208)"
5,Missing Values,False
6,Numeric Features,199
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [13]:
exp_clf102

(False,
 False,
 False,
 [<pandas.io.formats.style.Styler at 0x293f6d09730>,
                                      Model  Accuracy     AUC  Recall   Prec.  \
  et                 Extra Trees Classifier    0.6508  0.6795  0.7776  0.6773   
  catboost              CatBoost Classifier    0.6508  0.6843  0.7787  0.6769   
  gbc          Gradient Boosting Classifier    0.6468  0.6833  0.7755  0.6737   
  ridge                    Ridge Classifier    0.6419  0.0000  0.7683  0.6709   
  rf               Random Forest Classifier    0.6419  0.6777  0.7645  0.6723   
  lightgbm  Light Gradient Boosting Machine    0.6408  0.6656  0.7516  0.6751   
  lda          Linear Discriminant Analysis    0.6392  0.6686  0.7582  0.6712   
  ada                  Ada Boost Classifier    0.6315  0.6540  0.7509  0.6658   
  xgboost         Extreme Gradient Boosting    0.6303  0.6536  0.7364  0.6689   
  nb                            Naive Bayes    0.6042  0.6623  0.5674  0.7055   
  knn                K Neighbors

In [10]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.6508,0.6795,0.7776,0.6773,0.7237,0.2552,0.2602,0.162
catboost,CatBoost Classifier,0.6508,0.6843,0.7787,0.6769,0.724,0.2549,0.2599,9.31
gbc,Gradient Boosting Classifier,0.6468,0.6833,0.7755,0.6737,0.7209,0.2461,0.251,1.952
ridge,Ridge Classifier,0.6419,0.0,0.7683,0.6709,0.7161,0.2368,0.2412,0.021
rf,Random Forest Classifier,0.6419,0.6777,0.7645,0.6723,0.7153,0.2379,0.2416,0.307
lightgbm,Light Gradient Boosting Machine,0.6408,0.6656,0.7516,0.6751,0.7111,0.2399,0.2426,0.408
lda,Linear Discriminant Analysis,0.6392,0.6686,0.7582,0.6712,0.7119,0.2339,0.2374,0.187
ada,Ada Boost Classifier,0.6315,0.654,0.7509,0.6658,0.7056,0.2175,0.2206,0.412
xgboost,Extreme Gradient Boosting,0.6303,0.6536,0.7364,0.6689,0.7009,0.2196,0.2214,1.704
nb,Naive Bayes,0.6042,0.6623,0.5674,0.7055,0.6242,0.2157,0.2237,0.02


In [36]:
lr = create_model('lr')
tuned_lr = tune_model(lr)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6477,0.7,0.7855,0.6716,0.7241,0.2449,0.2507
1,0.6293,0.6531,0.7474,0.6646,0.7036,0.2135,0.2162
2,0.666,0.6816,0.827,0.6771,0.7445,0.2758,0.2875
3,0.6415,0.6683,0.782,0.6667,0.7197,0.2312,0.2369
4,0.6619,0.6972,0.8201,0.6752,0.7406,0.2681,0.2787
5,0.6436,0.6865,0.7509,0.6781,0.7126,0.2467,0.2489
6,0.6408,0.6899,0.7917,0.6628,0.7215,0.2268,0.234
7,0.6551,0.6667,0.7708,0.6831,0.7243,0.2683,0.2718
8,0.6449,0.6976,0.7743,0.6717,0.7194,0.2426,0.2472
9,0.6469,0.7031,0.7682,0.6768,0.7196,0.2481,0.2518
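
The ten fold scores above are easier to compare across models once they are reduced to a mean and a spread. The snippet below is a minimal sketch that does this for the Accuracy column of the logistic regression fold table, with the values copied by hand from that table. It uses the sample standard deviation from Python's `statistics` module, which may differ slightly from whatever convention PyCaret uses in its own summary rows.

```python
from statistics import mean, stdev

# Per-fold accuracy copied from the logistic regression fold table above.
fold_accuracy = [0.6477, 0.6293, 0.6660, 0.6415, 0.6619,
                 0.6436, 0.6408, 0.6551, 0.6449, 0.6469]

mean_accuracy = mean(fold_accuracy)
sd_accuracy = stdev(fold_accuracy)  # sample standard deviation (n - 1 denominator)

print(f"Mean accuracy: {mean_accuracy:.4f}")
print(f"SD accuracy:   {sd_accuracy:.4f}")
```

The same pattern applies to any other metric column or model: replace the list with the corresponding fold values.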


In [37]:
gbc = create_model('gbc')
tuned_gbc = tune_model(gbc)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6701,0.7086,0.8858,0.6649,0.7596,0.2662,0.2957
1,0.6578,0.6752,0.8443,0.6649,0.7439,0.2499,0.2666
2,0.6436,0.682,0.8685,0.6469,0.7415,0.2055,0.23
3,0.6558,0.6978,0.8478,0.6622,0.7436,0.2436,0.2614
4,0.6721,0.7037,0.8858,0.6667,0.7608,0.2714,0.3006
5,0.6293,0.6703,0.7958,0.6516,0.7165,0.1963,0.2046
6,0.6653,0.7015,0.8542,0.6685,0.75,0.2659,0.2848
7,0.6408,0.6426,0.8333,0.6522,0.7317,0.2122,0.2273
8,0.6449,0.6759,0.8472,0.6524,0.7372,0.2175,0.2358
9,0.6633,0.7066,0.8616,0.6658,0.7511,0.2562,0.2774


Note that this is the same setup grid that was shown in __[Binary Classification Tutorial (CLF101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/tutorials/Binary%20Classification%20Tutorial%20Level%20Beginner%20-%20%20CLF101.ipynb)__. The only difference is that some of the customization parameters passed to `setup()` are now set to `True`. Also notice that the `session_id` is the same as the one used in the beginner tutorial, so the effect of randomization is held constant. Any improvements we see in this experiment are therefore due to the pre-processing steps taken in `setup()` or to the modeling techniques used in later sections of this tutorial.

Another difference you may have noticed is the `log_experiment` and `experiment_name` parameters of `setup()`. When enabled, they log all the modeling activity in the experiment. In the `setup()` calls in this notebook both parameters are commented out, so logging is switched off here. You will see at the end of this notebook how you can benefit from this functionality.
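
The point about `session_id` can be illustrated without PyCaret. The snippet below is a minimal sketch, not PyCaret's actual splitting code: `split_indices` is a hypothetical helper that shuffles row indices with a seeded generator, and the 0.7 train fraction is an assumption. It shows that reusing a seed reproduces the same train/test split, which is why differences between experiments can be attributed to the pre-processing options rather than to chance.

```python
import random

def split_indices(n_rows, train_fraction, seed):
    """Shuffle row indices with a seeded generator and split them in two."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

# 7010 rows matches the "Original Data" shape reported by setup().
train_a, test_a = split_indices(7010, 0.7, seed=123)
train_b, test_b = split_indices(7010, 0.7, seed=123)
train_c, test_c = split_indices(7010, 0.7, seed=456)

print(train_a == train_b)  # same seed: identical split
print(train_a == train_c)  # different seed: a different split
```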

In [38]:
et = create_model('et')
tuned_et = tune_model(et)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6721,0.7207,0.8789,0.6684,0.7593,0.2737,0.3001
1,0.6375,0.6752,0.8131,0.6546,0.7253,0.2103,0.2212
2,0.6558,0.6997,0.8685,0.6571,0.7481,0.2364,0.2605
3,0.6354,0.6941,0.8339,0.6478,0.7292,0.1976,0.2129
4,0.6477,0.6946,0.8581,0.6526,0.7414,0.2196,0.2408
5,0.6477,0.6829,0.8131,0.6638,0.7309,0.2354,0.2458
6,0.6327,0.6963,0.8299,0.6459,0.7264,0.193,0.2076
7,0.6408,0.6569,0.8229,0.6547,0.7292,0.2159,0.2287
8,0.6408,0.6845,0.8333,0.6522,0.7317,0.2122,0.2273
9,0.6633,0.7162,0.8547,0.6676,0.7496,0.2586,0.2776


In [39]:
lightgbm = create_model('lightgbm')
tuned_lightgbm = tune_model(lightgbm)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6599,0.7085,0.8824,0.6572,0.7533,0.2418,0.2707
1,0.6436,0.6644,0.8339,0.6549,0.7336,0.218,0.233
2,0.6497,0.6864,0.8754,0.6504,0.7463,0.2184,0.2452
3,0.6497,0.6875,0.8651,0.6527,0.744,0.2222,0.2455
4,0.6721,0.6922,0.9031,0.6624,0.7643,0.2655,0.3024
5,0.6436,0.6843,0.8062,0.6619,0.727,0.2278,0.2372
6,0.6449,0.686,0.8472,0.6524,0.7372,0.2175,0.2358
7,0.6265,0.6524,0.8125,0.6446,0.7189,0.1841,0.1953
8,0.6531,0.6953,0.875,0.6528,0.7478,0.2282,0.2548
9,0.6571,0.708,0.8616,0.6605,0.7477,0.2408,0.2625


In [40]:
ridge = create_model('ridge')
tuned_ridge = tune_model(ridge)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6436,0.0,0.7924,0.6657,0.7235,0.2326,0.2397
1,0.6171,0.0,0.7128,0.6624,0.6867,0.1962,0.1971
2,0.6558,0.0,0.7924,0.6775,0.7305,0.2624,0.2686
3,0.6253,0.0,0.7439,0.6615,0.7003,0.2049,0.2074
4,0.666,0.0,0.7993,0.6855,0.738,0.2847,0.2912
5,0.6293,0.0,0.7405,0.6667,0.7016,0.2159,0.218
6,0.651,0.0,0.7778,0.6767,0.7237,0.2562,0.2608
7,0.6612,0.0,0.7951,0.6815,0.734,0.2752,0.2815
8,0.6388,0.0,0.7812,0.6637,0.7177,0.2254,0.2312
9,0.6306,0.0,0.7682,0.6607,0.7104,0.2084,0.213


In [41]:
lda = create_model('lda')
tuned_lda = tune_model(lda)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6517,0.7028,0.7993,0.6715,0.7299,0.2502,0.2578
1,0.6273,0.6595,0.7266,0.6688,0.6965,0.2158,0.2171
2,0.6578,0.6866,0.7993,0.6774,0.7333,0.265,0.2722
3,0.6395,0.6723,0.782,0.6647,0.7186,0.2262,0.2321
4,0.666,0.6799,0.7924,0.6877,0.7363,0.2869,0.2923
5,0.6436,0.6903,0.7578,0.6759,0.7145,0.2444,0.2472
6,0.649,0.68,0.7847,0.6726,0.7244,0.249,0.2547
7,0.6612,0.6745,0.7847,0.6848,0.7314,0.2785,0.2833
8,0.649,0.6683,0.7812,0.6737,0.7235,0.2502,0.2553
9,0.6265,0.6828,0.7716,0.6559,0.7091,0.1972,0.2023


In [42]:
rf = create_model('rf')
tuned_rf = tune_model(rf)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6741,0.6699,0.8616,0.6748,0.7568,0.2845,0.3047
1,0.6253,0.6371,0.8374,0.6385,0.7246,0.1706,0.1866
2,0.6334,0.636,0.8201,0.6493,0.7248,0.1976,0.21
3,0.613,0.6501,0.654,0.6774,0.6655,0.2069,0.2071
4,0.6619,0.6553,0.8408,0.6694,0.7454,0.2612,0.2766
5,0.6069,0.6171,0.7197,0.65,0.6831,0.1692,0.1707
6,0.6306,0.6633,0.7743,0.6578,0.7113,0.2079,0.2133
7,0.6306,0.6248,0.8125,0.6482,0.7211,0.1943,0.2054
8,0.6224,0.671,0.7604,0.6537,0.703,0.1929,0.197
9,0.6673,0.698,0.8443,0.674,0.7496,0.2723,0.288


In [43]:
ada = create_model('ada')
tuned_ada = tune_model(ada)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6823,0.7184,0.8201,0.695,0.7524,0.3175,0.3261
1,0.6436,0.6684,0.7682,0.6727,0.7173,0.2409,0.2448
2,0.6558,0.6719,0.8131,0.6714,0.7355,0.2555,0.2652
3,0.6619,0.6725,0.8235,0.6742,0.7414,0.267,0.2783
4,0.6517,0.6691,0.8028,0.6705,0.7307,0.249,0.2572
5,0.6354,0.6752,0.7647,0.6657,0.7118,0.2223,0.2263
6,0.6327,0.6909,0.7639,0.6627,0.7097,0.2165,0.2206
7,0.6367,0.6444,0.7708,0.6647,0.7138,0.224,0.2286
8,0.6449,0.6727,0.7743,0.6717,0.7194,0.2426,0.2472
9,0.6571,0.7032,0.7958,0.6785,0.7325,0.2635,0.2701


In [44]:
nb = create_model('nb')
tuned_nb = tune_model(nb)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.5947,0.6789,0.5467,0.6991,0.6136,0.2007,0.2074
1,0.6171,0.6674,0.5848,0.7131,0.6426,0.2389,0.2444
2,0.5499,0.6736,0.3564,0.7464,0.4824,0.1646,0.2005
3,0.6212,0.6631,0.6747,0.6794,0.6771,0.219,0.219
4,0.5621,0.6508,0.4602,0.6927,0.553,0.1568,0.1695
5,0.6171,0.6897,0.4775,0.7886,0.5948,0.2713,0.3024
6,0.6306,0.662,0.6528,0.6989,0.675,0.2483,0.2491
7,0.6306,0.6442,0.6632,0.6945,0.6785,0.245,0.2454
8,0.6245,0.65,0.6389,0.697,0.6667,0.2386,0.2398
9,0.6082,0.668,0.5848,0.7012,0.6377,0.2186,0.2229


In [46]:
svm = create_model('svm')
tuned_svm = tune_model(svm)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6762,0.0,0.7197,0.7273,0.7235,0.3328,0.3329
1,0.6253,0.0,0.6782,0.6829,0.6806,0.2274,0.2274
2,0.6517,0.0,0.692,0.7092,0.7005,0.2846,0.2847
3,0.668,0.0,0.7336,0.7114,0.7223,0.3099,0.3101
4,0.6517,0.0,0.7682,0.681,0.722,0.2605,0.2639
5,0.6456,0.0,0.692,0.7018,0.6969,0.2704,0.2705
6,0.6429,0.0,0.7118,0.6902,0.7009,0.2581,0.2583
7,0.6469,0.0,0.7153,0.6936,0.7043,0.2666,0.2668
8,0.6551,0.0,0.6875,0.7148,0.7009,0.294,0.2943
9,0.6673,0.0,0.7128,0.7203,0.7165,0.3141,0.3141


In [47]:
knn = create_model('knn')
tuned_knn = tune_model(knn)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6375,0.6723,0.7716,0.6657,0.7147,0.2249,0.2295
1,0.5967,0.6328,0.7474,0.6334,0.6857,0.1338,0.1374
2,0.6191,0.6375,0.7647,0.65,0.7027,0.1825,0.1872
3,0.5967,0.6276,0.7855,0.6253,0.6963,0.1187,0.1258
4,0.6253,0.6335,0.7889,0.6496,0.7125,0.1887,0.1962
5,0.6232,0.631,0.7405,0.6605,0.6982,0.2012,0.2035
6,0.6286,0.6485,0.75,0.6626,0.7036,0.2114,0.2143
7,0.598,0.631,0.7396,0.6358,0.6838,0.1405,0.1436
8,0.6082,0.612,0.7917,0.6333,0.7037,0.146,0.1541
9,0.6245,0.6622,0.7578,0.6577,0.7042,0.1972,0.2009


In [48]:
dt = create_model('dt')
tuned_dt = tune_model(dt)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6171,0.6343,0.7266,0.6583,0.6908,0.1913,0.1929
1,0.6354,0.6583,0.7197,0.6797,0.6992,0.2376,0.2382
2,0.6253,0.6296,0.7993,0.6471,0.7152,0.1849,0.1939
3,0.6232,0.6485,0.6713,0.6831,0.6771,0.2249,0.2249
4,0.6558,0.6523,0.8201,0.6695,0.7372,0.2531,0.2643
5,0.6456,0.6558,0.8685,0.6486,0.7426,0.2106,0.2351
6,0.6204,0.6472,0.7535,0.6536,0.7,0.1904,0.1939
7,0.6306,0.6337,0.7917,0.6533,0.7159,0.2018,0.2095
8,0.6469,0.684,0.691,0.7032,0.697,0.2741,0.2742
9,0.6592,0.6966,0.7232,0.7061,0.7145,0.2919,0.292


In [49]:
qda = create_model('qda')
tuned_qda = tune_model(qda)

Unnamed: 0_level_0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.6721,0.6934,0.7405,0.7133,0.7267,0.3173,0.3177
1,0.6477,0.6774,0.737,0.6871,0.7112,0.2609,0.262
2,0.6456,0.6726,0.7439,0.6825,0.7119,0.2538,0.2554
3,0.6395,0.6569,0.7509,0.6739,0.7103,0.2369,0.2393
4,0.6517,0.6662,0.7197,0.698,0.7087,0.276,0.2762
5,0.6334,0.652,0.7439,0.6698,0.7049,0.2245,0.2267
6,0.6551,0.7006,0.7083,0.7059,0.7071,0.2878,0.2878
7,0.6327,0.6664,0.7812,0.6579,0.7143,0.2104,0.2166
8,0.6265,0.638,0.7465,0.6615,0.7015,0.2077,0.2104
9,0.6327,0.6788,0.7716,0.6617,0.7125,0.2122,0.217


# Feature Selection and Normalization

In [12]:
exp_clf2 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = False, 
                  ignore_low_variance = False,
                  remove_multicollinearity = False, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = False,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,"0: 0, 1: 1"
4,Original Data,"(7010, 208)"
5,Missing Values,False
6,Numeric Features,199
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [51]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
gbc,Gradient Boosting Classifier,0.65,0.6812,0.7752,0.677,0.7226,0.2542,0.259,1.937
et,Extra Trees Classifier,0.6455,0.6808,0.7742,0.6729,0.7198,0.2436,0.2484,0.152
ridge,Ridge Classifier,0.6449,0.0,0.7724,0.6726,0.7189,0.2429,0.2477,0.018
lr,Logistic Regression,0.6431,0.6744,0.7672,0.6724,0.7165,0.2402,0.2445,0.317
lda,Linear Discriminant Analysis,0.6411,0.6682,0.7593,0.6727,0.7131,0.238,0.2417,0.175
lightgbm,Light Gradient Boosting Machine,0.638,0.6728,0.7555,0.6713,0.7105,0.2316,0.2351,0.343
rf,Random Forest Classifier,0.6321,0.6727,0.7517,0.6661,0.7061,0.2189,0.2221,0.296
ada,Ada Boost Classifier,0.6311,0.6537,0.7572,0.6636,0.7072,0.2144,0.218,0.395
knn,K Neighbors Classifier,0.6154,0.6241,0.7371,0.6536,0.6927,0.1835,0.1861,0.216
nb,Naive Bayes,0.606,0.6567,0.5681,0.7084,0.6265,0.2193,0.2276,0.02
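
To make the effect of `normalize = True` concrete, the snippet below sketches z-score scaling in plain Python. It assumes the z-score method (which PyCaret 2.x documents as the default `normalize_method`), and it is an illustration rather than PyCaret's own code. The sample values are the `oppt2PA_cum_mean` entries from the four data rows shown earlier.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# 'oppt2PA_cum_mean' values from the four data rows printed earlier.
oppt_2pa = [64.324324, 60.25, 62.587302, 61.861111]
scaled = zscore(oppt_2pa)

for raw, z in zip(oppt_2pa, scaled):
    print(f"{raw:10.4f} -> {z:+.4f}")
```

Scaling of this kind mainly matters for distance-based and linear models such as `knn`, `lr` and `ridge`; tree-based models are largely insensitive to it.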


# Feature Selection, Normalization, Transformation

In [52]:
exp_clf3 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = True, 
                  ignore_low_variance = False,
                  remove_multicollinearity = False, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = False,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [53]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.6453,0.6777,0.7735,0.6727,0.7195,0.2435,0.2483,0.144
ridge,Ridge Classifier,0.6443,0.0,0.7693,0.673,0.7179,0.2424,0.2466,0.019
lda,Linear Discriminant Analysis,0.6441,0.6695,0.7641,0.6744,0.7164,0.2436,0.2473,0.174
gbc,Gradient Boosting Classifier,0.6439,0.6812,0.7703,0.6722,0.7178,0.2411,0.2457,1.944
lr,Logistic Regression,0.6372,0.6691,0.761,0.6684,0.7115,0.228,0.2318,0.266
rf,Random Forest Classifier,0.6362,0.6727,0.7575,0.6682,0.7098,0.2269,0.2307,0.298
ada,Ada Boost Classifier,0.6317,0.6563,0.7568,0.6643,0.7075,0.2159,0.2194,0.398
lightgbm,Light Gradient Boosting Machine,0.6307,0.6653,0.7516,0.6646,0.7052,0.2154,0.2188,0.323
nb,Naive Bayes,0.6294,0.6689,0.6308,0.708,0.6668,0.2524,0.2548,0.02
knn,K Neighbors Classifier,0.6172,0.635,0.7343,0.6559,0.6928,0.1891,0.1915,0.243
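
For intuition about what `transformation = True` does, the snippet below sketches the Yeo-Johnson power transform for a single value. This assumes the `yeo-johnson` method, which PyCaret 2.x documents as the default `transformation_method`. In practice the exponent is estimated from the data; here `lmbda` is fixed by hand and the input values are purely illustrative.

```python
import math

def yeo_johnson(y, lmbda):
    """Yeo-Johnson power transform of one value for a fixed lambda."""
    if y >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)

# Illustrative right-skewed values (not taken from the dataset).
skewed = [-2.0, 0.0, 0.5, 3.0, 20.0]
transformed = [yeo_johnson(v, 0.5) for v in skewed]

for raw, t in zip(skewed, transformed):
    print(f"{raw:6.1f} -> {t:+.4f}")
```

With `lmbda` below 1 the large positive values are pulled in, which is how the transform reduces right skew. With `lmbda` equal to 1 the data is left unchanged.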


# Feature Selection, Normalization, Ignore Low Variance

In [54]:
exp_clf4 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = False, 
                  ignore_low_variance = True,
                  remove_multicollinearity = False, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = False,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [55]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.649,0.6777,0.7839,0.6734,0.7243,0.2488,0.2545,0.147
ridge,Ridge Classifier,0.6451,0.0,0.7735,0.6726,0.7194,0.243,0.2478,0.021
gbc,Gradient Boosting Classifier,0.6449,0.6806,0.7697,0.6735,0.7182,0.2438,0.2483,1.914
rf,Random Forest Classifier,0.6421,0.6761,0.7669,0.6714,0.7158,0.2379,0.2421,0.295
lr,Logistic Regression,0.6413,0.6772,0.7645,0.6714,0.7148,0.2367,0.2407,0.296
lda,Linear Discriminant Analysis,0.6404,0.6711,0.7575,0.6727,0.7124,0.2371,0.2405,0.161
lightgbm,Light Gradient Boosting Machine,0.6396,0.6714,0.7582,0.6717,0.7121,0.2348,0.2384,0.319
ada,Ada Boost Classifier,0.6315,0.6542,0.7509,0.6658,0.7056,0.2175,0.2206,0.398
knn,K Neighbors Classifier,0.6129,0.6261,0.7246,0.6547,0.6877,0.182,0.1839,0.223
nb,Naive Bayes,0.6119,0.662,0.5861,0.7078,0.6371,0.227,0.234,0.019
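
The idea behind `ignore_low_variance = True` can be sketched as a near-zero-variance check on a categorical column. The rule below follows the description in the PyCaret documentation (few unique values relative to the sample size, and a most common value that dominates the second most common). The thresholds of 10% and 20 are assumptions for illustration, and `is_low_variance` is a hypothetical helper, not a PyCaret function.

```python
from collections import Counter

def is_low_variance(values, unique_ratio_max=0.10, freq_ratio_min=20.0):
    """Flag a categorical column whose values are dominated by a single level."""
    counts = Counter(values).most_common()
    if len(counts) < 2:
        return True  # a constant (or empty) column has no variance at all
    unique_ratio = len(counts) / len(values)
    freq_ratio = counts[0][1] / counts[1][1]
    return unique_ratio < unique_ratio_max and freq_ratio > freq_ratio_min

# Illustrative columns (not taken from the dataset).
near_constant = ["East"] * 99 + ["West"]
balanced = ["East", "West"] * 50

print(is_low_variance(near_constant))
print(is_low_variance(balanced))
```

Balanced categorical columns such as a conference label would pass this check, which is consistent with the option having little visible effect on the scores above.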


# Feature Selection, Normalization, Remove Multicollinearity

In [56]:
exp_clf5 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = False, 
                  ignore_low_variance = False,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = False,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [57]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
gbc,Gradient Boosting Classifier,0.6525,0.6803,0.7783,0.6785,0.7249,0.259,0.2638,1.272
et,Extra Trees Classifier,0.6466,0.6826,0.7849,0.6707,0.7232,0.2425,0.2485,0.139
lr,Logistic Regression,0.6451,0.6748,0.7728,0.6727,0.7191,0.2433,0.248,0.181
rf,Random Forest Classifier,0.6437,0.6809,0.7672,0.673,0.7169,0.2416,0.2458,0.222
ridge,Ridge Classifier,0.6421,0.0,0.7711,0.6702,0.7169,0.2365,0.2413,0.018
lda,Linear Discriminant Analysis,0.64,0.6719,0.7686,0.6689,0.7152,0.2323,0.2368,0.11
lightgbm,Light Gradient Boosting Machine,0.6364,0.6686,0.7457,0.6722,0.7069,0.2312,0.2336,0.184
ada,Ada Boost Classifier,0.6359,0.6579,0.7599,0.6677,0.7107,0.2252,0.2289,0.269
knn,K Neighbors Classifier,0.6145,0.6275,0.7343,0.6534,0.6913,0.1826,0.1852,0.172
nb,Naive Bayes,0.6095,0.6549,0.5809,0.7055,0.6358,0.2227,0.2287,0.018
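
The `remove_multicollinearity = True, multicollinearity_threshold = 0.95` setting can be pictured as a pairwise correlation filter. The snippet below is a simplified sketch, not PyCaret's implementation. For each pair of features whose absolute Pearson correlation exceeds the threshold, it drops the one that is less correlated with the target, which is the behaviour the PyCaret documentation describes. The feature names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_collinear(features, target, threshold=0.95):
    """Return the feature names kept after removing highly correlated pairs."""
    names = list(features)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(features[a], features[b])) > threshold:
                # Drop whichever feature of the pair is less related to the target.
                weaker = min((a, b), key=lambda n: abs(pearson(features[n], target)))
                dropped.add(weaker)
    return [n for n in names if n not in dropped]

# Hypothetical toy data: 'points' and 'points_noisy' are nearly identical.
features = {
    "points": [1, 2, 3, 4, 5, 6],
    "points_noisy": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "rebounds": [3, 1, 4, 1, 5, 2],
}
target = [0, 0, 0, 1, 1, 1]

kept = drop_collinear(features, target)
print(kept)
```

This dataset contains many paired statistics (for example overall and away-only cumulative means), so a filter of this kind is likely to remove a number of near-duplicate columns.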


# Feature Selection, Normalization, Remove Multicollinearity, Stratified Data Split

In [58]:
exp_clf6 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = False, 
                  ignore_low_variance = False,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = True,
                   pca = False,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [59]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
rf,Random Forest Classifier,0.6555,0.6839,0.7811,0.6805,0.7272,0.2659,0.2708,0.225
lr,Logistic Regression,0.6535,0.6826,0.7784,0.6792,0.7253,0.2619,0.2665,0.168
lda,Linear Discriminant Analysis,0.6531,0.68,0.7801,0.6783,0.7254,0.2603,0.2654,0.097
ridge,Ridge Classifier,0.6523,0.0,0.7804,0.6774,0.7251,0.2582,0.2633,0.018
gbc,Gradient Boosting Classifier,0.6494,0.6882,0.7732,0.6768,0.7215,0.2538,0.2582,1.227
et,Extra Trees Classifier,0.6484,0.6794,0.7763,0.6746,0.7216,0.2505,0.2555,0.136
lightgbm,Light Gradient Boosting Machine,0.6425,0.6686,0.7524,0.6762,0.712,0.244,0.2468,0.186
ada,Ada Boost Classifier,0.64,0.6649,0.7579,0.672,0.7121,0.2363,0.2397,0.26
knn,K Neighbors Classifier,0.618,0.6305,0.7177,0.6614,0.6881,0.1973,0.1987,0.165
svm,SVM - Linear Kernel,0.6007,0.0,0.6671,0.6604,0.6593,0.1737,0.1771,0.035
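
With `data_split_stratify = True` the train/test split preserves the class proportions of the target. The snippet below is a minimal sketch of that idea in plain Python. `stratified_split` is a hypothetical helper, the 0.7 train fraction is an assumption, and the 60/40 class balance is illustrative rather than taken from the dataset.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=123):
    """Split row indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# Illustrative target: 60 wins (1) and 40 losses (0).
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)

train_pos_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(train_pos_rate, test_pos_rate)
```

Both subsets end up with the same share of positive cases as the full data, so the hold-out metrics are not distorted by an unlucky split.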


# Feature Selection, Normalization, Remove Multicollinearity, Stratified Data Split, PCA

In [60]:
exp_clf7 = setup(data = data, target = 'teamRslt', session_id=123,
                  normalize = True, 
                  transformation = False, 
                  ignore_low_variance = False,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',  'teamDiv' , 'opptDiv'],
#                   ordinal_features = ordinal_dict,
                   data_split_stratify = True,
                   pca = True,
                   feature_selection = True,
                   fix_imbalance = False)
                   #group_features = group_list)
#                   log_experiment = True, experiment_name = 'test1')

Unnamed: 0,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False


In [61]:
best_model = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.6529,0.0,0.7912,0.6746,0.7281,0.2563,0.2628,0.016
lda,Linear Discriminant Analysis,0.6527,0.6857,0.7891,0.675,0.7275,0.2565,0.2627,0.035
lr,Logistic Regression,0.6515,0.6857,0.7874,0.6743,0.7263,0.2541,0.2602,0.026
lightgbm,Light Gradient Boosting Machine,0.6492,0.6725,0.7738,0.6761,0.7215,0.2532,0.2578,0.17
rf,Random Forest Classifier,0.647,0.6794,0.7884,0.6698,0.7241,0.2428,0.2493,0.297
gbc,Gradient Boosting Classifier,0.647,0.6808,0.7794,0.6725,0.7218,0.2458,0.251,1.254
et,Extra Trees Classifier,0.6425,0.6655,0.8456,0.6509,0.7354,0.2119,0.2301,0.12
ada,Ada Boost Classifier,0.625,0.6495,0.736,0.6632,0.6975,0.2074,0.2096,0.273
nb,Naive Bayes,0.6209,0.6399,0.6549,0.6858,0.6691,0.2254,0.2264,0.016
knn,K Neighbors Classifier,0.6186,0.631,0.7194,0.6615,0.6889,0.1981,0.1997,0.097


# Feature Selection, Normalization, Remove_multicollinearity, data_split_stratify, PCA, ordinal

In [62]:
# Build the ordered level lists that `ordinal_features` expects (lowest level first).
data['gmDate'] = data['gmDate'].astype('string')
list_season = sorted(data['seasonID'].unique())
list_date = sorted(data['gmDate'].unique())
ordinal_dict = {'seasonID': list_season, 'gmDate': list_date}
# col_list = dataframe.columns.to_list()
# group_list1 = [ ["team"+x[4:],"oppt"+x[4:]] for x in num_col_list if x[:4] == "team" and x[4:8]!="Home"]
# group_list2 = [ ["team"+"Home"+x[8:],"oppt"+"Away"+x[8:]] for x in num_col_list if x[:4] == "team" and x[4:8]=="Home"]
# group_list = group_list1 + group_list2
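
The `ordinal_dict` built above maps each ordinal column to its levels in ascending order. As a rough illustration of what an ordinal encoding does with such a list, the sketch below replaces each value with its rank. The dates are hypothetical. Sorting date strings gives chronological order only when they are in a year-first format such as `YYYY-MM-DD`, which is assumed here.

```python
def ordinal_encode(values, ordered_levels):
    """Replace each value by its position in ordered_levels (0 = lowest)."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]

# Hypothetical game dates; ISO-formatted strings sort chronologically.
gm_dates = ["2017-01-05", "2016-11-30", "2017-01-05", "2016-12-25"]
levels = sorted(set(gm_dates))
encoded = ordinal_encode(gm_dates, levels)
print(levels)
print(encoded)
```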

In [63]:
exp_clf8 = setup(data = data, target = 'teamRslt', session_id = 123,
                 normalize = True,
                 transformation = False,
                 ignore_low_variance = False,
                 remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                 numeric_features = num_col_list,
                 categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr', 'teamDiv', 'opptDiv'],
                 ordinal_features = ordinal_dict,
                 data_split_stratify = True,
                 pca = True,
                 feature_selection = True,
                 fix_imbalance = False)
                 # group_features = group_list)
                 # log_experiment = True, experiment_name = 'test1')

,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,True
9,High Cardinality Features,False


In [65]:
best_model = compare_models()

ID,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lr,Logistic Regression,0.6525,0.6881,0.7891,0.6749,0.7274,0.2559,0.2621,0.025
lda,Linear Discriminant Analysis,0.6512,0.6885,0.7888,0.6739,0.7267,0.253,0.2591,0.031
ridge,Ridge Classifier,0.6506,0.0,0.7912,0.6726,0.727,0.2507,0.2571,0.017
gbc,Gradient Boosting Classifier,0.6488,0.6831,0.7777,0.6747,0.7224,0.2509,0.2558,1.068
rf,Random Forest Classifier,0.6464,0.6796,0.7867,0.6695,0.7233,0.242,0.2484,0.266
lightgbm,Light Gradient Boosting Machine,0.6435,0.6706,0.7648,0.6733,0.716,0.2423,0.2461,0.159
et,Extra Trees Classifier,0.6405,0.6719,0.8245,0.6539,0.7292,0.2146,0.2282,0.115
ada,Ada Boost Classifier,0.6356,0.6602,0.7482,0.67,0.7068,0.229,0.2317,0.247
qda,Quadratic Discriminant Analysis,0.6201,0.6486,0.6375,0.6916,0.6633,0.229,0.2301,0.019
knn,K Neighbors Classifier,0.6186,0.6319,0.717,0.6622,0.6883,0.1989,0.2002,0.087


# Best so far with default models: Feature Selection, Normalization, Remove_multicollinearity, data_split_stratify

# Second best so far with default models: Feature Selection, Normalization, Remove_multicollinearity, data_split_stratify, PCA

# Hyperparameter optimization is time-consuming, so we prefer the PCA setup, whose much smaller feature set is faster to tune
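
To get a feel for why tuning time matters here, the sketch below counts the model fits involved in a cross-validated random search. The 13 estimators and 10 folds match the tuning cells in this notebook. The search budget of 10 candidates per model is an assumption, not a value read from the output.

```python
def tuning_fits(n_models, n_iter, n_folds):
    """Number of model fits for a cross-validated random search."""
    return n_models * n_iter * n_folds

# 13 estimators and 10 folds match the cells in this notebook;
# n_iter = 10 is an assumed search budget, not read from the output.
total = tuning_fits(n_models=13, n_iter=10, n_folds=10)
print(total)
```

Each of those fits gets cheaper when PCA has reduced the number of input columns, which is the motivation stated above.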

In [103]:
# from pycaret.datasets import get_data
# dataset = get_data('credit', profile=True)
import pandas as pd

dataset = pd.read_excel('C:/Users/Stavros/Desktop/0. Dummy/0_Dummy.xlsx')

In [104]:
data = dataset.sample(frac=0.95, random_state=786)
data_unseen = dataset.drop(data.index)

data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions ' + str(data_unseen.shape))

Data for Modeling: (7010, 209)
Unseen Data For Predictions (369, 209)


In [105]:
# Every column except the target, the categorical columns and the ordinal columns is treated as numeric.
non_numeric = ['teamRslt', 'teamConf', 'opptConf', 'teamAbbr', 'opptAbbr',
               'teamDiv', 'opptDiv', 'seasonID', 'gmDate']
num_col_list = list(data.columns)
for col in non_numeric:
    num_col_list.remove(col)

In [73]:
exp_clf7 = setup(data = data, target = 'teamRslt', session_id = 123,
                 normalize = True,
                 transformation = False,
                 ignore_low_variance = False,
                 remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                 numeric_features = num_col_list,
                 categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr', 'teamDiv', 'opptDiv'],
                 # ordinal_features = ordinal_dict,
                 data_split_stratify = True,
                 pca = True,
                 feature_selection = True,
                 fix_imbalance = False)
                 # group_features = group_list)
                 # log_experiment = True, experiment_name = 'test1')

,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,
4,Original Data,"(7010, 209)"
5,Missing Values,False
6,Numeric Features,200
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False
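
`data_split_stratify = True` asks for a train/test split that preserves the class balance of the target in both parts. A minimal pure-Python sketch of that idea follows. The 70/30 split ratio, the seed and the toy labels are assumptions for illustration, not values taken from this experiment.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=123):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        cut = round(len(indices) * train_frac)    # same fraction for every class
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical target with 60 wins (1) and 40 losses (0).
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_share, test_share)
```

Both parts end up with the same share of positive labels as the full data, which keeps metrics such as Recall and Precision comparable between cross-validation and the hold-out set.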


In [74]:
best_model = compare_models()

ID,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.6529,0.0,0.7912,0.6746,0.7281,0.2563,0.2628,0.016
lda,Linear Discriminant Analysis,0.6527,0.6857,0.7891,0.675,0.7275,0.2565,0.2627,0.034
lr,Logistic Regression,0.6515,0.6857,0.7874,0.6743,0.7263,0.2541,0.2602,0.319
lightgbm,Light Gradient Boosting Machine,0.6492,0.6725,0.7738,0.6761,0.7215,0.2532,0.2578,0.17
rf,Random Forest Classifier,0.647,0.6794,0.7884,0.6698,0.7241,0.2428,0.2493,0.301
gbc,Gradient Boosting Classifier,0.647,0.6808,0.7794,0.6725,0.7218,0.2458,0.251,1.236
et,Extra Trees Classifier,0.6425,0.6655,0.8456,0.6509,0.7354,0.2119,0.2301,0.119
ada,Ada Boost Classifier,0.625,0.6495,0.736,0.6632,0.6975,0.2074,0.2096,0.269
nb,Naive Bayes,0.6209,0.6399,0.6549,0.6858,0.6691,0.2254,0.2264,0.017
knn,K Neighbors Classifier,0.6186,0.631,0.7194,0.6615,0.6889,0.1981,0.1997,0.223


In [75]:
lr = create_model('lr')
tuned_lr = tune_model(lr)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6945,0.7279,0.8333,0.7018,0.7619,0.3444,0.3544
1,0.6212,0.6771,0.7326,0.6594,0.6941,0.2004,0.2023
2,0.6497,0.679,0.7951,0.6696,0.727,0.2482,0.2554
3,0.6375,0.6787,0.8028,0.6572,0.7227,0.214,0.2231
4,0.6314,0.6454,0.7612,0.6627,0.7085,0.2136,0.2175
5,0.6578,0.6946,0.7993,0.6774,0.7333,0.265,0.2722
6,0.6367,0.6899,0.7882,0.6599,0.7184,0.218,0.2249
7,0.6388,0.6723,0.7882,0.6618,0.7195,0.223,0.2298
8,0.6939,0.7233,0.816,0.7078,0.7581,0.3471,0.3536
9,0.6571,0.694,0.7917,0.6786,0.7308,0.2665,0.2725
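
The per-fold rows are easier to compare across models once they are reduced to a mean and a spread. The sketch below does this for the Accuracy column of the Logistic Regression table above. It uses the sample standard deviation, which is a choice made here; PyCaret's own summary rows may follow a different convention.

```python
from statistics import mean, stdev

# Accuracy per fold, copied from the Logistic Regression table above.
accuracy = [0.6945, 0.6212, 0.6497, 0.6375, 0.6314,
            0.6578, 0.6367, 0.6388, 0.6939, 0.6571]

mean_acc = mean(accuracy)
sd_acc = stdev(accuracy)   # sample standard deviation (n - 1 denominator)
print(f"mean={mean_acc:.4f} sd={sd_acc:.4f}")
```

The same two lines can be applied to any other metric column or model table in this section.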


In [76]:
gbc = create_model('gbc')
tuned_gbc = tune_model(gbc)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.666,0.7334,0.8854,0.6606,0.7567,0.2585,0.2884
1,0.6456,0.6904,0.8507,0.6516,0.738,0.2194,0.2388
2,0.6314,0.6716,0.8507,0.6397,0.7303,0.1836,0.2032
3,0.6232,0.6723,0.8616,0.632,0.7291,0.156,0.1777
4,0.6253,0.6403,0.8166,0.6431,0.7195,0.1785,0.1904
5,0.6701,0.6935,0.8685,0.6693,0.756,0.2721,0.295
6,0.6531,0.7085,0.8785,0.6521,0.7485,0.2269,0.2548
7,0.6388,0.679,0.8681,0.6427,0.7386,0.1945,0.219
8,0.6592,0.7249,0.8542,0.6631,0.7466,0.2507,0.2702
9,0.6571,0.6968,0.8854,0.6538,0.7522,0.2348,0.2652


In [77]:
et = create_model('et')
tuned_et = tune_model(et)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6599,0.7265,0.9201,0.6479,0.7604,0.2311,0.2783
1,0.6456,0.6826,0.8715,0.6469,0.7426,0.2121,0.2379
2,0.6354,0.6806,0.8715,0.6387,0.7372,0.1862,0.2119
3,0.6273,0.6747,0.8824,0.6312,0.7359,0.1583,0.1865
4,0.6253,0.6451,0.8374,0.6385,0.7246,0.1706,0.1866
5,0.6538,0.6946,0.8824,0.6522,0.75,0.2263,0.2555
6,0.6367,0.7139,0.8889,0.6368,0.742,0.1814,0.213
7,0.6531,0.6818,0.9132,0.6446,0.7557,0.2144,0.2576
8,0.6714,0.7177,0.8854,0.6658,0.7601,0.2708,0.3
9,0.6673,0.726,0.9132,0.6559,0.7634,0.251,0.2937


In [78]:
lightgbm = create_model('lightgbm')
tuned_lightgbm = tune_model(lightgbm)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6904,0.725,0.8785,0.6838,0.769,0.3213,0.3452
1,0.6456,0.6805,0.8299,0.6566,0.7331,0.2266,0.2408
2,0.6578,0.6763,0.8472,0.663,0.7439,0.251,0.2687
3,0.6334,0.6729,0.8478,0.643,0.7313,0.1873,0.2059
4,0.6334,0.6503,0.8339,0.6461,0.7281,0.1925,0.2078
5,0.6599,0.7008,0.8512,0.6649,0.7466,0.2526,0.271
6,0.6531,0.7131,0.8681,0.6545,0.7463,0.2306,0.2548
7,0.649,0.6718,0.8542,0.6543,0.741,0.2253,0.2454
8,0.6735,0.7044,0.8576,0.6749,0.7554,0.285,0.304
9,0.6571,0.6994,0.8646,0.6587,0.7477,0.2421,0.2649


In [79]:
ridge = create_model('ridge')
tuned_ridge = tune_model(ridge)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.7006,0.0,0.8403,0.7055,0.767,0.357,0.3678
1,0.6151,0.0,0.7361,0.6523,0.6917,0.1844,0.1868
2,0.6599,0.0,0.7951,0.6795,0.7328,0.2728,0.2793
3,0.6354,0.0,0.8062,0.6545,0.7225,0.2077,0.2175
4,0.6334,0.0,0.7578,0.6657,0.7087,0.2198,0.2232
5,0.668,0.0,0.8028,0.6864,0.74,0.2886,0.2954
6,0.6429,0.0,0.7986,0.6628,0.7244,0.2294,0.2376
7,0.6286,0.0,0.7812,0.6541,0.712,0.2004,0.2068
8,0.698,0.0,0.816,0.7121,0.7605,0.3568,0.3628
9,0.6592,0.0,0.7917,0.6806,0.7319,0.2714,0.2773


In [80]:
lda = create_model('lda')
tuned_lda = tune_model(lda)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6965,0.7381,0.8021,0.7152,0.7561,0.358,0.3621
1,0.6314,0.682,0.7257,0.672,0.6978,0.227,0.2281
2,0.6538,0.6877,0.7743,0.6799,0.724,0.2648,0.2688
3,0.6415,0.6774,0.7612,0.6728,0.7143,0.2383,0.2416
4,0.6456,0.6399,0.7647,0.6758,0.7175,0.2469,0.2504
5,0.664,0.7019,0.7751,0.6914,0.7308,0.2875,0.2909
6,0.6571,0.7082,0.7743,0.684,0.7264,0.272,0.2758
7,0.6184,0.6716,0.7465,0.6535,0.6969,0.1879,0.1909
8,0.6939,0.7299,0.7743,0.724,0.7483,0.3588,0.3601
9,0.6673,0.7125,0.7917,0.6888,0.7367,0.291,0.2962


In [81]:
rf = create_model('rf')
tuned_rf = tune_model(rf)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6762,0.7146,0.6562,0.759,0.7039,0.3507,0.3553
1,0.6212,0.6913,0.6007,0.709,0.6504,0.2432,0.2472
2,0.611,0.6698,0.6076,0.6917,0.647,0.2179,0.2201
3,0.6375,0.658,0.654,0.7079,0.6799,0.2635,0.2646
4,0.5906,0.6379,0.6194,0.663,0.6404,0.1665,0.167
5,0.6395,0.6935,0.6471,0.7137,0.6788,0.2703,0.272
6,0.6612,0.7145,0.6806,0.7259,0.7025,0.3101,0.311
7,0.6122,0.6657,0.6458,0.6788,0.6619,0.208,0.2084
8,0.651,0.7228,0.6215,0.7427,0.6767,0.304,0.3098
9,0.6673,0.7072,0.7153,0.7178,0.7165,0.3141,0.3141


In [82]:
ada = create_model('ada')
tuned_ada = tune_model(ada)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6762,0.7137,0.7917,0.6972,0.7415,0.3129,0.3174
1,0.6415,0.6801,0.7569,0.6728,0.7124,0.2411,0.244
2,0.6314,0.6694,0.7743,0.6578,0.7113,0.2107,0.2161
3,0.6456,0.6638,0.7785,0.6716,0.7212,0.2423,0.2473
4,0.6191,0.6385,0.7682,0.6491,0.7036,0.1813,0.1864
5,0.6782,0.7001,0.827,0.6888,0.7516,0.3055,0.316
6,0.6633,0.7102,0.809,0.6793,0.7385,0.2757,0.2841
7,0.651,0.6797,0.7986,0.6706,0.729,0.2493,0.2569
8,0.6918,0.7115,0.8021,0.7108,0.7537,0.3462,0.3507
9,0.6551,0.7043,0.809,0.6715,0.7339,0.2558,0.2649


In [83]:
nb = create_model('nb')
tuned_nb = tune_model(nb)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6741,0.7242,0.6806,0.7424,0.7101,0.3397,0.3413
1,0.5967,0.6396,0.5868,0.6815,0.6306,0.1921,0.1947
2,0.6538,0.6809,0.6736,0.7185,0.6953,0.2954,0.2962
3,0.6395,0.6659,0.6678,0.7044,0.6856,0.2639,0.2644
4,0.5927,0.6279,0.5433,0.6978,0.6109,0.1972,0.2041
5,0.666,0.6796,0.7232,0.7133,0.7182,0.3083,0.3083
6,0.6592,0.6983,0.6562,0.7354,0.6936,0.3125,0.315
7,0.6,0.6498,0.5521,0.7035,0.6187,0.2107,0.2176
8,0.6796,0.7194,0.7431,0.7205,0.7316,0.3344,0.3346
9,0.6306,0.6886,0.691,0.6838,0.6874,0.2361,0.2361


In [84]:
svm = create_model('svm')
tuned_svm = tune_model(svm)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6762,0.0,0.8194,0.688,0.748,0.3045,0.3137
1,0.6212,0.0,0.7361,0.6584,0.6951,0.1992,0.2013
2,0.6375,0.0,0.7639,0.6667,0.712,0.229,0.2329
3,0.6212,0.0,0.7924,0.6451,0.7112,0.1773,0.1854
4,0.6456,0.0,0.7682,0.6748,0.7184,0.2458,0.2496
5,0.6701,0.0,0.8097,0.6862,0.7429,0.2913,0.2991
6,0.6286,0.0,0.7882,0.6523,0.7138,0.198,0.2053
7,0.6388,0.0,0.7951,0.6599,0.7213,0.2206,0.2284
8,0.6898,0.0,0.816,0.7036,0.7556,0.3373,0.3443
9,0.6612,0.0,0.7917,0.6826,0.7331,0.2763,0.282


In [85]:
knn = create_model('knn')
tuned_knn = tune_model(knn)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6965,0.734,0.8264,0.7062,0.7616,0.3512,0.3595
1,0.6436,0.6661,0.7778,0.6687,0.7191,0.2392,0.2443
2,0.6578,0.6619,0.8021,0.6754,0.7333,0.2657,0.2734
3,0.6293,0.6713,0.8166,0.6466,0.7217,0.1887,0.2005
4,0.6212,0.6294,0.7751,0.6493,0.7066,0.1837,0.1896
5,0.6619,0.6597,0.8097,0.6783,0.7382,0.2715,0.2801
6,0.6653,0.7177,0.8368,0.6732,0.7461,0.2717,0.2858
7,0.6224,0.6629,0.7951,0.6451,0.7123,0.1803,0.1888
8,0.6816,0.7174,0.7986,0.7012,0.7468,0.323,0.328
9,0.6714,0.7103,0.8299,0.6809,0.748,0.2889,0.3007


In [86]:
dt = create_model('dt')
tuned_dt = tune_model(dt)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6741,0.6941,0.7431,0.7133,0.7279,0.3222,0.3226
1,0.6171,0.6327,0.6562,0.6799,0.6678,0.2163,0.2164
2,0.6273,0.6414,0.7431,0.6625,0.7005,0.2115,0.2139
3,0.6314,0.622,0.8166,0.6484,0.7228,0.1938,0.2056
4,0.6212,0.6236,0.7232,0.6635,0.6921,0.2024,0.2036
5,0.6232,0.6469,0.7405,0.6605,0.6982,0.2012,0.2035
6,0.6347,0.6632,0.7951,0.6562,0.719,0.2106,0.2186
7,0.5878,0.6175,0.6076,0.6629,0.6341,0.1641,0.1649
8,0.6571,0.6766,0.75,0.6923,0.72,0.2797,0.2812
9,0.6408,0.6485,0.8611,0.6458,0.7381,0.2022,0.2246


In [87]:
qda = create_model('qda')
tuned_qda = tune_model(qda)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6925,0.7381,0.7118,0.7509,0.7308,0.3728,0.3735
1,0.6293,0.6702,0.6389,0.7023,0.6691,0.2499,0.2514
2,0.6293,0.6682,0.6771,0.6866,0.6818,0.238,0.238
3,0.6497,0.6662,0.7128,0.6983,0.7055,0.2734,0.2735
4,0.6008,0.625,0.6574,0.662,0.6597,0.177,0.177
5,0.666,0.6983,0.7163,0.7163,0.7163,0.3103,0.3103
6,0.6816,0.7066,0.7257,0.7308,0.7282,0.344,0.344
7,0.6102,0.6507,0.6806,0.6644,0.6724,0.1914,0.1915
8,0.6816,0.7221,0.691,0.7481,0.7184,0.3535,0.355
9,0.6571,0.7189,0.7361,0.6974,0.7162,0.284,0.2847


# All of the above + min-max scaler

In [10]:
exp_clf10 = setup(data = data, target = 'teamRslt', session_id = 123,
                  normalize = True,
                  normalize_method = 'minmax',
                  transformation = False,
                  ignore_low_variance = False,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95,
                  numeric_features = num_col_list,
                  categorical_features = ['teamConf', 'opptConf', 'teamAbbr', 'opptAbbr', 'teamDiv', 'opptDiv'],
                  # ordinal_features = ordinal_dict,
                  data_split_stratify = True,
                  pca = True,
                  feature_selection = True,
                  fix_imbalance = False)
                  # group_features = group_list)
                  # log_experiment = True, experiment_name = 'test1')

,Description,Value
0,session_id,123
1,Target,teamRslt
2,Target Type,Binary
3,Label Encoded,"0: 0, 1: 1"
4,Original Data,"(7010, 208)"
5,Missing Values,False
6,Numeric Features,199
7,Categorical Features,7
8,Ordinal Features,False
9,High Cardinality Features,False
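
With `normalize_method = 'minmax'`, each numeric feature is rescaled to the range 0 to 1 instead of being standardized to zero mean and unit variance (the `'zscore'` default). The sketch below contrasts the two on a made-up feature. The values are hypothetical, and the functions are simplified stand-ins rather than PyCaret's implementation.

```python
from statistics import mean, pstdev

def minmax_scale(values):
    """Rescale values linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def zscore_scale(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values, not taken from the dataset.
points = [90, 100, 110, 130]
print(minmax_scale(points))
print(zscore_scale(points))
```

Min-max scaling keeps the relative spacing of the values but is sensitive to outliers, because a single extreme value sets the range. Both helpers assume the column is not constant; a constant column would cause a division by zero.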


In [11]:
best_model = compare_models()

ID,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lda,Linear Discriminant Analysis,0.6572,0.693,0.7888,0.6797,0.7299,0.2674,0.2735,0.033
lr,Logistic Regression,0.6565,0.6933,0.7902,0.6786,0.7299,0.2655,0.2717,0.021
ridge,Ridge Classifier,0.6565,0.0,0.7919,0.6781,0.7303,0.2649,0.2715,0.016
catboost,CatBoost Classifier,0.647,0.6779,0.7905,0.669,0.7245,0.2423,0.2494,5.283
svm,SVM - Linear Kernel,0.6466,0.0,0.7631,0.678,0.7162,0.2502,0.2565,0.022
rf,Random Forest Classifier,0.6411,0.6645,0.8092,0.6586,0.7259,0.2212,0.2317,0.301
qda,Quadratic Discriminant Analysis,0.6398,0.6761,0.6646,0.706,0.6839,0.2659,0.2673,0.02
gbc,Gradient Boosting Classifier,0.638,0.6663,0.7863,0.6619,0.7185,0.2216,0.2283,1.176
lightgbm,Light Gradient Boosting Machine,0.638,0.665,0.7627,0.669,0.7124,0.2294,0.2333,0.226
xgboost,Extreme Gradient Boosting,0.6245,0.645,0.7471,0.6598,0.7003,0.2024,0.2058,1.034


In [90]:
lr = create_model('lr')
tuned_lr = tune_model(lr)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6965,0.7329,0.8125,0.7112,0.7585,0.3551,0.3608
1,0.6477,0.6856,0.7569,0.6791,0.7159,0.2557,0.2583
2,0.6477,0.67,0.7708,0.6748,0.7196,0.2512,0.2553
3,0.6558,0.6851,0.827,0.6676,0.7388,0.2508,0.2634
4,0.6436,0.6534,0.7405,0.6815,0.7098,0.2501,0.2515
5,0.6456,0.6804,0.7924,0.6676,0.7247,0.2376,0.2445
6,0.6592,0.702,0.8125,0.6744,0.737,0.2646,0.274
7,0.6714,0.6966,0.8021,0.6896,0.7416,0.2976,0.304
8,0.6796,0.7203,0.7882,0.7028,0.743,0.3213,0.325
9,0.6531,0.7039,0.7986,0.6725,0.7302,0.2543,0.2618


In [91]:
gbc = create_model('gbc')
tuned_gbc = tune_model(gbc)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.668,0.7009,0.9236,0.6536,0.7655,0.2507,0.2995
1,0.6395,0.6713,0.875,0.6412,0.7401,0.1953,0.2223
2,0.6151,0.662,0.8715,0.6228,0.7265,0.134,0.1576
3,0.6253,0.6561,0.917,0.6235,0.7423,0.139,0.1802
4,0.6293,0.631,0.8581,0.6375,0.7316,0.173,0.1942
5,0.6517,0.6846,0.9204,0.6425,0.7568,0.207,0.2541
6,0.6224,0.6907,0.8958,0.6247,0.7361,0.1416,0.1738
7,0.6327,0.6784,0.9062,0.6304,0.7436,0.1641,0.2024
8,0.651,0.714,0.8854,0.6489,0.7489,0.2193,0.2498
9,0.6571,0.6955,0.9201,0.6463,0.7593,0.2224,0.2694


In [92]:
et = create_model('et')
tuned_et = tune_model(et)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6904,0.7073,0.9583,0.6635,0.7841,0.2962,0.3678
1,0.6293,0.6678,0.8958,0.6293,0.7393,0.1614,0.1951
2,0.6293,0.6612,0.8854,0.6312,0.737,0.1654,0.1953
3,0.6212,0.6715,0.9239,0.6195,0.7417,0.1252,0.1683
4,0.613,0.643,0.872,0.6222,0.7262,0.1254,0.1483
5,0.6415,0.6912,0.91,0.6368,0.7493,0.1845,0.2254
6,0.6367,0.6885,0.8993,0.6348,0.7443,0.1774,0.2132
7,0.6347,0.6819,0.9306,0.6276,0.7497,0.1598,0.2109
8,0.6224,0.7115,0.8924,0.6253,0.7353,0.143,0.174
9,0.6408,0.7082,0.9375,0.6308,0.7542,0.1732,0.23


In [93]:
lightgbm = create_model('lightgbm')
tuned_lightgbm = tune_model(lightgbm)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6802,0.7147,0.7674,0.7106,0.7379,0.3295,0.3311
1,0.6456,0.6627,0.7604,0.6759,0.7157,0.2497,0.2528
2,0.6273,0.6495,0.75,0.6606,0.7024,0.2091,0.2122
3,0.6334,0.6539,0.8028,0.6535,0.7205,0.2039,0.2132
4,0.6069,0.6191,0.737,0.6455,0.6882,0.1628,0.1654
5,0.6456,0.66,0.7855,0.6696,0.7229,0.24,0.2459
6,0.6449,0.6904,0.7882,0.6676,0.7229,0.2379,0.2444
7,0.6388,0.6596,0.7743,0.6657,0.7159,0.2278,0.2327
8,0.6531,0.7019,0.7639,0.6832,0.7213,0.2656,0.2685
9,0.6612,0.6998,0.8056,0.6784,0.7365,0.2719,0.2798


In [94]:
ridge = create_model('ridge')
tuned_ridge = tune_model(ridge)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.7026,0.0,0.8264,0.7126,0.7653,0.3657,0.3732
1,0.6395,0.0,0.7569,0.6708,0.7113,0.2362,0.2393
2,0.6456,0.0,0.7812,0.6696,0.7212,0.2429,0.2484
3,0.6477,0.0,0.827,0.6602,0.7343,0.2306,0.2438
4,0.6314,0.0,0.7405,0.6688,0.7028,0.2208,0.2228
5,0.6578,0.0,0.8131,0.6734,0.7367,0.2604,0.27
6,0.6571,0.0,0.8125,0.6724,0.7358,0.2597,0.2692
7,0.6673,0.0,0.8125,0.6822,0.7417,0.2845,0.2931
8,0.6735,0.0,0.7917,0.6951,0.7403,0.3057,0.3104
9,0.6551,0.0,0.8125,0.6705,0.7347,0.2547,0.2644


In [95]:
lda = create_model('lda')
tuned_lda = tune_model(lda)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6945,0.736,0.809,0.7104,0.7565,0.3513,0.3566
1,0.6436,0.6835,0.7604,0.6738,0.7145,0.2448,0.248
2,0.6415,0.6676,0.7743,0.6677,0.717,0.2354,0.2402
3,0.6517,0.6879,0.8339,0.6621,0.7381,0.2383,0.2528
4,0.6354,0.6514,0.7509,0.6698,0.708,0.2271,0.2297
5,0.6538,0.6805,0.8166,0.6686,0.7352,0.2493,0.2599
6,0.6633,0.6994,0.8194,0.6762,0.741,0.2723,0.2827
7,0.6694,0.6944,0.809,0.6853,0.742,0.2905,0.2983
8,0.6673,0.7134,0.7847,0.6911,0.735,0.2932,0.2975
9,0.6633,0.711,0.8264,0.6742,0.7426,0.2701,0.282


In [96]:
rf = create_model('rf')
tuned_rf = tune_model(rf)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6151,0.6872,0.566,0.7181,0.633,0.2401,0.2476
1,0.6029,0.6748,0.5799,0.6929,0.6314,0.2082,0.2121
2,0.6008,0.6521,0.6007,0.6811,0.6384,0.1968,0.1988
3,0.6191,0.6639,0.6609,0.6821,0.6714,0.2188,0.219
4,0.611,0.6288,0.6055,0.6944,0.647,0.2183,0.2209
5,0.6334,0.6799,0.6609,0.6996,0.6797,0.252,0.2525
6,0.6531,0.6905,0.6493,0.7305,0.6875,0.3006,0.3033
7,0.6245,0.6712,0.6389,0.697,0.6667,0.2386,0.2398
8,0.6612,0.6998,0.6389,0.748,0.6891,0.322,0.3268
9,0.6388,0.702,0.6875,0.6947,0.6911,0.2562,0.2563


In [97]:
ada = create_model('ada')
tuned_ada = tune_model(ada)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6904,0.6919,0.8264,0.7,0.758,0.3367,0.3457
1,0.6456,0.6683,0.8333,0.6557,0.7339,0.2254,0.2404
2,0.6354,0.6437,0.7882,0.658,0.7172,0.2159,0.2229
3,0.6436,0.6697,0.8408,0.6532,0.7352,0.2155,0.2322
4,0.6354,0.6286,0.7958,0.6571,0.7199,0.2114,0.2195
5,0.6517,0.6897,0.8374,0.6612,0.7389,0.2371,0.2525
6,0.6551,0.6823,0.8264,0.6667,0.738,0.25,0.2626
7,0.6224,0.6628,0.7847,0.6476,0.7096,0.1841,0.1912
8,0.651,0.6947,0.8021,0.6696,0.7299,0.2482,0.2564
9,0.651,0.6841,0.8299,0.662,0.7365,0.2388,0.2525


In [98]:
nb = create_model('nb')
tuned_nb = tune_model(nb)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6802,0.7104,0.7812,0.7053,0.7414,0.3255,0.3285
1,0.6314,0.6575,0.7535,0.6636,0.7057,0.2178,0.2209
2,0.6069,0.6332,0.684,0.6589,0.6712,0.183,0.1832
3,0.6191,0.6494,0.8131,0.6386,0.7154,0.1644,0.1757
4,0.5886,0.6027,0.6678,0.6455,0.6565,0.1441,0.1443
5,0.6314,0.654,0.8062,0.6508,0.7202,0.1976,0.2075
6,0.6061,0.649,0.7049,0.6527,0.6778,0.1732,0.174
7,0.6245,0.6282,0.7326,0.6635,0.6964,0.2075,0.2093
8,0.6714,0.7047,0.7917,0.693,0.7391,0.3008,0.3057
9,0.6694,0.6981,0.8264,0.68,0.7461,0.285,0.2963


In [99]:
svm = create_model('svm')
tuned_svm = tune_model(svm)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6823,0.0,0.7847,0.7062,0.7434,0.3293,0.3325
1,0.6477,0.0,0.7604,0.678,0.7169,0.2546,0.2575
2,0.6436,0.0,0.7708,0.6707,0.7173,0.2414,0.2457
3,0.6456,0.0,0.8201,0.6602,0.7315,0.228,0.2399
4,0.6273,0.0,0.737,0.6656,0.6995,0.2122,0.2141
5,0.6456,0.0,0.7993,0.6657,0.7264,0.2352,0.2433
6,0.651,0.0,0.7917,0.6726,0.7273,0.2516,0.2582
7,0.6653,0.0,0.8125,0.6802,0.7405,0.2795,0.2884
8,0.6776,0.0,0.8056,0.6946,0.746,0.3112,0.3176
9,0.6531,0.0,0.8056,0.6705,0.7319,0.252,0.2606


In [100]:
knn = create_model('knn')
tuned_knn = tune_model(knn)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6741,0.71,0.8715,0.6711,0.7583,0.2833,0.307
1,0.6232,0.627,0.8576,0.6317,0.7275,0.1603,0.1813
2,0.6415,0.6407,0.8785,0.6421,0.7419,0.1992,0.2274
3,0.6395,0.6502,0.917,0.634,0.7496,0.1765,0.2207
4,0.6253,0.632,0.8547,0.635,0.7286,0.1639,0.184
5,0.6151,0.6383,0.8824,0.622,0.7296,0.1264,0.1525
6,0.6306,0.6553,0.875,0.6348,0.7358,0.171,0.1973
7,0.6612,0.6464,0.9062,0.6525,0.7587,0.2378,0.2773
8,0.6367,0.6823,0.8472,0.6455,0.7327,0.197,0.2155
9,0.6347,0.6566,0.9028,0.6326,0.7439,0.1708,0.2078


In [101]:
dt = create_model('dt')
tuned_dt = tune_model(dt)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6314,0.628,0.8021,0.6507,0.7185,0.201,0.2105
1,0.6049,0.6025,0.8368,0.6211,0.713,0.1215,0.1363
2,0.5866,0.5749,0.7083,0.6316,0.6678,0.1253,0.1268
3,0.6293,0.6468,0.7855,0.6542,0.7138,0.2001,0.2069
4,0.5642,0.5785,0.6747,0.619,0.6457,0.0823,0.0828
5,0.6008,0.6168,0.7163,0.6449,0.6787,0.1556,0.1571
6,0.5857,0.5773,0.7812,0.6164,0.6891,0.0935,0.0996
7,0.5939,0.6035,0.7569,0.6282,0.6866,0.1237,0.1281
8,0.6122,0.6417,0.691,0.6633,0.6769,0.1927,0.1929
9,0.6204,0.6373,0.7535,0.6536,0.7,0.1904,0.1939


In [102]:
qda = create_model('qda')
tuned_qda = tune_model(qda)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.6741,0.7161,0.7535,0.7092,0.7306,0.3192,0.3202
1,0.6415,0.6683,0.7569,0.6728,0.7124,0.2411,0.244
2,0.6232,0.6516,0.7188,0.6656,0.6912,0.2099,0.211
3,0.6497,0.6743,0.827,0.662,0.7354,0.2357,0.2487
4,0.6151,0.6348,0.6886,0.6678,0.678,0.1998,0.2
5,0.6456,0.6621,0.8028,0.6648,0.7273,0.234,0.2426
6,0.6469,0.6704,0.7674,0.6758,0.7187,0.2498,0.2535
7,0.6449,0.6574,0.7639,0.6748,0.7166,0.246,0.2495
8,0.6612,0.6949,0.7535,0.6955,0.7233,0.2883,0.2898
9,0.6449,0.6931,0.8056,0.6629,0.7273,0.232,0.2412
