
EzStacking: from data to Kubernetes via Scikit-Learn, Keras, FastAPI and Docker

Theoretical preamble

Learning algorithm

Tom Mitchell provides the following definition of a learning program: “A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.”

The algorithm associated with this learning computer program is called a learning algorithm.

Supervised learning problems

Let's assume that there is a function $f:U\subset \mathbb{R}^{n}\rightarrow V\subset \mathbb{R}^{p}$, $f$ is unknown and is called the parent function or the objective function, $U$ is the input space, $V$ is the output space. For the sake of simplicity, let's assume that $p=1$.

The only thing we know about $f$ is a set of samples $L=\lbrace\left(x_i, y_i\left(=f\left(x_i\right)\right)\right)\in U\times V\rbrace_{i\in \lbrace 1,..,I \rbrace}$; the $x_i$ are called the features, the $y_i$ the target, and $L$ the learning set.

We would like to find an approximation $\tilde{f}$ of $f$ built from the learning set and a learning algorithm, this is a supervised learning problem.

Here, the experience $E$ is the learning set, the task $T$ is to find the function $\tilde{f}$ and the performance measure $P$ is the gap between the prediction and the ground truth (i.e. the target $y_{i}$ in the learning set).

Regression / Classification

If $V$ is continuous (resp. discrete) in the preceding definition, then it is a regression (resp. classification) problem.

Time series forecasting

Let's imagine an experiment during which a result $X_{t}$ is measured over time, $\lbrace X_{t}\rbrace_{t}$ is a time series. How can we predict $X_{t+\tau}$ (where $\tau$ is the time step)?

Let's assume that $X_{t+\tau}$ depends on the $k$ preceding measures ($k$ is the window size or lag number), we can suppose that there is a function $f$ such that: $X_{t+\tau}=f \left(X_{t}, X_{t-\tau},.., X_{t-k\tau} \right)$.

Let's define $L=\lbrace\left(X_{t} ,X_{t-\tau},.., X_{t-k\tau},X_{t+\tau} \right)\rbrace_{t}$, that is typically a regression learning set.
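For illustration, here is a minimal pandas sketch of this transformation (the helper `make_supervised` and the column names are hypothetical, not EzStacking's actual code):

```python
import numpy as np
import pandas as pd

def make_supervised(series: pd.Series, lags: int) -> pd.DataFrame:
    """Build the regression learning set L from a univariate time series."""
    df = pd.DataFrame({"target": series})      # X_{t+tau}
    for k in range(1, lags + 1):
        df[f"lag_{k}"] = series.shift(k)       # the k-th preceding measure
    return df.dropna()                         # the first `lags` rows have no full history

# example: a sine wave with a window size of 3
ts = pd.Series(np.sin(np.linspace(0, 20, 200)))
learning_set = make_supervised(ts, lags=3)
```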

Useful operators

Feature and target extractors

Let's say we have a learning set $L=\lbrace \left( x, y \right) \in U \times V \rbrace$, the feature extractor $\pi_{f}$ is defined by $\pi_{f}\left(L\right)=\lbrace x, \left( x, y \right) \in L \rbrace$, the target extractor $\pi_{t}$ is defined by $\pi_{t}\left(L\right)=\lbrace y, \left( x, y \right) \in L \rbrace$.

Training operator

The training operator $T$ is the operator that transforms a learning algorithm into a model (or estimator) fitted on a learning set. Let $A$ be a learning algorithm and $L$ a learning set; $M^{A}_{L}=T\left(A,L \right)$ is the model obtained after training $A$ on $L$.

During the training process, the learning set $L$ is divided into a training set $L_{train}$ and a test set $L_{test}$, and the algorithm $A$ is trained on the training set (i.e. optimised so that the gap between the prediction and the ground truth is minimal). A function $S$, called the score, measures the mean gap.

$S_{train}=S\left(M^{A}_{L_{train}}\left(\pi_{f}\left(L_{train}\right) \right),\pi_{t}\left(L_{train}\right) \right)$ is the train error, $S_{test}=S\left(M^{A}_{L_{train}}\left(\pi_{f}\left(L_{test}\right) \right),\pi_{t}\left(L_{test}\right) \right)$ is the test error.
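As an illustration, a minimal scikit-learn sketch of such a split (not the code EzStacking generates; note that scikit-learn's `score` is $R^2$, where higher is better, whereas $S$ above measures a gap, where lower is better):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# split L into L_train and L_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)  # M = T(A, L_train)
print("S_train:", model.score(X_train, y_train))  # train error (here R^2)
print("S_test: ", model.score(X_test, y_test))    # test error
```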

Folding and cross-validation

Let's say we have a learning set $L=\lbrace\left(x_{i}, y_{i}\right)\in U\times V\rbrace_{i\in \lbrace 1,..,I\rbrace}$ (it is assumed that $I$ is not prime), and let $J$ be a divisor of $I$. It is possible to split $L$ (randomly for regression and classification problems; for time series forecasting, the temporal order must be respected) into $J$ equal (and usually disjoint) parts $\lbrace L_{j}\rbrace_{j\in \lbrace 1,..,J\rbrace}$.

$L_{j}$ is the test set and $L^{\hat{\jmath}}=L-L_{j}$ is the train set for the $j$th fold of the $J$ fold cross-validation.

Some properties:

  • If $J=I$ (leave-one-out):

    • $L_{i}\neq L_{j}$, if $i\neq j$
    • $\bigcup_{i} L_{i}=L$
    • as there is only one element in each test set, this technique is suitable for small amounts of data.
  • If $J|I$:

    • $\sharp L_{j}=\frac{I}{J}$
    • $L_{i} \cap L_{j}=\emptyset$, if $i\neq j$
    • $\bigcup_{j} L_{j}=L$
    • this technique is suitable for large amounts of data.

$T$ being the training operator, invoking the learning algorithm $A$ on the training set $L^{\hat{\jmath}}$ induces a model $M^{\hat{\jmath}}_{A}=T\left(A,L^{\hat{\jmath}}\right)$.

The cross-validation error (or cross-validation score) is given by: $$S_{CV}\left(M_{A},L\right)=\frac{1}{J}\sum_{j=1}^{J}\delta\left(M^{\hat{\jmath}}_{A}\left(\pi_{f}\left(L_{j}\right)\right)-\pi_{t}\left(L_{j}\right) \right) $$ where $\delta$ measures the gap between the predictions of $M^{\hat{\jmath}}_{A}$ and the ground truth.
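A minimal scikit-learn sketch of this estimate (shuffling is only valid for non-temporal data; a `TimeSeriesSplit` would be needed for forecasting problems):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
# J = 5 folds; shuffle only for regression/classification problems
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(Ridge(), X, y, cv=cv)  # one score per fold j
print("S_CV:", fold_scores.mean())                   # average over the J folds
```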

Ideally, $S_{train}$ should be close to $S_{CV}$. If $S_{train}$ is close to $S_{CV}$ and $S_{train}$ is large, the model suffers from underfitting (the model has large bias); if $S_{train}$ is small and $S_{CV}$ is much larger than $S_{train}$, the model suffers from overfitting (the model has large variance).

The technique of minimal cross-validation error says that given a set of candidate learning algorithms $\lbrace A_{k}\rbrace_{k \in \lbrace 1,..,K\rbrace}$ and a learning set $L$, one should generalize from $L$ with a model $M_{l}=T\left(A_{l},L\right) \in \lbrace T\left(A_{k},L\right)\rbrace_{k \in \lbrace 1,..,K\rbrace}$ such that $S_{CV}\left(M_{l},L\right) \lt S_{CV}\left(M_{j},L\right)$, $\forall j \ne l$.
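A sketch of this selection rule with a few arbitrary candidate algorithms (scikit-learn scores are $R^2$, so the best model maximises the score rather than minimising a gap):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
candidates = {"ridge": Ridge(),
              "forest": RandomForestRegressor(random_state=0),
              "knn": KNeighborsRegressor()}
cv_scores = {name: cross_val_score(est, X, y, cv=5).mean()
             for name, est in candidates.items()}
best = max(cv_scores, key=cv_scores.get)  # generalize from L with this model
print(best, cv_scores)
```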

Stacked generalization

Let $\lbrace A_{k}\rbrace_{k \in \lbrace 1,..,K\rbrace}$ be a finite set of learning algorithms. We can define a finite set of models $\lbrace M_{k}^{\hat{\jmath}}\rbrace_{k \in \lbrace 1,..,K\rbrace}$, where $M^{\hat{\jmath}}_{k}=T\left(A_{k},L^{\hat{\jmath}}\right)$; they are called the level 0 models.

Let's define $z_{nk}=M^{\hat{\jmath}}_{k}\left( x_n \right)$ for all $x_n \in \pi_{f}\left(L_{j}\right)$.

At the end of the entire cross-validation process, the dataset assembled from the outputs of the $K$ models is: $L_{CV}=\lbrace\left(z_{1i},..,z_{Ki}, y_i\right)\rbrace_{i\in \lbrace 1,..,I\rbrace}$.

Let $\bar A$ be another learning algorithm, $\bar M=T\left(\bar A,L_{CV}\right)$, $\bar M$ is called the level 1 model.

The level 0 models are retrained on the whole learning set $\lbrace M_{k}\rbrace_{k \in \lbrace 1,..,K\rbrace}=\lbrace T\left(A_{k},L\right)\rbrace_{k \in \lbrace 1,..,K\rbrace}$, and finally $\tilde{f}=\bar M\left( M_{1},..,M_{K} \right)$, which is the stacked model.
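A minimal sketch using scikit-learn's StackingRegressor, which follows this procedure (the level 0 models chosen here are arbitrary):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
level0 = [("forest", RandomForestRegressor(random_state=0)),
          ("knn", KNeighborsRegressor())]
# cv=5: the level 1 model is fitted on the out-of-fold predictions L_CV,
# then the level 0 models are refitted on the whole learning set L
stack = StackingRegressor(estimators=level0, final_estimator=Ridge(), cv=5)
stack.fit(X, y)
```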

What conditions ensure that $S_{CV}\left(\tilde{f},L\right) \leq S_{CV}\left(M_{j},L\right)$, $\forall j$?

Questioning

Wolpert's black art

In his paper on classification and stacked generalization, D. Wolpert underlines three points:

  • there are no rules saying which level 0 learning algorithms one should use
  • there are no rules saying which level 1 learning algorithm one should use
  • there are no rules saying how many level 0 learning algorithms one should use.

Model importance

Regression

In his paper on regression and stacked generalization, L. Breiman indicates that if a linear regression is used as level 1 learning algorithm, a non-negativity constraint on the coefficients must be added.

Classification

In their paper, K. M. Ting and I. H. Witten explain that, for classification, the level 1 learning algorithm should be a multi-response least-squares regression based on class probabilities (confidence measure), but no non-negativity constraint is needed.

However, the non-negativity constraint improves interpretability, and these coefficients can be used to define the model importance of the level 0 models.
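A sketch of this idea in scikit-learn: LinearRegression(positive=True) as level 1 learning algorithm enforces the non-negativity constraint, and its coefficients can be read as model importances (the level 0 models here are arbitrary):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
level0 = [("forest", RandomForestRegressor(random_state=0)),
          ("knn", KNeighborsRegressor())]
# positive=True enforces the non-negativity constraint on the coefficients
stack = StackingRegressor(estimators=level0,
                          final_estimator=LinearRegression(positive=True))
stack.fit(X, y)
# each coefficient measures the importance of the corresponding level 0 model
for (name, _), coef in zip(level0, stack.final_estimator_.coef_):
    print(name, coef)
```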

Model selection

So, if the initial set of level 0 learning algorithms is large enough, it can be shrunk according to the model importances (the train and test scores must also be considered, to avoid underfitting and overfitting); this is model selection.

About the Data

Until now, nothing has been said about the data, yet the proper training of a learning algorithm requires data analysis; this is the exploratory data analysis (or EDA) phase.

Data quality and data protection

Trivially, in some cases the data is not usable:

  • the rows for which the target is not specified must be deleted
  • if a given feature has a unique value, the corresponding column must be dropped
  • features for which the percentage of unspecified values is too high should be deleted (if this percentage is quite low, missing values can be filled using imputation).

On the other hand, some data must not be used due to protection laws (e.g. personal information, health data...); the corresponding features must be dropped (or anonymized).

Another important point is outliers (observations that differ significantly from the others); they can be detected using the $Z$-score. This must be used with care, as important data can be lost.

This is the data cleaning (or cleansing) phase.
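A minimal sketch of $Z$-score based outlier removal (the helper `drop_outliers` is hypothetical; it assumes constant columns have already been dropped):

```python
import numpy as np
import pandas as pd

def drop_outliers(df: pd.DataFrame, threshold_Z: float = 3.0) -> pd.DataFrame:
    """Drop rows whose Z-score exceeds threshold_Z on any numeric column."""
    num = df.select_dtypes(include=np.number)
    z = (num - num.mean()) / num.std()
    return df[(z.abs() < threshold_Z).all(axis=1)]
```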

Data description

Depending on the learning algorithm, some data types are not supported, which is why a preprocessing phase is necessary. For each data item, the data description indicates its type (numeric or categorical) and its value range (an interval for a numeric feature, a list of values for a categorical one), and is stored in a schema. The data description drives the preprocessing. The value range can be used to evaluate input data used to make predictions.
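A minimal sketch of such a schema builder (the helper `build_schema` and the schema layout are hypothetical, not EzStacking's actual format):

```python
import numpy as np
import pandas as pd

def build_schema(df: pd.DataFrame, threshold_cat: int = 10) -> dict:
    """Describe each column: its type plus its value range (interval or list)."""
    schema = {}
    for col in df.columns:
        if df[col].dtype == object or df[col].nunique() <= threshold_cat:
            schema[col] = {"type": "categorical",
                           "values": df[col].dropna().unique().tolist()}
        else:
            schema[col] = {"type": "numeric",
                           "min": float(df[col].min()),
                           "max": float(df[col].max())}
    return schema
```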

Data correlation and feature importance

Following Occam's razor, simpler models are preferable. A model with fewer features is simpler and often better.

Some reasons to remove correlated features:

  • speed: fewer features typically improve the speed of the learning algorithm due to the curse of dimensionality
  • harmful bias: removing correlated features can reduce harmful bias
  • interpretability: to make a model more interpretable, reducing the number of features is beneficial. Simplified models are easier to understand.

However, if correlated features are informative and correlated with the target, they should be retained.

Feature importance refers to techniques that assign a score to input features based on their usefulness in predicting a target variable. Due to their heterogeneity, stacked models cannot directly provide feature importances; permutation feature importance is then a good alternative. However, this method is also sensitive to feature correlation.
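A minimal sketch of permutation feature importance with scikit-learn (the model and dataset are arbitrary):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
# shuffle each feature on held-out data and measure the resulting score drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```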

This process of eliminating unnecessary features is called feature selection; given the previous remarks, it must be carried out diligently.

Requirements

Let's imagine that we want to create notebooks like those that can be found on Kaggle.com, but with the following required steps:

  • taking into account problems of regression, classification and time series forecasting
  • EDA with data cleaning, feature selection (based on correlation), schema generation
  • modeling using stacking including: preprocessing (based on schema), training, model evaluation and model inspection
  • model selection based on model importance
  • feature selection based on permutation feature importance
  • basic testing: performance measurement using test generator and FastAPI
  • deployment in Docker.

Of course, a graphical interface allows you to select the dataset and the models, set the parameters, generate the notebook, the test set, the deployment files...

EzStacking will try to cover these different requirements.

Introduction

EzStacking is a development tool designed to address supervised learning problems. It can be viewed as an extension of the idea proposed in the article No Free Lunch: Free at Last!.

EzStacking handles classification, regression and time series forecasting problems for structured data (cf. Notes hereafter) using stacked generalization, an ensemble learning approach that trains a level 1 model to combine the predictions of multiple level 0 models, with the aim of producing more robust and accurate predictions than any of the individual base models.

Stacked models can sometimes be heavy, slow or inaccurate; they can contain bad level 0 models, or some features can have a bad influence on the score. EzStacking therefore allows the final model to be optimised along three axes:

  • the number of level 0 models (reduced using model importance)
  • the number of features (reduced using feature importance)
  • the complexity (depth) of the level 0 models (depending on the user's choices).

Notes:

  • the time series forecasting problem is based on the transformation of the time series into a supervised learning problem
  • EzStacking must be used with *.csv datasets using the separator ','
  • the column names must not contain spaces (otherwise errors will occur during server generation)
  • for time series forecasting, one of the columns in the dataset must be a temporal value.

The development process produces the artifacts described in the sections below.

Some results

I spent some time on Kaggle; some results about the optimisation process described here are given on this site, and an analysis of the results can be found here. I became a Notebooks Master with this project.

Time has passed and EzStacking continues to evolve; some full projects (in zip format) can be found in the examples folder on GitHub.

EzStacking - How to install it

First you have to:

  • install Anaconda
  • create the virtual environment EzStacking using the following command: conda env create -f ezstacking.yaml
  • activate the virtual environment using the following command: conda activate ezstacking
  • install kernel in ipython using the following command: ipython kernel install --user --name=ezstacking
  • launch the Jupyter server using the following command: jupyter-lab --no-browser

Notes:

  • jupyter-lab is a comfortable development tool, more flexible than jupyter notebook
  • sometimes jupyter-lab uses a lot of resources (the kernel keeps running even if the work is finished); this is probably due to varInspector, which you can disable using the command jupyter nbextension disable varInspector/main.

How to uninstall it

You simply have to:

  • deactivate the virtual environment using the following command: conda deactivate
  • remove the virtual environment using the following command: conda remove --name ezstacking --all
  • remove the kernel using the following command: jupyter kernelspec uninstall ezstacking

EzStacking - How to use it

Input file and problem characteristics

In Jupyter, first open the notebook named ezstacking.ipynb:

First launch

Then click on Run All Cells:

ezstacking GUI

First select your file, then select the target name (i.e. the variable on which we want to make predictions), the problem type (classification if the target is discrete, regression if it is continuous; if the problem is time-dependent, the time indexing column and the lag number must be filled in) and the data size:

ezstacking GUI

EZStacking GUI_ts

Notes:

  • the data size is small if the number of rows is smaller than 3000
  • the lag number is the number of past observations used to train the model
  • the random seed is used for replicability.

Development

Now, let's choose the options:

ezstacking GUI

EDA

Visualization options

| Option | Notes |
|---|---|
| Yellow bricks | The graphics will be constructed with Matplotlib and Yellow bricks |
| Seaborn | The graphics will be constructed with Matplotlib and Seaborn |
| fastEDA | The graphics will be constructed with Matplotlib and fastEDA |
| ydata-profiling | The graphics will be constructed with Matplotlib and ydata-profiling |

Notes:

  • time-dependent problems benefit from specific interactive visualization tools based on statsmodels:

    • autocorrelation and partial autocorrelation

    EZStacking_acf_pacf

    • seasonal decomposition with one period

    EZStacking_seasonal_decomp_1

    • seasonal decomposition with two periods

    EZStacking_seasonal_decomp_2

    • unobserved components decomposition

    EZStacking_UCD_1

    EZStacking_UCD_2

  • the visualisation option Seaborn can produce time-consuming graphics.

Thresholds in EDA

ezstacking Thresholds EDA

Notes:

  • threshold_cat: if the number of distinct values in a column is less than this number, the column will be considered categorical
  • threshold_NaN: if the proportion of NaN is greater than this number, the column will be dropped
  • threshold_Z: if the Z-score (indicating outliers) is greater than this number, the rows will be dropped.

Splitting

ezstacking Splitting

Notes:

  • test size: proportion of the dataset to include in the test split
  • threshold_E: if the target entropy is greater than this number, RepeatedStratifiedKFold will be used
  • if the option Undersampling is checked, an undersampler must be chosen with care.

Modeling

ezstacking Modeling

| Model | Data size |
|---|---|
| Gradient Boosting | both |
| SGD | both |
| Support vector | small |
| Logistic Regression | both |
| Keras | both |
| Linear Regression | both |
| Gaussian Process | small |
| ElasticNet | both |
| Decision Tree | small |
| Multilayer Perceptron | small |
| Random Forest | both |
| KNeighbors | small |
| AdaBoost | both |
| Gaussian Naive Bayes | small |
| Histogram-based Gradient Boosting | both |

Notes:

  • if the option "No correlation" is checked, the model will not include a decorrelation step
  • if the option "No model optimization" is checked, the numbers of models and features will not be reduced automatically
  • if no estimator is selected, regressions (resp. classifications) will use linear regressions (resp. logistic regressions)
  • depending on the data size, EzStacking uses the estimators given in the preceding table for the level 0 models
  • estimators based on Keras or on Histogram-based Gradient Boosting benefit from early stopping; those based on Gaussian processes do not
  • the Gaussian methods option is only available for small datasets.

Known bugs using Keras:

  • for classification problems: the generated API doesn't work with Keras
  • the ReduceLROnPlateau callback produces an error when saving the model.

Level 1 model options

| Level 1 model type | Notes |
|---|---|
| Regression (linear or logistic) | the option Non-negativity should be checked |
| Decision tree | alternative approach to the importance |

Thresholds in modeling

Notes:

  • threshold_corr: if the correlation is greater than this number, the column will be dropped
  • threshold_score: keep models having a test score greater than this number
  • threshold_model: keep this number of best models (in the sense of model importance)
  • threshold_feature: keep this number of most important features.

Build

Simply enter a file name:

ezstacking output

Just click on the generate button; you should find your notebook in the current folder.

Then open this notebook, and click on the button Run All Cells.

Test

ezstacking Output

You just have to fill in the numbers of passing and non-passing tests, then click on the generate tests button; it will generate the file test.sh.

Now, at the bottom of the generated notebook, click on the server link.

It opens the server notebook; execute the line run server.py (and check carefully that the server has started correctly). If you have chosen the link http://127.0.0.1:8000/docs, it opens the classical test GUI of FastAPI.

If you have clicked on the client link, it opens the client notebook; you just have to execute the first command, and the result should look like the following: ezstacking Tests_exec

Docker

Docker Desktop is the tool used for this part.

The last step of the main notebook is the generation of all the useful files used to build the Docker container associated with the model.

These files are stored in a folder having the name of the project.

Open a terminal in this folder and launch the following command to build the container:

  • docker build -t <project_name> .

The container can be run directly in Docker Desktop; you can also use the following command line:

  • docker run --rm -p 80:80 <project_name>

Note:

  • Models using Keras will not work due to a technical problem with SciKeras.

Kubernetes

The program also generates a file for the API deployment in Kubernetes:

Deployment in Kubernetes:

  • kubectl apply -f <project_name>_deployment.yaml

Control the deployment of the service:

  • kubectl get svc

Delete the service:

  • kubectl delete -f <project_name>_deployment.yaml

Note:

  • The generated test script can be used with both Docker and Kubernetes.
  • If the container is running in Docker, it must be stopped before testing in Kubernetes.

Zip & Clean

ezstacking output

If you click on the Zip button, EzStacking generates a zip archive file containing:

  • the initial dataset
  • the development notebook
  • the model
  • the data schema
  • the client and the server
  • a generic notebook to test the FastAPI endpoint.

Furthermore, it removes from the folder the elements added to the archive; the files generated for the Docker container are simply deleted (it is assumed that the container has already been built in Docker).

Note: it is advisable to properly close the generated notebooks (Ctrl + Shift + Q).

EzStacking - As a development tool

Development process

Once the first notebook has been generated, the development process can be launched.

You simply have to follow this workflow:

Fortunately, if the option "No model optimization" is not checked in the modeling step, the entire process is automated; you just have to fix the thresholds and choose your models.

Data quality & EDA

EDA can be seen as a toolbox to evaluate data quality, providing:

  • dataframe statistics
  • cleaning, i.e. NaN and outlier dropping
  • ranking / correlation

Note: the EDA step does not modify the data, it just indicates which actions should be taken.

This process returns:

  • a data schema, i.e. a description of the input data with data type and associated domain:
    • minimum and maximum for continuous features,
    • a list of values for categorical features
  • a list of columns dropped_cols that should be suppressed (simply add this list to the variable user_drop_cols at the beginning of the EDA; then it is necessary to re-launch from the EDA).

Notes:

  • Tip: starting from the end of the EDA is a good idea (Run All Above Selected Cell), so you do not execute unnecessary code (at the first step of development)
  • Yellow Brick offers different graphs associated with ranking and correlation, and much more information
  • The main steps of data pre-processing:
    1. not all estimators support NaN: they must be corrected using iterative imputation (resp. simple imputation) for numerical (resp. categorical) features
    2. data normalization and encoding are also key points for successful learning
    3. only the correlations with the target are interesting, the others must be removed (for linear algebra reasons)
  • Those steps are implemented in the first part of the modeling pipeline.

Modeling

Model construction

The first step of modeling is structured as follows:

During the splitting step:

  • if the dataset is large, the test set should be reduced to 10% of the whole dataset
  • imbalanced classes are measured using Shannon entropy; if the score is too low, the splitting is realized using RepeatedStratifiedKFold.

Note: for imbalanced class management, EZStacking also offers different subsampling methods as an option.
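A minimal sketch of the normalized Shannon entropy measure (the helper `shannon_entropy` is hypothetical; the exact formula EzStacking uses may differ):

```python
import numpy as np
import pandas as pd

def shannon_entropy(target: pd.Series) -> float:
    """Normalized Shannon entropy of the class frequencies.

    1.0 means perfectly balanced classes; values near 0 mean strong imbalance.
    """
    p = target.value_counts(normalize=True)
    if len(p) < 2:
        return 0.0
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

print(shannon_entropy(pd.Series([0] * 95 + [1] * 5)))  # ~0.29, strongly imbalanced
```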

This initial model may be too large; the modeling process reduces its size in terms of models and features as follows:

  1. the set of estimators is reduced according to the test scores and the importance of each level 0 model
     (screenshots: regression and classification train and test scores; regression and classification model importance)
  2. the stack based on the reduced estimator set is trained
  3. the feature importance graphic indicates which columns could also be dropped
     (screenshots: regression and classification feature importance)
  4. those columns are added to the variable dropped_cols depending on the value of threshold_feature
  5. dropped_cols can be added to user_drop_cols at the beginning of the EDA (then it is necessary to re-launch from the EDA).

Notes:

  • the calculation of the model importance is based on the coefficients of the level 1 estimator
  • the feature importance is computed using permutation importance
  • it is important not to be too stingy: removing too many estimators and features can lead to a decrease in performance.

Model evaluation

(Class) prediction error

(Screenshots: regression, classification, time series forecasting.)

Specific to classification (with Yellow Brick option)

ROC/AUC

Classification report

Confusion matrix

Specific to regression (with Yellow Brick option)

Residuals plot

Model inspection

Model importance

Feature permutation importance

Partial Dependence & Individual Conditional Expectation

(Screenshots: regression, classification.)

Serving the model

EzStacking also generates an API based on FastAPI.

The complete development process produces three objects:

  • a schema
  • a model
  • a server source.

They can be used as a basic prediction service returning:

  • a prediction
  • a list of columns in error (i.e. the value does not belong to the domain given in the schema)
  • the elapsed and CPU times.

Example (screenshots): regression with data drift; classification without data drift.
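A minimal sketch of such a server (the file names model.pkl and schema.json, the /predict route and the response layout are hypothetical; this is not the server EzStacking actually generates):

```python
# server.py - minimal sketch of a prediction service
import json
import pickle
import time

from fastapi import FastAPI

app = FastAPI()
model = pickle.load(open("model.pkl", "rb"))  # the trained stacked model
schema = json.load(open("schema.json"))       # the data schema built during the EDA

@app.post("/predict")
def predict(features: dict):
    t0, c0 = time.perf_counter(), time.process_time()
    # columns in error: values outside the domain given in the schema
    errors = [name for name, value in features.items()
              if schema[name]["type"] == "numeric"
              and not schema[name]["min"] <= value <= schema[name]["max"]]
    prediction = model.predict([list(features.values())]).tolist()
    return {"prediction": prediction,
            "errors": errors,
            "elapsed_time": time.perf_counter() - t0,
            "cpu_time": time.process_time() - c0}
```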

Testing the model

The schema is also used to build the file of passing and non-passing tests: a passing test (resp. a non-passing test) means that all features belong to the domains given in the schema (resp. at least one feature does not belong to its domain).
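A minimal sketch of how such test cases can be drawn from the schema (hypothetical helpers, assuming the schema layout sketched earlier):

```python
import json
import random

schema = json.load(open("schema.json"))

def passing_case():
    """Draw every feature inside its schema domain."""
    return {name: (random.uniform(spec["min"], spec["max"])
                   if spec["type"] == "numeric" else random.choice(spec["values"]))
            for name, spec in schema.items()}

def non_passing_case():
    """Push one numeric feature outside its domain."""
    case = passing_case()
    name = next(n for n, s in schema.items() if s["type"] == "numeric")
    case[name] = schema[name]["max"] + 1.0  # outside the schema interval
    return case
```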

As we have already seen, the server returns the consumption of each request; in the test phase we also have access to the global consumption linked to the execution of all requests.

A test file for Docker (and Kubernetes) is also created; it is located in the directory associated with the Docker container.

Resources used for this project