Pancake is a Python package which provides a simple API to stack scikit-learn models.
Author: Burak Himmetoglu (latest commit 4c9adda, Jan 5, 2019)

PanCake is a Python package that allows users to stack scikit-learn models over a number of folds and train stacker models using out-of-sample predictions of input models.

Stacks

The stacking tool builds a stacking module composed of an in-layer (the models being stacked) and an out-layer (the stacker models). Training the module produces a list or matrix of predictions, which can either be used as the final result or fed into another module.
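As an illustration of the general idea (not Pancake's own code), the sketch below builds an in-layer of two scikit-learn models, collects their out-of-fold predictions with `cross_val_predict`, and trains a linear out-layer model on them. The model choices and fold settings are assumptions for the example, and the data is synthetic from `make_regression` (the `load_boston` loader used by many older examples was removed in scikit-learn 1.2).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (stand-in for a real data set)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
splitter = KFold(n_splits=5, shuffle=True, random_state=0)

# In-layer: the models being stacked (illustrative choices)
in_layer = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]

# One column per in-layer model, holding its out-of-fold predictions:
# every sample is predicted by a model that did not see it during fitting.
meta_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=splitter) for model in in_layer]
)

# Out-layer: a stacker model trained on the out-of-fold predictions
stacker_model = LinearRegression().fit(meta_features, y)
stacked_train_preds = stacker_model.predict(meta_features)
```

Training the stacker on out-of-fold rather than in-sample predictions keeps it from simply rewarding whichever in-layer model overfits the training data most.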

Installation

After cloning the repository, install the package from its root directory with

pip install .

Usage

Initializing the stacker:

stacker = Stacker(X, y, splitter, evalMetric, family)

where X is the data matrix (numpy array), y is the target vector (numpy array), splitter is a scikit-learn cross-validation splitter (KFold or StratifiedKFold), evalMetric is the metric to be maximized during training, and family is the problem type (currently "regression" or "binary").
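The following sketch prepares inputs of the kinds described above with NumPy and scikit-learn, and shows what a splitter actually produces: one (train indices, validation indices) pair per fold. It stops short of calling `Stacker`, since its import path is not given here. The README does not specify the form of evalMetric; using a metric function such as `r2_score` (where higher is better) is an assumption to check against the package source.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # data matrix: 100 samples, 4 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)  # target vector

splitter = KFold(n_splits=5, shuffle=True, random_state=42)

# Assumption: evalMetric as a plain metric function (higher is better);
# the README does not state which form Pancake expects.
evalMetric = r2_score
family = "regression"

# A splitter yields (train_indices, validation_indices) pairs, one per fold.
folds = list(splitter.split(X, y))
```

Each sample lands in exactly one validation fold, which is what makes out-of-sample predictions for the whole training set possible.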

Adding models (in-layer):

Add a scikit-learn model modelObj to the in-layer with

stacker.addModelIn(modelObj, trainable, hyperParameters)

If trainable is True, the model is tuned across folds over hyperParameters, a dictionary defining the hyper-parameter grid for the model (see the scikit-learn documentation of the model for valid parameter names). If trainable is False, the model is treated as fixed: its hyper-parameters are not tuned and it is only fitted on each fold.
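A hyper-parameter grid maps constructor argument names to lists of candidate values. The sketch below defines one for `sklearn.svm.SVR`, expands it with `ParameterGrid`, and tunes the model with `GridSearchCV` over KFold folds. `GridSearchCV` is shown only as one common way to search such a grid; how Pancake performs the search internally is not described here, and the grid values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, ParameterGrid
from sklearn.svm import SVR

# Keys must match SVR's constructor arguments; values are the candidates to try.
hyperParameters = {"C": [1.0, 10.0], "epsilon": [0.1, 0.5]}

# The grid expands to every combination of values: 2 x 2 = 4 candidates.
candidates = list(ParameterGrid(hyperParameters))

# Illustration only: a cross-validated search over the grid with scikit-learn.
X, y = make_regression(n_samples=120, n_features=3, noise=5.0, random_state=0)
search = GridSearchCV(SVR(), hyperParameters, cv=KFold(n_splits=3, shuffle=True, random_state=0))
search.fit(X, y)
best = search.best_params_
```

With the signature documented above, such a grid would be passed as `stacker.addModelIn(SVR(), True, hyperParameters)`.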

Adding stacker models (out-layer):

Add a scikit-learn model modelObj to the out-layer with

stacker.addModelOut(modelObj, hyperParameters)

Again, hyperParameters is a dictionary containing the hyper-parameter grid for the stacker model.
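An out-layer model sees the in-layer predictions as its input features. In this sketch the "in-layer predictions" are synthetic noisy copies of the target (assumed stand-ins, not real model output), and a `Ridge` stacker is tuned over its own grid with `GridSearchCV`. This illustrates the role of the out-layer grid; it is not Pancake's internal procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
y = rng.normal(size=150)

# Synthetic stand-in for out-of-fold predictions of two in-layer models:
# one column per model, each a noisy copy of the target.
meta_features = np.column_stack([
    y + rng.normal(scale=0.3, size=150),
    y + rng.normal(scale=0.6, size=150),
])

# Out-layer model with its own hyper-parameter grid
stackerGrid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
splitter = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(Ridge(), stackerGrid, cv=splitter, scoring="r2")
search.fit(meta_features, y)
```

A linear stacker learns one weight per in-layer model, so a more accurate in-layer model can receive a larger weight in the final prediction.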

Training and Predictions:

To train the stacker and get predictions on the training data, use

predsTrain = stacker.stackTrain(matrixOut)

which returns the final predictions of each out-layer model as a list when matrixOut is False. When matrixOut is True, the predictions of each out-layer model are returned as the columns of a single array.
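To picture the two output formats, the sketch below assumes (this is not stated in the README) that each out-layer model's predictions are a 1-D NumPy array of length n_samples. The list form keeps one array per model; the matrix form stacks them as columns. The values are made up for illustration.

```python
import numpy as np

# Hypothetical predictions of two out-layer models on 4 training samples
preds_model_a = np.array([1.0, 2.0, 3.0, 4.0])
preds_model_b = np.array([1.5, 2.5, 3.5, 4.5])

# matrixOut=False: one 1-D array per out-layer model, collected in a list
predsList = [preds_model_a, preds_model_b]

# matrixOut=True: the same predictions as columns of one (n_samples, n_models) array
predsMatrix = np.column_stack(predsList)
print(predsMatrix.shape)  # (4, 2)
```

The matrix form is the convenient one when the predictions are fed into another module, since it is already shaped like a feature matrix.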

For predictions on the test set, use:

predsTest = stacker.stackTest(X_ts, matrixOut)

where X_ts is the test data and matrixOut has the same meaning as above.
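One common way to produce test-set predictions in stacking is to let every fold's fitted in-layer model predict the test set, average those predictions per model, and pass the averaged columns to the trained stacker. The sketch below implements that scheme with plain scikit-learn. Whether Pancake averages fold models or refits on the full training data is not stated in the README, so treat this as one possible approach.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=240, n_features=5, noise=10.0, random_state=0)
X_tr, y_tr, X_ts = X[:200], y[:200], X[200:]
splitter = KFold(n_splits=5, shuffle=True, random_state=0)
in_layer = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]

oof = np.zeros((len(X_tr), len(in_layer)))        # out-of-fold training meta-features
test_meta = np.zeros((len(X_ts), len(in_layer)))  # averaged test meta-features

for j, model in enumerate(in_layer):
    for train_idx, val_idx in splitter.split(X_tr):
        fold_model = clone(model).fit(X_tr[train_idx], y_tr[train_idx])
        # Predict the held-out fold for the training meta-features
        oof[val_idx, j] = fold_model.predict(X_tr[val_idx])
        # Accumulate the average of this fold model's test predictions
        test_meta[:, j] += fold_model.predict(X_ts) / splitter.get_n_splits()

# The stacker is trained on out-of-fold predictions, then applied to the test columns.
stacker_model = LinearRegression().fit(oof, y_tr)
predsTest = stacker_model.predict(test_meta)
```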

Summary, Saving and Loading:

To get a summary of the CV scores and the fit and training times of each in-layer and out-layer model, use

stacker.summary()

To save the trained stacker for later use, call

saveModel(stacker, savePath)

To load a trained model from disk, call

stacker = loadModel(savePath)
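The README does not say how saveModel and loadModel serialize the stacker. As an assumption-labeled sketch, the hypothetical helpers below show a pickle round trip for a fitted scikit-learn object. They are stand-ins for illustration, not Pancake's functions. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import Ridge


def save_model_sketch(obj, path):
    """Hypothetical stand-in for saveModel: serialize obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_model_sketch(path):
    """Hypothetical stand-in for loadModel: deserialize an object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A small fitted model to persist (illustrative)
model = Ridge(alpha=2.0).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

with tempfile.TemporaryDirectory() as tmp:
    savePath = Path(tmp) / "stacker.pkl"
    save_model_sketch(model, savePath)
    restored = load_model_sketch(savePath)
```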

Examples

Jupyter notebooks analyzing the Boston Housing data are included in the examples directory of the repo:

  1. Stacking linear models
  2. Stacking Random Forest and Support Vector Regressors

TODO

  1. Multi-class classification problems
  2. Parallelization at the model and/or hyper-parameter level