SklearncomPYre: Regression and classification model comparison wrapper around scikit-learn

Facilitating beautifully efficient comparisons of machine learning classifiers and regression models.

Created by

Birinder Singh · Jes Simkin · Talha Siddiqui

Summary • Install • How To Use • Credits • Related • License • Contribute

Summary

SklearncomPYre harnesses the power of scikit-learn, combining it with pandas dataframes and matplotlib plots for easy, breezy, and beautiful machine learning exploration.

Looking to do the same in R? Check out caretcompaR!

Function 1: `split()`

The function splits the training input samples X, and target values y (class labels in classification, real numbers in regression) into train, test and validation sets according to specified proportions.

It returns eight array-like objects: the training, validation, combined training-and-validation, and test sets of X, each paired with its matching y array.

Inputs:

X data set, type: Array like
y data set, type: Array like
proportion of training data, type: float
proportion of validation data, type: float
proportion of test data, type: float

Outputs:

X train set, type: Array like
y train, type: Array like
X validation set, type: Array like
y validation, type: Array like
X train and validation set, type: Array like
y train and validation, type: Array like
X test set, type: Array like
y test, type: Array like
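
A three-way split like the one described above can be sketched with two calls to scikit-learn's `train_test_split`. This is only an illustration under stated assumptions, not the package's own implementation: the helper name `three_way_split`, its `seed` parameter, and the rounding of proportions to whole sample counts are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, p_train, p_val, p_test, seed=0):
    """Hypothetical helper: split X, y into train, validation and test sets."""
    if abs(p_train + p_val + p_test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = len(X)
    # Assumption: proportions are rounded to whole numbers of samples.
    n_test = int(round(p_test * n))
    n_val = int(round(p_val * n))
    # First carve off the test set; what remains is train + validation.
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        X, y, test_size=n_test, random_state=seed
    )
    # Then carve the validation set out of the remainder.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_val, y_train_val, test_size=n_val, random_state=seed
    )
    return (X_train, y_train, X_val, y_val,
            X_train_val, y_train_val, X_test, y_test)


# Toy data: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2
parts = three_way_split(X, y, 0.4, 0.2, 0.4)
print([len(p) for p in parts])
```

The return order mirrors the outputs listed above, so the combined training-and-validation set is available for refitting a chosen model before the final test evaluation.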

Function 2: `train_test_acc_time()`

This function compares different scikit-learn regressors or classifiers by training and test accuracy and by the time they take to fit and predict. Its inputs are a dictionary of models, the training samples Xtrain (input features), the test samples Xtest, the training targets ytrain, and the test targets ytest (continuous or categorical).

The function outputs a beautiful dataframe with training & test scores, model variance, and the time it takes to fit and predict using different models.

Inputs:

Dictionary of ML classifiers or regressors.
X train set, type: Array-like
Y train set, type: Array-like
X test set, type: Array-like
Y test set, type: Array-like

Outputs:

Dataframe with 7 columns: (1) regressor or classifier name, (2) training accuracy, (3) test accuracy, (4) model variance, (5) time it takes to fit, (6) time it takes to predict and (7) total time. The dataframe will be sorted by test score in descending order.
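
The kind of comparison described above can be sketched as a loop that times `fit` and `predict` for each model and collects the scores in a pandas dataframe. This is a minimal sketch, not the package's code: the helper name `compare_models` and the column names are hypothetical, and "model variance" is assumed here to mean the gap between training and test scores, which the description above does not confirm.

```python
import time

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names for the 7-column result.
COLUMNS = ["model", "train_score", "test_score", "variance",
           "fit_time", "predict_time", "total_time"]


def compare_models(models, X_train, y_train, X_test, y_test):
    """Hypothetical helper: score and time each model, sorted by test score."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_time = time.perf_counter() - start

        start = time.perf_counter()
        model.predict(X_test)
        predict_time = time.perf_counter() - start

        train_score = model.score(X_train, y_train)
        test_score = model.score(X_test, y_test)
        rows.append({
            "model": name,
            "train_score": train_score,
            "test_score": test_score,
            # Assumption: "variance" is read as the train/test score gap.
            "variance": train_score - test_score,
            "fit_time": fit_time,
            "predict_time": predict_time,
            "total_time": fit_time + predict_time,
        })
    table = pd.DataFrame(rows, columns=COLUMNS)
    return table.sort_values("test_score", ascending=False).reset_index(drop=True)


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)
models = {
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}
table = compare_models(models, X_train, y_train, X_test, y_test)
print(table)
```

For classifiers, `score` returns accuracy; for regressors it returns R², so the same loop covers both cases.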

Function 3: `comparison_viz()`

This function visualizes the output of train_test_acc_time() for easy communication and interpretation. The user can choose to compare either accuracies or times. It takes a dataframe with 7 columns: model name, training and test scores, model variance, time to fit, time to predict, and total time.

Outputs a beautiful matplotlib bar chart comparison of different models' training and test scores or the time it takes to fit and predict.

Inputs:

Dataframe with 7 columns: (1) regressor or classifier name, (2) training accuracy, (3) test accuracy, (4) model variance, (5) time it takes to fit, (6) time it takes to predict and (7) total time. Type: pandas.Dataframe
Choice of accuracy or time, with the default being 'accuracy' if no string is given. Type: string

Outputs:

Bar chart of accuracies or time comparison by models saved to root directory. Type: png
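
A grouped bar chart of this kind could be drawn with matplotlib roughly as follows. This is a sketch under assumptions rather than the package's implementation: the helper name `plot_comparison`, the column names, and the output file name are hypothetical, and the numbers in the example table are placeholder values, not measured results.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_comparison(table, choice="accuracy", path=None):
    """Hypothetical helper: grouped bar chart of scores or timings per model."""
    if choice == "accuracy":
        columns, ylabel = ["train_score", "test_score"], "Score"
    elif choice == "time":
        columns, ylabel = ["fit_time", "predict_time"], "Seconds"
    else:
        raise ValueError("choice must be 'accuracy' or 'time'")

    positions = np.arange(len(table))
    width = 0.8 / len(columns)
    fig, ax = plt.subplots()
    # One group of bars per model, one bar per selected column.
    for i, column in enumerate(columns):
        ax.bar(positions + i * width, table[column], width, label=column)
    ax.set_xticks(positions + width * (len(columns) - 1) / 2)
    ax.set_xticklabels(table["model"])
    ax.set_ylabel(ylabel)
    ax.legend()
    path = path or f"comparison_{choice}.png"
    fig.savefig(path)
    plt.close(fig)
    return path


# Placeholder values for illustration only, not measured results.
table = pd.DataFrame({
    "model": ["knn", "tree"],
    "train_score": [0.97, 1.00],
    "test_score": [0.95, 0.93],
    "fit_time": [0.001, 0.002],
    "predict_time": [0.003, 0.001],
})
out_path = plot_comparison(
    table, "accuracy",
    os.path.join(tempfile.gettempdir(), "comparison_accuracy.png"),
)
print(out_path)
```

Returning the saved path makes it easy to confirm where the PNG was written.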

Install

Please use the following command to install the package:
pip install git+https://github.com/UBC-MDS/SklearncomPYre.git

Once installed, load the package using the following commands:

from SklearncomPYre.train_test_acc_time import train_test_acc_time
from SklearncomPYre.comparison_viz import comparison_viz
from SklearncomPYre.split import split

Dependencies

Python==3.6.8
matplotlib==3.0.1
numpy==1.15.4
pandas==0.20.3
scikit-learn==0.20.2
scipy==1.2.0

How To Use

Here is an example of how you can use SklearncomPYre:

# Example usage

# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Importing SklearncomPYre
from SklearncomPYre.train_test_acc_time import train_test_acc_time
from SklearncomPYre.comparison_viz import comparison_viz
from SklearncomPYre.split import split

# Loading the handy iris dataset
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# Setting up a dictionary of classifiers to test

dictionary = {
    'knn': KNeighborsClassifier(),
    'LogRegression': LogisticRegression(),
    'RForest': RandomForestClassifier()}

# Let's start by using the SklearncomPYre function split().

# Splitting the dataset into 40% training, 20% validation, and 40% test sets.

X_train, y_train, X_val, y_val, X_train_val, y_train_val, X_test, y_test = split(X,y,0.4,0.2,0.4)

# Now, let's train some models and compare them in a pandas dataframe by using train_test_acc_time().

result = train_test_acc_time(dictionary,X_train,y_train,X_val,y_val)
result

# Next, let's take a look at some plots with comparison_viz().

# Our plots will be saved to the working directory.

comparison_viz(result, "accuracy")
comparison_viz(result, 'time')

Credits

Function concepts inspired by UBC MDS DSCI 573 lab instructor Varada Kolhatkar.
README formatting inspiration from ptoolkit.
Badges by Shields IO
Logo designed at Canva

Related

Where does this package fit in?

Our idea for this package was to facilitate the comparison of machine learning classifiers and regression models. Our inspiration came from UBC MDS DSCI 573 lab assignments, where we learned to combine Python's scikit-learn with pandas to produce interpretable comparisons of train and test accuracies and time efficiencies across models.

We are not currently aware of any packages that combine scikit-learn and pandas for efficient and interpretable model-to-model comparisons. We expect that this combination is used in practice, and after using it while learning machine learning techniques during our UBC MDS coursework, we thought it would be a good combination of tools to formally package together.

We are aware of a new package, sklearn-pandas, that combines the powers of scikit-learn and pandas. However, that package is tailored towards providing full-cycle machine learning functionality (feature selection, transformations, inputting/outputting pandas dataframes, etc.) rather than focusing on facilitating model-to-model comparisons via dataframes.

License

MIT License

Contribute

Interested in contributing? See our Contributing Guidelines and Code of Conduct.
