Quirk

Build powerful predictive models with a few lines of code

See a demo

Quirk does:

  • exploratory data analysis
  • feature engineering
  • predictive modeling

Installation

With pip, run:

pip install quirk

Getting Started

For rich visualizations, run Quirk from a Jupyter notebook.

For classification, use:

%matplotlib inline

import quirk

qk = quirk.Classifier(
    train_data='train.csv',
    test_data='test.csv',
    target_col='Survived',
    id_col='PassengerId')

qk.analyze()
qk.model()

For regression, use the quirk.Regressor class.

Tip: To prevent scrolling in notebooks, select Cell > Current Outputs > Toggle Scrolling.

Features

There are two primary methods:

  • analyze runs exploratory data analysis
  • model builds and evaluates different models
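As a rough illustration of the kind of per-column summary an exploratory pass can start from, the helper below counts missing and distinct values in each column of a CSV-like table. It is a hypothetical sketch, not Quirk's `analyze` implementation. It assumes rows are dicts (as produced by `csv.DictReader`) and treats empty strings and `None` as missing.

```python
from collections import Counter


def summarize_columns(rows):
    """Return per-column counts of missing and distinct values.

    `rows` is a list of dicts, e.g. from csv.DictReader (an assumption).
    Empty strings and None are treated as missing.
    """
    columns = list(rows[0].keys()) if rows else []
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in ("", None)]
        summary[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # Most frequent non-missing value (first seen wins ties)
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return summary


rows = [
    {"PassengerId": "1", "Survived": "0", "Age": "22"},
    {"PassengerId": "2", "Survived": "1", "Age": ""},
    {"PassengerId": "3", "Survived": "1", "Age": "26"},
]
for col, stats in summarize_columns(rows).items():
    print(col, stats)
```

Columns with many missing values or very few distinct values are usually the first candidates for closer inspection.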

Optionally, pass test data to generate a CSV file with predictions.
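The exact layout of Quirk's predictions file isn't described here. The sketch below assumes a two-column file (the ID column and the target column) in the style of a Kaggle submission. `write_predictions` is a hypothetical helper, not part of Quirk.

```python
import csv
import io


def write_predictions(ids, predictions, id_col, target_col, out):
    """Write a header row, then one row per test example: its ID and prediction.

    `ids` and `predictions` are sequences of equal length.
    `out` is any writable text file object (open real files with newline="").
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out)
    writer.writerow([id_col, target_col])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])


buf = io.StringIO()
write_predictions([892, 893], [0, 1], "PassengerId", "Survived", buf)
print(buf.getvalue())
```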

Data

Data can be a file:

quirk.Classifier(train_data='train.csv', ...)

Or a data frame:

import pandas as pd

train_df = pd.read_csv('train.csv')

# do preprocessing
# ...

quirk.Classifier(train_data=train_df, ...)

Specify datetime columns with:

quirk.Classifier(datetime_cols=['created'], ...)
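What Quirk derives from a datetime column isn't documented here. One common approach is to split each timestamp into calendar components that a tree model can split on. The helper below is a hypothetical sketch of that idea and assumes ISO-formatted strings such as `2016-06-24 07:54:24`.

```python
from datetime import datetime


def expand_datetime(value, prefix):
    """Split an ISO-format timestamp string into numeric feature columns.

    Column names are built as f"{prefix}_<part>" (a naming assumption).
    """
    ts = datetime.fromisoformat(value)
    return {
        f"{prefix}_year": ts.year,
        f"{prefix}_month": ts.month,
        f"{prefix}_day": ts.day,
        f"{prefix}_weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        f"{prefix}_hour": ts.hour,
    }


print(expand_datetime("2016-06-24 07:54:24", "created"))
```

Weekday and hour often carry signal (for example, listings created on weekends) that the raw timestamp string does not expose to a model.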

Evaluation

Quirk supports several evaluation metrics.

Classification

  • accuracy - number correct / total (default)
  • auc - area under the ROC curve
  • mlogloss - multiclass log loss
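For reference, the three classification metrics can be written out in plain Python as follows. These are the textbook definitions, not Quirk's code. The `auc` helper assumes binary 0/1 labels and uses the pairwise (rank) formulation; `mlogloss` assumes each row of probabilities already sums to 1, and the clipping constant `eps` is a common choice rather than a documented Quirk setting.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label (# correct / total)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def auc(y_true, scores):
    """Binary ROC AUC: probability that a random positive outscores a random
    negative, counting ties as one half. Labels must be 0 or 1."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mlogloss(y_true, probs, eps=1e-15):
    """Multiclass log loss: mean negative log of the probability assigned
    to the true class. `probs[i]` maps each class label to its probability.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for label, p in zip(y_true, probs):
        prob = min(max(p[label], eps), 1 - eps)
        total -= math.log(prob)
    return total / len(y_true)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))      # 0.75
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(mlogloss(["a", "b"], [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]))
```

Accuracy and AUC are higher-is-better, while log loss is lower-is-better and heavily punishes confident wrong predictions.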

Regression

  • rmse - root mean square error (default)
  • rmsle - root mean square logarithmic error
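The two regression metrics differ only in whether errors are measured on the raw target or on log(1 + y). A minimal plain-Python sketch of the standard definitions (not Quirk's code):

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error: RMSE computed on log(1 + y).

    All values must be greater than -1 (in practice, non-negative targets).
    """
    return rmse([math.log1p(t) for t in y_true], [math.log1p(p) for p in y_pred])


print(rmse([3.0, 5.0], [4.0, 5.0]))  # sqrt((1 + 0) / 2)
print(rmsle([100000, 200000], [110000, 190000]))
```

Because RMSLE works on a log scale, it penalizes relative rather than absolute errors, which suits skewed, non-negative targets such as house prices.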

Specify an eval metric with:

quirk.Classifier(eval_metric='mlogloss', ...)

Modeling

Quirk builds and compares different models. Currently, it uses:

  1. boosted trees
  2. simple benchmarks (mode for classification, mean and median for regression)
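The simple benchmarks are easy to reproduce by hand, which helps when judging whether a boosted-tree model is actually learning anything. A minimal sketch with hypothetical function names (Quirk's own implementation may differ); note that `statistics.mode` returns the first mode encountered on ties in Python 3.8+ and raises on ties in older versions.

```python
import statistics


def mode_baseline(train_labels, n_test):
    """Classification benchmark: predict the most frequent training label everywhere."""
    most_common = statistics.mode(train_labels)
    return [most_common] * n_test


def mean_baseline(train_targets, n_test):
    """Regression benchmark: predict the training mean for every test row."""
    return [statistics.mean(train_targets)] * n_test


def median_baseline(train_targets, n_test):
    """Regression benchmark: predict the training median (more robust to outliers)."""
    return [statistics.median(train_targets)] * n_test


print(mode_baseline([0, 0, 1], 2))
print(mean_baseline([1, 2, 6], 1))
print(median_baseline([1, 2, 6], 1))
```

A model that cannot beat these constant predictions on the chosen eval metric is not adding value over the benchmark.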

XGBoost is required for boosted trees. See how to install. On Mac, use:

pip install xgboost

Performance

Dataset                   Eval Metric            v0.1      Current
Titanic                   Accuracy               0.77512   0.77512
Rental Listing Inquiries  Multiclass Log Loss    -         0.61861
House Prices              RMSLE                  0.14069   0.13108

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help: