This repository has been archived by the owner on Dec 28, 2023. It is now read-only.

Quirk

Build powerful predictive models with a few lines of code

See a demo

Quirk does:

  • exploratory data analysis
  • feature engineering
  • predictive modeling

Installation

With pip, run:

pip install quirk

Getting Started

For rich visualizations, run Quirk from a Jupyter notebook.

For classification, use:

%matplotlib inline

import quirk

qk = quirk.Classifier(
    train_data='train.csv',
    test_data='test.csv',
    target_col='Survived',
    id_col='PassengerId')

qk.analyze()
qk.model()

For regression, use the quirk.Regressor class.

Tip: To prevent scrolling in notebooks, select Cell > Current Outputs > Toggle Scrolling.

Features

There are two primary methods:

  • analyze runs exploratory data analysis
  • model builds and evaluates different models

Optionally pass test data if you want to generate a CSV file with predictions.
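The README does not document the layout of the predictions file. As a rough sketch, assuming it holds one row per test example with the ID column (id_col) followed by the predicted target column (target_col), similar to a Kaggle submission, writing such a file could look like this. The helper write_predictions is hypothetical and not part of Quirk's API.

```python
import csv
import io
from typing import Hashable, Sequence, TextIO


def write_predictions(out: TextIO, ids: Sequence[Hashable], predictions: Sequence,
                      id_col: str, target_col: str) -> None:
    """Hypothetical helper (not Quirk's API): write one CSV row per test example.

    Assumes a header row of [id_col, target_col], then each ID paired
    with its prediction.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out)
    writer.writerow([id_col, target_col])
    writer.writerows(zip(ids, predictions))


# Example with Titanic-style column names and toy predictions;
# an in-memory buffer stands in for a real file.
buf = io.StringIO()
write_predictions(buf, [892, 893, 894], [0, 1, 0], "PassengerId", "Survived")
csv_text = buf.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open(path, "w", newline=""), as the csv module documentation recommends for writer objects.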

Data

Data can be a file path:

quirk.Classifier(train_data='train.csv', ...)

Or a pandas data frame:

import pandas as pd
import quirk

train_df = pd.read_csv('train.csv')

# do preprocessing
# ...

quirk.Classifier(train_data=train_df, ...)

Specify datetime columns with:

quirk.Classifier(datetime_cols=['created'], ...)
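The README does not say how Quirk transforms datetime columns. One common feature-engineering approach, shown here only as an assumption, is to split each timestamp into numeric parts (year, month, day, weekday, hour) that tree models can split on. The function names below are hypothetical, and the input is assumed to be an ISO 8601 string.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping


def datetime_features(value: str, prefix: str) -> Dict[str, int]:
    """Split one ISO 8601 timestamp string into numeric feature columns."""
    ts = datetime.fromisoformat(value)
    return {
        f"{prefix}_year": ts.year,
        f"{prefix}_month": ts.month,
        f"{prefix}_day": ts.day,
        f"{prefix}_weekday": ts.weekday(),  # Monday=0 ... Sunday=6
        f"{prefix}_hour": ts.hour,
    }


def expand_datetime_cols(row: Mapping[str, Any], datetime_cols: Iterable[str]) -> Dict[str, Any]:
    """Replace each datetime column in a row with its derived numeric columns."""
    cols = list(datetime_cols)
    out = {k: v for k, v in row.items() if k not in cols}
    for col in cols:
        out.update(datetime_features(row[col], col))
    return out


# "created" matches the column name used in the example above; the row is toy data.
features = expand_datetime_cols({"created": "2016-06-24 07:54:24", "price": 3000}, ["created"])
print(features)
```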

Evaluation

Quirk supports several evaluation metrics.

Classification

  • accuracy - number of correct predictions / total (default)
  • auc - area under the ROC curve
  • mlogloss - multiclass log loss
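For reference, the three classification metrics fit in a few lines of standard-library Python. This is a minimal sketch, not Quirk's implementation. It assumes binary 0/1 labels for auc, and for mlogloss it assumes integer class indices with one probability row per example whose entries sum to 1.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for 0/1 labels: the probability that a random positive
    scores higher than a random negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # compares every positive/negative pair: fine for small data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mlogloss(y_true: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    """Multiclass log loss: mean negative log probability of the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip so log never sees 0
        total -= math.log(p)
    return total / len(y_true)


acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
roc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
loss = mlogloss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```

The pairwise auc is written for clarity; a sort-based rank computation gives the same value more efficiently on large datasets.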

Regression

  • rmse - root mean square error (default)
  • rmsle - root mean square logarithmic error
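The regression metrics are equally short. RMSLE is RMSE computed on log(1 + y), so it penalizes relative rather than absolute errors, which suits targets such as prices that span orders of magnitude. A minimal sketch (not Quirk's code), assuming non-negative targets for rmsle:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square logarithmic error: RMSE on log(1 + y)."""
    # math.log1p(x) computes log(1 + x); values must be greater than -1.
    return rmse([math.log1p(t) for t in y_true], [math.log1p(p) for p in y_pred])


error = rmse([3.0, 5.0], [1.0, 5.0])
log_error = rmsle([100.0, 1000.0], [110.0, 1100.0])
```

In the rmsle example both predictions are about 10% too high, so each contributes a log error near 0.095, even though their absolute errors (10 and 100) differ tenfold.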

Specify an eval metric with:

quirk.Classifier(eval_metric='mlogloss', ...)
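Comparing models requires knowing whether a metric is better when higher (accuracy, auc) or lower (mlogloss, rmse, rmsle). The sketch below shows how a model could be chosen for a given eval_metric; the pick_best helper and the scores are hypothetical and not taken from Quirk's source.

```python
from typing import Mapping

# Direction for each metric listed above: True means a larger value is better.
HIGHER_IS_BETTER = {"accuracy": True, "auc": True, "mlogloss": False, "rmse": False, "rmsle": False}


def pick_best(scores: Mapping[str, float], eval_metric: str) -> str:
    """Return the name of the model with the best score for eval_metric."""
    if eval_metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown eval metric: {eval_metric}")
    if HIGHER_IS_BETTER[eval_metric]:
        return max(scores, key=scores.__getitem__)
    return min(scores, key=scores.__getitem__)


# Made-up validation scores for illustration only.
best_loss = pick_best({"boosted_trees": 0.6, "mode_benchmark": 0.9}, "mlogloss")
best_acc = pick_best({"boosted_trees": 0.8, "mode_benchmark": 0.6}, "accuracy")
```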

Modeling

Quirk builds and compares different models. Currently, it uses:

  1. boosted trees
  2. simple benchmarks (mode for classification, mean and median for regression)
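Each benchmark predicts a single constant learned from the training targets. A minimal sketch, assuming plain Python lists of targets (the function names are hypothetical, not Quirk's API):

```python
import statistics
from collections import Counter
from typing import Hashable, List, Sequence


def mode_benchmark(train_y: Sequence[Hashable], n_test: int) -> List[Hashable]:
    """Classification baseline: predict the most common training label everywhere."""
    label, _count = Counter(train_y).most_common(1)[0]
    return [label] * n_test


def mean_benchmark(train_y: Sequence[float], n_test: int) -> List[float]:
    """Regression baseline: predict the training mean everywhere."""
    return [statistics.mean(train_y)] * n_test


def median_benchmark(train_y: Sequence[float], n_test: int) -> List[float]:
    """Regression baseline: predict the training median everywhere."""
    return [statistics.median(train_y)] * n_test


mode_preds = mode_benchmark([0, 1, 0, 0, 1], 3)
mean_preds = mean_benchmark([1.0, 2.0, 3.0, 10.0], 2)
median_preds = median_benchmark([1.0, 2.0, 3.0, 10.0], 2)
```

Keeping both mean and median matters for skewed targets: the single large value 10.0 pulls the mean to 4.0 while the median stays at 2.5. A boosted-tree model that cannot beat these constants on the chosen eval metric has likely learned little from the features.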

XGBoost is required for boosted trees. Install it with:

pip install xgboost

Performance

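| Dataset                  | Eval Metric          | v0.1    | Current |
|--------------------------|----------------------|---------|---------|
| Titanic                  | Accuracy             | 0.77512 | 0.77512 |
| Rental Listing Inquiries | Multiclass Log Loss  | -       | 0.61861 |
| House Prices             | RMSLE                | 0.14069 | 0.13108 |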
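Accuracy is higher-is-better, while log loss and RMSLE are lower-is-better. The relative change from v0.1 to the current version follows directly from the table's numbers; for House Prices, the RMSLE fell by roughly 6.8%.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means the value decreased)."""
    return (new - old) / old


# Values taken from the performance table.
house_prices = relative_change(0.14069, 0.13108)
titanic = relative_change(0.77512, 0.77512)
print(f"House Prices RMSLE: {house_prices:.1%}")
```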

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/quirk.git
cd quirk
pip install -r requirements.txt
pytest
