General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems

Triage

Risk modeling and prediction

Predictive analytics projects require the coordination of many different tasks, such as feature generation, classifier training, evaluation, and list generation. These tasks are complicated in their own right, and they also have to be combined in different ways over the course of a project.

Triage aims to provide interfaces to these different phases of a project, such as an Experiment. Each phase is defined by configuration specific to the needs of the project, and an arrangement of core data science components that work together to produce the output of that phase.

Installation

Prerequisites

To use Triage, you first need:

  • Python 3+
  • A PostgreSQL database with your source data (events, geographical data, etc.) loaded
  • Ample space on an available disk (or, for example, in Amazon Web Services' S3) to store the matrices and models needed for your experiment
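
As a rough illustration, the Python version and free disk space can be checked before starting. This is a minimal sketch using only the standard library; the function name check_prerequisites and the min_free_gb threshold are hypothetical and not part of Triage, and the sketch does not verify the PostgreSQL database.

```python
import shutil
import sys


def check_prerequisites(storage_path=".", min_free_gb=1.0):
    """Report whether the Python and disk-space prerequisites appear to be met.

    min_free_gb is an arbitrary illustrative threshold, not a Triage requirement.
    """
    # Free space (in GiB) on the disk that holds storage_path.
    free_gb = shutil.disk_usage(storage_path).free / 1024 ** 3
    return {
        "python_3": sys.version_info >= (3,),
        "free_gb": round(free_gb, 2),
        "enough_disk": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

How much space is "ample" depends on the number and size of the matrices and models in your experiment, so the threshold should be chosen per project.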

Building

Triage is a Python package distributable via setuptools. It may be installed directly using easy_install or pip, or named as a dependency of another package as triage.

To build this package without installing it, install its dependencies from the terminal using pip:

pip install -r requirement/main.txt

Testing

To add test dependencies, install from test.txt; to also add development dependencies, include dev.txt:

pip install -r requirement/test.txt [-r requirement/dev.txt]

Then, to run tests:

pytest

Development

To quickly bootstrap a development environment, having cloned the repository, invoke the executable develop script from your system shell:

./develop

A "wizard" will suggest set-up steps and optionally execute these, for example:

(install) begin

(pyenv) installed ✓

(python-3.6.2) installed ✓

(virtualenv) installed ✓

(activation) installed ✓

(libs) install?
1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt}
2) no, ignore
#? 1

Experiment

The first phase implemented in Triage is the Experiment. An experiment represents the initial research work of creating design matrices from source data, and training/testing/evaluating a model grid on those matrices. At the end of the experiment, a relational database with results metadata is populated, allowing for evaluation by the researcher.
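
To make the idea of a "model grid" concrete, the sketch below expands a grid of classifier names and hyperparameter lists into the individual combinations that would each be trained and evaluated. It is an illustration only: the grid layout, the expand_grid helper, and the example values are assumptions, not Triage's actual configuration format or API.

```python
from itertools import product


def expand_grid(grid):
    """Expand {class_path: {param: [values]}} into (class_path, params) pairs."""
    combos = []
    for class_path, params in grid.items():
        names = sorted(params)
        # One combination per element of the Cartesian product of value lists.
        for values in product(*(params[name] for name in names)):
            combos.append((class_path, dict(zip(names, values))))
    return combos


# Hypothetical grid; the class paths are plain strings and nothing is imported.
example_grid = {
    "sklearn.tree.DecisionTreeClassifier": {
        "max_depth": [2, 5],
        "min_samples_split": [2, 10],
    },
    "sklearn.linear_model.LogisticRegression": {"C": [0.1, 1.0]},
}

if __name__ == "__main__":
    for class_path, params in expand_grid(example_grid):
        print(class_path, params)
```

Each resulting pair corresponds to one model to train and test on every matrix, which is why the size of the grid largely determines how long an experiment takes.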

Documentation

Background

Triage is developed at the University of Chicago's Center for Data Science and Public Policy (DSaPP). We created it in response to commonly occurring challenges we've encountered and patterns we've developed while working on projects for our partners.

Major Components Used by Triage

Triage makes use of many core data science components developed at DSaPP. These components can be useful in their own right, and are worth checking out if you'd like to make use of a subset of Triage's functionality in an existing pipeline.

Components Within Triage

  • Architect: Plan, design and build train and test matrices. Includes feature and label generation.
  • Catwalk: Training, testing, and evaluating machine learning classifier models
  • Collate: Aggregation SQL Query Builder. This is used by the Architect to build features.
  • Timechop: Generate temporal cross-validation time windows for matrix creation
  • Metta-Data: Train and test matrix storage
  • Results Schema: Generate a database schema suitable for storing the results of modeling runs
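
The idea behind temporal cross-validation windows, as mentioned for Timechop above, can be sketched in a few lines. The code below is a simplified illustration and not Timechop's API: the function names, the month-based windows, and the assumption that all dates fall on the first of a month are hypothetical.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by a number of months, returning the first of that month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def temporal_splits(start, end, train_months, test_months):
    """Return a list of (train_start, split_date, test_end) windows.

    Training data covers [train_start, split_date) and test data covers
    [split_date, test_end), so training never sees data from the test period.
    """
    splits = []
    split_date = add_months(start, train_months)
    while add_months(split_date, test_months) <= end:
        test_end = add_months(split_date, test_months)
        splits.append((start, split_date, test_end))
        split_date = test_end
    return splits


if __name__ == "__main__":
    for window in temporal_splits(date(2015, 1, 1), date(2017, 1, 1), 12, 6):
        print(window)
```

In this sketch the training window grows with each split while the test window slides forward; other schemes, such as a fixed-length training window, are equally possible.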

Design Goals

There are two overarching design goals for Triage:

  • All configuration necessary to run the full experiment from the external interface (i.e., Experiment subclasses) from beginning to end must be easily serializable and machine-constructable, to allow the eventual development of tools for users to design experiments.
  • All core functionality must be usable outside of a specific pipeline context or workflow manager. There are many good workflow managers; everybody has their favorite, and core functionality should not be designed to work with specific execution expectations.
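
The first goal can be illustrated with a configuration held as plain data that survives a round trip through a serialization format. The sketch below uses JSON from the standard library; the configuration keys and values are hypothetical and do not reflect Triage's real experiment schema.

```python
import json

# Hypothetical experiment configuration made only of plain, serializable types.
config = {
    "label": "inspection_failed",
    "temporal": {"train_months": 12, "test_months": 6},
    "model_grid": {
        "sklearn.linear_model.LogisticRegression": {"C": [0.1, 1.0]},
    },
}


def serialize(cfg):
    """Serialize a configuration to a stable JSON string."""
    return json.dumps(cfg, sort_keys=True)


def deserialize(text):
    """Rebuild a configuration from its JSON representation."""
    return json.loads(text)


if __name__ == "__main__":
    restored = deserialize(serialize(config))
    print(restored == config)
```

Because the configuration contains no live objects (such as database connections or classifier instances), a separate tool could generate or edit it without importing any modeling code.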

Future Plans

  • Generation and management of lists (i.e., for inspections) by various criteria
  • Integration of components with various workflow managers, like Drain and Luigi.
  • Comprehensive leakage testing of an experiment's modeling run
  • Feature Generation Wizard