
Releases: dssg/triage

Dried Apricot

27 Aug 05:15

WARNING: BREAKING CHANGES!

Note that several changes in triage 5 break backwards compatibility with triage 4. If you are upgrading a project from an earlier version of triage, it is highly recommended that you first create a backup of your current database!

These breaking changes include:

  • Revision in the way the model_hash is calculated means that if you're re-running an experiment from an earlier version of triage, it will re-train your models and give them new model_ids even if the configuration hasn't changed.
  • The built_by_experiment column has been removed from triage_metadata.models in favor of tracking the specific run that built the model. The experiment_hash can still be obtained by joining to triage_metadata.triage_runs (née triage_metadata.experiment_runs). Should you need the data that was in this column at the time of migration, it can be found in triage_metadata.deprecated_models_built_by_experiment, but it will not be restored to the table upon database downgrade.
  • Changes in the structure of matrix metadata mean the matrix_hash will no longer be backwards-compatible with older versions of triage (as with models, re-running an old config would result in matrices being re-created).
  • The random_seed column has been removed from triage_metadata.experiments in favor of tracking it at the run level as well. A database upgrade followed by a downgrade would lose this data (though it could be recovered from the runs table).
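As a rough illustration of the join described above, the sketch below uses Python's sqlite3 module, with an attached in-memory database standing in for the Postgres triage_metadata schema. The column names built_in_triage_run (on models) and run_hash (on triage_runs) are assumptions made for this example and are not stated in these notes; check them against your own schema before running a similar query.

```python
import sqlite3

# ASSUMED column names: models.built_in_triage_run references triage_runs.id,
# and triage_runs.run_hash holds the experiment hash for experiment runs.
QUERY = """
SELECT m.model_id, r.run_hash AS experiment_hash
FROM triage_metadata.models AS m
JOIN triage_metadata.triage_runs AS r
  ON m.built_in_triage_run = r.id
ORDER BY m.model_id
"""


def experiment_hashes(conn):
    """Return (model_id, experiment_hash) pairs via the runs table."""
    return conn.execute(QUERY).fetchall()


def make_demo_db():
    """Build a toy SQLite stand-in for the Postgres triage_metadata schema."""
    conn = sqlite3.connect(":memory:")
    # Attaching a second database named triage_metadata lets the query keep
    # its schema-qualified table names.
    conn.execute("ATTACH DATABASE ':memory:' AS triage_metadata")
    conn.execute(
        "CREATE TABLE triage_metadata.triage_runs (id INTEGER PRIMARY KEY, run_hash TEXT)"
    )
    conn.execute(
        "CREATE TABLE triage_metadata.models "
        "(model_id INTEGER PRIMARY KEY, built_in_triage_run INTEGER)"
    )
    conn.executemany(
        "INSERT INTO triage_metadata.triage_runs VALUES (?, ?)",
        [(1, "abc123"), (2, "def456")],
    )
    conn.executemany(
        "INSERT INTO triage_metadata.models VALUES (?, ?)",
        [(10, 1), (11, 1), (12, 2)],
    )
    return conn
```

Against a real triage database the same SELECT would be run through your usual Postgres connection; only the toy setup is SQLite-specific.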

New Functionality

  • Functionality for predicting forward, either with an existing model object or by retraining a new model with the most current data given a model_group_id (#631)
  • Utility for adding predictions to models previously trained/tested with save_predictions=False (#836)
  • Provisioner for easily setting up a postgresql database (via docker) that can be used with triage (#840)
  • More flexibility in parallelization for more resource-intensive model types, like random forests (#853)

Bug Fixes

  • Ensure model-level random seeds are re-used when the config and experiment-level random seed are unchanged (#848)
  • Remove the project path from the model_hash definition: the model_id shouldn't depend on where triage is being run (#830)
  • Ensure that feature groups are sorted in matrix metadata for consistency in downstream calculations (#833)
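The last two fixes above rest on one idea: a hash should depend only on the content that defines a model or matrix, not on where triage runs or on incidental ordering. The following is a minimal sketch of that idea, not triage's implementation; the metadata keys project_path and feature_groups and the function name stable_hash are hypothetical, and md5 is used only for illustration.

```python
import hashlib
import json


def stable_hash(metadata: dict) -> str:
    """Hash a metadata dict so that equivalent content gives the same digest."""
    cleaned = dict(metadata)  # shallow copy; the caller's dict is not modified
    # Drop the machine-specific location so the hash ignores where triage runs.
    cleaned.pop("project_path", None)
    # Sort feature groups so their order cannot change the hash.
    if "feature_groups" in cleaned:
        cleaned["feature_groups"] = sorted(cleaned["feature_groups"])
    # sort_keys makes dictionary key order irrelevant as well.
    payload = json.dumps(cleaned, sort_keys=True, default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Two configs that differ only in project path or feature-group order then hash identically, while any change to a defining field still produces a new hash.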

Thanks To

@tweddielin, @thcrock, @ecsalomon, @kasunamare

AROY-D: The Second Box

26 Aug 21:11

Primarily a bugfix release for anyone working on triage 4. New functionality will be introduced with the 5.0 release.

Bug Fixes

  • Fix functionality of bias analysis using aequitas during experiment runs. Previously, the attributes for bias analysis were getting scrambled relative to the scores and labels when the latter were sorted for "best case" and "worst case" analyses, invalidating any results produced by these analyses. This release fixes the bug, ensures the same set of entities is provided for attributes and labels/scores, and adds a unit test to cover this issue. (#858)
  • Close database sessions during unit tests to avoid intermittent exceptions during test cleanup. (#851)
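The bias-analysis bug above came from sorting scores and labels without carrying the attributes along. A minimal sketch of the safer pattern, assuming each entity is a (entity_id, score, label, attribute) tuple: sort whole rows, so an entity's attribute can never drift away from its own score and label. The row layout and the function name sort_for_case are hypothetical; ties are broken by label, putting positives first for the best case and last for the worst case.

```python
from typing import List, Tuple

Row = Tuple[str, float, int, str]  # (entity_id, score, label, attribute)


def sort_for_case(rows: List[Row], case: str) -> List[Row]:
    """Order rows by descending score, breaking score ties by label.

    Sorting complete rows keeps every attribute attached to its own
    entity's score and label.
    """
    if case not in ("best", "worst"):
        raise ValueError("case must be 'best' or 'worst'")
    # Best case: positive labels first within a tied score; worst case: last.
    tie_sign = -1 if case == "best" else 1
    return sorted(rows, key=lambda r: (-r[1], tie_sign * r[2]))
```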

AROY-D

22 Apr 18:01

New Functionality

  • Added connector for aequitas visualizations (#837)
  • Allow user-specified model grids to extend presets (#843)
  • Audition improvements, including baseline models and stable color schemes (#844)
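One plausible reading of "user-specified model grids extend presets" (#843) is a layered merge: the preset supplies a base grid and the user's grid adds or overrides entries. The sketch below illustrates only that assumed behavior; the preset contents, the Grid layout (classifier path to hyperparameter to values) and extend_grid are hypothetical and not triage's configuration format.

```python
from copy import deepcopy
from typing import Dict, List

# Hypothetical layout: classifier path -> hyperparameter name -> values to try.
Grid = Dict[str, Dict[str, List]]

# Hypothetical preset, for illustration only.
PRESET: Grid = {
    "sklearn.tree.DecisionTreeClassifier": {"max_depth": [1, 5]},
    "sklearn.dummy.DummyClassifier": {"strategy": ["prior"]},
}


def extend_grid(preset: Grid, user_grid: Grid) -> Grid:
    """Return a copy of the preset with the user's grid layered on top.

    New classifiers are added; for a classifier already in the preset, a
    hyperparameter given by the user replaces the preset's values.
    The preset itself is left unchanged.
    """
    merged = deepcopy(preset)
    for classifier, params in user_grid.items():
        merged.setdefault(classifier, {}).update(deepcopy(params))
    return merged
```

Whether a user value should replace or be unioned with the preset's list is a design choice; this sketch assumes replacement.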

Bug Fixes

  • Fixed building triage in docker container for dirty duck tutorial (#818, #820)
  • Improve audition's handling of multiple models with different random seeds (#823)

Refactoring/Documentation

  • Switched to github actions for CI testing (#825)
  • Update dependencies (#835)

El "Patched" Paisano

09 Jul 16:05
Pre-release

Patched due to some inconsistencies between catwalk and the newest version of sklearn.

El Paisano

30 Jun 22:27
e804cf3
Pre-release

What is in this release?

  • The schema is now called triage_metadata instead of model_metadata (issue #700)
  • The replace flag is now passed to ModelTrainerTester (issue #784)
  • New folder structure for dirtyduck
  • New folder structure for triage in a docker container
  • Fixed an inconsistency in a command line option in the tutorial
  • Python version recorded as a column in experiment runs (issue #742)
  • Fixed incorrect columns in individual_importances (issue #744)
  • Updated deprecated method calls (issues #734 and #754)
  • Long-standing issue with parsedatetime resolved (issue #721)
  • Several issues with dirtyduck resolved (issues #750, #735, #736, #781)

Thanks to

Chengdu

14 Dec 00:12

New Functionality:

  • Evaluate on subsets [Resolves #535, #138] (#552)
  • Implement train/testing priority [Resolves #542] (#581)
  • Introduce experiment_runs table, beef up experiments table (#637)
  • Dirty duck (the whole enchilada) (#670)
  • Add compute best/worst/stochastic for each evaluation [Resolves #292] (#674)
  • Insert Ranks for Predictions [Resolves #357] (#671)
  • Support Python 3.7 [Resolves #683] (#684)
  • Bias Part 1: Protected groups generator (#680)
  • Bias part 2 (#688)
  • Added DummyClassifier to the SimpleClassifiers batch (#702)
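Inserting ranks for predictions (#671) has to deal with ties, which is also what the best/worst/stochastic evaluations (#674) are about: entities with equal scores can either share a rank or receive distinct ranks. The sketch below computes both variants from a plain list of scores. The function name rank_scores and the choice to break ties by input order are assumptions for illustration, not triage's documented behavior.

```python
from typing import List, Tuple


def rank_scores(scores: List[float]) -> Tuple[List[int], List[int]]:
    """Return (rank without ties, rank with ties); rank 1 is the highest score."""
    # Python's sort is stable, so equal scores keep their input order here.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    no_ties = [0] * len(scores)
    with_ties = [0] * len(scores)
    first_seen = {}
    for position, i in enumerate(order, start=1):
        no_ties[i] = position
        # Tied scores all share the position of the first entity with that score.
        first_seen.setdefault(scores[i], position)
        with_ties[i] = first_seen[scores[i]]
    return no_ties, with_ties
```

For scores [0.2, 0.9, 0.9, 0.5] this gives ranks [4, 1, 2, 3] without ties and [4, 1, 1, 3] with ties.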

Bug Fixes:

  • config is a str, not a fd (#610)
  • Keep PyYAML pinned as v5 breaks our usage (#615)
  • Fix cohort in unit tests, remove old code, squash some warnings (#621)
  • Fix logging of which matrix was saved (#623)
  • Harden postmodeling against lack of predictions [Resolves #638] (#645)
  • Validate distinct feature group prefixes (#634)
  • fix imports in example postmodeling notebook (#646)
  • Fixed Audition's docs (#665)
  • MS Triage (#666)
  • Fixed broken links (#675)
  • Fix Travis deploy [Resolves #493] (#677)
  • Fix logging typos that only show up when splits are empty (#685)
  • Fixes Postmodeling Weird Error [Resolves #691] (#693)
  • Don't auto-upgrade db for new Experiments [Resolves #695] (#698)
  • Check for capital letters with validator [Closes #632] (#701)
  • check for empty protected_df (#709)
  • Fixing dirtyduck (#720)
  • Update MANIFEST.in (#723)

General Improvements:

  • Read database connections from process environment (#605)
  • Scheduled monthly dependency update for March (#619)
  • Use compressed CSVs [Resolves #498] (#626)
  • Faster train/test task generation (#628)
  • Remove support for entity-only matrix indices [Resolves #477] (#622)
  • Enable dburl env var in results_schema CLI [Resolves #636] (#639)
  • Run validation by default [Closes #635] (#642)
  • Add feature_importance metric to SLR (#587)
  • Scheduled monthly dependency update for April (#664)
  • Remove redundant imputation flag columns [Resolves #544] (#676)
  • write 5+ GiB (matrices) to S3Store (#687)
  • Add more user database management options to CLI [Resolves #697] (#699)
  • Scheduled monthly dependency update for May (#679)
  • Kit and adolfos amazing adventure (aka experiment config defaults) Closes #717 (#719)
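Reading database connection settings from the process environment (#605, #639) could look something like the sketch below, which assumes the standard libpq variables PGHOST, PGPORT, PGUSER, PGPASSWORD and PGDATABASE. These notes do not say which variables triage actually reads, so treat both the variable names and the helper dburl_from_env as assumptions.

```python
import os
from urllib.parse import quote


def dburl_from_env(env=None) -> str:
    """Build a PostgreSQL URL from libpq-style environment variables (assumed)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env["PGUSER"], safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env["PGDATABASE"]
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{database}"
```

Passing an explicit mapping, as in dburl_from_env({"PGUSER": "triage", "PGDATABASE": "food"}), keeps the helper easy to test without touching the real environment.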

Refactoring/Documentation:

  • Broaden test coverage to CLI and postmodeling (#618)
  • Update model_group_performance.py (#650)
  • Upgrade ohio (#678)
  • Remove site dir (#686)
  • Bump experiment to v7 (#689)
  • Config doc (#694)
  • Repo readme (#682)
  • Added QuickStart guide to documentation

Arepa

20 Feb 00:28

New functionality:

  • Postmodel Analysis (#482)
  • Stores Timechop image to disk (#590)
  • Add matrix uuid to evaluations tables [Resolves #591] (#593)
  • Experiment Profiling [Resolves #557] (#558)

Bug fixes:

  • Postmodel fixes (#604)
  • Fixes #598 (#600)
  • Series equality operator [Resolves #563] (#564)
  • Fix MatrixStore memory leak [Resolves #594]
  • Fix empty/columns check on HDFStore [Resolves #589] (#592)
  • Fix upgrade_db to use filehandle [Resolves #572]
  • Fix FromObj.maybe_materialize [Resolves #565] (#566)
  • support 5 GB multipart upload threshold via S3Fs (#546)

General Improvements:

  • Scheduled monthly dependency update for February (#588)
  • Namespace cohort and labels tables by their config [Resolves #574] (#576)
  • Only Build Features for Cohort [Resolves #513] (#567)
  • Colocate Testing with Training [Resolves #560] (#569)
  • Upgrade PyYAML to current security-patched release
  • Skip Prediction Saving [Resolves #559]
  • Scheduled monthly dependency update for January (#562)
  • Materialize Subquery From Objects [Resolves #554] (#555)
  • Skip already-evaluated models [Resolves #540] (#541)
  • Throw warning if unscaled logit is used [Resolves #508] (#548)
  • support in develop script for detection of pyenv installed via Homebrew
  • upgrade install-cli to better support non-GNU (MacOS)
  • Cohort Generation respects replace flag [Resolves #503]

Refactoring/Documentation:

  • Add Audition, Postmodeling, Dirty Duck references to docs (#599)
  • audition_config file
  • Audition config correct (#601)
  • Experiment Architecture Doc [Resolves #579] (#580)
  • docs: make proper list of experiment upgrading links
  • Cohort and Label Deep Dive [Resolves #492] (#577)
  • Disable individual importance in example experiment config (#568)
  • Tweak language in running document

Flaming Hot Cheeto

10 Dec 16:49

New functionality:

  • Add additional feature group CV strategy (all-combinations) (#518)
  • Downcasting feature tables (#510)
  • Label Generation Replace Flag [Resolves #499]
  • Audition model group filter [Resolves #494] (#495)
  • development environment wizard (#511)
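The all-combinations feature group strategy (#518) presumably trains on every non-empty subset of the configured feature groups. A minimal sketch using itertools.combinations, assuming feature groups are identified by name; the function name all_combinations is hypothetical. The number of subsets grows as 2^n - 1, so 3 groups give 7 subsets and 10 groups give 1,023, which is worth keeping in mind before enabling it on a large config.

```python
from itertools import combinations
from typing import List, Tuple


def all_combinations(groups: List[str]) -> List[Tuple[str, ...]]:
    """Every non-empty subset of the feature groups, smallest subsets first."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]
```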

Bug Fixes:

  • Fix db engine check in Experiment [Resolves #538] (#539)
  • Allow >5GB matrices with S3 [Resolves #530]
  • refined test query to avoid unwarranted failure
  • Prevent experiment hanging when worker is killed by OS [Resolves #501] (#506)
  • develop script should install triage with the rq extra (#521)

General Improvements:

  • Scheduled monthly dependency update for December (#526)
  • Shorten log lines [Resolves #528]
  • Verbose config check (#483)
  • added pytest fixtures to simplify and clean up (architect) tests (#522)
  • Add HDF5 to CLI and doc [Resolves #496] (#497)

Refactoring/Documentation:

  • Move example yaml configuration files into subdirectory (#520)
  • Fix links in results_schema readme [Resolves #524] (#525)
  • Refactoring: Remove cohort options besides query [Resolves #504]

Tim Tam

02 Nov 19:55

  • Flip featuretest CLI arguments to match the doc [Resolves #486] (#487)
  • Scheduled monthly dependency update for November (#485)
  • Associate Experiment with all models and matrices [Resolves #411] (#476)
  • Clean up Session Closing in Predictor [Resolves #478]
  • Downcast matrices [Resolves #372]
  • Update Contribution Guide [Resolves #425]
  • Initial run of Black for code formatting

Introducing the CLI

05 Oct 16:45
a553361

Many changes here, largely related to introducing the Command Line Interface ported from DirtyDuck.

  • Refactor functionality to ExperimentBase class [Resolves #400]
  • Feature testing functionality [Resolves #420]
  • Storage refactoring, experiment CLI [Resolves #424]
  • Timechop visualization CLI [Resolves #437]