Unique identifier column to nannyML datasets #348

santiviquez · 2023-12-14T12:22:45Z

This PR adds a unique identifier column to every dataset shipped with NannyML OSS, so the datasets can be used directly in the cloud product when running examples and tutorials.
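For readers who want to see what such a change looks like on a single dataset, here is a minimal sketch using pandas. It is not the code from this PR: the helper name `add_identifier_column` and the default column name `id` are assumptions made for illustration, and the actual column name used in the NannyML datasets may differ. The duplicate check mirrors the kind of problem the "Remove duplicate 'identifier' column" commit addresses.

```python
import pandas as pd


def add_identifier_column(df: pd.DataFrame, column_name: str = "id") -> pd.DataFrame:
    """Return a copy of `df` with a unique integer identifier as the first column.

    `column_name` is a hypothetical default, not necessarily the name used
    in the NannyML datasets.
    """
    # Refuse to add a second column with the same name.
    if column_name in df.columns:
        raise ValueError(f"column '{column_name}' already exists")

    result = df.copy()
    # One identifier per row, assigned by position and independent of the index.
    result.insert(0, column_name, list(range(len(result))))
    return result


if __name__ == "__main__":
    example = pd.DataFrame({"feature": [0.1, 0.2, 0.3]})
    print(add_identifier_column(example))
```

Returning a copy keeps the original frame untouched, which makes the helper safe to call on a dataset that is reused elsewhere.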

codecov · 2024-01-18T21:37:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (20cc6f7) 83.05% compared to head (b9613f1) 83.05%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage   83.05%   83.05%           
=======================================
  Files         100      100           
  Lines        7554     7554           
  Branches     1351     1351           
=======================================
  Hits         6274     6274           
  Misses        956      956           
  Partials      324      324


santiviquez · 2024-01-18T21:38:59Z

Oh thanks for fixing the broken tests @nnansters 💜

* add unique ID column
* Remove duplicate 'identifier' column
* Fix broken tests
* isort changes

---------

Co-authored-by: Niels Nuyttens <niels@nannyml.com>

* First version of continuous distribution calculator working
* Refactor plotting to support drift results for alert specification
* Support running ContinuousDistributionCalculator in the Runner
* Fix pickling ContinuousDistributionCalculator
* Working version of CategoricalDistributionCalculator
* This is not how overload works.
* Getting index-based plots to work
* Support categorical distribution calculator in the runner
* Fix Flake8 & mypy
* Expose option to downscale resolution of individual joyplots for continuous distribution plots
* Expose cumulative density for KDE quartiles
* Use first point >= quartile instead of closest
* Updated default thresholds for Univariate Drift detection methods
* Fix broken ranker tests. This is why we do PR's kids.
* Fix linting
* Register summary stats in CLI runner (#353)
* Unique identifier column to nannyML datasets (#348)
  * add unique ID column
  * Remove duplicate 'identifier' column
  * Fix broken tests
  * isort changes

  ---------

  Co-authored-by: Niels Nuyttens <niels@nannyml.com>

---------

Co-authored-by: Michael Van de Steene <michael@nannyml.com>
Co-authored-by: Michael Van de Steene <124588413+michael-nml@users.noreply.github.com>
Co-authored-by: Santiago Víquez <santi.viquez@gmail.com>

add unique ID column

d906659

santiviquez requested review from nnansters and nikml as code owners December 14, 2023 12:22

nnansters added 4 commits January 17, 2024 23:34

Merge branch 'main' into fork/id_column_datasets

39587f8

Remove duplicate 'identifier' column

3e1c1c1

Fix broken tests

5c390b9

isort changes

b9613f1

nnansters approved these changes Jan 18, 2024


nnansters merged commit 7b5b969 into NannyML:main Jan 18, 2024
7 checks passed

nnansters added a commit that referenced this pull request Jan 22, 2024

Unique identifier column to nannyML datasets (#348)

eaef148

