Unique identifier column to nannyML datasets #348

Merged: 5 commits, Jan 18, 2024

Conversation

santiviquez

This PR adds an identifier column to every dataset shipped with nannyML OSS, so the datasets can be used directly in the cloud product when running examples/tutorials.
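
For readers who want to see what such a change amounts to, here is a minimal sketch using pandas. It is not the code from this PR: the helper name `add_identifier`, the column name `identifier`, and the choice of a 0-based running integer are all assumptions made for illustration.

```python
import pandas as pd


def add_identifier(df: pd.DataFrame, column: str = "identifier") -> pd.DataFrame:
    """Return a copy of df with a unique integer ID as the first column.

    Hypothetical helper: the column name and the 0-based numbering are
    assumptions, not the convention used by the actual datasets.
    """
    # Leave the frame alone if it already has such a column,
    # so the same dataset never ends up with a duplicate ID column.
    if column in df.columns:
        return df
    out = df.copy()
    # One distinct integer per row, placed in front of the existing columns.
    out.insert(0, column, list(range(len(out))))
    return out


# Toy data standing in for one of the bundled datasets.
example = pd.DataFrame({"feature": [0.1, 0.5, 0.9], "y_true": [0, 1, 1]})
with_id = add_identifier(example)
```

The early return makes the helper idempotent, which guards against the kind of duplicate column that one of the commits in this PR had to remove.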


codecov bot commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (20cc6f7) 83.05% compared to head (b9613f1) 83.05%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage   83.05%   83.05%           
=======================================
  Files         100      100           
  Lines        7554     7554           
  Branches     1351     1351           
=======================================
  Hits         6274     6274           
  Misses        956      956           
  Partials      324      324           


@nnansters nnansters merged commit 7b5b969 into NannyML:main Jan 18, 2024
7 checks passed
@santiviquez

Oh thanks for fixing the broken tests @nnansters 💜

nnansters added a commit that referenced this pull request Jan 22, 2024
* add unique ID column

* Remove duplicate 'identifier' column

* Fix broken tests

* isort changes

---------

Co-authored-by: Niels Nuyttens <niels@nannyml.com>
nnansters added a commit that referenced this pull request Jan 22, 2024
* First version of continuous distribution calculator working

* Refactor plotting to support drift results for alert specification

* Support running ContinuousDistributionCalculator in the Runner

* Fix pickling ContinuousDistributionCalculator

* Working version of CategoricalDistributionCalculator

* This is not how overload works.

* Getting index-based plots to work

* Support categorical distribution calculator in the runner

* Fix Flake8 & mypy

* Expose option to downscale resolution of individual joyplots for continuous distribution plots

* Expose cumulative density for KDE quartiles

* Use first point >= quartile instead of closest

* Updated default thresholds for Univariate Drift detection methods

* Fix broken ranker tests. This is why we do PR's kids.

* Fix linting

* Register summary stats in CLI runner (#353)

* Unique identifier column to nannyML datasets (#348)

* add unique ID column

* Remove duplicate 'identifier' column

* Fix broken tests

* isort changes

---------

Co-authored-by: Niels Nuyttens <niels@nannyml.com>

---------

Co-authored-by: Michael Van de Steene <michael@nannyml.com>
Co-authored-by: Michael Van de Steene <124588413+michael-nml@users.noreply.github.com>
Co-authored-by: Santiago Víquez <santi.viquez@gmail.com>