Categorical (binary) column incorrectly treated as continuous for Univariate Drift Detection #171

nikml · 2022-12-08T17:24:23Z

Describe the bug
The binary predictions (y_pred) from the synthetic binary classification dataset are treated as continuous rather than categorical.

To Reproduce
Steps to reproduce the behavior:
Run the Univariate Drift example notebook from which the documentation is generated.
y_pred is treated as continuous instead of categorical.

Expected behavior
The y_pred column would be treated as categorical.

Screenshots & scripts
The variable is present in the continuous drift results for v0.8.1:
https://nannyml.readthedocs.io/en/v0.8.1/_images/drift-guide-continuous.svg


nnansters · 2022-12-14T19:03:19Z

Hey Nikos,

this behavior is correct. The columns are designated by NannyML as continuous or categorical in the base module.

You are right, however, that this is not the expected behavior given the example in the docs. This can be fixed by explicitly setting the y_pred column as categorical. I'll update this in the documentation.

import nannyml as nml

# Mark the binary prediction column as categorical in both datasets
reference_df['y_pred'] = reference_df['y_pred'].astype("category")
analysis_df['y_pred'] = analysis_df['y_pred'].astype("category")

column_names = [
    'distance_from_office', 'salary_range', 'gas_price_per_litre',
    'public_transportation_cost', 'wfh_prev_workday', 'workday', 'tenure',
    'y_pred_proba', 'y_pred',
]
calc = nml.UnivariateDriftCalculator(
    column_names=column_names,
    timestamp_column_name='timestamp',
    continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
    categorical_methods=['chi2', 'jensen_shannon'],
)
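
To illustrate why the explicit cast matters, here is a minimal pandas-only sketch. It assumes the continuous/categorical split follows the column dtype; the `split_by_dtype` helper and the toy values are hypothetical and are not NannyML's actual implementation.

```python
import pandas as pd

# Toy frame standing in for the tutorial data; the values are made up.
df = pd.DataFrame({
    "salary_range": ["0-20K", "20K-40K", "0-20K", "40K-60K"],
    "y_pred_proba": [0.12, 0.87, 0.45, 0.91],
    "y_pred": [0, 1, 0, 1],
})


def split_by_dtype(frame):
    """Hypothetical helper (not NannyML code): numeric dtypes count as continuous."""
    continuous = [c for c in frame.columns if pd.api.types.is_numeric_dtype(frame[c])]
    categorical = [c for c in frame.columns if c not in continuous]
    return continuous, categorical


# A 0/1 integer column is numeric, so a dtype-based rule files it under continuous.
before = split_by_dtype(df)

# After the explicit cast the dtype is 'category' and the column changes group.
df["y_pred"] = df["y_pred"].astype("category")
after = split_by_dtype(df)

print(before)
print(after)
```

Under that assumption, y_pred moves from the continuous group to the categorical group only after the cast, which matches the workaround above.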


nikml · 2022-12-14T21:07:01Z

I looked a bit further into this. Quickstart is also affected. And actually the issue was introduced in version 0.7.0 when we removed the StatisticalOutputDriftCalculator.

So we should also fix that and see if documentation needs a little more polishing.

* Many Updates to Univariate Drift Comparison
* Update Univariate Drift Tutorial
* Update Readme, fixing incorrect images for drift
* Remove unneeded drift images
* Fix PCA How it works page showing outdated code.
* Fix realized regression performance docs and relevant readme plot
* Remove unneeded realized performance images
* Fix quickstart re #171

Co-authored-by: cartgr <carterblair@uvic.ca>
Co-authored-by: Jakub Bialek <jakub@nannyml.com>

nikml · 2022-12-15T19:59:48Z

Closing as quickstart also received a hot fix - we can polish the docs later.

nikml added the bug and triage labels Dec 8, 2022

nikml assigned nnansters Dec 8, 2022

nnansters added the documentation label and removed the bug and triage labels Dec 14, 2022

nnansters added a commit that referenced this issue Dec 14, 2022

Docs: update incorrect interpretation of categorical column (#171)

fc0a753

Signed-off-by: niels <niels@nannyml.com>

nikml mentioned this issue Dec 15, 2022

Update NannyML documentation #170

Merged

nikml closed this as completed Dec 15, 2022
