[BUG] Format percentage in new category output #1766

Closed
shir22 opened this issue Jul 14, 2022 · 9 comments · Fixed by #1860
@shir22
Contributor

shir22 commented Jul 14, 2022

Describe the bug
[Screenshot: CleanShot 2022-07-14 at 10 38 05@2x]

See "percent of new category in sample"

To Reproduce
Use the following dataset:
https://github.com/AllonHammer/CPI_HRNN/blob/master/resources/cpi_us_dataset.csv
Dataset definition:
ds = Dataset(df, datetime_name='Date', label='Price', cat_features=['Category_id', 'Category', 'Indent', 'Parent', 'Parent_ID'])
Use the first 40000 samples as the train set and the rest as the test set.
Then run the relevant checks (or the train-test-validation suite).
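
For readers unfamiliar with the metric, "percent of new category in sample" can be read as the share of test rows whose category value never appears in train. The sketch below illustrates that reading in plain Python with made-up data. The helper name `new_category_share` and the sample values are hypothetical, and this is an assumption about what the metric means, not the deepchecks implementation.

```python
def new_category_share(train_values, test_values):
    """Return the fraction of test rows whose category is absent from train."""
    known = set(train_values)
    test_values = list(test_values)
    if not test_values:
        # No test rows: define the share as 0.0 to avoid dividing by zero.
        return 0.0
    unseen = sum(1 for value in test_values if value not in known)
    return unseen / len(test_values)


# Made-up data: one unseen category ("z") among 5000 test rows.
train = ["a", "b", "c"] * 10
test = ["a"] * 4999 + ["z"]

share = new_category_share(train, test)
print(f"{share:.2%}")
```

A share this small is exactly the kind of value that looks like zero when it is printed as a plain two-decimal number.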

@shir22 shir22 added the bug label Jul 14, 2022
@shir22 shir22 added this to the Copernicus milestone Jul 14, 2022
@github-actions github-actions bot added the needs triage Issue needs to be labeled and prioritized label Jul 14, 2022
@noamzbr noamzbr removed the needs triage Issue needs to be labeled and prioritized label Jul 14, 2022
@kishore-s-15
Contributor

@shir22 Could you mention the steps to reproduce this issue?

@TheSolY
Contributor

TheSolY commented Jul 17, 2022

@shir22 If I understood correctly, the issue is that the conditions summary shows "0.02%" but the additional outputs show "0.00" and they should show the same number. Which dataset did you use?

@shir22
Contributor Author

shir22 commented Jul 17, 2022

Yes, indeed, @TheSolY. Specifically, for consistency, the additional outputs should use the same formatting function that is used for "More Info" in the Conditions Summary table.

@kishore-s-15 About the dataset and the steps to reproduce: I added the specific steps to the edited issue description.
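
The mismatch discussed above ("0.02%" in the conditions summary versus "0.00" in the additional outputs) is what one would expect when the same small ratio goes through two different format specs. The sketch below uses only Python's built-in string formatting. The ratio `0.0002` is an assumption inferred from the "0.02%" figure, and the helper names are hypothetical, not deepchecks functions.

```python
def as_fixed(ratio: float) -> str:
    """Format the raw ratio with two decimals (no percent conversion)."""
    return f"{ratio:.2f}"


def as_percent(ratio: float) -> str:
    """Format the ratio as a percentage with two decimals."""
    return f"{ratio:.2%}"


ratio = 0.0002  # assumed value, chosen to match the "0.02%" in the summary

print(as_fixed(ratio))    # 0.00
print(as_percent(ratio))  # 0.02%
```

Routing both displays through one shared formatter, as requested in this thread, would keep the two numbers consistent.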

@TheSolY TheSolY removed their assignment Jul 17, 2022
@shir22
Contributor Author

shir22 commented Jul 18, 2022

@kishore-s-15 would you like to be assigned to this issue?

@kishore-s-15
Contributor

@shir22 @noamzbr Sure.

@noamzbr noamzbr assigned kishore-s-15 and unassigned TheSolY Jul 19, 2022
@noamzbr
Collaborator

noamzbr commented Jul 19, 2022

Granted, and much appreciated!

@kishore-s-15
Contributor

@shir22 Could you provide the code to reproduce the above error?

import pandas as pd

from deepchecks.tabular.dataset import Dataset
from deepchecks.suites import train_test_validation

df = pd.read_csv("./cpi_us_dataset.csv")

train_df = df.iloc[:40000, :]
test_df = df.iloc[40000:, :]

train_ds = Dataset(train_df, datetime_name='Date', label='Price',
        cat_features=['Category_id', 'Category', 'Indent', 'Parent', 'Parent_ID'])

test_ds = Dataset(test_df, datetime_name='Date', label='Price',
        cat_features=['Category_id', 'Category', 'Indent', 'Parent', 'Parent_ID'])

suite = train_test_validation()
suite.run(train_ds, test_ds)

I used the above code but was not able to reproduce the issue.

@shir22
Contributor Author

shir22 commented Jul 24, 2022

Can you share a screenshot of the Category Mismatch Test output?
I just ran your code, and this was the output. As in the original description, it shows 0.00.

[Screenshot: CleanShot 2022-07-24 at 18 51 55]

@kishore-s-15
Contributor

My bad, I ran the code as a script file instead of a notebook file. Got the same output now.
