[Bug] ValueError caused by column with nan values #554

Closed
tswsxk opened this issue May 23, 2024 · 3 comments
@tswsxk

tswsxk commented May 23, 2024

When using shapash with the following code:

xpl = SmartExplainer(
    model=model,
)

xpl.compile(
    x=test_df,
    ...
)

it eventually reaches the SmartApp class (init_data in smart_app.py, per the traceback below), where the following code computes the std of each float column (lines 192-199):

        for col in list(self.dataframe.columns):
            typ = self.dataframe[col].dtype
            if typ == float:
                std = self.dataframe[col].std()
                if std != 0:
                    digit = max(round(log10(1 / std) + 1) + 2, 0)
                    self.round_dataframe[col] = self.dataframe[col].map(f"{{:.{digit}f}}".format).astype(float)

However, when a column contains only nan values, std is nan. Since nan != 0 evaluates to True, the check passes and the following line is executed:

                    digit = max(round(log10(1 / std) + 1) + 2, 0)

which results in a ValueError:

File "xxx/.local/lib/python3.10/site-packages/shapash/webapp/smart_app.py", line 197, in init_data
    digit = max(round(log10(1 / std) + 1) + 2, 0)
ValueError: cannot convert float NaN to integer
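
A minimal sketch of why the check above lets nan through, using only the standard library. The helper names `unguarded_digits` and `digits_for_std` are hypothetical and are not part of shapash; the guard shown is one possible way to avoid the error, not necessarily how the library handles it.

```python
import math
from math import log10


def unguarded_digits(std):
    # Same expression as in the snippet above; nan != 0 is True, so a nan std
    # reaches round(), which cannot convert nan to an integer.
    if std != 0:
        return max(round(log10(1 / std) + 1) + 2, 0)
    return None


def digits_for_std(std):
    # Hypothetical guard: skip rounding when std is nan, infinite, or zero.
    if not math.isfinite(std) or std == 0:
        return None
    return max(round(log10(1 / std) + 1) + 2, 0)


try:
    unguarded_digits(float("nan"))
except ValueError as err:
    print("unguarded:", err)

print("guarded:", digits_for_std(float("nan")))
```

With the guard, a column whose std is nan is simply left unrounded instead of raising.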

Python version : python3.10

Shapash version : shapash-2.5.0

Operating System : CentOS Linux release 8.2.2.2004

@guillaume-vignal
Collaborator

Thank you for reporting this issue. If I understand correctly, your column contains only nan values, is that right?
The code should not fail if a column contains just some nan values.
Can you tell us why such a column is used in your model? We don't see the use case.
In any case, we will look at your PR and see the best way to tackle this issue.

@tswsxk
Author

tswsxk commented Jun 13, 2024

Hi there. Usually, all-nan columns can be removed during data preprocessing. However, in my case, the data consists of several time series, and one column contains only nan values before a specific date. When I run experiments, I need to split the data by date for training, and the all-nan values in this column before that date cause this error.
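
The scenario described above can be sketched with a small, made-up DataFrame (the column names, dates, and values are assumptions for illustration only). It shows that a column with some nan values still has a finite std, because pandas skips nan by default, while the slice before the cutoff date is all nan and yields a nan std.

```python
import numpy as np
import pandas as pd

# Hypothetical time series: "feature" has no values before 2024-01-04.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "feature": [np.nan, np.nan, np.nan, 1.0, 2.0, 3.0],
})

cutoff = pd.Timestamp("2024-01-04")
early = df[df["date"] < cutoff]   # "feature" is all nan in this split

full_std = df["feature"].std()      # nan values are skipped, std is finite
early_std = early["feature"].std()  # nothing left to compute on, std is nan

# One possible workaround: drop columns that are entirely nan in the split.
early_clean = early.dropna(axis=1, how="all")

print(full_std, early_std, list(early_clean.columns))
```

Dropping the all-nan column from the split is only appropriate if the model does not expect that feature for the rows in question.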

@guillaume-vignal
Collaborator

It has been fixed in version 2.6.0 of shapash (#553).
