Error on application startup if a column in sampled rows has standard deviation of 0 or NaN #318

Open
doinaleca opened this issue Mar 10, 2022 · 2 comments

Comments

@doinaleca

I am working with a very large dataset in which some float columns have low cardinality (they are not integer columns; think of float values with 90% missing data). When rows are sampled, the standard deviation of such a column within the sample is sometimes 0 or NaN, and the following code then raises an error:

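from shapash.explainer.smart_explainer import SmartExplainer

xpl = SmartExplainer()
xpl.compile(
    contributions=myContributions,
    x=myData,
    model=myEstimator,
)
# Launch the web app on a sample of 1000 rows
xpl.run_app(settings={'rows': 1000})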

image

The workaround of increasing the number of sampled rows to 10,000 or 20,000 simply crashes the app. Removing those columns from the data is not an option.
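To check which columns are affected in a given sample, something like the following could help. It is only a sketch in pure Python, not Shapash code: the helper names `sample_std` and `degenerate_columns` and the example data are made up for illustration, and it assumes missing values are represented as NaN and that the standard deviation is the sample (n-1) one.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation that ignores NaN; NaN when undefined."""
    present = [v for v in values if not math.isnan(v)]
    if len(present) < 2:
        return math.nan  # fewer than two observed values: std is undefined
    return statistics.stdev(present)


def degenerate_columns(columns):
    """Return the names of columns whose standard deviation is 0 or NaN.

    `columns` maps a column name to a list of floats (NaN marks missing data).
    """
    flagged = []
    for name, values in columns.items():
        std = sample_std(values)
        if math.isnan(std) or std == 0:
            flagged.append(name)
    return flagged


# Hypothetical sampled rows: names and values are invented for illustration.
sampled = {
    "all_missing": [math.nan, math.nan, math.nan, math.nan],
    "constant": [1.5, math.nan, 1.5, 1.5],
    "varying": [0.2, 0.9, math.nan, 0.4],
}
print(degenerate_columns(sampled))
```

This only reports the problematic columns; it does not drop them, since removing them is not acceptable here.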

Python version : 3.9.7

Shapash version : 1.6.1

Operating System : Windows

@wahab4114

I have the same issue and would like to know which sampling technique is used. I am worried that a random sample may not reflect the actual distribution shown in the plot(s).

@ThomasBouche
Collaborator

Hi,
The best way to adjust the number of rows is to change the parameter at the top right of the web app.
image
By default, the number is 2000.

The sampling technique is simply random.sample().
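For readers wondering what that implies for the distribution, here is a rough sketch. It assumes a uniform draw without replacement over the rows, which is what Python's `random.sample` does; how Shapash applies it internally is not shown in this thread. The helpers `sample_rows` and `missing_rate` and the example column are hypothetical.

```python
import random


def sample_rows(rows, n, seed=None):
    """Draw up to n rows uniformly at random, without replacement."""
    rng = random.Random(seed)
    if n >= len(rows):
        return list(rows)  # asking for more rows than exist: keep everything
    return rng.sample(rows, n)


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


# Hypothetical column with 90% missing values and a single distinct float.
column = [None] * 9000 + [0.5] * 1000
sample = sample_rows(column, 2000, seed=0)

# Compare the share of missing values in the full column and in the sample.
print(missing_rate(column), missing_rate(sample))
```

A uniform sample keeps overall proportions roughly intact on average, but in a sparse, low-cardinality column the few non-missing values that land in the sample can all be identical, which would give a standard deviation of 0, or NaN when fewer than two values are present.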
