Error on application startup if a column in sampled rows has standard deviation of 0 or NaN #318

doinaleca · 2022-03-10T08:56:55Z

I am using a very large dataset in which some float columns have low cardinality (they are not integer columns; think of float values with 90% missing data). When rows are sampled, the standard deviation of such a column within the sample is sometimes 0 or NaN, and the following code then raises an error:

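from shapash.explainer.smart_explainer import SmartExplainer

# myContributions, myData and myEstimator are my own objects (not shown here)
xpl = SmartExplainer()
xpl.compile(
    contributions=myContributions,
    x=myData,
    model=myEstimator,
)
# Start the web app on a sample of 1000 rows
xpl.run_app(settings={'rows': 1000})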

The workaround of increasing the number of sampled rows to 10,000 or 20,000 simply crashes the app. Removing those columns from the data is not an option.
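
To see which columns are affected for a given sample size, something like the following could be run before starting the app. This is only a minimal sketch assuming pandas: `degenerate_columns` is a hypothetical helper, not part of Shapash, and `DataFrame.sample` stands in for whatever sampling the app performs internally.

```python
import numpy as np
import pandas as pd


def degenerate_columns(df: pd.DataFrame, n_rows: int, seed: int = 0) -> list:
    """Return numeric columns whose std is 0 or NaN in a random sample of n_rows.

    Hypothetical diagnostic helper; it does not reproduce Shapash internals.
    """
    sample = df.sample(n=min(n_rows, len(df)), random_state=seed)
    # pandas skips NaN values by default; a column with fewer than two
    # observed values in the sample gets a NaN standard deviation.
    std = sample.select_dtypes(include="number").std()
    return std[std.isna() | (std == 0)].index.tolist()


# Toy data: "sparse" is a float column with about 90% missing values
# and a single distinct observed value.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "dense": rng.normal(size=1000),
    "sparse": np.where(rng.random(1000) < 0.9, np.nan, 1.5),
})
print(degenerate_columns(data, n_rows=100))
```

On the toy data only the sparse column is reported, because its observed values are all identical (std 0) or too few to compute a standard deviation (NaN).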

Python version : 3.9.7

Shapash version : 1.6.1

Operating System : Windows


wahab4114 · 2022-03-15T09:02:52Z

I have the same issue and would like to know which sampling technique is used. I am worried that a random sample may not reflect the actual distribution shown in the plot(s).

ThomasBouche · 2022-03-15T13:00:01Z

Hi,
The best way to adjust the number of rows is to change the parameter at the top right of the web app.

By default, the number is 2000.

The sampling technique is just a call to random.sample().
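
To illustrate why plain uniform sampling can produce the reported 0 or NaN standard deviation, here is a minimal standard-library sketch. It assumes row positions are drawn with Python's `random.sample`; the helper names `sample_rows` and `sample_std` are hypothetical, and the way Shapash actually calls the sampler is not shown in this thread.

```python
import math
import random
import statistics


def sample_rows(n_total: int, n_rows: int, seed: int = 0) -> list:
    """Draw n_rows distinct row positions uniformly at random (no stratification)."""
    rng = random.Random(seed)
    return rng.sample(range(n_total), min(n_rows, n_total))


def sample_std(values: list, positions: list) -> float:
    """Sample standard deviation of the non-missing values at the given positions."""
    present = [values[i] for i in positions if not math.isnan(values[i])]
    # With fewer than two observed values the sample std is undefined.
    return statistics.stdev(present) if len(present) >= 2 else float("nan")


# Toy column: 1000 rows, only 5 of them observed, all with the same value.
column = [float("nan")] * 1000
for i in (10, 200, 450, 700, 990):
    column[i] = 2.5

positions = sample_rows(len(column), n_rows=100)
print(sample_std(column, positions))  # either nan or 0.0 for this column
```

Because the draw is uniform over rows, a mostly-missing or near-constant column is not guaranteed to contribute at least two distinct values to the sample, whatever the sample size.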
