Partial Dependence can raise TypeConversionError if non-integer grid values are produced for IntegerNullable column #4095

Closed
tamargrey opened this issue Mar 21, 2023 · 0 comments · Fixed by #4096
Labels
bug Issues tracking problems with existing features.

Currently, when calculating partial dependence for a column with the Integer logical type, we get the grid values with _grid_from_X. If the column has more unique values than the grid resolution, the grid values are calculated with scipy.stats.mstats.mquantiles, which can introduce fractional values. With the Integer logical type, we can set a column with fractional values to the int64 dtype and pandas will truncate the values. Our IntegerNullable logical type, which uses the Int64 dtype, does not allow that. Because of this, we'll need to make a change to allow partial dependence on IntegerNullable columns. (I don't view this as an integer nullable incompatibility, since I think pandas is probably right not to silently truncate your data.)
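The dtype difference described above can be reproduced with pandas and scipy alone. The following is a minimal sketch, not the actual body of _grid_from_X: the 5th/95th percentile bounds and the use of np.linspace to build the grid are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import mquantiles

# 50 unique integer values, more than the grid resolution of 10.
values = np.arange(50)

# Assumed grid construction: evenly spaced points between two empirical
# quantiles. The quantiles are interpolated, so they are usually fractional.
lower, upper = np.asarray(mquantiles(values, prob=[0.05, 0.95]))
grid = np.linspace(lower, upper, num=10)

# int64 (used by the Integer logical type): fractional parts are dropped silently.
truncated = pd.Series(grid).astype("int64")

# Int64 (used by the IntegerNullable logical type): pandas refuses the lossy cast.
nullable_cast_failed = False
try:
    pd.Series(grid).astype("Int64")
except (TypeError, ValueError) as err:
    nullable_cast_failed = True
    print(f"Int64 cast failed: {err}")
```

In partial dependence the failure surfaces as a TypeConversionError rather than the raw pandas exception, since the cast happens while the logical types are re-applied.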

Code Sample, a copy-pastable example to reproduce your bug.

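    import pandas as pd
    import woodwork as ww
    from evalml.model_understanding import partial_dependence

    y = ww.init_series(pd.Series([True, False]*25), logical_type="Boolean")
    X = pd.DataFrame({
        "col": pd.Series(range(len(y)))
    })
    X.ww.init(logical_types={"col": "IntegerNullable"})

    # logistic_regression_binary_pipeline is a fixture from the test suite
    pipeline = logistic_regression_binary_pipeline

    pipeline.fit(X, y)
    # 50 unique values > grid_resolution, so fractional grid values are produced
    partial_dependence(
        pipeline,
        X,
        grid_resolution=10,
        features="col",
    )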

We will need to handle the IntegerNullable case, and while we're doing that, we should make the handling of fractional values for Integer columns explicit by rounding instead of letting the values get truncated.
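As a rough illustration of the proposed handling (a sketch only, not necessarily what the linked fix does), rounding the grid values before casting lets both dtypes accept them and makes the conversion explicit. The grid values below are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical fractional grid values, standing in for the quantile-based grid.
grid = np.array([1.91, 6.93, 11.95])

# Implicit truncation, which is what int64 currently does to the values.
truncated = pd.Series(grid).astype("int64")

# Explicit rounding first: the floats are now whole numbers, so the cast is
# lossless and works for both int64 and the nullable Int64 dtype.
rounded_int = pd.Series(np.round(grid)).astype("int64")
rounded_nullable = pd.Series(np.round(grid)).astype("Int64")

print(truncated.tolist())         # [1, 6, 11]
print(rounded_nullable.tolist())  # [2, 7, 12]
```

Note that np.round rounds exact halves to the nearest even integer, which may or may not matter for grid values.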
