Modify our code to initialize woodwork with partial schemas #2744

Closed
freddyaboulton opened this issue Sep 3, 2021 · 3 comments · Fixed by #2774
Labels: good first issue, refactor

Comments

@freddyaboulton
Contributor

freddyaboulton commented Sep 3, 2021

Woodwork introduced the ability to initialize typing information with a partial schema in alteryx/woodwork#1100.

This means we can potentially get rid of _retain_custom_types_and_initalize_woodwork and retain more than just the logical types when a component modifies the input data in a way that invalidates the Woodwork schema.

For example, this is how the TargetEncoder would look:

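    # Rebuild a DataFrame from the transformed values, keeping the original labels
    X_t_df = pd.DataFrame(X_t, columns=X_ww.columns, index=X_ww.index)
    # Categorical columns were re-encoded, so their old typing no longer applies
    cat_columns = X_ww.ww.select("categorical", return_schema=True).columns
    columns = [c for c in X_t_df.columns if c not in cat_columns]
    # Keep the original schema for untouched columns; infer types on newly created columns
    # (original_schema is presumably X_ww.ww.schema, captured before the transform)
    X_t_df.ww.init(schema=original_schema._get_subset_schema(columns))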
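
To make the intended behavior concrete, here is a library-free toy model of partial-schema initialization: columns listed in the partial schema keep their recorded type, and every other column gets its type inferred. This is only a sketch of the idea, not Woodwork's implementation; `infer_type`, `init_with_partial_schema`, and the sample data are hypothetical names made up for illustration.

```python
from typing import Any, Dict, List


def infer_type(values: List[Any]) -> str:
    """Very rough stand-in for type inference (hypothetical, not Woodwork's logic)."""
    if all(isinstance(v, bool) for v in values):
        return "Boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "Integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Double"
    return "Categorical"


def init_with_partial_schema(
    table: Dict[str, List[Any]], partial_schema: Dict[str, str]
) -> Dict[str, str]:
    """Keep the type of every column named in partial_schema; infer the rest."""
    unknown = set(partial_schema) - set(table)
    if unknown:
        raise ValueError(f"Schema refers to missing columns: {sorted(unknown)}")
    return {
        col: partial_schema[col] if col in partial_schema else infer_type(values)
        for col, values in table.items()
    }


# Typing recorded before the transform (assumed sample data).
original_schema = {"age": "Integer", "zip": "PostalCode", "city": "Categorical"}

# Output of a target-encoder-like step: "city" is now numeric.
encoded = {
    "age": [31, 45, 27],
    "zip": ["02139", "10001", "94105"],
    "city": [0.25, 0.75, 0.5],
}

# Drop the re-encoded categorical columns from the schema, keep everything else.
cat_columns = [c for c, t in original_schema.items() if t == "Categorical"]
partial = {c: t for c, t in original_schema.items() if c not in cat_columns}

new_schema = init_with_partial_schema(encoded, partial)
print(new_schema)
```

In this model "zip" keeps its custom "PostalCode" type, which plain inference on the string values would lose, while "city" is re-inferred from its new numeric values.
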
@chukarsten
Contributor

As acceptance criteria for completing this issue, let's evaluate whether we can remove _retain_custom_types_and_initalize_woodwork and refactor using the new WW capabilities.

@chukarsten
Contributor

@freddyaboulton do we need to look at the fix in #2752 to refactor to use this as well?

chukarsten added the good first issue and refactor labels Sep 8, 2021
@freddyaboulton
Contributor Author

@chukarsten If you're asking whether we're going to have to revisit the Imputer as part of this refactor, then you're spot on. I think anywhere _retain_custom_types_and_initalize_woodwork is used is in scope for this issue!

3 participants