Interactions #583
Comments
@tbenthompson @lbittarello @jtilly Is there any official statement concerning this feature? From my perspective, being able to specify interaction terms is the largest blind spot of production-grade GLMs in Python.
@MartinStancsicsQC is looking into it in the context of this PR in
Hey @mayer79, @lorentzenchr, I'd be very interested to know whether the formula interface proposed in #670 would fit your use cases for specifying interactions. You can also find some info in this tutorial instead of the PR itself.
I 👍. The question is: is it efficient? (Interactions with dummies generate many zeros.) And: is it safe to load a serialized model and use it to predict on unseen data?
Good points. It should be efficient. For example, in the case of categorical-categorical interactions, it never actually expands them to dummies. The new (categorical) variable representing the interaction is created directly from category codes. [1] And yes, the model remains pickleable (there is a test for this on the tabmat side), and it also keeps track of categorical levels [2] so it can still predict correctly if there are missing/unseen levels in the new data.
[1]: More generally, we are not doing a
[2]: This feature is also a bit more general, and works with a number of stateful transformations. E.g., if you use the
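To illustrate the idea of building an interaction straight from category codes, here is a minimal sketch in plain NumPy. It is not the tabmat implementation; the helper name `interact_codes` and the example columns are hypothetical, and it assumes both inputs are already integer-coded with known level counts.

```python
import numpy as np


def interact_codes(codes_a, n_levels_a, codes_b, n_levels_b):
    """Combine two integer-coded categoricals into one interaction categorical.

    Hypothetical helper for illustration only. Returns the combined codes
    and the number of levels of the combined variable.
    """
    codes_a = np.asarray(codes_a)
    codes_b = np.asarray(codes_b)
    # Each (a, b) pair maps to a unique code in [0, n_levels_a * n_levels_b),
    # so no dummy matrix is ever materialized.
    combined = codes_a * n_levels_b + codes_b
    return combined, n_levels_a * n_levels_b


# Assumed toy data: "region" has 3 levels, "fuel" has 2 levels.
region = np.array([0, 1, 1, 2, 0])
fuel = np.array([1, 0, 1, 1, 0])

combined, n_levels = interact_codes(region, 3, fuel, 2)
print(combined.tolist(), n_levels)
```

The result is a single vector of codes with at most 6 levels, which can be stored just like any other categorical column instead of a mostly-zero dense block.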
Wow, thanks a lot for the explanations. Really looking forward to this!
Fantastic project.
I would love to see the possibility to add interactions on the fly, just like in H2O. There, you can provide a list of interaction pairs or, alternatively, a list of columns among which all pairwise interactions are formed.
This would be especially useful because scikit-learn preprocessing does not make it easy to create dummy encodings for a categorical X and then multiply them with another feature. (At least not with neat code.)
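For reference, a rough sketch of what the manual workaround looks like today with pandas. The helper `add_interactions`, the `pairs` argument, and the column names are made up for illustration and are not part of this project or of scikit-learn; each pair is assumed to be (categorical column, numeric column).

```python
import pandas as pd


def add_interactions(df, pairs):
    """Append dummy-times-numeric interaction columns for each pair.

    Hypothetical helper: `pairs` is a list of (categorical, numeric)
    column-name tuples.
    """
    out = df.copy()
    for cat_col, num_col in pairs:
        # One 0/1 column per level of the categorical.
        dummies = pd.get_dummies(df[cat_col], prefix=cat_col, dtype=float)
        # Multiply every dummy column row-wise by the numeric feature.
        inter = dummies.mul(df[num_col], axis=0)
        inter.columns = [f"{c}:{num_col}" for c in inter.columns]
        out = pd.concat([out, inter], axis=1)
    return out


df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"], "age": [3.0, 5.0, 1.0]})
X = add_interactions(df, [("fuel", "age")])
print(X.columns.tolist())
```

This works, but it expands everything into dense dummy columns and does not remember the category levels for prediction on new data, which is why built-in support would be nicer.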