
Interactions #583

Closed
mayer79 opened this issue Nov 15, 2022 · 6 comments · Fixed by #677


mayer79 commented Nov 15, 2022

Fantastic project.

I would love to see the possibility to add interactions on the fly, just like in H2O. There, you can provide a list of interaction pairs or, alternatively, a list of columns for which all pairwise interactions are created.

This would be especially useful because scikit-learn preprocessing does not make it easy to create dummy encodings for a categorical X and then calculate their product with another feature. (At least not with neat code.)
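To illustrate the pain point, here is a minimal sketch of the manual workaround in plain pandas (the column names cat and x are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "b", "a", "c"], "x": [1.0, 2.0, 3.0, 4.0]})

# One-hot encode the categorical, then multiply each dummy column
# by the numeric feature to obtain the interaction terms.
dummies = pd.get_dummies(df["cat"], prefix="cat", dtype=float)
interactions = dummies.mul(df["x"], axis=0)
interactions.columns = [f"{c}:x" for c in dummies.columns]
```

This works, but it materializes a dense block of mostly-zero columns and has to be repeated by hand at predict time.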

@lorentzenchr
Contributor

@tbenthompson @lbittarello @jtilly Is there any official statement concerning this feature?

From my perspective, the inability to specify interaction terms is the largest blind spot of production-grade GLMs in Python.

@lbittarello
Member

@MartinStancsicsQC is looking into it in the context of this PR in tabmat. :)

@MartinStancsicsQC
Contributor

Hey @mayer79, @lorentzenchr, I'd be very interested to hear whether the formula interface proposed in #670 would fit your use cases for specifying interactions. You can also find some info in this tutorial, in addition to the PR itself.


mayer79 commented Aug 2, 2023

I 👍 this. The questions are: Is it efficient? (Interactions with dummies generate many zeros.) And: Is it safe to load a serialized model and use it to predict on unseen data?

@MartinStancsicsQC
Contributor

Good points. It should be efficient. For example, in the case of categorical-categorical interactions, the categories are never actually expanded to dummies. The new (categorical) variable representing the interaction is created directly from the category codes.1
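A hedged sketch of the idea (not tabmat's actual implementation): the interaction of two categoricals can be built directly from their integer codes, without ever materializing dummy columns:

```python
import numpy as np
import pandas as pd

c1 = pd.Categorical(["a", "b", "a", "b"])
c2 = pd.Categorical(["x", "x", "y", "y"])

# Combine the integer codes of the two categoricals into a single
# code for the interaction level; no dummies are created.
n2 = len(c2.categories)
combined_codes = c1.codes.astype(np.int64) * n2 + c2.codes
levels = [f"{a}:{b}" for a in c1.categories for b in c2.categories]
interaction = pd.Categorical.from_codes(combined_codes, categories=levels)
```

The result is again a categorical, so it can be stored in the same memory-efficient representation as its inputs.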

And yes, the model remains pickleable (there is a test for this on the tabmat side), and also keeps track of categorical levels2 so it can still predict correctly if there are missing/unseen levels in the new data.
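The level-tracking idea can be illustrated with plain pandas (this shows the general principle, not glum's internals): casting new data with the categories fixed at training time keeps the codes stable and flags unseen levels instead of silently shifting the encoding.

```python
import pandas as pd

train_levels = ["a", "b", "c"]  # levels observed at fit time

# New data contains an unseen level "d"; fixing the categories maps
# it to code -1 (NaN) rather than reordering the known codes.
new = pd.Categorical(["a", "d", "c"], categories=train_levels)
```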


1: More generally, we are not doing a multi-step pandas.DataFrame → (formulaic.model_matrix) → pandas.DataFrame → (tabmat.from_pandas) → tabmat.MatrixBase pipeline. Instead, a custom formulaic materializer subclass (tabmat.TabmatMaterializer) converts the pandas.DataFrame to a tabmat.MatrixBase directly, utilizing tabmat's strengths.

2: This feature is also a bit more general and works with a number of stateful transformations. E.g., if you use the scale function in a formula to normalize your predictors and then predict on new data, the new data will be normalized based on the mean and variance of the training data.
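The same stateful fit/transform pattern, sketched with scikit-learn's StandardScaler as a stand-in for a stateful scale(...) term in a formula: statistics are learned on the training data and reused, unchanged, on new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

# fit() remembers the training mean and variance; transform() on new
# data reuses those statistics rather than recomputing them.
scaler = StandardScaler().fit(X_train)
z = scaler.transform(X_new)
```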


mayer79 commented Aug 3, 2023

Wow, thanks a lot for the explanations. Really looking forward to this!
