[Feature request] Customizable feature combinations #3

athewsey · 2022-08-22T08:50:40Z

Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:

I would like to be able to:

- Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6]) and define Factors on them
- Generate time-series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")

Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

It'd be helpful if we could instead have the option of customizing this join, to restrict which combinations are generated.
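To make the gap concrete, here is a minimal sketch in plain Python that does not call the library at all. It builds the full outer product of the two illustrative features and compares it with the combinations that are actually wanted. The `is_valid` rule and the choice that socks use S/M/L sizes are assumptions made up for this illustration.

```python
from itertools import product

# Illustrative feature values taken from the example above.
features = {
    "CATEGORY": ["footwear", "t-shirts", "socks"],
    "SIZE": ["S", "M", "L", "US-Mens-8", "US-Womens-6"],
}

# Full outer product: every category paired with every size.
all_combinations = [
    dict(zip(features, values)) for values in product(*features.values())
]

# Hypothetical validity rule (an assumption for this sketch):
# shoe sizes apply only to footwear, S/M/L to everything else.
SHOE_SIZES = {"US-Mens-8", "US-Womens-6"}


def is_valid(combo):
    uses_shoe_size = combo["SIZE"] in SHOE_SIZES
    return uses_shoe_size == (combo["CATEGORY"] == "footwear")


valid_combinations = [c for c in all_combinations if is_valid(c)]

print(len(all_combinations), len(valid_combinations))  # 15 8
```

Even in this tiny case almost half of the generated series would be thrown away, and the ratio gets worse as more features are added.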

Some options I can think of:

1. Leave the library as-is: Users generate full outer product and limit down what they want in post-processing
   - This seems possible already, but very RAM-intensive if your desired combinations are sparse?
2. Accept an optional dataframe of factor combinations as parameter to the generate() method
   - Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features
   - ...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe
3. Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations
   - Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
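As a rough idea of the validation the second option might need, here is a sketch in plain Python. It is not the library's API: `validate_combinations` is a hypothetical helper, and a list of dicts stands in for the rows of the proposed combinations dataframe. It rejects rows with missing or unknown columns, rejects values that were never declared for a feature, and drops exact duplicates.

```python
def validate_combinations(features, combinations):
    """Return de-duplicated rows; raise ValueError on unknown columns or values.

    `features` maps feature name -> list of declared values.
    `combinations` is an iterable of dicts, one per requested combination.
    (Hypothetical helper for illustration, not part of the library.)
    """
    expected = set(features)
    seen = set()
    cleaned = []
    for i, row in enumerate(combinations):
        # Every row must name exactly the declared features.
        if set(row) != expected:
            raise ValueError(
                f"row {i}: expected columns {sorted(expected)}, got {sorted(row)}"
            )
        # Every value must be one that was declared for its feature.
        for name, value in row.items():
            if value not in features[name]:
                raise ValueError(
                    f"row {i}: {value!r} is not a declared value of {name}"
                )
        # Keep the first occurrence of each combination only.
        key = tuple(row[name] for name in features)
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(row))
    return cleaned


features = {
    "CATEGORY": ["footwear", "t-shirts", "socks"],
    "SIZE": ["S", "M", "L", "US-Mens-8", "US-Womens-6"],
}

requested = [
    {"CATEGORY": "footwear", "SIZE": "US-Mens-8"},
    {"CATEGORY": "t-shirts", "SIZE": "M"},
    {"CATEGORY": "t-shirts", "SIZE": "M"},  # duplicate, dropped
]

checked = validate_combinations(features, requested)
print(len(checked))  # 2
```

A generator accepting such a table would then loop over the checked rows instead of the full product, so memory use scales with the number of requested combinations rather than with the product of all feature sizes.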


ymwdalex · 2022-08-24T07:38:55Z

@athewsey thanks for your great feature request, and some implementation suggestions! Personally, I like option 3, but as you said, it is not an easy one to make general.

However, I have been quite busy recently and will not have time to work on it in the next few months. Feel free to improve the package if you have ideas and time.
