[Feature request] Customizable feature combinations #3

Open
athewsey opened this issue Aug 22, 2022 · 1 comment

@athewsey

Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:

I would like to be able to:

  • Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6]) and define Factors on them
  • Generate time-series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")

Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

It'd be helpful if we could instead have the option of customizing this join, to limit which combinations are generated.
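
To make the gap concrete, here is a minimal sketch in plain Python. It does not call the library at all; it only counts combinations for the illustrative feature values above. The `valid_sizes` mapping is a hypothetical stand-in for "which sizes apply to which category", and the sizes assigned to socks are an assumption made for the example.

```python
from itertools import product

# Illustrative feature values from the example above.
categories = ["footwear", "t-shirts", "socks"]
sizes = ["S", "M", "L", "US-Mens-8", "US-Womens-6"]

# Full outer product: every category paired with every size.
full = list(product(categories, sizes))

# Hypothetical mapping of which sizes make sense for each category
# (the entry for socks is an assumption for this sketch).
valid_sizes = {
    "footwear": {"US-Mens-8", "US-Womens-6"},
    "t-shirts": {"S", "M", "L"},
    "socks": {"S", "M", "L"},
}

# Restricted set: keep only the pairs that the mapping allows.
restricted = [(c, s) for c, s in full if s in valid_sizes[c]]

print(len(full), len(restricted))  # prints "15 8"
```

Even in this tiny case almost half of the product is unwanted, and the ratio gets worse as more features are added.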

Some options I can think of:

  1. Leave the library as-is: Users generate full outer product and limit down what they want in post-processing
    • This seems possible already, but very RAM-intensive if your desired combinations are sparse?
  2. Accept an optional dataframe of factor combinations as parameter to the generate() method
    • Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features
    • ...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe
  3. Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations
    • Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
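
As a rough sketch of the validation that option 2 might need, assuming pandas and nothing from the library itself: `validate_combinations` is a hypothetical helper, not an existing function, and how its result would be passed to generate() is left open.

```python
import pandas as pd


def validate_combinations(combos: pd.DataFrame, features: dict) -> pd.DataFrame:
    """Check a user-supplied table of feature combinations (hypothetical helper)."""
    # Every configured feature must appear as a column.
    missing = set(features) - set(combos.columns)
    if missing:
        raise ValueError(f"Missing feature columns: {sorted(missing)}")
    # Every value in a column must be a known value of that feature.
    for name, allowed in features.items():
        unknown = set(combos[name]) - set(allowed)
        if unknown:
            raise ValueError(f"Unknown values for {name!r}: {sorted(unknown)}")
    # Drop duplicate rows so each combination appears only once.
    return combos[list(features)].drop_duplicates().reset_index(drop=True)


features = {
    "CATEGORY": ["footwear", "t-shirts", "socks"],
    "SIZE": ["S", "M", "L", "US-Mens-8", "US-Womens-6"],
}

# "t-shirt sizes for t-shirts, shoe sizes for footwear"
combos = pd.DataFrame({
    "CATEGORY": ["footwear", "footwear", "t-shirts", "t-shirts", "t-shirts"],
    "SIZE": ["US-Mens-8", "US-Womens-6", "S", "M", "L"],
})

valid = validate_combinations(combos, features)
print(valid)
```

A flat table like this does not assume any hierarchy between features, so intersecting views (brand, type, target segment) would just be extra columns.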
@ymwdalex
Collaborator

@athewsey thanks for your great feature request, and some implementation suggestions! Personally, I like option 3, but as you said, it is not an easy one to make general.

However, I have been quite busy recently and will not have time to work on it in the next few months. Feel free to improve the package if you have ideas and time.
