Better single entity API #1000

Open
kmax12 opened this issue May 29, 2020 · 1 comment
Labels
needs design Issues requiring design documentation. new feature suggestions for new functionality

Comments

@kmax12
Contributor

kmax12 commented May 29, 2020

If I have a single entity, it'd be great if I could just initialize an Entity and then pass the entity to DFS. We have a lot of users who only have a single table, so an improvement that streamlines the API in this case would help a lot.

Potential API

entity = ft.Entity(
    entity_id,  # optional
    dataframe,
    variable_types=variable_types,
    index=index,
    time_index=time_index,
    secondary_time_index=secondary_time_index,
    make_index=make_index,
)

ft.dfs(entity, cutoff_time, trans_primitives)

Quick thoughts on how to implement

  • Update the Entity API so that it does not require an entityset as a param
  • Move the methods on Entity that require the entityset to EntitySet
  • Update dfs and calculate_feature_matrix to convert a single entity into an entityset and then run as normal.
    • Maybe disable some arguments to dfs that don't make sense in the single-table case
    • Before implementing, we can discuss whether we should just define a new method instead of using DFS
  • It'd be cool if I could call normalize_entity on this entity object to create a second entity and convert it into an entityset
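
To make the bullets above concrete, here is a rough, library-free sketch of the dispatch idea: a standalone entity gets wrapped in a one-entity entityset before feature synthesis runs. Everything in it is an assumption for illustration only. The `Entity`, `EntitySet`, `as_entityset`, and `dfs` names below are hypothetical stand-ins written in plain Python, not the Featuretools classes or functions, and rejecting `agg_primitives` is just one example of an argument that might be disabled in the single-table case.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical standalone entity: table data plus metadata, no entityset required."""
    dataframe: Any
    entity_id: str = "entity"  # optional, as in the proposed API
    index: Optional[str] = None
    time_index: Optional[str] = None


@dataclass
class EntitySet:
    """Hypothetical container that would own the relationship-aware methods."""
    id: str
    entities: Dict[str, Entity] = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> "EntitySet":
        self.entities[entity.entity_id] = entity
        return self


def as_entityset(data: Any) -> Tuple[EntitySet, Optional[str]]:
    """Return (entityset, inferred target id); wrap a lone Entity if needed."""
    if isinstance(data, EntitySet):
        return data, None
    if isinstance(data, Entity):
        es = EntitySet(id=f"{data.entity_id}_es").add_entity(data)
        return es, data.entity_id
    raise TypeError("expected an Entity or an EntitySet")


def dfs(data: Any,
        target_entity: Optional[str] = None,
        agg_primitives: Optional[List[str]] = None,
        trans_primitives: Optional[List[str]] = None) -> Dict[str, Any]:
    """Normalize the input, validate arguments, then hand off to the usual path."""
    es, inferred = as_entityset(data)
    target = target_entity or inferred
    if target is None:
        raise ValueError("target_entity is required when passing an EntitySet")
    # Assumption: aggregations need a parent-child relationship, which a
    # single entity does not have, so this sketch rejects them up front.
    if isinstance(data, Entity) and agg_primitives:
        raise ValueError("agg_primitives do not apply to a single entity")
    # Real feature synthesis would run here; the sketch only returns the plan.
    return {
        "entityset": es.id,
        "target_entity": target,
        "trans_primitives": list(trans_primitives or []),
    }


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}]
entity = Entity(dataframe=rows, entity_id="transactions", index="id")
plan = dfs(entity, trans_primitives=["add_numeric"])
```

With this shape, the single-table call needs neither an entityset nor a target name, while existing multi-table callers would keep passing an entityset and an explicit target.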

We should also add a documentation guide outlining how to use DFS with a single table. Right now we have answers on Stack Overflow and in the FAQ, but this question comes up frequently.

@dclong

dclong commented Dec 30, 2020

IMHO, it makes more sense to work on a single DataFrame (entity).

  1. In real situations, the joining condition might be very complicated and come with additional filtering conditions.
  2. The DFS API makes users think about multiple things at the same time: first, how to join tables, and second, how to generate features. By supporting better APIs on a single DataFrame (entity), users can focus on one step at a time.
