feat: Feast extractor #414

Merged
feng-tao merged 5 commits into amundsen-io:master from the feast_extractor branch on Nov 25, 2020

Contributor

@szczeles szczeles commented Nov 24, 2020

Summary of Changes

This PR provides an extractor for the Feast feature store, as announced in amundsen-io/amundsen#815. Apart from FeatureTable definitions, the extractor also pushes the metadata collected by Feast as programmatic descriptions, so they look like this on the frontend:

[Screenshot: demo driver_trips, Amundsen Table Details]

A new "extra" dependency is added: feast (the Python SDK for Feast), which is distributed under the ASF license.

Fixes amundsen-io/amundsen#815

Tests

Unit tests for the extractor class.

Documentation

A sample job with a loader definition, plus docstrings for the FeastExtractor class and its extract method.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

Mariusz Strzelecki added 4 commits November 24, 2020 09:19
Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>
@feng-tao
Member

thanks, I will take a look.

@feng-tao
Member

@szczeles great contribution. I wonder how easy it is to set up databuilder using a Kubeflow job? Given that most people use Airflow for daily orchestration, I am interested in learning about and trying Kubeflow as well.

Member

@feng-tao feng-tao left a comment

lgtm

columns,
)

if self._describe_feature_tables:
Member

interesting to know you yield the prog description in the same extractor.

Contributor Author

Actually it was quite easy, since both tables and prog descriptions use the same class. I did consider creating multiple extractors, one for tables and a second for prog descriptions, but decided not to overcomplicate it.
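
For readers following along, here is a minimal sketch of the pattern described above: one extract loop that yields a table record and, when a flag is set, a second record of the same class carrying a programmatic description. The `TableRecord` class, the `extract` function, the `details` field, and the `feature_table_details` source label are all hypothetical stand-ins for illustration, not the actual databuilder or Feast API.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class TableRecord:
    """Hypothetical stand-in for the one record class shared by both outputs."""
    name: str
    columns: List[str]
    description: Optional[str] = None
    description_source: Optional[str] = None


def extract(feature_tables: List[Dict], describe_feature_tables: bool) -> Iterator[TableRecord]:
    """Yield a table record and, optionally, a description record per table."""
    for table in feature_tables:
        # First record: the table definition itself.
        yield TableRecord(name=table["name"], columns=table["columns"])
        if describe_feature_tables:
            # Second record: same class, now carrying the descriptive text.
            yield TableRecord(
                name=table["name"],
                columns=table["columns"],
                description=table["details"],
                description_source="feature_table_details",
            )


# Illustrative input; the table contents and details are made up.
tables = [
    {"name": "driver_trips", "columns": ["driver_id", "trips_today"], "details": "max_age: 1 day"}
]
records = list(extract(tables, describe_feature_tables=True))
```

Because both outputs share one class, a single extractor can feed the same loader and publisher; splitting into two extractors would mean iterating over the feature tables twice.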

Member

cc @dikshathakur3119 for FYI as this will be easier for Lyft internal programmatic description use case.

Contributor

it is super useful

)
)

for index, feature in enumerate(feature_table.features):
Member

Is there any ordering in Feast for entities vs. features, or did you just put the entity columns before the feature columns?

Contributor Author

Feast doesn't define an ordering; entities and features are just different properties of the feature table. I put entities first because they act like the "primary keys" of the table.
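
A rough sketch of that ordering choice, assuming columns carry an integer sort order: entities take the first positions and features continue the numbering after them. The `order_columns` helper and the example column names are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple


def order_columns(entities: List[str], features: List[str]) -> List[Tuple[int, str]]:
    """Return (sort_order, column_name) pairs with entities placed first."""
    # Entities act like primary keys, so they get the lowest sort orders.
    columns = [(index, name) for index, name in enumerate(entities)]
    # Features continue the numbering right after the last entity.
    columns += [(len(entities) + index, name) for index, name in enumerate(features)]
    return columns


# Illustrative names only.
ordered = order_columns(["driver_id"], ["trips_today", "rating"])
```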

@feng-tao
Member

It would be good to evangelize in the feast community as well given both projects are in LFAI incubation!

@szczeles
Contributor Author

szczeles commented Nov 25, 2020

> @szczeles great contribution. I wonder how easy it is to set up databuilder using a Kubeflow job? Given that most people use Airflow for daily orchestration, I am interested in learning about and trying Kubeflow as well.

Thanks! It's super easy to set up in Kubeflow. Kubeflow Pipelines is a system for orchestrating and scheduling runs of different containers, so I built one Docker image with databuilder, added the scripts to it, and I call them one by one in the pipeline like this:
[Image: screenshot of the Kubeflow pipeline]
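
Since the screenshot is not reproduced here, the "one image, scripts called one by one" layout can be sketched in plain Python. This is not the Kubeflow Pipelines SDK; the `Step` class, the `chain_scripts` helper, the image name, and the script paths are all hypothetical and only show how each step reuses the same image and waits for the previous one.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical description of one container run in the pipeline."""
    name: str
    image: str
    command: List[str]
    after: Optional[str] = None  # name of the step that must finish first


def chain_scripts(image: str, scripts: List[str]) -> List[Step]:
    """Build steps that run each script in the same image, strictly in order."""
    steps: List[Step] = []
    previous: Optional[str] = None
    for script in scripts:
        # Derive a step name from the script file name, e.g. "extract_feast".
        name = os.path.splitext(os.path.basename(script))[0]
        steps.append(Step(name=name, image=image, command=["python", script], after=previous))
        previous = name
    return steps


# Illustrative image and script paths only.
steps = chain_scripts(
    "example/databuilder:latest",
    ["/scripts/extract_feast.py", "/scripts/update_search_index.py"],
)
```

In a real pipeline definition, each `Step` would map to one container operation, with the `after` field expressed through whatever dependency mechanism the orchestrator provides.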

> It would be good to evangelize in the feast community as well given both projects are in LFAI incubation!

Absolutely! As soon as the feature is merged I'm going to evangelize there :-) The lack of a user interface for exploring features has been noticed by the Feast devs as well (see https://docs.feast.dev/#problems-feast-does-not-yet-solve), so it can be a valuable add-on.

@feng-tao
Member

thanks @szczeles for the info!

@feng-tao feng-tao merged commit 2343a90 into amundsen-io:master Nov 25, 2020
@szczeles szczeles deleted the feast_extractor branch November 25, 2020 21:37