OpenLineage API does not set dataset current_version_uuid #1361
Thanks for opening this issue, @vitoravancini! As of Marquez …

How are input / output datasets for an OpenLineage event handled?

If we reference the OL events in the Marquez quickstart, the …

Should an input dataset be registered and versioned if present in an OL event?

As an alternative, we could register all output datasets present in an OL event. But we'd have to consider: …
For usability, we may want to register and version a dataset if it does not yet exist. This would be a common case for jobs at the edge of a lineage graph: for example, an ETL job may load data from a public vendor, or there may be no convenient way to link the dataset to the job that produced it. @julienledem @collado-mike: It'd be great to get your thoughts on this. We may also want the OpenLineage standard to outline how consumers should handle input datasets.
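A minimal Java sketch of the registration policy discussed above, assuming a hypothetical in-memory registry. InputDatasetRegistry, recordInput, recordOutput and currentFields are illustrative names, not Marquez APIs, and the maps are not Marquez's schema. Outputs always get a new version, while an input is registered and versioned only when the dataset does not exist yet. In both paths the dataset's current version pointer is set, so a lookup through that pointer resolves fields.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory registry illustrating the policy; not Marquez's DatasetDao or schema.
final class InputDatasetRegistry {
  // currentVersionUuid plays the role of a current_version_uuid pointer.
  record Dataset(String namespace, String name, UUID currentVersionUuid) {}

  private final Map<String, Dataset> datasets = new HashMap<>();
  private final Map<UUID, List<String>> versionFields = new HashMap<>();

  private static String key(String namespace, String name) {
    return namespace + ":" + name;
  }

  // Creates a new version and moves the dataset's current pointer to it.
  private Dataset newVersion(String namespace, String name, List<String> fields) {
    UUID versionUuid = UUID.randomUUID();
    versionFields.put(versionUuid, List.copyOf(fields));
    Dataset dataset = new Dataset(namespace, name, versionUuid);
    datasets.put(key(namespace, name), dataset);
    return dataset;
  }

  // Outputs: every event that writes the dataset produces a new current version.
  Dataset recordOutput(String namespace, String name, List<String> fields) {
    return newVersion(namespace, name, fields);
  }

  // Inputs: register and version only if the dataset is unknown, e.g. a vendor
  // feed at the edge of the lineage graph with no producing job.
  Dataset recordInput(String namespace, String name, List<String> fields) {
    Dataset existing = datasets.get(key(namespace, name));
    return existing != null ? existing : newVersion(namespace, name, fields);
  }

  // Follows the current version pointer to the fields of that version.
  List<String> currentFields(String namespace, String name) {
    Dataset dataset = datasets.get(key(namespace, name));
    return dataset == null ? null : versionFields.get(dataset.currentVersionUuid());
  }

  public static void main(String[] args) {
    InputDatasetRegistry registry = new InputDatasetRegistry();
    // First sighting of an input: registered with a current version.
    Dataset first = registry.recordInput("vendor", "prices", List.of("sku", "price"));
    // Seeing the same input again does not create another version.
    Dataset again = registry.recordInput("vendor", "prices", List.of("sku", "price", "currency"));
    System.out.println(first.currentVersionUuid().equals(again.currentVersionUuid()));
    System.out.println(registry.currentFields("vendor", "prices"));
  }
}
```

Running main prints true and then [sku, price]. Whether a known input whose schema has changed should also get a new version is deliberately left open in this sketch.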
@OleksandrDvornik: Since you'll be looking into this, here's how to reproduce the issue: follow the Marquez quickstart guide using Marquez …
When datasets are created through the OpenLineage API, the current version ID (current_version_uuid) is not populated for new datasets, so the join that fetches the dataset's fields never matches.
This is the join that fails: 'dv' is never found, so dv.fields is always null.
The screenshot is from marquez/api/src/main/java/marquez/db/DatasetDao.java, line 81 at commit 41d4073.
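To make the failure mode concrete, here is a minimal Java sketch of LEFT JOIN semantics over hypothetical in-memory stand-ins for the datasets and dataset_versions tables. The record names, the fields column and the join condition d.current_version_uuid = dv.uuid are assumptions based on the description above, not the actual query in DatasetDao.java. A dataset whose current_version_uuid was never set gets no 'dv' match, so its fields come back null, even if a version row for it exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-ins for the datasets and dataset_versions tables;
// names and the join condition are assumptions, not Marquez's actual query.
final class LeftJoinSketch {
  record DatasetRow(String name, UUID currentVersionUuid) {}
  record VersionRow(UUID uuid, List<String> fields) {}
  record JoinedRow(String name, List<String> fields) {}

  // Emulates:
  //   SELECT d.name, dv.fields FROM datasets d
  //   LEFT JOIN dataset_versions dv ON d.current_version_uuid = dv.uuid
  static List<JoinedRow> leftJoin(List<DatasetRow> datasets, Map<UUID, VersionRow> versionsByUuid) {
    List<JoinedRow> result = new ArrayList<>();
    for (DatasetRow d : datasets) {
      // In SQL, NULL = anything is never true, so a missing pointer finds no dv row.
      VersionRow dv = d.currentVersionUuid() == null ? null : versionsByUuid.get(d.currentVersionUuid());
      // LEFT JOIN keeps the dataset row but fills dv's columns with NULL.
      result.add(new JoinedRow(d.name(), dv == null ? null : dv.fields()));
    }
    return result;
  }

  public static void main(String[] args) {
    UUID ordersV1 = UUID.randomUUID();
    UUID customersV1 = UUID.randomUUID();
    Map<UUID, VersionRow> versions = Map.of(
        ordersV1, new VersionRow(ordersV1, List.of("id", "amount")),
        // Assumption: a version row may exist for customers, but nothing points at it.
        customersV1, new VersionRow(customersV1, List.of("id", "email")));
    List<DatasetRow> datasets = List.of(
        new DatasetRow("orders", ordersV1),  // pointer set: fields resolve
        new DatasetRow("customers", null));  // pointer never set: fields are null
    for (JoinedRow row : leftJoin(datasets, versions)) {
      System.out.println(row.name() + " -> " + row.fields());
    }
  }
}
```

Running main prints orders -> [id, amount] followed by customers -> null, which matches the symptom described in the issue.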
Thank you!