
Runless events - consume job event #2661

Merged: 5 commits, Nov 16, 2023


@pawel-big-lebowski (Collaborator) commented Oct 26, 2023

Problem

As a follow-up to #2641, this PR introduces support for JobEvent. It should be merged after #2641.

Solution

  • Schema change: a job_version_uuid column is added to the job_facets table. This requires a DB migration to backfill existing job_facets entries.
  • JobDao: the findJobByName method should work without a join to the runs table.
  • A spec_event_type column is added to the lineage_events table to indicate which type of event is stored.
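The backfill mentioned in the first bullet can be pictured with a small in-memory Java sketch. It assumes, hypothetically, that each existing job_facets row knows the run it came from and that each run records the job version it ran under. `JobFacetsBackfillSketch`, `JobFacetRow` and `RunRow` are illustrative names, and this is not the actual V66_3 migration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of the job_facets backfill; not the real V66_3 migration.
class JobFacetsBackfillSketch {
  // Simplified stand-ins for job_facets and runs rows (assumed columns only).
  record JobFacetRow(UUID runUuid, String facetName, UUID jobVersionUuid) {}

  record RunRow(UUID runUuid, UUID jobVersionUuid) {}

  /** Returns copies of the facet rows with job_version_uuid filled in from the owning run. */
  static List<JobFacetRow> backfill(List<JobFacetRow> facets, List<RunRow> runs) {
    // Index runs by UUID so each facet lookup is a single map access.
    Map<UUID, UUID> versionByRun = new HashMap<>();
    for (RunRow run : runs) {
      versionByRun.put(run.runUuid(), run.jobVersionUuid());
    }
    List<JobFacetRow> result = new ArrayList<>();
    for (JobFacetRow facet : facets) {
      // Keep a value that is already set; otherwise look it up through the run.
      // A facet whose run is unknown keeps a null job_version_uuid.
      UUID version =
          facet.jobVersionUuid() != null
              ? facet.jobVersionUuid()
              : versionByRun.get(facet.runUuid());
      result.add(new JobFacetRow(facet.runUuid(), facet.facetName(), version));
    }
    return result;
  }
}
```

In SQL terms, this corresponds to an UPDATE on job_facets that joins runs on the run UUID. Rows that already carry a version are left alone, so the sketch can be re-run safely.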

Limitations:

  • The listLineage endpoint filters for RunEvent only and does not support the newly introduced event types. This can be addressed in a separate issue and PR.
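The limitation above amounts to a filter on the stored event type. The following is a hedged Java sketch of that filter, together with a possible follow-up that lets callers choose which types to include. `StoredEvent` is a hypothetical stand-in for a lineage_events row, and the `"RUN_EVENT"` string value is an assumption based on the naming discussed later in the review.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not Marquez's actual listLineage query.
class ListLineageFilterSketch {
  // Simplified stand-in for a lineage_events row.
  record StoredEvent(String eventType, String payload) {}

  /** Behaviour described in the limitation: only run events are listed. */
  static List<StoredEvent> runEventsOnly(List<StoredEvent> events) {
    return ofTypes(events, Set.of("RUN_EVENT"));
  }

  /** Possible follow-up: let the caller choose which event types to include. */
  static List<StoredEvent> ofTypes(List<StoredEvent> events, Set<String> types) {
    return events.stream()
        // Guard against null, because Set.of(...).contains(null) throws.
        .filter(e -> e.eventType() != null && types.contains(e.eventType()))
        .collect(Collectors.toList());
  }
}
```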

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg bot added the api (API layer changes) label Oct 26, 2023
@pawel-big-lebowski force-pushed the static/job-event branch 3 times, most recently from 5158fc2 to bdcfec7, October 31, 2023 12:48
codecov bot commented Oct 31, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (3a26e50) 83.76% compared to head (0a3f98a) 84.05%.

| File | Patch % | Lines |
|---|---|---|
| api/src/main/java/marquez/db/OpenLineageDao.java | 92.10% | 4 missing and 2 partials ⚠️ |
| .../migrations/V66_3_JobFacetsBackfillJobVersion.java | 81.81% | 2 missing ⚠️ |
| ...src/main/java/marquez/api/OpenLineageResource.java | 75.00% | 0 missing and 1 partial ⚠️ |
| api/src/main/java/marquez/db/models/ModelDaos.java | 75.00% | 0 missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2661      +/-   ##
============================================
+ Coverage     83.76%   84.05%   +0.28%     
- Complexity     1338     1379      +41     
============================================
  Files           247      248       +1     
  Lines          6112     6297     +185     
  Branches        281      286       +5     
============================================
+ Hits           5120     5293     +173     
- Misses          843      851       +8     
- Partials        149      153       +4     


@pawel-big-lebowski force-pushed the static/job-event branch 4 times, most recently from 32deb1a to 151ed6b, November 1, 2023 11:09
@boring-cyborg bot added the docs label Nov 1, 2023
@pawel-big-lebowski marked this pull request as ready for review November 1, 2023 11:32
@pawel-big-lebowski marked this pull request as draft November 1, 2023 11:54
@pawel-big-lebowski marked this pull request as ready for review November 1, 2023 12:30
Base automatically changed from static/dataset-event to main November 6, 2023 07:16
CHANGELOG.md (outdated, resolved)
) e
GROUP BY e.run_uuid
) f ON f.run_uuid=jv.latest_run_uuid
LEFT OUTER JOIN job_versions_facets f ON j.current_version_uuid = f.job_version_uuid
Member


💯
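The join above resolves a job's facets through its current version, not through its latest run. As a result, a job created by a JobEvent with no runs still gets its facets. The following Java sketch illustrates that lookup, assuming facets are grouped by job_version_uuid. `RunlessFacetLookupSketch` and `JobRow` are hypothetical names that mirror the SQL column names.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the runless facet lookup; not Marquez's JobDao.
class RunlessFacetLookupSketch {
  // Simplified stand-in for a jobs row.
  record JobRow(String name, UUID currentVersionUuid) {}

  /** Mirrors: LEFT OUTER JOIN job_versions_facets f ON j.current_version_uuid = f.job_version_uuid. */
  static List<String> facetsFor(JobRow job, Map<UUID, List<String>> facetsByVersion) {
    if (job.currentVersionUuid() == null) {
      // Outer-join semantics: the job is still returned, just without facets.
      return List.of();
    }
    return facetsByVersion.getOrDefault(job.currentVersionUuid(), List.of());
  }
}
```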

* @param jobRow The job.
* @return A {@link BagOfJobVersionInfo} object.
*/
default BagOfJobVersionInfo upsertRunlessJobVersion(
Member


Minor: We could consider factoring out the common code in JobVersionDao.upsertJobVersionOnRunTransition() and upsertRunlessJobVersion() to avoid duplication. This isn't a major concern, though, as we know we'll need to revisit some of this code later.

Collaborator Author


Yes, we can refactor this section further later.
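One shape the suggested refactor could take is to have both upsert paths delegate to a single shared helper. Everything in this sketch is hypothetical: `JobVersionUpsertSketch`, `JobVersionInfo` and `upsertCommon` are illustrative names, the real DAO methods take different parameters and write to the database, and the version derivation here is only a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical refactor sketch; not the real JobVersionDao.
interface JobVersionUpsertSketch {
  // Simplified result type (nested records in interfaces are implicitly static).
  record JobVersionInfo(UUID jobUuid, UUID versionUuid, UUID runUuid) {}

  /** Run-driven path: the run transition supplies the run that produced the version. */
  default JobVersionInfo upsertOnRunTransition(UUID jobUuid, UUID runUuid) {
    return upsertCommon(jobUuid, runUuid);
  }

  /** Runless path (JobEvent): no run is associated with the version. */
  default JobVersionInfo upsertRunless(UUID jobUuid) {
    return upsertCommon(jobUuid, null);
  }

  /** Shared logic lives in one place (private interface methods need Java 9+). */
  private JobVersionInfo upsertCommon(UUID jobUuid, UUID runUuidOrNull) {
    // Placeholder: a deterministic UUID derived from the job id only.
    UUID version = UUID.nameUUIDFromBytes(jobUuid.toString().getBytes(StandardCharsets.UTF_8));
    return new JobVersionInfo(jobUuid, version, runUuidOrNull);
  }
}
```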

CREATE INDEX job_facets_job_version_uuid ON job_facets (job_version_uuid);

ALTER TABLE lineage_events ADD COLUMN spec_event_type VARCHAR(64);
UPDATE lineage_events SET spec_event_type = 'RunEvent';
Member


Do we want to set all events to RunEvent? Also, I'd define the _event_type enum as RUN_EVENT, DATASET_EVENT, or JOB_EVENT.

Collaborator Author


I switched to an enum and set a default value instead of running an UPDATE on the table.


netlify bot commented Nov 6, 2023

Deploy Preview for peppy-sprite-186812 canceled.

🔨 Latest commit: 0a3f98a
🔍 Latest deploy log: https://app.netlify.com/sites/peppy-sprite-186812/deploys/65549cb91e31ca0008c63c38

@@ -0,0 +1,4 @@
CREATE TYPE EVENT_TYPE AS ENUM ('RUN_EVENT', 'DATASET_EVENT', 'JOB_EVENT');
Member

@wslulciuc Nov 15, 2023


Minor: We've standardized on not defining enum types in the DB layer; instead, we enforce them in the application layer. This avoids a DB migration every time we (might) add a new event type.

tl;dr: I'd just define _event_type as a string.
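The suggestion above keeps `_event_type` as a plain string column and validates it in the application. This is a minimal Java sketch of that idea. `EventTypeSketch`, `toDbValue` and `fromDbValue` are hypothetical names. Reading a missing value as `RUN_EVENT` is an assumption modelled on the column default mentioned earlier in the thread.

```java
import java.util.Locale;

// Hypothetical sketch: the DB column is a plain string; allowed values live in this enum.
enum EventTypeSketch {
  RUN_EVENT,
  DATASET_EVENT,
  JOB_EVENT;

  /** Value written to the string column. */
  String toDbValue() {
    return name();
  }

  /** Parses a stored value; rows without a value are treated as RUN_EVENT (assumed default). */
  static EventTypeSketch fromDbValue(String value) {
    if (value == null || value.isBlank()) {
      return RUN_EVENT;
    }
    // valueOf throws IllegalArgumentException for unknown strings, so bad data fails fast.
    return EventTypeSketch.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}
```

With this approach, a new event type only needs a new enum constant and a redeploy, with no schema migration.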

Collaborator Author


Added an extra commit for that: 0a3f98a

DATASET_EVENT,
JOB_EVENT;
}

@SqlUpdate(
"INSERT INTO lineage_events ("
+ "event_type, "
Member

@wslulciuc Nov 15, 2023


Minor: We could consider defining a _run_state column and eventually dropping event_type. That is, we can treat columns prefixed with _ as "remappings" of OL (OpenLineage) properties to Marquez.
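To make the "remapping" idea concrete, here is a hedged Java sketch of deriving a hypothetical `_run_state` value. It is derived only for run events, from the OpenLineage `eventType` (START, RUNNING, COMPLETE, ABORT, FAIL, OTHER). The target state names and the choice to leave `OTHER` unmapped are illustrative assumptions, not Marquez's actual mapping.

```java
// Hypothetical sketch of an underscore-prefixed remapping column.
class RunStateRemapSketch {
  /** Value for a hypothetical _run_state column, or null when no run state applies. */
  static String runStateFor(String specEventType, String olEventType) {
    if (!"RUN_EVENT".equals(specEventType) || olEventType == null) {
      return null; // dataset and job events carry no run state
    }
    return switch (olEventType) {
      case "START", "RUNNING" -> "RUNNING";
      case "COMPLETE" -> "COMPLETED";
      case "ABORT" -> "ABORTED";
      case "FAIL" -> "FAILED";
      default -> null; // e.g. OTHER: left unmapped in this sketch
    };
  }
}
```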

Member

@wslulciuc left a comment


Minor comments / suggestions; otherwise 💯 💯 🥇

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@pawel-big-lebowski merged commit 60d7d90 into main Nov 16, 2023
16 checks passed
@pawel-big-lebowski deleted the static/job-event branch November 16, 2023 06:57
Labels: api (API layer changes), docs
Projects: Status: Done
Linked issues: none
2 participants