[PROPOSAL] Support timeout for openlineage spark listener #2375

Closed
athityakumar opened this issue Jan 18, 2024 · 2 comments
Labels
kind:proposal A formal proposal for a spec-related or significant change

Comments


athityakumar commented Jan 18, 2024

Purpose:

We have seen a few of our Spark workloads where the Spark jobs complete within the normal duration, but the OpenLineage Spark listener takes hours to finish processing all Spark events. This keeps the infrastructure up, which impacts SLAs and increases infrastructure cost.

While we have seen different root causes for the listener taking a long time to finish processing events, we'd like a way to reduce the blast radius. Even if OpenLineage does end up in the long-running scenario, a configurable hard timeout would ensure that jobs don't go out of SLA, at the cost of lineage not being captured in such cases.

Proposed implementation

We can add support for a new Spark conf that users can set, for example: spark.openlineage.listener.timeout.seconds=120 (i.e. 2 minutes).

Internally, if the user configures a timeout, we can run event handling through an Executor and call Future.get() with a timeout, or use Guava's SimpleTimeLimiter, so that every Spark event handled by the OL Spark listener is bounded by a maximum duration rather than running for hours.
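
To make the Executor + Future.get() idea concrete, here is a minimal sketch. The class `TimeLimitedHandler` and its method names are hypothetical and not part of OpenLineage; the sketch only assumes that the configured value arrives as an optional number of seconds, and that handling runs inline when no timeout is set.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not an OpenLineage class): bounds how long one event handler may run.
class TimeLimitedHandler implements AutoCloseable {
  // Empty means no timeout was configured.
  private final Optional<Long> timeoutSeconds;

  // Single daemon worker thread, so lineage handling never keeps the JVM alive.
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lineage-handler");
        t.setDaemon(true);
        return t;
      });

  TimeLimitedHandler(Optional<Long> timeoutSeconds) {
    this.timeoutSeconds = timeoutSeconds;
  }

  /** Returns true if the handler ran to the end, false if it was abandoned. */
  boolean handle(Runnable eventHandler) {
    if (!timeoutSeconds.isPresent()) {
      // Assumption: without a configured timeout, run inline on the caller's thread.
      eventHandler.run();
      return true;
    }
    Future<?> future = executor.submit(eventHandler);
    try {
      future.get(timeoutSeconds.get(), TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException e) {
      // Give up on this event: interrupt the worker and drop its lineage.
      future.cancel(true);
      return false;
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status and stop waiting.
      Thread.currentThread().interrupt();
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      // The handler failed on its own; it did not time out.
      return true;
    }
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

A listener could wrap each event callback in `handle(...)`. Note that `future.cancel(true)` only interrupts the worker thread; a handler that ignores interrupts would keep running and, with a single worker, delay later events, so a real implementation would need to decide how to deal with that case.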

Relevant slack thread: https://openlineage.slack.com/archives/C01CK9T7HKR/p1705161285825369

@athityakumar athityakumar added the kind:proposal A formal proposal for a spec-related or significant change label Jan 18, 2024
@mobuchowski
Member

Agreed with this approach. I would just add that if the timeout is not configured, the execution path should stay the same as it is today.

@mobuchowski
Member

Solved by #2609
