Expose an API which can return dagRun status in most optimized time #27782

Open
1 of 2 tasks
sonalprsd opened this issue Nov 18, 2022 · 8 comments
Assignees
Labels
area:API Airflow's REST/HTTP API good first issue kind:feature Feature Requests

Comments

@sonalprsd

Description

Airflow/MWAA does not seem to have a scalable API for returning the status of a dagRun; the existing states-for-dag-run and list-runs APIs do not scale well. To fetch dagRun status, every team seems to build a custom solution, either using sns_notification or pushing the status to an external data store via Airflow callbacks.

The ask is to expose an API that returns dagRun status as quickly as possible, using a query operation internally rather than a scan.

Discussion #27765

Use case/motivation

My use case is to fetch the status of all active DAG runs and update the status tables in our system. A poller does this, with a timeout of 150s configured based on our SLA. The states-for-dag-run API seems to perform a scan internally, so as the number of DAG runs in the system increases, the time to get the status of a dagRun increases as well. Initially, fetching the status of 100 runs took 2.5 minutes. After the number of dagRuns in the system grew by 50, fetching the status of 100 dagRuns takes more than 5 minutes.
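
For illustration, a poller of this kind could issue one paginated batch query per cycle instead of one status call per run. The sketch below targets the stable REST API's batch endpoint (`POST /api/v1/dags/~/dagRuns/list`) and is not the reporter's actual poller. The `post_json` transport is hypothetical, the page size and state list are assumptions, and whether the `states` filter is accepted depends on the Airflow version in use.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical transport: takes (path, json_body) and returns the decoded
# JSON response. In practice this would wrap an authenticated HTTP client.
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]

BATCH_PATH = "/api/v1/dags/~/dagRuns/list"


def fetch_active_run_states(
    post_json: PostJson,
    dag_ids: Iterable[str],
    states: Iterable[str] = ("queued", "running"),  # assumed "active" states
    page_limit: int = 100,
) -> Dict[Tuple[str, str], str]:
    """Return {(dag_id, dag_run_id): state} for every run in the given states."""
    dag_id_list = list(dag_ids)
    state_list = list(states)
    result: Dict[Tuple[str, str], str] = {}
    offset = 0
    while True:
        body = {
            "dag_ids": dag_id_list,
            "states": state_list,  # assumption: filter supported by this version
            "page_limit": page_limit,
            "page_offset": offset,
        }
        page = post_json(BATCH_PATH, body)
        runs = page.get("dag_runs", [])
        for run in runs:
            result[(run["dag_id"], run["dag_run_id"])] = run["state"]
        offset += len(runs)
        # Stop on an empty page or once every reported entry has been read.
        if not runs or offset >= page.get("total_entries", 0):
            break
    return result
```

Injecting the transport keeps the paging logic testable without a running webserver; the 150s timeout from the SLA would be enforced inside whatever `post_json` wraps.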

Related issues

NA

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@sonalprsd sonalprsd added the kind:feature Feature Requests label Nov 18, 2022
@boring-cyborg

boring-cyborg bot commented Nov 18, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@o-nikolas o-nikolas added the area:API Airflow's REST/HTTP API label Nov 19, 2022
@potiuk
Member

potiuk commented Nov 25, 2022

Marked as good first issue.

@potiuk
Member

potiuk commented Nov 25, 2022

As discussed in #27765, just exposing updated_at should be enough.
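
To make the suggestion concrete: once `updated_at` is exposed, a client can keep a local cache and ask only for runs that changed since its last poll. The sketch below assumes the list endpoint accepts an `updated_at_gte` query parameter and that each returned run carries an ISO 8601 `updated_at` field. Both are assumptions about how the field would be exposed, not something this thread confirms, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional, Tuple
from urllib.parse import urlencode


def build_delta_url(base_url: str, dag_id: str, since: Optional[datetime]) -> str:
    """Build a list-dagRuns URL restricted to runs updated at or after `since`.

    `since` must be timezone-aware. `updated_at_gte` is an assumed parameter name.
    """
    params: Dict[str, Any] = {"limit": 100}
    if since is not None:
        params["updated_at_gte"] = since.astimezone(timezone.utc).isoformat()
    return f"{base_url}/api/v1/dags/{dag_id}/dagRuns?{urlencode(params)}"


def apply_delta(
    cache: Dict[Tuple[str, str], str],
    dag_runs: Iterable[Dict[str, Any]],
    watermark: Optional[datetime],
) -> Optional[datetime]:
    """Merge changed runs into `cache` and return the advanced watermark."""
    for run in dag_runs:
        cache[(run["dag_id"], run["dag_run_id"])] = run["state"]
        # Assumes an offset-qualified timestamp such as 2022-11-25T10:00:00+00:00.
        updated = datetime.fromisoformat(run["updated_at"])
        if watermark is None or updated > watermark:
            watermark = updated
    return watermark
```

Each poll would then call `build_delta_url` with the previous watermark, fetch that URL, and pass the returned runs to `apply_delta`, so the cost of a poll tracks the number of changed runs rather than the total number of runs.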

@Adityamalik123
Contributor

@potiuk Can I take this task up?

@potiuk
Member

potiuk commented Nov 30, 2022

assigned.

@vincbeck
Contributor

Hey @Adityamalik123. Did you get a chance to start working on this task? There is absolutely no rush, I was just asking in case you need help

@Adityamalik123
Contributor

Adityamalik123 commented Dec 12, 2022

> Hey @Adityamalik123. Did you get a chance to start working on this task? There is absolutely no rush, I was just asking in case you need help

Thanks for checking in, @vincbeck. I am planning to get started (and probably wrap this task up) this week. I will definitely reach out to you in case I get stuck.

@sonalprsd
Author

Hi @Adityamalik123, can you share the documentation for the API you wrote as part of this task?

Projects
None yet
Development

No branches or pull requests

5 participants