This issue was moved to a discussion. You can continue the conversation there.


API Endpoints for Dataset Creation and Updating #36686

Closed

edumuellerFSL opened this issue Jan 9, 2024 · 3 comments
Labels: area:API (Airflow's REST/HTTP API) · area:datasets (Issues related to the datasets feature) · kind:feature (Feature Requests)

Comments

@edumuellerFSL
Contributor

edumuellerFSL commented Jan 9, 2024

Description

I would like to propose the addition of new API endpoints for creating and updating datasets in Airflow. This feature would be a valuable extension to the current dataset capabilities and would align with the direction Airflow is heading, especially considering the dataset listeners introduced in Airflow 2.8.

Proposed Changes:

  1. Addition of a Dataset Create API endpoint: this endpoint would let users create new datasets directly via the API.
  2. Addition of a Dataset Update API endpoint: this endpoint would let users update an existing dataset via the API in order to register a change.
  3. Implementation of tests covering both endpoints.
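As a rough sketch of what calls to such endpoints might look like: the paths, payload fields, and helper names below are illustrative assumptions for this proposal, not part of Airflow's existing REST API.

```python
import json

# Hypothetical API base URL for a local Airflow webserver (assumption).
BASE = "http://localhost:8080/api/v1"


def create_dataset_request(uri, extra=None):
    """Build the request for the proposed 'create dataset' endpoint.

    Returns the method, URL, and JSON body a client would send;
    actually issuing the request (auth, HTTP client) is omitted.
    """
    return {
        "method": "POST",
        "url": f"{BASE}/datasets",
        "body": json.dumps({"uri": uri, "extra": extra or {}}),
    }


def update_dataset_request(uri, extra=None):
    """Build the request for the proposed 'update dataset' endpoint,
    which would register a change to an existing dataset."""
    return {
        "method": "POST",
        "url": f"{BASE}/datasets/events",
        "body": json.dumps({"dataset_uri": uri, "extra": extra or {}}),
    }


req = create_dataset_request("s3://my-bucket/orders.parquet")
print(req["method"], req["url"])
```

A client on one Airflow instance could then notify a second instance of a dataset change by sending the update request there, which is the cross-instance scenario motivating this proposal.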

Use case/motivation

In a multi-instance Airflow architecture, managing dataset dependencies across instances can be challenging, as we are currently experiencing in our organization.

This feature also aligns with the recent advancements in Airflow 2.8, particularly with the introduction of dataset listeners. These developments have opened the door for improved cross-instance dataset awareness, an area where this proposal would be extremely beneficial.

We believe that with the introduction of these new endpoints, Airflow would offer a more efficient approach to cross-instance dataset-aware scheduling. This enhancement would benefit not only our organization but also the broader Airflow community, since this is likely a common challenge that many face today and more will encounter in the future.

Related issues

This feature complements the discussions and contributions already seen in the community, especially those related to enhancing dataset management and integration in Airflow.

There have been some ongoing discussions and contributions on GitHub, e.g. #36308 #29162, including a previously closed Pull Request (#29433).

These discussions highlight the community's interest in and need for enhanced dataset management capabilities.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@edumuellerFSL edumuellerFSL added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Jan 9, 2024

boring-cyborg bot commented Jan 9, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

@naltimari

I also believe this would be helpful and aligns with the data-aware scheduling feature, as well as the dataset listener feature.

@dirrao dirrao added area:datasets Issues related to the datasets feature area:API Airflow's REST/HTTP API and removed needs-triage label for new issues that we didn't triage yet labels Jan 9, 2024
@potiuk
Member

potiuk commented Jan 10, 2024

I think it is far too big a feature and should be discussed on the devlist. Currently all the objects (DAGs and Datasets alike) are created by parsing DAG files, NOT by creating DB entities. IMHO it makes very little sense to start creating those datasets via APIs, especially since Datasets are not "standalone" entities and nothing can happen if you create a dataset via the API while no DAG file uses it.

I am not sure what the consequences would be, as I know there are other discussions happening about the future of datasets - but I think if you want to start anything here, opening a devlist discussion and explaining what you want is really the right way of approaching it.

Converting this into a discussion, as it is definitely not a "feature" scope.

@apache apache locked and limited conversation to collaborators Jan 10, 2024
@potiuk potiuk converted this issue into discussion #36723 Jan 10, 2024
edumuellerFSL added a commit to edumuellerFSL/airflow that referenced this issue Jan 20, 2024
Added a new POST endpoint to the Airflow API for creating datasets. This feature includes the necessary OpenAPI specifications, TypeScript type definitions, and unit tests. It enables users to programmatically create datasets, enhancing the integration and automation capabilities of Airflow. The endpoint handles standard responses for success, unauthorized access, permission denial, and not found errors.

This is one of two PRs related to the following discussion: apache#36723

Resolves: apache#36686 (partially)

