Dataset list remembers old datasets #26381

Closed
1 of 2 tasks
MatrixManAtYrService opened this issue Sep 14, 2022 · 3 comments · Fixed by #27828
Labels
AIP-48 Data-aware Scheduling area:core kind:bug This is a clearly a bug

Comments

@MatrixManAtYrService
Contributor

Apache Airflow version

main (development)

What happened

Here's a simple pair of dags linked by a dataset:

from datetime import datetime
from airflow import Dataset
from airflow.operators.empty import EmptyOperator
from airflow.decorators import dag

dataset = Dataset("dataset")

@dag(start_date=datetime(1970, 1, 1))
def upstream():
    EmptyOperator(task_id="empty", outlets=[dataset])

upstream()


@dag(start_date=datetime(1970, 1, 1), schedule=[dataset])
def downstream():
    EmptyOperator(task_id="empty")

downstream()

I let airflow parse this, then I made the following edit:

- dataset = Dataset("dataset")
+ dataset = Dataset("dataset1")

I let airflow parse it again, then I made another edit:

- dataset = Dataset("dataset1")
+ dataset = Dataset("dataset2")

Then I viewed the datasets page:
[Screenshot of the datasets page: Screen Shot 2022-09-14 at 12 07 08 AM]

Notice that instead of updating the name of the dataset, as I intended, I ended up creating three datasets.
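The accumulation described above can be illustrated with a toy model. This is only a sketch and not Airflow's actual code: `ToyDatasetRegistry` and `register_from_parse` are hypothetical names, and the assumption that parsing adds new URIs but never removes old ones is a guess that happens to reproduce the observed result.

```python
# Toy model only: this is NOT Airflow's implementation.
class ToyDatasetRegistry:
    """Remembers every dataset URI ever seen during parsing (hypothetical)."""

    def __init__(self):
        self.uris = []  # known URIs, in the order they were first seen

    def register_from_parse(self, parsed_uris):
        # Assumption: a parse only adds unseen URIs; it never deletes
        # URIs that have disappeared from the DAG files.
        for uri in parsed_uris:
            if uri not in self.uris:
                self.uris.append(uri)


registry = ToyDatasetRegistry()
for uri in ("dataset", "dataset1", "dataset2"):  # the three successive edits
    registry.register_from_parse([uri])

print(registry.uris)  # ['dataset', 'dataset1', 'dataset2']
```

Under that assumption, renaming a dataset in the DAG file is indistinguishable from declaring a new one, so the two earlier URIs stay in the list.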

What you think should happen instead

If no dags reference a dataset, it should not be shown in the dataset list.

How to reproduce

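Update a dataset URI in a dag file that Airflow has already parsed, let Airflow parse it again, then view the datasets page.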

Operating System

docker/debian

Versions of Apache Airflow Providers

n/a

Deployment

Astronomer

Deployment details

astro dev start

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

MatrixManAtYrService added the area:core and kind:bug labels Sep 14, 2022
uranusjr added this to the Airflow 2.4.0 milestone Sep 14, 2022
uranusjr added the AIP-48 Data-aware Scheduling label Sep 14, 2022
@pierrejeambrun
Member

pierrejeambrun commented Sep 15, 2022

Hello,

@jedcunningham, @uranusjr, I would be glad to help on this one :)

Should we add query strings to the endpoint to be able to filter out datasets that do not have any consuming_dags and producing_tasks?
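A minimal sketch of what such a filter could do, written over plain dictionaries rather than the real endpoint. The `include_orphaned` flag and the `filter_datasets` helper are hypothetical. Only the `consuming_dags` and `producing_tasks` field names come from the question above, and the record shapes are assumed for illustration.

```python
def filter_datasets(datasets, include_orphaned=True):
    """Return datasets, optionally dropping those with no dag references.

    `include_orphaned` is a hypothetical flag, not an existing API parameter.
    """
    if include_orphaned:
        return list(datasets)
    # Keep a dataset if at least one dag consumes it or one task produces it.
    return [
        d for d in datasets
        if d.get("consuming_dags") or d.get("producing_tasks")
    ]


# Assumed record shape, mirroring the state after the two edits in this issue.
datasets = [
    {"uri": "dataset", "consuming_dags": [], "producing_tasks": []},
    {"uri": "dataset1", "consuming_dags": [], "producing_tasks": []},
    {
        "uri": "dataset2",
        "consuming_dags": [{"dag_id": "downstream"}],
        "producing_tasks": [{"dag_id": "upstream", "task_id": "empty"}],
    },
]

visible = filter_datasets(datasets, include_orphaned=False)
print([d["uri"] for d in visible])  # ['dataset2']
```

Defaulting the flag to `True` would keep the current behavior for existing callers, which is one possible design choice rather than a requirement.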

@jedcunningham
Member

jedcunningham commented Sep 15, 2022

It's probably worth waiting until #26358 is done first.

That said, I'm a little torn on just not showing them at all. Having a way to toggle the behavior might make sense, maybe something like the paused/unpaused toggle on the homepage?

All | Active | Never Updated | Orphaned
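One way the non-"All" tabs above might be derived is sketched here. The thread does not define the categories, so the rules are assumptions: "Orphaned" means no consuming dags and no producing tasks, "Never Updated" means referenced but with zero recorded updates, and "Active" means referenced with at least one update. `DatasetStatus`, `classify`, and `update_count` are hypothetical names.

```python
from enum import Enum


class DatasetStatus(Enum):
    ACTIVE = "Active"
    NEVER_UPDATED = "Never Updated"
    ORPHANED = "Orphaned"


def classify(consuming_dags, producing_tasks, update_count):
    """Assign one tab to a dataset under the assumed definitions."""
    # Assumption: a dataset with no references is orphaned, whatever its history.
    if not consuming_dags and not producing_tasks:
        return DatasetStatus.ORPHANED
    # Assumption: referenced but never written to by any task run.
    if update_count == 0:
        return DatasetStatus.NEVER_UPDATED
    return DatasetStatus.ACTIVE


print(classify([], [], 0).value)                            # Orphaned
print(classify(["downstream"], ["upstream.empty"], 0).value)  # Never Updated
print(classify(["downstream"], ["upstream.empty"], 3).value)  # Active
```

The "All" tab would simply skip the classification and list every dataset.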

@blag
Contributor

blag commented Sep 16, 2022

I like the idea of separating out dataset update statuses into separate tabs/categories.
