Skip to content

Dag tags use set instead of list#8233

Closed
zhongjiajie wants to merge 4 commits intoapache:masterfrom
zhongjiajie:dag_tag_change2set
Closed

Dag tags use set instead of list#8233
zhongjiajie wants to merge 4 commits intoapache:masterfrom
zhongjiajie:dag_tag_change2set

Conversation

@zhongjiajie
Copy link
Member

Dag tags store in db without duplicate name in each
dag, but list could contain duplicate elements, so
it's better to change tags attr in Dag from list to
set


Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

Dag tags store in db without duplicate name in each
dag, but list could contain duplicate elements, so
it's better to change tags attr in Dag from list to
set
@boring-cyborg boring-cyborg bot added provider:Apache provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues labels Apr 9, 2020
@codecov-io
Copy link

Codecov Report

Merging #8233 into master will decrease coverage by 0.29%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8233      +/-   ##
==========================================
- Coverage   88.35%   88.05%   -0.30%     
==========================================
  Files         936      936              
  Lines       45305    45316      +11     
==========================================
- Hits        40030    39905     -125     
- Misses       5275     5411     +136     
Impacted Files Coverage Δ
airflow/example_dags/example_bash_operator.py 94.44% <ø> (ø)
airflow/example_dags/example_branch_operator.py 100.00% <ø> (ø)
...ample_dags/example_branch_python_dop_operator_3.py 75.00% <ø> (ø)
airflow/example_dags/example_complex.py 100.00% <ø> (ø)
...w/example_dags/example_external_task_marker_dag.py 100.00% <ø> (ø)
...irflow/example_dags/example_kubernetes_executor.py 85.00% <ø> (ø)
...example_dags/example_kubernetes_executor_config.py 75.00% <ø> (ø)
airflow/example_dags/example_latest_only.py 100.00% <ø> (ø)
...w/example_dags/example_latest_only_with_trigger.py 100.00% <ø> (ø)
...le_dags/example_passing_params_via_test_command.py 100.00% <ø> (ø)
... and 76 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad42c40...f1b794f. Read the comment docs.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BasPH
Copy link
Contributor

BasPH commented Apr 11, 2020

@kaxil I think it might be worth changing this call to a set: https://github.com/apache/airflow/blob/master/airflow/models/dag.py#L1532. It seems currently the DagTags are made unique when fetching from the DB.

Other than that, we can close this pull request IMO.

@zhongjiajie
Copy link
Member Author

Sorry for reply late due to have cold. @kaxil @BasPH I change dag tags from list to set due to two main reason:

  1. In

    airflow/airflow/models/dag.py

    Lines 1654 to 1660 in 6ba672e

    class DagTag(Base):
    """
    A tag name per dag, to allow quick filtering in the DAG view.
    """
    __tablename__ = "dag_tag"
    name = Column(String(100), primary_key=True)
    dag_id = Column(String(ID_LEN), ForeignKey('dag.dag_id'), primary_key=True)
    we have unique key on name and dag_id but in
    tags: Optional[List[str]] = None
    we define tags in list(which may contain duplicate element). Since each dag can not contain duplicate tag, should we define Dag.tags without duplicate element and using set would be a solution
  2. Users use tag when they define DAG file with set, them may came to their mind DAG tags is a set and no duplicate element. I think will be more clear. I know we will drop the duplicate element by code or database constraint, but maybe avoid it by user define side would be better.

WDYT

@zhongjiajie
Copy link
Member Author

BTW, I submit this when I review bulk sync to database code in #8231

@kaxil
Copy link
Member

kaxil commented Apr 12, 2020

Sorry for reply late due to have cold. @kaxil @BasPH I change dag tags from list to set due to two main reason:

  1. In

    airflow/airflow/models/dag.py

    Lines 1654 to 1660 in 6ba672e

    class DagTag(Base):
    """
    A tag name per dag, to allow quick filtering in the DAG view.
    """
    __tablename__ = "dag_tag"
    name = Column(String(100), primary_key=True)
    dag_id = Column(String(ID_LEN), ForeignKey('dag.dag_id'), primary_key=True)

    we have unique key on name and dag_id but in
    tags: Optional[List[str]] = None

    we define tags in list(which may contain duplicate element). Since each dag can not contain duplicate tag, should we define Dag.tags without duplicate element and using set would be a solution
  2. Users use tag when they define DAG file with set, them may came to their mind DAG tags is a set and no duplicate element. I think will be more clear. I know we will drop the duplicate element by code or database constraint, but maybe avoid it by user define side would be better.

WDYT

I think we can just change the line @BasPH mentioned (https://github.com/apache/airflow/blob/master/airflow/models/dag.py#L1532. ) to a set. We can allow user to submit any Iterable (List, Set etc)

@zhongjiajie
Copy link
Member Author

zhongjiajie commented Apr 13, 2020

@kaxil Ok, if we just need to change one line, I prefer to do it in #8231 and close this PR

@zhongjiajie zhongjiajie deleted the dag_tag_change2set branch April 13, 2020 01:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments