Skip to content

[AIRFLOW-2652] Implement / Enhance baseOperator deepcopy#3528

Closed
feng-tao wants to merge 1 commit intoapache:masterfrom
feng-tao:airflow-2652
Closed

[AIRFLOW-2652] Implement / Enhance baseOperator deepcopy#3528
feng-tao wants to merge 1 commit intoapache:masterfrom
feng-tao:airflow-2652

Conversation

@feng-tao
Copy link
Copy Markdown
Member

@feng-tao feng-tao commented Jun 21, 2018

Make sure you have checked all steps below.

JIRA

  • My PR addresses the following Airflow JIRA issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    When running airflow backfill on pythonOperator, it will do / trigger a deepcopy of the task_instance. If some objects can't be deepcopy in certain python version(e.g Protobuf in python 2.7) , an exception will be thrown. We should just do a shallow copy instead of deep copy for the object.

The pr here is to copy the _deepcopy__ method in BaseOperator, but skip doing deepcopy for op_kwargs and python_callable.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    I can't think of a good way to test. We encounter this in our production.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@feng-tao feng-tao changed the title [AIRFLOW-2652] implement / enhance pythonOperator deepcopy [AIRFLOW-2652] Implement / Enhance pythonOperator deepcopy Jun 21, 2018
@feng-tao
Copy link
Copy Markdown
Member Author

PTAL @mistercrunch @billonahill

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jun 21, 2018

Codecov Report

Merging #3528 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3528      +/-   ##
==========================================
+ Coverage   77.45%   77.46%   +0.01%     
==========================================
  Files         204      204              
  Lines       15235    15233       -2     
==========================================
  Hits        11801    11801              
+ Misses       3434     3432       -2
Impacted Files Coverage Δ
airflow/models.py 88.31% <100%> (+0.06%) ⬆️
airflow/operators/python_operator.py 94.9% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55b56a4...2eaa009. Read the comment docs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy pasted?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove this. the function is copied from BaseOperator, but modify based on the need.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably already set from the caller at that point, no need to copy paste

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. will remove this.

Copy link
Copy Markdown
Member

@mistercrunch mistercrunch Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be nice to make this method part of BaseOperator and make shallow_copy_attrs a class attribute. If doing this, I'd push to have a _base_operator_shallow_copy_attrs class attribute that would get merged with shallow_copy_attrs. That would make it easy to make args shallow copied for any operator.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. This is better. pr updated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me there are just two types of arguments here, the ones we deepcopy and the ones we shallow copy. We can assume deepcopy is the default and only have a set of shallow_copy attrs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. This is better. pr updated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done in a way where everything happens in the loop above (all parameter driven) with no per-attribute assignment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the info.

@feng-tao feng-tao changed the title [AIRFLOW-2652] Implement / Enhance pythonOperator deepcopy [AIRFLOW-2652] Implement / Enhance baseOperator deepcopy Jun 21, 2018
@asfgit asfgit closed this in 07db7a3 Jun 21, 2018
feng-tao pushed a commit to lyft/airflow that referenced this pull request Jun 21, 2018
Make sure you have checked _all_ steps below.

- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "\[AIRFLOW-XXX\] My Airflow PR"
    -
https://issues.apache.org/jira/browse/AIRFLOW-2652
    - In case you are fixing a typo in the
documentation you can prepend your commit with
\[AIRFLOW-XXX\], code changes always need a JIRA
issue.

- [x] Here are some details about my PR, including
screenshots of any UI changes:
When running ``airflow backfill`` on
pythonOperator, it will do / trigger a deepcopy of
the task_instance. If some objects can't be
deepcopy in certain python version(e.g Protobuf in
python 2.7) , an exception will be thrown. We
should just do a shallow copy instead of deep copy
for the object.

The pr here is to copy the ``_deepcopy__`` method
in BaseOperator, but skip doing deepcopy for
`op_kwargs` and `python_callable`.

- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
I can't think of a good way to test. We encounter
this in our production.

- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not
"adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

- [x] Passes `git diff upstream/master -u --
"*.py" | flake8 --diff`

Closes apache#3528 from feng-tao/airflow-2652
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "\[AIRFLOW-XXX\] My Airflow PR"
    -
https://issues.apache.org/jira/browse/AIRFLOW-2652
    - In case you are fixing a typo in the
documentation you can prepend your commit with
\[AIRFLOW-XXX\], code changes always need a JIRA
issue.

### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
When running ``airflow backfill`` on
pythonOperator, it will do / trigger a deepcopy of
the task_instance. If some objects can't be
deepcopy in certain python version(e.g Protobuf in
python 2.7) , an exception will be thrown. We
should just do a shallow copy instead of deep copy
for the object.

The pr here is to copy the ``_deepcopy__`` method
in BaseOperator, but skip doing deepcopy for
`op_kwargs` and `python_callable`.

### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
I can't think of a good way to test. We encounter
this in our production.

### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not
"adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

### Code Quality
- [x] Passes `git diff upstream/master -u --
"*.py" | flake8 --diff`

Closes apache#3528 from feng-tao/airflow-2652
ashb pushed a commit to ashb/airflow that referenced this pull request Mar 27, 2019
Make sure you have checked _all_ steps below.

- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "\[AIRFLOW-XXX\] My Airflow PR"
    -
https://issues.apache.org/jira/browse/AIRFLOW-2652
    - In case you are fixing a typo in the
documentation you can prepend your commit with
\[AIRFLOW-XXX\], code changes always need a JIRA
issue.

- [x] Here are some details about my PR, including
screenshots of any UI changes:
When running ``airflow backfill`` on
pythonOperator, it will do / trigger a deepcopy of
the task_instance. If some objects can't be
deepcopy in certain python version(e.g Protobuf in
python 2.7) , an exception will be thrown. We
should just do a shallow copy instead of deep copy
for the object.

The pr here is to copy the ``_deepcopy__`` method
in BaseOperator, but skip doing deepcopy for
`op_kwargs` and `python_callable`.

- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
I can't think of a good way to test. We encounter
this in our production.

- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not
"adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

- [x] Passes `git diff upstream/master -u --
"*.py" | flake8 --diff`

Closes apache#3528 from feng-tao/airflow-2652
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants