[AIRFLOW-2652] Implement / Enhance baseOperator deepcopy #3528
feng-tao wants to merge 1 commit into apache:master
Codecov Report
Coverage Diff

| | master | #3528 | +/- |
|---|---|---|---|
| Coverage | 77.45% | 77.46% | +0.01% |
| Files | 204 | 204 | |
| Lines | 15235 | 15233 | -2 |
| Hits | 11801 | 11801 | |
| Misses | 3434 | 3432 | -2 |
airflow/operators/python_operator.py
Will remove this. The function is copied from BaseOperator, but modified as needed.
airflow/operators/python_operator.py
This is probably already set by the caller at that point, so there is no need to copy-paste it.
Thanks, will remove this.
airflow/operators/python_operator.py
It could be nice to make this method part of BaseOperator and make shallow_copy_attrs a class attribute. If doing this, I'd push to have a _base_operator_shallow_copy_attrs class attribute that would get merged with shallow_copy_attrs. That would make it easy to make args shallow copied for any operator.
Thanks, this is better. PR updated.
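A minimal sketch of the suggestion above, assuming stand-in classes rather than Airflow's actual `BaseOperator`. The attribute names `_base_operator_shallow_copy_attrs` and `shallow_copy_attrs` come from the review comment; the class names, the helper `all_shallow_copy_attrs`, and the example values are hypothetical.

```python
class SketchBaseOperator:
    """Stand-in for BaseOperator; not Airflow's actual class."""

    # Names every operator shares by reference when copied
    # (illustrative value, an assumption for this sketch).
    _base_operator_shallow_copy_attrs = ("params",)

    # Subclasses override this to add their own names.
    shallow_copy_attrs = ()

    @classmethod
    def all_shallow_copy_attrs(cls):
        """Merge the base-level and operator-level names into one set."""
        return set(cls._base_operator_shallow_copy_attrs) | set(cls.shallow_copy_attrs)


class SketchPythonOperator(SketchBaseOperator):
    # An operator opts attributes out of deepcopy by listing them here.
    shallow_copy_attrs = ("python_callable", "op_kwargs")
```

With this layout, an operator only declares its own names; the base-level names are merged in at lookup time, so no subclass has to repeat them.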
airflow/operators/python_operator.py
To me there are just two types of arguments here, the ones we deepcopy and the ones we shallow copy. We can assume deepcopy is the default and only have a set of shallow_copy attrs.
Thanks, this is better. PR updated.
airflow/operators/python_operator.py
I think this can be done in a way where everything happens in the loop above (all parameter driven) with no per-attribute assignment.
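One way to read this comment, as a sketch only: a single loop over the instance attributes decides between shallow and deep copy from a set of names, with deepcopy as the default and no per-attribute assignment. `SketchOperator` and its attributes are hypothetical stand-ins, not the code in this PR.

```python
import copy


class SketchOperator:
    """Stand-in operator; not Airflow's actual BaseOperator."""

    # Names copied shallowly; every other attribute is deep-copied.
    shallow_copy_attrs = ("python_callable", "op_kwargs")

    def __init__(self, task_id, python_callable, op_kwargs=None):
        self.task_id = task_id
        self.python_callable = python_callable
        self.op_kwargs = op_kwargs or {}
        self.retries_by_day = {"mon": 1}  # illustrative mutable attribute

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        # Register the copy first so cyclic references resolve to it.
        memo[id(self)] = result
        for name, value in self.__dict__.items():
            if name in cls.shallow_copy_attrs:
                setattr(result, name, copy.copy(value))
            else:
                setattr(result, name, copy.deepcopy(value, memo))
        return result
```

Calling `copy.deepcopy(op)` on such an operator yields a new `op_kwargs` dict whose values are the same objects as in the original, while attributes outside the set are copied recursively.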
Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
    - https://issues.apache.org/jira/browse/AIRFLOW-2652
    - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

When running `airflow backfill` on a PythonOperator, it triggers a deepcopy of the task_instance. If some objects can't be deep-copied in certain Python versions (e.g. Protobuf in Python 2.7), an exception is thrown. We should do a shallow copy instead of a deep copy for such objects.

This PR copies the `__deepcopy__` method from BaseOperator, but skips the deepcopy for `op_kwargs` and `python_callable`.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

I can't think of a good way to test. We encounter this in our production.

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes apache#3528 from feng-tao/airflow-2652
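The failure mode in the description can be illustrated without Protobuf. This is a sketch under assumptions: `NotDeepCopyable` is a hypothetical stand-in for any object that raises when deep-copied, and the dict plays the role of `op_kwargs`. A unit test for the change might use a stand-in of this kind, though that is only a suggestion.

```python
import copy


class NotDeepCopyable:
    """Hypothetical stand-in for an object that fails under deepcopy."""

    def __deepcopy__(self, memo):
        raise TypeError("this object does not support deepcopy")


op_kwargs = {"message": NotDeepCopyable()}

# deepcopy recurses into the dict values and hits the failing object.
try:
    copy.deepcopy(op_kwargs)
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

# A shallow copy creates a new dict but reuses the same value objects,
# so the failing __deepcopy__ is never called.
shallow = copy.copy(op_kwargs)
```

The shallow copy succeeds because only the outer dict is duplicated; the trade-off is that the copy and the original share the same `message` object.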