[AIRFLOW-1273] Add Google Cloud ML version and model operators #2379
N3da wants to merge 1 commit into apache:master
@N3da, thanks for your PR! By analyzing the history of the files in this pull request, we identified @artwr, @jlowin and @criccomini to be potential reviewers.
Codecov Report
@@ Coverage Diff @@
## master #2379 +/- ##
==========================================
- Coverage 69.26% 69.24% -0.02%
==========================================
Files 146 146
Lines 11231 11232 +1
==========================================
- Hits 7779 7778 -1
- Misses 3452 3454 +2
You can import the logging level from settings.py (settings.LOGGING_LEVEL)
settings.LOGGING_LEVEL
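A minimal sketch of that suggestion, assuming `settings.LOGGING_LEVEL` is available from `airflow.settings` as the comment says. The logger name and the fallback to `logging.INFO` are assumptions added only so the snippet also runs where Airflow is not installed.

```python
import logging

try:
    # LOGGING_LEVEL is the attribute named in the review comment.
    from airflow import settings
    level = settings.LOGGING_LEVEL
except (ImportError, AttributeError):
    # Assumed fallback so the sketch runs without Airflow installed.
    level = logging.INFO

# "cloudml_hook_example" is a hypothetical logger name.
logger = logging.getLogger("cloudml_hook_example")
logger.setLevel(level)
```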
Some pydocs here would be helpful. The pydocs from these classes get rolled into the public documentation automatically, so being extra verbose is actually good for the community.
Some pydocs here would be helpful. The pydocs from these classes get rolled into the public documentation automatically, so being extra verbose is actually good for the community.
airflow/utils/db.py
Not sure how useful this is. Was there any particular reason you added it? If you want to keep it, having an example extra={} defined (as with beeline above) could be of some use, though it'd point to some service account json file that doesn't actually exist.
Ah. This was needed for the unit tests, which start from an empty db. Initializing the gcp_cloudml_hook would throw without a value for google_cloud_default (and the tests would be skipped). Not sure if the extra args would be useful though.
One thing to consider: Airflow operators, themselves, have a retry setting that developers can use in the case of a failure. If you simply allow failures to propagate up, the task will fail, and the developer can decide whether the task should retry. It might be cleaner/easier to remove this retry logic, and just allow failures to go up the stack to Airflow, where config can dictate how to handle failures.
It is nice to have slightly more sophisticated logic, as you do, to handle retry HTTP error codes, and fail outright on others, so I could see an argument for keeping this. Just something to consider.
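For reference, a short sketch of the operator-level retry settings mentioned above. `retries` and `retry_delay` are standard `BaseOperator` arguments; the values here are illustrative only and are not taken from this PR.

```python
from datetime import timedelta

# Passed as default_args to a DAG, these apply to every task in it.
# The numbers are placeholders, not a recommendation.
default_args = {
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}
```

With settings like these, an exception that propagates out of a hook fails the task, and Airflow decides whether to run it again.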
Thanks Chris. The purpose of retry here is slightly different. Some of our HTTP calls (for example, the one for creating a Version) return a long-running [Operation](https://cloud.google.com/ml-engine/reference/rest/Shared.Types/ListOperationsResponse#Operation), which the user needs to keep polling to determine when it's done. This function is to avoid polling every n seconds, which might hit the user's quota restrictions. In the case of errors, we mostly don't want to add extra retry logic on top of the existing generic logic (except for the 429 status case, which can be retried).
On second thought, I renamed it to _poll_... instead of _retry_... to make it clearer.
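The polling idea described in this thread can be sketched roughly as follows. This is not the code from the PR: the function name, its parameters, the `RetryableError` stand-in for an HTTP 429 response, and the fake responses are all assumptions made for illustration.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for an HTTP 429 (quota exceeded) response."""


def poll_with_exponential_delay(execute, is_done, is_error, max_n=9, sleep=time.sleep):
    """Call execute() up to max_n times, waiting longer after each attempt."""
    for attempt in range(max_n):
        try:
            response = execute()
        except RetryableError:
            # Quota errors are the one failure retried here;
            # any other exception propagates to the caller.
            pass
        else:
            if is_error(response):
                raise ValueError("operation failed: {!r}".format(response))
            if is_done(response):
                return response
        # Wait 2**attempt seconds plus up to one second of random jitter.
        sleep(2 ** attempt + random.random())
    raise RuntimeError("operation not done after {} polls".format(max_n))


# Usage with fake responses; delays are recorded instead of slept.
responses = iter([{"done": False}, {"done": False}, {"done": True, "response": "v1"}])
delays = []
result = poll_with_exponential_delay(
    execute=lambda: next(responses),
    is_done=lambda r: r.get("done", False),
    is_error=lambda r: "error" in r,
    sleep=delays.append,
)
```

Injecting `sleep` keeps the sketch testable without real waiting. Errors other than the retryable one are left to bubble up, so Airflow's own task retry settings still apply.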
Includes Google Cloud ML hooks for version and model operations, and their unit tests. https://issues.apache.org/jira/browse/AIRFLOW-1273
LGTM merged!
JIRA
Description
Includes the following changes:
https://cloud.google.com/ml-engine/reference/rest/)
CloudMLVersionOperator and CloudMLModelOperators
google_cloud_default connection.
Tests
[Link to Travis: https://travis-ci.org/N3da/incubator-airflow/builds/244668457]