[AIRFLOW-729] Add Google Cloud Dataproc cluster creation operator#1971
bodschut wants to merge 1 commit into apache:master
Current coverage is 67.13% (diff: 100%)
@@             master    #1971   diff @@
==========================================
  Files           135      135
  Lines         10295    10295
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits           6911     6912     +1
+ Misses         3384     3383     -1
  Partials          0        0
Force-pushed from d113330 to 84d3024
bolkedebruin
left a comment
Some minor nits. Please check if you adhere to the commit guidelines (max columns)
Is this smart? Should this not be a required setting instead of having a default?
Agreed, this should not be a default
We use logging.info across Airflow. Besides, can this also be "debug"?
Hey @bolkedebruin, I actually don't agree here. Due to how logging is wired, log messages below INFO will never be shown unless you rebuild Airflow (I'm looking into it though...).
So INFO is very appropriate for Operators and Hooks. Results come in a separate logfile anyway, and you want to see how the progress of a Task goes.
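To illustrate the point about levels, here is a minimal sketch using only the standard `logging` module. It is not Airflow's actual logging setup; the logger name, the `ListHandler` class and the messages are made up for the example. With the logger set to INFO, the debug message is dropped and only the progress message survives.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_task_with_level(level):
    """Log one DEBUG and one INFO message and return what got through."""
    # A separate logger per level keeps the two runs independent.
    logger = logging.getLogger("dataproc_sketch_%s" % level)
    logger.setLevel(level)
    logger.propagate = False  # keep the example output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)

    logger.debug("Polling cluster state")
    logger.info("Creating cluster")

    logger.removeHandler(handler)
    return handler.messages
```

Calling `run_task_with_level(logging.INFO)` returns only the "Creating cluster" message, while `run_task_with_level(logging.DEBUG)` returns both.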
Idem as above, debug seems more appropriate.
The operator checks if there is already a cluster running with the provided name in the provided project. If so, the operator finishes successfully. Otherwise, the operator issues a REST API call to initiate the cluster creation and waits until the creation is successful before exiting.
Force-pushed from 84d3024 to deab7dd
I changed the zone parameter to be required and adopted the logging.info method. However, I also feel that the INFO level is more appropriate to follow the task progress. Another question: I see that you adhere to the 90-character limit for line lengths (original PEP-8 recommendation). However, I believe that sometimes this is bad for code readability... A lot of modern open source projects have already extended the limit to 120 characters (which is also the default in IDEs like PyCharm). What do you think of this? Thanks,
@bolkedebruin I think it's good, if we get +1 from @criccomini I'll merge it in before tonight. I want to build a new alpha to run on our environment and would like to have this one and the other included.
Closes apache#1971 from bodschut/feature/dataproc_operator
Add an operator to create a Google Cloud Dataproc cluster with a given name in a given Google Cloud project. The operator checks if there is already a cluster running with the provided name in the provided project. If so, the operator finishes successfully. Otherwise, the operator issues a REST API call to initiate the cluster creation and waits until the creation is successful before exiting. Most of the possible cluster customisation parameters available in Dataproc are made available when constructing the operator.
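The check-then-create-then-wait behaviour described above can be sketched roughly as follows. This is not the code from the PR: `FakeDataprocClient`, `ensure_cluster`, the state strings and the polling parameters are all hypothetical, and the in-memory client stands in for the real REST API so the example runs without Airflow or Google credentials. Note that `zone` has no default, matching the review outcome.

```python
import time


class FakeDataprocClient:
    """Hypothetical in-memory stand-in for the Dataproc REST API."""

    def __init__(self, polls_until_running=2):
        self.clusters = {}   # (project_id, cluster_name) -> state string
        self.create_calls = 0
        self._polls_until_running = polls_until_running
        self._polls_left = {}

    def get_state(self, project_id, cluster_name):
        """Return the cluster state, or None if the cluster does not exist."""
        key = (project_id, cluster_name)
        if key not in self.clusters:
            return None
        if self.clusters[key] == "CREATING":
            # Simulate creation finishing after a few status polls.
            self._polls_left[key] -= 1
            if self._polls_left[key] <= 0:
                self.clusters[key] = "RUNNING"
        return self.clusters[key]

    def create(self, project_id, cluster_name, zone):
        """Start an asynchronous cluster creation."""
        key = (project_id, cluster_name)
        self.create_calls += 1
        self.clusters[key] = "CREATING"
        self._polls_left[key] = self._polls_until_running


def ensure_cluster(client, project_id, cluster_name, zone,
                   poll_interval=0.0, max_polls=100):
    """Create the cluster unless it is already running, then wait for it."""
    if client.get_state(project_id, cluster_name) == "RUNNING":
        return "already_running"  # nothing to do, finish successfully

    client.create(project_id, cluster_name, zone)
    for _ in range(max_polls):
        if client.get_state(project_id, cluster_name) == "RUNNING":
            return "created"
        time.sleep(poll_interval)
    raise RuntimeError("cluster did not reach RUNNING in time")
```

Running `ensure_cluster` twice against the same client issues only one create call; the second run sees the running cluster and returns immediately, which is what makes the task safe to retry.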
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
Testing Done: