[AIRFLOW-6759] Added MLEngine operator/hook to cancel MLEngine jobs #7400
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Cancels a MLEngine job.

:param project_id: The Google Cloud project id within which MLEngine
    job will be launched. If set to None or missing, the default project_id from the GCP
This comment (describing the behaviour when `project_id` is `None`) is inconsistent with the code below.
Added types for `job_id` Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
    training job. (templated)
:type job_id: str
:param project_id: The Google Cloud project name within which MLEngine training job should run.
    If set to None or missing, the default project_id from the GCP connection is used. (templated)
Same as the comment above: it seems like the code raises an error if `project_id` is None or missing.
All operators except BigQuery allow passing project_id as a parameter. However, if this parameter is omitted, the default value will be read from the credentials.
https://github.com/apache/airflow/blob/97a429f/airflow/providers/google/cloud/hooks/base.py#L334-L360
It is very difficult to authorize and not get a project_id. This is not even possible with a production deployment. You may not have a project_id when using gcloud for authorization only.
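To make the fallback described above concrete, here is a minimal, self-contained sketch. It is not the code behind the linked `base.py` lines: `with_default_project_id`, `FakeHook`, `default_project_id` and `MissingProjectIdError` are hypothetical names chosen for illustration, and the returned string only stands in for a real API call.

```python
import functools
from typing import Callable, Optional


class MissingProjectIdError(Exception):
    """Raised when no project id is passed and the credentials carry none."""


def with_default_project_id(func: Callable) -> Callable:
    """Hypothetical decorator: fill in ``project_id`` from the hook's credentials.

    In this sketch ``project_id`` must be passed by keyword.
    """

    @functools.wraps(func)
    def wrapper(self, *args, project_id: Optional[str] = None, **kwargs):
        # An explicit argument wins; otherwise use the default from the credentials.
        resolved = project_id or self.default_project_id
        if not resolved:
            raise MissingProjectIdError(
                "project_id was not passed and the credentials do not define one."
            )
        return func(self, *args, project_id=resolved, **kwargs)

    return wrapper


class FakeHook:
    """Stand-in for a hook; ``default_project_id`` mimics a value read from credentials."""

    def __init__(self, default_project_id: Optional[str]) -> None:
        self.default_project_id = default_project_id

    @with_default_project_id
    def cancel_job(self, job_id: str, project_id: Optional[str] = None) -> str:
        # A real hook would call the API here; the sketch only builds a name.
        return f"projects/{project_id}/jobs/{job_id}"


from_credentials = FakeHook("cred-project").cancel_job("job-1")
explicit = FakeHook("cred-project").cancel_job("job-1", project_id="other-project")
```

With this shape, an operator can leave `project_id` as None and only fails in the rare case where the credentials carry no project at all.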
template_fields = [
    '_project_id',
    '_job_id',
]
Let's use a tuple here :)
That's an option, but I thought it'd be better to maintain consistency between the operators (all the other ones use an array). Unless there are some other, non-style-related reasons?
It varies. But in my opinion, this is an immutable field :)
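A small sketch of what the immutability argument buys in practice, using plain Python classes rather than real Airflow operators (`ListFieldsOperator` and `TupleFieldsOperator` are hypothetical names): a class-level list is shared by every instance and can be changed by accident, while a tuple rejects the change.

```python
class ListFieldsOperator:
    # Class-level list: one object shared by all instances.
    template_fields = ['_project_id', '_job_id']


class TupleFieldsOperator:
    # Class-level tuple: cannot be modified in place.
    template_fields = ('_project_id', '_job_id')


first = ListFieldsOperator()
second = ListFieldsOperator()

# Appending through one instance silently changes the field list for all of them.
first.template_fields.append('_extra')
leaked_to_other_instance = '_extra' in second.template_fields

# The same accident with a tuple fails loudly, because tuples have no append().
try:
    TupleFieldsOperator().template_fields.append('_extra')
    tuple_rejected_change = False
except AttributeError:
    tuple_rejected_change = True
```

Either form lists the same templated fields, so consistency with the neighbouring operators remains a reasonable argument for the list.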
if not self._project_id:
    raise AirflowException('Google Cloud project id is required.')
if not self._job_id:
    raise AirflowException(
No need for that, as `job_id` is a required parameter.
Uhmm, I'm not super sure why it was raised either, but I see the same check in `MLEngineStartBatchPredictionJobOperator` and `MLEngineStartTrainingJobOperator`, so I thought I'd better include it as well 😅
The MLEngine operators are not the best ones :D
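For reference, a rough sketch of what a required parameter does and does not cover, in plain Python rather than Airflow's own argument handling (`CancelJobSketch` is a hypothetical stand-in, not the operator from this PR): omitting the argument is rejected by the signature, while an empty string still gets through.

```python
from typing import Optional


class CancelJobSketch:
    """Hypothetical stand-in for an operator with a required ``job_id``."""

    def __init__(self, job_id: str, project_id: Optional[str] = None) -> None:
        self._job_id = job_id
        self._project_id = project_id


def omitting_job_id_fails() -> bool:
    """Return True if constructing without job_id is rejected by the signature."""
    try:
        CancelJobSketch()  # job_id omitted on purpose
    except TypeError:
        return True
    return False


omitted_rejected = omitting_job_id_fails()

# An empty string satisfies the signature; only an explicit
# `if not self._job_id` style check would reject it.
empty_accepted = CancelJobSketch(job_id='')._job_id == ''
```

Whether the empty-string case is worth guarding against is a judgment call; the review above settled on dropping the check.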
cleaner formating Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #7400 +/- ##
==========================================
- Coverage 86.5% 86.37% -0.14%
==========================================
Files 873 878 +5
Lines 40725 41189 +464
==========================================
+ Hits 35231 35576 +345
- Misses 5494 5613 +119
Awesome work, congrats on your first merged pull request!
…pache#7400)
* [AIRFLOW-6759] Added MLEngine operator/hook to cancel MLEngine jobs
* Update airflow/providers/google/cloud/hooks/mlengine.py
  Added types for `job_id`
  Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
* Updates cancel_job doc
* Update airflow/providers/google/cloud/hooks/mlengine.py
  cleaner formating
  Co-Authored-By: Tomek Urbaszek <turbaszek@gmail.com>
* removed redundant error checking
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
Added `MLEngineTrainingJobFailureOperator` with `cancel_job` hook for MLEngine.
Issue link: AIRFLOW-6759
Make sure to mark the boxes below before creating PR: [x]
`[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID*
* For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.