Add pipelines 'get' and 'reset' commands #273
Changes from all commits
```diff
@@ -26,7 +26,7 @@
 import click
 
-from databricks_cli.click_types import PipelineSpecClickType
+from databricks_cli.click_types import PipelineSpecClickType, PipelineIdClickType
 from databricks_cli.utils import eat_exceptions, CONTEXT_SETTINGS
 from databricks_cli.version import print_version_callback, version
 from databricks_cli.pipelines.api import PipelinesApi
@@ -36,18 +36,24 @@
 @click.command(context_settings=CONTEXT_SETTINGS,
                short_help='Deploys a delta pipeline according to the pipeline specification')
 @click.argument('spec_arg', default=None, required=False)
-@click.option('--spec', default=None, help=PipelineSpecClickType.help)
+@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
 @debug_option
 @profile_option
 @eat_exceptions
 @provide_api_client
 def deploy_cli(api_client, spec_arg, spec):
     """
-    Deploys a delta pipeline according to the pipeline specification.
-    * The pipeline spec is a deployment specification that explains how to run a
-      Delta Pipeline on Databricks.
-    * The CLI simply forwards the spec to Databricks.
-    * All the local libraries referenced in the spec are uploaded to DBFS.
+    Deploys a delta pipeline according to the pipeline specification. The pipeline spec is a
+    specification that explains how to run a Delta Pipeline on Databricks. All local libraries
+    referenced in the spec are uploaded to DBFS.
+
+    Usage:
+
+    databricks pipelines deploy example.json
+
+    OR
+
+    databricks pipelines deploy --spec example.json
     """
     if bool(spec_arg) == bool(spec):
         raise RuntimeError('The spec should be provided either by an option or argument')
@@ -57,30 +63,97 @@ def deploy_cli(api_client, spec_arg, spec):
 
 
 @click.command(context_settings=CONTEXT_SETTINGS,
-               short_help='Stops a delta pipeline and cleans '
-                          'up Databricks resources associated with it')
+               short_help='Stops a delta pipeline and deletes its associated Databricks resources')
 @click.argument('spec_arg', default=None, required=False)
-@click.option('--spec', default=None, help=PipelineSpecClickType.help)
-@click.option('--pipeline-id', default=None,
-              help='id associated with the pipeline to be stopped')
+@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
+@click.option('--pipeline-id', default=None, type=PipelineIdClickType(),
+              help=PipelineIdClickType.help)
 @debug_option
 @profile_option
 @eat_exceptions
 @provide_api_client
 def delete_cli(api_client, spec_arg, spec, pipeline_id):
     """
-    Stops a delta pipeline and cleans up Databricks resources associated with it
+    Stops a delta pipeline and deletes its associated Databricks resources. The pipeline can be
+    resumed by deploying it again.
+
+    Usage:
+
+    databricks pipelines delete example.json
+
+    OR
+
+    databricks pipelines delete --spec example.json
+
+    OR
+
+    databricks pipelines delete --pipeline-id 1234
     """
-    # Only one out of spec/pipeline_id/spec_arg should be supplied
-    if bool(spec_arg) + bool(spec) + bool(pipeline_id) != 1:
-        raise RuntimeError('Either spec should be provided as an argument '
-                           'or option, or the pipeline-id should be provided')
-    if bool(spec_arg) or bool(spec):
-        src = spec_arg if bool(spec_arg) else spec
-        pipeline_id = _read_spec(src)["id"]
+    pipeline_id = _get_pipeline_id(spec_arg=spec_arg, spec=spec, pipeline_id=pipeline_id)
     PipelinesApi(api_client).delete(pipeline_id)
 
 
+@click.command(context_settings=CONTEXT_SETTINGS,
+               short_help='Gets a delta pipeline\'s current spec and status')
+@click.argument('spec_arg', default=None, required=False)
+@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
```
Contributor:
> Does it make sense to provide the spec to an API that retrieves the spec? Perhaps we should only accept a pipeline id as argument here.

Collaborator (Author):
> It also gets the status. Arul and Michael and I debated a bunch when we first did this for delete, but we figured it was probably best to support all options there. I think we should do the same here, but check in with early customers and see if this is making it easier or harder to use.

Contributor:
> Classic REST will assign a unique ID on resource creation and then use that ID for subsequent requests. Accepting the entire spec for a delete() or similar call may seem harmless, but I have always believed that a good API has exactly one way to do a specific thing. Allowing multiple ways creates ambiguity, can cause confusion, and multiplies the amount of code that needs to be maintained (and kept backwards-compatible) in the future.
>
> One drawback of accepting a spec for get or delete is that the user may think the entire spec must match, when in reality everything but the ID is ignored. For example, a user may expect the spec returned to be equal to the spec passed in; that would not be the case. Or, on delete, the user may expect that it only deletes resources that match the whole spec, including, say, the name, but it will delete a pipeline with a matching ID regardless of name. This kind of ambiguity can easily be avoided.
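The ambiguity the reviewer describes can be illustrated with a minimal sketch (hypothetical code, not from the PR): when a command like `get` or `delete` accepts a full spec, only the `id` field is consulted, so a spec whose other fields are stale still resolves to the deployed pipeline with the matching id.

```python
# Hypothetical sketch of the reviewer's point: resolving a pipeline from a
# full spec ignores every field except "id".
deployed = {"id": "1234", "name": "prod-pipeline"}

def resolve_pipeline_id(spec):
    # name, storage, clusters, etc. play no role in resolution
    return spec["id"]

# A spec whose name and storage no longer match the deployed pipeline
# still resolves to (and on delete, would remove) pipeline "1234".
stale_spec = {"id": "1234", "name": "old-name", "storage": "/mnt/other"}
assert resolve_pipeline_id(stale_spec) == deployed["id"]
```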
```diff
+@click.option('--pipeline-id', default=None, type=PipelineIdClickType(),
+              help=PipelineIdClickType.help)
+@debug_option
+@profile_option
+@eat_exceptions
+@provide_api_client
+def get_cli(api_client, spec_arg, spec, pipeline_id):
+    """
+    Gets a delta pipeline's current spec and status.
+
+    Usage:
+
+    databricks pipelines get example.json
+
+    OR
+
+    databricks pipelines get --spec example.json
+
+    OR
+
+    databricks pipelines get --pipeline-id 1234
+    """
+    pipeline_id = _get_pipeline_id(spec_arg=spec_arg, spec=spec, pipeline_id=pipeline_id)
+    PipelinesApi(api_client).get(pipeline_id)
+
+
+@click.command(context_settings=CONTEXT_SETTINGS,
+               short_help='Resets a delta pipeline so data can be reprocessed from scratch')
+@click.argument('spec_arg', default=None, required=False)
+@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
```
Contributor:
> Same here - I think the pipeline id is the only argument we should accept here.
```diff
+@click.option('--pipeline-id', default=None, type=PipelineIdClickType(),
+              help=PipelineIdClickType.help)
+@debug_option
+@profile_option
+@eat_exceptions
+@provide_api_client
+def reset_cli(api_client, spec_arg, spec, pipeline_id):
+    """
+    Resets a delta pipeline by truncating tables and creating new checkpoint folders so data is
+    reprocessed from scratch.
+
+    Usage:
+
+    databricks pipelines reset example.json
+
+    OR
+
+    databricks pipelines reset --spec example.json
+
+    OR
+
+    databricks pipelines reset --pipeline-id 1234
+    """
+    pipeline_id = _get_pipeline_id(spec_arg=spec_arg, spec=spec, pipeline_id=pipeline_id)
+    PipelinesApi(api_client).reset(pipeline_id)
 
 
 def _read_spec(src):
     """
     Reads the spec at src as a JSON if no file extension is provided, or if in the extension format
@@ -95,6 +168,21 @@ def _read_spec(src):
         raise RuntimeError('The provided file extension for the spec is not supported')
 
 
+def _get_pipeline_id(spec_arg, spec, pipeline_id):
+    """
+    Ensures that the user has either specified a spec (either through argument or option) or a
+    pipeline ID directly, and returns the pipeline id to use.
+    """
+    # Only one out of spec/pipeline_id/spec_arg should be supplied
+    if bool(spec_arg) + bool(spec) + bool(pipeline_id) != 1:
+        raise RuntimeError('Either spec should be provided as an argument '
+                           'or option, or the pipeline-id should be provided')
+    if bool(spec_arg) or bool(spec):
+        src = spec_arg if bool(spec_arg) else spec
+        pipeline_id = _read_spec(src)["id"]
+    return pipeline_id
 
 
 @click.group(context_settings=CONTEXT_SETTINGS,
              short_help='Utility to interact with the Databricks Delta Pipelines.')
 @click.option('--version', '-v', is_flag=True, callback=print_version_callback,
@@ -110,3 +198,5 @@ def pipelines_group():
 
 
 pipelines_group.add_command(deploy_cli, name='deploy')
 pipelines_group.add_command(delete_cli, name='delete')
+pipelines_group.add_command(get_cli, name='get')
+pipelines_group.add_command(reset_cli, name='reset')
```
```diff
@@ -796,9 +796,8 @@ class DeltaPipelinesService(object):
     def __init__(self, client):
         self.client = client
 
-    def deploy(self, pipeline_id=None, id=None, name=None, storage=None, filters=None,
-               clusters=None, libraries=None, transformations=None, credentials=None,
-               headers=None):
+    def deploy(self, pipeline_id=None, id=None, name=None, storage=None, configuration=None,
+               clusters=None, libraries=None, transformations=None, filters=None, headers=None):
         _data = {}
         if pipeline_id is not None:
             _data['pipeline_id'] = pipeline_id
@@ -808,28 +807,35 @@ def deploy(self, pipeline_id=None, id=None, name=None, storage=None, filters=Non
         if name is not None:
             _data['name'] = name
         if storage is not None:
             _data['storage'] = storage
-        if filters is not None:
-            _data['filters'] = filters
-            if not isinstance(filters, dict):
-                raise TypeError('Expected databricks.Filters() or dict for field filters')
+        if configuration is not None:
+            _data['configuration'] = configuration
         if clusters is not None:
             _data['clusters'] = clusters
         if libraries is not None:
             _data['libraries'] = libraries
         if transformations is not None:
             _data['transformations'] = transformations
-        if credentials is not None:
-            _data['credentials'] = credentials
-            if not isinstance(credentials, dict):
-                raise TypeError('Expected databricks.Credentials() or dict for field credentials')
-        return self.client.perform_query('PUT', '/pipelines/{}'.format(pipeline_id), data=_data, headers=headers)
+        if filters is not None:
+            _data['filters'] = filters
+            if not isinstance(filters, dict):
+                raise TypeError('Expected databricks.Filters() or dict for field filters')
+        return self.client.perform_query('PUT', '/pipelines/{pipeline_id}', data=_data, headers=headers)
 
-    def delete(self, pipeline_id=None, credentials=None, headers=None):
+    def delete(self, pipeline_id=None, headers=None):
         _data = {}
         if pipeline_id is not None:
```
Contributor:
> I'd expect that pipeline_id must not be None.
```diff
             _data['pipeline_id'] = pipeline_id
-        if credentials is not None:
-            _data['credentials'] = credentials
-            if not isinstance(credentials, dict):
-                raise TypeError('Expected databricks.Credentials() or dict for field credentials')
-        return self.client.perform_query('DELETE', '/pipelines/{}'.format(pipeline_id), data=_data, headers=headers)
+        return self.client.perform_query('DELETE', '/pipelines/{pipeline_id}', data=_data, headers=headers)
 
+    def get(self, pipeline_id=None, headers=None):
+        _data = {}
+        if pipeline_id is not None:
+            _data['pipeline_id'] = pipeline_id
+        return self.client.perform_query('GET', '/pipelines/{pipeline_id}', data=_data, headers=headers)
 
+    def reset(self, pipeline_id=None, headers=None):
+        _data = {}
+        if pipeline_id is not None:
+            _data['pipeline_id'] = pipeline_id
```
Contributor:
> and here
```diff
+        return self.client.perform_query('POST', '/pipelines/{pipeline_id}/reset', data=_data, headers=headers)
```
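All of the service methods funnel into the client's `perform_query`. A minimal sketch with a fake client (hypothetical; the real SDK client performs authenticated HTTP requests) shows the method/endpoint shape of the new `get` and `reset` calls. Note one assumption made here for the sketch to be meaningful: the pipeline id is substituted into the URL with `str.format`, matching the pre-diff style, rather than sending the literal `{pipeline_id}` placeholder that appears in the diff lines above.

```python
class FakeClient(object):
    """Stand-in for the SDK client: records each perform_query call."""
    def __init__(self):
        self.calls = []

    def perform_query(self, method, path, data=None, headers=None):
        self.calls.append((method, path, data))
        return {}

class DeltaPipelinesService(object):
    def __init__(self, client):
        self.client = client

    def get(self, pipeline_id=None, headers=None):
        _data = {}
        if pipeline_id is not None:
            _data['pipeline_id'] = pipeline_id
        return self.client.perform_query(
            'GET', '/pipelines/{}'.format(pipeline_id), data=_data, headers=headers)

    def reset(self, pipeline_id=None, headers=None):
        _data = {}
        if pipeline_id is not None:
            _data['pipeline_id'] = pipeline_id
        return self.client.perform_query(
            'POST', '/pipelines/{}/reset'.format(pipeline_id), data=_data, headers=headers)

client = FakeClient()
svc = DeltaPipelinesService(client)
svc.get(pipeline_id='1234')
svc.reset(pipeline_id='1234')
print(client.calls[0][:2])  # ('GET', '/pipelines/1234')
print(client.calls[1][:2])  # ('POST', '/pipelines/1234/reset')
```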
> this change seems unrelated - how come we don't need this any more?

> This is for the

> ok. I think it is better to make this kind of refactoring in a separate PR, but no biggie.