Add pipelines 'get' and 'reset' commands #273

Merged
mukulmurthy merged 4 commits into databricks:pipelines-client from mukulmurthy:nocred
Dec 11, 2019

@mukulmurthy
Collaborator

  • Add the commands databricks pipelines get and databricks pipelines reset to get and reset Delta Pipelines
  • Stop double-passing credentials to the Delta Pipelines service
  • Improve CLI documentation.

Tested with new and additional unit tests.

@mukulmurthy
Collaborator Author

@anew and @arulajmani as potential reviewers too

@click.command(context_settings=CONTEXT_SETTINGS,
               short_help='Gets a delta pipeline\'s current spec and status')
@click.argument('spec_arg', default=None, required=False)
@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
Contributor

Does it make sense to provide the spec to an API that retrieves the spec? Perhaps we should only accept a pipeline id as argument here.

Collaborator Author

It also gets the status. Arul and Michael and I debated a bunch when we first did this for delete, but we figured it was probably best to support all options there. I think we should do the same here, but check in with early customers and see if this is making it easier or harder to use.

Contributor

Classic REST will assign a unique ID on resource creation and then use that ID for subsequent requests.

Accepting the entire spec for a delete() or similar call may seem harmless. But I have always believed that a good API has exactly one way to do a specific thing. Allowing multiple ways to do the same thing creates ambiguity, can cause confusion, and multiplies the amount of code that needs to be maintained (and kept backwards-compatible) in the future.

One drawback of accepting a spec for get or delete etc. is that the user may think the entire spec must match, when in reality everything but ID is ignored. For example, a user may expect that the spec returned is equal to the spec passed in - that would not be the case. Or, on delete, the user may expect that it only deletes resources that match the whole spec, including, say, the name. But it will delete a pipeline with a matching ID, regardless of name.

This kind of ambiguity can easily be avoided.
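
To make the delete case concrete, here is a minimal sketch using an in-memory store in place of the service. `FakePipelineStore`, `delete_by_spec`, and `delete_by_id` are hypothetical names made up for illustration; none of them exist in databricks-cli.

```python
# Hypothetical in-memory stand-in for the pipelines service; not the real API.
class FakePipelineStore:
    def __init__(self):
        self._pipelines = {}

    def create(self, spec):
        # Store a copy of the spec, keyed by its ID.
        self._pipelines[spec["id"]] = dict(spec)

    def delete_by_spec(self, spec):
        # Only the ID is consulted; every other field in the spec is ignored.
        return self._pipelines.pop(spec["id"], None) is not None

    def delete_by_id(self, pipeline_id):
        # The signature states exactly what the call depends on.
        return self._pipelines.pop(pipeline_id, None) is not None

    def ids(self):
        return sorted(self._pipelines)


store = FakePipelineStore()
store.create({"id": "abc", "name": "prod-etl"})

# The caller passes a spec with a different name and might expect a no-op,
# but the pipeline is deleted because the IDs match.
deleted = store.delete_by_spec({"id": "abc", "name": "some-other-name"})
```

With `delete_by_id` as the only entry point, there is no name field for the caller to form a wrong expectation about.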

@click.command(context_settings=CONTEXT_SETTINGS,
               short_help='Resets a delta pipeline so data can be reprocessed from scratch')
@click.argument('spec_arg', default=None, required=False)
@click.option('--spec', default=None, type=PipelineSpecClickType(), help=PipelineSpecClickType.help)
Contributor

Same here: I think the pipeline id is the only argument we should accept.
And the same applies to delete...

-    def delete(self, pipeline_id=None, credentials=None, headers=None):
+    def delete(self, pipeline_id=None, headers=None):
         _data = {}
         if pipeline_id is not None:
Contributor

I'd expect that pipeline_id must not be None.
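
A rough sketch of what that could look like, with the argument made required and a `None` passed explicitly rejected up front. `FakeClient`, `PipelinesServiceSketch`, the HTTP methods, and the `/pipelines/...` paths are all assumptions for illustration, not the actual service code.

```python
# Sketch only: FakeClient records calls instead of talking to a server.
class FakeClient:
    def __init__(self):
        self.calls = []

    def perform_query(self, method, path, data=None, headers=None):
        self.calls.append((method, path, data, headers))
        return {}


class PipelinesServiceSketch:
    def __init__(self, client):
        self.client = client

    @staticmethod
    def _require_id(pipeline_id):
        # Fail fast instead of sending a request with no pipeline_id.
        if pipeline_id is None:
            raise ValueError("pipeline_id must not be None")

    def delete(self, pipeline_id, headers=None):
        self._require_id(pipeline_id)
        _data = {"pipeline_id": pipeline_id}
        # Hypothetical method and path.
        return self.client.perform_query(
            "POST", "/pipelines/delete", data=_data, headers=headers)

    def reset(self, pipeline_id, headers=None):
        self._require_id(pipeline_id)
        _data = {"pipeline_id": pipeline_id}
        # Hypothetical method and path.
        return self.client.perform_query(
            "POST", "/pipelines/reset", data=_data, headers=headers)


client = FakeClient()
service = PipelinesServiceSketch(client)
service.reset("abc")
```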

Comment thread databricks_cli/sdk/service.py
    def reset(self, pipeline_id=None, headers=None):
        _data = {}
        if pipeline_id is not None:
            _data['pipeline_id'] = pipeline_id
Contributor

and here


@provide_conf
def test_get_cli_spec_option(pipelines_api_mock, tmpdir):
    path = tmpdir.join('/spec.json').strpath
Contributor

some of these tests become unnecessary when you allow only the pipeline id as argument

path = '{}/{}.{}'.format(base_pipelines_dir, file_hash, extension)
return path

@staticmethod
Contributor

this change seems unrelated - how come we don't need this any more?

Collaborator Author

This is for the "Stop double-passing credentials to the Delta Pipelines service" part of the change. This method pulled CLI credentials from the config file so they could be included in the body of the request. Now that we don't need to include them in the body, we don't need this method. Arul wrote it nicely so that we could just whack this method once we fixed the credentials server-side situation.
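
As a rough illustration of the before and after, assuming the credentials also travel in an `Authorization` header (this thread doesn't spell out the other channel). The function names, the `credentials` body field, and the header layout are all hypothetical.

```python
# Hypothetical request builders; field names and header layout are assumptions.
def build_request_before(pipeline_id, token):
    # Credentials passed twice: once in the header, once in the body.
    return {
        "headers": {"Authorization": "Bearer " + token},
        "body": {"pipeline_id": pipeline_id, "credentials": {"token": token}},
    }


def build_request_after(pipeline_id, token):
    # Credentials passed once; the body carries only the pipeline_id.
    return {
        "headers": {"Authorization": "Bearer " + token},
        "body": {"pipeline_id": pipeline_id},
    }


before = build_request_before("abc", "dummy-token")
after = build_request_after("abc", "dummy-token")
```

Once the body no longer carries credentials, the helper that read them from the config file has no remaining caller.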

Contributor

@anew anew Dec 10, 2019

OK. I think it is better to do this kind of refactoring in a separate PR, but no biggie.

Collaborator Author

@mukulmurthy mukulmurthy left a comment

Responded to Andreas's comments

Contributor

@anew anew left a comment

Let's defer the API discussion (whether to take a spec or a pipeline id as arguments for get/delete/reset) and get this merged.

LGTM

@mukulmurthy mukulmurthy merged commit 2c8d637 into databricks:pipelines-client Dec 11, 2019
@mukulmurthy mukulmurthy deleted the nocred branch December 11, 2019 22:20
null-sleep pushed a commit that referenced this pull request May 18, 2020
mukulmurthy added a commit that referenced this pull request Jun 1, 2020