
[PLAT-21956] Add az-token config functionality to db cli #325

Merged
andrewmchen merged 13 commits into databricks:master from sushi1998:PLAT-21956-create-akv-scope on Sep 30, 2020

Conversation

@sushi1998 (Contributor):

Added a new option to the databricks-cli: users can run `databricks-cli configure --az-token` to add an az-token to the configuration. Environment variables holding the bearer token (DATABRICKS_TOKEN) and the az-token (DATABRICKS_AZ_TOKEN) need to be set.
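
For concreteness, a minimal sketch of the env-var lookup this describes (it mirrors the code context quoted in the threads below, not the exact diff):

```python
import os

# Both tokens come from the environment rather than interactive prompts,
# per the description above.
token = os.environ.get('DATABRICKS_TOKEN')
az_token = os.environ.get('DATABRICKS_AZ_TOKEN')
```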

Comment thread databricks_cli/configure/cli.py Outdated

PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
PROMPT_AZ_TOKEN = 'Azure Token'

nit: This field doesn't seem to be in use anymore.

Contributor Author:

PROMPT_AZ_TOKEN is not being used anymore - removed it

Comment thread databricks_cli/configure/cli.py Outdated
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())
token = os.environ.get('DATABRICKS_TOKEN')
az_token = os.environ.get('DATABRICKS_AZ_TOKEN')

Do we have any guard against missing/invalid tokens? It seems that if users are not yet guided to use the env-var approach, and they don't spell the variable correctly, the CLI would just silently fail by skipping the AZ token login option.

Also, it would be worth checking in with the PMs that the env-var-based approach is acceptable.
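
A minimal sketch of the kind of guard being asked for here (hypothetical; not what the PR does at this point):

```python
import os
import click

az_token = os.environ.get('DATABRICKS_AZ_TOKEN')
if not az_token:
    # Fail loudly instead of silently skipping the AZ-token login path.
    raise click.UsageError(
        'DATABRICKS_AZ_TOKEN is not set; export it before running '
        '`databricks-cli configure --az-token`.')
```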

Contributor Author:

The CLI itself doesn't have any check against invalid tokens; it never did. Talked with the PMs about the env-variable approach and they said it was acceptable.

Contributor:

E.g., this means that AAD tokens are not going to be refreshed by our CLI, and we may get errors for commands that run too long? The MSAL/ADAL libraries provide token refresh functionality.
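
For reference, a sketch of refresh-capable token acquisition with MSAL (not part of this PR; the client, tenant, and resource ID values are placeholders/assumptions, and error handling is omitted):

```python
import msal

# Assumed well-known application ID of the Azure Databricks resource.
DATABRICKS_RESOURCE = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'

app = msal.ConfidentialClientApplication(
    '<client-id>',
    client_credential='<client-secret>',
    authority='https://login.microsoftonline.com/<tenant-id>')

def get_aad_token():
    # MSAL caches tokens and refreshes them near expiry, so long-running
    # commands can call this before each request instead of reusing a
    # token captured once at configure time.
    result = app.acquire_token_for_client(
        scopes=[DATABRICKS_RESOURCE + '/.default'])
    return result['access_token']
```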

Comment thread databricks_cli/sdk/service.py Outdated
Comment thread databricks_cli/secrets/cli.py Outdated
' principal for this option is the group "users", which contains all users in the'
' workspace. If not specified, the initial ACL with MANAGE permission applied to the'
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['azure_keyvault', 'databricks'], case_sensitive=True),

It would be important to clarify that one must log in with AZ tokens to create an AKV-backed scope. Also, I think we must print a user-friendly error message when a user attempts to create an AKV-backed scope while not logged in with AZ tokens.

Contributor Author:

Added a more detailed description saying that users will need an AAD token when creating an AKV-backed scope. But I'm not aware of any checks for whether users have configured an AAD token, aside from maybe the size of the token (compared to a Databricks PAT). That seems like a flimsy check; do you know of any others?
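
One slightly less flimsy heuristic (an assumption, not something the PR implements): Databricks PATs typically start with `dapi`, while AAD tokens are JWTs, whose base64url-encoded header starts with `eyJ`:

```python
def looks_like_aad_token(token):
    # JWTs have three dot-separated base64url segments, and the header
    # '{"typ":"JWT",...}' encodes to something starting with 'eyJ'.
    return token.startswith('eyJ') and token.count('.') == 2
```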

Contributor:

Can we already create AKV scopes?

Comment thread databricks_cli/configure/cli.py Outdated
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())

is_token_env_set = click.prompt(PROMPT_ENV_TOKEN, type=bool)
Contributor:

Is prompting in this function twice helpful? Can't we just check the ENV_TOKEN and ENV_AZ_TOKEN keys in environ without the prompt and save the user a couple of keystrokes?
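
I.e., roughly this (a sketch of the suggestion; the constants' values are assumed from the surrounding threads):

```python
import os

ENV_TOKEN = 'DATABRICKS_TOKEN'
ENV_AZ_TOKEN = 'DATABRICKS_AZ_TOKEN'

# Derive the answer from the environment instead of prompting the user:
is_token_env_set = ENV_TOKEN in os.environ
is_az_token_env_set = ENV_AZ_TOKEN in os.environ
```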

Contributor Author:

Removed this and changed the workflow.

Comment thread databricks_cli/configure/cli.py Outdated
PROMPT_USERNAME = 'Username'
PROMPT_PASSWORD = 'Password' # NOQA
PROMPT_TOKEN = 'Token' # NOQA
ENV_AAD_TOKEN = 'DATABRICKS_TOKEN'
Contributor:

This seems like a bug?

Contributor Author:

What's the bug here?

Contributor:

The name of the env var is DATABRICKS_TOKEN; that's a bit confusing if we're asking for the AAD token, right?

Contributor Author:

You're right. I thought about this, and to make it as unconfusing as possible for users, I think the var should be DATABRICKS_AAD_TOKEN, even though we treat it the same as a DATABRICKS_TOKEN in the backend. I've changed the variable names to be consistent with this as well.
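
Presumably the constant then reads (a sketch of the rename, not quoted from the diff):

```python
ENV_AAD_TOKEN = 'DATABRICKS_AAD_TOKEN'
```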

Comment thread databricks_cli/configure/cli.py Outdated
from databricks_cli.configure.config import profile_option, get_profile_from_context, debug_option

PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
Contributor:

Remove please.

Contributor Author:

Removed

Comment thread databricks_cli/sdk/__init__.py Outdated
Databricks Python REST Client 2.0 for interacting with various services.

Currently supports services including clusters, clusters policies and jobs.
Currently supports services including clusters and jobs.
Contributor:

Let's revert changes to this file.

Contributor Author:

Reverted back to the old file.

@@ -113,10 +113,10 @@ def perform_query(self, method, path, data = {}, headers = None):
if method == 'GET':
translated_data = {k: _translate_boolean_to_query_param(data[k]) for k in data}
resp = self.session.request(method, self.url + path, params = translated_data,
Contributor:

Is it possible to revert all changes to this file?

Contributor Author:

Reverted back to the old file.

Comment thread databricks_cli/secrets/cli.py Outdated
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['AZURE_KEYVAULT', 'DATABRICKS'], case_sensitive=True),
default='DATABRICKS', help='The backend that will be used for this secret scope. '
'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
Contributor:

This is very confusing; the message which shows up in help is this:

  --scope-backend-type [AZURE_KEYVAULT|DATABRICKS]
                                  The backend that will be used for this
                                  secret scope. Options are (case-sensitive):
                                  1) 'azure_keyvault' and 2) 'databricks'
                                  (default option) Note: To create an Azure
                                  Keyvault, be sure to configure an AAD Token
                                  using'databricks-cli configure --aad-token'

Which one is it? Lower case or upper case?

Contributor Author:

It is uppercase; changed the message.
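
Something along these lines (a sketch of the corrected option, not the exact diff):

```python
import click

@click.command()
@click.option('--scope-backend-type',
              type=click.Choice(['AZURE_KEYVAULT', 'DATABRICKS'],
                                case_sensitive=True),
              default='DATABRICKS',
              help="The backend that will be used for this secret scope. "
                   "Options are (case-sensitive): 1) 'AZURE_KEYVAULT' and "
                   "2) 'DATABRICKS' (default). Note: to create an Azure "
                   "Key Vault-backed scope, configure an AAD token first.")
def create_scope(scope_backend_type):
    click.echo(scope_backend_type)
```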

Comment thread databricks_cli/secrets/cli.py Outdated
default='DATABRICKS', help='The backend that will be used for this secret scope. '
'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
'(default option)'
'\nNote: To create an Azure Keyvault, be sure to configure an AAD Token using'
Contributor:

Missing trailing space here!
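
For context: Python concatenates adjacent string literals verbatim, so without the trailing space the words run together:

```python
msg = ('configure an AAD Token using'
       "'databricks-cli configure --aad-token'")
# -> "...an AAD Token using'databricks-cli configure --aad-token'"
```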

Contributor Author:

Addressed

_data['timeout_seconds'] = timeout_seconds
return self.client.perform_query('POST', '/jobs/runs/submit', data=_data, headers=headers)


Seems like we are adding spaces. Is that desired?

Contributor Author:

This file was autogenerated save for a few necessary changes, but nothing around this particular line was perturbed manually.

@profile_option
@eat_exceptions
def secrets_group():
def secrets_group(): # pragma: no cover

Is it intended?

Contributor Author:

I think this was a result of resolving the merge conflict? Not my change, but it certainly came from master.

return self.client.perform_query('POST', '/dbfs/create', data=_data, headers=headers)


def create_test(self, path, overwrite=None, headers=None):

Where did this set of functions come from?

Contributor Author:

From autogeneration of this file; it has nothing to do with my change. Seems like it was out of sync with changes made in universe.

def create_cluster(self, num_workers=None, autoscale=None, cluster_name=None, spark_version=None,
spark_conf=None, aws_attributes=None, node_type_id=None,
driver_node_type_id=None, ssh_public_keys=None, custom_tags=None,
cluster_log_conf=None, init_scripts=None, spark_env_vars=None,

Seems like you are removing the init_scripts param and the variable-setting that follows. I'm wondering whether this is part of what the PR intends; if so, we need to update the descriptions and possibly split the PR, I think.

Contributor Author:

Once again, this was a result of autogeneration and wasn't relevant to my changes at all. Should I revert this particular function back to its original form?

andrewmchen merged commit 22a6f38 into databricks:master on Sep 30, 2020
return self.client.perform_query('GET', '/dbfs/read', data=_data, headers=headers)


def read_test(self, path, offset=None, length=None, headers=None):
Contributor:

What do we need /dbfs-testing for?

@nfx (Contributor) commented Oct 6, 2020:

This PR doesn't cover token refresh functionality. az cli refreshes tokens every 10 minutes, and tokens that are about to expire within 30 seconds will fail requests.

sd2k pushed a commit to sd2k/databricks-cli that referenced this pull request on Nov 18, 2020:
This was removed in [this PR][!325] by autogeneration, but the
/clusters/create API endpoint should still accept the init_scripts
argument, and it's useful for creating clusters programmatically.

I'm not sure how autogeneration works so this could happen again in
future - perhaps it should be looked into somewhere upstream instead,
but for now this would be useful!

[!325]: databricks#325 (comment)

def create_scope(self, scope, initial_manage_principal):
return self.client.create_scope(scope, initial_manage_principal)
def create_scope(self, scope, initial_manage_principal, scope_backend_type,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please try to avoid making breaking API changes. These new required positional arguments break anything that uses this API.
Prefer to add new arguments with default values that retain the previous functionality, or add new methods instead if that cannot be done.
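
A backward-compatible sketch of the same signature (the default value is an assumption):

```python
def create_scope(self, scope, initial_manage_principal,
                 scope_backend_type=None):
    # Defaulting the new argument keeps existing positional callers working;
    # the body can fall back to the old behavior when it is None.
    ...
```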
