[PLAT-21956] Add az-token config functionality to db cli (#325)
andrewmchen merged 13 commits into databricks:master
Conversation
```python
PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
PROMPT_AZ_TOKEN = 'Azure Token'
```
nit: This field doesn't seem to be in use anymore.
PROMPT_AZ_TOKEN is not being used anymore - removed it
```python
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())
token = os.environ.get('DATABRICKS_TOKEN')
az_token = os.environ.get('DATABRICKS_AZ_TOKEN')
```
Do we have any guard against missing/invalid tokens? It seems that if users are not guided to use the env var approach yet, and they don't set the variable correctly, the CLI would just silently fail by passing over the AZ token login option.
Also, it would be worth checking in with the PMs that the env-var based approach is acceptable.
The CLI itself doesn't have any checks against invalid tokens - it never did. I talked with the PMs about the env variable approach and they said it was acceptable.
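One way to avoid the silent failure mentioned above is to fail loudly when the variable is unset or empty. A minimal sketch, assuming a hypothetical helper name and error wording (not the CLI's actual behaviour):

```python
import os

def get_env_token(var_name="DATABRICKS_AAD_TOKEN"):
    """Return the token from the environment, or fail with a clear message.

    Hypothetical helper: the variable name and wording are illustrative.
    """
    token = os.environ.get(var_name)
    if not token or not token.strip():
        # Fail loudly instead of silently skipping the AAD login path.
        raise RuntimeError(
            "%s is not set (or empty); export it before running the "
            "configure command with the AAD token option." % var_name)
    return token.strip()
```

The guard does not validate the token itself, only its presence, which matches the discussion above about the CLI never validating tokens.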
E.g. this means that AAD tokens are not going to be refreshed by our CLI, so we may get errors for commands that run too long? The MSAL/ADAL libs provide token refresh functionality.
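For context on the refresh concern: AAD access tokens are JWTs carrying an `exp` claim, so a client can at least detect imminent expiry before issuing a request. A sketch of just the expiry check (function name is hypothetical; real refresh should go through MSAL's token cache, which this does not implement):

```python
import base64
import json
import time

def jwt_expires_soon(token, margin_seconds=300):
    """Return True if the JWT's 'exp' claim is within margin_seconds of now.

    Sketch only: it does not verify the signature or refresh anything,
    it just decodes the payload segment to read the expiry.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"] - time.time() < margin_seconds
```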
```python
' principal for this option is the group "users", which contains all users in the'
' workspace. If not specified, the initial ACL with MANAGE permission applied to the'
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['azure_keyvault', 'databricks'], case_sensitive=True),
```
It would be important to clarify that one must log in with AZ tokens if they want to create an AKV-backed scope. Also, I think we must print a user-friendly error message when a user attempts to create an AKV-backed scope while not logged in with AZ tokens.
Added a more detailed description saying that users will need an AAD token when creating an AKV-backed scope - but I'm not aware of any way to check whether users have configured an AAD token, aside from maybe the size of the token (compared to the DB PAT). That seems like a flimsy check, though; do you know of any others?
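A slightly less flimsy heuristic than token length: Databricks PATs typically begin with `dapi`, while AAD access tokens are JWTs, i.e. three dot-separated base64url segments whose header starts with `eyJ`. A sketch (the function name is hypothetical, and this is a best-effort guess, not validation):

```python
def looks_like_aad_token(token):
    """Heuristic: AAD access tokens are JWTs (three dot-separated segments,
    header starting with 'eyJ'), while Databricks PATs typically start
    with 'dapi'. Best-effort check only; not a guarantee of validity."""
    if token.startswith("dapi"):
        return False
    parts = token.split(".")
    return len(parts) == 3 and parts[0].startswith("eyJ")
```

Even so, a friendly server-side error for the mismatched-token case would remain the reliable path.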
can we already create AKV scopes?
```python
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())
is_token_env_set = click.prompt(PROMPT_ENV_TOKEN, type=bool)
```
Is prompting in this function twice helpful? Can't we just check the ENV_TOKEN and ENV_AZ_TOKEN keys in environ without the prompt and save the user a couple of keystrokes?
Removed this and changed the workflow.
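The reworked flow suggested above can be sketched as: check the environment directly and only prompt as a fallback, with no yes/no pre-prompt. Function and prompt names are hypothetical; `prompt_fn` is injectable so the sketch stays testable without click:

```python
import os

def resolve_token(prompt_fn=input):
    """Prefer tokens from the environment; prompt only as a fallback.

    Returns (token, source_var) where source_var is None if the user
    was prompted. Variable names follow the PR discussion.
    """
    for var in ("DATABRICKS_AAD_TOKEN", "DATABRICKS_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token, var
    return prompt_fn("Token: "), None
```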
```python
PROMPT_USERNAME = 'Username'
PROMPT_PASSWORD = 'Password'  # NOQA
PROMPT_TOKEN = 'Token'  # NOQA
ENV_AAD_TOKEN = 'DATABRICKS_TOKEN'
```
This seems like a bug?
What's the bug here?
The name of the env var is DATABRICKS_TOKEN; that's a bit confusing if we're asking for the AAD token, right?
You're right. I thought about this, and to make it as clear as possible for users, I think the var should be DATABRICKS_AAD_TOKEN even though we treat it the same way as a DATABRICKS_TOKEN in the backend. I've changed the variable names to be consistent with this as well.
```python
from databricks_cli.configure.config import profile_option, get_profile_from_context, debug_option

PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
```
```diff
 Databricks Python REST Client 2.0 for interacting with various services.

-Currently supports services including clusters, clusters policies and jobs.
+Currently supports services including clusters and jobs.
```
Let's revert changes to this file.
Reverted back to old file.
```diff
@@ -113,10 +113,10 @@ def perform_query(self, method, path, data = {}, headers = None):
         if method == 'GET':
             translated_data = {k: _translate_boolean_to_query_param(data[k]) for k in data}
             resp = self.session.request(method, self.url + path, params = translated_data,
```
Is it possible to revert all changes to this file?
Reverted back to old file
```python
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['AZURE_KEYVAULT', 'DATABRICKS'], case_sensitive=True),
              default='DATABRICKS', help='The backend that will be used for this secret scope. '
                                         'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
```
This is very confusing; the message which shows up from help is this:

```
--scope-backend-type [AZURE_KEYVAULT|DATABRICKS]
                     The backend that will be used for this
                     secret scope. Options are (case-sensitive):
                     1) 'azure_keyvault' and 2) 'databricks'
                     (default option) Note: To create an Azure
                     Keyvault, be sure to configure an AAD Token
                     using'databricks-cli configure --aad-token'
```

Which one is it? Lower case or upper case?
It is uppercase - changed the message
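An alternative that sidesteps the upper/lower confusion entirely is to accept either case and normalize to the canonical form (click's `Choice` also supports `case_sensitive=False` for this). A sketch with a plain validator, function name hypothetical:

```python
def normalize_backend_type(value, choices=("AZURE_KEYVAULT", "DATABRICKS")):
    """Accept the backend type in any case and return the canonical
    uppercase form, raising a clear error otherwise.

    Sketch only; at the option level, click.Choice(case_sensitive=False)
    achieves the same effect.
    """
    upper = value.upper()
    if upper not in choices:
        raise ValueError(
            "invalid backend type %r; choose from %s" % (value, ", ".join(choices)))
    return upper
```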
```python
              default='DATABRICKS', help='The backend that will be used for this secret scope. '
                                         'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
                                         '(default option)'
                                         '\nNote: To create an Azure Keyvault, be sure to configure an AAD Token using'
```
missing trailing space here!
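The reason the missing space matters: Python joins adjacent string literals verbatim, with no separator, so the words run together in the rendered help text. A tiny demonstration (variable names are illustrative):

```python
# Adjacent string literals are concatenated verbatim, so without a trailing
# space the help text reads "...using'databricks-cli..." as one word.
missing = ('configure an AAD Token using'
           "'databricks-cli configure --aad-token'")
fixed = ('configure an AAD Token using '
         "'databricks-cli configure --aad-token'")
```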
```python
        _data['timeout_seconds'] = timeout_seconds
    return self.client.perform_query('POST', '/jobs/runs/submit', data=_data, headers=headers)
```
Seems like we are adding spaces. Is it desired?
This file was autogenerated save for a few necessary changes, but nothing around this particular line was perturbed manually.
```diff
 @profile_option
 @eat_exceptions
-def secrets_group():
+def secrets_group():  # pragma: no cover
```
I think this was a result of resolving the merge conflict? Not my change, but certainly came from master
```python
        return self.client.perform_query('POST', '/dbfs/create', data=_data, headers=headers)

    def create_test(self, path, overwrite=None, headers=None):
```
Where did these set of functions come from?
From autogeneration of this file - it has nothing to do with my change. Seems like it was out of sync with changes made in universe.
```python
def create_cluster(self, num_workers=None, autoscale=None, cluster_name=None, spark_version=None,
                   spark_conf=None, aws_attributes=None, node_type_id=None,
                   driver_node_type_id=None, ssh_public_keys=None, custom_tags=None,
                   cluster_log_conf=None, init_scripts=None, spark_env_vars=None,
```
Seems like you are removing the init_scripts param and the variable-setting that follows it. I'm wondering if this is part of what the PR intends; if so, we need to update the descriptions and possibly split the PR, I think.
Once again, this was a result of autogeneration and wasn't relevant to my changes at all. Should I revert this particular function back to its original form?
```python
        return self.client.perform_query('GET', '/dbfs/read', data=_data, headers=headers)

    def read_test(self, path, offset=None, length=None, headers=None):
```
What do we need /dbfs-testing for?
This PR doesn't cover token refresh functionality.
This was removed in [this PR][!325] by autogeneration, but the /clusters/create API endpoint should still accept the init_scripts argument, and it's useful for creating clusters programmatically. I'm not sure how autogeneration works, so this could happen again in the future - perhaps it should be looked into upstream instead, but for now this would be useful! [!325]: databricks#325 (comment)
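To illustrate why the dropped parameter matters for programmatic use, here is a sketch of assembling a /clusters/create request body that keeps init_scripts as an optional field (the helper name is hypothetical; field names follow the public Clusters API):

```python
def build_create_cluster_payload(cluster_name, spark_version, node_type_id,
                                 num_workers=None, init_scripts=None):
    """Assemble a /clusters/create request body, including the optional
    init_scripts field. Sketch only, not the CLI's actual builder."""
    data = {
        'cluster_name': cluster_name,
        'spark_version': spark_version,
        'node_type_id': node_type_id,
    }
    if num_workers is not None:
        data['num_workers'] = num_workers
    if init_scripts is not None:
        # init_scripts is a list of storage locations, e.g. dbfs destinations.
        data['init_scripts'] = init_scripts
    return data
```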
```diff
-    def create_scope(self, scope, initial_manage_principal):
-        return self.client.create_scope(scope, initial_manage_principal)
+    def create_scope(self, scope, initial_manage_principal, scope_backend_type,
```
In the future, please try to avoid making breaking API changes. These new required positional arguments break anything that use this API.
Prefer to add new arguments with default values that retain the previous functionality, or instead add new methods if that cannot be done.
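The backward-compatible variant suggested above can be sketched by giving the new parameters defaults, so the old two-argument call keeps working (class and parameter names mirror the diff; the pass-through body is illustrative):

```python
class SecretApi(object):
    """Sketch: new parameters default to None so existing callers of
    create_scope(scope, initial_manage_principal) are not broken."""

    def __init__(self, client):
        self.client = client

    def create_scope(self, scope, initial_manage_principal,
                     scope_backend_type=None, backend_azure_keyvault=None):
        # Defaults preserve the pre-PR two-argument signature.
        return self.client.create_scope(scope, initial_manage_principal,
                                        scope_backend_type,
                                        backend_azure_keyvault)
```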
Added a new option to the databricks-cli wherein users can run `databricks-cli configure --az-token` to add an az-token to the configuration. Environment variables with the bearer token (DATABRICKS_TOKEN) and the az-token (DATABRICKS_AZ_TOKEN) need to be set.