
[PLAT-21956] Add az-token config functionality to db cli #325

Merged
andrewmchen merged 13 commits into databricks:master from sushi1998:PLAT-21956-create-akv-scope on Sep 30, 2020

Conversation

@sushi1998 (Contributor):

Added a new option to the databricks-cli: users can run `databricks-cli configure --az-token` to add an az-token to the configuration. Environment variables holding the bearer token (DATABRICKS_TOKEN) and the az-token (DATABRICKS_AZ_TOKEN) need to be set.
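
For concreteness, a minimal sketch of the env-var lookup this describes (it mirrors the code context quoted in the threads below, not the exact diff):

```python
import os

# Both tokens come from the environment rather than interactive prompts,
# per the description above.
token = os.environ.get('DATABRICKS_TOKEN')
az_token = os.environ.get('DATABRICKS_AZ_TOKEN')
```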

Comment thread databricks_cli/configure/cli.py Outdated

PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
PROMPT_AZ_TOKEN = 'Azure Token'

nit: This field doesn't seem to be in use anymore.

Contributor Author:

PROMPT_AZ_TOKEN is not being used anymore - removed it

Comment thread databricks_cli/configure/cli.py Outdated
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())
token = os.environ.get('DATABRICKS_TOKEN')
az_token = os.environ.get('DATABRICKS_AZ_TOKEN')

Do we have any guard against missing/invalid tokens? It seems that if users are not yet guided to use the env-var approach, and they don't spell the variable correctly, the CLI would just silently fail by skipping the AZ token login option.

Also, it would be worth checking in with the PMs that the env-var-based approach is acceptable.
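
A minimal sketch of the kind of guard being asked for here (hypothetical; not what the PR does at this point):

```python
import os
import click

az_token = os.environ.get('DATABRICKS_AZ_TOKEN')
if not az_token:
    # Fail loudly instead of silently skipping the AZ-token login path.
    raise click.UsageError(
        'DATABRICKS_AZ_TOKEN is not set; export it before running '
        '`databricks-cli configure --az-token`.')
```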

Contributor Author:

The CLI itself doesn't have any check against invalid tokens; it never did. Talked with the PMs about the env-variable approach and they said it was acceptable.

Contributor:

E.g., this means that AAD tokens are not going to be refreshed by our CLI, and we may get errors for commands that run too long? The MSAL/ADAL libraries provide token refresh functionality.
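
For reference, a sketch of refresh-capable token acquisition with MSAL (not part of this PR; the client, tenant, and resource ID values are placeholders/assumptions, and error handling is omitted):

```python
import msal

# Assumed well-known application ID of the Azure Databricks resource.
DATABRICKS_RESOURCE = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'

app = msal.ConfidentialClientApplication(
    '<client-id>',
    client_credential='<client-secret>',
    authority='https://login.microsoftonline.com/<tenant-id>')

def get_aad_token():
    # MSAL caches tokens and refreshes them near expiry, so long-running
    # commands can call this before each request instead of reusing a
    # token captured once at configure time.
    result = app.acquire_token_for_client(
        scopes=[DATABRICKS_RESOURCE + '/.default'])
    return result['access_token']
```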

Comment thread databricks_cli/sdk/service.py Outdated
Comment thread databricks_cli/secrets/cli.py Outdated
' principal for this option is the group "users", which contains all users in the'
' workspace. If not specified, the initial ACL with MANAGE permission applied to the'
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['azure_keyvault', 'databricks'], case_sensitive=True),

It would be important to clarify that one must log in with AZ tokens to create an AKV-backed scope. Also, I think we must print a user-friendly error message when a user attempts to create an AKV-backed scope while not logged in with AZ tokens.

Contributor Author:

Added a more detailed description saying that users will need an AAD token when creating an AKV-backed scope. But I'm not aware of any checks for whether users have configured an AAD token, aside from maybe the size of the token (compared to a Databricks PAT). That seems like a flimsy check; do you know of any others?
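
One slightly less flimsy heuristic (an assumption, not something the PR implements): Databricks PATs typically start with `dapi`, while AAD tokens are JWTs, whose base64url-encoded header starts with `eyJ`:

```python
def looks_like_aad_token(token):
    # JWTs have three dot-separated base64url segments, and the header
    # '{"typ":"JWT",...}' encodes to something starting with 'eyJ'.
    return token.startswith('eyJ') and token.count('.') == 2
```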

Contributor:

Can we already create AKV scopes?

Comment thread databricks_cli/configure/cli.py Outdated
config = ProfileConfigProvider(profile).get_config() or DatabricksConfig.empty()
host = click.prompt(PROMPT_HOST, default=config.host, type=_DbfsHost())

is_token_env_set = click.prompt(PROMPT_ENV_TOKEN, type=bool)
Contributor:

Is prompting in this function twice helpful? Can't we just check the ENV_TOKEN and ENV_AZ_TOKEN keys in environ without the prompt and save the user a couple of keystrokes?
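
I.e., roughly this (a sketch of the suggestion; the constants' values are assumed from the surrounding threads):

```python
import os

ENV_TOKEN = 'DATABRICKS_TOKEN'
ENV_AZ_TOKEN = 'DATABRICKS_AZ_TOKEN'

# Derive the answer from the environment instead of prompting the user:
is_token_env_set = ENV_TOKEN in os.environ
is_az_token_env_set = ENV_AZ_TOKEN in os.environ
```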

Contributor Author:

Removed this and changed the workflow.

Comment thread databricks_cli/configure/cli.py Outdated
PROMPT_USERNAME = 'Username'
PROMPT_PASSWORD = 'Password' # NOQA
PROMPT_TOKEN = 'Token' # NOQA
ENV_AAD_TOKEN = 'DATABRICKS_TOKEN'
Contributor:

This seems like a bug?

Contributor Author:

What's the bug here?

Contributor:

The name of the env var is DATABRICKS_TOKEN; that's a bit confusing if we're asking for the AAD token, right?

Contributor Author:

You're right. I thought about this, and to make it as unconfusing as possible for users, I think the var should be DATABRICKS_AAD_TOKEN, even though we treat it the same as a DATABRICKS_TOKEN in the backend. I've changed the variable names to be consistent with this as well.
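
Presumably the constant then reads (a sketch of the rename, not quoted from the diff):

```python
ENV_AAD_TOKEN = 'DATABRICKS_AAD_TOKEN'
```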

Comment thread databricks_cli/configure/cli.py Outdated
from databricks_cli.configure.config import profile_option, get_profile_from_context, debug_option

PROMPT_HOST = 'Databricks Host (should begin with https://)'
PROMPT_RESOURCE_ID = 'Resource/Workspace ID'
Contributor:

Remove please.

Contributor Author:

Removed

Comment thread databricks_cli/sdk/__init__.py Outdated
Databricks Python REST Client 2.0 for interacting with various services.

Currently supports services including clusters, clusters policies and jobs.
Currently supports services including clusters and jobs.
Contributor:

Let's revert changes to this file.

Contributor Author:

Reverted back to the old file.

@@ -113,10 +113,10 @@ def perform_query(self, method, path, data = {}, headers = None):
if method == 'GET':
translated_data = {k: _translate_boolean_to_query_param(data[k]) for k in data}
resp = self.session.request(method, self.url + path, params = translated_data,
Contributor:

Is it possible to revert all changes to this file?

Contributor Author:

Reverted back to the old file.

Comment thread databricks_cli/secrets/cli.py Outdated
' scope is assigned to the request issuer\'s user identity.')
@click.option('--scope-backend-type', type=click.Choice(['AZURE_KEYVAULT', 'DATABRICKS'], case_sensitive=True),
default='DATABRICKS', help='The backend that will be used for this secret scope. '
'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
Contributor:

This is very confusing; the message which shows up in help is this:

  --scope-backend-type [AZURE_KEYVAULT|DATABRICKS]
                                  The backend that will be used for this
                                  secret scope. Options are (case-sensitive):
                                  1) 'azure_keyvault' and 2) 'databricks'
                                  (default option) Note: To create an Azure
                                  Keyvault, be sure to configure an AAD Token
                                  using'databricks-cli configure --aad-token'

Which one is it? Lower case or upper case?

Contributor Author:

It is uppercase; changed the message.
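
Something along these lines (a sketch of the corrected option, not the exact diff):

```python
import click

@click.command()
@click.option('--scope-backend-type',
              type=click.Choice(['AZURE_KEYVAULT', 'DATABRICKS'],
                                case_sensitive=True),
              default='DATABRICKS',
              help="The backend that will be used for this secret scope. "
                   "Options are (case-sensitive): 1) 'AZURE_KEYVAULT' and "
                   "2) 'DATABRICKS' (default). Note: to create an Azure "
                   "Key Vault-backed scope, configure an AAD token first.")
def create_scope(scope_backend_type):
    click.echo(scope_backend_type)
```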

Comment thread databricks_cli/secrets/cli.py Outdated
default='DATABRICKS', help='The backend that will be used for this secret scope. '
'Options are (case-sensitive): 1) \'azure_keyvault\' and 2) \'databricks\' '
'(default option)'
'\nNote: To create an Azure Keyvault, be sure to configure an AAD Token using'
Contributor:

Missing trailing space here!
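
For context: Python concatenates adjacent string literals verbatim, so without the trailing space the words run together:

```python
msg = ('configure an AAD Token using'
       "'databricks-cli configure --aad-token'")
# -> "...an AAD Token using'databricks-cli configure --aad-token'"
```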

Contributor Author:

Addressed

_data['timeout_seconds'] = timeout_seconds
return self.client.perform_query('POST', '/jobs/runs/submit', data=_data, headers=headers)


Seems like we are adding spaces. Is that desired?

Contributor Author:

This file was autogenerated save for a few necessary changes, but nothing around this particular line was perturbed manually.

@profile_option
@eat_exceptions
def secrets_group():
def secrets_group(): # pragma: no cover

Is it intended?

Contributor Author:

I think this was a result of resolving the merge conflict? Not my change, but it certainly came from master.

return self.client.perform_query('POST', '/dbfs/create', data=_data, headers=headers)


def create_test(self, path, overwrite=None, headers=None):

Where did this set of functions come from?

Contributor Author:

From autogeneration of this file; it has nothing to do with my change. Seems like it was out of sync with changes made in universe.

def create_cluster(self, num_workers=None, autoscale=None, cluster_name=None, spark_version=None,
spark_conf=None, aws_attributes=None, node_type_id=None,
driver_node_type_id=None, ssh_public_keys=None, custom_tags=None,
cluster_log_conf=None, init_scripts=None, spark_env_vars=None,

Seems like you are removing the init_scripts param and the variable-setting that follows. I'm wondering whether this is part of what the PR intends; if so, we need to update the descriptions and possibly split the PR, I think.

Contributor Author:

Once again, this was a result of autogeneration and wasn't relevant to my changes at all. Should I revert this particular function back to its original form?

andrewmchen merged commit 22a6f38 into databricks:master on Sep 30, 2020
return self.client.perform_query('GET', '/dbfs/read', data=_data, headers=headers)


def read_test(self, path, offset=None, length=None, headers=None):
Contributor:

What do we need /dbfs-testing for?

@nfx (Contributor) commented Oct 6, 2020:

This PR doesn't cover token refresh functionality. az cli refreshes tokens every 10 minutes, and tokens that are about to expire within 30 seconds will fail requests.

sd2k pushed a commit to sd2k/databricks-cli that referenced this pull request on Nov 18, 2020:
This was removed in [this PR][!325] by autogeneration, but the
/clusters/create API endpoint should still accept the init_scripts
argument, and it's useful for creating clusters programmatically.

I'm not sure how autogeneration works so this could happen again in
future - perhaps it should be looked into somewhere upstream instead,
but for now this would be useful!

[!325]: databricks#325 (comment)

def create_scope(self, scope, initial_manage_principal):
return self.client.create_scope(scope, initial_manage_principal)
def create_scope(self, scope, initial_manage_principal, scope_backend_type,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please try to avoid making breaking API changes. These new required positional arguments break anything that uses this API.
Prefer to add new arguments with default values that retain the previous functionality, or add new methods instead if that cannot be done.
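
A backward-compatible sketch of the same signature (the default value is an assumption):

```python
def create_scope(self, scope, initial_manage_principal,
                 scope_backend_type=None):
    # Defaulting the new argument keeps existing positional callers working;
    # the body can fall back to the old behavior when it is None.
    ...
```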
