This repository has been archived by the owner on Oct 5, 2022. It is now read-only.

Obtaining Databricks credentials? #165

Closed
jvandenbos opened this issue Aug 5, 2020 · 11 comments
Labels
bug: Something isn't working
upstream: waiting for a fix in one of the dependencies

Comments

@jvandenbos

Question -- I'm getting an error running terraform plan, and I'm confused about where one obtains Databricks credentials/tokens (in Azure?). Here's the error I'm seeing:

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.http.current_ip[0]: Refreshing state...
data.azurerm_client_config.current: Refreshing state...


Error: failed to get credentials from config file; error msg: Authentication is not configured for provider. Please configure it
through one of the following options:

  1. DATABRICKS_HOST + DATABRICKS_TOKEN environment variables.
  2. host + token provider arguments.
  3. Run databricks configure --token that will create /root/.databrickscfg file.

Please check https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication for details

on databricks.tf line 41, in provider "databricks":
41: provider "databricks" {
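For reference, the first option from the error message can be exercised like this (the host and token values below are placeholders, not real credentials):

```shell
# Placeholder values; substitute your own workspace URL and personal access token.
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi0123456789abcdef"
# terraform plan   # re-run once both variables are exported
```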

@w0ut0

w0ut0 commented Aug 5, 2020

Did you successfully provision the file in /tmp/databricks_token.txt that should contain your token?

https://github.com/datarootsio/terraform-module-azure-datalake/blob/master/databricks.tf#L18

@sdebruyn
Member

sdebruyn commented Aug 5, 2020

That file should be created automatically. I think this issue is caused by this one: databricks/terraform-provider-databricks#128

@jvandenbos
Author

Thanks all. There was another bug (will post as separate issue) with the Databricks provider: #168

Going to see if rolling back to a previous version of the provider might help in the interim. More to follow.

@sdebruyn
Member

sdebruyn commented Aug 5, 2020

On a project where I'm using this module, I disabled all Databricks features using provision_databricks = false

In my code I have the following:

resource "azurerm_databricks_workspace" "dbks" {
  name                        = "dbks${local.name}"
  resource_group_name         = azurerm_resource_group.rg.name
  location                    = local.region
  sku                         = "standard"
}

resource "null_resource" "databricks_token" {
  triggers = {
    build_number = timestamp()
  }

  provisioner "local-exec" {
    command = "${path.module}/files/generate_databricks_token.sh > /tmp/databricks_token.txt"
    environment = {
      DATABRICKS_WORKSPACE_RESOURCE_ID = azurerm_databricks_workspace.dbks.id
      DATABRICKS_ENDPOINT              = format("https://%s", azurerm_databricks_workspace.dbks.workspace_url)
    }
  }
}

data "local_file" "databricks_token" {
  depends_on = [null_resource.databricks_token]
  filename   = "/tmp/databricks_token.txt"
}
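One way to surface these values for a later stage is via outputs (a sketch; the output names are my own, not part of the module):

```hcl
# Sketch: expose the workspace URL and generated token as outputs so a later
# Terraform stage can read them from remote state. Names are assumptions.
output "databricks_host" {
  value = format("https://%s", azurerm_databricks_workspace.dbks.workspace_url)
}

output "databricks_token" {
  value     = trimspace(data.local_file.databricks_token.content)
  sensitive = true
}
```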

The mentioned script looks like this:

#!/bin/bash

set -e

function parse_input() {
  test -n "$DATABRICKS_WORKSPACE_RESOURCE_ID"
  test -n "$DATABRICKS_ENDPOINT"
}

function produce_output() {
  # Get a token for the global Databricks application.
  # The resource name is fixed and never changes.
  token_response=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
  token=$(jq .accessToken -r <<< "$token_response")

  # Get a token for the Azure management API
  token_response=$(az account get-access-token --resource https://management.core.windows.net/)
  azToken=$(jq .accessToken -r <<< "$token_response")

  api_response=$(curl -sf "$DATABRICKS_ENDPOINT/api/2.0/token/create" \
    -d "" \
    -H "Authorization: Bearer $token" \
    -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
    -H "X-Databricks-Azure-Workspace-Resource-Id:$DATABRICKS_WORKSPACE_RESOURCE_ID")
  pat_token=$(jq .token_value -r <<< "$api_response")
  echo "$pat_token"
}

parse_input
produce_output

Then you can expose the Databricks workspace and the token as output variables. Further in your CI/CD pipeline, you can add a separate Terraform stage that does all the Databricks configuration and reads the workspace and token from the first stage's remote state.
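The second stage could then read those values roughly like this (a sketch, assuming an azurerm backend; all backend settings and output names below are hypothetical):

```hcl
# Sketch of the second-stage wiring. Backend config and output names are
# assumptions; adapt them to your own state storage and first-stage outputs.
data "terraform_remote_state" "infra" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstate"
    container_name       = "tfstate"
    key                  = "infra.tfstate"
  }
}

provider "databricks" {
  host  = data.terraform_remote_state.infra.outputs.databricks_host
  token = data.terraform_remote_state.infra.outputs.databricks_token
}
```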

All the Databricks resources provided by this module won't work, but you can copy the code from here and adapt where needed: https://github.com/datarootsio/terraform-module-azure-datalake/blob/master/databricks.tf

It's not an easy workaround, but it might be the easiest solution until they fix that issue. This workaround seems to be stable and I haven't experienced issues with it after using it daily for about a month.

@sdebruyn
Member

sdebruyn commented Aug 5, 2020

I just noticed that even the above workaround doesn't work anymore if you're on the latest provider version. It tries to authenticate even when there are no resources to be created. It would probably be best to move all the Databricks stuff to a submodule and then disable the submodule altogether, but this is only possible in Terraform 0.13 or newer, which supports count for modules.
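That submodule gating would look roughly like this (a sketch for Terraform 0.13+; the submodule path and variable name are hypothetical):

```hcl
# Sketch (Terraform 0.13+), assuming the Databricks resources were moved into
# a hypothetical ./modules/databricks submodule.
variable "provision_databricks" {
  type    = bool
  default = false
}

module "databricks" {
  source = "./modules/databricks"
  count  = var.provision_databricks ? 1 : 0
}
```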

@jvandenbos
Author

jvandenbos commented Aug 5, 2020

Got it working by using the Databricks Terraform provider v0.2, found here: https://github.com/databrickslabs/terraform-provider-databricks/releases. I manually installed it into the ~/.terraform.d/plugins directory (removing the newer one), then re-ran terraform init; terraform plan.

(At least Plan is working)

@sdebruyn
Member

sdebruyn commented Aug 5, 2020

FYI you can get the latest version working by creating a dummy file, just found out...

cat <<EOF > $HOME/.databrickscfg
[DEFAULT]
host = https://northeurope.azuredatabricks.net/?o=0123
token = 0123
EOF

@jvandenbos
Author

Awesome. Thank you.

@sdebruyn
Member

Will be fixed with #176 (expected in the next few days)

@grgouveia

Having this issue after performing a destroy.
It seems like the provider doesn't understand that there are no resources and still tries to authenticate to a non-existing workspace.

@sdebruyn
Member

That will also be fixed when I do the upgrade; I'll have to remove the provider config from the module then.
