[ISSUE] Cannot add instance pool or cluster after workspace was already created #1045

Closed
plieberg opened this issue Jan 19, 2022 · 18 comments
Labels
azure Occurring on Azure cloud lazy auth

Comments

@plieberg

I’m writing an internal module for managing our Azure Databricks resources. The first iteration, which simply created a workspace, ran fine. However, I am now trying to add clusters and instance pools and am running into an issue. To take my module out of the equation, I created the resources as plain resource blocks outside the module and hit the same issue. As you can see, I've added depends_on to the data sources as well as to the cluster/instance_pool resources.

Configuration

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "eastus2"
}

resource "azurerm_databricks_workspace" "example" {
  name                        = "databricks-test"
  resource_group_name         = azurerm_resource_group.example.name
  location                    = azurerm_resource_group.example.location
  sku                         = "premium"
  managed_resource_group_name = azurerm_resource_group.example.location

  custom_parameters {
    # Optional
    no_public_ip                                         = true
    virtual_network_id                                   = data.azurerm_virtual_network.dma_vnet.id
    public_subnet_network_security_group_association_id  = data.azurerm_subnet.dma_subnet_dbpub.id
    private_subnet_network_security_group_association_id = data.azurerm_subnet.dma_subnet_dbpri.id
    # Required if virtual_network_id is defined
    public_subnet_name  = data.azurerm_subnet.dma_subnet_dbpri.name
    private_subnet_name = data.azurerm_subnet.dma_subnet_dbpub.name
  }
}
  
data "databricks_node_type" "smallest" {

  depends_on = [
    azurerm_databricks_workspace.example,
  ]
}

resource "databricks_instance_pool" "smallest_nodes" {
  instance_pool_name                    = "Smallest Nodes"
  min_idle_instances                    = 0
  max_capacity                          = 300
  node_type_id                          = data.databricks_node_type.smallest.id
  idle_instance_autotermination_minutes = 10
  disk_spec {
    disk_type {
      azure_disk_volume_type = "PREMIUM_LRS"
    }
    disk_size  = 80
    disk_count = 1
  }

  depends_on = [
    azurerm_databricks_workspace.example,
  ]
}

Expected Behavior

The instance pool should be created.

Actual Behavior

Both plan and apply fail with errors; see Debug Output below.

Steps to Reproduce

  1. Run terraform plan or terraform apply.

Terraform and provider versions

Terraform v1.0.4
on linux_amd64
+ provider registry.terraform.io/databrickslabs/databricks v0.4.5
+ provider registry.terraform.io/hashicorp/azuread v2.15.0
+ provider registry.terraform.io/hashicorp/azurerm v2.92.0
+ provider registry.terraform.io/hashicorp/external v2.2.0
+ provider registry.terraform.io/hashicorp/http v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/time v0.7.2
+ provider registry.terraform.io/hashicorp/vault v3.1.1


Debug Output


This is the error during the plan step, if I try to create a cluster resource:

Error: workspace is most likely not created yet, because the `host` is empty. Please add `depends_on = [databricks_mws_workspaces.this]` or `depends_on = [azurerm_databricks_workspace.this]` to every data resource. See https://www.terraform.io/docs/language/resources/behavior.html more info. Please check https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#authentication for details

  with data.databricks_spark_version.latest_lts,
  on databricks_workspace_pll.tf line 32, in data "databricks_spark_version" "latest_lts":
  32: data "databricks_spark_version" "latest_lts" {

This is the error from an apply if I try to add an instance_pool resource:

Error: cannot create instance pool: authentication is not configured for provider.. Please check https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#authentication for details

  with databricks_instance_pool.smallest_nodes,
  on databricks_workspace_pll.tf line 32, in resource "databricks_instance_pool" "smallest_nodes":
  32: resource "databricks_instance_pool" "smallest_nodes" {



@nfx
Contributor

nfx commented Jan 19, 2022

as a workaround for now, please first apply the workspace creation, and then resources within it.

I'm not even sure this issue can be fixed.
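
One way to read this workaround is as two separate root configurations applied in order. The sketch below covers the second layer only and assumes the workspace from the original report already exists. The directory name and the data source label `this` are hypothetical, and later comments in this thread report mixed results with a layered approach, so treat it as a starting point rather than a confirmed fix.

```hcl
# Layer 2 (hypothetical directory: ./workspace-resources).
# Apply this only after the layer that creates azurerm_databricks_workspace
# has been applied successfully.
provider "azurerm" {
  features {}
}

# Look up the existing workspace instead of creating it here, so the
# Databricks provider can resolve its host at plan time.
# The name and resource group are taken from the configuration in the report.
data "azurerm_databricks_workspace" "this" {
  name                = "databricks-test"
  resource_group_name = "example-resources"
}

provider "databricks" {
  host                        = data.azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.this.id
}

data "databricks_node_type" "smallest" {}

resource "databricks_instance_pool" "smallest_nodes" {
  instance_pool_name                    = "Smallest Nodes"
  min_idle_instances                    = 0
  max_capacity                          = 300
  node_type_id                          = data.databricks_node_type.smallest.id
  idle_instance_autotermination_minutes = 10
}
```

Because the workspace is read through a data source rather than created in the same run, no depends_on on the workspace resource is needed in this layer.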

@nfx nfx added the lazy auth label Jan 19, 2022
@callppatel

We are getting this error message:

Error: cannot read cluster: cannot configure azure-client-secret auth: cannot get workspace: somehow resource id is not set. Environment variables used: ARM_CLIENT_SECRET, ARM_CLIENT_ID, ARM_TENANT_ID. Please check https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#authentication for details

This was working fine before. We are now using the azurerm provider hashicorp/azurerm v2.92.0 and databrickslabs/databricks v0.4.4.

This could be related.

@nfx
Contributor

nfx commented Jan 19, 2022

@callppatel You need to set the Azure resource ID provider configuration property, because according to the error message it is not set now. That's a different issue. We probably need to make the error message clearer. Can you create a separate issue, where you also confirm that setting https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#azure_workspace_resource_id solves it? I'll update the documentation as well. The resource ID is needed only on the first request of a service principal to a Databricks workspace.

@callppatel

> @callppatel You need to set the Azure resource ID provider configuration property, because according to the error message it is not set now. That's a different issue. We probably need to make the error message clearer. Can you create a separate issue, where you also confirm that setting https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#azure_workspace_resource_id solves it? I'll update the documentation as well. The resource ID is needed only on the first request of a service principal to a Databricks workspace.

Earlier I had:

provider "databricks" {
  azure_workspace_resource_id = module.azure_databricks.databricks_id
}

which I changed to:

provider "databricks" {
  host = module.azure_databricks.databricks_workspace_url
}

Shall I revert and try again?

@nfx
Contributor

nfx commented Jan 19, 2022

@callppatel why did you remove resource id? Is there another misleading doc somewhere I need to fix?..

@callppatel

> @callppatel why did you remove resource id? Is there another misleading doc somewhere I need to fix?..

I was getting a message that it will be deprecated soon, so I thought of changing it. Shall I keep only resource_id, or both? Thanks for your help.

@nfx
Contributor

nfx commented Jan 19, 2022

Both

@aravishdatabricks

Got a similar error

│ Error: cannot create instance pool: Worker environment overlay workerenv-8245452661638664 not found temporarily, please contact databricks support if the issue persist.

│ with databricks_instance_pool.smallest_nodes,
│ on main.tf line 77, in resource "databricks_instance_pool" "smallest_nodes":
│ 77: resource "databricks_instance_pool" "smallest_nodes" {

@callppatel

> Both

Still same issue, after setting both

provider "databricks" {
  host                        = module.azure_databricks.databricks_workspace_url
  azure_workspace_resource_id = module.azure_databricks.databricks_id
}

Error: cannot read cluster: cannot configure azure-client-secret auth: cannot get workspace: somehow resource id is not set. Environment variables used: ARM_CLIENT_SECRET, ARM_CLIENT_ID, ARM_TENANT_ID. Please check https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs#authentication for details

As suggested, I have logged another ticket. Thanks for checking this for us.
github.com/databrickslabs/terraform-provider-databricks/issues/1049

@aravishdatabricks

@callppatel This seems to be happening with azurerm 2.92. Can you please use a different version of the azurerm provider (e.g. 2.78) as a workaround?
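
A minimal sketch of what pinning the provider could look like, assuming the provider sources listed in the original report. The exact patch version `2.78.0` is an assumption, since the comment only names 2.78, and comments further down this thread report that the downgrade did not help in every setup.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Exact pin to the version suggested as a workaround in this thread.
      # The ".0" patch level is an assumption.
      version = "= 2.78.0"
    }
    databricks = {
      source = "databrickslabs/databricks"
      # Version taken from the original report.
      version = "0.4.5"
    }
  }
}
```

If a newer azurerm version was already selected for the working directory, running `terraform init -upgrade` lets Terraform reselect provider versions that match the new constraint.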

@callppatel

> @callppatel This seems to be happening with azurerm 2.92. Can you please use a different version of the azurerm provider (e.g. 2.78) as a workaround?

Thank you! My terraform plan now passes with version 2.78.

@plieberg
Author

> as a workaround for now, please first apply the workspace creation, and then resources within it.
>
> I'm not even sure this issue can be fixed.

I have tried doing this layered approach to no avail. I have also tried changing the authentication from MSI to "azure with service principal" and that also did not help.

@plieberg
Author

I also tried using azurerm 2.78 as mentioned above and that did not solve my problem.

@nfx
Contributor

nfx commented Jan 21, 2022

@plieberg What errors did you get?

@plieberg
Author

To clarify: using azurerm 2.78, I was able to create these resources manually with plain resource blocks, creating the workspace first and then an instance_pool in a second plan/apply.

I have tried to do this same thing using my module code and it still fails. So confused.

This is the latest error, which is the same thing I see when trying to do the apply to add the instance_pool.


  with module.test_module_01.databricks_instance_pool.pool_nodes["test01"],
  on .terraform/modules/test_module_01/main.tf line 92, in resource "databricks_instance_pool" "pool_nodes":
  92: resource "databricks_instance_pool" "pool_nodes" {

@nfx nfx closed this as completed Jan 26, 2022
@CSummersbyAltius

CSummersbyAltius commented Jan 26, 2022

> @callppatel This seems to be happening with azurerm 2.92. Can you please use a different version of the azurerm provider (e.g. 2.78) as a workaround?

What is the solution if you use functionality in azurerm that requires a provider version newer than 2.78?

I can confirm that downgrading the provider to 2.78 does appear to remove the issue, but I then have problems using azurerm attributes elsewhere in my code (namely the 'fqdns' attribute of the azurerm_data_factory_managed_private_endpoint resource).

@usv-andyplamann

I was able to get around the issue by putting an explicit depends_on block in each resource and data source that uses the databricks provider, referencing the azurerm databricks workspace.
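
As a rough illustration of that approach for the cluster case from the original report, the sketch below adds depends_on to every Databricks data source and resource. It assumes the workspace resource is named `azurerm_databricks_workspace.example` as in the report. The cluster name and sizing values are hypothetical.

```hcl
# Every data source and resource served by the databricks provider
# waits for the workspace explicitly.
data "databricks_spark_version" "latest_lts" {
  long_term_support = true

  depends_on = [azurerm_databricks_workspace.example]
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [azurerm_databricks_workspace.example]
}

resource "databricks_cluster" "shared" {
  cluster_name            = "Shared Autoscaling" # hypothetical name
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20 # hypothetical value

  autoscale {
    min_workers = 1 # hypothetical sizing
    max_workers = 2
  }

  depends_on = [azurerm_databricks_workspace.example]
}
```

The original report already had depends_on on its data source and instance pool and still failed, so this may need to be combined with the other workarounds discussed in the thread.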

@tbugfinder

I think upgrade of azure-cli fixed it on my end.
