
[ISSUE] Notebooks content always seen as different even when no file has changed #42

Closed
damoodamoo opened this issue May 6, 2020 · 14 comments · Fixed by #111
Labels
azure (Occurring on Azure cloud), bug (Something isn't working)

Comments

@damoodamoo

Terraform Version

Terraform v0.12.24
+ provider.azurerm v1.44.0
+ provider.databricks v0.1.0

Affected Resource(s)


  • databricks_notebook

Terraform Configuration Files

resource "databricks_notebook" "notebook" {
  content = filebase64("${path.module}/nb/notebook1.scala")
  path = "/Shared/Notebooks/notebook1.scala"
  overwrite = false
  mkdirs = true
  format = "SOURCE"
  language = "SCALA"
}

Debug Output

On first run, the resource is seen as an add, correctly, and deploys.
On subsequent terraform apply runs it sees the content as having changed:

  # databricks_notebook.notebook must be replaced
-/+ resource "databricks_notebook" "notebook" {
      ~ content     = "3750311991" -> "2327128740" # forces replacement
        format      = "SOURCE"
      ~ id          = "/Shared/Notebooks/notebook1.scala" -> (known after apply)
        language    = "SCALA"
        mkdirs      = true
      ~ object_id   = 4081355166030977 -> (known after apply)
      ~ object_type = "NOTEBOOK" -> (known after apply)
        overwrite   = false
        path        = "/Shared/Notebooks/notebook1.scala"
    }

This also exacerbates issue #41 when deploying multiple files, as Terraform tries to re-create all of them every time.

Expected Behavior

The content should be seen as unchanged, resulting in a no-op.

Actual Behavior

The content is seen as different, and a delete/create is required.

Steps to Reproduce

Use the above HCL to deploy a notebook, then run terraform apply again.

@stikkireddy
Contributor

stikkireddy commented May 7, 2020

Hey @damoodamoo, thanks for pointing this out. The issue seems to be that the notebook content is not normalized before the checksum is calculated. Unfortunately, across different clouds and potentially different Databricks deployment versions, what you import and what you export may not yield the same values.

Below is a potential solution that seems to work for me for non-DBC notebooks:

func normalizeB64Data(b64Content string) string {
	// NOTE: the decode error is ignored here; invalid base64 falls through as empty content.
	dataArr, _ := base64.StdEncoding.DecodeString(b64Content)
	scalaMagic := "// MAGIC "
	pythonRMagic := "# MAGIC "
	sqlMagic := "-- MAGIC "
	// Drop lines that only appear in exported notebooks: bare MAGIC markers,
	// COMMAND cell separators, the "Databricks notebook source" header, and blanks.
	filteredDataArr := filter(strings.Split(string(dataArr), "\n"), func(s string) bool {
		trimmedS := strings.TrimRight(s, " ")
		scalaMagicStatements := trimmedS != strings.Trim(scalaMagic, " ")
		pythonRMagicStatements := trimmedS != strings.Trim(pythonRMagic, " ")
		sqlMagicStatements := trimmedS != strings.Trim(sqlMagic, " ")
		ignoreScalaCmd := trimmedS != "// COMMAND ----------"
		ignorePythonRCmd := trimmedS != "# COMMAND ----------"
		ignoreSqlCmd := trimmedS != "-- COMMAND ----------"
		ignoreNotebookHeader := !strings.HasSuffix(trimmedS, "Databricks notebook source")
		return scalaMagicStatements && pythonRMagicStatements && sqlMagicStatements &&
			ignoreScalaCmd && ignorePythonRCmd && ignoreSqlCmd && ignoreNotebookHeader && trimmedS != ""
	})
	// Strip the per-language MAGIC prefix so magic cells compare equal to their source.
	transformedDataArr := transform(filteredDataArr, func(s string) string {
		if strings.HasPrefix(s, scalaMagic) {
			return strings.TrimPrefix(s, scalaMagic)
		}
		if strings.HasPrefix(s, pythonRMagic) {
			return strings.TrimPrefix(s, pythonRMagic)
		}
		if strings.HasPrefix(s, sqlMagic) {
			return strings.TrimPrefix(s, sqlMagic)
		}
		return s
	})
	return base64.StdEncoding.EncodeToString([]byte(strings.Join(transformedDataArr, "\n")))
}

@stikkireddy added the bug (Something isn't working) label on May 7, 2020
@damoodamoo
Author

Hi - thanks for looking into this - do you see something like this making its way into the code?

Also, I noticed the tests for notebooks were all removed - were they problematic? I'm happy to look into integrating your code and getting a first pass at some tests ready in a PR, but wanted to check what your plans are for it.

Cheers!

@stikkireddy
Contributor

Ahh, 😳 that was a copy-and-paste issue - I deleted the tests because they were copied and pasted, and I didn't want to confuse the reader into thinking there was coverage. If you want, you can take this; if not, I can pick it up. But yes, this should make it into the code with reasonable test coverage of the different format types.

Please let me know if you will also be implementing folders; if not, I will pick that up to address #41.

@stikkireddy
Contributor

Missed these two helper functions used in the normalize method above:

// filter returns the elements of ss for which test returns true.
func filter(ss []string, test func(string) bool) (ret []string) {
	for _, s := range ss {
		if test(s) {
			ret = append(ret, s)
		}
	}
	return
}

// transform returns a new slice with t applied to each element of ss.
func transform(ss []string, t func(string) string) (ret []string) {
	for _, s := range ss {
		ret = append(ret, t(s))
	}
	return
}

@damoodamoo
Author

damoodamoo commented May 7, 2020

Quick update from me: given the current project time constraints, I need to focus on an approach using the CLI for notebook deployment, so I won't be able to work on this or #41 imminently. I'm hoping I'll get time to circle back after the initial deadlines are met, though. Thanks again for looking into them.

@stikkireddy
Contributor

stikkireddy commented May 7, 2020

Hey, no problem! I will pick up #41 and this issue, as it needs to be resolved for other customers. I have a working solution; it just needs acceptance tests. I hope you don't mind.

@mikemowgli

I can confirm the problem and have the same issue on Azure Databricks with SOURCE notebooks.

@damoodamoo
Author

Hi there @stikkireddy - is this still open as-is? I have some time to look into it now, if that would still be helpful...

@stikkireddy
Contributor

@damoodamoo hey, sorry about that - I am making the PR this afternoon for this; I lumped a few notebook issues together, with integration testing.

@damoodamoo
Author

Awesome - does any of that cover the 429 issue too (#41)?

@stikkireddy
Contributor

yep it does

@stikkireddy stikkireddy linked a pull request Jun 16, 2020 that will close this issue
@nfx
Contributor

nfx commented Jun 24, 2020

@stikkireddy I'd suggest not tracking notebook content at all for now and always replacing the notebook on every run. This would simplify the resource, and the behaviour would be consistent with the Stack CLI.

@stikkireddy
Contributor

@nfx So to do that, there are three options:

  1. Stop supporting this resource entirely and suggest using provisioners. (Not a great idea.)
  2. Only support the SOURCE format, which makes it easy to identify differences. (I am leaning towards this, as DBC is also very annoying to resolve; it would make the resource very simple to program and read.)
  3. Accept that the resource will always identify a difference. (I am against this option; at that point you may as well use a local provisioner and the Stack CLI to manage this.)

@nfx
Contributor

nfx commented Jun 25, 2020

Closing this as a duplicate of #111.

@nfx nfx closed this as completed Jun 25, 2020
@nfx added the azure (Occurring on Azure cloud) label on Feb 23, 2022