Support for Single Node Clusters (issue 411) #454

Merged: 17 commits merged into master on Jan 22, 2021
Conversation

@alexott (Contributor) commented on Jan 4, 2021

this should fix #411

azcli status

  • TestAccClusterResource_CreateClusterWithLibraries (222.85s) <= has to be fixed
  • TestAccDatabricksDBFSFile_CreateViaSource (0.57s) <= can be ignored
  • TestAccDatabricksPermissionsResourceFullLifecycle (0.80s) <= can be ignored
  • TestAccDeleteInvalidMountFails (11.86s) <= could you test if we can make mounts from single node clusters?...
  • TestAccGroupMemberResource (17.73s)
  • TestAccSecretAclResourceDefaultPrincipal (63.33s)
  • TestAccSecretResource (18.47s)
  • TestAccServicePrincipalResource (10.71s)
  • TestAccTokenResource (7.59s)
  • TestAzureAccClusterResource_CreateClusterViaInstancePool (267.38s)
  • access.TestAccPermissionsClusterPolicy (21.53s)
  • access.TestAccPermissionsClusters (297.11s)
  • access.TestAccPermissionsInstancePool (16.17s)
  • access.TestAccPermissionsJobs (17.79s)
  • access.TestAccPermissionsNotebooks (22.75s)
  • access.TestAccPermissionsTokens (14.98s)
  • access/acceptance.TestAccInitialManagePrincipals (3.04s)
  • access/acceptance.TestAccInitialManagePrincipalsGroup (2.53s)
  • access/acceptance.TestAccSecretAclResource (27.52s)
  • access/acceptance.TestAccSecretScopeResource (22.91s)
  • compute.TestAccContext (55.76s)
  • compute.TestAccInstancePools (5.62s)
  • compute.TestAccLibraryCreate (68.14s)
  • compute.TestAccListClustersIntegration (262.62s)
  • compute.TestAzureAccNodeTypes (2.77s)
  • compute/acceptance.TestAccClusterPolicyResourceFullLifecycle (34.33s)
  • compute/acceptance.TestAccJobResource (10.51s)
  • identity.TestAccCreateToken (4.30s)
  • identity.TestAccCreateUser (11.61s)
  • identity.TestAccFilterGroup (4.76s)
  • identity.TestAccGroup (26.48s)
  • identity.TestAccReadUser (4.26s)
  • identity/acceptance.TestAccGroupResource (35.01s)
  • identity/acceptance.TestAccGroupResource_verify_entitlements (25.96s)
  • identity/acceptance.TestAccUserResource (26.98s)
  • internal/sanity.TestAccMissingResourcesInWorkspace (5.60s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Cluster_Policies (0.46s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Cluster_Policies_Delete (0.50s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Clusters (0.23s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/DBFS_Files (0.63s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Groups (0.69s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Instance_Pools (0.24s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Jobs (0.21s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Notebooks (0.23s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Secret_ACLs (0.22s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Secret_Scopes (0.16s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Secrets (0.26s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Tokens (1.11s)
  • internal/sanity.TestAccMissingResourcesInWorkspace/Users (0.66s)
  • internal/sanity.TestAccMutiworkspaceUsedFromNormalMode (6.62s)
  • storage.TestAccCreateFile (15.12s)
  • storage.TestAccInvalidSecretScopeFails (6.15s)
  • storage.TestAccSourceOnInvalidMountFails (33.68s)
  • storage/acceptance.TestAccDatabricksDBFSFile_CreateViaContent (29.04s)
  • workspace/acceptance.TestAccNotebookResourceScalability (24.50s)
  • workspace/acceptance.TestAccWorkspaceConfFullLifecycle (11.47s)

(Resolved review thread on compute/resource_job.go, now outdated)
@nfx (Contributor) commented on Jan 4, 2021

@alexott could you attach the test.out file with the test report from make test-azcli or make test-azsp?

@codecov bot commented on Jan 4, 2021

Codecov Report

Merging #454 (5d94130) into master (7a01703) will increase coverage by 0.05%.
The diff coverage is 68.18%.


```
@@            Coverage Diff             @@
##           master     #454      +/-   ##
==========================================
+ Coverage   81.20%   81.26%   +0.05%     
==========================================
  Files          68       68              
  Lines        5193     5236      +43     
==========================================
+ Hits         4217     4255      +38     
- Misses        655      660       +5     
  Partials      321      321              
```
| Impacted Files | Coverage Δ |
|---|---|
| compute/model.go | 76.31% <ø> (ø) |
| compute/resource_cluster_policy.go | 84.84% <ø> (ø) |
| storage/adls_gen1_mount.go | 100.00% <ø> (ø) |
| storage/adls_gen2_mount.go | 100.00% <ø> (ø) |
| storage/azure_blob_mount.go | 94.91% <ø> (ø) |
| storage/mounts.go | 67.60% <0.00%> (-6.82%) ⬇️ |
| compute/resource_cluster.go | 72.64% <100.00%> (+4.43%) ⬆️ |
| compute/resource_job.go | 85.08% <100.00%> (+5.87%) ⬆️ |

@alexott (Contributor, Author) commented on Jan 4, 2021

Running the tests again (this will take some time) - there are failures that aren't directly related to this PR...

@alexott (Contributor, Author) commented on Jan 4, 2021

test-report.log

Terraform 0.14 produces the following warning:

```
Terraform 0.13 and earlier allowed provider version constraints inside the
provider configuration block, but that is now deprecated and will be removed
in a future version of Terraform. To silence this warning, move the provider
version constraint into the required_providers block.
```

The change should work on Terraform 0.12 as well.

@alexott (Contributor, Author) commented on Jan 4, 2021

One thing to note: with the current design, num_workers is always in the JSON payload, even if autoscale is specified, but autoscale always wins. This could be reworked by declaring ClusterNumWorkers as a pointer to int32 and using omitempty, but then we'd need to write everywhere:

numWorkers := int32(X)
Cluster{NumWorkers: &numWorkers....}

@nfx WDYT?

(Resolved review threads on compute/acceptance/job_test.go, compute/resource_cluster.go, compute/resource_job.go, and compute/resource_job_test.go)
@@ -1,5 +1,10 @@
terraform {
Contributor:

Does it work with Terraform 0.12? :) Plenty of customers still use a two-release-old version.

@nfx (Contributor) commented on Jan 5, 2021

> One thing to note: with the current design, num_workers is always in the JSON payload, even if autoscale is specified, but autoscale always wins. This could be reworked by declaring ClusterNumWorkers as a pointer to int32 and using omitempty, but then we'd need to write everywhere:
>
> numWorkers := int32(X)
> Cluster{NumWorkers: &numWorkers....}
>
> @nfx WDYT?

It should ideally be handled with omitempty. I still don't understand why it doesn't work as expected. @pietern, you were programming GoLang full time for a couple of years - do you know the intricacies of Go JSON marshalling and omitempty for integers?

@alexott, there are two options available:

  1. A Marshaller and Unmarshaller implementation for compute.Cluster and compute.ClusterInfo (I guess it's more elegant and better protected from errors)
  2. Changing int to an int pointer

If you decide to change it to an int pointer, then besides changing 17 direct invocation places, you'd need to add special handling in (see the reflection sketch after this list):

  1. StructToSchema - https://github.com/databrickslabs/terraform-provider-databricks/blob/master/internal/reflect_resource.go#L158
  2. check if StructToData still works - https://github.com/databrickslabs/terraform-provider-databricks/blob/master/internal/reflect_resource.go#L349
  3. DataToStructPointer - https://github.com/databrickslabs/terraform-provider-databricks/blob/master/internal/reflect_resource.go#L380
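To make these points concrete, here is a toy sketch of the extra unwrapping that pointer fields force on reflection-based helpers. This is not the provider's actual reflect_resource.go code; readIntField and the cluster struct are invented purely for illustration:

```go
package main

import (
	"fmt"
	"reflect"
)

// readIntField is a toy stand-in for schema/reflection helpers: it reads an
// integer field whether it is a plain int32 or a *int32, which is the extra
// case a pointer field would introduce.
func readIntField(v reflect.Value) (value int64, set bool) {
	if v.Kind() == reflect.Ptr {
		if v.IsNil() {
			return 0, false // unset: the caller can omit the attribute
		}
		v = v.Elem() // unwrap *int32 -> int32
	}
	switch v.Kind() {
	case reflect.Int, reflect.Int32, reflect.Int64:
		return v.Int(), true
	}
	return 0, false
}

func main() {
	type cluster struct {
		NumWorkers *int32
	}
	n := int32(0)
	c := cluster{NumWorkers: &n}
	field := reflect.ValueOf(c).FieldByName("NumWorkers")
	got, set := readIntField(field)
	fmt.Println(got, set) // 0 true — an explicitly-set zero is distinguishable from unset (nil)
}
```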

@pietern (Contributor) commented on Jan 5, 2021

@nfx Looking at the Go tests, omitempty on an int field should work as expected, see encoding/json/encode_test.go. The field Optionals.Io will be zero-initialised and is not included in the encoded output.
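For illustration, a minimal runnable sketch of the behaviour described above; the struct mirrors the stdlib test referenced, everything else is just example code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Optionals mirrors the stdlib test referenced above: a plain int with
// omitempty is dropped from the output whenever it holds the zero value.
type Optionals struct {
	Io int `json:"io,omitempty"`
}

func main() {
	out, err := json.Marshal(Optionals{Io: 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {} — a zero value is indistinguishable from "not set"
}
```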

@alexott (Contributor, Author) commented on Jan 5, 2021

@nfx it works as designed - if the value is 0, it's omitted - but that's not what we want: we need to output 0 when autoscale is not provided... I'll think about a custom marshaller.

@nfx (Contributor) commented on Jan 5, 2021

@alexott I trust your years of C++ programming to make the best decision on int pointer vs custom marshaller for those structs :) It's an equal amount of code, as far as I can see.

@alexott (Contributor, Author) commented on Jan 6, 2021

@nfx comments are addressed. I went with custom JSON serialization for the Cluster struct (and added more checks to it).
Tested with Terraform 0.12.29 and 0.13.5.
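For reference, a rough sketch of what such custom serialization could look like. This is an assumption for illustration only, not the code from this PR; the simplified AutoScale/Cluster structs and the alias-shadowing trick are invented here:

```go
package compute

import "encoding/json"

// Simplified stand-ins for illustration only.
type AutoScale struct {
	MinWorkers int32 `json:"min_workers,omitempty"`
	MaxWorkers int32 `json:"max_workers,omitempty"`
}

type Cluster struct {
	ClusterName string     `json:"cluster_name,omitempty"`
	NumWorkers  int32      `json:"num_workers"` // no omitempty: 0 must still be sent for single node clusters
	Autoscale   *AutoScale `json:"autoscale,omitempty"`
}

// MarshalJSON sends either autoscale or num_workers, never both: with
// autoscale set, num_workers is dropped from the payload; without it,
// num_workers is emitted explicitly, even when it is 0.
func (c Cluster) MarshalJSON() ([]byte, error) {
	type Alias Cluster // Alias has no MarshalJSON method, which avoids recursion
	if c.Autoscale != nil {
		return json.Marshal(struct {
			Alias
			// Shadows the embedded num_workers with a field that is always omitted.
			NumWorkers *int32 `json:"num_workers,omitempty"`
		}{Alias: Alias(c)})
	}
	return json.Marshal(Alias(c))
}
```

The idea: when autoscale is present, num_workers is shadowed by an always-omitted field; otherwise num_workers is written out explicitly, even when it is 0, which is exactly the single node case.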

Makefile (outdated)

install-tf-0.12: build
@echo "✓ Installing provider for Terraform 0.12 ..."
@mkdir -p '$(HOME)/.terraform.d/plugins/$(shell go version | awk '{print $$4}' | sed 's#/#_#')/'
Contributor:

tf12 doesn't actually need any version prefix in the plugins directory. Look at the installation instructions:

curl https://raw.githubusercontent.com/databrickslabs/databricks-terraform/master/godownloader-databricks-provider.sh | bash -s -- -b $HOME/.terraform.d/plugins

My testing setup: $PATH has the 0.12 binary, plus terraform13 and terraform14 for the other versions as well.

Makefile (outdated)
@echo "✓ Use the following configuration for Terraform 0.12 to enable the version you've built"
@echo ""
@echo 'provider "databricks" {'
@echo ' version = "0.3.0"'
Contributor:

version pinning is not really needed for 0.12, 🤷. It goes against the installer script we provide.

Contributor (Author):

with the current make install it doesn't find the provider, unfortunately...

Contributor:

Even after removing ~/.terraform.d ?...

Contributor (Author):

OK, it works if I install it the way our install script does and remove the version. I'll remove this target completely...

repo = "https://mavencentral.org"
exclusions = ["slf4j:slf4j"]
library {
maven {
Contributor:

thanks for doing this :)

"sort"
"strings"
"time"

"github.com/databrickslabs/databricks-terraform/common"
"github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
"github.com/ulule/deepcopier"
Contributor:

why do we need an external dependency? Can we do without it?

Contributor (Author):

Copying data using reflection is kind of black magic (for me). I've tried to implement it myself but gave up after several attempts - maybe if I spent more time I could do it, but I'm not sure that's the best use of the time.
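For context, a minimal sketch of how ulule/deepcopier is typically used - copying same-named fields between structs via reflection. The ClusterSpec/ClusterRequest types below are invented for the example and are not the provider's structs:

```go
package main

import (
	"fmt"

	"github.com/ulule/deepcopier"
)

// Two hypothetical structs sharing field names; deepcopier copies matching
// fields via reflection, so we don't have to write the assignments (or the
// reflection code) by hand.
type ClusterSpec struct {
	ClusterName string
	NumWorkers  int32
}

type ClusterRequest struct {
	ClusterName string
	NumWorkers  int32
}

func main() {
	src := ClusterSpec{ClusterName: "single-node", NumWorkers: 0}
	dst := ClusterRequest{}
	// Copy(src).To(&dst) walks the fields with reflection and assigns
	// values where names and types match.
	if err := deepcopier.Copy(&src).To(&dst); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", dst)
}
```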

Contributor (Author):

Frankly speaking, it works without this custom marshalling - it simply sends both num_workers & autoscale if both are present, and the control plane always prefers autoscale. I'm just afraid that it may stop working this way at some point, and we'd need to do a hotfix release...

Contributor:

Single node clusters aren’t major dbu bringers, so I’m okay with that risk.

Additional libraries just require more process, and adding one just to use it in one place might be overkill :(

Contributor (Author):

My main concern around this marshalling: if the API stops accepting both num_workers & autoscale, that will affect everything, not just single node clusters. If you're OK with it, I'll revert that commit.

alexott requested a review from nfx on January 7, 2021 16:44
@nfx (Contributor) left a comment:

Will merge after running acceptance tests locally :)

nfx merged commit dacd91d into master on Jan 22, 2021
nfx deleted the issue-411 branch on January 22, 2021 16:21
Linked issue: [FEATURE] Single Node cluster (#411)