Exporter: performance improvements for big workspaces #3167

Merged: 18 commits merged into main from exporter-performance-improvements on Jan 31, 2024

@alexott alexott commented Jan 26, 2024

Changes

Performance improvements for the exporter:

  • Parallel generation of resources: the body of each identified resource is now generated independently of other resources by a pool of goroutines whose size is set by EXPORTER_RESOURCE_GENERATORS (default: 50). This helps with processing references to other resources. Generated bodies are sent to dedicated channels that are responsible for writing the code into files.
  • stateApproximation is reimplemented to avoid both linear search and locking of the whole structure. It now keeps a per-resource-type list of resource approximations together with a direct lookup structure that is used in the Has call.
  • importContext.Find is optimized to perform a direct lookup for most of the match types. Only prefix match and some cases of case-insensitive match still iterate over the approximations of the given type.
  • Dedicated channels are now created only for specific resources (SCIM-related right now); this can be overridden with the EXPORTER_DEDICATED_RESOUSE_CHANNELS environment variable. All other resources are handled by the shared channel, which gives better resource utilization. The shared channel is handled by 15 goroutines by default, and this can be adjusted with the EXPORTER_PARALLELISM_default environment variable.
  • Added the -trace command-line flag to enable the trace logging level.
  • For notebooks, we no longer wait for the listing to finish; instead, objects are emitted from the visitor function (this is also done indirectly to avoid blocking on the SCIM API).
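
For readers unfamiliar with the pattern, the parallel generation described in the first bullet can be pictured as a worker pool feeding a single writer. The following is only a minimal sketch under stated assumptions, not the exporter's actual code: `envInt`, `resource`, and `generateAll` are hypothetical names, the "body" is a placeholder string, and only the environment variable name and its default of 50 come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"sync"
)

// envInt reads a positive integer from the environment and falls back to def.
// Hypothetical helper for this sketch, not a function from the provider.
func envInt(name string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(name)); err == nil && v > 0 {
		return v
	}
	return def
}

// resource is a simplified stand-in for an identified resource.
type resource struct{ Type, ID string }

// generateAll fans resources out to `workers` goroutines and funnels the
// generated bodies into one writer goroutine, so file output stays serialized.
func generateAll(resources []resource, workers int) []string {
	jobs := make(chan resource)
	bodies := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				// Placeholder for the real (expensive) body generation.
				bodies <- fmt.Sprintf("resource %q %q {}", r.Type, r.ID)
			}
		}()
	}

	// Single writer: in a real tool this would append to per-type files.
	var out []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for b := range bodies {
			out = append(out, b)
		}
	}()

	for _, r := range resources {
		jobs <- r
	}
	close(jobs)   // no more work for the generators
	wg.Wait()     // all bodies have been sent
	close(bodies) // let the writer drain and exit
	<-done

	sort.Strings(out) // deterministic order for display only
	return out
}

func main() {
	workers := envInt("EXPORTER_RESOURCE_GENERATORS", 50)
	res := []resource{{"databricks_job", "j1"}, {"databricks_notebook", "n1"}}
	for _, b := range generateAll(res, workers) {
		fmt.Println(b)
	}
}
```

Keeping a single writer per output means the generators never contend on file handles; only the channel send is shared.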

Tests

  • make test run locally
  • relevant change in docs/ folder
  • covered with integration tests in internal/acceptance
  • relevant acceptance tests are passing
  • using Go SDK

@alexott alexott requested review from a team as code owners January 26, 2024 16:42
@alexott alexott requested review from hectorcast-db and removed request for a team January 26, 2024 16:42
@alexott alexott changed the title Exporter: performance improvements for big workspaces WIP: Exporter: performance improvements for big workspaces Jan 26, 2024
codecov-commenter commented Jan 26, 2024

Codecov Report

Attention: 74 lines in your changes are missing coverage. Please review.

Comparison is base (86d78e2) 83.16% compared to head (9b31ab5) 83.21%.
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3167      +/-   ##
==========================================
+ Coverage   83.16%   83.21%   +0.05%     
==========================================
  Files         168      168              
  Lines       14811    15013     +202     
==========================================
+ Hits        12317    12493     +176     
- Misses       1770     1786      +16     
- Partials      724      734      +10     
| Files | Coverage Δ |
|---|---|
| workspace/resource_notebook.go | 91.11% <100.00%> (+2.34%) ⬆️ |
| exporter/importables.go | 78.70% <80.00%> (-0.02%) ⬇️ |
| exporter/command.go | 77.77% <57.14%> (-1.39%) ⬇️ |
| exporter/model.go | 82.20% <80.55%> (-3.52%) ⬇️ |
| exporter/util.go | 79.93% <82.55%> (+1.31%) ⬆️ |
| exporter/context.go | 81.88% <86.30%> (+2.21%) ⬆️ |

... and 1 file with indirect coverage changes

@alexott alexott changed the title WIP: Exporter: performance improvements for big workspaces Exporter: performance improvements for big workspaces Jan 29, 2024
@alexott alexott changed the title Exporter: performance improvements for big workspaces WIP: Exporter: performance improvements for big workspaces Jan 29, 2024
@mgyucht mgyucht left a comment

Overall LGTM, but one small suggestion to refactor some methods to the exporter package to decrease friction when changing them.

Review thread on workspace/resource_notebook.go (outdated, resolved)
@mgyucht mgyucht left a comment

Great, thanks!

@alexott alexott changed the title WIP: Exporter: performance improvements for big workspaces Exporter: performance improvements for big workspaces Jan 29, 2024
The first part of the performance improvements: parallel generation of resources.

For each identified resource we're generating its body separately from other resources
using `EXPORTER_RESOURCE_HANDLERS` goroutines (default: 50). This helps with
processing references to other resources (although we still have some performance problems
for complex resources such as jobs, which will be improved in the next commits). Generated
bodies are sent to dedicated channels that are responsible for writing the code into
files.

Next steps:

- reimplement `resourceApproximation` structure to avoid linear search.
- optimization of reference search.
…ookup

Prefix lookup & case-insensitive lookup are still done by iterating over resources
…ect lookup

Also, for case-insensitive matching, first try to do direct lookup before iteration
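
To illustrate the lookup strategy these commit messages describe (direct lookup where possible, iteration only for prefix and some case-insensitive matches), here is a rough sketch. It is not the exporter's `stateApproximation` or `importContext.Find`; the `index` type, its methods, the key layout, and the prefix semantics (the stored attribute is a prefix of the searched value) are all assumptions made for the example, and concurrency control is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// approximation is a simplified stand-in for one resource in the state.
type approximation struct {
	Type  string
	Attrs map[string]string
}

// index keeps a per-type list for iteration plus a map for direct lookup.
// Hypothetical structure for this sketch only.
type index struct {
	byType map[string][]*approximation
	exact  map[string]*approximation
}

func newIndex() *index {
	return &index{
		byType: map[string][]*approximation{},
		exact:  map[string]*approximation{},
	}
}

// key joins type, attribute and value with a separator unlikely to occur in data.
func key(typ, attr, value string) string {
	return typ + "\x00" + attr + "\x00" + value
}

func (ix *index) add(a *approximation) {
	ix.byType[a.Type] = append(ix.byType[a.Type], a)
	for attr, v := range a.Attrs {
		ix.exact[key(a.Type, attr, v)] = a
	}
}

// findExact is a single map access instead of a scan over all resources.
func (ix *index) findExact(typ, attr, value string) *approximation {
	return ix.exact[key(typ, attr, value)]
}

// findFold tries the direct lookup first and only then iterates over the
// resources of the given type, comparing case-insensitively.
func (ix *index) findFold(typ, attr, value string) *approximation {
	if a := ix.findExact(typ, attr, value); a != nil {
		return a
	}
	for _, a := range ix.byType[typ] {
		if v, ok := a.Attrs[attr]; ok && strings.EqualFold(v, value) {
			return a
		}
	}
	return nil
}

// findPrefix always iterates: it returns the first resource of the given type
// whose stored attribute is a prefix of value (assumed semantics).
func (ix *index) findPrefix(typ, attr, value string) *approximation {
	for _, a := range ix.byType[typ] {
		if v, ok := a.Attrs[attr]; ok && strings.HasPrefix(value, v) {
			return a
		}
	}
	return nil
}

func main() {
	ix := newIndex()
	ix.add(&approximation{
		Type:  "databricks_user",
		Attrs: map[string]string{"user_name": "Alice@example.com"},
	})
	fmt.Println(ix.findExact("databricks_user", "user_name", "alice@example.com") != nil)
	fmt.Println(ix.findFold("databricks_user", "user_name", "alice@example.com") != nil)
}
```

The per-type list bounds the cost of the fallback paths to the number of resources of one type rather than the size of the whole state.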
* Introduce lightweight check for user existence
* Emit workspace objects from separate goroutines to keep the workspace listing from
  getting stuck on user lookups
It checks if a service is enabled before performing other checks - this should decrease
the number of lookups in the state approximation
@alexott alexott force-pushed the exporter-performance-improvements branch from 7aa9450 to 9b31ab5 Compare January 31, 2024 12:05
@alexott alexott added this pull request to the merge queue Jan 31, 2024
Merged via the queue into main with commit 510fd60 Jan 31, 2024
5 checks passed
@alexott alexott deleted the exporter-performance-improvements branch January 31, 2024 12:29
tanmay-db added a commit that referenced this pull request Feb 5, 2024
### New Features and Improvements
* Exporter: timestamps are now added to log entries ([#3146](#3146)).
* Validate metastore id for databricks_grant and databricks_grants resources ([#3159](#3159)).
* Exporter: Skip emitting of clusters that come from more cluster sources ([#3161](#3161)).
* Fix typo in docs ([#3166](#3166)).
* Migrate cluster schema to use the go-sdk struct ([#3076](#3076)).
* Introduce Generic Settings Resource ([#2997](#2997)).
* Update actions/setup-go to v5 ([#3154](#3154)).
* Change default branch from `master` to `main` ([#3174](#3174)).
* Add .codegen.json configuration ([#3180](#3180)).
* Exporter: performance improvements for big workspaces ([#3167](#3167)).
* update ([#3192](#3192)).
* Exporter: fix generation of cluster policy resources ([#3185](#3185)).
* Fix unit test ([#3201](#3201)).
* Suppress diff should apply to new fields added in the same chained call to CustomizableSchema ([#3200](#3200)).
* Various documentation updates ([#3198](#3198)).
* Use common.Resource consistently throughout the provider ([#3193](#3193)).
* Extending customizable schema with `AtLeastOneOf`, `ExactlyOneOf`, `RequiredWith` ([#3182](#3182)).
* Fix `databricks_connection` regression when creating without owner ([#3186](#3186)).
* add test code for job task order ([#3183](#3183)).
* Allow using empty strings as job parameters ([#3158](#3158)).
* Fix notebook parameters in acceptance test ([#3205](#3205)).
* Exporter: Add retries for `Search`, `ReadContext` and `Import` operations when importing the resource ([#3202](#3202)).
* Fixed updating owners for UC resources ([#3189](#3189)).
* Adds `databricks_volumes` as data source  ([#3150](#3150)).

### Documentation Changes

### Exporter

### Internal Changes
@tanmay-db tanmay-db mentioned this pull request Feb 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Feb 6, 2024
* Release v1.35.1


* upd

* readable

* upd

* upd
@alexott alexott added the exporter TF configuration generator label Feb 7, 2024