Exporter: performance improvements for big workspaces #3167
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #3167 +/- ##
==========================================
+ Coverage 83.16% 83.21% +0.05%
==========================================
Files 168 168
Lines 14811 15013 +202
==========================================
+ Hits 12317 12493 +176
- Misses 1770 1786 +16
- Partials 724 734 +10
Overall LGTM, but one small suggestion to refactor some methods to the exporter
package to decrease friction when changing them.
Great, thanks!
The first part of performance improvements - parallel generation of resources:
For each identified resource we generate its body separately from other resources using `EXPORTER_RESOURCE_HANDLERS` (default is 50) goroutines. This helps with processing references to other resources (although we still have some performance problems for complex resources such as jobs; this will be improved in the next commits). Generated bodies are sent to dedicated channels that are responsible for writing the code into files.
Next steps:
- reimplement the `resourceApproximation` structure to avoid linear search.
- optimize the reference search.
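
The fan-out described in this commit message can be illustrated with a minimal sketch. It is not the exporter's actual code: the `resource` type, `generateBody`, and `generateAll` are hypothetical names, the pool size is passed as a plain argument instead of being read from an environment variable, and a single in-memory writer stands in for the channels that write files.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// resource is a hypothetical stand-in for an identified workspace object.
type resource struct {
	Type string
	ID   string
}

// generateBody is a placeholder for the expensive step that renders one
// resource, including resolving its references to other resources.
func generateBody(r resource) string {
	return fmt.Sprintf("resource %q %q {}\n", r.Type, r.ID)
}

// generateAll fans resources out to `workers` goroutines and funnels the
// generated bodies into one writer goroutine, so only the writer touches
// the output.
func generateAll(resources []resource, workers int) []string {
	in := make(chan resource)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				out <- generateBody(r)
			}
		}()
	}

	// Writer: a real implementation would append each body to a file.
	var written []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for body := range out {
			written = append(written, body)
		}
	}()

	for _, r := range resources {
		in <- r
	}
	close(in)  // no more work: generators drain the channel and exit
	wg.Wait()  // all bodies have been handed to the writer channel
	close(out) // lets the writer finish
	<-done

	sort.Strings(written) // completion order is nondeterministic
	return written
}

func main() {
	bodies := generateAll([]resource{
		{"databricks_job", "a"},
		{"databricks_cluster", "b"},
	}, 4)
	for _, b := range bodies {
		fmt.Print(b)
	}
}
```

Keeping the writer in its own goroutine means generators never contend on the output, which is one plausible reason for routing bodies through channels rather than writing from each worker.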
…ookup Prefix lookup & case-insensitive lookup is still done by iterating over resources
…ect lookup Also, for case-insensitive matching, first try to do direct lookup before iteration
… waiting its finished
* Introduce lightweight check for user existence
* Emit workspace objects from separate goroutines to avoid workspace listing getting stuck on users lookup
It checks if a service is enabled before performing other checks - this should decrease the number of lookups in the state approximation
7aa9450 to 9b31ab5
Release v1.35.1
### New Features and Improvements
* Exporter: timestamps are now added to log entries ([#3146](#3146)).
* Validate metastore id for databricks_grant and databricks_grants resources ([#3159](#3159)).
* Exporter: Skip emitting of clusters that come from more cluster sources ([#3161](#3161)).
* Fix typo in docs ([#3166](#3166)).
* Migrate cluster schema to use the go-sdk struct ([#3076](#3076)).
* Introduce Generic Settings Resource ([#2997](#2997)).
* Update actions/setup-go to v5 ([#3154](#3154)).
* Change default branch from `master` to `main` ([#3174](#3174)).
* Add .codegen.json configuration ([#3180](#3180)).
* Exporter: performance improvements for big workspaces ([#3167](#3167)).
* update ([#3192](#3192)).
* Exporter: fix generation of cluster policy resources ([#3185](#3185)).
* Fix unit test ([#3201](#3201)).
* Suppress diff should apply to new fields added in the same chained call to CustomizableSchema ([#3200](#3200)).
* Various documentation updates ([#3198](#3198)).
* Use common.Resource consistently throughout the provider ([#3193](#3193)).
* Extending customizable schema with `AtLeastOneOf`, `ExactlyOneOf`, `RequiredWith` ([#3182](#3182)).
* Fix `databricks_connection` regression when creating without owner ([#3186](#3186)).
* add test code for job task order ([#3183](#3183)).
* Allow using empty strings as job parameters ([#3158](#3158)).
* Fix notebook parameters in acceptance test ([#3205](#3205)).
* Exporter: Add retries for `Search`, `ReadContext` and `Import` operations when importing the resource ([#3202](#3202)).
* Fixed updating owners for UC resources ([#3189](#3189)).
* Adds `databricks_volumes` as data source ([#3150](#3150)).
### Documentation Changes
### Exporter
### Internal Changes
* upd
* readable
* upd
* upd
Changes
Performance improvements for the exporter:
- Parallel generation of resources: for each identified resource we generate its body separately from other resources using `EXPORTER_RESOURCE_GENERATORS` (default is 50) goroutines. This helps with processing references to other resources. Generated bodies are sent to dedicated channels that are responsible for writing the code into files.
- `stateApproximation` is reimplemented to avoid linear search and to avoid locking the whole structure. We now keep a per-resource list of resource approximations together with a direct lookup structure that is used in the `Has` call.
- `importContext.Find` is optimized to perform a direct lookup for most of the match types. Only prefix match and some cases of case-insensitive match lead to iteration over approximations of the given type.
- … `EXPORTER_DEDICATED_RESOUSE_CHANNELS` environment variable. The rest is handled by the shared channel, so we get better resource utilization. The shared channel is handled by 15 goroutines by default, but this can be adjusted with the `EXPORTER_PARALLELISM_default` environment variable.
- … `-trace` command-line flag to enable trace logging level.
Tests
- `make test` run locally
- … `docs/` folder
- … `internal/acceptance`
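
The `stateApproximation` and `importContext.Find` items under Changes describe replacing a linear scan with direct lookups, and replacing one global lock with finer-grained locking. A rough sketch of that idea follows. All names here (`stateIndex`, `typeIndex`, `approximation`, `FindExact`, `FindPrefix`) are hypothetical and do not mirror the provider's types. The prefix semantics, where the stored attribute value must be a prefix of the searched value, is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// approximation is a hypothetical, simplified record of an emitted resource.
type approximation struct {
	Type  string
	ID    string
	Attrs map[string]string
}

// typeIndex holds everything known about one resource type. Each type has
// its own lock, so readers of one type do not wait on writers of another.
type typeIndex struct {
	mu    sync.RWMutex
	items []*approximation          // kept for scans (prefix matches)
	byID  map[string]*approximation // direct lookup used by Has
	byVal map[string]*approximation // "attr\x00value" key for exact matches
}

type stateIndex struct {
	mu    sync.RWMutex
	types map[string]*typeIndex
}

func newStateIndex() *stateIndex {
	return &stateIndex{types: map[string]*typeIndex{}}
}

// forType returns the index for a type, optionally creating it.
func (s *stateIndex) forType(t string, create bool) *typeIndex {
	s.mu.RLock()
	ti := s.types[t]
	s.mu.RUnlock()
	if ti != nil || !create {
		return ti
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if ti = s.types[t]; ti == nil { // re-check after taking the write lock
		ti = &typeIndex{byID: map[string]*approximation{}, byVal: map[string]*approximation{}}
		s.types[t] = ti
	}
	return ti
}

func (s *stateIndex) Add(a *approximation) {
	ti := s.forType(a.Type, true)
	ti.mu.Lock()
	defer ti.mu.Unlock()
	ti.items = append(ti.items, a)
	ti.byID[a.ID] = a
	for k, v := range a.Attrs {
		ti.byVal[k+"\x00"+v] = a
	}
}

// Has is a map lookup instead of a scan over every known resource.
func (s *stateIndex) Has(t, id string) bool {
	ti := s.forType(t, false)
	if ti == nil {
		return false
	}
	ti.mu.RLock()
	defer ti.mu.RUnlock()
	_, ok := ti.byID[id]
	return ok
}

// FindExact resolves an exact attribute match with a direct lookup.
func (s *stateIndex) FindExact(t, attr, value string) *approximation {
	ti := s.forType(t, false)
	if ti == nil {
		return nil
	}
	ti.mu.RLock()
	defer ti.mu.RUnlock()
	return ti.byVal[attr+"\x00"+value]
}

// FindPrefix still iterates, but only over resources of the given type.
func (s *stateIndex) FindPrefix(t, attr, value string) *approximation {
	ti := s.forType(t, false)
	if ti == nil {
		return nil
	}
	ti.mu.RLock()
	defer ti.mu.RUnlock()
	for _, a := range ti.items {
		if v, ok := a.Attrs[attr]; ok && v != "" && strings.HasPrefix(value, v) {
			return a
		}
	}
	return nil
}

func main() {
	idx := newStateIndex()
	idx.Add(&approximation{Type: "databricks_directory", ID: "/Shared/a",
		Attrs: map[string]string{"path": "/Shared/a"}})
	fmt.Println(idx.Has("databricks_directory", "/Shared/a"))
	fmt.Println(idx.FindPrefix("databricks_directory", "path", "/Shared/a/nb") != nil)
}
```

In this sketch only the prefix case pays for iteration, and it is bounded by the number of resources of one type, which matches the trade-off the description mentions.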