[1.7.x] Indy promote perf#1205
Conversation
|
@ruhan1 I don't think it is a good idea to apply No.2 to current rules, because it is original way that these scripts have been used before the paralleled refactoring, and has been proved to have very bad performance. So I agree to add these metrics per rule to see what's pain points and then decide next step. |
|
@ligangt I guess it depends on how many threads and how many tasks. We have 200 threads. It is better for them to run hundreds of a bit bigger tasks than 60,000 very smaller ones. But I agree this is uncertain. To not mislead us, I will revert the groovy files, only keep the tool.forEach in case of need. |
ligangty
left a comment
There was a problem hiding this comment.
Ok. Looks good to me now.
jdcasey
left a comment
There was a problem hiding this comment.
This looks pretty good to me, but I'm not sure how much it will help performance. I think @ligangty is right, that we can't really go back to a fully serialized rule execution approach. At the same time, it does seem like perceived promotion performance has been damaged by the new parallelized approach. This seems to be because the serialized approach might allow smaller by-path promotions to pass larger by-path promotions, since they wouldn't share the same (over-saturated) thread pool.
I can see two other things that might make a big difference, though:
First, batching rule validations in parallelledEach() such that maybe every 100 tasks run in a conventional serialized forEach(), and the batches of 100 run in parallel. This might still allow parallelism without blocking smaller builds from promoting. It's a numbers game though, and at some point it will always saturate if we're using a thread pool...but on the other side, serialized loop performance can never be improved without improving the underlying operation. Since we're using things like HTTP HEAD and XML parsing there, that's a tough nut to crack.
Second, I think another major component to promotion performance is the way we lock the target repository / group. In the case of by-group promotion, that lock is very aggressive, encompassing the validation process as well as the membership change. We could produce a listing of the paths in the source repository that we intended to validate, then only lock those paths in the target group. It would allow multiple promotions targeting the same group to run, as long as they didn't collide on the paths.
In the case of by-path promotions, we literally have the full list of paths we're planning to promote at the point where we lock the whole target repository. If we locked the paths instead, at the point where we were ready to start submitting copy jobs for execution, then we could allow two builds to promote in parallel as long as they didn't collide on paths.
In the longer term, it might also be possible to initialize the hosted repository with some annotation that signals an intention to promote it into some target group or repository. Each file that got stored in this source hosted repository could be validated async as it was uploaded...though the lack of locking on those paths for the target could end up invalidating a lot of those results, producing things like race conditions, etc.
* Performance fixes for promotion * Throw timeout exception in runParallelAndWait * Revert default promotion groovy to use paralleledEach
|
I add a new PR for the batching. Not sure about the locking. we need jira to track. I see you have commented in the NOS-1685. |
* Remove exception stacktrace for remote url Malformed exception I really don't think this exception stack trace is needed as all useful info has been logged, And it occupied bunch of the log space and polluted the log reading. So removed it. * Use same package type for promotion tmp remote repo * Add separator between name and timestamp in promotion tmp remote repo name * Fix issue for temporary remote not deleted after promotion validation * Add system level gauges for metrics * Avoid any exception break metrics reporting in SystemGaugeSet * Add path promotion files count metrics * Fix license headers * Align SNAPSHOT versions * Disable docker for release for now * [maven-release-plugin] prepare release indy-parent-1.7.4 * [maven-release-plugin] prepare for next development iteration * Remove koji ftests dep and align SNAPSHOT * Update weft to 1.14 * [maven-release-plugin] prepare release indy-parent-1.7.5 * [maven-release-plugin] prepare for next development iteration * Add metrics to monitor store disk usage * Use MetricSetProvider for SystemGaugeSet to enable CDI * Use storage directory directly for monitoring * Use storage path in indy configuration instead for storage metrics * Fix test failure * Support jdbc store for folo-sealed ISPN cache * Fix TrackingKey mapper for WrappedByteArray type Add WrappedByteArray type check in TrackingKey2StringMapper which refers ISPN DefaultTwoWayKey2StringMapper implementation. And also adjust some ISPN config for default settings. * Fix way of WrappedByteArray for TrackingKey by using ObjInputStream * Use externalize but not direct obj stream of TrackingKey2StringMapper * Extract abstract class for key2StringMapper with WrappedArrayByte handling * Allow workflow like promotion to bypass readonly flag * Add configurable envars under mdc logging (#1183) (#1194) * Add configurable envars under mdc logging * ADD static SYSTEM_ OPENSHIFT ENV * EnvironmentConfig refactor, make it focusing on variable keys extracting, add MDC entry by JsonLayout extender * fixing serialization / deserialization in tracking record entries (#1195) * Catch all exceptions in promotion executeValidationRule (#1201) * Fix typo of Storertifact.copyBase() * [1.7.x] Refactor CustomJsonLayout, and test it locally (#1200) * Refactor CustomJsonLayout to use default logback layout configuration for envars This will skip the period when CDI isn't started, when it's still trying to log things. It also skips the need for CDI.current() to lookup the environment configuration, putting the envar extraction config directly in logback.xml Conflicts: deployments/launcher/src/main/etc/indy/logging/logback.xml * Remove old CustomJsonLayout class * remove unused constant * Be more careful about empty strings when pulling envars * adjust example kafka config to include SSL params * Generate metadata for a group containing Maven plugins artifacts (#1207) * [1.7.x] Indy promote perf (#1205) * Performance fixes for promotion * Throw timeout exception in runParallelAndWait * Revert default promotion groovy to use paralleledEach * Add skipVersionTest flag when doing Koji path mask repair (#1210) * [1.7.x] Support paralleledInBatch in promotion tool (#1209) * Support paralleledInBatch in promotion tool * Move batchsize to config and fix commented issues * Update license header * [maven-release-plugin] prepare release indy-parent-1.7.6 * [maven-release-plugin] prepare for next development iteration * put tracking id into the MDC (#1214) * [1.7.x] Fix baseUrl (#1217) * Add MDC entries to mark progress in content promotion (#1216) * Add MDC entries to mark progress in content promotion * fix iteration depth MDC injection clear validation tool iteration MDC items,and give a warning when iteration depth is > 0 * [1.7.x] Promotion error consistency and improved by-path lock scoping (#1219) * Make error reporting consistent for promotion results Set a top-level error if validation fails, and add a succeeded() method too. Also, pull the error/validation handling up into the abstract class, and set a constructor so we can call super() * lock the target repo for paths promotion such that validation is included in scope * fix two tests broken by error/succeeded changes in promotion result classes * stop using ThreadContext for parallel iteration depth counting (#1221) * Remove nested paralleledEach from default promotion groovy (#1222) * Fix byPath promotion gauges (#1225) * Add MDC entries to mark progress in content promotion (#1216) * Add MDC entries to mark progress in content promotion * fix iteration depth MDC injection clear validation tool iteration MDC items,and give a warning when iteration depth is > 0 * [1.7.x] Promotion error consistency and improved by-path lock scoping (#1219) * Make error reporting consistent for promotion results Set a top-level error if validation fails, and add a succeeded() method too. Also, pull the error/validation handling up into the abstract class, and set a constructor so we can call super() * lock the target repo for paths promotion such that validation is included in scope * fix two tests broken by error/succeeded changes in promotion result classes * stop using ThreadContext for parallel iteration depth counting (#1221) * Fix license headers * Use storage path in indy configuration instead for storage metrics * Update license header * Remove exception stacktrace for remote url Malformed exception I really don't think this exception stack trace is needed as all useful info has been logged, And it occupied bunch of the log space and polluted the log reading. So removed it. * Fix test failure * Update license header * Allow workflow like promotion to bypass readonly flag
* Remove exception stacktrace for remote url Malformed exception I really don't think this exception stack trace is needed as all useful info has been logged, And it occupied bunch of the log space and polluted the log reading. So removed it. * Use same package type for promotion tmp remote repo * Add separator between name and timestamp in promotion tmp remote repo name * Fix issue for temporary remote not deleted after promotion validation * Add system level gauges for metrics * Avoid any exception break metrics reporting in SystemGaugeSet * Add path promotion files count metrics * Fix license headers * Align SNAPSHOT versions * Disable docker for release for now * [maven-release-plugin] prepare release indy-parent-1.7.4 * [maven-release-plugin] prepare for next development iteration * Remove koji ftests dep and align SNAPSHOT * Update weft to 1.14 * [maven-release-plugin] prepare release indy-parent-1.7.5 * [maven-release-plugin] prepare for next development iteration * Add metrics to monitor store disk usage * Use MetricSetProvider for SystemGaugeSet to enable CDI * Use storage directory directly for monitoring * Use storage path in indy configuration instead for storage metrics * Fix test failure * Support jdbc store for folo-sealed ISPN cache * Fix TrackingKey mapper for WrappedByteArray type Add WrappedByteArray type check in TrackingKey2StringMapper which refers ISPN DefaultTwoWayKey2StringMapper implementation. And also adjust some ISPN config for default settings. * Fix way of WrappedByteArray for TrackingKey by using ObjInputStream * Use externalize but not direct obj stream of TrackingKey2StringMapper * Extract abstract class for key2StringMapper with WrappedArrayByte handling * Allow workflow like promotion to bypass readonly flag * Add configurable envars under mdc logging (Commonjava#1183) (Commonjava#1194) * Add configurable envars under mdc logging * ADD static SYSTEM_ OPENSHIFT ENV * EnvironmentConfig refactor, make it focusing on variable keys extracting, add MDC entry by JsonLayout extender * fixing serialization / deserialization in tracking record entries (Commonjava#1195) * Catch all exceptions in promotion executeValidationRule (Commonjava#1201) * Fix typo of Storertifact.copyBase() * [1.7.x] Refactor CustomJsonLayout, and test it locally (Commonjava#1200) * Refactor CustomJsonLayout to use default logback layout configuration for envars This will skip the period when CDI isn't started, when it's still trying to log things. It also skips the need for CDI.current() to lookup the environment configuration, putting the envar extraction config directly in logback.xml Conflicts: deployments/launcher/src/main/etc/indy/logging/logback.xml * Remove old CustomJsonLayout class * remove unused constant * Be more careful about empty strings when pulling envars * adjust example kafka config to include SSL params * Generate metadata for a group containing Maven plugins artifacts (Commonjava#1207) * [1.7.x] Indy promote perf (Commonjava#1205) * Performance fixes for promotion * Throw timeout exception in runParallelAndWait * Revert default promotion groovy to use paralleledEach * Add skipVersionTest flag when doing Koji path mask repair (Commonjava#1210) * [1.7.x] Support paralleledInBatch in promotion tool (Commonjava#1209) * Support paralleledInBatch in promotion tool * Move batchsize to config and fix commented issues * Update license header * [maven-release-plugin] prepare release indy-parent-1.7.6 * [maven-release-plugin] prepare for next development iteration * put tracking id into the MDC (Commonjava#1214) * [1.7.x] Fix baseUrl (Commonjava#1217) * Add MDC entries to mark progress in content promotion (Commonjava#1216) * Add MDC entries to mark progress in content promotion * fix iteration depth MDC injection clear validation tool iteration MDC items,and give a warning when iteration depth is > 0 * [1.7.x] Promotion error consistency and improved by-path lock scoping (Commonjava#1219) * Make error reporting consistent for promotion results Set a top-level error if validation fails, and add a succeeded() method too. Also, pull the error/validation handling up into the abstract class, and set a constructor so we can call super() * lock the target repo for paths promotion such that validation is included in scope * fix two tests broken by error/succeeded changes in promotion result classes * stop using ThreadContext for parallel iteration depth counting (Commonjava#1221) * Remove nested paralleledEach from default promotion groovy (Commonjava#1222) * Fix byPath promotion gauges (Commonjava#1225) * Add MDC entries to mark progress in content promotion (Commonjava#1216) * Add MDC entries to mark progress in content promotion * fix iteration depth MDC injection clear validation tool iteration MDC items,and give a warning when iteration depth is > 0 * [1.7.x] Promotion error consistency and improved by-path lock scoping (Commonjava#1219) * Make error reporting consistent for promotion results Set a top-level error if validation fails, and add a succeeded() method too. Also, pull the error/validation handling up into the abstract class, and set a constructor so we can call super() * lock the target repo for paths promotion such that validation is included in scope * fix two tests broken by error/succeeded changes in promotion result classes * stop using ThreadContext for parallel iteration depth counting (Commonjava#1221) * Fix license headers * Use storage path in indy configuration instead for storage metrics * Update license header * Remove exception stacktrace for remote url Malformed exception I really don't think this exception stack trace is needed as all useful info has been logged, And it occupied bunch of the log space and polluted the log reading. So removed it. * Fix test failure * Update license header * Allow workflow like promotion to bypass readonly flag
This is for NOSSUP-45. Basically, the promotion ran very slowly. I looked at the grafana for Apr 4 and 5, and found (for some period):
I add a few changes in this PR:
I am not sure how the #2 works. Maybe we just add more threads to the pool? We may stick to the current prod groovy scripts and see what is going on by #4 metric. Once we identify the slowest rule, we can try the #2.