Fixes #16382: Improve performance of policy generation writer #2666

fanf · 2019-12-09T16:49:05Z

https://issues.rudder.io/issues/16382

This PR tries to make the abysmal performance of policy generation in 6.0 less abysmal.

Most of that PR was split into smaller pieces, but a big part remains: I changed most of the write logic, and that part can't really be done in smaller bits.

The big changes are:

port a bit more of GitTemplateReader & friends to ZIO to avoid switching in and out of ZIO;
change the way policies are written so that it is now done by template rather than by node, which avoids locking on them;
add A LOT of timing information, which is now pretty cool;
use atomic move when possible, but it doesn't yield much perf;
parallelise the move of policies to their final place;
use ArraySeq for string template values, which avoids switching between list and array, which costs a lot in ZIO.
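To picture the timing item in the list above, here is a minimal sketch of a per-phase accumulator. It is an assumption for illustration only: `PhaseTimer` and `TimingDemo` are hypothetical names, and it uses a plain `AtomicLong` rather than the ZIO-based timers of the PR.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical timer: accumulates the total time spent in one phase, in nanoseconds.
final class PhaseTimer {
  private val totalNanos = new AtomicLong(0L)

  // Run `body`, then add its elapsed time to the total, even if it throws.
  def timed[A](body: => A): A = {
    val t1 = System.nanoTime()
    try body
    finally totalNanos.addAndGet(System.nanoTime() - t1)
  }

  def nanos: Long = totalNanos.get()
}

object TimingDemo {
  // One timer per phase; the name mirrors a phase, not the real field.
  val writeTemplate = new PhaseTimer

  def run(): Int = {
    // Both calls accumulate into the same counter.
    val a = writeTemplate.timed { Thread.sleep(5); 1 }
    val b = writeTemplate.timed { Thread.sleep(5); 2 }
    a + b
  }
}
```

Because the counter is atomic, several threads can report into the same phase without extra locking.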

It can be merged.

--- for the record, what was done before ---

Here, we explore four paths to gain some performance:

1/ try to use OS atomic move operation in place of copy+delete for moving policies from rules.new to rules
2/ adapt ZIO blocking thread executor, giving it a bit more chance to reuse existing threads
3/ refactoring from Box to PureResult and IOResult. There is something extremely costly with toIO and toBox (and even more for layers of them).
4/ add a parallel call in prepareTechniqueTemplate

Tests and measurements were done on WriteSystemTechniquesTest, so the test case is rather specific and small (even for the 500-node case). The experiment should be replicated in a real load-testing environment before drawing any conclusions.

1/ That change does not seem to change anything, but my test case didn't exhibit the big problems seen elsewhere. It could be an easy win.

2/ Changing the threadpool ergonomics leads to ~7% better perf, but it may be due to the very specific patterns in the unit test.

3/ This change is the biggest in number of lines. The most dramatic improvement comes from the changes in STVariable for validation, which on their own yield a ~7% improvement. Validation is called a lot of times. Other changes lead to a 5-7% improvement.

4/ This last change leads to a ~30% improvement (but only on the prepareTechniqueTemplate phase).

All in all, we get a consistent 20-22% improvement compared to 6.0.0.

--- oops, I did a rebase, I wanted to just add a commit :/ ---

Some more changes, and we are now near what we had in 5.0 (still 25% worse for writing techniques), but on the other hand we are 50 times faster at moving promises to their final position.
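For path 1/ above, a minimal sketch of "atomic move when possible" using only `java.nio.file`. The object and method names are hypothetical, and this is not the code of the PR. The fallback only covers what a plain `Files.move` can do; a non-empty directory on another file store would still need a recursive copy.

```scala
import java.nio.file.{AtomicMoveNotSupportedException, Files, Path, StandardCopyOption}

object AtomicMoveSketch {
  // Try an atomic rename first; fall back to a regular move when the
  // file system cannot do it atomically (e.g. across file stores).
  // Returns true when the atomic path was taken.
  def move(src: Path, dest: Path): Boolean =
    try {
      Files.move(src, dest, StandardCopyOption.ATOMIC_MOVE)
      true
    } catch {
      case _: AtomicMoveNotSupportedException =>
        Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING)
        false
    }
}
```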

New changes:

use Promise for parsing templates (to put them in cache), which allows blocking less in the semaphore;
change the way we fill templates and use IOResult.effectNonBlocking to minimize time spent in the semaphore. This (especially avoiding thread creation with effectNonBlocking) leads to a 4x improvement on the most costly step;
remove parallelisation below the first level (i.e. all traversePar). It leads to more stable results overall.

It's also interesting to see that almost nothing changes when we add nodes to the test (until we reach the memory/GC limit, and then things grind to a halt).
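The "Promise for parsing" idea in the list above can be sketched as a parse-once cache. This is an assumption for illustration: `ParseOnceCache` is a hypothetical name, and it uses `scala.concurrent.Promise` from the standard library rather than ZIO's Promise as in the PR.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Hypothetical cache: each key is parsed at most once, other callers
// receive the same future instead of queuing on a lock.
final class ParseOnceCache[K, V](parse: K => V) {
  private val cache = new ConcurrentHashMap[K, Promise[V]]()

  def get(key: K): Future[V] = {
    val fresh    = Promise[V]()
    val existing = cache.putIfAbsent(key, fresh)
    if (existing == null) {
      // We won the race: parse here and publish the result (or the failure).
      fresh.complete(Try(parse(key)))
      fresh.future
    } else {
      // Someone else is parsing or has parsed: reuse their promise.
      existing.future
    }
  }
}
```

Only the cheap `putIfAbsent` is contended; the costly parse happens outside any shared lock.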

Results in images:

fanf · 2019-12-17T19:28:07Z

PR rebased

ncharles · 2019-12-17T21:07:51Z

...r-core/src/main/scala/com/normation/rudder/services/policies/write/PolicyWriterService.scala

+                                   for {
+                                     _       <- PolicyLoggerPure.trace(s"Loading template: ${templateId}")
+                                               // StringTemplate does not allow "." in path names, so we are forced to use one templateGroup per policy template (versions have "." in them)
+                                     content <- IOResult.effect(s"Error when copying technique template '${templateId.toString}'")(inputStream.asString(false))


Does switching from IOUtils to better.files have a noticeable impact? If so, we could do that in 5.0 too.

So it seems that performance is no better.

ncharles · 2019-12-17T21:21:20Z

...r-core/src/main/scala/com/normation/rudder/services/policies/write/PolicyWriterService.scala

+      val src = File(nodeFolder)
+      if (src.isDirectory()) {
+        val dest = File(backupFolder)
+        if (dest.isDirectory) {


Just a stupid question: if the folder is there, and we move, do we really need to delete first?

I think it was for the case when something bad happened in a previous generation. But now that we can be much more fine-grained in our error management, perhaps we could delete on error. I'm not sure it changes much in perf (to avoid the unneeded delete).

Delete is fairly costly when there are many files and already a lot of I/O.
I don't remember exactly, but it took between 2 and 6 minutes to delete the backup folders with rm for 10 000 nodes.
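A minimal sketch of the "delete on error" idea discussed above, assuming plain `java.nio.file` rather than the better.files `File` used in the diff. `BackupMoveSketch` and its methods are hypothetical names. It relies on `Files.move` without `REPLACE_EXISTING` throwing `FileAlreadyExistsException` when the target exists, so the costly recursive delete only runs when a leftover backup is actually there.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import scala.jdk.CollectionConverters._

object BackupMoveSketch {
  // Delete a directory tree. Files.walk lists parents before children,
  // so reversing the listing deletes children first.
  def deleteRecursively(root: Path): Unit = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.toList.reverse.foreach(p => Files.delete(p))
    finally stream.close()
  }

  // Move `src` to `backup`; delete a leftover backup only if the move
  // reports that one exists. Returns true when a delete was needed.
  def moveToBackup(src: Path, backup: Path): Boolean =
    try {
      Files.move(src, backup)
      false
    } catch {
      case _: FileAlreadyExistsException =>
        deleteRecursively(backup)
        Files.move(src, backup)
        true
    }
}
```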

fanf · 2019-12-17T23:48:07Z

PR updated with a new commit

ncharles · 2019-12-18T05:06:20Z

This fails:

[2019-12-18 04:57:13] ERROR policy.generation - Root exception was: /var/rudder/share/de0727bb-b594-4929-bebd-2ca258c47204/rules.new/test-rudder-policy-mv-options8612784980732448 -> /var/rudder/share/de0727bb-b594-4929-bebd-2ca258c47204/rules
[2019-12-18 04:57:13] INFO  policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed

I don't have more explanation than that.

ncharles · 2019-12-18T05:33:05Z

webapp/sources/rudder/rudder-core/src/main/scala/com/normation/rudder/hooks/RunHooks.scala

+   def getHooks(basePath: String, ignoreSuffixes: List[String]): Box[Hooks] = getHooksPure(basePath, ignoreSuffixes).toBox
+
+   def getHooksPure(basePath: String, ignoreSuffixes: List[String]): IOResult[Hooks] = {
+     IOResult.effect {


It seems this causes a huge drop in performance.

OK, I will need to test that part specifically. I think I spawn a thread for the non-blocking part (the blocking part already spawns a thread via NuProcess). I need to check carefully.

fanf · 2019-12-18T09:42:42Z

PR updated with a new commit

fanf · 2019-12-19T01:10:31Z

PR updated with a new commit

fanf · 2019-12-19T09:14:46Z

PR updated with a new commit

fanf · 2019-12-19T19:55:46Z

PR updated with a new commit

fanf · 2019-12-19T23:56:19Z

PR updated with a new commit

ncharles · 2020-01-28T09:56:53Z

...r-core/src/main/scala/com/normation/rudder/services/policies/write/PolicyWriterService.scala

+      t  <- pt.templatesToProcess
+    } yield {
+      (t.content, TemplateFillInfo(t.id, t.destination, p.paths.newFolder, pt.environmentVariables, pt.reportIdToReplace))
+    }).groupMap(_._1)(_._2)


This is really clever
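For readers unfamiliar with `groupMap` (Scala 2.13+), a minimal sketch of what the expression above does, with hypothetical, simplified stand-ins for the real types: pairs of (template content, fill info) are grouped by content, keeping only the fill info in each group.

```scala
object GroupByTemplateSketch {
  // Hypothetical, simplified stand-in for TemplateFillInfo.
  final case class FillInfo(nodeId: String, destination: String)

  // One (template content, where to write it) pair per node and template.
  val perNode: List[(String, FillInfo)] = List(
    ("tmplA", FillInfo("node1", "/n1/a")),
    ("tmplB", FillInfo("node1", "/n1/b")),
    ("tmplA", FillInfo("node2", "/n2/a"))
  )

  // Group by template content (first element), keep only the fill info (second element).
  val byTemplate: Map[String, List[FillInfo]] = perNode.groupMap(_._1)(_._2)
}
```

Each template then appears once as a key, with every node-specific destination to fill listed under it.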

ncharles · 2020-01-28T10:01:17Z

...r-core/src/main/scala/com/normation/rudder/services/policies/write/PolicyWriterService.scala

+                                           )
+                       t2               <- currentTimeNanos
+                       _                <- writeTimer.writeTemplate.update(_ + t2 - t1)
+                     } yield ()


I am a bit scared by the potential memory usage here, as we keep a lot of data in memory.
I'm going to accept it, because it seems OK on our test platform, but let's keep an eye on this.

ncharles · 2020-01-28T10:17:55Z

...es/rudder/rudder-templates/src/main/scala/com/normation/templates/FillTemplatesService.scala

+ */
+object FillTemplateThreadUnsafe {
+  ////////// Hottest method on whole Rudder //////////
+  def fill(templateName: String, sourceTemplate: StringTemplate, variables: Seq[STVariable], timer: FillTemplateTimer, replaceId: Option[(String, String)]): IOResult[(String, String)] = {


Why are there two fill methods in this file? How can we know which one to use?

Normation-Quality-Assistant · 2020-01-28T11:44:44Z

This PR is not mergeable to upper versions.
Since it is "Ready for merge" you must merge it by yourself using the following command:
rudder-dev merge https://github.com/Normation/rudder/pull/2666
-- Your faithful QA
Kant merge: "To be is to do."
(https://ci.normation.com/jenkins/job/merge-accepted-pr/19775/console)

fanf · 2020-01-28T14:06:34Z

OK, squash merging this PR

ncharles reviewed Dec 9, 2019

ncharles reviewed Dec 9, 2019

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from df0b5e2 to a81193f Compare December 17, 2019 19:28

ncharles reviewed Dec 17, 2019

ncharles reviewed Dec 17, 2019

ncharles reviewed Dec 17, 2019

ncharles reviewed Dec 17, 2019

ncharles reviewed Dec 18, 2019

ncharles reviewed Dec 18, 2019

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 17fdfe1 to 7b1e301 Compare December 19, 2019 16:05

fanf added the WIP Use that label for a Work In Progress PR that must not be merged yet label Jan 7, 2020

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch 2 times, most recently from ae2282c to 125afc6 Compare January 14, 2020 19:11

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch 2 times, most recently from 8d1a42a to 21a05f3 Compare January 27, 2020 16:23

fanf removed the WIP Use that label for a Work In Progress PR that must not be merged yet label Jan 27, 2020

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 9c8178d to 52b44bd Compare January 27, 2020 18:54

ncharles reviewed Jan 28, 2020

ncharles approved these changes Jan 28, 2020

Normation-Quality-Assistant added the qa: Can't merge label Jan 28, 2020

Fixes #16382: Improve performance of policy generation writer

5c4b3b2

fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 52b44bd to 5c4b3b2 Compare January 28, 2020 14:07

fanf merged commit 5c4b3b2 into Normation:branches/rudder/6.0 Jan 28, 2020

fanf deleted the bug_16382/improve_performance_of_policy_generation_writer branch March 15, 2024 10:20

This pull request was closed.
