Fixes #16382: Improve performance of policy generation writer #2666

@fanf fanf commented Dec 9, 2019

https://issues.rudder.io/issues/16382

This PR tries to make the abysmal performance of policy generation in 6.0 less abysmal.

Most of this PR was split into smaller ones, but a big part remains: I changed most of the write logic, and that part can't really be done in smaller pieces.

The big changes are:

  • port a bit more of GitTemplateReader & friends to ZIO, to avoid switching in and out of ZIO;
  • change the way policies are processed so that it is now done by template rather than by node, which avoids locking on them;
  • add A LOT of timing information, which is now pretty cool;
  • use atomic move when possible, although it doesn't yield much of a performance gain;
  • parallelise the move of policies to their final place;
  • use ArraySeq for string template values, which avoids switching between list and array, something that costs a lot in ZIO.
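
The "by template rather than by node" change can be pictured with a minimal sketch, assuming heavily simplified stand-ins for the real types. `Template`, `NodePolicy`, `FillInfo` and `GroupByTemplate` are hypothetical names used only for illustration, not the PR's actual classes; only the `groupMap` call (Scala 2.13) mirrors what the diff does.

```scala
// Hypothetical, simplified stand-ins for the PR's types: only the shape matters here.
final case class Template(id: String, content: String)
final case class NodePolicy(nodeId: String, templates: List[Template])
final case class FillInfo(nodeId: String, templateId: String)

object GroupByTemplate {
  // Flatten all (node, template) pairs, then group the per-node fill requests
  // under the template content they share, so that each template can be
  // handled once with the list of nodes that need it.
  def byTemplate(policies: List[NodePolicy]): Map[String, List[FillInfo]] =
    (for {
      p <- policies
      t <- p.templates
    } yield (t.content, FillInfo(p.nodeId, t.id))).groupMap(_._1)(_._2)
}
```

With this shape, the work loop iterates over templates and fills each one for every node that uses it, instead of iterating over nodes and contending for the same template.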

It can be merged.

--- for the record, what was done before ---

Here, we explore four paths to gain some performance:

  • 1/ try to use the OS atomic move operation instead of copy+delete for moving policies from rules.new to rules
  • 2/ adapt the ZIO blocking thread executor, giving it a better chance to reuse existing threads
  • 3/ refactor from Box to PureResult and IOResult. There is something extremely costly with toIO and toBox (and even more so with layers of them).
  • 4/ add a parallel call in prepareTechniqueTemplate

Tests and measurements were done on WriteSystemTechniquesTest, so the test case is rather specific and small (even for the 500-node case). The experiment should therefore be replicated in a real load-testing environment before drawing any conclusions.

1/ That change does not seem to change anything, but my test case didn't exhibit the big problems seen elsewhere. It could be an easy win.
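
For readers unfamiliar with path 1/, here is a minimal sketch of an atomic move with a fallback, using only `java.nio.file`. The object and method names (`MovePolicies`, `moveAtomicallyOrFallback`) are hypothetical, and this is not the code from the PR, just an illustration of the idea.

```scala
import java.nio.file.{AtomicMoveNotSupportedException, Files, Path, StandardCopyOption}

object MovePolicies {
  // Try a single atomic rename first; fall back to a plain move when the
  // file system cannot perform it atomically (for example across mount points).
  // Returns true if the atomic path was taken, false if the fallback was used.
  def moveAtomicallyOrFallback(src: Path, dest: Path): Boolean =
    try {
      Files.move(src, dest, StandardCopyOption.ATOMIC_MOVE)
      true
    } catch {
      case _: AtomicMoveNotSupportedException =>
        Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING)
        false
    }
}
```

Two caveats from the `Files.move` contract: with `ATOMIC_MOVE`, whether an existing target is replaced or the call fails is implementation specific, and a non-empty directory cannot be moved across file stores with `Files.move`, so a real fallback for that case would need a recursive copy followed by a delete.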

2/ Changing the thread pool ergonomics leads to ~7% better performance, but that may be due to the very specific patterns in the unit test.

3/ This change is the biggest in number of lines. The most dramatic improvement comes from the changes to validation in STVariable, which alone account for a ~7% improvement. Validation is called many times. The other changes lead to a 5-7% improvement.

4/ This last change leads to a ~30% improvement (but only on the prepareTechniqueTemplate phase).

All in all, we get a consistent 20-22% improvement compared to 6.0.0.

--- oops, I did a rebase, I just wanted to add a commit :/ ----

Some more changes, and we are now near what we had in 5.0 (still 25% worse for writing techniques), but on the other hand we are 50 times faster at moving promises to their final position.

New changes:

  • use Promise for parsing templates (to put them in cache), which allows blocking less in the semaphore,
  • change the way we fill templates and use IOResult.effectNonBlocking to minimize time in the semaphore. This (especially avoiding thread creation with effectNonBlocking) leads to a 4x improvement on the most costly step.
  • remove parallelisation below the first level (i.e. all traversePar). It leads to more stable results overall.

It's also interesting to see that almost nothing changes when we add nodes to the test (until we reach the memory/GC limit, and then things grind to a halt).

Results in images:

[image: 2019-12-17_21 23 33-Untitled_spreadsheet_-_Google_Sheets]

[image: 2019-12-17_20 25 58-Untitled_spreadsheet_-_Google_Sheets]

fanf commented Dec 17, 2019

PR rebased

@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from df0b5e2 to a81193f Compare December 17, 2019 19:28
for {
  _ <- PolicyLoggerPure.trace(s"Loading template: ${templateId}")
  // string template does not allow "." in path names, so we are forced to use one templateGroup per policy template (versions have "." in them)
  content <- IOResult.effect(s"Error when copying technique template '${templateId.toString}'")(inputStream.asString(false))
Member

Does switching from IOUtils to better.files have a noticeable impact? If so, we could do that in 5.0 as well.

Member Author

So it seems that performance is no better.

val src = File(nodeFolder)
if (src.isDirectory()) {
  val dest = File(backupFolder)
  if (dest.isDirectory) {
Member

Just a stupid question: if the folder is there and we move, do we really need to delete first?

Member Author

I think it was for the case when something bad happened in a previous generation. But now that we can be much more fine-grained in our error management, perhaps we could delete on error. I'm not sure it changes much in perf (to avoid the unused delete).

Member

Delete is fairly costly when there are many files and already a lot of I/O.
I don't remember exactly, but it took between 2 and 6 minutes to delete the backup folders with rm for 10 000 nodes.

fanf commented Dec 17, 2019

PR updated with a new commit

@ncharles
Member

This fails:

[2019-12-18 04:57:13] ERROR policy.generation - Root exception was: /var/rudder/share/de0727bb-b594-4929-bebd-2ca258c47204/rules.new/test-rudder-policy-mv-options8612784980732448 -> /var/rudder/share/de0727bb-b594-4929-bebd-2ca258c47204/rules
[2019-12-18 04:57:13] INFO  policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed

I don't have more explanation than that.

def getHooks(basePath: String, ignoreSuffixes: List[String]): Box[Hooks] = getHooksPure(basePath, ignoreSuffixes).toBox

def getHooksPure(basePath: String, ignoreSuffixes: List[String]): IOResult[Hooks] = {
IOResult.effect {
Member

It seems this causes a huge drop in performance.

Member Author

OK, I will need to test that part specifically. I think I spawn a thread for the non-blocking part (the blocking part already spawns a thread via NuProcess). I need to check carefully.

fanf commented Dec 18, 2019

PR updated with a new commit

fanf commented Dec 19, 2019

PR updated with a new commit

fanf commented Dec 19, 2019

PR updated with a new commit

@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 17fdfe1 to 7b1e301 Compare December 19, 2019 16:05

fanf commented Dec 19, 2019

PR updated with a new commit

fanf commented Dec 19, 2019

PR updated with a new commit

@fanf fanf added the WIP label Jan 7, 2020
@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch 2 times, most recently from ae2282c to 125afc6 Compare January 14, 2020 19:11
@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch 2 times, most recently from 8d1a42a to 21a05f3 Compare January 27, 2020 16:23
@fanf fanf removed the WIP label Jan 27, 2020
@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 9c8178d to 52b44bd Compare January 27, 2020 18:54
t <- pt.templatesToProcess
} yield {
(t.content, TemplateFillInfo(t.id, t.destination, p.paths.newFolder, pt.environmentVariables, pt.reportIdToReplace))
}).groupMap(_._1)(_._2)
Member

This is really clever

)
t2 <- currentTimeNanos
_ <- writeTimer.writeTemplate.update(_ + t2 - t1)
} yield ()
Member

I am a bit scared by the potential memory usage here, as we keep a lot of data in memory.
I'm going to accept it because it seems OK on our test platform, but let's keep an eye on this.

*/
object FillTemplateThreadUnsafe {
////////// Hottest method on whole Rudder //////////
def fill(templateName: String, sourceTemplate: StringTemplate, variables: Seq[STVariable], timer: FillTemplateTimer, replaceId: Option[(String, String)]): IOResult[(String, String)] = {
Member

Why are there two fill methods in this file? How can we know which one to use?

@Normation-Quality-Assistant
Contributor

This PR is not mergeable to upper versions.
Since it is "Ready for merge" you must merge it by yourself using the following command:
rudder-dev merge https://github.com/Normation/rudder/pull/2666
-- Your faithful QA
Kant merge: "To be is to do."
(https://ci.normation.com/jenkins/job/merge-accepted-pr/19775/console)

fanf commented Jan 28, 2020

OK, squash merging this PR

@fanf fanf force-pushed the bug_16382/improve_performance_of_policy_generation_writer branch from 52b44bd to 5c4b3b2 Compare January 28, 2020 14:07
@fanf fanf merged commit 5c4b3b2 into Normation:branches/rudder/6.0 Jan 28, 2020
@fanf fanf deleted the bug_16382/improve_performance_of_policy_generation_writer branch March 15, 2024 10:20
This pull request was closed.