
Fixes #15675: Leak in Cache of Node Compliance and NodeInfo and perfs improvement #2648

Conversation

@ncharles (Member) commented Dec 3, 2019

@ncharles (Member Author) commented Dec 3, 2019

Do not merge for now; this still needs testing.


writeTime = System.currentTimeMillis
writtenNodeConfigs <- writeNodeConfigurations(rootNodeId, updatedNodeConfigIds, nodeConfigs, allLicenses, globalPolicyMode, generationTime, parallelism) ?~!"Cannot write nodes configuration"
// we don't need nodeConfigs, but rather the nodeConfigs for updated nodes, and all nodesInfos
_ <- writeNodeConfigurations(rootNodeId, updatedNodeConfigIds, updatedNodeConfigs, allLicenses, globalPolicyMode, generationTime, parallelism) ?~!"Cannot write nodes configuration"
@ncharles (Member Author) commented on this diff:

This change has been overzealous; we still need allNodeConfigInfo here.
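For reference, a minimal sketch of the intent of this diff, with simplified placeholder types (NodeId, NodeConfiguration, and the selectUpdatedConfigs helper are illustrative, not the real Rudder code): pass only the configurations of updated nodes to the writer, while the full map stays available for lookups such as allNodeConfigInfo.

object UpdatedConfigsSketch {
  // Simplified placeholder types; the real ones live in the Rudder codebase.
  final case class NodeId(value: String)
  final case class NodeConfiguration(nodeId: NodeId, policies: List[String])

  // Keep the full nodeConfigs map for lookups, but only hand the
  // configurations of updated nodes to writeNodeConfigurations.
  def selectUpdatedConfigs(
      nodeConfigs          : Map[NodeId, NodeConfiguration]
    , updatedNodeConfigIds : Set[NodeId]
  ): Map[NodeId, NodeConfiguration] = {
    // strict filter: builds a new, smaller map rather than a lazy view,
    // so the entries of non-updated nodes are not retained through it
    nodeConfigs.filter { case (id, _) => updatedNodeConfigIds.contains(id) }
  }
}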

@ncharles (Member Author) commented Dec 3, 2019

It seems this change let me go from 12 GB to 10 GB of heap size.

_ = PolicyLogger.debug(s"Reports computed in ${timeSetExpectedReport} ms")

_ = PolicyLogger.debug(s"Size of expectedReports is ${memoryMeter.measure(expectedReports)} ")
_ = PolicyLogger.debug(s"Size deepof expectedReports is ${memoryMeter.measureDeep(expectedReports)} ")
@ncharles (Member Author) commented on this diff:

[2019-12-15 21:15:44] DEBUG policy.generation - Size of allNodeInfos is 24
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof allNodeInfos is 26394576
[2019-12-15 21:15:46] DEBUG policy.generation - Size of allNodeModes is 24
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof allNodeModes is 1132752
[2019-12-15 21:15:46] DEBUG policy.generation - Size of nodeConfigCaches is 16
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof nodeConfigCaches is 16
[2019-12-15 21:15:56] DEBUG policy.generation - Size of ruleVals is 24
[2019-12-15 21:32:14] DEBUG policy.generation - Size deepof ruleVals is 1978959240
[2019-12-15 21:32:33] DEBUG policy.generation - Size of nodeContexts is 24
[2019-12-15 21:32:38] DEBUG policy.generation - Size deepof nodeContexts is 304977352
[2019-12-15 21:35:28] DEBUG policy.generation - Size of nodeConfigs is 24
[2019-12-15 21:56:58] DEBUG policy.generation - Size deepof nodeConfigs is 3474356896
[2019-12-15 21:57:13] DEBUG policy.generation - Size of updatedNodeConfigIds is 24
[2019-12-15 21:57:13] DEBUG policy.generation - Size deepof updatedNodeConfigIds is 1423528
[2019-12-15 21:57:13] DEBUG policy.generation - Size of updatedNodeConfigs is 24
[2019-12-15 22:18:08] DEBUG policy.generation - Size deepof updatedNodeConfigs is 3504843864
[2019-12-15 22:18:08] DEBUG policy.generation - Size of updatedNodeInfo is 24
[2019-12-15 22:18:09] DEBUG policy.generation - Size deepof updatedNodeInfo is 26395416
[2019-12-15 22:18:09] DEBUG policy.generation - Size of updatedNodesId is 24
[2019-12-15 22:18:09] DEBUG policy.generation - Size deepof updatedNodesId is 804984
[2019-12-15 22:28:57] DEBUG policy.generation - Size of expectedReports is 24
[2019-12-15 22:34:29] DEBUG policy.generation - Size deepof expectedReports is 241639688

@ncharles (Member Author):

The largest objects are:
updatedNodeConfigs = 3 504 843 864
nodeConfigs = 3 474 356 896
ruleVals = 1 978 959 240
nodeContexts = 304 977 352
expectedReports = 241 639 688

ping @fanf

@fanf (Member):

Is that in bytes or bits?

@ncharles (Member Author):

It is in bytes.

@ncharles (Member Author):

Out of the 2 668 075 048 bytes within nodeConfigs, 2 582 780 816 are for the policies.

@@ -357,7 +357,7 @@ final case class Policy(
// == .toList (keep order) ==> List[List[Variable]]
// == flatten (keep order) ==> List[Variable]
val expandedVars = Policy.mergeVars(policyVars.map( _.expandedVars.values).toList.flatten)
val originalVars = Policy.mergeVars(policyVars.map( _.originalVars.values).toList.flatten)
//val originalVars = Policy.mergeVars(policyVars.map( _.originalVars.values).toList.flatten)
@ncharles (Member Author) commented on this diff:

This one seems to have a lot of impact:
Size deepof nodeConfigs is 2 668 075 072
Size deepof ruleVals is 376 459 952

vs
Size deepof nodeConfigs is 3 474 356 896
Size deepof ruleVals is 748 776 080

But we need to validate, as there is a lot of variability in ruleVals (from 748 487 288 to 2 123 635 544).

@ncharles (Member Author):

nodeConfigs is stable at about 2 668 075 048 bytes with this change, vs 3 474 356 896 without it.

nodeContexts = nodeConfigs.map { case (_, nodeconfig) => nodeconfig.nodeContext}

_ = PolicyLogger.debug(s"Size of nodeConfigs.nodeContext is ${memoryMeter.measure(nodeContexts)}")
_ = PolicyLogger.debug(s"Size deepof nodeConfigs is ${memoryMeter.measureDeep(nodeContexts)} ")
@ncharles (Member Author) commented on this diff:

This is only 78 016 840 bytes.

@@ -357,7 +357,7 @@ final case class Policy(
// == .toList (keep order) ==> List[List[Variable]]
// == flatten (keep order) ==> List[Variable]
val expandedVars = Policy.mergeVars(policyVars.map( _.expandedVars.values).toList.flatten)
@ncharles (Member Author) commented on this diff:

I'm really tempted to make a def out of this one, as it is used in only two different places.
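To illustrate the trade-off (a hedged sketch with simplified types, not the actual Policy class): a val keeps the merged list alive for the whole lifetime of each Policy instance, while a def recomputes it on each call and lets the result be garbage collected between the two call sites.

final case class Variable(name: String, values: List[String])

final case class PolicySketch(policyVars: List[Map[String, Variable]]) {
  // val: computed once at construction and retained for the lifetime of the
  // instance, so the merged list counts towards every node configuration's size
  val expandedVarsVal: List[Variable] = policyVars.flatMap(_.values)

  // def: recomputed on each call (at most twice, per the comment above),
  // nothing extra retained on the Policy instance itself
  def expandedVarsDef: List[Variable] = policyVars.flatMap(_.values)
}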

@ncharles (Member Author) commented Jan 1, 2020

Filling the templates is the most expensive part of writing the policies, by a factor of 3. We could certainly do better, but I can't really see how:
[2020-01-01 14:54:19] DEBUG policy.generation - Templates filled in 1130297 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Templates replaced in 51243 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Templates wrote in 407855 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Promises written in 329058 ms

(It adds up to more than "Promises written" because the three steps before are multi-threaded.)
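For the record, 1 130 297 + 51 243 + 407 855 ≈ 1 589 395 ms of cumulative time for the three template steps, against 329 058 ms of wall-clock time for "Promises written", i.e. roughly 5 worker threads busy on average (assuming the per-step figures are summed over threads).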

@@ -76,7 +76,7 @@ final class RuleStatusReport private (
, val overrides : List[OverridenPolicy]
) extends StatusReport {
val compliance = report.compliance
val byNodes: Map[NodeId, AggregatedStatusReport] = report.reports.groupBy(_.nodeId).mapValues(AggregatedStatusReport(_))
val byNodes: Map[NodeId, AggregatedStatusReport] = report.reports.groupBy(_.nodeId).map{case (a, value) => (a, AggregatedStatusReport(value))}
@ncharles (Member Author) commented on this diff:

It looks like this file should not have been changed, as the cost for the homepage skyrockets.
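A plausible explanation (an assumption, not verified in this PR): on Scala 2.12, Map.mapValues returns a lazy wrapper that re-applies the function on every access, whereas .map builds a strict map eagerly for all keys. A small sketch of the difference, with aggregate standing in for AggregatedStatusReport(_):

object MapValuesSketch {
  val reports: Map[String, List[Int]] = Map("n1" -> List(1, 2), "n2" -> List(3))

  def aggregate(rs: List[Int]): Int = {
    println(s"aggregating $rs") // visible side effect to show when work happens
    rs.sum
  }

  // Lazy wrapper (Scala 2.12): aggregate() runs again on every lookup,
  // but never for keys that are not accessed.
  val lazyByNodes = reports.mapValues(aggregate)

  // Strict map: aggregate() runs exactly once per key, right here, for all keys,
  // even those the caller (e.g. the homepage) never asks for.
  val strictByNodes = reports.map { case (k, v) => (k, aggregate(v)) }
}

If AggregatedStatusReport(_) is expensive and byNodes is not always needed, the strict version pays the full cost up front at construction, which could explain the homepage regression observed here.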

@@ -1226,13 +1226,13 @@ object ExecutionBatch extends Loggable {
*/
ComponentStatusReport(
expectedComponent.componentName
, ComponentValueStatusReport.merge(unexpectedReportStatus ::: pairedReportStatus).mapValues { status =>
, ComponentValueStatusReport.merge(unexpectedReportStatus ::: pairedReportStatus).map { case (key, status) =>
@ncharles (Member Author) commented on this diff:

We need to inspect the performance cost here.

val result = reports.mapValues { status =>
  NodeStatusReport.filterByRules(status, ruleIds)
}
val result = reports.map { case (nodeId, status) =>
  (nodeId, NodeStatusReport.filterByRules(status, ruleIds))
}
@ncharles (Member Author) commented on this diff:

We need to inspect the cost here as well.

@ncharles (Member Author) commented Jan 7, 2020

It needs much more work to determine whether the mapValues change was worthwhile or not. I will revert this commit and focus on measuring the impact of the extra collection created to keep only the node configurations of changed nodes, which may (or may not) double the memory requirement in the worst case. Then we'll call it a PR.

@fanf added the WIP label (Work In Progress PR that must not be merged yet) on Jan 7, 2020
@ncharles (Member Author) commented Jan 9, 2020

This looks good, so I won't change anything more in this PR. I'll cherry-pick parts into new, more unitary PRs.

@ncharles (Member Author) commented Jan 9, 2020

Note: spikes in memory usage seem to come from

scheduledJob - Writting non-compliant-report logs beetween ids 9136561090 and 9136648677 (both incuded)

and from compliance computation.

@ncharles (Member Author):
Sharp spikes in memory usage happen:

  • in the middle of building node target configurations (1)
  • in the middle of path computation and template reading (1)
  • during the writing of policies (many)
  • during caching in LDAP (1)
  • during the save of expected reports (1)

I'm a bit unsure whether it is caused by policy generation itself, or by the pressure from compliance computation on top of policy generation.

@ncharles (Member Author) commented Mar 2, 2020

Closing, as everything has been backported in more sensible PRs.

@ncharles closed this on Mar 2, 2020.
This pull request was closed.
Labels: qa: Can't merge, WIP