
Fixes #15675: Leak in Cache of Node Compliance and NodeInfo and perfs improvement #2648

Conversation

@ncharles (Member) commented Dec 3, 2019

@ncharles (Member Author) commented Dec 3, 2019

Do not merge for now; this still needs testing.


writeTime = System.currentTimeMillis
writtenNodeConfigs <- writeNodeConfigurations(rootNodeId, updatedNodeConfigIds, nodeConfigs, allLicenses, globalPolicyMode, generationTime, parallelism) ?~!"Cannot write nodes configuration"
// we don't need nodeConfigs, but rather the nodeConfigs for updated nodes, and all nodesInfos
_ <- writeNodeConfigurations(rootNodeId, updatedNodeConfigIds, updatedNodeConfigs, allLicenses, globalPolicyMode, generationTime, parallelism) ?~!"Cannot write nodes configuration"
@ncharles (Member Author) commented on this diff:

This change has been overzealous; we still need allNodeConfigInfo here.
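For reference, a minimal sketch of the intent of this diff, with simplified placeholder types (NodeId, NodeConfiguration, and the selectUpdatedConfigs helper are illustrative, not the real Rudder code): pass only the configurations of updated nodes to the writer, while the full map stays available for lookups such as allNodeConfigInfo.

object UpdatedConfigsSketch {
  // Simplified placeholder types; the real ones live in the Rudder codebase.
  final case class NodeId(value: String)
  final case class NodeConfiguration(nodeId: NodeId, policies: List[String])

  // Keep the full nodeConfigs map for lookups, but only hand the
  // configurations of updated nodes to writeNodeConfigurations.
  def selectUpdatedConfigs(
      nodeConfigs          : Map[NodeId, NodeConfiguration]
    , updatedNodeConfigIds : Set[NodeId]
  ): Map[NodeId, NodeConfiguration] = {
    // strict filter: builds a new, smaller map rather than a lazy view,
    // so the entries of non-updated nodes are not retained through it
    nodeConfigs.filter { case (id, _) => updatedNodeConfigIds.contains(id) }
  }
}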

@ncharles (Member Author) commented Dec 3, 2019

It seems this change let me go from 12 GB to 10 GB of heap size.

_ = PolicyLogger.debug(s"Reports computed in ${timeSetExpectedReport} ms")

_ = PolicyLogger.debug(s"Size of expectedReports is ${memoryMeter.measure(expectedReports)} ")
_ = PolicyLogger.debug(s"Size deepof expectedReports is ${memoryMeter.measureDeep(expectedReports)} ")
@ncharles (Member Author) commented on this diff:

[2019-12-15 21:15:44] DEBUG policy.generation - Size of allNodeInfos is 24
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof allNodeInfos is 26394576
[2019-12-15 21:15:46] DEBUG policy.generation - Size of allNodeModes is 24
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof allNodeModes is 1132752
[2019-12-15 21:15:46] DEBUG policy.generation - Size of nodeConfigCaches is 16
[2019-12-15 21:15:46] DEBUG policy.generation - Size deepof nodeConfigCaches is 16
[2019-12-15 21:15:56] DEBUG policy.generation - Size of ruleVals is 24
[2019-12-15 21:32:14] DEBUG policy.generation - Size deepof ruleVals is 1978959240
[2019-12-15 21:32:33] DEBUG policy.generation - Size of nodeContexts is 24
[2019-12-15 21:32:38] DEBUG policy.generation - Size deepof nodeContexts is 304977352
[2019-12-15 21:35:28] DEBUG policy.generation - Size of nodeConfigs is 24
[2019-12-15 21:56:58] DEBUG policy.generation - Size deepof nodeConfigs is 3474356896
[2019-12-15 21:57:13] DEBUG policy.generation - Size of updatedNodeConfigIds is 24
[2019-12-15 21:57:13] DEBUG policy.generation - Size deepof updatedNodeConfigIds is 1423528
[2019-12-15 21:57:13] DEBUG policy.generation - Size of updatedNodeConfigs is 24
[2019-12-15 22:18:08] DEBUG policy.generation - Size deepof updatedNodeConfigs is 3504843864
[2019-12-15 22:18:08] DEBUG policy.generation - Size of updatedNodeInfo is 24
[2019-12-15 22:18:09] DEBUG policy.generation - Size deepof updatedNodeInfo is 26395416
[2019-12-15 22:18:09] DEBUG policy.generation - Size of updatedNodesId is 24
[2019-12-15 22:18:09] DEBUG policy.generation - Size deepof updatedNodesId is 804984
[2019-12-15 22:28:57] DEBUG policy.generation - Size of expectedReports is 24
[2019-12-15 22:34:29] DEBUG policy.generation - Size deepof expectedReports is 241639688

@ncharles (Member Author):

The largest objects are:
updatedNodeConfigs = 3 504 843 864
nodeConfigs = 3 474 356 896
ruleVals = 1 978 959 240
nodeContexts = 304 977 352
expectedReports = 241 639 688

ping @fanf

@fanf (Member):

Is that in bytes or bits?

@ncharles (Member Author):

It is in bytes.

@ncharles (Member Author):

Out of the 2 668 075 048 bytes within nodeConfigs, 2 582 780 816 are for the policies.

@@ -357,7 +357,7 @@ final case class Policy(
// == .toList (keep order) ==> List[List[Variable]]
// == flatten (keep order) ==> List[Variable]
val expandedVars = Policy.mergeVars(policyVars.map( _.expandedVars.values).toList.flatten)
val originalVars = Policy.mergeVars(policyVars.map( _.originalVars.values).toList.flatten)
//val originalVars = Policy.mergeVars(policyVars.map( _.originalVars.values).toList.flatten)
@ncharles (Member Author) commented on this diff:

This one seems to have a lot of impact:
Size deepof nodeConfigs is 2 668 075 072
Size deepof ruleVals is 376 459 952

vs
Size deepof nodeConfigs is 3 474 356 896
Size deepof ruleVals is 748 776 080

But we need to validate, as there is a lot of variability in ruleVals (from 748 487 288 to 2 123 635 544).

@ncharles (Member Author):

nodeConfigs is stable at about 2 668 075 048 bytes with this change, vs 3 474 356 896 without it.

nodeContexts = nodeConfigs.map { case (_, nodeconfig) => nodeconfig.nodeContext}

_ = PolicyLogger.debug(s"Size of nodeConfigs.nodeContext is ${memoryMeter.measure(nodeContexts)}")
_ = PolicyLogger.debug(s"Size deepof nodeConfigs is ${memoryMeter.measureDeep(nodeContexts)} ")
@ncharles (Member Author) commented on this diff:

This is only 78 016 840 bytes.

@@ -357,7 +357,7 @@ final case class Policy(
// == .toList (keep order) ==> List[List[Variable]]
// == flatten (keep order) ==> List[Variable]
val expandedVars = Policy.mergeVars(policyVars.map( _.expandedVars.values).toList.flatten)
@ncharles (Member Author) commented on this diff:

I'm really tempted to make a def out of this one, as it is used in only two different places.
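To illustrate the trade-off (a hedged sketch with simplified types, not the actual Policy class): a val keeps the merged list alive for the whole lifetime of each Policy instance, while a def recomputes it on each call and lets the result be garbage collected between the two call sites.

final case class Variable(name: String, values: List[String])

final case class PolicySketch(policyVars: List[Map[String, Variable]]) {
  // val: computed once at construction and retained for the lifetime of the
  // instance, so the merged list counts towards every node configuration's size
  val expandedVarsVal: List[Variable] = policyVars.flatMap(_.values)

  // def: recomputed on each call (at most twice, per the comment above),
  // nothing extra retained on the Policy instance itself
  def expandedVarsDef: List[Variable] = policyVars.flatMap(_.values)
}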

@ncharles (Member Author) commented Jan 1, 2020

Filling the templates is the most expensive part of writing the policies, by a factor of 3. We could certainly do better, but I can't really see how:
[2020-01-01 14:54:19] DEBUG policy.generation - Templates filled in 1130297 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Templates replaced in 51243 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Templates wrote in 407855 ms
[2020-01-01 14:54:19] DEBUG policy.generation - Promises written in 329058 ms

(It adds up to more than "Promises written" because the three steps before are multi-threaded.)
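For the record, 1 130 297 + 51 243 + 407 855 ≈ 1 589 395 ms of cumulative time for the three template steps, against 329 058 ms of wall-clock time for "Promises written", i.e. roughly 5 worker threads busy on average (assuming the per-step figures are summed over threads).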

@@ -76,7 +76,7 @@ final class RuleStatusReport private (
, val overrides : List[OverridenPolicy]
) extends StatusReport {
val compliance = report.compliance
val byNodes: Map[NodeId, AggregatedStatusReport] = report.reports.groupBy(_.nodeId).mapValues(AggregatedStatusReport(_))
val byNodes: Map[NodeId, AggregatedStatusReport] = report.reports.groupBy(_.nodeId).map{case (a, value) => (a, AggregatedStatusReport(value))}
@ncharles (Member Author) commented on this diff:

It looks like this file should not have been changed, as the cost for the homepage skyrockets.
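A plausible explanation (an assumption, not verified in this PR): on Scala 2.12, Map.mapValues returns a lazy wrapper that re-applies the function on every access, whereas .map builds a strict map eagerly for all keys. A small sketch of the difference, with aggregate standing in for AggregatedStatusReport(_):

object MapValuesSketch {
  val reports: Map[String, List[Int]] = Map("n1" -> List(1, 2), "n2" -> List(3))

  def aggregate(rs: List[Int]): Int = {
    println(s"aggregating $rs") // visible side effect to show when work happens
    rs.sum
  }

  // Lazy wrapper (Scala 2.12): aggregate() runs again on every lookup,
  // but never for keys that are not accessed.
  val lazyByNodes = reports.mapValues(aggregate)

  // Strict map: aggregate() runs exactly once per key, right here, for all keys,
  // even those the caller (e.g. the homepage) never asks for.
  val strictByNodes = reports.map { case (k, v) => (k, aggregate(v)) }
}

If AggregatedStatusReport(_) is expensive and byNodes is not always needed, the strict version pays the full cost up front at construction, which could explain the homepage regression observed here.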

@@ -1226,13 +1226,13 @@ object ExecutionBatch extends Loggable {
*/
ComponentStatusReport(
expectedComponent.componentName
, ComponentValueStatusReport.merge(unexpectedReportStatus ::: pairedReportStatus).mapValues { status =>
, ComponentValueStatusReport.merge(unexpectedReportStatus ::: pairedReportStatus).map { case (key, status) =>
@ncharles (Member Author) commented on this diff:

We need to inspect the performance cost here.

val result = reports.mapValues { status =>
  NodeStatusReport.filterByRules(status, ruleIds)
}
val result = reports.map { case (nodeId, status) =>
  (nodeId, NodeStatusReport.filterByRules(status, ruleIds))
}
@ncharles (Member Author) commented on this diff:

We need to inspect the cost here as well.

@ncharles (Member Author) commented Jan 7, 2020

It needs much more work to determine whether the mapValues change was worthwhile or not. I will revert this commit and focus on measuring the impact of the extra collection created to keep only the node configurations of changed nodes, which may (or may not) double the memory requirement in the worst case. Then we'll call it a PR.

@fanf added the WIP label (Work In Progress PR that must not be merged yet) on Jan 7, 2020
@ncharles (Member Author) commented Jan 9, 2020

This looks good, so I won't change anything more in this PR. I'll cherry-pick parts into new, more unitary PRs.

@ncharles (Member Author) commented Jan 9, 2020

Note: spikes in memory usage seem to come from

scheduledJob - Writting non-compliant-report logs beetween ids 9136561090 and 9136648677 (both incuded)

and from compliance computation.

@ncharles (Member Author):
Sharp spikes in memory usage happen:

  • in the middle of building node target configurations (1)
  • in the middle of path computation and template reading (1)
  • during the writing of policies (many)
  • during caching in LDAP (1)
  • during the save of expected reports (1)

I'm a bit unsure whether it is caused by policy generation itself, or by the pressure from compliance computation on top of policy generation.

@ncharles (Member Author) commented Mar 2, 2020

Closing, as everything has been backported in more sensible PRs.

@ncharles closed this on Mar 2, 2020.
This pull request was closed.
Labels: qa: Can't merge, WIP