Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #7735: OOM in Rudder when there are too many repaired reports #1084

Conversation

fanf
Copy link
Member

@fanf fanf commented May 3, 2016

https://www.rudder-project.org/redmine/issues/7735

For information, with the 3 patches applied (+https://www.rudder-project.org/redmine/issues/8195), with > 1.2 Millions changes in the last 3 days (with one 6h interval at 1M changes), we are able to start a Rudder with -Xmx600m, have the non-compliant report be iteratively written, login, go to rule details, see graphes, click and see details on all interval, and everything is fast - with a top heap consomption at 290MB.

So, there is still some cleaning to do, but the general architecture of the patch is ok. It is in 3 parts, I'm letting them in three commits to make them better understandable. Perhaps it could be split in 3 tickets, to, but they are highly coupled (at least the last two), so not sure.

1/ Today, we reload compliance each time we switch from the Rule edit form tab to the compliance one. This is useless since we have nice "reload" button on compliance tab, it is an UX strangeness because it causes delay for switch, and don't let the user choose when data should change (difficult to understand, or crazy when you were trying to analized something and clicked on the tab by mistake), and it leads to OOM when there is a lot of reports (because even if we succeeded in calculating one time, the second one, we have much less memory, since a big part is used).

2/ That patches allows to only load the events details (display on the bottom grid) for the last interval in rule details (and load details of other on click on the corresponding graph interval). That also means that much less data are transfered to the browser, making user happier (end rule details faster to display). More over, a hard limit of max 10 000 event for a rule, for an interval, is introduced. It is not a good solution, but the correct one (pagination on each interval) is much harder to implement, with higher risks in a released version.

3/ With the third patch, the event cache only keep a COUNT of the number of events by rule by interval, which consume much less memory than keeping all events (of course). Also rewrote some other methods to consum less memory overall.

@@ -478,6 +515,16 @@ class ReportsJdbcRepository(jdbcTemplate : JdbcTemplate) extends ReportsReposito
}
}

final case class CoundChangesMapper(intMapper: Int => Interval) extends RowMapper[(RuleId, Int, Interval)] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be CountChangesMapper

@VinceMacBuche
Copy link
Member

First commit is ok ! and is the easiest one, now let's take a look at the big ones ...

@fanf
Copy link
Member Author

fanf commented May 4, 2016

PR updated

@fanf fanf force-pushed the bug_7735/oom_in_rudder_when_there_are_too_many_repaired_reports branch from 8357999 to 190a719 Compare May 4, 2016 14:37
* Get the changes for the given interval, without using the cache.
*/
override def getChangesForInterval(ruleId: RuleId, interval: Interval, limit: Option[Int]): Box[Seq[ResultRepairedReport]] = {
for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just 'changeService.getChangesForInterval(ruleId, interval, limit)' ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, certainly there was some other comands at some point of the dev, that were removed in the end (but not the for/yield structure).

@VinceMacBuche
Copy link
Member

Apart the two minor fixes it's ok and can be merged (once it's one commit! )

@fanf
Copy link
Member Author

fanf commented May 10, 2016

PR updated

@fanf fanf force-pushed the bug_7735/oom_in_rudder_when_there_are_too_many_repaired_reports branch from 190a719 to f4db618 Compare May 10, 2016 16:36
@VinceMacBuche VinceMacBuche merged commit 191aea0 into Normation:branches/rudder/3.0 May 10, 2016
@fanf fanf deleted the bug_7735/oom_in_rudder_when_there_are_too_many_repaired_reports branch March 15, 2024 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants