Cache being sequences that are 'old'. #8063

Merged
merged 1 commit into from Nov 15, 2017

Conversation

jameinel
Member

@jameinel jameinel commented Nov 13, 2017

Description of change

It turns out that we actually have lots of ways to have old beings. We
don't want to have to read them from the database every time we do a
Sync. So instead we cache them as 'known-to-be-superseded', and we can
skip reading them from the database.

Even once we fix the bug that leaks hundreds of agent pingers, each controller agent will still hold a few extra connections (currently 4 per agent). With this change, Watcher.Sync() can run without reading presence.beings at all, although it still has to read the latest Pings().
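
A minimal sketch of the caching idea described above, not the actual code in this branch. The names `watcher`, `mapStore`, `beingInfo`, and `sync` are hypothetical, and an in-memory map stands in for the presence.beings collection so the example runs without MongoDB. A sequence is looked up once; after that it is either the current being for its key or cached as known-to-be-superseded, and later syncs skip the lookup.

```go
package main

import "fmt"

// beingInfo is a hypothetical stand-in for a presence.beings document.
type beingInfo struct {
	Key string
}

// mapStore fakes the presence.beings collection and counts reads.
type mapStore struct {
	docs  map[int64]beingInfo
	reads int
}

// Lookup simulates one database read for a being sequence.
func (s *mapStore) Lookup(seq int64) (beingInfo, bool) {
	s.reads++
	b, ok := s.docs[seq]
	return b, ok
}

// watcher tracks the current being per key plus the superseded ones.
type watcher struct {
	store      *mapStore
	beingKey   map[int64]string   // seq -> key, current beings only
	beingSeq   map[string]int64   // key -> current seq
	superseded map[int64]struct{} // seqs known to be replaced by a newer one
}

func newWatcher(store *mapStore) *watcher {
	return &watcher{
		store:      store,
		beingKey:   make(map[int64]string),
		beingSeq:   make(map[string]int64),
		superseded: make(map[int64]struct{}),
	}
}

// sync handles the sequences seen in the latest pings. Only sequences
// that are neither current nor cached as superseded cause a store read.
func (w *watcher) sync(pinged []int64) {
	for _, seq := range pinged {
		if _, ok := w.beingKey[seq]; ok {
			continue // already known as the current being
		}
		if _, ok := w.superseded[seq]; ok {
			continue // known-to-be-superseded, skip the read
		}
		b, ok := w.store.Lookup(seq)
		if !ok {
			continue
		}
		cur, known := w.beingSeq[b.Key]
		if known && seq < cur {
			// An old being for a key we already track: remember it.
			w.superseded[seq] = struct{}{}
			continue
		}
		if known {
			// The previous current being is now superseded.
			delete(w.beingKey, cur)
			w.superseded[cur] = struct{}{}
		}
		w.beingKey[seq] = b.Key
		w.beingSeq[b.Key] = seq
	}
}

func main() {
	store := &mapStore{docs: map[int64]beingInfo{
		1: {Key: "m#0"}, 2: {Key: "m#0"}, 3: {Key: "m#0"},
	}}
	w := newWatcher(store)
	w.sync([]int64{1, 2, 3})
	fmt.Println(store.reads) // 3
	w.sync([]int64{1, 2, 3})
	fmt.Println(store.reads) // still 3: no further reads
}
```

Without the `superseded` map, sequences 1 and 2 in this sketch would be looked up again on every sync, which is the repeated presence.beings load the description refers to.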

QA steps

The current codebase has a bug that leaks Pingers, which makes us read presence.presence.beings more often than we would expect. Running this branch (without merging develop) should show that we don't actually trigger presence.beings reads. I included a test as well.

Documentation changes

Not significantly.

Bug reference

At least related to:
lp:1731745

Note that this potentially leaves us with a gap in alive, because we
don't transition to the new seq for an entity until the previous one has
actually transitioned to dead, but I think that is true for the old code
as well.
@howbazaar
Contributor

As far as I can tell this looks OK. My question is how this interacts in HA with each agent pruning beings. Is there a bad interaction with beings that are running in other controllers?

@jameinel
Member Author

No. This uses the active pingers in the "presence.pings" collection to know what is safe/not safe to remove from the in-memory cache. It doesn't rely on who is pinging locally.
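
A rough sketch of that pruning rule, under the assumption that the set of actively pinging sequences has already been read from "presence.pings". The function name `pruneSuperseded` and the map-based representation are hypothetical, not the code in this branch.

```go
package main

import "fmt"

// pruneSuperseded drops cached superseded sequences that are no longer
// pinging. The active set is assumed to come from presence.pings, so the
// result does not depend on which controller runs the Pinger.
func pruneSuperseded(superseded, active map[int64]struct{}) int {
	removed := 0
	for seq := range superseded {
		if _, ok := active[seq]; !ok {
			// Nobody pings this seq any more, so it will not show up in
			// a later sync and is safe to forget.
			delete(superseded, seq)
			removed++
		}
	}
	return removed
}

func main() {
	superseded := map[int64]struct{}{1: {}, 2: {}, 3: {}}
	active := map[int64]struct{}{2: {}, 4: {}}
	removed := pruneSuperseded(superseded, active)
	fmt.Println(removed, len(superseded)) // 2 1
}
```

Entries that are still pinging stay in the cache, so a leaked Pinger keeps being skipped rather than re-read.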

Tested with deliberately leaky Pingers. After running overnight I had:

Y> db.presence.beings.aggregate([{$group: {_id: {$concat: [{$substr: ["$_id", 0, 4]}, "-", "$key"]}, count: {$sum: 1}}}, {$sort: {count: -1}}, {$limit: 10}])
{ "_id" : "fe59-m#1", "count" : 7479 }
{ "_id" : "fe59-m#2", "count" : 7476 }
{ "_id" : "fe59-m#0", "count" : 7476 }

And mongotop still showed no load on presence.presence.beings.
The only load that did show up was a spike to 120ms when I ran the aggregate query above, presumably from that single pass over the collection.
This was with 2 'while true; do juju status >/dev/null; done' loops running at the same time.
Load on the system looks like:

                                            ns    total    read    write    2017-11-14T04:07:28Z
logs.logs.fe59a6ba-4d90-435d-8abe-a87189fbb8dd    149ms     0ms    149ms
                       presence.presence.pings    101ms     4ms     96ms
                                   juju.models     40ms    40ms      0ms
                                local.oplog.rs     38ms    38ms      0ms
                                    juju.users     32ms    32ms      0ms
                             juju.instanceData     23ms    23ms      0ms
                              juju.permissions     23ms    23ms      0ms
                                 juju.statuses     17ms    17ms      0ms
                              juju.constraints     13ms    13ms      0ms
                  juju.modelUserLastConnection     13ms     6ms      7ms

(The controller is at DEBUG which probably also accounts for the 'log' load.)

@jameinel
Member Author

all-pingers

@jameinel
Member Author

$$merge$$

@jujubot
Collaborator

jujubot commented Nov 15, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit d18212e into juju:develop Nov 15, 2017