-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce frequency of lastoriginupdate
db writes
#702
base: main
Are you sure you want to change the base?
Conversation
Output of https://wiki.postgresql.org/wiki/Lock_Monitoring#.D0.A1ombination_of_blocked_and_blocking_activity on kbin.run before applying this patch: |
What do you think about instead of updating with every announce (which includes like and is probably the biggest problem) to only update with announced create activities? |
Even if this was moved to comments vs likes I think we still don't need every single update always as the check is >24h, it seems like we shouldn't landrush the db just like we don't for apfetchedat |
|
When your queues fall behind and you try to catch up, every single event locks the magazine row to update this. It most likely causes deadlocks as debounced has shown (though the root cause might be unrelated, it still showed just how often it's required). You just don't need this value to be accurate to the nanosecond of every event. I'm willing to bring it down to a minute if that makes you feel better, but I don't think there should be no limit whatsoever, causing this performance hit on the DB Like, imagine you're 5k activities behind on |
lastoriginupdate
db writes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we should not be hitting the db everytime considering what this "feature" is doing. 1 minute resolution is plenty (and probably still overkill).
@BentiGorlich want to make sure this seems acceptable to you after the change |
I've been thinking about it and came to the conclusion that a ~3h timespan would be a good value. Atm this galue is only used to display magazine stalenes and only if no one is subscribed to the magazine... However I think we should still limit it to create activities, because even if it is just a redis hit when getting all the likes, it still involves latency. If the redis server is located on a differenf machine (who knows) that latency is super noticable... Let me know what you think. I'd accept it the way it is now as well as it is obviously an improvement |
i can reapprove if this is better, that is ~3h timespan @e-five256 |
I'll work on making the changes suggested |
change threshold for updates to 3 hours old
definitely super look over me here as this is outside my wheelhouse, kind of guessing based on grabbing payloads from past logs shared in matrix, as obviously this is difficult to test locally. If you meant announces of creates and this should go inside dewrapping announces, let me know. |
} elseif ('Create' === $payload['type']) { | ||
$actor = $this->manager->findActorOrCreate($payload['actor']); | ||
if ($actor instanceof Magazine && $actor->lastOriginUpdate < (new \DateTime())->modify('-3 hours')) { | ||
$actor->lastOriginUpdate = new \DateTime(); | ||
$this->entityManager->persist($actor); | ||
$this->entityManager->flush(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What actually needs checking is the announced type, so payloads like this will trigger the code:
{
"type": "Announce",
"object": {
"type": "Create",
...
},
...
}
So I'd say move it back to where it was before and change the condition to:
if ('Announce' === $payload['type']) {
if (\is_array($payload['object'])) {
if ('Create' === $payload['object']['type']) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would've thought that means it'd only update on comments that come in from external instances, but maybe I'm misunderstanding AP here. I assumed create = comes from owner instance, announce = comes from another instance, and owner is announcing it happened. Or maybe owner instances both create and announce create their own posts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a create from a magazine and an announced create from a magazine should refresh the field. Until now the plain create did not do that, but it should, you're right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that might actually explain some things I had seen. Looking at this one magazine, it says "This magazine is not receiving updates (last activity 29 day(s) ago). " but has posts from today, yesterday etc, continuing to get posts, but they're all posts from that instance the magazine is from
So if I understand correctly, I need both sections, what I have here which updates from the origin's creates, and then what you suggested which updates when the origin announces creates as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well you alsp just get the creates from the user directly which would not count towards origin uodates.
And yes both sections would be good indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this conversation I forgot the point I made in my latest comment. Sorry for the confusion on my part
I think I've handled the situations as described now where it will update the value when a create or an announced create after 3 hours |
@@ -113,6 +117,14 @@ private function handle(?array $payload) | |||
} | |||
} | |||
} | |||
} elseif ('Create' === $payload['type']) { | |||
$actor = $this->manager->findActorOrCreate($payload['actor']); | |||
if ($actor instanceof Magazine && $actor->lastOriginUpdate < (new \DateTime())->modify('-3 hours')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work, because the actor of a Create
activity is never a magazine, but always a user. A magazine only announces what a user creates. If you get a plain Create
activity you would have to check from which instance the user is from and then get the magazine the activity is posted to and then you can adjust the origin update.
However I think I removed that part (I initially had something that did the same) because you cannot be sure that you get create activities because you are subscribed to a magazine, because you will get create activities from users you are subscribed to, regardless of whether you're subscribed to the magazine they post it to. Announce
activities on the other hand are only sent to people subscribed to magazines
|
||
if ('Create' === $payload['type'] && $actor instanceof Magazine && $actor->lastOriginUpdate < (new \DateTime())->modify('-3 hours')) { | ||
// announced creates update magazine last origin timestamp when after threshold | ||
$actor->lastOriginUpdate = new \DateTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rename this variable? I was confused whether this works, because this is not the actor of the $payload
variable at this position.
$actor->lastOriginUpdate = new \DateTime(); | |
$announceActor->lastOriginUpdate = new \DateTime(); |
I don't think I really have the expertise to make this change, as I'm basically taking a shot in the dark each time since I'm not familiar with AP and don't have a cluster to test what real events look like. Not sure if this could be tackled by someone else or just closed. I think assuming @nobodyatroot isn't running it anymore since that one time since he's been running the tag PR since, maybe this isn't a problem that needs to be handled and the problem at the time was the foreign key constraint violation. I'm just very, super overly optimized where I come from (even two DB calls in the same request are against my best designs, I'm like, forcing people to use refcursors if they need multiple data sets from a DB), whereas Mbin seems to be more like "haha, postgresql connections go brrrrrrrr" |
I am happy to take this on. I am more on the side of "optimize where it is doable with reasonable effort". I agree that this should be optimized and I think I have the best understanding of the AP side in our team. And as my vacation "season" is over for now I should have a lot more time |
potentially limit the amount of times we update magazine entries to only update if it has been an hour since the last update, as before it was updating every single event