
Scene trending #327

Merged
merged 26 commits into from
Nov 11, 2022

@dholms (Collaborator) commented Nov 8, 2022

This adds a message queue abstraction to the database for handling side effects of created records.

It is currently used for

  • notifications
  • scene processing

For scene processing, we maintain two tables: scene_member_count and scene_votes_on_post. We publish a new trending record when a post has at least 2 upvotes from members of a scene and those upvotes amount to at least 20% of the scene's members.
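The trending rule can be written as a small predicate. This is a minimal sketch, not the PR's code. `isTrendingInScene` and the two constants are hypothetical names. The sketch assumes the caller has already read the post's upvote count from scene members (cf. `scene_votes_on_post`) and the scene's member count (cf. `scene_member_count`). The zero-member guard mirrors the "div by 0 check" commit in this PR.

```typescript
// Hypothetical thresholds taken from the PR description.
const MIN_SCENE_UPVOTES = 2 // at least 2 upvotes from scene members
const MIN_UPVOTE_RATIO = 0.2 // upvotes must be >= 20% of the scene's members

// upvotes: upvotes on the post from members of the scene (cf. scene_votes_on_post)
// memberCount: current number of scene members (cf. scene_member_count)
function isTrendingInScene(upvotes: number, memberCount: number): boolean {
  // Guard against dividing by zero for an empty scene.
  if (memberCount <= 0) return false
  return upvotes >= MIN_SCENE_UPVOTES && upvotes / memberCount >= MIN_UPVOTE_RATIO
}

console.log(isTrendingInScene(2, 10)) // true: 2 upvotes, exactly 20%
console.log(isTrendingInScene(2, 20)) // false: only 10% of the scene
console.log(isTrendingInScene(1, 3)) // false: fewer than 2 upvotes
```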

@pfrazee (Collaborator) left a comment

Yeah, this is looking great. Don't let my comments come off as negativity; this feels very clean and gets solid value for the amount of effort.

packages/pds/src/db/message-queue/index.ts (outdated, resolved)
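```typescript
import { Generated } from 'kysely'

// Row type for the message queue table in this revision;
// `read` flags each message individually (0 = unread, 1 = read)
export interface MessageQueue {
  id: Generated<number>
  message: string
  read: 0 | 1
}
```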
Collaborator

This is small, and it's completely fine if you ignore it, but a separate "MessageQueueCursors" table might be better for two reasons: it allows multiple separate processors (we may not need that, but who knows), and it would replace N `read` values for N events with a single cursor, saving ... some ... space on disk.
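To make the suggestion concrete, here is a minimal in-memory sketch of a cursor-per-consumer layout. `CursorQueue`, `push`, and `poll` are hypothetical names. A real version would keep the log in the message queue table and store one row per consumer in a cursor table. Each consumer advances only its own cursor, so uncoordinated processors can read the same log, and the log no longer needs a per-message `read` flag.

```typescript
// Hypothetical in-memory model: an append-only message log plus one cursor
// per named consumer, instead of a per-row `read` flag.
interface QueuedMessage {
  id: number
  message: string
}

class CursorQueue {
  private log: QueuedMessage[] = []
  // consumer name -> id of the next message that consumer should read
  private cursors = new Map<string, number>()
  private nextId = 1

  push(message: string): number {
    const id = this.nextId++
    this.log.push({ id, message })
    return id
  }

  // Returns every message the consumer has not seen yet and advances only its cursor.
  poll(consumer: string): QueuedMessage[] {
    const cursor = this.cursors.get(consumer) ?? 1
    const pending = this.log.filter((m) => m.id >= cursor)
    if (pending.length > 0) {
      this.cursors.set(consumer, pending[pending.length - 1].id + 1)
    }
    return pending
  }
}

const queue = new CursorQueue()
queue.push('add_upvote')
queue.push('create_notification')
console.log(queue.poll('scenes').length) // 2
console.log(queue.poll('notifications').length) // 2: independent cursor
console.log(queue.poll('scenes').length) // 0: already caught up
```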

Collaborator Author

this should still allow for multiple separate processors. we lock rows with SELECT ... FOR UPDATE, which keeps separate processors from stepping on each other's toes

Collaborator

I'm thinking of 2 processors that aren't coordinated with each other, e.g. each maintaining its own cursor. But it's not a real concern, tbh

packages/pds/src/db/message-queue/messages.ts (resolved)
packages/pds/src/db/message-queue/index.ts (resolved)
packages/pds/src/db/message-queue/index.ts (outdated, resolved)
@@ -42,6 +84,9 @@ export default async (sc: SeedClient) => {
```typescript
await sc.vote('down', carol, sc.posts[alice][1].ref)
await sc.vote('up', carol, sc.posts[alice][2].ref)
await sc.vote('up', dan, sc.posts[alice][1].ref)
await sc.vote('up', alice, sc.posts[carol][0].ref)
await sc.vote('up', bob, sc.posts[carol][0].ref)
// If a message queue is configured, wait for it to finish processing
mq && (await mq.process())
```
Collaborator Author

here, we want to wait till the queue is done processing before moving on

this gives us a predictable order for our snapshots
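The following minimal sketch shows why awaiting the drain gives deterministic snapshots. It assumes a queue whose `process()` handles every pending message in insertion order before it resolves. `DrainableQueue` is a hypothetical stand-in, not the PR's `SqlMessageQueue`.

```typescript
// Hypothetical stand-in for a message queue: process() drains everything
// that is pending, in insertion order, before it resolves.
type Handler = (message: string) => Promise<void>

class DrainableQueue {
  private pending: string[] = []
  private readonly handler: Handler

  constructor(handler: Handler) {
    this.handler = handler
  }

  add(message: string): void {
    this.pending.push(message)
  }

  // Loops until empty, so messages enqueued during the drain are handled too.
  async process(): Promise<void> {
    while (this.pending.length > 0) {
      const next = this.pending.shift() as string
      await this.handler(next)
    }
  }
}

async function seedExample(): Promise<void> {
  const handled: string[] = []
  const mq = new DrainableQueue(async (message) => {
    handled.push(message)
  })
  mq.add('add_upvote')
  mq.add('create_notification')
  // Wait for the queue to drain before "taking a snapshot".
  await mq.process()
  console.log(handled.join(', ')) // add_upvote, create_notification
}

void seedExample()
```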

@dholms marked this pull request as ready for review on November 11, 2022 03:27
@devinivy (Collaborator) left a comment

Had a few thoughts, but there are so many things I'm digging about this. Really impressive.

Approving since I think it's essentially all set, but this comment could probably use attention: https://github.com/bluesky-social/atproto/pull/327/files/908c1e829a64f6a9e7372d4aed8da6eab22b1c6a#diff-4c1f0ad0cea9ddfe9a8ab78478a07d53a598be698252db06be359db1c77eda21

packages/pds/src/db/index.ts (outdated, resolved)
packages/pds/src/db/index.ts (resolved)
packages/pds/src/db/migrations/20221021T162202001Z-init.ts (outdated, resolved)
packages/pds/src/db/records/repost.ts (resolved)
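```typescript
// Create a SQL-backed message queue and attach it to the db;
// the callback loads an auth store for the given DID
const messageQueue = new SqlMessageQueue('pds', db, (did: string) => {
  return auth.verifier.loadAuthStore(keypair, [], did)
})
db.setMessageQueue(messageQueue)
```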
Collaborator

I know you weren't pumped about this. Two ideas come to mind:

  • perhaps indexRecord() and deleteRecord() don't need to live on db.
  • perhaps the message queue deserves a separate db of its own.

Collaborator Author

yeah both of these crossed my mind as well 🤔

we may want something like a RecordIndexer. This kinda has to do with the fact that the db has a lot of functionality jammed into it; it probably needs to be rethought & broken up a bit

Comment on lines 63 to 65
```typescript
// SQLite has no row-level locks, so FOR UPDATE is only added on other dialects
if (this.db.dialect !== 'sqlite') {
  builder = builder.forUpdate()
}
```
Collaborator

You could add a skipLocked() here so that if there's contention for the cursor, one consumer gets it and the other just bails rather than waiting for it to open up.

Collaborator Author

i think we actually want to wait for it to open up. we basically ping the consumer on every event push to try & process exactly 1 event. So it should wait until the cursor is freed up & then give it a go
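The sketch below is an in-memory analogy for the two locking behaviors discussed here. It assumes a single lock guards the shared cursor row. `acquire()` waits the way `SELECT ... FOR UPDATE` does, while `tryAcquire()` gives up immediately the way `FOR UPDATE SKIP LOCKED` does. `CursorLock` and `processOneEvent` are hypothetical helpers, not database code.

```typescript
// In-memory analogy (hypothetical helper) for a lock on the shared cursor row.
class CursorLock {
  private locked = false
  private waiters: Array<() => void> = []

  // Like SELECT ... FOR UPDATE: if the lock is held, wait until it is released.
  async acquire(): Promise<void> {
    if (!this.locked) {
      this.locked = true
      return
    }
    await new Promise<void>((resolve) => {
      this.waiters.push(() => resolve())
    })
  }

  // Like FOR UPDATE SKIP LOCKED: give up immediately if the lock is held.
  tryAcquire(): boolean {
    if (this.locked) return false
    this.locked = true
    return true
  }

  // Hand the lock straight to the next waiter, or unlock if nobody is waiting.
  release(): void {
    const next = this.waiters.shift()
    if (next) {
      next()
    } else {
      this.locked = false
    }
  }
}

// One "ping" per pushed event: each call waits its turn, then handles one event.
async function processOneEvent(lock: CursorLock, handled: number[], event: number): Promise<void> {
  await lock.acquire()
  try {
    handled.push(event)
  } finally {
    lock.release()
  }
}

const lock = new CursorLock()
const handled: number[] = []
void Promise.all([1, 2, 3].map((e) => processOneEvent(lock, handled, e))).then(() => {
  console.log(handled.join(',')) // 1,2,3
})
```

With the waiting behavior, every ping eventually gets its turn. With the skip-locked behavior, a ping that loses the race simply does nothing, which only works if some other ping is guaranteed to pick up its event later.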

Collaborator

Ahh, okay got it. How does keepGoing play into that?

Collaborator Author

ah not sure if you saw, but keepGoing actually got ripped out.

Yeah, beforehand i was throwing way too many handlers at this, & they'd all just hang out at the lock with their hands open until the DB got back to them & said "sorry buddy, no cursor for you"

Collaborator

Ahhh right on!

```typescript
await this.handleMessage(dbTxn, message)
// Advance the shared cursor by one after handling the message
await dbTxn.db
  .updateTable('message_queue_cursor')
  .set({ cursor: sql`cursor + 1` })
```
Collaborator

The only question I have about this is whether we can assume the incrementing id is guaranteed not to leave any gaps. Otherwise, I think the cursor can get stuck.

Collaborator Author

oh that's a good point. i think so? worth a look 🧐

Collaborator Author

it doesn't guarantee it :notlikethis:

Collaborator Author

kk this should fix it: e446481

(also added an actual SQL LIMIT in the next commit)
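The sketch below illustrates the gap problem. It assumes the queue's ids come from an auto-increment column; in Postgres, for example, a rolled-back insert still consumes a sequence value and leaves a hole. The helper names are hypothetical, and this is not necessarily what e446481 changed. An exact-id lookup stalls at a gap, while a range lookup that takes the first matching row skips past it.

```typescript
// Hypothetical model of queue rows whose auto-increment ids may contain gaps.
interface Row {
  id: number
  message: string
}

// Analogous to `WHERE id = cursor`: stalls forever once the cursor hits a gap.
function nextByExactId(rows: Row[], cursor: number): Row | undefined {
  return rows.find((r) => r.id === cursor)
}

// Analogous to `WHERE id >= cursor ORDER BY id LIMIT 1`: skips over gaps.
function nextByRange(rows: Row[], cursor: number): Row | undefined {
  let best: Row | undefined
  for (const r of rows) {
    if (r.id >= cursor && (best === undefined || r.id < best.id)) best = r
  }
  return best
}

const rows: Row[] = [
  { id: 1, message: 'add_member' },
  { id: 3, message: 'add_upvote' }, // id 2 was never committed
]
console.log(nextByExactId(rows, 2)) // undefined: the consumer is stuck
console.log(nextByRange(rows, 2)?.id) // 3
```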

Comment on lines 86 to 101
```typescript
// Route each queued message to the handler for its type
private async handleMessage(db: Database, message: Message) {
  switch (message.type) {
    case 'add_member':
      return this.handleAddMember(db, message)
    case 'remove_member':
      return this.handleRemoveMember(db, message)
    case 'add_upvote':
      return this.handleAddUpvote(db, message)
    case 'remove_upvote':
      return this.handleRemoveUpvote(db, message)
    case 'create_notification':
      return this.handleCreateNotification(db, message)
    case 'delete_notifications':
      return this.handleDeleteNotifications(db, message)
  }
}
```
Collaborator

Eventually I think it might be nice to colocate these handlers with related code (e.g. handleAddMember() living closer to the member methods) and allow other areas of the app to hook into the message queue, thinking of the message queue as more of a pull-based transport rather than a vertical slice of the app. Zero problem with it as-is today, though, just thinkin!
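Here is one hypothetical way the colocated, hook-in design could look. Modules register handlers per message type on a dispatcher instead of extending a central `switch`. `MessageDispatcher`, `on`, `dispatch`, and the message fields `subject` and `recipient` are invented for illustration. Only the type names `add_upvote` and `create_notification` come from the PR.

```typescript
// Hypothetical message shapes; only the `type` names come from the PR.
type Message =
  | { type: 'add_upvote'; subject: string }
  | { type: 'create_notification'; recipient: string }

type MessageOfType<T extends Message['type']> = Extract<Message, { type: T }>
type Handler<M> = (message: M) => Promise<void>

// Modules register handlers for the message types they care about,
// instead of one central switch that knows about every feature.
class MessageDispatcher {
  private handlers = new Map<Message['type'], Array<Handler<Message>>>()

  on<T extends Message['type']>(type: T, handler: Handler<MessageOfType<T>>): void {
    const list = this.handlers.get(type) ?? []
    // dispatch() only routes messages of `type` here, so the narrowing cast is safe.
    list.push((message) => handler(message as MessageOfType<T>))
    this.handlers.set(type, list)
  }

  // Runs every handler registered for the message's type, in registration order.
  async dispatch(message: Message): Promise<void> {
    for (const handler of this.handlers.get(message.type) ?? []) {
      await handler(message)
    }
  }
}

// e.g. scene code and notification code each register their own handler
const dispatcher = new MessageDispatcher()
dispatcher.on('add_upvote', async (m) => {
  console.log(`scene processor saw an upvote on ${m.subject}`)
})
dispatcher.on('create_notification', async (m) => {
  console.log(`notify ${m.recipient}`)
})
void dispatcher.dispatch({ type: 'add_upvote', subject: 'post-1' })
```

The queue itself then only stores and delivers messages, while each feature owns the handling logic it cares about.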

Collaborator Author

yup makes sense 👍

```typescript
await this.handleMessage(dbTxn, message)
// Advance the shared cursor by one after handling the message
await dbTxn.db
  .updateTable('message_queue_cursor')
  .set({ cursor: sql`cursor + 1` })
```
Collaborator

I think by incrementing by one here we can cause messages to get processed multiple times if there's a gap in the ids. Pretty sure we'll want to catch the cursor up to whichever message was just processed (plus 1).
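A small replay of the consumer loop shows the difference. It assumes each step reads the first message with `id >= cursor` and that the stored ids are `[1, 3, 4]`, with id 2 missing. The `replay` helper is hypothetical.

```typescript
// Hypothetical replay of the consumer loop against ids with a gap (2 is missing).
// Each step reads the first row with id >= cursor, as a range query would.
const ids = [1, 3, 4] // sorted ascending

function replay(advance: (cursor: number, processedId: number) => number): number[] {
  const processed: number[] = []
  let cursor = 1
  for (let step = 0; step < 10; step++) {
    const next = ids.find((id) => id >= cursor)
    if (next === undefined) break
    processed.push(next)
    cursor = advance(cursor, next)
  }
  return processed
}

// `cursor + 1`: the cursor lags behind after the gap, so message 3 is handled twice.
console.log(replay((cursor) => cursor + 1).join(',')) // 1,3,3,4
// `processedId + 1`: the cursor catches up to the message just processed.
console.log(replay((_cursor, id) => id + 1).join(',')) // 1,3,4
```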

Collaborator Author

a true hero you are 🙏

@devinivy (Collaborator) left a comment

💯

@dholms merged commit 962041e into main on Nov 11, 2022
@dholms deleted the scene-trending branch on November 11, 2022 18:29
mloar pushed a commit to mloar/atproto that referenced this pull request Nov 15, 2023
* wip

* views

* trending schema

* starting message queue

* scene processor

* wip

* send mq messages from db

* db events

* undo screwing up codegen lol

* setup queue

* db migrations

* fixing up message processing

* div by 0 check

* tx issue

* queue use cursor

* update not insert

* sql bugfix + tests

* trying to linearize tests

* correclty serializing txs

* attempt update before insert

* log errors

* handle gaps in cursor

* cleanup

* oops reenable test

* correctly incr cursor

3 participants