Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster mention notifications #39

Closed
aaronpk opened this issue Dec 29, 2014 · 20 comments
Closed

Cluster mention notifications #39

aaronpk opened this issue Dec 29, 2014 · 20 comments

Comments

@aaronpk
Copy link
Owner

aaronpk commented Dec 29, 2014

When multiple webmentions are received in short succession for the same target, it should result in a single notification sent to IRC rather than one for each target.

Something like "... commented on a post that linked to _____, _____ and ____"

@snarfed
Copy link
Contributor

snarfed commented Jun 9, 2015

low priority: @aaronpk mind clarifying if this is only across sources, or across targets, or both? i'm guessing both but we're not sure. background in IRC; cc @tantek.

@snarfed
Copy link
Contributor

snarfed commented Jul 21, 2015

i'm itching for this. @aaronpk do you have a specific design in mind? how about this straw man:

  1. On incoming wm for a target, store in two in-memory clusters, one for source and one for target and start a 60s timer.
  2. Add new wms to those clusters as they arrive.
  3. When the timer expires, if the last wm in the cluster arrived in the last 10s, extend the timer another 60s, up to 5m total.
  4. Otherwise, pick the cluster (source vs target) with the most notifs, send a single notif for the cluster, and remove the wms from the other clusters they're in (source vs target) so we don't send dupe notifs.

i made up those durations, i'm open to changing them.

rendering the notif text would probably stop and ellipsize after three sources. we'd probably also omit the multiple-side links (ie sources for a multiple-source-to-single-target wm, and vice versa) for bridgy wms, and maybe for non-bridgy too.

thoughts?

@snarfed
Copy link
Contributor

snarfed commented Jul 21, 2015

one minor drawback: pending clusters aren't persisted, so they're lost on restarts, reboots, etc. i expect we don't care though, since they're just irc notifs. :P

@aaronpk
Copy link
Owner Author

aaronpk commented Jul 21, 2015

I'll have to read through that logic a few more times. But I would probably use Redis to buffer the notifications which has the nice property of persisting anyway!

@snarfed
Copy link
Contributor

snarfed commented Jul 21, 2015

ooh true, nice!

@snarfed
Copy link
Contributor

snarfed commented Jul 26, 2015

@aaronpk
Copy link
Owner Author

aaronpk commented Jul 26, 2015

a couple clarifications from IRC:

  • the timer is set for for source/target pair, so there would be multiple timers running
  • when a timer expires, there may be clusters that have overlapping notification info, so sort the clusters based on the number of URLs in each when starting to generate notifications

@aaronpk
Copy link
Owner Author

aaronpk commented Jul 26, 2015

Example of receiving two webmentions to the same target in close succession:

  • t0: webmention a -> z is received
    • create cluster (a: a -> z) and (z: a -> z)
    • start a timer called a -> z for 60 seconds
  • t5: webmention b -> z is received
    • check existing clusters to see if any contain "b" or "z"
    • add b -> z to cluster z, so now is (z: a -> z, b -> z)
    • (no timer is set because this is being added to an existing cluster)
  • t60: timer a -> z expires
    • sort the "a" and "z" sets by number of members descending
    • take the first set, and generate notification text "a and b linked to z"
    • for each member in the pairs of set z,
    • remove the member from any other set (removes a -> z from the "a" set)
    • remove empty sets

@snarfed
Copy link
Contributor

snarfed commented Jul 26, 2015

thanks! good to have a step by step example.

a couple notes:

  • i think we want to add b->z to both b and z clusters even if the z cluster already exists, since more wms could arrive later with b as source and make it the bigger one.
  • after notifying, we delete the whole cluster we sent, but we only delete the corresponding wms from the other clusters, since they could have other unrelated wms.
  • i think we want separate clusters for sources vs targets, ie the keys would be (SOURCE | TARGET, url), since i don't think we're trying to design notif text for a url that's both a source and target. (you were probably already assuming this.)

there's also the 5m timer extension thing i mentioned, which i think will be useful for floods, but it's definitely just a nice to have, not required.

thanks again!

@aaronpk
Copy link
Owner Author

aaronpk commented Jul 30, 2015

Yep you're right, need to add a cluster for b->z as well. Agreed about the 5-min timer extension thing, I was just walking through a simple example. Here is the same one as above revised with the new cluster:

Example of receiving two webmentions to the same target in close succession:

  • t0: webmention a -> z is received
    • check existing clusters to see if any contain "a" or "z"
    • create cluster (a: a -> z) and (z: a -> z)
    • start a timer called a -> z for 60 seconds
  • t5: webmention b -> z is received
    • check existing clusters to see if any contain "b" or "z"
    • add b -> z to cluster z, so now is (z: a -> z, b -> z)
    • create cluster (b: b -> z)
    • set a timer called b -> z for 60 seconds
  • t60: timer a -> z expires
    • sort the "a" and "z" sets by number of members descending
    • take the first set, and generate notification text "a and b linked to z"
    • for each member in the pairs of set z,
    • remove the member from any other set (removes a -> z from the "a" set, and b -> z from the "b" set)
    • remove empty sets (removes the "a" and "b" sets)
  • t65: timer b -> z expires
    • there are no sets for "b" so nothing happens

@snarfed
Copy link
Contributor

snarfed commented Jul 30, 2015

nice! just a couple notes that i suspect are already in your plan and just may not have gotten updated in the example text:

  • separate cluster keys for source X vs target X, as described above
  • at t60, only sort and examine the a and z sets, not b

i also realized that the 5m extension may be a bit harder than i thought, since you have to extend all of the timers for the cluster. that may be straightforward, but i'm not sure. i think it mostly depends on your timestamp implementation details, so i'm going to ignore it for now.

aaronpk added a commit that referenced this issue Aug 2, 2015
for #39

* right now does nothing with getting the contents of the URLs, the notification text is just the URL
* buffers mentions for 60 seconds without extending the timers
* should handle 1->1, 1->many, many->1, and many->many notifications right now
* needs to be modified to actually factor in the contents of the mention URLs, to generate text like "x and y liked a post"
@aaronpk
Copy link
Owner Author

aaronpk commented Aug 2, 2015

Making progress! Currently:

  • buffers mentions for 60 seconds without extending the timers
  • should handle 1->1, 1->many, many->1, and many->many notifications right now
  • right now does nothing with getting the contents of the URLs, the notification text is just the URL
  • needs to be modified to actually factor in the contents of the mention URLs, to generate text like "x and y liked a post"

The Redis implementation of this means the logic is slightly easier than described above, since Redis creates things on the fly and doesn't fail when things don't exist.

@aaronpk
Copy link
Owner Author

aaronpk commented Aug 2, 2015

I'm imagining the common case of someone sending out a hundred invites and bridgy floods the channel. What ends up happening is in the process of those webmentions being sent, someone inevitably RSVPs to the event while the invites are still going out. I'm thinking the notifications we see should be things like "30 people were invited to Y" and "X RSVPd to Y", which means they should also be clustered by type. I think adding the type of mention as a component of the name of the clusters should solve this.

@snarfed
Copy link
Contributor

snarfed commented Aug 2, 2015

makes sense. this all sounds great!

@aaronpk
Copy link
Owner Author

aaronpk commented Aug 2, 2015

I realized we need to design the notification text before I can get much further with this. So below is a collection of the possible notification text this should generate. This is designed as plaintext first, since it will be sent mainly to plaintext channels such as IRC or push notification. In all cases, a URL will be appended that takes people to a URL for the notification on webmention.io which has this notification text as well as the full list of webmentions that were clustered for this notification.

For the plaintext representation, displaying the author name is not particularly useful, since people often put random junk into the "name" field. I think a compromise would be to show the author as "Author Name (url)" in plaintext form.

Like / Repost

  • article: {author} liked "{Article name}" {target url}
  • note: {author} liked "{Target post text truncated...}" {target url}
  • photo with caption: {author} liked your photo "{Target post text truncated...}" {target url}
  • photo no caption: {author} liked your photo {target url}
  • {author}, {author}, and 4 others liked (your photo/video) "{Target post text truncated...}" {target url}
  • {author} liked a post that linked to "{article name}" {target url}

RSVP (yes / no / maybe)

  • {author} RSVPd yes to {target event name} {target url}
  • {author} and {author} RSVPd no to {target event name} {target url}
  • {author}, {author} and 4 others RSVPd yes to {target event name} {target url}

Invite

  • {invitee} was invited to {target event name} {target url}
  • {invitee} and {invitee} were invited to {target event name} {target url}
  • {invitee}, {invitee} and 4 others were invited to {target event name} {target url}

Generic Mention

  • {author} linked to {target url}: "Source post text truncated..." {source url}
  • {author} and {author} linked to {target url}
  • {author} posted "Source post text truncated..." {source url} that linked to {target url}, {target url} and {target url}

Reply

(not clustered)

  • {author} commented on {target url}: "Source post text truncated..." {source url}

@aaronpk
Copy link
Owner Author

aaronpk commented Aug 2, 2015

GWG pointed out that you probably want immediate notification of comments, so I'm going to not cluster comments at all.

@snarfed
Copy link
Contributor

snarfed commented Aug 2, 2015

thanks for writing this up! getting the text itself right is surprisingly difficult; I'm glad you're doing it so deliberately here, and unifying it across wmio and your site, so that we'll have a(nother) solid reference design and implementation.

my one big piece of feedback is to consider linking directly to the target (or source) instead of a wmio landing page. i expect that's what users mostly want anyway. it wouldn't work for many-many, but it looks like you haven't designed the text for that...and honestly i think we maybe shouldn't try. send like a fair amount of both UX and implementation complexity for probably a very small incremental UX win.

also +1 to using author name, as discussed recently. i know it's not guaranteed good, but i think the majority of at least irc notifs right now are fb or twitter via bridgy, which have usually decent names and often opaque urls. NAME (URL) is a good compromise imho.

so excited! thanks so much for doing this!

@aaronpk
Copy link
Owner Author

aaronpk commented Aug 2, 2015

I'm planning on always having the IRC notification include a link to the webmention.io page. I agree that having a link to the target URL is good in most cases, so I updated the above to include it. Also, for comments and things that have meaningful permalinks of their own (i.e. not likes and reposts) I want to include the real source URL in the notification text as well. You can see this in the above text where I include {source url} in the notification.

@snarfed
Copy link
Contributor

snarfed commented Aug 2, 2015

discussed in irc; we think we're roughly on the same page.

@aaronpk
Copy link
Owner Author

aaronpk commented Aug 7, 2015

This is going live today! I'm going to close this issue since the basic functionality of clustering is done. Feel free to open new issues as problems surface!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants