Draft RFC 001 - Bluemoji spec #1

Open (wants to merge 5 commits into main)

aendra-rininsland
Owner

@aendra-rininsland aendra-rininsland commented Aug 18, 2024

This is the initial draft of the Bluemoji RFC. Constructive feedback is very much welcome. ✨💚

Please see rfcs/0001-bluemoji.md.

@aendra-rininsland aendra-rininsland changed the title Initial work on the 0001 RFC Draft RFC 001 - Bluemoji spec Aug 18, 2024
@aendra-rininsland aendra-rininsland marked this pull request as ready for review August 18, 2024 17:00
@aendra-rininsland aendra-rininsland marked this pull request as draft August 18, 2024 17:15
@aendra-rininsland aendra-rininsland marked this pull request as ready for review August 18, 2024 17:17
@MetaflameDragon

Really excited about this! Especially how the Bsky/AT infrastructure itself allows for such community-driven efforts and how supportive the devs are.


blue.moji.*

I'm curious, what's the reason behind making blue the top-level identifier? Are you future-proofing to add more blue record types or even let others use the same discriminator for related ideas? Why not simply bluemoji.*?


copyOf

I wanna add that bitmap comparison should be used as extra verification. The bluemoji must match its copyOf record's image to be considered valid, both to prevent false "attribution" and because it simply wouldn't make sense. I also think it would be a good idea to prefer inheritance from an emoji's existing copyOf field (so it would end up referencing the root every time), but that's an implementation detail.
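The inheritance idea is easy to state in code. The TypeScript below is a minimal sketch, not part of the draft: the `BluemojiRecord` shape (including a `blobCid` field for the image) and the `makeCopy`/`matchesOriginal` helpers are hypothetical. Comparing blob CIDs stands in for a bitmap comparison and only catches byte-identical images.

```typescript
// Hypothetical, simplified record shape; not the draft lexicon.
interface BluemojiRecord {
  uri: string;     // at-uri of this record
  alias: string;   // e.g. "party-blob"
  blobCid: string; // CID of the image blob
  copyOf?: string; // at-uri of the original record, if this is a copy
}

// Create a copy that always points at the root original:
// inherit the source's copyOf if present, otherwise point at the source itself.
function makeCopy(source: BluemojiRecord, newUri: string): BluemojiRecord {
  return {
    uri: newUri,
    alias: source.alias,
    blobCid: source.blobCid,
    copyOf: source.copyOf ?? source.uri,
  };
}

// A copy is only consistent with its original if the image data matches.
// Equal blob CIDs mean byte-identical images; visually equal but re-encoded
// images would not be detected by this check.
function matchesOriginal(copy: BluemojiRecord, original: BluemojiRecord): boolean {
  return copy.copyOf === original.uri && copy.blobCid === original.blobCid;
}
```

With this rule, a copy of a copy still carries the root's at-uri, so the chain of intermediaries is never recorded.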

I had also mentioned an idea of creating bluemoji signatures. The main benefit is that this extra attribution data would persist if the original record (what would have been in copyOf) is deleted or inaccessible. Signatures would be based on an author secret + image data/metadata, kinda like a jwt (verifiable, cannot be simply forged). Having to copy a self-signature prevents malicious attribution to someone else (you can't sign someone else's name on a malicious bluemoji, and the signature is based on image data so that has to match too). You can still right-click save and self-sign to "steal" attribution, but yeah (also mentioned below).


Both solutions suffer from the issue of unattributed copying (right-click save), but I don't think there's any true way to protect against that, as much as I'd like to let people declare bluemoji as personal/private. If you get the image pixels (or SVG shapes etc.), you can always replicate the raw data. Adding extra friction is likely pointless, since one person copying it manually would be enough to let a private bluemoji spread freely. Dunno, short of using unspecified digital ledger technology to literally hash the image data itself, or using a central authority, I think that all bluemoji should be expected to be freely copyable.

Copyright is related to this, but that would likely be dealt with the same way that uploaded images work. This is where discouraging copying would make sense ("don't copy this, it's copyrighted/private-use anyway and you'd be DMCA'd" etc.), but I don't know if it's really worth it without adding even more friction at every step and needlessly harming the experience of regular users.

@qazmlp

qazmlp commented Aug 24, 2024

> Really excited about this! Especially how the Bsky/AT infrastructure itself allows for such community-driven efforts and how supportive the devs are.
>
> blue.moji.*
>
> I'm curious, what's the reason behind making blue the top-level identifier? Are you future-proofing to add more blue record types or even let others use the same discriminator for related ideas? Why not simply bluemoji.*?

For what it's worth, everything else also uses reverse domains that its authors actually have authority over.

I assume @aendra-rininsland has registered moji.blue.
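As a quick illustration of that convention: the NSID prefix is the owned domain with its labels reversed, so owning moji.blue gives authority over blue.moji.*. The helper names below are hypothetical, and this simplified check ignores the rest of the NSID syntax rules.

```typescript
// Reverse a domain's labels to get the NSID prefix its owner controls,
// e.g. "moji.blue" -> "blue.moji".
function domainToNsidPrefix(domain: string): string {
  return domain.toLowerCase().split(".").reverse().join(".");
}

// Simplified check: does the NSID start with the reversed domain?
// (Does not validate the full NSID syntax.)
function isUnderDomain(nsid: string, domain: string): boolean {
  return nsid.startsWith(domainToNsidPrefix(domain) + ".");
}

console.log(domainToNsidPrefix("moji.blue")); // blue.moji
console.log(isUnderDomain("blue.moji.collection.item", "moji.blue")); // true
console.log(isUnderDomain("bluemoji.collection.item", "moji.blue")); // false
```

By the same rule, a bare `bluemoji.*` prefix would imply control of a `bluemoji` top-level domain.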

> copyOf
>
> I wanna add that bitmap comparison should be used as extra verification. The bluemoji must match its copyOf record's image to be considered valid, both to prevent false "attribution" and because it simply wouldn't make sense. I also think it would be a good idea to prefer inheritance from an emoji's existing copyOf field (so it would end up referencing the root every time), but that's an implementation detail.

I think that's a good idea and has precedent with how threads work. It definitely would make this easier to implement for AppViews, and the network of who copied from which intermediary shouldn't be recorded if it's not also clearly communicated to the user that this is done.

> I had also mentioned an idea of creating bluemoji signatures. The main benefit is that this extra attribution data would persist if the original record (what would have been in copyOf) is deleted or inaccessible. Signatures would be based on an author secret + image data/metadata, kinda like a jwt (verifiable, cannot be simply forged). Having to copy a self-signature prevents malicious attribution to someone else (you can't sign someone else's name on a malicious bluemoji, and the signature is based on image data so that has to match too). You can still right-click save and self-sign to "steal" attribution, but yeah (also mentioned below).

There's potentially a legal conflict there with right-to-be-forgotten implementation.

It's not a hard one as long as the signature/attribution can be stripped without further breakage downstream, but generally speaking, putting any form of PII into someone else's repository should be done as sparingly as possible and the original author should have a convenient and easy way to break the attribution unilaterally.

@qazmlp

qazmlp commented Aug 24, 2024

What's the rationale for the adultOnly property?

I normally would expect an AppView to always derive this information from the self-labels as needed, but maybe there's a legal quirk that I'm not aware of that this helps with.

@aendra-rininsland
Owner Author

aendra-rininsland commented Aug 25, 2024

Thanks both @qazmlp and @MetaflameDragon for feedback —

  • Yes, @qazmlp is correct about reverse DNS: I own the moji.blue domain, and lexicon documents will be available via the blue.moji namespace once Bluesky settles on an idiom for serving lexicon documents to clients. I also plan to run a webserver at that domain to allow for Bluemoji collection and maintenance, and possibly to facilitate animated PNG, WebP and GIF assets later on. My goal is for the format to remain usable even if moji.blue is offline; it currently doesn't rely on any moji.blue resources.

  • copyOf is intended to always reference the original record; the original value persists even if the emoji is added via another user's pack (i.e., it's inherited by later copies). It's intended to do nothing other than reference the original creator of an asset; it's not meant to validate or verify it, or to create any sort of usage graph.

  • I'm still not sure what the best behaviour is when the original creator deletes a Bluemoji; does it prevent further copying? Do packs check the copyOf field to confirm the original record still exists, and omit it from the pack if it doesn't? My feeling is that Bluemoji records need to be redundant because you don't want downstream messages to change context due to the original creator temporarily or permanently deactivating their account, which is why every user copies the record into their own repo when they want to use an emoji. That said, I'm starting to feel that it'd be a good safety feature to prevent further copying from packs if the original record is unavailable, though I'm open to feedback on that.

  • But that said, given each pack user copies the record into their own repos, if the original creator deactivated their account and someone wanted to modify their own copy of that record to remove the copyOf field, there's nothing anyone can then do to stop them from re-adding it as an original emoji to a pack. So, I suspect this implies the need for some sort of hashing or signature based off the original uploading user's DID, which then opens up PII issues as mentioned by @qazmlp.

  • Regarding malicious attribution: at the moment, copyOf doesn't contain any metadata about the original author; it is simply an at-uri pointing at another Bluemoji record. If no record exists at that account and rkey, the record was either deleted or never existed in the first place. As a result, I'm not super concerned about it at the record level.

  • Packs are a totally different story, and I thank @MetaflameDragon for helping me recognise a fairly serious issue. I see potential vulnerabilities and trade-offs in both directions. If all previews in a pack are rendered from the parent record only, the owner of the parent record can modify it, which then changes what shows up in other people's repos. This would cause unexpected behaviour and enable malware-style attacks on packs (e.g., Alice adds Mallory's emoji to her pack because it's innocuous; Mallory later replaces her asset on that record with a malicious version, which now shows up in Alice's pack). On the other hand, if pack previews are loaded only from the records in the pack owner's repo, there's currently no way to validate that a copy actually matches the record its copyOf points to. That creates a false-attribution risk: someone could create a pack full of malicious emoji and set copyOf to a target's innocuous records, making it seem like the target created those emoji. I'm currently not sure what the best solution is, other than to discourage surfacing any attribution metadata to users and to use copyOf only to disable further copying from packs.
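Because copyOf is just an at-uri, checking it amounts to splitting it into repo DID, collection and rkey and looking that record up; if nothing is found, the original was deleted or never existed. Below is a minimal, non-validating parser as a sketch. `parseAtUri` and `AtUriParts` are hypothetical names; a real client would more likely use the at-uri utilities of its ATProto library.

```typescript
// Parsed form of an at-uri such as
// "at://did:plc:abc123/blue.moji.collection.item/party-blob".
interface AtUriParts {
  authority: string;   // DID (or handle) of the repo
  collection?: string; // NSID of the record collection
  rkey?: string;       // record key within the collection
}

// Minimal parser: no query/fragment handling and no syntax validation.
function parseAtUri(uri: string): AtUriParts | null {
  const prefix = "at://";
  if (!uri.startsWith(prefix)) return null;
  const [authority, collection, rkey, ...rest] = uri.slice(prefix.length).split("/");
  // Reject an empty authority and anything with extra path segments.
  if (!authority || rest.length > 0) return null;
  return {
    authority,
    collection: collection || undefined,
    rkey: rkey || undefined,
  };
}
```

An AppView would then fetch the record at `(authority, collection, rkey)`; a missing record covers both the "deleted" and "never existed" cases described above.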

Again, thanks for bringing this all up, I've definitely thought through packs the least so your input has been really helpful! 💚

@qazmlp

qazmlp commented Aug 25, 2024

> I'm still not sure what the best behaviour is when the original creator deletes a Bluemoji; does it prevent further copying? Do packs check the copyOf field to confirm the original record still exists, and omit it from the pack if it doesn't? My feeling is that Bluemoji records need to be redundant because you don't want downstream messages to change context due to the original creator temporarily or permanently deactivating their account, which is why every user copies the record into their own repo when they want to use an emoji. That said, I'm starting to feel that it'd be a good safety feature to prevent further copying from packs if the original record is unavailable, though I'm open to feedback on that.

It definitely shouldn't be omitted. I think it's fine to copy Bluemoji without active validation that the original PDS is still online (that might cause scaling issues if something peaks in popularity?), but validating against the AppView state seems reasonable.

Denying copies if the AppView knows the account is deactivated or that the original has been deleted is a good idea in my eyes, and that should then also apply if an AppView (recoverably) fails to initially resolve the record on demand.

What if the original has become mismatched? Should the app deny the copy and offer to copy the updated original instead?

> But that said, given each pack user copies the record into their own repos, if the original creator deactivated their account and someone wanted to modify their own copy of that record to remove the copyOf field, there's nothing anyone can then do to stop them from re-adding it as an original emoji to a pack. So, I suspect this implies the need for some sort of hashing or signature based off the original uploading user's DID, which then opens up PII issues as mentioned by @qazmlp.

Worse, it introduces a race condition regarding which "original" is resolved first by an initialising AppView. You'd need either a record-keeping authority or a distributed ledger to solve this, which… likely does more harm than good from a technical point of view.

There'd also be the case where the first "original" is infringing or bridged in from a different network, and that shouldn't prevent the original author from uploading their identical copy.

> Regarding malicious attribution: at the moment, copyOf doesn't contain any metadata about the original author; it is simply an at-uri pointing at another Bluemoji record. If no record exists at that account and rkey, the record was either deleted or never existed in the first place. As a result, I'm not super concerned about it at the record level.

It may be a good idea to add some simple guidelines on user-facing presentation for cases where the original isn't available/active and how an emoji enters and exits that state.

Should apps show only "original missing" or should they distinguish "original deactivated" and/or "original mismatched"? I assume they shouldn't show the author information in the former two cases.
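One way to encode the states discussed here is sketched below. The state names, labels and policy (deny copies when the original is deleted, deactivated or unresolvable; flag a mismatch so the app can offer the updated original instead; hide author information for missing and deactivated originals) follow the suggestions in this comment and are assumptions, not something the draft RFC specifies.

```typescript
// States an AppView might report for a copy's original record.
type OriginalState =
  | { kind: "active"; matches: boolean } // exists; does the image still match the copy?
  | { kind: "deleted" }                  // record no longer exists
  | { kind: "deactivated" }              // author's account is deactivated
  | { kind: "unresolved" };              // lookup failed, possibly temporarily

interface CopyDecision {
  allowCopy: boolean;  // may the user copy this emoji into their own repo?
  showAuthor: boolean; // may the UI show the original author?
  label: string;       // user-facing status text ("" when there is nothing to show)
}

function decideCopy(state: OriginalState): CopyDecision {
  if (state.kind === "active") {
    return state.matches
      ? { allowCopy: true, showAuthor: true, label: "" }
      : { allowCopy: false, showAuthor: true, label: "original mismatched" };
  }
  if (state.kind === "deactivated") {
    return { allowCopy: false, showAuthor: false, label: "original deactivated" };
  }
  // "deleted" and "unresolved" are both presented as missing.
  return { allowCopy: false, showAuthor: false, label: "original missing" };
}
```

Collapsing "deleted" and "unresolved" into one label is a design choice; an app that wants to distinguish them only needs another branch.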

@aendra-rininsland
Owner Author

I may break this proposal into two normative parts: the record and sharing. The sharing stuff seems a bit more contentious, so I should probably separate it out from the implementation section.

@ngerakines

Slack has a 128 KB limit on uploaded emojis. Keeping the same limit could encourage people to use some of the Slack emoji packs that adhere to the same size limit.

Reference: https://slack.com/help/articles/206870177-Add-custom-emoji-and-aliases-to-your-workspace#custom-emoji

@aendra-rininsland
Owner Author

@ngerakines The use case on Slack is a bit different: Slack is centralised, so a custom emoji on Slack effectively only has to be downloaded once. With Bluemoji, copies of the same emoji might be used by many different people and so would need to be downloaded many times, ergo reducing the size of that payload where possible is important IMO. Tbh, in general I would like to disincentivise people from using GIF as a format for emoji (it's heavier and generally looks worse than the alternatives), but I suspect not supporting it would slow adoption.

@qazmlp

qazmlp commented Aug 28, 2024

Apps are likely to deduplicate these on their end, I think, since they'll want to use hash-based URLs anyway in order to avoid a cache query round-trip. They might also re-encode or at least minify them.

That said, the issue remains that each user is likely to see many more distinct emoji over time than on Slack, so their size remains a larger problem than there.
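A sketch of that deduplication: since a blob CID is a content hash, a client can key its image cache by CID, so any number of users' copies of the same emoji trigger a single fetch. `EmojiImageCache` and the injected `fetchBlob` function are hypothetical.

```typescript
// Client-side image cache keyed by blob CID. Identical images share a CID,
// so copies of one emoji across many repos are fetched only once.
class EmojiImageCache {
  private byCid = new Map<string, Promise<Uint8Array>>();
  fetchCount = 0; // number of actual network fetches started

  constructor(private fetchBlob: (cid: string) => Promise<Uint8Array>) {}

  get(cid: string): Promise<Uint8Array> {
    let pending = this.byCid.get(cid);
    if (!pending) {
      this.fetchCount++;
      pending = this.fetchBlob(cid).catch((err: unknown) => {
        // Drop failed fetches so a later call can retry.
        this.byCid.delete(cid);
        throw err;
      });
      // Cache the in-flight promise so concurrent requests share it.
      this.byCid.set(cid, pending);
    }
    return pending;
  }
}
```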

@qazmlp

qazmlp commented Aug 28, 2024

For what it's worth, the upload limit appears to be 256KB on Mastodon (which is probably on the low side for ActivityPub software) and Discord's seems to be the same. They're both less likely than ATProto apps to experience a build-up in any one place with limited resources, though.
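For comparison, the limits mentioned in this thread can be put in a small table. A Bluemoji limit of 128 KB would admit Slack-sized packs while keeping every Bluemoji uploadable to all three platforms. The numbers come from the comments above and assume 1 KB = 1024 bytes; each platform's own documentation is authoritative, and `platformsExceeded` is a hypothetical helper.

```typescript
// Upload limits mentioned in this thread, in bytes (assuming 1 KB = 1024 bytes).
const UPLOAD_LIMITS = {
  slack: 128 * 1024,
  mastodon: 256 * 1024,
  discord: 256 * 1024,
} as const;

type Platform = keyof typeof UPLOAD_LIMITS;

// Return the platforms whose upload limit the given image size exceeds.
function platformsExceeded(sizeBytes: number): Platform[] {
  return (Object.keys(UPLOAD_LIMITS) as Platform[]).filter(
    (p) => sizeBytes > UPLOAD_LIMITS[p],
  );
}

console.log(platformsExceeded(200 * 1024)); // [ 'slack' ]
```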

@mihailik

mihailik commented Sep 6, 2024

Wondering if that :colon-delimited: moniker limited to the Latin alphabet is reasonable?

Every language except English will have to transliterate. It could be very annoying for people who don't use the Latin alphabet in their daily lives.

@mihailik

mihailik commented Sep 6, 2024

Separately, the regex for the :colon-delimited: format seems to allow dashes on the outside (:-like-this-:) and dashes only (:---:).

@qazmlp

qazmlp commented Sep 6, 2024

> Wondering if that :colon-delimited: moniker limited to the Latin alphabet is reasonable?
>
> Every language except English will have to transliterate. It could be very annoying for people who don't use the Latin alphabet in their daily lives.

Interestingly, the Japanese text emoji on misskey.io do use :romaji:, but I wonder if that isn't only a legacy thing or an expectation established by limitations of other platforms. It would definitely mess with readability of the fallback and screen reader compatibility to restrict the string to Latin characters.

For what it's worth, Windows's emoji picker is fully localised and only accepts search input according to the currently selected input locale (much to my annoyance). In terms of compatibility, neither Mastodon nor Misskey appear to specify Latin in their respective features (but I have not checked the source code!).


Personally, I think the character set limit is not a good idea because it would disadvantage speakers of languages that are hard to transliterate. Due to the use of facets, there also doesn't seem to be an interop hazard regarding Unicode normal forms here.

I think it would be a good idea to suggest implementations give the opportunity to rename an emoji in the user's own library and when copying it into there from elsewhere. The UI for showing the original should probably also show the original's (current) shortcode, then.

@mfnboer
Copy link

mfnboer commented Sep 8, 2024

First of all: really cool to see what you have created here!!

I'd drop the recommendation to use the Dotted Circle as a fallback and only strongly recommend displaying :alias-name:. This is what applications that do not implement blue.moji would do as well.

Maybe add an optional fallback field of type string to blue.moji.collection.item. This could, for example, be set to a regular Unicode emoji. An implementor may choose to display this if there is no support for the format, e.g. Lottie.
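A rendering sketch of this suggestion: use the first image format the client supports, otherwise fall back to the optional `fallback` string, otherwise to the colon-wrapped alias. The `fallback` field is a proposal here rather than part of the draft, and the `EmojiItem` shape, `renderEmoji` and the format names are illustrative.

```typescript
// Hypothetical item shape; "fallback" is the proposed optional field.
interface EmojiItem {
  alias: string;     // e.g. "party-blob"
  formats: string[]; // formats available for this emoji, e.g. ["png", "lottie"]
  fallback?: string; // e.g. a Unicode emoji such as "🎉"
}

type Rendered =
  | { kind: "image"; format: string }
  | { kind: "text"; text: string };

function renderEmoji(item: EmojiItem, supported: ReadonlySet<string>): Rendered {
  // Prefer the first format this client can actually display.
  const format = item.formats.find((f) => supported.has(f));
  if (format) return { kind: "image", format };
  // No usable format: use the author-supplied fallback, else the colon-wrapped alias.
  return { kind: "text", text: item.fallback ?? `:${item.alias}:` };
}
```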

What is the original field in blue.moji.collection.item#formats_v0 for?

The cid in a facet can be used to form an https URL to get an image from the Bluesky CDN. I guess that is fine as long as Bluesky is the only atproto network. Once there are more, how would a client know that it should use the Bluesky CDN? Wouldn't it be better to place an https URL to the CDN itself in the facet? A sending client in the Bluesky network would put a Bluesky CDN URL in the facet, and a sending client in network N would put the N CDN URL in the facet. The receiving client just uses the URL.

Why is adultOnly on blue.moji.collection.defs#itemView, but labels is on blue.moji.collection.defs#collectionView? I would expect them both on the same level.

blue.moji.pack.record#items is an array of blue.moji.collection.defs#itemView. I don't think there should be references to a view in a record. I think this record works like a list in atproto: a list record does not have an array of all items in the list; instead, the listitems point back to the list they belong to (at-uri). That way, items can be added to or removed from the list without updating the list record itself. So I think blue.moji.pack.record#items can/should be dropped.

blue.moji.pack.listitem#subject refers to a view. I think that is wrong. I'd expect an at-uri to a blue.moji.collection.item record.
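The list pattern referred to here can be sketched as follows. In Bluesky's own lexicons, an `app.bsky.graph.listitem` record carries a `list` at-uri pointing back at its list, plus a `subject`; for packs, per the suggestion above, `subject` would be an at-uri of an item record. The record shapes and `itemsInPack` are hypothetical.

```typescript
// Hypothetical pack record: note there is no array of items.
interface PackRecord {
  uri: string;
  name: string;
}

// Hypothetical pack listitem: points back at its pack and at an item record.
interface PackListItem {
  uri: string;
  list: string;    // at-uri of the pack record
  subject: string; // at-uri of a blue.moji.collection.item record
}

// Collect the item URIs belonging to a pack. Adding or removing an item only
// creates or deletes a listitem record; the pack record itself never changes.
function itemsInPack(pack: PackRecord, listItems: PackListItem[]): string[] {
  return listItems.filter((li) => li.list === pack.uri).map((li) => li.subject);
}
```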

blue.moji.collection.defs#collectionView seems not to be used.

blue.moji.collection.defs#itemView can probably be dropped if my concerns on blue.moji.pack.record are valid.

@tom-sherman
Copy link

tom-sherman commented Sep 13, 2024

> The cid in a facet can be used to form an https URL to get an image from the Bluesky CDN. I guess that is fine as long as Bluesky is the only atproto network. Once there are more, how would a client know that it should use the Bluesky CDN? Wouldn't it be better to place an https URL to the CDN itself in the facet? A sending client in the Bluesky network would put a Bluesky CDN URL in the facet, and a sending client in network N would put the N CDN URL in the facet. The receiving client just uses the URL.

I've raised before that there should be room in the AT URI syntax for CIDs, this would solve this problem quite elegantly but for now you could use a strong ref type that includes a URI and a CID.

See thread
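A strong reference pairs a record's at-uri with the CID of the exact record version, the same shape as `com.atproto.repo.strongRef`. The at-uri names the author's repo independently of any network's CDN, and the CID lets a receiver detect that the record has changed since the post was made. The feature shape and helper names below are hypothetical; fetching the image blob itself is outside this sketch.

```typescript
// Same shape as com.atproto.repo.strongRef.
interface StrongRef {
  uri: string; // at-uri of the Bluemoji record
  cid: string; // CID of the exact record version referenced
}

// Hypothetical facet feature carrying a strong ref instead of a CDN URL.
interface HypotheticalBluemojiFeature {
  alias: string;
  ref: StrongRef;
}

// The repo to fetch from is named in the at-uri itself, not by any CDN.
function repoDidOf(ref: StrongRef): string {
  return ref.uri.replace(/^at:\/\//, "").split("/")[0];
}

// True only if the record currently resolves to the version the post referenced.
function refIsCurrent(ref: StrongRef, currentCid: string | undefined): boolean {
  return currentCid === ref.cid;
}
```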


Regarding sharing emoji: We're thinking about building Communities into Frontpage and I think custom emoji would form a part of this. We're looking to support adding emoji to a community and attributing emoji to a community. This is a slightly weaker definition of attribution because it's only there as a way to signify that you're part of a community when posting somewhere else; it's a badge on a tote bag.

All this to say that I think the ability to use someone else's emoji in your record is a really important and powerful thing.

@qazmlp

qazmlp commented Sep 13, 2024

> Regarding sharing emoji: We're thinking about building Communities into Frontpage and I think custom emoji would form a part of this. We're looking to support adding emoji to a community and attributing emoji to a community. This is a slightly weaker definition of attribution because it's only there as a way to signify that you're part of a community when posting somewhere else; it's a badge on a tote bag.
>
> All this to say that I think the ability to use someone else's emoji in your record is a really important and powerful thing.

You may want to look at very low-friction copying from communities instead, I think.

Discord gets around the "controlled by someone else" problem by using unique IDs internally I believe, so if an emoji is deleted by a community/server, that shouldn't automatically make it unavailable in past messages there if I'm not mistaken. (I haven't tested this.)

It would be a problem if Bluemoji didn't have an equivalent property.


Would communities be ATProto profiles? If so the Bluemoji could be created by/attributed to that community profile and its bio could link to the community as seen through Frontpage under the existing schema, I believe.

@aendra-rininsland
Owner Author

> Wondering if that :colon-delimited: moniker limited to the Latin alphabet is reasonable?
>
> Every language except English will have to transliterate. It could be very annoying for people who don't use the Latin alphabet in their daily lives.

I tried to write a better regex for this:

`(?<=:)((?:[^-\s]+-?)+)(?<!-)(?=:)`

...But then I realised rkeys are restricted to Latin characters, alas. See:

https://atproto.com/specs/record-key
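Assuming, as this comment implies, that the alias is also used as the record key, every alias has to pass the record-key syntax check. Below is a sketch based on the rules in the linked spec (1 to 512 characters from A-Z, a-z, 0-9, `.`, `-`, `_`, `:` and `~`, excluding the values `.` and `..`); `isValidRkey` is a hypothetical helper, and the spec itself remains authoritative.

```typescript
// Record-key syntax as described in the ATProto record-key spec:
// 1-512 characters from A-Z, a-z, 0-9, ".", "-", "_", ":" and "~".
const RKEY_PATTERN = /^[A-Za-z0-9._:~-]{1,512}$/;

function isValidRkey(rkey: string): boolean {
  // "." and ".." match the character class but are explicitly disallowed.
  return RKEY_PATTERN.test(rkey) && rkey !== "." && rkey !== "..";
}

console.log(isValidRkey("party-blob")); // true
console.log(isValidRkey("パーティー")); // false: non-ASCII characters
```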

@aendra-rininsland
Owner Author

aendra-rininsland commented Dec 7, 2024

> Separately, the regex for the :colon-delimited: format seems to allow dashes on the outside (:-like-this-:) and dashes only (:---:).

Is this any better?

`^\b(?<!-)[a-z0-9-]+(?!<-)\b$` and does not include `:` characters; when

@aendra-rininsland
Owner Author

I have updated the RFC and created a second RFC for sharing.

Please provide feedback on the most recent diff. My goal is to get the base implementation solid and add sharing secondly.


@Tamschi Tamschi left a comment

This is a partial review for e7496bb..43fa031.

wrapped in `:` character (e.g., `:alias-name:`), hereafter referred to as a
"colon-wrapped alias" or simply "dotted-alias".
to render the Bluemoji. The record name itself must match
`^\b(?<!-)[a-z0-9-]+(?!<-)\b$` and does not include `:` characters; when

Which RegExp syntax is this? In JS this should probably be `^(?!-)[a-z0-9-]+(?<!-)$`, using a negative lookahead and a negative lookbehind for `-`, respectively. `\b` isn't needed at the start or end of the string.
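To compare the two patterns, the snippet below runs both against a few aliases, including the `:-like-this-:` and `:---:` cases raised earlier (tested without the colons). In the draft pattern, the `\b` anchors already reject a leading or trailing `-`, since `-` is not a word character; `(?<!-)` at the very start and `(?!<-)` (a lookahead for the literal text `<-`) at the very end can never fail there, so they have no effect. Both patterns therefore give the same results on these samples, and the suggested one simply states the intent directly.

```typescript
// Pattern from the draft RFC, as quoted in the review comment.
const draftPattern = /^\b(?<!-)[a-z0-9-]+(?!<-)\b$/;
// Suggested JS pattern: negative lookahead/lookbehind forbid an edge "-".
const suggestedPattern = /^(?!-)[a-z0-9-]+(?<!-)$/;

// Aliases are tested without their surrounding ":" characters.
const samples = ["party-blob", "-like-this-", "---", "a", "a--b", "Party", ""];

for (const s of samples) {
  // Prints the sample followed by the draft and suggested results.
  console.log(JSON.stringify(s), draftPattern.test(s), suggestedPattern.test(s));
}
```

Note that both patterns still accept consecutive dashes such as `a--b`; forbidding those would need an extra rule.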

approach that may become popular as more clients begin to support Bluemoji in
the future, especially given that the facet contains more accessibility
information than the colon-wrapped alias provides.
there's no way of enforcing that given how ATProto facets work.

@Tamschi Tamschi Dec 9, 2024

Removing the non-descriptive fallback from previous revisions here seems good to me 👍

The full fallback may be a bit annoying with the low character limit in the protocol, but that's app-specific to Bluesky, so I agree it's best not to suggest a less useful alternative in the general case.

8 participants