Parse content again for ActivityPub #12573

Closed
julianlam opened this issue May 14, 2024 · 4 comments

@julianlam
Member

The ActivityStreams Vocabulary defines content thusly:

The content or textual representation of the Object encoded as a JSON string. By default, the value of content is HTML. The mediaType property can be used in the object to indicate a different content type.

This leads various implementors to send straight HTML through: the same HTML they'd use to render their frontend UI. We currently do this too, but it causes some issues:

  • Mastodon sends superfluous tags, attributes, and classes that have no meaning in NodeBB (e.g. mention, wrapping URLs in tags, etc.)
  • NodeBB link previews will send the full HTML if a preview is available, but this renders poorly on Mastodon (most of the tags get stripped out)

Discourse was sending a severely stripped down variant, but has since updated their parsing to allow some basic tags. It would be wise to follow suit at least for now:

https://socialhub.activitypub.rocks/t/content-formatting-when-federating-out/4150

@julianlam julianlam self-assigned this May 14, 2024
@julianlam
Member Author

julianlam commented May 14, 2024

The existing filter:parse.post hook can still be used, with some additional options passed in.

A good starting point might be

{
  type: enum [default, plaintext, activitypub.note, activitypub.article]
}

Whereby:

  • if type is default or falsy, then regular post parsing is carried out (so no change for existing plugins)
  • if type is activitypub.note, then a severely stripped down version is sent (minimal allowed tags, most plugins would opt out of parsing, inline images removed)
  • if type is activitypub.article, then a less stripped down version is sent (inline images, minimal allowed tags, some plugins opt out of parsing)
  • the plaintext type is added for future functionality but is outside the scope of this issue
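
As a rough illustration of how a plugin might branch on the proposed option, here is a minimal sketch of a filter:parse.post listener. The shape of the hook payload (`{ postData, type }`), the emoji plugin itself, and the names `EMOJI`, `render`, and `onParsePost` are all assumptions made for this example, not NodeBB's actual API.

```javascript
// Hypothetical emoji plugin: maps short codes to image paths (illustrative only).
const EMOJI = { ':wave:': '/assets/emoji/wave.png' };

// Replace known short codes using the supplied HTML builder; leave unknown codes as-is.
function render(content, toHtml) {
  return content.replace(/:[a-z_]+:/g, (code) => (EMOJI[code] ? toHtml(code, EMOJI[code]) : code));
}

// Assumed payload shape: { postData: { content }, type }.
async function onParsePost(data) {
  const { postData } = data;

  switch (data.type) {
    case 'activitypub.note':
    case 'plaintext':
      // Opt out of parsing entirely: inline images are not wanted here.
      return data;

    case 'activitypub.article':
      // Inline images are acceptable, but keep the markup minimal (no classes).
      postData.content = render(postData.content, (code, src) => `<img src="${src}" alt="${code}">`);
      return data;

    default:
      // 'default' or falsy: regular post parsing, unchanged behaviour.
      postData.content = render(
        postData.content,
        (code, src) => `<img class="emoji" src="${src}" alt="${code}" title="${code}">`
      );
      return data;
  }
}
```

Because a missing or falsy type falls through to the default branch, a plugin written this way behaves exactly as before for callers that never pass the option.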

@julianlam
Member Author

8cf9617

@julianlam julianlam added this to the 4.0.0 milestone May 17, 2024
@julianlam julianlam reopened this May 23, 2024
@julianlam
Member Author

The above commit allows plugins to alter behaviour based on the passed-in flag, but stricter sanitization is required at the end as well.
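
To make "stricter sanitization at the end" concrete, here is a minimal sketch of a final pass that drops any tag not on a per-type allowlist. The allowlists and the function name `stripDisallowedTags` are assumptions for illustration, and the regular expression is a simplification; a real implementation would more likely rely on a proper HTML sanitizer library.

```javascript
// Hypothetical allowlists; the actual set of permitted tags is not specified in this issue.
const NOTE_TAGS = ['p', 'br', 'a', 'strong', 'em', 'blockquote', 'code', 'pre', 'ul', 'ol', 'li'];
const ALLOWED_TAGS = {
  'activitypub.note': new Set(NOTE_TAGS),
  'activitypub.article': new Set([...NOTE_TAGS, 'img', 'h1', 'h2', 'h3']),
};

// Remove opening and closing tags that are not allowlisted for the given type.
// Text content between tags is kept. Caveat: this naive pattern does not handle
// attribute values that contain '>'.
function stripDisallowedTags(html, type) {
  const allowed = ALLOWED_TAGS[type];
  if (!allowed) {
    return html; // default parsing: nothing extra to strip
  }
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) => (
    allowed.has(name.toLowerCase()) ? tag : ''
  ));
}
```

Running this after all plugins have fired means that a plugin which ignores the type flag still cannot leak unsupported markup into federated content.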

julianlam added a commit that referenced this issue May 23, 2024
@julianlam
Member Author

An additional sanitization step was added on the way in, so that useless classes are removed before the content is saved to the database. This affects only content, not sourceContent.
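
A minimal sketch of that inbound step, assuming an incoming object with `content` and `sourceContent` fields (the object shape, the function names, and the sample markup are illustrative, not the code from the referenced commit):

```javascript
// Remove class attributes from opening tags only; text nodes are left alone.
// Caveat: regex-based and simplified; attribute values containing '>' are not handled.
function stripClasses(html) {
  return html.replace(/<[a-zA-Z][^>]*>/g, (tag) => (
    tag.replace(/\s+class\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '')
  ));
}

// Sanitize the rendered content before saving; the source is passed through untouched.
function sanitizeIncoming(note) {
  return {
    ...note,
    content: stripClasses(note.content),
  };
}

// Example with mention markup in the style Mastodon sends (illustrative values).
const incoming = {
  content: '<p><span class="h-card"><a href="https://example.social/@alice" class="u-url mention">@<span>alice</span></a></span> hi</p>',
  sourceContent: '@alice hi',
};
const saved = sanitizeIncoming(incoming);
```

Here `saved.content` keeps the link and its href but loses the `h-card`, `u-url`, and `mention` classes, while `saved.sourceContent` is identical to what was received.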
