Blue Sky's composer should gather social graph data for PDFs and other non-HTML content too #1672

Open
mlissner opened this issue Oct 11, 2023 · 2 comments
Labels
bug (Something isn't working), x:on-the-roadmap (We're planning to do this but it may be a bit)

Comments

@mlissner

Right now, if you put a link ending in .html into the composer (on Web), and ask the website to generate a card, you can watch the network panel make a request to https://cardyb.bsky.app/v1/extract.

For example, this URL makes a request in the network panel when you ask to make a card:

https://foo.com/foo.html

But these URLs, ending in .jpeg, .png, .xml, and .pdf, do not:

https://foo.com/foo.jpeg

https://foo.com/foo.png

https://foo.com/foo.xml

https://foo.com/foo.pdf
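For illustration, the pattern above is what a check on the URL's suffix would produce. The following is only a guess at the shape of such a check, not Blue Sky's actual code (which has not been located in this thread). `would_request_card` and `SKIPPED_SUFFIXES` are hypothetical names, and the suffix list contains only the endings tested here.

```python
from urllib.parse import urlsplit

# Assumption: only the endings observed in this report; the real list is unknown.
SKIPPED_SUFFIXES = (".jpeg", ".png", ".xml", ".pdf")


def would_request_card(url: str) -> bool:
    """Return True if a suffix-based check would let the card request go out."""
    # urlsplit does not percent-decode, so the path is compared as typed.
    path = urlsplit(url).path.lower()
    return not path.endswith(SKIPPED_SUFFIXES)


print(would_request_card("https://foo.com/foo.html"))  # request is made
print(would_request_card("https://foo.com/foo.pdf"))   # request is skipped
```

A check like this never contacts the server, so it cannot see a redirect or the real content type.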

I understand the reasoning: In theory, those file endings tell Blue Sky that the content will not have Open Graph information, since that information only exists in HTML content.

That theory is correct, but the website I run shares millions of PDFs, and we have a neat hack in place to help fight misinformation and give our users better details. When we detect an Open Graph crawler, we redirect it to an HTML page with Open Graph data; everyone else gets the PDF. I know that DocumentCloud also uses this trick.

This works on Twitter, Facebook, Slack, Mastodon and a bunch of other sites. As far as I know, Blue Sky is the only one where it doesn't work.
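A minimal sketch of that kind of crawler redirect, assuming detection is done by matching the User-Agent header. This is not the site's actual implementation: the token list, `is_open_graph_crawler`, and `choose_response` are all illustrative names.

```python
from typing import Tuple

# Assumption: substrings commonly seen in link-preview crawler User-Agents.
# A real deployment would maintain its own list.
CRAWLER_TOKENS = ("facebookexternalhit", "twitterbot", "slackbot", "mastodon")


def is_open_graph_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent: str, pdf_url: str, summary_url: str) -> Tuple[int, str]:
    """Return (status, location): redirect crawlers to HTML, serve the PDF otherwise."""
    if is_open_graph_crawler(user_agent):
        return 302, summary_url  # HTML page that carries the Open Graph tags
    return 200, pdf_url          # ordinary visitors get the document itself


print(choose_response("Twitterbot/1.0", "/doc.pdf", "/doc-summary.html"))
```

The trick only helps if the crawler actually requests the `.pdf` URL, which is the step that is skipped here.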

To Reproduce

  1. Paste this link into the web composer: https://foo.com/f.html

  2. Open the browser's network panel.

  3. Press the button in the composer to get the card.

  4. Note that it returns an error (the link doesn't work), and that you see a request in the network panel:

    (screenshot of the request in the network panel)

  5. Change the URL to https://foo.com/f.pdf

  6. Press the button in the composer to get the card.

  7. Note that it makes no request and throws no error.

Expected behavior

Blue Sky should request the URL regardless of the file ending and check whether the response is actually HTML or a PDF. Heck, some horribly misconfigured website might end links with .pdf even when serving HTML. :)
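One way to sketch that behavior: follow the link, then decide from the response's Content-Type header instead of the URL. The function name and the set of accepted types below are assumptions for illustration, not anything taken from Blue Sky's code.

```python
from typing import Optional

# Assumption: media types treated as "HTML that may carry Open Graph tags".
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def should_extract_card(content_type_header: Optional[str]) -> bool:
    """Decide from the Content-Type of the final response (after redirects)."""
    if not content_type_header:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


print(should_extract_card("text/html; charset=utf-8"))
print(should_extract_card("application/pdf"))
```

With this approach a `.pdf` link that redirects crawlers to an HTML page is handled the same way as any other HTML link.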

Details

  • Platform: Firefox on Linux
  • Platform version: 118

Additional context

This bug is a bit of a bummer because one of the things that drove me to Blue Sky is that Twitter removed headlines. This bug means that links to my website don't get cards or headlines here either. Darn!

I took a look around the code, but couldn't find where this is done. If somebody sends a pointer, we've got technical folks and volunteers who could help with this.

@mlissner mlissner added the bug Something isn't working label Oct 11, 2023
@mlissner
Author

Well, OK, there's a workaround if you're the one posting the link, but it's still broken for everybody that's not this clever.

You can substitute %2e for the last period in your link. This works:

https://storage.courtlistener.com/recap/gov.uscourts.flsd.648654/gov.uscourts.flsd.648654.3.0%2epdf

Not, um, exactly great, but it's something!
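For anyone who wants to script the workaround, here is a small sketch that percent-encodes the last period of the final path segment. `encode_last_dot` is a made-up helper name, and this only reproduces the manual substitution described above.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_last_dot(url: str) -> str:
    """Replace the last '.' in the final path segment with '%2e'."""
    parts = urlsplit(url)
    directory, slash, filename = parts.path.rpartition("/")
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        return url  # no period in the file name, nothing to change
    new_path = f"{directory}{slash}{stem}%2e{extension}"
    return urlunsplit(parts._replace(path=new_path))


print(encode_last_dot("https://foo.com/f.pdf"))
```

Since `%2e` is just the percent-encoded form of a period, the rewritten link should resolve to the same resource, though this has only been tried on the link above.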

@pfrazee pfrazee added the x:on-the-roadmap We're planning to do this but it may be a bit label Oct 25, 2023
@mlissner
Author

mlissner commented Feb 9, 2024

One other thought here. It isn't part of Open Graph, but I've always thought it would be nice to serve Open Graph data via headers. In fact, I think Facebook must have gotten distracted while building the spec, and just didn't get around to this.

If BlueSky supported this one day, it'd make it possible to return detailed information and thumbnails when serving binary content.

(I've been banging this drum for a decade or so.)
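To make the idea concrete, here is a purely hypothetical sketch of what a client could do if such headers existed. No standard defines them: the `X-OG-` prefix and the `open_graph_from_headers` helper are invented for this example.

```python
from typing import Dict

# Hypothetical: Open Graph itself defines no HTTP headers; this prefix is made up.
HEADER_PREFIX = "x-og-"


def open_graph_from_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Collect 'X-OG-*' response headers into og:* properties."""
    properties = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(HEADER_PREFIX):
            properties["og:" + lowered[len(HEADER_PREFIX):]] = value
    return properties


response_headers = {
    "Content-Type": "application/pdf",
    "X-OG-Title": "Example filing",
    "X-OG-Image": "https://foo.com/thumb.png",
}
print(open_graph_from_headers(response_headers))
```

A crawler could read these from the response to a PDF or image request without needing an HTML page at all.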
