Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve embeds and fix Firefox problems with the scroll position on media preview #689

Closed
wants to merge 4 commits into from

Conversation

BentiGorlich
Copy link
Member

@BentiGorlich BentiGorlich commented Apr 8, 2024

When using firefox and automatic media preview was enabled the scroll position was never reliably restored. So when you would browse the entry list and then open an entry and then navigate back you would most likely end up on the bottom of the page, because our links did not have an id and the previews were expanded after the site was loaded.

So what this PR does:

  • add ids to the list links of entries and posts
  • stop the automatic toggling of the button and instead actually render the page with the preview
  • fetching the embed urls on entry creation already happens, the result was saved to cache, that is now saved to the db so we do not fetch the same url twice

add an id to the `a` element of an entry in a list. That leads to firefox being able to jump to the correct position on the site when going back. That is mostly a problem when auto media preview is enabled
- when an entry is created and the url is fetched for any embedded stuff, it previously got discarded. Save it in the db now
- add additional fields to save title, description, image and html returned from the fetched embed
@BentiGorlich BentiGorlich added enhancement New feature or request backend Backend related issues and pull requests labels Apr 8, 2024
@BentiGorlich BentiGorlich added this to the v1.6.0 milestone Apr 8, 2024
@BentiGorlich BentiGorlich self-assigned this Apr 8, 2024
@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

don't want to be a stick in the mud again but I'm not too on board with storing fetched embed data (especially html) in the db:

  • depending on the linked resource and how the embed library extracts the data, the link may be pointed to a resource that can be updated after it was first made/post/fetched, and so the extracted embed body can change accordingly, but by storing the embed in the db on first fetch and never fetch it again you could miss the later updates if it was made (remember, this thing can do more than wrapping image link in an <img> tag). sure you could add some kind of tracking and made it refetch once in a while but that means more code complications
  • the first fetch is not exactly tossed away, it's cached per url, though if you feel like 1h cache ttl is the same as "discarded after fetch" then maybe it's better to raise this up a bit, maybe like 3h or 6h to extend the lifetime for the first embed fetch content, reduce amount of refetch while still allow for updated embed to be fetched if the source link contents has been updated
  • this is more of a value judgement but given the nature of these fetched embed data and how it's used, it's the kind of data that's more meaningful on the frontend/users side, and very little uses for the backend, do you really need to store these embed html for effectively eternity although the bulk of its uses is more likely to be within the first week after it was posted? yes it won't grow as fast as the cached images but I'm afraid it'll go the same way our (remote) image handling is, actually this may be worse because it's almost always remote data here

also, if you still want to store the fetched embed html in the db anyway, It'd be great if you could split that changes into a new PR: with the first problem you identified and wanted to fix, not only this change does not answer that problem, this is a different problem perceived with potentially its own solutions altogether, tacking them into the same PR/unit of changes is ... not good

if I wanted to be even more pedantic, the "add ids to links so firefox could come back" and "immediately render embed in twig when auto preview is enabled" could be split into 2 more patches since it answer a slightly different problems, but I can let it slide with how the current problem could be described as a combination of those two problems together, like:

in firefox, page position cannot be restored reliably when navigating back and forth due to missing ids, which is exacerbated by auto media preview expanding preview live which cause even more layout shifts

if this is the thing you're trying to fix, it's quite hard to see that how storing embed in db help solving that problem

@BentiGorlich
Copy link
Member Author

Yeah I always put too much stuff into one PR your right. Mostly its related by topic or by problems created by doing something in a new way, but yeah I can see that it should have been 2-3 PRs.

Regarding the storing: for images I totally get it, they can change (although I doubt they change that often but that is only my gut feeling)
The problems with putting them in the cache are

  • the cache is stored in RAM, we are already a very RAM heavy service, the longer ttl on embeds would grow that even more

The biggest problem with a ttl on an embed are because of the initial rendering with it:
The page can take forever to load, because it is a serial operation and the server needs to fetch all embeds now before the user sees anything, thats why I did the changes in the first place.

Maybe I can just do the link id stuff and see how this is behaving. If that already fixes most of my problem when browsing on my phone (where I have auto preview enabled) then the other changes are not needed.

@BentiGorlich
Copy link
Member Author

I think the main problem is that we cannot say how often an embed changes. Html iFrame element probably never change, images probably change only very rarely, I'd say the url on a post changes when the image changes (mostly)

Comment on lines +5 to +14
namespace App\Controller;

use App\Repository\EmbedRepository;
use App\Utils\Embed;
use Psr\Log\LoggerInterface;
use Symfony\UX\TwigComponent\Attribute\AsTwigComponent;
use Symfony\UX\TwigComponent\Attribute\PostMount;

#[AsTwigComponent('embed', template: 'components/embed.html.twig')]
class EmbedController extends AbstractController
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a twig component though? it should go in App\Twig\Components (src/Twig/Components/EmbedComponent.php) and unless you somehow also need this to act as a controller it shouldn't extend the AbstractController and be named appropiately

Suggested change
namespace App\Controller;
use App\Repository\EmbedRepository;
use App\Utils\Embed;
use Psr\Log\LoggerInterface;
use Symfony\UX\TwigComponent\Attribute\AsTwigComponent;
use Symfony\UX\TwigComponent\Attribute\PostMount;
#[AsTwigComponent('embed', template: 'components/embed.html.twig')]
class EmbedController extends AbstractController
namespace App\Twig\Components;
use App\Repository\EmbedRepository;
use App\Utils\Embed;
use Psr\Log\LoggerInterface;
use Symfony\UX\TwigComponent\Attribute\AsTwigComponent;
use Symfony\UX\TwigComponent\Attribute\PostMount;
#[AsTwigComponent('embed', template: 'components/embed.html.twig')]
class EmbedComponent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right 👍

@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

Regarding the storing: for images I totally get it, they can change (although I doubt they change that often but that is only my gut feeling)

the main reason I brought up the image storage is less of that they may change, but rather the need/desire to store caches of almost all remote contents for basically forever, and right now we sort of ended up in a situation where we can't clear the remote image caches out cleanly because many parts has assumed that the backing image will always exists

also, even when the image content change, that should results in a different image content hashes and our image system seem to be handling that case fairly well


I think the main problem is that we cannot say how often an embed changes. Html iFrame element probably never change, images probably change only very rarely, I'd say the url on a post changes when the image changes (mostly)

agree on this, however with the current way embed are fetched and used, there's no similar guarantees or a good way to tell differences between what current remote content and our cached copy is, or if there's any

you could say that I'm arguing for one specific edge case here, but I feel like this one edge case is also where the difference is potentially way more obvious to the users when it happens:

enter github gists, it looks like these can be updated after first publish to a certain extent, you may see something like "last updated ago", along with the revision list showing the changes when viewing some gists

and while the "github recommended" way of getting embed gist is to use <script> tag it gives out that'll bootstrap the embed for you, they also have json endpoint where it could give out an already formatted html string and corresponding stylesheet to render your own embed gist element, and here's the kicker:

and hence my concern about storing the embed html into db that's fetched once and never again, because it assumes that the content at one url never changes although that may not be the case


The problems with putting them in the cache are

  • the cache is stored in RAM, we are already a very RAM heavy service, the longer ttl on embeds would grow that even more

The biggest problem with a ttl on an embed are because of the initial rendering with it: The page can take forever to load, because it is a serial operation and the server needs to fetch all embeds now before the user sees anything, thats why I did the changes in the first place.

I think I can understand this point, though with this and the above concern, it might be a good idea to add some kind of background embed cache update, similar to the actor auto update system, though I guess we might ended up running into the same problem as #620 but in a different place 😅

also it appears that it's possible to async recompute the symfony cache going by this docs and it's also possible to make a cache pool backed by doctrine dbal, but from intial skimming it looks like the amout of additional setup to add may not be too different from doing it manually so might as well continuing down the manual path since you already got some of it in


Yeah I always put too much stuff into one PR your right. Mostly its related by topic or by problems created by doing something in a new way, but yeah I can see that it should have been 2-3 PRs.

I think this conversation here is a good example case on one of the reasons why one should avoid making a PR that tries to fix multiple problems at once: the initial problem of this PR was this:

When using firefox and automatic media preview was enabled the scroll position was never reliably restored. So when you would browse the entry list and then open an entry and then navigate back you would most likely end up on the bottom of the page, because our links did not have an id and the previews were expanded after the site was loaded.

but instead of getting to focus on the problem or the potential solution that fixes the stated problem, here I am going on another raving "old way or highway" on a different but only tangentially related problems instead, but I couldn't simply ignore it because it also needs to be reviewed

@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

actually, do you have any explanations, docs or keywords to search on why adding ids to links help with reliably restoring navigation position? this feels like magic or (ab)using firefox quirks to get this done more than a proper "fix", but if it works it works, and if it must need this vendor specific workaround then so be it (there's another firefox specific workaround some where else in this project so I could understand the needs)

@e-five256
Copy link
Member

I agree with asdfzdfj that I don't think storing embed in the DB is a good idea. If we store bad data, we store it forever

@e-five256
Copy link
Member

The weird thing is I'm a firefox user with infinite scrolling on and never ran into any problems entering entries and hitting the back button. I just turned on auto media preview and I still don't have any issues, all the previews are still there and my view is exactly where I was before

@e-five256
Copy link
Member

e-five256 commented Apr 9, 2024

maybe I'm misunderstanding the issue? (can see scroll bar on right side)

scroll super far down in pages open entry do nothing but browser back
image image image

@e-five256
Copy link
Member

e-five256 commented Apr 9, 2024

I see it works fine on kbin.run but not on your site. Maybe there's a difference in cache headers or something? I've been trying to find a difference but not sure yet. When I hit the back button on kbin.run, almost nothing loads whatsoever (only 3 requests for like the favicon), when I go back on your site it refires basically every single network request for all images

on an entry and hitting back button on kbin.run (3 requests)
image

on an entry and hitting back button on your site (71 requests)
image

@e-five256
Copy link
Member

e-five256 commented Apr 9, 2024

Your media. subdomain responds with 200s, maybe changing that to 301 would help? Not positive that's all there is though since networks shows 301s for you but absolutely no requests besides header stuff for kbin.run

@BentiGorlich
Copy link
Member Author

actually, do you have any explanations, docs or keywords to search on why adding ids to links help with reliably restoring navigation position? this feels like magic or (ab)using firefox quirks to get this done more than a proper "fix", but if it works it works, and if it must need this vendor specific workaround then so be it (there's another firefox specific workaround some where else in this project so I could understand the needs)

Sadly I don't have anything concrete. My guess is that firefox remebers on which element I clicked and if it has an id it stores that somehow in the history. I just tinkered with it and this worked...

Your media. subdomain responds with 200s, maybe changing that to 301 would help? Not positive that's all there is though since networks shows 301s for you but absolutely no requests besides header stuff for kbin.run

What would I change to 301s?
Maybe its because gehirneimer uses s3 storage? I have a reverse proxy on the media domain to my s3 endpoint. I thought that it sends out cache headers, they are configured... I copied that config over from mastodon. My Browser console also says 200, but in the "transferred" column it says "from cache"...

@BentiGorlich
Copy link
Member Author

I agree with asdfzdfj that I don't think storing embed in the DB is a good idea. If we store bad data, we store it forever

Ok, I already decided to take it out from this PR and put in another one where we can discuss it 👍

@BentiGorlich
Copy link
Member Author

Maybe its because gehirneimer uses s3 storage?

Though it was happening to me on my local dev environment as well, so I am not sure...

I am using Fennec on mobile, maybe I should try vanilla FF for it...

@e-five256
Copy link
Member

Please read my comment here #689 (comment) I did this testing with the same Firefox instance, kbin.run does 3 requests in 42ms and your site does 71 requests in 3.21s. I feel like the answer to the problems you face is figuring out why it makes any requests at all when hitting the back button

@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

Your media. subdomain responds with 200s, maybe changing that to 301 would help? Not positive that's all there is though since networks shows 301s for you but absolutely no requests besides header stuff for kbin.run

if that's the case then it's even weirder: 200s of static content should have been the easiest content to cache

@BentiGorlich
Copy link
Member Author

Your media. subdomain responds with 200s, maybe changing that to 301 would help? Not positive that's all there is though since networks shows 301s for you but absolutely no requests besides header stuff for kbin.run

if that's the case then it's even weirder: 200s of static content should have been the easiest content to cache

It is cached and it says it got the content from cache. But Firefox having the requests in the console at all means that it basically reloads the site instead of using the last site it has in the cache.

@BentiGorlich
Copy link
Member Author

You know what's great... It is mercure... I had it enabled and it is disabled on kbin.run
Now I have disabled it and no more extra requests...

@nobodyatroot
Copy link
Member

You know what's great... It is mercure... I had it enabled and it is disabled on kbin.run Now I have disabled it and no more extra requests...

No me gusta mercure

@BentiGorlich
Copy link
Member Author

On my phone the problem is not fixed by this, so maybe it is a Fennec specific issue? (I have the same problem on kbin.run with auto media preview).
I think adding the ids to links just doesn't hurt anybody, so I would keep that. The rest can be removed I think...

@e-five256
Copy link
Member

e-five256 commented Apr 9, 2024

I see now only 3 requests hitting back on your site. That's crazy that just disabling mercure does that. Maybe we're messing something up with how we load mercure in the page and it's causing the DOM to not cache (obviously the images respond with cached as asdfzdfj pointed out but it showing up vs not at all in network confuses me)?

@BentiGorlich
Copy link
Member Author

maybe I'm misunderstanding the issue? (can see scroll bar on right side)
scroll super far down in pages open entry do nothing but browser back

This is how it is on my phone (demonstrated on kbin.run):

scroll super far down in pages open entry do nothing but browser back
grafik grafik grafik

@BentiGorlich
Copy link
Member Author

I see now only 3 requests hitting back on your site. That's crazy that just disabling mercure does that. Maybe we're messing something up with how we load mercure in the page and it's causing nothing to cache correct?

Well it is caching correctly, but it essentially forces a page reload...

@e-five256
Copy link
Member

e-five256 commented Apr 9, 2024

Yea, sorry, edited comment, I just meant from a perspective of the network requests showing up at all, cached or not, I'm not sure if it's the DOM not being reusable or what, I guess I'm not an expert at this stuff, so won't comment on it further

@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

You know what's great... It is mercure... I had it enabled and it is disabled on kbin.run Now I have disabled it and no more extra requests...

merde*, I think I have idea why:

  • this looks like a browser bfcache territory
  • the notification controller sets up onbeforeunload event listener to tear down the mercure connection
    connect() {
    this.es(this.getTopics());
    window.onbeforeunload = function (event) {
    if (window.es !== undefined) {
    window.es.close();
    }
    };
    }
  • this firefox docs says that the page is ineligible if beforeunload listener is used (the docs seems ancient but that's the only page of firefox bfcache docs I could find atm)

note that I might be off here, but this is the closest thing I could think of

* somehow this sound came up in my head upon reading that

@BentiGorlich
Copy link
Member Author

But the notifications_controller is connected no matter if mercure is enabled or not, isn't it?

@BentiGorlich
Copy link
Member Author

Don't know if it works, but would this be better:

    connect() {
        this.es(this.getTopics());

        window.onpagehide = function (event) {
            if (window.es !== undefined) {
                window.es.close();
            }
        };
        const c = this;
        window.onpageshow = function () {
            c.es(c.getTopics());
        }
    }

@asdfzdfj
Copy link
Contributor

asdfzdfj commented Apr 9, 2024

in any case I spawned a new issue to track that in #696, I'll see what I could do for that (at the very least is to change from beforeunload to pagehide like suggested)

@BentiGorlich
Copy link
Member Author

Just confirmed that only #696 is the problem, so I am going to close this

@BentiGorlich BentiGorlich deleted the fix/firefox-scroll-position branch April 9, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Backend related issues and pull requests enhancement New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants