Podcasts: Pre-caching image and more-info data can bring the server to a crawl #668
Conversation
When browsing within the podcast menu, pre-caching activity will be initiated for each feed where pre-cached image or podcast info data is not found. A guard is needed to prevent launching the pre-cache process multiple times for the same feed. This will happen whenever the user is browsing within the podcast menu and a pre-existing pre-cache process has been launched but has not completed. This can have an adverse impact on modestly powered servers, and/or where there are a large number of feeds to be handled. This change handles the issue by pre-caching place-holder data each time a feed pre-cache process is started. The place-holder data will be replaced by real data when the process completes.
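The placeholder guard described above can be sketched as follows. This is an illustrative Python sketch (LMS itself is written in Perl), and `cache`, `start_precache` and `fetch_and_parse` are hypothetical stand-ins for the actual LMS cache and fetch/parse code:

```python
# Hypothetical in-memory cache standing in for the LMS cache layer.
cache = {}

PLACEHOLDER = {"status": "pending"}

def start_precache(feed_url, fetch_and_parse):
    """Launch a pre-cache run for a feed unless data is already present.

    Writing a placeholder entry up front means that a second browse of the
    podcast menu, arriving before the first fetch has completed, finds
    something in the cache and does not launch a duplicate fetch.
    """
    if feed_url in cache:
        return  # real data, or a placeholder from an in-flight run
    cache[feed_url] = PLACEHOLDER     # guard: claim the slot first
    data = fetch_and_parse(feed_url)  # slow network + parse step
    cache[feed_url] = data            # replace placeholder with real data
```

In the real (asynchronous) code the placeholder sits in the cache for the whole duration of the fetch; here the synchronous call collapses that window, but the guard logic is the same.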
At present, a podcast's 'moreInfo' data is cached for the default period of one hour, in contrast to image data, which is cached for 90 days. This generates regular pre-caching activity, at hourly or shorter intervals, whenever the user browses within the podcast menu. This can become a significant burden if the user subscribes to more than a handful of feeds. This change aligns the caching period with that of image data. Staleness will not be a significant issue, as this data rarely changes. Additionally, the caching period is now randomized to fall within a window of +/- 10 days around 90 days. This reduces the chance that all cached data becomes stale at the same time, thereby spreading the impact of pre-caching activity over a number of user browse sessions.
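The randomized caching period might look something like this. A Python sketch: the 90 +/- 10 day window matches the description above, but the function name and structure are illustrative, not the actual LMS code:

```python
import random

DAY = 24 * 60 * 60  # seconds

def randomized_ttl(base_days=90, jitter_days=10):
    """Pick a cache lifetime uniformly in [base - jitter, base + jitter] days.

    Spreading the expiry times out means the cached entries for many feeds
    do not all go stale in the same browse session, so refresh activity is
    spread across sessions instead of arriving as one burst.
    """
    days = random.uniform(base_days - jitter_days, base_days + jitter_days)
    return days * DAY  # TTL in seconds, as cache APIs typically expect
```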
Take the opportunity to refresh/update a podcast's pre-cached data while browsing the feed. This is, essentially, free, and extends the life of the cached data. By preventing cached data from becoming stale, we reduce the burden of podcast feed pre-caching activity that can take place whenever the user browses within the podcast menu.
You're right about the load on simpler servers and the need not to query the same thing multiple times. I often forget that LMS re-descends the XML tree multiple times during navigation. But don't you think that something basic like this would be good enough?
I'm not a big fan of timers in general, and this alternative simply serializes the requests, so there is no flooding of hundreds of requests. When descending a tree, grabbing the episodes of a single feed will compete with at most one caching request. It's faster (no need to wait 30s) when the opportunity permits, while not getting in the way of other user-requested browsing.
Thanks @mw9 for identifying this issue - good catch! I like the idea of the randomized TTL for the cache and the update whenever available. But I'd agree with @philippe44 about the timers: they're always wrong - too short or too long, but never right 😀. I mentioned some modules in the forum thread, but @philippe44's approach seems good enough to me. Could you please factor in @philippe44's suggestion to get rid of the timer? The rest looks good to me. Thanks again!
This change adapts the podcast image and 'more info' pre-caching activity to take place over an extended period of time. The intention is to avoid overly burdening modestly endowed servers. Some users may have a considerable number of subscribed feeds (in excess of 100 have been reported), and fetching and parsing that number of feeds all at the same time can cause considerable delays (reportedly in the tens of seconds) when browsing the podcast UI. This will be particularly noticeable when using the podcast menu after clearing server caches, or re-installing LMS. In those circumstances all feeds need to be pre-cached, and this will be attempted on first use of the podcast menu. With this change, fetching and parsing each feed takes place sequentially, so the server will only be processing one feed at a time, instead of processing all feeds concurrently. No explicit expiry time need be set for the retrieved feed itself, as we cache only the data we want from it.
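The one-at-a-time ordering can be sketched as below. The real LMS implementation chains asynchronous HTTP callbacks rather than blocking in a loop; this synchronous Python sketch only illustrates the serialization, and `fetch_and_parse` and `store` are hypothetical stand-ins for the LMS fetch/parse and cache-write steps:

```python
from collections import deque

def precache_sequentially(feed_urls, fetch_and_parse, store):
    """Process feeds one at a time instead of firing all requests at once.

    Each feed is fetched and parsed only after the previous one has
    finished, so a server with 100+ subscribed feeds handles a single
    request at a time rather than all of them concurrently.
    """
    queue = deque(feed_urls)
    while queue:
        url = queue.popleft()
        data = fetch_and_parse(url)  # blocking: next feed waits for this one
        store(url, data)             # cache only the extracted data; the raw
                                     # feed itself needs no explicit expiry
```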
cf0e21e to fa945f7
I have taken on board your discomfort with my laid-back timers 😀, and have taken up @philippe44's elegant suggestion in its place. It seems to work fine on my modestly powered server. Just the last commit is changed:
A remark: while playing around with this last week, I ran across a curious issue. It seems that, if there is an http failure, there are circumstances in which … But the remote http server that was triggering the problem has now started to behave itself, and I have not been able to reproduce the circumstances. At first I thought it might be something to do with … There is no particularly adverse effect on this proposed change: it just means that we would be triggered into fetching the next feed a little earlier than anticipated. It will probably be a rare occurrence.
Thanks for the update. I was wondering whether you didn't pre-cache things twice now: once in the parser, then in the callback triggered with the parser's result?
sub 'preCacheFeedData' requires a URL and a feed as parameters. Check that we are given something that looks believable. Also, 'preCacheFeedData' is already called by the parser, we don't need to do it again.
Thanks for this PR! I merged it to 8.3 for now. Should it turn out to be good, I'll back-port to 8.2. Please give it some testing.
Forum user @Aphonia, who raised the issue, has provided me with his list of (125) podcasts that were causing him pain. One matter has arisen: it is quite acceptable for a podcast not to provide an image. In this case we provide a default image, but only for 24 hours. Thereafter we just fetch the feed again, and re-try. That seems, to me, somewhat unnecessary, and perhaps "rude". I shall propose in a follow-up PR that we adopt the same circa 90-day policy that is applied to "real" images. I also noted some small changes that might be made to the XML/RSS feed parser, which I will offer up in a follow-up PR.
These proposed changes seek to address issues that have arisen with pre-caching Podcast feed images and data.
At present, there is a problem with the pre-caching process, because it potentially generates a large number of simultaneous feed requests which can overload a modestly powered server, especially where the user subscribes to a large number of feeds. It's not an obvious problem on, say, an iMac, but it has created problems on a Raspberry Pi, as reported here: Slow loading of a list of podcast episodes.
I have submitted the PR against LMS 8.2, on the premise that the issue is a bug in 8.2. But it can be applied equally well to 8.3.
The proposed changes are:
Podcast: Pre-caching - Guard against fetching same feed multiple times
This change ensures that only one pre-caching attempt per day will be made for each feed. At present, a pre-caching attempt may be made on each occasion that the user browses within the podcast menu.
Podcast: Pre-caching - Randomize image & moreInfo caching period
This change implements randomization of the cache period, to reduce the likelihood that future cache refreshes will happen all at once. It also sets the cache period for a podcast's 'moreInfo' data to the same period as the podcast image. At present that data is only cached for the default period of one hour, which can become burdensome if the user has many feeds.
Podcast: Pre-caching - Refresh pre-cached data while browsing a feed
This change implements an opportunistic refresh of the cache when the user browses into a podcast feed. It will help to keep the cache up to date.
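An opportunistic refresh of this kind can be sketched as follows. All names here are hypothetical Python stand-ins, not the actual LMS API; the point is that, since browsing already fetches and parses the feed, re-writing the pre-cached entries at that moment costs essentially nothing and resets their lifetime:

```python
def extract_precache_data(feed):
    # Illustrative: keep just the fields the podcast menu pre-caches.
    return {"image": feed.get("image"), "moreInfo": feed.get("description")}

class Cache:
    """Minimal stand-in for the LMS cache (TTL stored but not enforced)."""
    def __init__(self):
        self.data = {}
    def set(self, key, value, ttl=None):
        self.data[key] = (value, ttl)

def browse_feed(feed_url, cache, fetch_and_parse, ttl):
    """Browse a feed, refreshing its pre-cached data as a side effect.

    The fetch and parse happen anyway to serve the browse request, so the
    cache write is essentially free and makes the data less likely to be
    stale on the next visit to the podcast menu.
    """
    feed = fetch_and_parse(feed_url)  # needed for browsing regardless
    cache.set(feed_url, extract_precache_data(feed), ttl=ttl)
    return feed
```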
Podcast: Pre-caching - Schedule activity over an extended period
This change spreads out the pre-caching activity by launching each feed fetch at 30-second intervals. Lower-powered servers will benefit from this somewhat laid-back approach. It does mean that a user with a large number of feeds will wait longer to see the podcast images properly populated, but perhaps that is not so very important.
Update: Commit now revised - refer below.