
Podcasts: Pre-caching image and more-info data can bring the server to a crawl #668

Merged
5 commits merged on Sep 28, 2021

Conversation

@mw9 (Contributor) commented Sep 26, 2021

These proposed changes seek to address issues that have arisen with pre-caching Podcast feed images and data.

At present, there is a problem with the pre-caching process: it potentially generates a large number of simultaneous feed requests, which can overload a modestly powered server, especially where the user subscribes to a large number of feeds. It's not an obvious problem on, say, an iMac, but it has created problems on a Raspberry Pi, as reported in the forum thread "Slow loading of a list of podcast episodes".

I have submitted the PR against LMS 8.2, on the premise that the issue is a bug in 8.2. But it can be applied equally well to 8.3.

The proposed changes are:

  • Podcast: Pre-caching - Guard against fetching same feed multiple times
    This change ensures that only one pre-caching attempt per day will be made for each feed. At present, a pre-caching attempt may be made on each occasion that the user browses within the podcast menu.

  • Podcast: Pre-caching - Randomize image & moreInfo caching period
    This change implements randomization of the cache period, to reduce the likelihood that future cache refreshes will happen all at once. It also sets the cache period for a podcast's 'moreInfo' data to the same period as the podcast image. At present that data is only cached for the default period of one hour, which can become burdensome if the user has many feeds.

  • Podcast: Pre-caching - Refresh pre-cached data while browsing a feed
    This change implements an opportunistic refresh of the cache when the user browses into a podcast feed. It will help to keep the cache up to date.

  • Podcast: Pre-caching - Schedule activity over an extended period
    This change spreads out the pre-caching activity by launching the fetch of each feed at 30-second intervals. Lower powered servers will benefit from this somewhat laid-back approach. It does mean that a user with a large number of feeds will wait longer to see the podcast images properly populated, but perhaps that is not so very important.
    Update: this commit has since been revised - see below.

When browsing within the podcast menu, pre-caching activity will be
initiated for each feed where pre-cached image or podcast info data is
not found.

A guard is needed to prevent launching the pre-cache process multiple
times for the same feed. Without one, this will happen whenever the user
browses within the podcast menu while an earlier pre-cache process for a
feed has been launched but has not yet completed.

This can have an adverse impact on modestly powered servers, and/or
where a large number of feeds must be handled.

This change handles the issue by pre-caching place-holder data each time
a feed pre-cache process is started. The place-holder data will be
replaced by real data when the process completes.
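
As an illustration, a minimal sketch of the guard, assuming the plugin's
$cache object and the cache keys seen in the handleFeed snippet quoted later
in this thread; launchPreCache is a hypothetical stand-in, not part of the PR:

# Before launching a pre-cache fetch for $url, store place-holder entries
# so that further browsing within the podcast menu does not queue the same
# feed again while the fetch is still in flight.
unless ($cache->get('podcast-rss-' . $url) && $cache->get('podcast_moreInfo_' . $url)) {
	# place-holders expire after a day, so a failed fetch will be retried later
	$cache->set('podcast-rss-' . $url, __PACKAGE__->_pluginDataFor('icon'), '1days');
	$cache->set('podcast_moreInfo_' . $url, {}, '1days');

	launchPreCache($url);   # hypothetical stand-in for starting the real fetch/parse
}
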
At present, a podcast's 'moreInfo' data is cached for the default period
of one hour, in contrast to image data which is cached for 90 days.

This generates pre-caching activity at least once an hour whenever the
user browses within the podcast menu, which can become a significant
burden if the user subscribes to more than a handful of feeds.

This change aligns the caching period with image data. Staleness will not
be a significant issue, as this data rarely changes.

Additionally, the caching period is now randomized to fall within a window
of +/- 10 days about 90 days. This will reduce the chance that all cached
data becomes stale at the same time, thereby spreading the impact of
pre-caching activity over a number of user browse sessions.
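
A minimal sketch of the randomized period, assuming the plugin's $cache
object and the cache keys seen in the snippet below; $image and $moreInfo
stand for the data just extracted from the feed:

# Cache for 90 days +/- 10 days (expressed in seconds), so that entries
# created together do not all expire, and need refreshing, in the same
# browse session.
my $ttl = int((80 + rand(20)) * 86400);

$cache->set('podcast-rss-' . $url, $image, $ttl);
$cache->set('podcast_moreInfo_' . $url, $moreInfo, $ttl);
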
Take the opportunity to refresh/update a podcast's pre-cached data while
browsing the feed.

This is, essentially, free, and extends the life of the cached data.

By preventing cached data from becoming stale, we reduce the burden of
podcast feed pre-caching activity that can take place whenever the user
browses within the podcast menu.
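
A rough sketch of where the opportunistic refresh could hook in, assuming the
plugin's parser follows the usual LMS parse($class, $http) convention and that
the cache update lives in a helper such as the plugin's preCacheFeedData;
_parseRSS is a hypothetical stand-in, and this is not a quote of the merged code:

package Slim::Plugin::Podcast::Parser;

sub parse {
	my ($class, $http) = @_;

	my $feed = _parseRSS($http);   # hypothetical stand-in for the existing parsing code

	# Opportunistic refresh: we have just fetched and parsed this feed anyway,
	# so refreshing its cached image and 'moreInfo' data costs nothing extra.
	Slim::Plugin::Podcast::Plugin::preCacheFeedData($http->url, $feed);

	return $feed;
}
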
@philippe44 (Contributor) commented Sep 26, 2021

You're right about the load on simpler servers and the need not to query the same thing multiple times; I often forget that LMS re-descends the XML tree multiple times during navigation. But don't you think that something basic like this would be good enough?

sub handleFeed {
	my ($client, $cb, $params, $args) = @_;

	my $items = [];
	my $provider = getProviderByName();

	# populate provider's custom menu
	foreach my $item (@{$provider->getMenuItems($client)}) {
		$item->{name}  ||= cstring($client, 'PLUGIN_PODCAST_SEARCH');
		$item->{type}  ||= 'search';
		$item->{url}   ||= \&searchHandler unless $item->{enclosure};
		$item->{passthrough} ||= [ { provider => $provider, item => $item } ];

		if (!$item->{image} || ref $item->{image}) {
			$item->{image} = 'html/images/search.png';
		}

		push @$items, $item;
	}

	# then add recently played
	push @$items, {
		name  => cstring($client, 'PLUGIN_PODCAST_RECENTLY_PLAYED'),
		url   => \&recentHandler,
		type  => 'link',
		image => __PACKAGE__->_pluginDataFor('icon'),
	};

	# then existing feeds
	my @feeds = @{$prefs->get('feeds')};
	my @need;
	my $fetch;
	
	foreach ( @feeds ) {
		my $url = $_->{value};
		my $image = $cache->get('podcast-rss-' . $url);

		push @$items, {
			name => $_->{name},
			url  => $url,
			favorites_url => $url,
			favorites_type => 'link',
			parser => 'Slim::Plugin::Podcast::Parser',
			image => $image || __PACKAGE__->_pluginDataFor('icon'),
			playlist => $url,
		};

		# no cached image or moreInfo: set 1-day place-holders so repeated
		# browsing doesn't queue the same feed again, and remember the URL
		unless ($image && $cache->get('podcast_moreInfo_' . $url)) {
			$cache->set('podcast-rss-' . $url, __PACKAGE__->_pluginDataFor('icon'), '1days');
			$cache->set('podcast_moreInfo_' . $url, {}, '1days');
			push (@need, $url);
		}	
	}

	# fetch the outstanding feeds one at a time, pre-caching each result
	$fetch = sub {
		my $url = pop @need;
		Slim::Formats::XML->getFeedAsync(
			sub {
				_precacheShowDetails($url, $_[0]);
				$fetch->();
			},
			sub {
				$log->warn("can't get $url RSS feed information: ", $_[0]);
				$fetch->();
			},
			{
				parser => 'Slim::Plugin::Podcast::Parser',
				url => $url,
				expires => 86400,
			}
		) if $url;
	};
	
	# get missing cache images & moreinfo if any
	$fetch->();
	
	$cb->({
		items => $items,
		actions => {
			info => {
				command   => ['podcastinfo', 'items'],
				variables => [ 'url', 'url', 'name', 'name', 'image', 'image' ],
			},
		}
	});
}

I'm not a big fan of timers in general, and this alternative simply serializes the requests, so there is no flooding with hundreds of them; when descending a tree, grabbing the episodes of a single feed will compete with at most one caching request. It's faster (no need to wait 30s) when the opportunity permits, while not getting in the way of other user-requested browsing.

@michaelherger (Member) commented:

Thanks @mw9 for identifying this issue - good catch! I like the idea of the randomized TTL for the cache and the update whenever available.

But I'd agree with @philippe44 about the timers: they're always wrong - too short or too long, but never right 😀. I mentioned some modules in the forum thread, but @philippe44's approach seems good enough to me. Async::Util::amap() would have had a slight advantage in that it would allow for some level of parallelism without opening too many connections at the same time. But that's an optimization.

Could you please factor in @philippe44's suggestion to get rid of the timer? The rest looks good to me. Thanks again!
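
For reference only, a rough sketch of what limited parallelism with
Async::Util::amap() might look like here, re-using the @need list and
callbacks from the snippet above; the amap() argument names
(inputs/action/cb/at_a_time) and the batch size of two are assumptions about
that module, not something used in this PR:

use Async::Util qw(amap);

# Fetch at most two feeds at a time, pre-caching each result as it arrives.
amap(
	inputs    => \@need,
	at_a_time => 2,
	action    => sub {
		my ($url, $acb) = @_;
		Slim::Formats::XML->getFeedAsync(
			sub { _precacheShowDetails($url, $_[0]); $acb->(); },
			sub { $log->warn("can't get $url RSS feed information: ", $_[0]); $acb->(); },
			{ parser => 'Slim::Plugin::Podcast::Parser', url => $url },
		);
	},
	cb => sub {},   # all feeds handled; nothing further to do
);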

This change adapts the podcast image and 'more info' pre-caching
activity to take place over an extended period of time.

The intention is to avoid overly burdening modestly endowed servers.
Some users may have a considerable number of subscribed feeds (in excess
of 100 have been reported), and fetching and parsing that many feeds
all at the same time can cause significant delays (reportedly in the
tens of seconds) when browsing the podcast UI.

This will be particularly noticeable when using the podcast menu after
clearing server caches, or re-installing LMS. In those circumstances all
feeds need to be pre-cached, and this will be attempted on first use of
the podcast menu.

With this change, fetching and parsing each feed takes place
sequentially, so the server will only be processing one feed at a time,
instead of processing all feeds concurrently.

No explicit expiry time need be set for the retrieved feed; we will be
caching the data we want from it.

@mw9 (Contributor, Author) commented Sep 27, 2021

I have taken on board your discomfort with my laid back timers 😀, and have taken up @philippe44's elegant suggestion in its place.

It seems to work fine on my modestly powered server.

Just the last commit is changed:

  • Podcast: Pre-caching - Schedule activity over an extended period
    This change spreads out the pre-caching activity by fetching and parsing each feed sequentially. Lower powered servers will benefit from this approach, because the server will only be processing one feed at a time, instead of processing all feeds concurrently.

@mw9 (Contributor, Author) commented Sep 27, 2021

A remark:

While playing around with this last week, I ran across a curious issue. It seems that, if there is an http failure, there are circumstances in which Slim::Formats::XML->getFeedAsync can call its error callback twice. For the same feed.

But the remote http server that was triggering the problem has now started to behave itself, and I have not been able to reproduce the circumstances. At first I thought it might be something to do with Async::Util::amap(). But it isn't. Perhaps, one day, I shall track it down.

There is no particularly adverse effect on this proposed change. It just means that we would be triggered into fetching the next feed a little earlier than anticipated. It will probably be a rare occurrence.
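
Should that double error callback ever need defending against, one possible
guard, sketched here against the serialized $fetch closure above, would be to
let each step advance the queue at most once; this is just an illustration,
not part of the PR:

$fetch = sub {
	my $url = pop @need;
	return unless $url;

	my $done = 0;
	my $next = sub { $fetch->() unless $done++ };   # advance the queue at most once

	Slim::Formats::XML->getFeedAsync(
		sub { _precacheShowDetails($url, $_[0]); $next->(); },
		sub { $log->warn("can't get $url RSS feed information: ", $_[0]); $next->(); },
		{ parser => 'Slim::Plugin::Podcast::Parser', url => $url, expires => 86400 },
	);
};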

@mherger (Contributor) left a comment

Thanks for the update. I was wondering whether you aren't now pre-caching things twice: once in the parser, and then again in the callback triggered with the parser's result?

Review comments on Slim/Plugin/Podcast/Plugin.pm (resolved)
sub 'preCacheFeedData' requires a URL and a feed as parameters.
Check that we are given something that looks believable.

Also, 'preCacheFeedData' is already called by the parser; we don't need
to do it again.
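
A minimal sketch of the kind of sanity check described, assuming the feed
arrives as a hash reference carrying an items list; the exact tests in the
merged commit may differ:

sub preCacheFeedData {
	my ($url, $feed) = @_;

	# Require something that looks like a feed URL and a parsed feed.
	return unless $url && !ref $url;
	return unless $feed && ref $feed eq 'HASH' && ref $feed->{items} eq 'ARRAY';

	# ... proceed to cache the image and 'moreInfo' data as before ...
}
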
@mherger mherger changed the base branch from public/8.2 to public/8.3 September 28, 2021 11:47
@mherger mherger merged commit ab2f9d3 into LMS-Community:public/8.3 Sep 28, 2021

@mherger (Contributor) commented Sep 28, 2021

Thanks for this PR! I merged it to 8.3 for now. Should it turn out to be good, I'll back-port to 8.2. Please give it some testing.

@mw9 (Contributor, Author) commented Oct 3, 2021

> Thanks for this PR! I merged it to 8.3 for now. Should it turn out to be good, I'll back-port to 8.2. Please give it some testing.

Forum user @Aphonia, who raised the issue, has provided me with his list of (125) podcasts that were causing him pain.
I have replicated his pain on my own (low powered) system, and confirmed that these changes do seem to resolve the issues.

One matter has arisen:

It is quite acceptable for a podcast not to provide an image. In this case we provide a default image, but cache it for only 24 hours. Thereafter we just fetch the feed again, and re-try. That seems, to me, to be somewhat unnecessary, and perhaps "rude". I shall propose in a follow-up PR that we adopt the same circa 90-day policy that is applied to "real" images.

I also noted some small changes that might be made to the XML/RSS feed parser, which I will offer up in a follow-up PR.
