Question concerning Podcast download time-of-day #849

Open
delrizzo opened this issue Aug 13, 2019 · 6 comments

Comments

@delrizzo

commented Aug 13, 2019

Hi,
I've scanned podcast-ingest related issues, and have posted on discourse...

Just curious about what triggers a podcast download? How often is the RSS scanned, and is it immediately going to download a new episode if the option is checked in the podcast definition?

I have a daily podcast that is 100% consistent, and I need to specify when it gets downloaded so that there is enough time for the auto-playlist feature to schedule it. It would be nice to be able to specify a hard time-of-day, and perhaps even a day-of-week, at which a podcast needs to be downloaded.

This isn't as important for weekly podcasts, but for daily programs like Democracy Now!, we need it to be in our library at a certain time, guaranteed.

Thanks!
d

@hairmare

Member

commented Aug 17, 2019

As far as I can tell we check hourly for new podcasts and download them when available.

/**
 * @var int how often, in seconds, to check for and ingest new podcast episodes
 */
private static $_PODCAST_POLL_INTERVAL_SECONDS = 3600; // 1 hour

/**
 * Check whether $_PODCAST_POLL_INTERVAL_SECONDS have passed since the last call to
 * downloadNewestEpisodes
 *
 * @return bool true if $_PODCAST_POLL_INTERVAL_SECONDS has passed since the last check
 */
public static function hasPodcastPollIntervalPassed() {
    $lastPolled = Application_Model_Preference::getPodcastPollLock();
    return empty($lastPolled) || (microtime(true) > $lastPolled + self::$_PODCAST_POLL_INTERVAL_SECONDS);
}

This is one of the many side effects that get triggered by a generic call to the LibreTime API (which pypo et al. make at regular intervals).

If you need to be absolutely sure that you always have the new episode, and I have my numbers right, the episode would need to be published roughly 1.5 hours before its next scheduled slot (1 hour in case the episode is published right after our last check, plus lead time for the autoscheduler).
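To make that arithmetic concrete, here is a minimal sketch. The 30-minute autoscheduler lead time is an assumption inferred from the "roughly 1.5 hours" figure, and `latestSafePublishTime()` is a hypothetical helper, not part of LibreTime. The time needed for the download itself is ignored.

```php
<?php
// Value taken from $_PODCAST_POLL_INTERVAL_SECONDS in the snippet above.
const POLL_INTERVAL_SECONDS = 3600;
// Assumption: the autoscheduler needs about 30 minutes of lead time.
const AUTOSCHEDULER_LEAD_SECONDS = 1800;

/**
 * Latest publish time (Unix timestamp) that still lets the hourly poll pick up
 * the episode before the autoscheduler fills the show. Worst case: the episode
 * appears right after a poll, so a full interval passes before the next check.
 */
function latestSafePublishTime(
    int $showStart,
    int $pollInterval = POLL_INTERVAL_SECONDS,
    int $leadTime = AUTOSCHEDULER_LEAD_SECONDS
): int {
    return $showStart - $leadTime - $pollInterval;
}

// A show starting at 12:00 needs the episode published by 10:30.
$showStart = gmmktime(12, 0, 0, 8, 13, 2019);
echo gmdate('H:i', latestSafePublishTime($showStart)), "\n"; // prints "10:30"
```

If the real lead time is longer (for example an hour), pass it as the third argument and the deadline moves earlier by the same amount.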

An easy workaround to improve the behaviour could be to make the polling interval configurable to allow for shorter windows.
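A rough sketch of what a configurable interval might look like. The setting name `podcast_poll_interval_seconds`, the 60-second floor, and both function names are assumptions for illustration; only the 3600-second default comes from the current code.

```php
<?php
const DEFAULT_PODCAST_POLL_INTERVAL_SECONDS = 3600; // current hard-coded value
const MIN_PODCAST_POLL_INTERVAL_SECONDS = 60;       // hypothetical floor to avoid hammering feeds

/**
 * Read the poll interval from a config array, falling back to the default.
 * 'podcast_poll_interval_seconds' is a hypothetical setting name.
 */
function podcastPollIntervalSeconds(array $config): int
{
    $value = $config['podcast_poll_interval_seconds'] ?? DEFAULT_PODCAST_POLL_INTERVAL_SECONDS;
    if (!is_numeric($value)) {
        return DEFAULT_PODCAST_POLL_INTERVAL_SECONDS;
    }
    return max(MIN_PODCAST_POLL_INTERVAL_SECONDS, (int) $value);
}

/**
 * Same comparison as hasPodcastPollIntervalPassed(), with the interval and the
 * current time passed in so the check is easy to test.
 */
function hasPollIntervalPassed(?float $lastPolled, int $interval, float $now): bool
{
    return empty($lastPolled) || $now > $lastPolled + $interval;
}

// Example: poll every 15 minutes instead of every hour.
$interval = podcastPollIntervalSeconds(['podcast_poll_interval_seconds' => 900]);
var_dump(hasPollIntervalPassed(1000.0, $interval, 1901.0)); // bool(true)
```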

Making it schedulable at a hard time-of-day (or even day-of-week) would be a bit more work because the code currently operates on all podcasts at once. We would need to move the interval check to the individual podcasts and have the PodcastManager poll each podcast to see whether it needs an update. Some UI work to expose the feature would also be needed. The path that does the actual downloading already supports what's needed, as the business logic is strictly in the PHP parts.
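As a sketch of the per-podcast check described above, something along these lines could decide whether a single podcast is due. `isPodcastDue()`, its parameters, and the idea of storing a time-of-day plus a list of weekdays per podcast are all hypothetical; nothing like this exists in the code today.

```php
<?php
/**
 * Hypothetical per-podcast check: is this podcast due for a refresh?
 *
 * @param string                 $timeOfDay   "HH:MM" in the timezone of $now
 * @param int[]                  $daysOfWeek  ISO-8601 weekdays, 1 = Monday ... 7 = Sunday
 * @param DateTimeImmutable|null $lastChecked when this podcast was last polled, if ever
 * @param DateTimeImmutable      $now         current time
 */
function isPodcastDue(
    string $timeOfDay,
    array $daysOfWeek,
    ?DateTimeImmutable $lastChecked,
    DateTimeImmutable $now
): bool {
    // Skip days that are not in the podcast's schedule.
    if (!in_array((int) $now->format('N'), $daysOfWeek, true)) {
        return false;
    }

    [$hour, $minute] = array_map('intval', explode(':', $timeOfDay));
    $scheduled = $now->setTime($hour, $minute, 0);

    // Not due before today's scheduled time.
    if ($now < $scheduled) {
        return false;
    }

    // Due if we have not polled since today's scheduled time.
    return $lastChecked === null || $lastChecked < $scheduled;
}

// Example: a weekday podcast that should be fetched at 07:30 Pacific time.
$tz = new DateTimeZone('America/Los_Angeles');
$now = new DateTimeImmutable('2019-08-13 07:45:00', $tz); // a Tuesday
var_dump(isPodcastDue('07:30', [1, 2, 3, 4, 5], null, $now)); // bool(true)
```

The PodcastManager would then loop over all podcasts and call the existing download path only for those where the check returns true.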

@delrizzo

Author

commented Aug 22, 2019

Posting this here (originally sent to Discourse)...

Thanks for the quick reply, and for the code ref. I've checked and checked again, so I'm afraid that this is now a bug report.

Democracy Now! usually uploads their newest episode shortly after 0900h EST (despite the timestamp in the RSS being incorrect and reporting 0900h GMT). This means that it's up before 0700h PST (where we live), and we play it at 1200h PST from an automated playlist. Ever since I configured the DN! podcast and auto playlist, I have had to intervene and force the import before the autoplaylist populates the block (which according to the docs should happen at 1100h PST). When I edit the podcast, the new episode always shows up immediately, and when I select it to be imported, it downloads immediately, so there don't seem to be any issues with the RSS or the mechanism. As I noted, the timestamp in the RSS reads 0900 GMT for some reason, but I can't imagine that's throwing anything off.

If I'm doing something wrong or missing something, I'd be happy to try to figure that out with some guidance from the group. Otherwise, I'm hoping to help solve this problem. The weekly podcasts that I have configured have had no issues so far. But I would love to figure out this daily one, as having to intervene every day is potentially going to be a pain.

Thanks all for checking this issue out.

@hairmare

Member

commented Aug 27, 2019

It looks like your edits are triggering explicit refreshes of the podcast, which is why it works in those cases. The GMT vs. EST timestamps shouldn't matter; I don't think we are using them anywhere.

I'll have to set up an environment to reproduce this, which will take some time. @Robbt If I remember correctly, you are also in the Pacifica network and use DN podcasts. Are @delrizzo and I missing something?

I guess Democracy Now! podcasts affect a large part of our user base which is why I set this up as a blocker on the 3.0 release.

@delrizzo

Author

commented Aug 27, 2019

@hairmare

Member

commented Aug 31, 2019

The built-in importer uses SimplePie, which does have some issues with non-standard RSS feeds that contain audio enclosures. From what I can tell, most podcast feeds are non-standard, since embedding audio enclosures was specified after the original RSS specification was published.

I guess it would make sense to update or replace SimplePie with something that understands audio enclosures without the need for too many hacks.
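For reference, here is a minimal sketch of pulling audio enclosures out of a plain RSS 2.0 feed. It uses PHP's SimpleXML extension rather than SimplePie, so it is not how the importer currently works. `extractAudioEnclosures()` is a hypothetical helper, and it only handles `<enclosure>` elements whose `type` starts with `audio/`, not the other variants that real feeds use.

```php
<?php
/**
 * Return the URLs of all audio enclosures in an RSS 2.0 document.
 * Returns an empty array if the XML cannot be parsed.
 */
function extractAudioEnclosures(string $rssXml): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($rssXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false) {
        return [];
    }

    $urls = [];
    foreach ($feed->channel->item as $item) {
        foreach ($item->enclosure as $enclosure) {
            // Keep only enclosures that declare an audio MIME type.
            if (strpos((string) $enclosure['type'], 'audio/') === 0) {
                $urls[] = (string) $enclosure['url'];
            }
        }
    }
    return $urls;
}

// Example feed with one audio enclosure (the URL is made up).
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.org/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
XML;
print_r(extractAudioEnclosures($xml));
```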

Please share links to affected podcasts if you can. It looks like there are quite a few ways podcasts implement their feeds, and having some real examples always helps debug such cases.

@delrizzo

Author

commented Sep 9, 2019

I can provide the XSLT code that I have been using to parse all of the podcasts that we are using. It's been customized to navigate the different enclosure formats for audio files. Likely a bit messy, but will try to share later today. Thanks!
