Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge youtube-dl and fix Youtube Feeds #245

Merged
merged 19 commits into from
Nov 30, 2020

Conversation

pukkandan
Copy link
Contributor

@pukkandan pukkandan commented Nov 20, 2020

This is my attempt at merging 'ytdl-org/youtube-dl/master' release 2020.11.19 2020.11.26 into youtube-dlc

I have tried to resolve all conflicts by preferring youtube-dl's code when it doesn't sacrifice any functionality of dlc
Major changes needed to be made only to vlive, skyit and youtube extractors.


For vlive, youtube-dl no longer has a playlist extractor. I do not know why this is. So I have kept the playlist extractor from dlc while replacing the others.

youtube-dl's vlive extractor does not support playlists or posts. I have reverted the changes to it. See #248 and related patches
So for the timebeing, I am reverting any chages to it. Someone who's more familiar with vlive needs to take a look at this

Edit: youtube-dl has now added support for all url types. So now it can be completely merged


yt-dl added a skyit extractor. However, dlc already has one which seems to be implemented differently. So I am not merging it. It is probably best to handle it in a seperate PR like for vlive


For youtube, Channel, Playlists and Live extractors have been replaced by the new Tab extractor. youtube-dl no longer have YoutubeSearchURLIE , YoutubeFavouritesIE and YoutubeShowIE extractors. Also, all their feed extractors are broken.

I have made the following changes from the youtube-dl's extractor:

  • Rewrote Search URL, Favourites (liked videos) and all the Feed extractors (Subscriptions, History , Recommendations) - All youtube extractors are working now.
  • Made watchlater and favourites accessible by their direct IDs WL and LL respectively in addition to :ytwatchlater, :ytfav
  • Added support for Direct YouTube URLs like youtube.com/AsapSCIENCE - See 404 Not Found for channel URL without /c/ or /channel/ #186, [youtube] "youtube.com/name" can be a user or a channel #190
  • I have kept the age gate fix from dlc intact. However, it would be wise to rigorously review that section of the code again. I have tested it with both embeddable and non-embeddable age-restricted videos.

Edit: yt-dl now has fixed the feed extractors. However, they only download one page while my implementation allows continuations. I have tried to use as much of the code from yt-dl without sacrificing any functionality

ShowIE is removed in youtube-dl. I am not sure whether youtube still has show URLs. I couldn't find any. So I kept it removed

Another thing to note is that youtube-dl's youtube extractor behaves differently than previous versions. Now it directly downloads the videos in the URL that is given to it instead of trying to interpret it as user/channel/playlist etc. For example, in older versions, if you tried to download https://www.youtube.com/user/HBO, you would get all the uploads by HBO. But now, it downloads the playlists in the home page ("Transhood TRANSlation Summit" and "New on HBO"). If you want to download the whole channel, you must give /videos in the URL (https://www.youtube.com/user/HBO/videos. I am not sure how to revert this to older behaviour or if it even should be reverted in the first place.
Edit: This behaviour has been reverted for this PR. Related ytdl-org/youtube-dl#27099, ytdl-org/youtube-dl#27166, ytdl-org/youtube-dl#27186, ytdl-org/youtube-dl#27180


The following files were not merged:

.github/ISSUE_TEMPLATE/1_broken_site.md
.github/ISSUE_TEMPLATE/2_site_support_request.md
.github/ISSUE_TEMPLATE/3_site_feature_request.md
.github/ISSUE_TEMPLATE/4_bug_report.md
.github/ISSUE_TEMPLATE/5_feature_request.md
youtube_dlc/version.py
Changelog

Closes #200, Closes #216, Closes #230, Closes #233, Closes #186, Closes #172, Closes #247, Closes #256, Closes #226
Makes #190, #253, #249, #248, #227 obsolete

@pukkandan
Copy link
Contributor Author

Consider this a work in progress. I really need to get some sleep right now. I will look at any failing tests tomorrow.

I would really appreciate it if someone can test/review this

Old Extractors left behind:
	VLivePlaylistIE
	YoutubeSearchURLIE
	YoutubeShowIE
	YoutubeFavouritesIE

If removing old extractors, make corresponding changes in
	docs/supportedsites.md
	youtube_dlc/extractor/extractors.py

Not merged:
	.github/ISSUE_TEMPLATE/1_broken_site.md
	.github/ISSUE_TEMPLATE/2_site_support_request.md
	.github/ISSUE_TEMPLATE/3_site_feature_request.md
	.github/ISSUE_TEMPLATE/4_bug_report.md
	.github/ISSUE_TEMPLATE/5_feature_request.md
	test/test_all_urls.py
	youtube_dlc/version.py
	Changelog
@pukkandan
Copy link
Contributor Author

@kyuyeunk I see that you have some PRs open regarding vlive. So I assume you are familiar with its code. Can you have a look at youtube-dl's vlive extractor code and check how much of it we can/should merge

@pukkandan pukkandan marked this pull request as ready for review November 20, 2020 07:40
@pukkandan
Copy link
Contributor Author

Another thing that is worth noting before merging this is that the youtube extractor behaves differently now. Now it directly downloads the videos in the URL that is given to it instead of trying to interpret it as user/channel/playlist etc.

For example, in older versions, if you tried to download https://www.youtube.com/user/HBO, you would get all the uploads by HBO. But now, it downloads the playlists in the home page ("Transhood TRANSlation Summit" and "New on HBO
"). If you want to download the whole channel, you must give the URL https://www.youtube.com/user/HBO/videos

I am not sure how to revert this to older behaviour or if it even should be reverted in the first place

@pukkandan
Copy link
Contributor Author

Making youtube direct URLs work without any false positives was much harder than I thought.

For example, /AsapSCIENCE and /feed are valid, but /embed, /feed/library are not.

I ended up having to keep a record of the reserved names and perform a negative lookahead. If anyone has a better idea to fix this, you are welcome to try.

_RESERVED_NAMES = (
    r'course|embed|watch|w|results|storefront|'
    r'shared|index|account|reporthistory|t/terms|about|upload|signin|logout|'
    r'feed/(watch_later|history|subscriptions|library|trending|recommended)')

@pukkandan
Copy link
Contributor Author

@blackjack4494 Please review and merge this when you have time

@SeonjaeHyeon
Copy link
Contributor

SeonjaeHyeon commented Nov 20, 2020

I think we don't need to merge vlive extractor codes from youtube-dl. The plugin was already fixed #101 .
And @kyuyeunk opened PR for playlist support #223 .

It seems there isn't new feature to merge.

@kyuyeunk
Copy link
Contributor

@kyuyeunk I see that you have some PRs open regarding vlive. So I assume you are familiar with its code. Can you have a look at youtube-dl's vlive extractor code and check how much of it we can/should merge

I've compared the vlive.py for youtube_dl and youtube_dlc and came to following conclussions.

  1. youtube_dl does not support new url format while youtube_dlc does.
    vlive.tv recently started supporting two types (i.e., old and new) urls. Which are https://www.vlive.tv/video/<videoId> and https://www.vlive.tv/post/<postId>.
    For example, https://www.vlive.tv/video/111110 (old) and https://www.vlive.tv/post/1-18363798 (new) both points to the same video.
    While youtube_dl only supports old url types, youtube_dlc supports both old and new types.

  2. youtube_dl uses web api to fetch data that is only needed while youtube_dlc downloads whole web page and parse the data.
    I personally think approach of youtube_dl feels more proper. But web api that is being used only supports videoId and not postId.

  3. youtube_dl gathers more data about a video.
    youtube_dl versions gather information such as views, durations. Which will come in handy if a user uses options such as --min-views, or --get-duration.

All in all, vlive.py in youtube_dlc is superior because of number 1 - it works for both old and new type of urls. So there is no urgent need to change/merge it with youtube_dl version.

Meanwhile, I will try to work on a code that is based on youtube_dl version that also supports new url type and make a pull request of it. This is mostly because of reasons stated in 2 and 3. Also, maintaining similarity between youtube_dl and youtube_dlc codebase could be important.

@pukkandan
Copy link
Contributor Author

@kyuyeunk I agree with you. While maintaining code similarity to youtube-dl is important, there is no need for us to do so at the cost of functionality. I will leave vlive.py completely out of the merge. Meanwhile you can work on it at your own pace

@pukkandan pukkandan force-pushed the merge-main branch 2 times, most recently from a8ee7f3 to 6063e73 Compare November 21, 2020 14:51
@pukkandan pukkandan changed the title Merging the latest version of youtube-dl Merge youtube-dl and fix Youtube Feeds Nov 21, 2020
@pukkandan
Copy link
Contributor Author

pukkandan commented Nov 21, 2020

I have added support for ytsubs, ythistory, ytrec. So all youtube extractors should be working correctly now.

I have set maximum page count to 5 for the timebeing.
I removed max pages, since user can already provide --playlist-end to limit the number of videos

@Zirro
Copy link

Zirro commented Nov 22, 2020

Another thing that is worth noting before merging this is that the youtube extractor behaves differently now. Now it directly downloads the videos in the URL that is given to it instead of trying to interpret it as user/channel/playlist etc.

For example, in older versions, if you tried to download https://www.youtube.com/user/HBO, you would get all the uploads by HBO. But now, it downloads the playlists in the home page ("Transhood TRANSlation Summit" and "New on HBO"). If you want to download the whole channel, you must give the URL https://www.youtube.com/user/HBO/videos

Sudden changes in behaviour here could be problematic for those of us who regularly crawl a large number of channels in the background, causing a lot of unwanted content from other channels to get downloaded instead. I think the change by itself might make sense - especially if it remains consistent with youtube-dl's behaviour going forward - but if you could print a deprecation warning for a few releases when an affected URL is provided before the change is made I'm sure it'll save quite a few headaches for all the archivers out there.

@pukkandan
Copy link
Contributor Author

pukkandan commented Nov 22, 2020

@Zirro It is a change introduced by youtube-dl itself. I also prefer it the old way. It is easy enough to "fix" it by redirecting the channel home page to /video.

But in that case, there would be no way for the user to download the home page (why someone would want to do that, I dont know)
Edit: The user could explicitly provide \home

Also, this will make dlc behave differently than dl, which in my opinion is not a good idea.

Adding a warning is a good idea. I don't know how much actual practical value it has, but there is no harm in adding it.

@Zirro
Copy link

Zirro commented Nov 22, 2020

I also prefer it the old way.

I think you may have misunderstood me - I'm fine with either behaviour as the default in the long run. My hesitation is about changing it right away without a prior warning/notice to users.

If a temporary condition is added to retain the old behaviour (by appending /videos to bare channel/user URLs for instance) which also causes a warning to be printed over the next month or so, that is hopefully enough time for people to notice it and update their commands and scripts to include /videos everywhere they want the previous behaviour.

@Zirro
Copy link

Zirro commented Nov 26, 2020

Okay, sounds good to me. While testing this PR I found another issue that resulted in unexpected videos being downloaded, though it is more of an edge case. I had a few incorrect entries along the line of https://www.youtube.com/watch?list=UUXIkr0SRTnZO4_QpZozvCCA (where /watch should be /playlist) in my list of URLs to download. I hadn't noticed before, as the previous code appeared to handle these as playlists either way (arguably the appropriate approach is to throw an error).

However, unlike most incorrect paths on the site which result in either a 404 or a YouTube-specific error, /watch with unknown parameters apparently redirects to YouTube's front page and causes yt-dlc to start downloading the videos there. This should probably be caught as an invalid URL/error at an earlier stage.

@pukkandan
Copy link
Contributor Author

I had a few incorrect entries along the line of https://www.youtube.com/watch?list=UUXIkr0SRTnZO4_QpZozvCCA (where /watch should be /playlist) in my list of URLs to download

@Zirro Generally speaking, users shouldn't expect youtube-dl to handle redirects caused by invalid URLs correctly. But for this specific case, I have made a patch. It will issue a warning and then download the playlist.

Copy link
Owner

@blackjack4494 blackjack4494 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

50/66
had some troubles with few extractors. sometimes vpn did the job. guess most of the time it's my connection dropping.

@blackjack4494
Copy link
Owner

@Zirro I have added a commit that redirects channel homepage to /video and issues a warning about the redirect. The warning currently does not mention any future plans. It can easily be added if we decide that the redirection to /videos will be temporary.

@blackjack4494 I would like to know your opinion on this matter

Yeah changing current behaviour out of the blue may raise issues. It's better to do some light transitioning over time.

@blackjack4494 Awesome. But I @ you to ask what your opinion is on this matter.

Another thing that is worth noting before merging this is that the youtube extractor behaves differently now. Now it directly downloads the videos in the URL that is given to it instead of trying to interpret it as user/channel/playlist etc.
For example, in older versions, if you tried to download https://www.youtube.com/user/HBO, you would get all the uploads by HBO. But now, it downloads the playlists in the home page ("Transhood TRANSlation Summit" and "New on HBO"). If you want to download the whole channel, you must give the URL https://www.youtube.com/user/HBO/videos

Sudden changes in behaviour here could be problematic for those of us who regularly crawl a large number of channels in the background, causing a lot of unwanted content from other channels to get downloaded instead. I think the change by itself might make sense - especially if it remains consistent with youtube-dl's behaviour going forward - but if you could print a deprecation warning for a few releases when an affected URL is provided before the change is made I'm sure it'll save quite a few headaches for all the archivers out there.

Also, just out of curiosity, are you the only maintainer for this repo?

right now i am the only maintainer. have already asked if some other people were interested. there were two people I had in mind a way back then. Will try to add 1-2 maintainer the following week if they are interested.

Really tried to release an updated version this weekend but didn't make it due to be being more busy than I expected.

Copy link
Owner

@blackjack4494 blackjack4494 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is broken in viki extractor. related to subtitles. I guess some header is missing.

Copy link
Owner

@blackjack4494 blackjack4494 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I approve this now there are some (minor) issues with
googledrive
youtube
viki
vlive

but those can be fixed afterwards.

@blackjack4494 blackjack4494 merged commit ef5a4db into blackjack4494:master Nov 30, 2020
@blackjack4494
Copy link
Owner

As for youtube downloading behaviour:
yt-dlc should download channles like it is expected as usual.
however there will be some light transition.

What I thought of is a flag that will change to new/experimental downloading like youtube-dl behaves now. Later on this behaviour could become standard but it should be possible to switch back to old behaviour with another flag.
This is the most friendly way I can think of. Allowing depending scripts/programs to adapt to new changes by experimenting with new changes but maintaining the old behaviour in the meantime. Then at some point they could still use the old behaviour with a new flag.

@pukkandan tell me what you think. haven't digged deep into how you reverted to old behaviour. maybe it's as simple as to check a certain parameter to change behaviour.

@pukkandan
Copy link
Contributor Author

pukkandan commented Nov 30, 2020

@blackjack4494

Even though I approve this now there are some (minor) issues with
googledrive
youtube
viki
vlive

but those can be fixed afterwards.

Could you elaborate on what the exact issues are? Then I could have a look; especially youtube - that's the extractor I'm most familiar with

@pukkandan tell me what you think. haven't digged deep into how you reverted to old behaviour. maybe it's as simple as to check a certain parameter to change behaviour.

I look at the provided URL and if it directly points to the channel page, append a /video to it (036fcf3). Adding a switch to reverse it would be easy.

I personally think /videos should always be used - no need to transition. The user has the option to provide /home to download the home page instead. The only reason to transition is to keep in line with youtube-dl's behavior.

@pukkandan
Copy link
Contributor Author

pukkandan commented Nov 30, 2020

@blackjack4494 I found a minor bug in my code and have pushed a commit to fix it. Should I make it a separate PR?

@nixxo
Copy link
Contributor

nixxo commented Nov 30, 2020

@nixxo youtube-dl now has a skyit extractor. Could you please have a look at it and see if it can completely replace your skyitalia extractor?

@pukkandan
I looked at it and yes, you can replace my skyitalia extractor with skyit. It supports more websites.

p.s. sorry i'm late with the answer, the notification email went to the spam folder for some reason.

@pukkandan
Copy link
Contributor Author

@nixxo This PR has already been merged. Could you make a seperate PR with just the skyitalia extractor

@nixxo
Copy link
Contributor

nixxo commented Dec 1, 2020

@pukkandan pr done. #270

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment