Add Page Size Overrides per Channel #702
base: testing
b9120da to 8fcb513
Thanks for taking a stab at that. Current backlog:
Please be patient.
OK, I took some time to look into this. That is some old code; I don't think I have touched it since the project began. :-) I think it's time to break out the query building into a separate class there. That is more or less self-contained, and it also prepares us for future expansion if we ever need to add additional types. This is what I came up with:

    class VideoQueryBuilder:
        """Build queries for yt-dlp."""

        def __init__(self, config: dict, channel_overwrites: dict | None = None):
            self.config = config
            self.channel_overwrites = channel_overwrites or {}

        def build_queries(
            self, video_type: VideoTypeEnum | None, limit: bool = True
        ) -> list[tuple[VideoTypeEnum, int | None]]:
            """Build queries for all or specific video type."""
            query_methods = {
                VideoTypeEnum.VIDEOS: self.videos_query,
                VideoTypeEnum.STREAMS: self.streams_query,
                VideoTypeEnum.SHORTS: self.shorts_query,
            }

            if video_type:
                # build query for specific type
                query_method = query_methods.get(video_type)
                if query_method:
                    query = query_method(limit)
                    if query[1] != 0:
                        return [query]
                    return []

            # Build and return queries for all video types
            queries = []
            for build_query in query_methods.values():
                query = build_query(limit)
                if query[1] != 0:
                    queries.append(query)

            return queries

        def videos_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
            """Build query for videos."""
            return self._build_generic_query(
                video_type=VideoTypeEnum.VIDEOS,
                overwrite_key="subscriptions_channel_size",
                config_key="channel_size",
                limit=limit,
            )

        def streams_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
            """Build query for streams."""
            return self._build_generic_query(
                video_type=VideoTypeEnum.STREAMS,
                overwrite_key="subscriptions_live_channel_size",
                config_key="live_channel_size",
                limit=limit,
            )

        def shorts_query(self, limit: bool) -> tuple[VideoTypeEnum, int | None]:
            """Build query for shorts."""
            return self._build_generic_query(
                video_type=VideoTypeEnum.SHORTS,
                overwrite_key="subscriptions_shorts_channel_size",
                config_key="shorts_channel_size",
                limit=limit,
            )

        def _build_generic_query(
            self,
            video_type: VideoTypeEnum,
            overwrite_key: str,
            config_key: str,
            limit: bool,
        ) -> tuple[VideoTypeEnum, int | None]:
            """Generic query for video page scraping."""
            if not limit:
                return (video_type, None)

            if overwrite_key in self.channel_overwrites:
                overwrite = self.channel_overwrites[overwrite_key]
                return (video_type, overwrite)

            if overwrite := self.config["subscriptions"].get(config_key):
                return (video_type, overwrite)

            return (video_type, None)

That should then simplify:

    def get_last_youtube_videos(
        self,
        channel_id,
        limit=True,
        query_filter=None,
        channel_overwrites=None,
    ):
        """get a list of last videos from channel"""
        query_handler = VideoQueryBuilder(self.config, channel_overwrites)
        queries = query_handler.build_queries(query_filter)
        last_videos = []

        for vid_type_enum, limit_amount in queries:
            obs = {
                "skip_download": True,
                "extract_flat": True,
            }
            vid_type = vid_type_enum.value

            if limit:
                obs["playlistend"] = limit_amount

            url = f"https://www.youtube.com/channel/{channel_id}/{vid_type}"
            channel_query = YtWrap(obs, self.config).extract(url)
            if not channel_query:
                continue

            last_videos.extend(
                [
                    (i["id"], i["title"], vid_type)
                    for i in channel_query["entries"]
                ]
            )

        return last_videos

This is minimally tested... Not sure if that covers all cases... But it is much more explicit and less confusing than what I had there before and what you had to work with. :-) I'm also thinking that we might want to have a more sophisticated download form at some point, e.g. when you add a channel to the form, to download x amount of videos, or something like that... Does that make sense? Also, you might want to rebase on the testing branch; I pushed quite a few things since you branched here.
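As a rough illustration of the request loop in `get_last_youtube_videos`, the sketch below turns `(video type, limit)` tuples into the URL and yt-dlp option dicts that would be requested, without touching the network. The enum member values, the `plan_requests` helper, and the channel id are assumptions made for this example and are not taken from the project.

```python
from enum import Enum


class VideoTypeEnum(Enum):
    """Stand-in enum; member values are assumed, not copied from the project."""

    VIDEOS = "videos"
    STREAMS = "streams"
    SHORTS = "shorts"


def plan_requests(channel_id, queries, limit=True):
    """Hypothetical helper: return (url, yt-dlp options) pairs for each query."""
    planned = []
    for vid_type_enum, limit_amount in queries:
        # same flat, metadata-only extraction options as in the comment above
        opts = {"skip_download": True, "extract_flat": True}
        if limit:
            # playlistend caps how many playlist entries yt-dlp returns
            opts["playlistend"] = limit_amount
        url = f"https://www.youtube.com/channel/{channel_id}/{vid_type_enum.value}"
        planned.append((url, opts))
    return planned


# made-up channel id and page sizes, for illustration only
queries = [(VideoTypeEnum.VIDEOS, 50), (VideoTypeEnum.SHORTS, 5)]
planned = plan_requests("UC_hypothetical_id", queries)
unlimited = plan_requests("UC_hypothetical_id", queries, limit=False)
```

With `limit=False` the `playlistend` key is simply left out, so the whole channel page would be scanned.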
Yeah this looks much simpler. I'll put it together into a commit
This sounds great. It would fix the problem I had when setting up and importing all my YouTube subscriptions: I really only wanted new videos, so I had to manually ignore all the added videos. I think that's outside the scope of this PR, but I'll keep it in mind.
Based on bbilly1's code from their comment in tubearchivist#702
8fcb513 to e4c2ac0
Added your code, works great. I only had to make one change: when the main config entry is False, it means no videos of that type should be queried, rather than an unlimited number. I also added a check that the channel override is not None, as that is what it was set to when it was removed.
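One way to picture the adjusted precedence is the standalone sketch below. It is not the committed code: `resolve_page_size` is a hypothetical function, and only the key names come from the snippet earlier in this thread. It assumes `None` means unlimited and `0` means the type is skipped, matching the `query[1] != 0` filter in `build_queries`.

```python
def resolve_page_size(limit, channel_overwrites, subscriptions, overwrite_key, config_key):
    """Hypothetical helper: None = unlimited, 0 = skip this type, else page size."""
    if not limit:
        return None

    # a per-channel override wins, but only if it is actually set
    override = channel_overwrites.get(overwrite_key)
    if override is not None:
        return override

    # fall back to the global subscriptions config
    configured = subscriptions.get(config_key)
    if configured:
        return configured

    # a False or missing config entry disables the type instead of unlimiting it
    return 0


# example config values are made up for illustration
subscriptions = {"channel_size": 50, "shorts_channel_size": False}

from_config = resolve_page_size(
    True, {}, subscriptions, "subscriptions_channel_size", "channel_size"
)
disabled = resolve_page_size(
    True,
    {"subscriptions_shorts_channel_size": None},
    subscriptions,
    "subscriptions_shorts_channel_size",
    "shorts_channel_size",
)
overridden = resolve_page_size(
    True,
    {"subscriptions_shorts_channel_size": 10},
    subscriptions,
    "subscriptions_shorts_channel_size",
    "shorts_channel_size",
)
```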
This is a minimum viable product. Tested all 3 overrides and they worked. The current method of resetting the override is clunky (setting to negative number). I've also upended some of the build query in subscriptions and haven't fully tested if that messes with things.
Moved query building into its own class
Based on bbilly1's code from their comment in tubearchivist#702
e13fe0e to fef2750
Looking into this more, I think this was caused by me not properly deleting entries from the channel overrides config when I was testing earlier. A fresh install does not need this check, as the keys are actually deleted when the override is unset.
This is a minimum viable product for adding per-channel overwrites on the number of videos, shorts, and livestreams to query when refreshing subscriptions. Tested all 3 overrides and they worked. The current method of resetting the override is clunky (setting it to a negative number). Open to ideas on how to better implement that. I've also upended some of the build query in subscriptions and haven't fully tested if that messes with things, but I have been running this on my install for the past few days with no ill effects that I've found.
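For readers unfamiliar with the reset convention, here is a minimal sketch of how a negative value could clear an override by deleting its key. The `set_channel_override` helper is hypothetical and is not the code in this PR; only the key names match the ones used in the query builder.

```python
def set_channel_override(overwrites, key, value):
    """Hypothetical helper: store a page size, or drop the key if value is negative."""
    if value < 0:
        # negative number acts as the "reset" signal; remove the key entirely
        overwrites.pop(key, None)
    else:
        overwrites[key] = value
    return overwrites


channel_overwrites = {}
set_channel_override(channel_overwrites, "subscriptions_shorts_channel_size", 5)
after_set = dict(channel_overwrites)
set_channel_override(channel_overwrites, "subscriptions_shorts_channel_size", -1)
after_reset = dict(channel_overwrites)
```

Deleting the key, rather than storing `None`, means a later lookup falls through to the global config without needing an extra check.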
New channel -> about overwrites page:
Sorry I can't contribute to the burn down of maintainability items. I'm not too experienced and don't know where to start with any of the remaining open items.