#--------------------------------------------------------------------------------------------
# Wish list
## General/UI
# - list control - follow a list; pause, stop, and resume after the current item finishes.
# - instead of pretty-print, video meta can be presented in a more human-readable format.
# - with multiple parallel downloads the current \r progress trick won't work, so a central reporting engine is needed.
# - file GitHub issues for well-defined items.
# - prepare a README, especially to explain usage and download_ref.
# - specify dependencies, e.g. certifi
# - make ytdownload or ytfetch and unify all apps.
## Reliable Download:
# - build our own powerful downloader that goes chunk-by-chunk with better control.
# - fetch the watch page to retrieve the play duration - no need to download the full thing for it.
# - deal with network failures - retry, wait-and-resume (see the resume sketch after this list).
# - deal with resume on the socket (resume partial downloads).
# - a network reconnect should re-establish things [restart the partial download from where it was left].
# - partial download based on the number of bytes already downloaded (either in the a-v file or individual components).
# - if the file is already combined/muxed, compare duration rather than bytes to avoid a repeat download.
# - parallel, threaded downloading.
# - if YouTube throws 404/403 in actual downloads, it should not proceed like a normal flow; the exception should be handled. [Done!]
# - deal with signatured streams that otherwise throw 403 (e.g. '-RMD88DNaGk').
# - videos that YouTube does not serve for copyright reasons (a-dshRotmnQ, Felitsa Live Elmorro) should not be shown as success! [Done!]
# - there exist custom URLs for channels; see https://www.youtube.com/account_advanced
# - my own channel is different from the standard form (makemegenious): https://www.youtube.com/channel/UCzvGqEme9-yngxiYXY_Zk7A
# - maintain partial download work for lists.
# - retry and exception handling should also apply in extract - skip what can't be parsed, but don't fail.
# - v=4h1ESUz2H3E fails in parsing.
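#
# A minimal sketch of the wait-and-resume idea (hypothetical helper, not in the
# codebase), assuming the media server honors HTTP Range requests:
#
#     import os
#     import urllib.request
#
#     def resume_download(url, path, chunk_size=64 * 1024):
#         done = os.path.getsize(path) if os.path.exists(path) else 0
#         req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % done})
#         with urllib.request.urlopen(req) as resp:
#             # 206 = server honored the Range; anything else means start over
#             mode = "ab" if resp.getcode() == 206 else "wb"
#             with open(path, mode) as out:
#                 while True:
#                     chunk = resp.read(chunk_size)
#                     if not chunk:
#                         break
#                     out.write(chunk)
#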
# Page caching:
# - cache every YT page; it will solve many problems (see the cache sketch after this list):
# - if a (large) extract fails and is restarted, it shouldn't fetch again ALL the pages that once succeeded!
# - once extract runs you already have all the pages - a later fetch on the full channel/playlist, or part of it, need not get_page() again.
# - if a particular video is part of multiple playlists, or you retry without realizing, or the .yl is somehow deleted,
#   caching will reduce the hits going to YouTube.
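#
# A sketch of the page cache (CACHE_DIR location and helper name are assumptions);
# 'fetch' stands for the existing page-download function, e.g. get_page:
#
#     import hashlib
#     import os
#
#     CACHE_DIR = os.path.expanduser("~/.ytdragon/page_cache")
#
#     def get_page_cached(url, fetch):
#         os.makedirs(CACHE_DIR, exist_ok=True)
#         path = os.path.join(CACHE_DIR, hashlib.md5(url.encode("utf-8")).hexdigest())
#         if os.path.exists(path):                       # cache hit: no YT round trip
#             with open(path, encoding="utf-8") as f:
#                 return f.read()
#         page = fetch(url)                              # cache miss: fetch and store
#         with open(path, "w", encoding="utf-8") as f:
#             f.write(page)
#         return page
#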
## Trace and no-repeat download:
# - in a list download, if the file is already present (with roughly the same size) in the target folder, skip the download.
# - at the very minimum, running the same command again should not require redownloading work that is already done.
# - maintain a trace and don't redownload what you have already got. { verify in the directory ... }
# - skip what is already downloaded { verify in the directory ... }
# - you can also check if the file is physically present.
# - the download_stream function should report the actual status and details of the download (put it in the trace).
# - MD5 or play-duration verification of media files (large ones).
# - check that the duration available locally matches what YT claims { to verify the downloaded file is complete! }
# - where and how to keep the trace? what is the file format?
# - maintain a CSV generator that can keep the trace (see the sketch after this list).
# - ask if the file to be downloaded already exists.
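#
# A sketch of the CSV trace (file name and columns are assumptions, not decided yet):
#
#     import csv
#     import os
#
#     TRACE_FILE = "ytdragon_trace.csv"
#
#     def already_downloaded(vid):
#         if not os.path.exists(TRACE_FILE):
#             return False
#         with open(TRACE_FILE, newline="") as f:
#             return any(r["id"] == vid and r["status"] == "ok"
#                        for r in csv.DictReader(f))
#
#     def record_download(vid, title, size, status):
#         write_header = not os.path.exists(TRACE_FILE)
#         with open(TRACE_FILE, "a", newline="") as f:
#             w = csv.DictWriter(f, fieldnames=["id", "title", "size", "status"])
#             if write_header:
#                 w.writeheader()
#             w.writerow({"id": vid, "title": title, "size": size, "status": status})
#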
## Implicit/smart behavior
# - mkdir if the folder for -f doesn't exist.
# - don't say 'Enjoy the video' when there is no download. [DONE]
# - resume incomplete downloads from where they were left off.
# - capture critical stuff and exit cleanly when the user presses Ctrl+C.
# - '/' is erased from the title because it is not allowed in a file name; it should be replaced by some character, not erased.
# - the default folder name is ./ but it could also be the playlist title or owner_name etc. for multiple videos.
# - the first Ctrl+C should pause and ask the user whether the current download should stop or ytdragon should exit.
# - pressing Ctrl+C should offer options: 'S' - skip current, 'Q' - quit, 'O' - options {e.g. res etc.} (see the handler sketch after this list).
# - add the bitrate under the progress bar.
# - show the final (human-readable) file size in "Fetch complete".
# - special flag to merge section-wise playlist files into a single file using FFMPEG.
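#
# A sketch of the Ctrl+C menu (SkipCurrentItem is a hypothetical exception the
# fetch loop would catch; the 'O' options path is left out):
#
#     import signal
#     import sys
#
#     class SkipCurrentItem(Exception):
#         pass
#
#     def on_interrupt(signum, frame):
#         choice = input("\n'S' skip current, 'Q' quit, 'O' options: ").strip().upper()
#         if choice == "Q":
#             sys.exit(0)                 # clean exit instead of a raw traceback
#         if choice == "S":
#             raise SkipCurrentItem()     # unwinds just the current download
#         # anything else: return and let the download resume
#
#     signal.signal(signal.SIGINT, on_interrupt)
#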
## Generic Config and post processing:
# - make a config.py to access all configs (editable by users), which works as defaults.
# - generic methods / customizable file titles.
# - generic policies for what to fetch: AudioOnly/VideoOnly/AV, max res,
# - post-processing of video, e.g. convert to mp3.
# - generalize the output file format ... make it what the user wants. Can be done at the FFMPEG level.
# - filters for download: from a given list, download only items with a specific resolution or playtime (skip 360p).
# - or download only if the title contains a specific string (download Documentary.* in title - skip others).
# - the default folder can be given in ytconfg.py (or some yml file).
# - use variables in the default config as formatting items (see the expansion sketch after this list), e.g.:
#   %H -> default_download_dir, %F -> current folder, %A -> Author/Channel name, %P -> playlist name, %T -> Title,
#   %R -> res, %I -> id, %S -> site name (youtube), %N -> sequence number (in list)
# - all configuration should also be modifiable from command-line args.
# - support all types of YouTube URLs.
# - allow downloading only the audio (for music).
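#
# A sketch of the %X expansion (the 'video' dict keys are assumed names, not the
# actual vmeta fields):
#
#     def expand_path(template, video):
#         subs = {"%H": video.get("default_download_dir", ""),
#                 "%F": ".",
#                 "%A": video.get("author", ""),
#                 "%P": video.get("playlist", ""),
#                 "%T": video.get("title", ""),
#                 "%R": str(video.get("res", "")),
#                 "%I": video.get("id", ""),
#                 "%S": "youtube",
#                 "%N": str(video.get("seq", ""))}
#         for key, val in subs.items():
#             template = template.replace(key, val)
#         return template
#
#     # e.g. expand_path("%H/%A/%N-%T.mp4", vmeta)
#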
## .yl file format and power fetch:
# - preserve comments (inline as well as full-line) in .yl.
# - the .yl can in turn have references to a full channel, playlist, etc.; expand all of them before download.
# - when a playlist URL is given instead of a video in a .yl file, recognise and translate it accordingly.
# - probe for file size during extract and capture it in the .yl format?
# - tabs being compulsory in .yl files is a problem, especially when the files are handwritten.
# - fetch can now be given multiple uid_references, which it downloads serially.
# - download channels --by-playlist, keeping each playlist in a different subfolder.
# - preserve COMMENTS in the original .yl file when extracting again (see the rewrite sketch after this list).
# - by default, regenerating .yl from .yl may erase the tab-delimited fields; everything written after # should be preserved.
# - this way repeated .yl-to-.yl will not add repeated columns, and also will not erase user notes.
# - create a .yl (or equivalent) as a result of downloading a list.
# - the meta attrib should capture the file size and add it to the .yl.
# - download a list partially using filters such as names~=/regx/ OR quality=x OR duration>/< T
# - use a search string/URL as a list and extract it to .yl or fetch it.
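#
# A sketch of the comment-preserving rewrite (the .yl layout - tab-delimited
# fields with '#' comments - is inferred from the items above):
#
#     def rewrite_yl_line(line, new_fields):
#         body, sep, comment = line.rstrip("\n").partition("#")
#         if not body.strip():                    # full-line comment: keep as-is
#             return line.rstrip("\n")
#         rebuilt = "\t".join(new_fields)         # regenerate only the fields
#         return rebuilt + ("\t" + sep + comment if sep else "")
#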
## Command line arguments
# - if -o 'filename' is supplied AFTER the URL_REF, it is not picked up.
# - -n switch to specify the max number of videos to download: only the first 5/10 rather than the full list.
# - the extract script should go away and become --info.
# - supply -a to append to an existing list.
# - supply --skip-till <vid> to skip until a point (useful until auto-skip exists).
# - supply --stop-at <vid> to stop downloading at a point.
# - supply --min-res <res> to download everything above a certain resolution.
# - --specific_key_info - specific information to be extracted (see the argparse sketch after this list).
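#
# A sketch of the proposed flags with argparse (none of these exist yet; the
# names mirror the items above):
#
#     import argparse
#
#     parser = argparse.ArgumentParser(prog="ytdragon")
#     parser.add_argument("url_ref", help="video/playlist/channel ref or .yl file")
#     parser.add_argument("-o", dest="outfile", help="output file name")
#     parser.add_argument("-n", type=int, dest="max_videos", help="download at most N")
#     parser.add_argument("-a", action="store_true", help="append to an existing list")
#     parser.add_argument("--skip-till", metavar="VID", help="skip items until VID")
#     parser.add_argument("--stop-at", metavar="VID", help="stop downloading at VID")
#     parser.add_argument("--min-res", metavar="RES", help="fetch only RES and above")
#     parser.add_argument("--info", action="store_true", help="extract meta only")
#     args = parser.parse_args()
#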
## Big Ideas:
# - make it an interactive shell? i.e. commands (same as the args) can be given while work is going on. Press Ctrl+C to interrupt.
# - save user/channel by playlist.
# Smart Downloads: - OK, these are huge amounts of work, but might as well WISH
# - collect all meta tags og:video:tag.
# - when you extract full channels or large playlists, use these tags to somehow cluster - make groups with comments.
# - another form of grouping: in a given channel, if some videos are in playlist form, make that a group.
# - similarly, "N-grams" in titles - "How stuff works - x", "How stuff works - y" etc. can be grouped (see the sketch after this list).
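#
# A crude sketch of the title-grouping idea, using shared leading words as the
# "n-gram" (n=3 is an arbitrary choice):
#
#     from collections import defaultdict
#
#     def group_by_title_prefix(titles, n=3):
#         groups = defaultdict(list)
#         for t in titles:
#             words = t.split()
#             if len(words) >= n:
#                 groups[" ".join(words[:n])].append(t)
#         # keep only prefixes shared by at least two titles
#         return {k: v for k, v in groups.items() if len(v) > 1}
#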
## Beyond youtube:
# - prioritize OpenLoad
# - download from Dailymotion
# - download from Vimeo
# - download from these sites including playlists/channels.
# - download from openload.co
# - list maker for: ihavenotv, topdocumentaryfilms.com, documentarytube, docuwiki.net,
# - documentaryheaven, cosmolearning, documentary.net
# - Courses - MIT, Stanford, Coursera, edX
#----------------------------------------------------------------------------------------------------
# Code TODO
# - make the item logger global (how will you handle it in a multithreaded situation?)
# - centralize all standard strings (including youtube.com/xxx, where we assume a known URL).
# - get rid of the print_smap_XXX functions and the extra return value from parse_stream_map.
# - keep all common strings in a central place.
# - it's time the code had classes for the important stuff.
# - load_list for different types should come under a simple, single abstraction.
# - the use of many 'prints' is not clean - convert them to log functions (see the logging sketch after this list).
# - vmeta is very clumsy: it has two layers, v[xxx] attribs and v[vmeta][xxx] attribs, with quite an overlap. Clean this up.
# - a video item ~ should be a CLASS!
# - logging no longer needs to be as dense as it once was, and logging as we do might be unclean for external users. Fix.
# - simplify ytmeta: a single function with the power of regex should be able to decode any URL, category, and site.
# - extract has no additional functionality now; convert it into a --extract switch under ./fetch.
# - handle /usr/bin/python3 vs /usr/local/bin/python3.
# - print on the console how many GB were downloaded and how many files, with the total play_duration, and how many failed.
# - captions are no longer working! Fix it soon :(
# - if no smap is available (maybe per config criteria), raise an exception.
# - there is currently NO logging before meta parsing succeeds.
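#
# A sketch of the print-to-logging conversion (logger name is an assumption):
#
#     import logging
#
#     log = logging.getLogger("ytdragon")
#     logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
#
#     # print("Fetch complete")    # before
#     log.info("Fetch complete")   # after: external users can raise the level
#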
## THE BIG MIGRATION:
# - deal with any UNICODE strings anywhere. [works under python3?]
# - viditem should be a class (see the sketch after this list).
# - use the right mechanism for HTTP downloads.
# - getting the page, streams, and captions all use different types of download functions. Generalize!
# - CLASSES! - convert everything into classes, then apply inheritance per source.
# - make everything classes: vid_meta
# - migrate this wishlist to issues on GitHub.
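#
# A sketch of viditem as a class (field names are assumptions based on the meta
# attribs mentioned in this wishlist, not the actual vmeta layout):
#
#     from dataclasses import dataclass, field
#
#     @dataclass
#     class VidItem:
#         vid: str
#         title: str = ""
#         author: str = ""
#         duration: int = 0                  # seconds, as claimed by YT
#         streams: list = field(default_factory=list)
#
#         def best_stream(self, max_res=None):
#             # placeholder policy; per-source subclasses could override this
#             usable = [s for s in self.streams
#                       if max_res is None or s.get("res", 0) <= max_res]
#             return max(usable, key=lambda s: s.get("res", 0)) if usable else None
#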
# New naming conventions:
# - better name for project!
# - name for module, generic classes, and specific classes by source
# - we have v=, vid=, video= ... too many options to do the same thing, so we need to drop some of them.
# - NEW REFS: revise the reference short-hand notation keeping in mind all other sites.
#-----------------------------------------------------------------------------------------------------