Not cleaning up files after upload #2

Closed
if1mad opened this issue Jun 25, 2018 · 5 comments

Comments


if1mad commented Jun 25, 2018

Is it intentional that leftover files (e.g. covers, JSON, descriptions, the occasional .webm) are not cleaned up?

I was running it on a rather large channel (2TB+) and eventually the VPS disk filled up because those remaining files kept building up, along with some .webm.part files for some reason - not sure of the cause of that yet.

Owner

bardisty commented Jun 26, 2018

There should be no leftover files unless you modified the rclone_command variable from move to copy:

-rclone_command="move"
+rclone_command="copy"

If you haven't already, try running the script with debug=true. This should tell you what the culprit is. If not, please post the output here so I can take a look.

As for the .webm.part files, I'd guess youtube-dl hasn't completed those downloads yet or failed to complete them for some reason. youtube-dl should continue downloading them the next time you run ytdlrc. If it doesn't, check whether their IDs are in the archive.list and, if so, delete them and run ytdlrc again.

@bardisty
Owner

I've added some checks to prevent the script from running if the rclone config isn't found or if the specified remote has any issues (not found, unauthorized, etc.). If either of these is the culprit, the latest version should let you know.

To update:

  • Navigate to where you downloaded the repository
  • If you haven't already:
    • git add ytdlrc
    • git commit -m "Modify options to my taste"
  • git pull and merge

Owner

bardisty commented Jun 26, 2018

Just realized you meant it's only failing to upload the metadata, not everything. This is partly due to youtube-dl's --exec option: it only uploads the downloaded video on completion, not any metadata. Metadata is uploaded/moved after all videos have been downloaded and moved to the rclone remote. It's possible, if the channel is large enough and the VPS disk is small enough, that the metadata alone might fill the disk before the script finishes processing all the videos.

For a channel with ~450 videos, the metadata (.description, .info.json, .jpg) should come out to roughly 120-130MB. The .jpg files account for about 80MB of that, .info.json for ~45MB, and .description for ~350KB. I'm guessing your disk filled up primarily due to those .webm.part files (I've yet to encounter this myself; curious what's causing it).

A couple of ways you can try to work around this:

  • If your disk is really small and you don't care about thumbnail images, you could remove the --write-thumbnail line from the download_all_the_things() function:

    ytdlrc/ytdlrc, line 258 in 3d6b2fc:

    --write-thumbnail \

    If the channel has thousands of videos this may save enough space for ytdlrc to finish processing the entire channel.
  • Use the --datebefore or --dateafter options in youtube-dl. E.g., add --datebefore 20150101 to the download_all_the_things() function (I recommend putting it after the --continue option). This would only download videos uploaded on or before Jan 1st 2015. Once that completes, change it to 20160101 and run it again, incrementing the date by one year each time until all videos are processed, then remove the line.
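
The yearly stepping in the last bullet can be sketched as a small shell helper that prints the successive cutoff values. The name yearly_cutoffs and the year range are made up for illustration and are not part of ytdlrc; the helper only generates the dates, and each one would still be pasted into the --datebefore line by hand before the next run.

```shell
# Hypothetical helper, not part of ytdlrc.
# Prints one YYYY0101 cutoff per year from first_year to last_year (inclusive).
yearly_cutoffs() {
    year=$1
    last_year=$2
    while [ "$year" -le "$last_year" ]; do
        printf '%s0101\n' "$year"
        year=$((year + 1))
    done
}

# Show the value to use for each successive run (example years only).
for cutoff in $(yearly_cutoffs 2015 2018); do
    echo "next run: --datebefore $cutoff"
done
```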

As for cleaning up the mess so ytdlrc can continue:

  • Grab the YouTube IDs for the incomplete downloads in case you need to remove them from the archive.list: ls *.webm.part
  • Search the archive.list for the IDs and remove them if they exist
  • Free up some disk space, delete the .webm.part files
  • Run ytdlrc with debug=true
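
The archive.list step above could be scripted roughly as follows. This is a sketch, not part of ytdlrc: remove_from_archive is a hypothetical helper, and it assumes the archive uses youtube-dl's download-archive format of one "youtube <id>" entry per line. The IDs still have to be read off the .webm.part filenames by hand, since the filename layout depends on the output template in use.

```shell
# Hypothetical helper, not part of ytdlrc.
# Usage: remove_from_archive <archive-file> <id> [<id> ...]
# Assumes one "youtube <id>" entry per line in the archive file.
remove_from_archive() {
    archive=$1
    shift
    for id in "$@"; do
        tmp=$(mktemp) || return 1
        # -v inverts the match, -x matches whole lines only,
        # -F treats the entry as a literal string rather than a regex.
        grep -vxF -e "youtube $id" "$archive" > "$tmp" || true
        # Write back in place so the archive keeps its permissions.
        cat "$tmp" > "$archive"
        rm -f "$tmp"
    done
}

# Example (ID taken from a filename, shown here only as a placeholder):
#   remove_from_archive archive.list gFoq8-Xszjs
```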

Author

if1mad commented Jun 26, 2018

Thank you for the very in-depth reply!

After some testing and thinking, I believe I figured out the cause: I was running rclone with the custom flag --drive-chunk-size=128M and ran out of RAM at some point, which cancelled the rclone upload while ytdlrc continued working its way down the list. After this repeated enough times, my drive became full!

Totally my fault I must admit!

Something I noticed when looking through the limited screen session log is that ytdlrc will continue working through the list even with a full disk:


[download] Downloading video 605 of 661
[youtube] gFoq8-Xszjs: Downloading webpage
[youtube] gFoq8-Xszjs: Downloading video info webpage
WARNING: unable to extract uploader nickname
WARNING: video doesn't have subtitles
[youtube] gFoq8-Xszjs: Looking for automatic captions
WARNING: Couldn't find automatic captions for gFoq8-Xszjs
ERROR: unable to create directory [Errno 28] No space left on device: '/opt/ytdlrc/stage/FailArmy'
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1751, in ensure_dir_exists
    os.makedirs(dn)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 28] No space left on device: '/opt/ytdlrc/stage/FailArmy'


I have a suggestion if you don't mind: a check for a minimum amount of remaining disk space before each youtube-dl download is started.
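
For illustration, a minimal sketch of what such a check might look like in POSIX shell. The helper name has_free_space and the 5GB threshold in the example are assumptions, not anything ytdlrc provides; it reads the available space (in 1024-byte blocks) for a directory from df -Pk and compares it to a minimum.

```shell
# Hypothetical helper, not part of ytdlrc.
# Usage: has_free_space <directory> <minimum-kilobytes>
# Succeeds (exit 0) if the filesystem holding <directory> has at least
# <minimum-kilobytes> available, fails otherwise.
has_free_space() {
    dir=$1
    min_kb=$2
    # -P forces the POSIX one-line-per-filesystem layout, -k reports KB.
    # Line 2, field 4 is the "Available" column.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# Example: stop before downloading if less than 5GB (5242880 KB) is free.
#   has_free_space /opt/ytdlrc/stage 5242880 || { echo "low disk space" >&2; exit 1; }
```

Since youtube-dl processes a whole channel in one invocation, a check like this would only run between invocations unless the download loop were restructured.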

Thank you for creating this wonderful script!

Owner

bardisty commented Jul 6, 2018

Suggestions are more than welcome!

A free space check before each download is a possibility if I can work out a sane way to add it.

What you can do in the meantime is tell youtube-dl to abort if it encounters any errors by removing the --ignore-errors flag:

ytdlrc/ytdlrc, line 252 in 3d6b2fc:

--ignore-errors \

The downside to this is that some videos tend to be unavailable (usually due to copyright) and youtube-dl treats these as errors, so it will skip the rest of the channel even if your VPS has plenty of space left. If you opt to go this route, you'll need to check your syslog periodically or run ytdlrc manually now and then to see if any videos are causing youtube-dl to abort, and if so, manually add their YouTube IDs to the archive.list so youtube-dl can continue with the rest of the channel.
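
Adding an ID to the archive.list by hand could look something like the sketch below. add_to_archive is a hypothetical helper, not part of ytdlrc, and it assumes youtube-dl's download-archive format of one "youtube <id>" entry per line.

```shell
# Hypothetical helper, not part of ytdlrc.
# Usage: add_to_archive <archive-file> <id>
# Appends "youtube <id>" unless that exact line is already present.
add_to_archive() {
    archive=$1
    id=$2
    entry="youtube $id"
    # -q: quiet, -x: whole-line match, -F: literal string, not a regex.
    if ! grep -qxF -e "$entry" "$archive" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$archive"
    fi
}

# Example (placeholder ID):
#   add_to_archive archive.list gFoq8-Xszjs
```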

FWIW, it should be fairly safe to keep --ignore-errors. I've been running ytdlrc for over a year now on 15 channels and the only time I've run out of disk space (on a tiny 10GB partition) was when one of the channels uploaded a 24-hour-long 4K video. Of course, YMMV.

bardisty closed this as completed Jul 8, 2018