Problem downloading taichi dataset #11

Closed
AndroYD84 opened this issue Feb 21, 2020 · 6 comments

@AndroYD84

When I run this command:
python load_videos.py --metadata taichi-metadata.csv --format .mp4 --out_folder taichi --workers 1
At some point I hit this error; it happens on both Windows and Linux:

2it [02:13, 58.17s/it]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\first-order-model\data\taichi-loading\load_videos.py", line 76, in run
    crop = img_as_ubyte(resize(crop, args.image_shape, anti_aliasing=True))
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\skimage\transform\_warps.py", line 166, in resize
    preserve_range=preserve_range)
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\skimage\transform\_warps.py", line 807, in warp
    raise ValueError("Cannot warp empty image with dimensions", image.shape)
ValueError: ('Cannot warp empty image with dimensions', (120, 0, 3))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\first-order-model\data\taichi-loading\load_videos.py", line 78, in run
    except imageio.core.format.CannotReadFrameError:
AttributeError: module 'imageio.core.format' has no attribute 'CannotReadFrameError'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "load_videos.py", line 113, in <module>
    for chunks_data in tqdm(pool.imap_unordered(run, zip(video_ids, args_list))):
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\site-packages\tqdm\std.py", line 1091, in __iter__
    for obj in iterable:
  File "C:\ProgramData\Anaconda3\envs\ptlast37\lib\multiprocessing\pool.py", line 748, in next
    raise value
AttributeError: module 'imageio.core.format' has no attribute 'CannotReadFrameError'
2it [02:23, 71.69s/it]

The timing of the error and the set of videos downloaded are both random (i.e. sometimes it happens after 1 minute, sometimes after 5 minutes; one run downloads one set of videos and the next run downloads a completely different set, even after emptying the folders first).
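
A note on reading the traceback above: the final AttributeError hides the real failure. Python evaluates the expression in an except clause only when an exception actually reaches it, so the missing CannotReadFrameError attribute goes unnoticed until resize raises its ValueError. The sketch below reproduces that pattern with a stand-in namespace (fake_format and run are hypothetical names for illustration, not imageio or the repository's code):

```python
import types

# Stand-in for a module that no longer has the expected exception class.
fake_format = types.SimpleNamespace()


def run(fail):
    try:
        if fail:
            raise ValueError("Cannot warp empty image")
        return "ok"
    # This attribute lookup only happens when an exception reaches the clause.
    except fake_format.CannotReadFrameError:
        return "skipped"


if __name__ == "__main__":
    print(run(False))  # the except expression is never evaluated
    try:
        run(True)
    except AttributeError as error:
        # The original ValueError is still attached as the context.
        print(type(error).__name__, "raised while handling", repr(error.__context__))
```

This is why the error appears at seemingly random times: it only surfaces on the first video whose crop turns out to be empty.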

@AliaksandrSiarohin
Owner

I am trying it now and it seems fine (it has been running for 7 iterations and 12 minutes).
What are your versions of imageio, imageio-ffmpeg and youtube-dl?
Mine are 2.6.1, 0.4.0 and 2018.03.14.
If you have the same versions and still have the problem, contact me by email.
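
One way to collect those version numbers in a single step is sketched below. It assumes Python 3.8 or newer (for importlib.metadata) and that the packages were installed under the distribution names imageio, imageio-ffmpeg and youtube-dl; installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["imageio", "imageio-ffmpeg", "youtube-dl"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```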

@AliaksandrSiarohin
Owner

AliaksandrSiarohin commented Feb 21, 2020

You are right, there is an error. The problem is that the resolution of many videos has changed since I tested the script. It should be fixed now; I need some time to check that everything finishes properly.
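
To illustrate how a resolution change can produce an empty crop such as (120, 0, 3): if a bounding box is stored in pixels of the resolution the video had when it was annotated, applying it unchanged to a smaller frame can select zero columns. The following is a minimal sketch under that assumption, not necessarily the fix committed to the repository; rescale_bbox and its parameters are hypothetical names.

```python
def rescale_bbox(bbox, ref_size, frame_size):
    """Scale a (left, top, right, bottom) box from ref_size to frame_size.

    Sizes are (width, height) in pixels. Returns None when the clamped box
    is empty, so the caller can skip the frame instead of resizing nothing.
    """
    left, top, right, bottom = bbox
    ref_w, ref_h = ref_size
    frame_w, frame_h = frame_size
    scale_x, scale_y = frame_w / ref_w, frame_h / ref_h

    left, right = int(left * scale_x), int(right * scale_x)
    top, bottom = int(top * scale_y), int(bottom * scale_y)

    # Clamp to the frame so slicing never runs past the image borders.
    left, right = max(0, left), min(frame_w, right)
    top, bottom = max(0, top), min(frame_h, bottom)

    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


if __name__ == "__main__":
    # Box annotated at 1280x720, video now served at 640x360.
    print(rescale_bbox((640, 100, 1000, 460), (1280, 720), (640, 360)))
    # Same-size reference but the box lies outside the frame: empty crop.
    print(rescale_bbox((700, 100, 1000, 220), (640, 360), (640, 360)))
```

Returning None for an empty box lets the calling loop skip the chunk rather than pass a zero-width array to resize.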

@AndroYD84
Author

There are some broken links, as shown in the log:

python load_videos.py --metadata taichi-metadata.csv --format .mp4 --out_folder taichi --workers 1
67it [5:36:13, 301.09s/it]Can not load video iFMbu9-Mejc, broken link
111it [9:16:43, 300.93s/it]Can not load video _XRyc2kiTlM, broken link
117it [9:42:00, 298.47s/it]Can not load video JdiIQg47Wc4, broken link
134it [10:57:11, 294.27s/it]Can not load video EDGjhmIMCnw, broken link
164it [12:58:33, 284.84s/it]Can not load video LstLDRUBAp4, broken link
186it [14:39:24, 283.68s/it]Can not load video KYdyIdusD0g, broken link
206it [16:08:48, 282.18s/it]Can not load video VhprHat04dk, broken link
280it [21:39:13, 278.41s/it]

These videos seem to be unavailable on YouTube now, but the rest seem to have downloaded successfully. I guess it's not a major problem; there should still be enough data to train.
I haven't tried the "--format .png" flag to check whether it breaks anything along the way, because I converted from .mp4 to .png with a batch process.
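
For anyone who wants a list of the failed IDs from a saved log, here is a small sketch. It assumes the log lines keep the exact "Can not load video <id>, broken link" wording shown above; broken_video_ids is a hypothetical helper name.

```python
import re

# Matches the message printed for each unavailable video in the log above.
BROKEN_LINK = re.compile(r"Can not load video ([\w-]+), broken link")


def broken_video_ids(log_text):
    """Return the video IDs reported as broken, in order of appearance."""
    return BROKEN_LINK.findall(log_text)


if __name__ == "__main__":
    sample = (
        "67it [5:36:13, 301.09s/it]Can not load video iFMbu9-Mejc, broken link\n"
        "111it [9:16:43, 300.93s/it]Can not load video _XRyc2kiTlM, broken link\n"
        "280it [21:39:13, 278.41s/it]\n"
    )
    print(broken_video_ids(sample))
```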

@AliaksandrSiarohin
Owner

One benefit of .png is that it is lossless. By storing the frames in .mp4 first and then converting, you risk introducing unnecessary compression artifacts from .mp4.

@WenjiaWang0312

I ran into this problem as well. Is there any solution? Thanks. @AliaksandrSiarohin

@Mooseburger1

This is super old, but I'm commenting here anyway for anyone still trying their hand at this. An update to the youtube-dl regex pattern broke things, so downloading fails in general. Here is what I did to fix it:

  • Install a "fixed" version of youtube-dl by following these directions.
  • In the load_videos.py script I changed the download function to the following:
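import os
import subprocess

def download(video_id, args):
    """Download one video with youtube-dl and return the expected output path."""
    video_path = os.path.join(args.video_folder, video_id + ".mp4")
    try:
        # The arguments are passed as a list without shell=True: on POSIX,
        # shell=True with a list treats every item after the first as an
        # argument to the shell itself, not to youtube-dl.
        result = subprocess.check_output([args.youtube, '-f', "''best/mp4''", '--write-auto-sub', '--write-sub',
                        '--sub-lang', 'en', '--skip-unavailable-fragments',
                        "https://www.youtube.com/watch?v=" + video_id, "--output",
                        video_path])
        print(result)
    except Exception as e:
        # A failed download is reported and skipped so the run can continue.
        print(f"Video {video_id} failed to download because of {e}")

    return video_path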

You'll notice the call is wrapped in a try/except because some of the videos would throw an exception, since the accounts associated with those videos were deactivated. An exception in any of the subprocesses kills the whole program, so the try/except prevents that.

  • Changed the default of the --youtube argument to the youtube-dl command, which is now on $PATH since it was installed via pip:
parser.add_argument("--youtube", default='youtube-dl', help='Path to youtube-dl')

And that should take care of all the issues. Doing the above fixed the problem for me.
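
As a rough sketch of the two ideas above (checking that the command is really on $PATH, and letting a failed call report an error instead of killing the run), something like the following could be used. resolve_downloader and run_tolerant are hypothetical helper names, not part of load_videos.py.

```python
import shutil
import subprocess
import sys


def resolve_downloader(name="youtube-dl"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def run_tolerant(command):
    """Run a command list; return True on success and False on any failure."""
    try:
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.CalledProcessError, OSError) as error:
        # Non-zero exit status or a missing executable: report and carry on.
        print(f"Command failed: {error}")
        return False


if __name__ == "__main__":
    path = resolve_downloader()
    print(path or "youtube-dl was not found on PATH")
    # Demonstration with the Python interpreter standing in for the downloader.
    print(run_tolerant([sys.executable, "-c", "raise SystemExit(1)"]))
```

Checking the path once at start-up gives a clear message up front, rather than one failure per video.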
