Too Many Request when Downloading YT-SB-1b #72

Open
LengSicong opened this issue Jan 5, 2024 · 7 comments

Comments

@LengSicong

Thanks for your work!
I am trying to use video2dataset to download YT-Temporal-1B, but it reports "too many requests" during the download. Could you give me some advice on how to fix this problem?

@SlotherCui
Contributor

If you use the official video2dataset script to download raw videos, YouTube may rate-limit your requests, which produces the "too many requests" error. To work around this, you can route requests through IP proxies. When constructing YT-SB-1B, however, we only sent requests to the interface that serves storyboard images. Fortunately, that interface did not limit the number of requests (at least not during our crawl).

@LengSicong
Author

Hi, thanks for your prompt reply. May I know how to send requests only to the interface that serves storyboard images? I ask because the official instructions given here use video2dataset to download storyboard images.

@SlotherCui
Contributor

We use thumbframes_dl.

@LengSicong
Author

May I know whether the storyboard images downloaded through thumbframes_dl include timestamp information that could be used to construct the interleaved video-text data in the next step?

@SlotherCui
Contributor

You can refer to this code. The storyboard images are sampled at continuous, fixed time intervals, so the timestamps can be inferred.
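To illustrate the point about inferring timestamps, here is a minimal sketch, not the project's actual code. It assumes the storyboard frames are evenly spaced and that the interval is known; the function names, the `interval_s` parameter, and the grid layout (`cols`, `rows` per sheet) are hypothetical and would need to match whatever metadata your download provides.

```python
from typing import List, Tuple


def storyboard_timestamps(
    num_frames: int, interval_s: float, start_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Return a (start, end) window in seconds for each storyboard frame.

    Assumes frames are sampled at a fixed interval, so frame i covers
    [start_s + i * interval_s, start_s + (i + 1) * interval_s).
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return [
        (start_s + i * interval_s, start_s + (i + 1) * interval_s)
        for i in range(num_frames)
    ]


def locate_frame(frame_idx: int, cols: int, rows: int) -> Tuple[int, int, int]:
    """Map a global frame index to (sheet, row, col).

    Assumes (hypothetically) that frames are packed row by row into
    grid images of cols x rows thumbnails each.
    """
    sheet, offset = divmod(frame_idx, cols * rows)
    row, col = divmod(offset, cols)
    return sheet, row, col


if __name__ == "__main__":
    # Three frames, one every 2 seconds.
    print(storyboard_timestamps(3, 2.0))
    # Frame 27 in sheets of 5 x 5 thumbnails.
    print(locate_frame(27, cols=5, rows=5))
```

With windows like these, each subtitle segment can be matched to the frame whose window contains its start time.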

@clownrat6

Hello, I ran into the same problem ("HTTPError: 429 Client Error: Too Many Requests for url: xxx") when downloading subtitles. Do you have any advice?

@SlotherCui
Contributor

SlotherCui commented Jan 16, 2024

Certainly. The most widely used and effective solution is to set up IP proxies, although this requires purchasing an IP proxy service. Another approach is to lengthen the interval between requests; lowering the request frequency may help alleviate the issue.
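As a rough illustration of the second suggestion (spacing requests out), the sketch below retries a download with exponentially growing waits whenever a rate-limit error is raised. It is not part of video2dataset or thumbframes_dl: `TooManyRequests`, `fetch_with_backoff`, and all parameter names are hypothetical, and `fetch` stands in for whatever function actually performs the request.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def fetch_with_backoff(
    fetch: Callable[[str], T],
    url: str,
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), waiting longer after each rate-limit error.

    The wait doubles on every retry (capped at max_delay_s), plus up to
    10% random jitter so parallel workers do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except TooManyRequests:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_fetch(url: str) -> str:
        # Simulated endpoint: rejects the first two calls, then succeeds.
        calls["n"] += 1
        if calls["n"] <= 2:
            raise TooManyRequests(url)
        return "ok"

    print(fetch_with_backoff(flaky_fetch, "https://example.com", base_delay_s=0.01))
```

In a real client, `fetch` would wrap the HTTP call and raise `TooManyRequests` when the status code is 429. If you also use proxies, that wrapper is the natural place to configure them.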
