[BUG] Imgur 404 error but link works in browser #869

Open
3 tasks done
luckybear992 opened this issue May 31, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@luckybear992

  • I am reporting a bug.
  • I am running the latest version of BDfR
  • I have read the Opening an issue

Description

Imgur links keep giving a 404 error even though they work in my browser. An Imgur link such as https://i.imgur.com/xxxxxx.gifv opens in my browser, but https://i.imgur.com/xxxxxx WITHOUT the .gifv extension loads a 404 page. The two 404 links in the log below work fine in my browser when I use the i.imgur.com link ending in .gifv.

Command

python3 -m bdfr download L:\bdfr --subreddit thatsthespot --no-dupes

Environment

  • OS: Windows 10
  • Python version: 3.10.11

Logs

[2023-05-31 09:50:26,348 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2023-05-31 09:50:26,348 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2023-05-31 09:50:26,349 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-05-31 09:50:26,349 - bdfr.connector - Level 9] - Created download filter
[2023-05-31 09:50:26,349 - bdfr.connector - Level 9] - Created time filter
[2023-05-31 09:50:26,349 - bdfr.connector - Level 9] - Created sort filter
[2023-05-31 09:50:26,350 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-31 09:50:26,350 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-05-31 09:50:26,351 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-31 09:50:26,802 - bdfr.connector - DEBUG] - Added submissions from subreddit thatsthespot
[2023-05-31 09:50:26,803 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-31 09:50:26,803 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-31 09:50:26,803 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-31 09:50:26,803 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-31 09:50:38,557 - bdfr.downloader - DEBUG] - Attempting to download submission 13w4i73
[2023-05-31 09:50:38,558 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/DnZYrnB.gifv
[2023-05-31 09:50:38,750 - bdfr.downloader - ERROR] - Site Imgur failed to download submission 13w4i73: Server responded with 404 to https://imgur.com/DnZYrnB
[2023-05-31 09:50:38,751 - bdfr.downloader - DEBUG] - Attempting to download submission 13vrada
[2023-05-31 09:50:38,751 - bdfr.downloader - DEBUG] - Using Redgifs with url https://redgifs.com/watch/parchedvalidhog
[2023-05-31 09:50:38,939 - bdfr.downloader - DEBUG] - File L:\bdfr\thatsthespot\twitchrule_She is really really cute when she want that cum_13vrada.mp4 from submission 13vrada already exists, continuing
[2023-05-31 09:50:38,939 - bdfr.downloader - INFO] - Downloaded submission 13vrada from thatsthespot
[2023-05-31 09:50:38,940 - bdfr.downloader - DEBUG] - Attempting to download submission 13vc2l6
[2023-05-31 09:50:38,940 - bdfr.downloader - DEBUG] - Using Redgifs with url https://www.redgifs.com/watch/pointlesscanineibizanhound#rel=user%3Aariacolexo;order=new
[2023-05-31 09:50:39,105 - bdfr.downloader - DEBUG] - File L:\bdfr\thatsthespot\ariacole___I am an expert at finding just the right spot.._13vc2l6.mp4 from submission 13vc2l6 already exists, continuing
[2023-05-31 09:50:39,106 - bdfr.downloader - INFO] - Downloaded submission 13vc2l6 from thatsthespot
[2023-05-31 09:50:39,106 - bdfr.downloader - DEBUG] - Attempting to download submission 13vnb2c
[2023-05-31 09:50:39,106 - bdfr.downloader - DEBUG] - Using Imgur with url https://i.imgur.com/SScXYtM.gifv
[2023-05-31 09:50:39,306 - bdfr.downloader - ERROR] - Site Imgur failed to download submission 13vnb2c: Server responded with 404 to https://imgur.com/SScXYtM

luckybear992 added the bug (Something isn't working) label May 31, 2023
@michaeljaeger95

I also cannot download anything from Imgur, regardless of file type, even though the links work as intended in the browser.

However, for me the error is (for every download):

Site Imgur failed to download submission xxxxxx: server responded with 404 to https://api.imgur.com/3/image/yyyyyyy

@ElleEllie

ElleEllie commented Jun 4, 2023

Can confirm as well: I can't download anything from Imgur either.

@Barborica-Alexandru

Barborica-Alexandru commented Jun 5, 2023

Yes and navigating to the link in a browser will unveil the reason:
error | "Authentication required"
It would seem the API has been gated.
The error reported by BDFR is a 404, even though the actual error is a 401. This might be a bug in the code, unrelated to this issue.

@OMEGARAZER
Contributor

Yes and navigating to the link in a browser will unveil the reason: error | "Authentication required" It would seem the API has been gated. The error reported by BDFR is a 404, even though the actual error is a 401. This might be a bug in the code, unrelated to this issue.

The reason you're getting a 401 from that link is the same one I mentioned in #828: you're missing the auth headers needed to access that API link.
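For reference, Imgur's v3 API expects an "Authorization: Client-ID ..." header even for anonymous reads, which is why opening the API URL in a browser shows "Authentication required". A minimal standard-library sketch, assuming you have registered your own Imgur application; YOUR_CLIENT_ID is a placeholder and build_imgur_image_request is a hypothetical helper, not part of BDfR. The request is only built here, not sent:

```python
import urllib.request

# Imgur v3 image endpoint, as seen in the error message quoted above.
API_BASE = "https://api.imgur.com/3/image/"


def build_imgur_image_request(image_id: str, client_id: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Imgur v3 API request.

    Without the Authorization header, the API answers with a 401
    "Authentication required" error, which is what a browser shows.
    """
    return urllib.request.Request(
        API_BASE + image_id,
        headers={"Authorization": f"Client-ID {client_id}"},
    )


if __name__ == "__main__":
    req = build_imgur_image_request("DnZYrnB", "YOUR_CLIENT_ID")  # placeholder client ID
    print(req.full_url)  # https://api.imgur.com/3/image/DnZYrnB
    print(req.get_header("Authorization"))  # Client-ID YOUR_CLIENT_ID
    # Sending it needs network access and a real, registered client ID:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(resp.status)
```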

As for the rest of the issue: a lot of things are being removed from Imgur right now. It seems they're removed from the API first, and the direct file links sometimes keep working for a while afterwards. You can work around this for direct links with an edit to the download_factory, but I would not advise it long term: any dead link will just pick up the "removed" image and treat the download as successful. Also, any malformed links provided by the Reddit API can end up downloading the HTML of the 404 page, because the downloader will not see the redirect and will think it's getting the right file. That's the main reason the change to the API was made in the first place.
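If you do use the direct-link workaround, the two failure modes described above (a redirect to a placeholder image, or an HTML page where media was expected) can be caught with a simple heuristic. This is a sketch assuming your HTTP client follows redirects and exposes the final URL and the Content-Type header; looks_like_failed_download is a hypothetical name, not a BDfR function, and the removed.png URL in the example is illustrative:

```python
from urllib.parse import urlparse


def looks_like_failed_download(requested_url: str, final_url: str, content_type: str) -> bool:
    """Heuristically flag a 'successful' download that is really a failure.

    - HTML content where a media file was expected (e.g. a 404 page)
    - any redirect to a different host or path (e.g. a placeholder image)
    """
    # Ignore parameters such as "; charset=utf-8" when checking the type.
    if content_type.split(";")[0].strip().lower() == "text/html":
        return True
    requested, final = urlparse(requested_url), urlparse(final_url)
    # Conservative: any change of host or path counts as suspicious.
    return (requested.netloc, requested.path) != (final.netloc, final.path)


if __name__ == "__main__":
    print(looks_like_failed_download(
        "https://i.imgur.com/DnZYrnB.mp4",
        "https://i.imgur.com/removed.png",  # illustrative placeholder redirect
        "image/png",
    ))  # True
```

With the requests library, for example, you would pass response.url and response.headers.get("Content-Type", ""). Flagging every redirect will produce some false positives, which is the safer direction when the alternative is silently keeping placeholders.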

If you are willing to live with those caveats, or to double-check every download, here is the patch:

change this:

        if re.match(r"(i\.|m\.|o\.)?imgur", sanitised_url):
            return Imgur
        elif re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
            return Redgifs
        elif re.match(r"(thumbs\.|giant\.)?gfycat\.", sanitised_url):
            return Gfycat
        elif re.match(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$", sanitised_url) and not DownloadFactory.is_web_resource(
            sanitised_url
        ):
            return Direct

to this:

        if re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
            return Redgifs
        elif re.match(r"(thumbs\.|giant\.)?gfycat\.", sanitised_url):
            return Gfycat
        elif re.match(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$", sanitised_url) and not DownloadFactory.is_web_resource(
            sanitised_url
        ):
            return Direct
        elif re.match(r"(i\.|m\.|o\.)?imgur", sanitised_url):
            return Imgur
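Why the reordering matters: the checks run top to bottom and the first match wins, so an i.imgur.com URL ending in .gifv reaches the Direct pattern before the Imgur one. The sketch below reproduces the regexes from the snippets to show this, assuming sanitised_url has already had its scheme and leading "www." stripped (which the anchored re.match calls imply). pick_downloader and is_web_resource here are hypothetical stand-ins, not BDfR's real DownloadFactory methods:

```python
import re


def is_web_resource(url: str) -> bool:
    # Hypothetical stand-in for DownloadFactory.is_web_resource:
    # treat common page extensions as web pages, not downloadable files.
    return re.search(r"\.(html?|php|aspx?)$", url) is not None


def pick_downloader(sanitised_url: str, imgur_first: bool) -> str:
    """Return the name of the downloader the regex chain would select."""
    imgur = ("Imgur", lambda u: re.match(r"(i\.|m\.|o\.)?imgur", u))
    others = [
        ("Redgifs", lambda u: re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", u)),
        ("Gfycat", lambda u: re.match(r"(thumbs\.|giant\.)?gfycat\.", u)),
        ("Direct", lambda u: re.match(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$", u) and not is_web_resource(u)),
    ]
    # Original order checks Imgur first; the patched order checks it last.
    checks = [imgur] + others if imgur_first else others + [imgur]
    for name, matches in checks:
        if matches(sanitised_url):
            return name
    return "Unknown"


if __name__ == "__main__":
    for url in ("i.imgur.com/DnZYrnB.gifv", "imgur.com/DnZYrnB"):
        print(url, pick_downloader(url, True), pick_downloader(url, False))
```

With the patched order, any Imgur URL that ends in a 3 to 4 character extension goes to Direct, while extension-less Imgur page links still fall through to Imgur.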

With that change, .gifv links will be downloaded as .gifv files. If you would like them downloaded as .mp4 instead, you can insert the two new lines into the downloader at line 96:

        try:
            if submission.url.endswith(".gifv"):
                submission.url = submission.url.replace(".gifv", ".mp4")
            downloader_class = DownloadFactory.pull_lever(submission.url)
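A small caveat on the snippet above: str.replace rewrites every ".gifv" in the URL, not only the extension. A slightly stricter sketch that only swaps a trailing .gifv (gifv_to_mp4 is a hypothetical helper name; URLs with query strings are left untouched):

```python
def gifv_to_mp4(url: str) -> str:
    """Swap a trailing .gifv extension for .mp4; leave other URLs unchanged."""
    if url.endswith(".gifv"):
        return url[: -len(".gifv")] + ".mp4"
    return url


if __name__ == "__main__":
    print(gifv_to_mp4("https://i.imgur.com/SScXYtM.gifv"))  # https://i.imgur.com/SScXYtM.mp4
```

In the patch, the two inserted lines would then become submission.url = gifv_to_mp4(submission.url).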

These edits are provided as-is and I won't be providing additional support for them.

@Barborica-Alexandru

Oh, I understand now. Some of the submissions were very recent, so I hadn't considered that they could already have been removed.

@AlexTu2

AlexTu2 commented Jun 12, 2023

or are willing to double-check them all here is

@OMEGARAZER

Is there a way to figure out which files need to be double-checked?
And then a way to save the corresponding file to the right location, with the right name?

@miguel7501

BDfR has a --no-dupes option that is meant to avoid downloading the same image/video twice by comparing file hashes. Since the 'removed' image is identical every time, that option catches it: you'll get just one copy, and BDfR will skip all other posts that Imgur has removed.

I'm currently re-downloading my saved posts with this fix and the --no-dupes option. The log shows "Resource hash d835884373f4d6c8f24742ceabe74946 from submission downloaded elsewhere" messages every now and then, so I'm confident it's working.
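To find placeholder files already on disk (for example, from runs made without --no-dupes), one option is to hash every file and compare against the hash from that log line. This sketch assumes that d835884373f4d6c8f24742ceabe74946 is the hash of Imgur's "removed" image, which the comment implies but does not state outright, and that the hash is MD5 (its 32 hex digits are consistent with that). find_placeholder_files is a hypothetical helper:

```python
import hashlib
from pathlib import Path

# Hash quoted in the log above; ASSUMED to be Imgur's "removed" placeholder.
PLACEHOLDER_MD5 = "d835884373f4d6c8f24742ceabe74946"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest without loading it all into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_placeholder_files(root: str, target_md5: str = PLACEHOLDER_MD5) -> list[Path]:
    """Return every file under root whose MD5 equals target_md5, sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and md5_of_file(p) == target_md5
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:  # e.g. python find_placeholders.py L:\bdfr
        for path in find_placeholder_files(sys.argv[1]):
            print(path)
```

BDfR file names end with the submission ID (e.g. ..._13w4i73.mp4), so the listed paths also tell you which posts to check by hand.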

@Serene-Arc
Owner

Plus, the images are all exactly the same (absurdly small) size, so it's easy to use a tool like find to locate them all.
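On Windows, where GNU find may not be available, a rough Python equivalent is to group small files by exact size and look for one size shared by many files. The thread does not give the placeholder's byte count, so the max_bytes threshold below is an arbitrary assumption; small_files_grouped_by_size is a hypothetical helper. Inspect the reported groups before deleting anything:

```python
from collections import defaultdict
from pathlib import Path


def small_files_grouped_by_size(
    root: str, max_bytes: int = 10_000, min_count: int = 2
) -> dict[int, list[Path]]:
    """Map file size -> files, keeping only small sizes that repeat.

    Identical placeholder images share one exact size, so they show up
    as a single large group.
    """
    groups: dict[int, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size <= max_bytes:
                groups[size].append(path)
    return {size: sorted(paths) for size, paths in groups.items() if len(paths) >= min_count}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:  # e.g. python group_sizes.py L:\bdfr
        for size, paths in small_files_grouped_by_size(sys.argv[1]).items():
            print(f"{size} bytes: {len(paths)} files")
```

Once you know the exact size N in bytes, GNU find can select those files directly with find DIR -type f -size Nc.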

Projects
None yet
Development

No branches or pull requests

9 participants