urllib.error.HTTPError: HTTP Error 404: Not Found #36

oregonpillow · 2021-10-07T20:49:42Z

Traceback (most recent call last):
  File "run.py", line 7, in <module>
    print(get_subtitles(video, lang='chi_sim+eng', sim_threshold=70, conf_threshold=65))
  File "/home/ubuntu/Github/videocr/env/lib/python3.8/site-packages/videocr/api.py", line 8, in get_subtitles
    utils.download_lang_data(lang)
  File "/home/ubuntu/Github/videocr/env/lib/python3.8/site-packages/videocr/utils.py", line 21, in download_lang_data
    with urlopen(url) as res, open(filepath, 'w+b') as f:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I'm not sure why this is happening; my guess is a version problem. I'm trying to run the example code with my own video (full system path specified).

@apm1467 Any chance you could provide the exact Python and Tesseract versions you used successfully?


Mschul · 2021-10-19T22:04:14Z

I am facing the exact same problem. Since the paths changed, I also tried fixing the URLs referenced in constants.py:
TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'
TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'
but that didn't solve the problem.

Mageikk · 2021-10-26T19:50:46Z

I had the same issue, and I don't know how to fix the automated download. However, you can go to https://github.com/tesseract-ocr/tessdata_best or https://github.com/tesseract-ocr/tessdata_fast, manually download the language files you need (when in doubt, just get all of them), and put them into the folder referenced in constants.py. After that you will not need the automated download anymore. Not perfect, but good enough for me.
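
To confirm that the manually downloaded files ended up where they are expected, a short check can help. This is only a minimal sketch and not part of videocr: `missing_lang_files` is a hypothetical helper, and the folder `~/tessdata` used in the example is an assumption, so substitute the folder your constants.py actually references.

```python
from pathlib import Path


def missing_lang_files(lang, tessdata_dir):
    """Return the .traineddata paths that `lang` needs but the folder lacks.

    `lang` uses the same 'a+b' form as in the traceback, e.g. 'chi_sim+eng'.
    """
    folder = Path(tessdata_dir)
    wanted = [folder / f"{name}.traineddata" for name in lang.split("+")]
    return [path for path in wanted if not path.is_file()]


if __name__ == "__main__":
    # Assumption: the data folder is ~/tessdata; check constants.py for the real one.
    for path in missing_lang_files("chi_sim+eng", Path.home() / "tessdata"):
        print("still missing:", path)
```

If the script prints nothing, every file needed for that lang value is already in the folder.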

hw-lunemann · 2021-10-31T15:40:42Z

I ran into the same issue and putting

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata?raw=true'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/blob/main/{}.traineddata?raw=true'

in constants.py fixes the downloading issue!

feanor3 · 2021-10-31T20:58:59Z

@hadis-git are you sure? It still gives me an error.
Substituting {} with the language I needed worked.

hw-lunemann · 2021-10-31T23:41:21Z

Yes, I am sure.
The lang parameter in

videocr/videocr/api.py

Line 5 in 9b97c99

def get_subtitles(

is split by '+', and each part is substituted into those constants. Then the models are downloaded here

videocr/videocr/utils.py

Line 9 in 9b97c99

def download_lang_data(lang: str):

So you have to make sure that your lang parameter corresponds to one or more of the available models.
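
The split-and-substitute step described above can be sketched as follows. This is an illustration under assumptions, not the actual videocr source: `build_download_urls` is a hypothetical name, and the template is the TESSDATA_URL value suggested earlier in this thread.

```python
# Template taken from the constants.py suggestion earlier in this thread.
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata?raw=true"


def build_download_urls(lang, template=TESSDATA_URL):
    """Split 'a+b' on '+' and substitute each name into the URL template."""
    return [template.format(name) for name in lang.split("+")]


if __name__ == "__main__":
    # One URL per language name; a misspelled name yields a URL that will 404.
    for url in build_download_urls("chi_sim+eng"):
        print(url)
```

Because the `{}` placeholder is filled in at download time, it has to stay in the constant; a lang value that matches no available model simply produces a URL that does not exist.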

hw-lunemann · 2021-10-31T23:42:12Z

What's the error you get?

feanor3 · 2021-11-01T12:54:14Z

Traceback (most recent call last):
  File "example.py", line 6, in <module>
    videocr.save_subtitles_to_file('out.mkv', lang='dan')
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\api.py", line 20, in save_subtitles_to_file
    f.write(get_subtitles(
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\api.py", line 8, in get_subtitles
    utils.download_lang_data(lang)
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\utils.py", line 21, in download_lang_data
    with urlopen(url) as res, open(filepath, 'w+b') as f:
  File "C:\Program Files\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Program Files\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

hw-lunemann · 2021-11-01T18:49:37Z

Well, it's telling you that the URL to the language models is wrong.
How about you add print(url) to print out that URL right here?

videocr/videocr/utils.py

Line 20 in 9b97c99

Then you'll see if you edited the right constants.py
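
If editing the installed package to add a print is inconvenient, the same information can be obtained from outside it. The following is a minimal sketch: `check_url` is a hypothetical helper, and its `fetch` parameter exists only so the function can be exercised without network access.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def check_url(url, fetch=urlopen):
    """Try to open `url`; return (url, status) instead of raising on HTTP errors."""
    try:
        with fetch(url) as res:
            return url, res.status
    except HTTPError as err:
        # err.code is the HTTP status, e.g. 404 for a path that no longer exists.
        return url, err.code
```

Calling check_url with the URL built from your constants.py and lang value shows the exact address that was requested next to the status it returned, which makes a stale branch name or a misspelled language easy to spot.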

hsnfirdaus · 2021-11-14T12:59:01Z

This is because the default branch of tessdata_fast and tessdata_best was renamed from master to main, so the URLs in the file videocr/constants.py must be changed from:

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'

to

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/main/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/main/script/{}.traineddata'

We must wait for the owner of this repository to fix this issue. If you want to change it yourself in the meantime, edit this file in your pip installation directory. On Linux, if you installed with pip, the directory is ~/.local/lib/python{version}/site-packages/videocr/ or /usr/local/lib/python{version}/dist-packages. Search online for the location on other operating systems.
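
Rather than guessing the installation directory, Python can be asked where the package lives. This is a minimal sketch using the standard library's importlib; `find_package_file` is a hypothetical helper name, and it only locates the file without modifying it.

```python
import importlib.util
from pathlib import Path


def find_package_file(package, filename):
    """Return the path of `filename` inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the constants.py that this interpreter would import, or None.
    print(find_package_file("videocr", "constants.py"))
```

Run it with the same interpreter (or virtual environment) that runs your script, so that the file you edit is the one that actually gets imported.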

xiaoliwang · 2022-05-15T15:23:11Z

The URLs should now be changed from

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'

to

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/blob/main/{}.traineddata'

Alternatively, you can download the traineddata file yourself and save it at filepath.

This was referenced Nov 15, 2021:
The URL used in utils/constants.py is no longer valid #39 (Open)
urllib.error.URLError: <urlopen error [Errno 61] Connection refused> #25 (Open)

ozkutuk mentioned this issue Feb 22, 2022:
python3Packages.videocr: init at 0.1.6 NixOS/nixpkgs#161418 (Merged)

alexposito mentioned this issue Jan 6, 2023:
404 error by using the example code #46 (Open)
