urllib.error.HTTPError: HTTP Error 404: Not Found #36

Open
oregonpillow opened this issue Oct 7, 2021 · 10 comments

Comments

@oregonpillow

oregonpillow commented Oct 7, 2021

Traceback (most recent call last):
  File "run.py", line 7, in <module>
    print(get_subtitles(video, lang='chi_sim+eng', sim_threshold=70, conf_threshold=65))
  File "/home/ubuntu/Github/videocr/env/lib/python3.8/site-packages/videocr/api.py", line 8, in get_subtitles
    utils.download_lang_data(lang)
  File "/home/ubuntu/Github/videocr/env/lib/python3.8/site-packages/videocr/utils.py", line 21, in download_lang_data
    with urlopen(url) as res, open(filepath, 'w+b') as f:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Not sure why this is happening; I'm guessing it's a version problem. I'm trying to run the example code with my own video (full system path specified).

@apm1467 Any chance you could share the exact Python and Tesseract versions you used successfully?

@Mschul

Mschul commented Oct 19, 2021

I am facing the exact same problem. Since the paths changed, I also tried fixing the URLs referenced in constants.py:
TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'
TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'
but that didn't solve the problem.

@Mageikk

Mageikk commented Oct 26, 2021

I had the same issue, and I don't know how to fix the automated download. However, you can skip it entirely: go to https://github.com/tesseract-ocr/tessdata_best or https://github.com/tesseract-ocr/tessdata_fast, manually download the language files you need (when in doubt, just get all of them), and put them into the folder that is also referenced in constants.py. Not perfect, but good enough for me.
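
For anyone taking the manual route, here is a minimal sketch for checking which files are still missing. It assumes the lang string is separated by '+' (as in the traceback above) and that each model file is named after its language code with a .traineddata suffix. The function name missing_lang_files and the directory argument are hypothetical and not part of videocr; pass whatever folder your constants.py points to.

```python
from pathlib import Path
from typing import List


def missing_lang_files(lang: str, tessdata_dir: Path) -> List[str]:
    """Return the .traineddata file names for `lang` that are absent from tessdata_dir."""
    missing = []
    # A lang string such as 'chi_sim+eng' names one model per '+'-separated part.
    for name in lang.split('+'):
        filename = f'{name}.traineddata'
        if not (tessdata_dir / filename).is_file():
            missing.append(filename)
    return missing
```

Calling it with your own lang string and data folder returns the list of file names you still have to download by hand; an empty list means every model is already in place.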

@hw-lunemann

I ran into the same issue and putting

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata?raw=true'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/blob/main/{}.traineddata?raw=true'

in constants.py fixes the downloading issue!

@feanor3

feanor3 commented Oct 31, 2021

@hadis-git are you sure? It still gives me an error.
Substituting {} with the language I needed worked.

@hw-lunemann

Yes, I am sure.
The lang parameter in

def get_subtitles(

is split on '+', and each part is substituted into those constants. The models are then downloaded here:

def download_lang_data(lang: str):

So you have to make sure that your lang parameter corresponds to one or more of the available models.
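
To make the substitution concrete, here is a rough sketch; it is not the library's actual code. It assumes only what is described above: the lang string is split on '+' and each part is formatted into a URL template. The template is the TESSDATA_URL value quoted earlier in this thread, build_urls is a hypothetical helper, and the sketch ignores the separate TESSDATA_SCRIPT_URL template.

```python
from typing import List

# URL template as quoted earlier in this thread; '{}' is replaced by one language code.
TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata?raw=true'


def build_urls(lang: str, template: str = TESSDATA_URL) -> List[str]:
    """Split lang on '+' and format each part into the template (hypothetical helper)."""
    return [template.format(name) for name in lang.split('+')]


# Print one URL per requested language so each can be checked in a browser.
for url in build_urls('chi_sim+eng'):
    print(url)
```

If any printed URL gives a 404 in the browser, that part of the lang string does not match an available model, or the template itself is wrong.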

@hw-lunemann

What's the error you get?

@feanor3

feanor3 commented Nov 1, 2021

Traceback (most recent call last):
  File "example.py", line 6, in <module>
    videocr.save_subtitles_to_file('out.mkv', lang='dan')
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\api.py", line 20, in save_subtitles_to_file
    f.write(get_subtitles(
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\api.py", line 8, in get_subtitles
    utils.download_lang_data(lang)
  File "C:\Users\CrisMattGiov\AppData\Roaming\Python\Python38\site-packages\videocr\utils.py", line 21, in download_lang_data
    with urlopen(url) as res, open(filepath, 'w+b') as f:
  File "C:\Program Files\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Program Files\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

@hw-lunemann

hw-lunemann commented Nov 1, 2021

Well, it's telling you that the URL to the language models is wrong.
How about you add a print(url) right before the urlopen(url) call in download_lang_data (utils.py, line 21 in your traceback)?

Then you'll see whether you edited the right constants.py.
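
If you would rather not edit the installed package, the same check can be done from a scratch script. This is only a sketch: fetch_or_report is a hypothetical helper, not part of videocr, and the opener parameter exists only so the function can be tried without network access. It uses just urllib.request.urlopen and urllib.error.HTTPError from the standard library.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_or_report(url: str, opener=urlopen) -> bytes:
    """Download url; on an HTTP error, print the failing URL and status code, then re-raise."""
    try:
        with opener(url) as res:
            return res.read()
    except HTTPError as err:
        # Showing the exact URL makes it obvious which template was actually used.
        print(f'Download failed ({err.code}) for: {url}')
        raise
```

Pass it the URL you expect videocr to request; a 404 here with the old master path in the message means the edited constants.py is not the one being imported.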

@hsnfirdaus

This is because the branch name of tessdata_fast and tessdata_best changed from master to main, so the URLs in the file videocr/constants.py must be changed from:

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'

to

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/main/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/main/script/{}.traineddata'

We have to wait for the owner of this repository to fix this issue. If you want to change it yourself in the meantime, edit this file in your pip installation directory. On Linux, if you installed with pip, the directory is ~/.local/lib/python{version}/site-packages/videocr/ or /usr/local/lib/python{version}/dist-packages; search online for the location on other operating systems.
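
Rather than guessing the directory, you can ask Python where the package is installed. A minimal sketch using the standard library's importlib.util.find_spec; package_dir is a hypothetical helper name, and it assumes videocr is installed in the interpreter you run it with.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory a top-level package is installed in, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path to its __init__.py.
    return Path(spec.origin).parent


# constants.py should sit directly inside the printed directory.
print(package_dir('videocr'))
```

Run it with the same interpreter (or virtual environment) that you use for your script, otherwise you may end up editing a different copy of constants.py.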

@xiaoliwang

xiaoliwang commented May 15, 2022

It should change from

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'

to

TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/blob/main/{}.traineddata'

TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/blob/main/{}.traineddata'

now.

You can also download the traineddata file manually and place it at filepath.

7 participants