ValueError: Only one of languages and ocr_languages should be specified. languages is preferred. ocr_language... #2293
Comments
@awalker4, I want to double-check that you want to handle the incorrect ways of passing…
Yeah, I suppose that should get turned into a 400 error.
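One way to get that behavior, sketched under assumptions: treat a `ValueError` raised while validating the language arguments as a client error and return HTTP 400 instead of letting it surface as a server error. `KNOWN_CODES`, `validate_languages`, and `handle_request` are hypothetical names for illustration only. The actual API service code is not shown in this issue.

```python
from http import HTTPStatus

# Hypothetical set of accepted codes, for illustration only.
KNOWN_CODES = {"eng", "fra", "deu"}


def validate_languages(languages):
    """Hypothetical validator: raise ValueError on any code it does not recognize."""
    unknown = [code for code in languages if code not in KNOWN_CODES]
    if unknown:
        raise ValueError(f"Unrecognized language code(s): {unknown}")
    return languages


def handle_request(languages):
    """Return (status, body), reporting bad user input as 400 rather than 500."""
    try:
        return HTTPStatus.OK, {"languages": validate_languages(languages)}
    except ValueError as exc:
        # A ValueError here means the caller sent bad input, so it is a client error.
        return HTTPStatus.BAD_REQUEST, {"detail": str(exc)}
```

With this shape, `handle_request(["English"])` returns a 400 status with the validation message in `detail`, while valid codes pass through with 200.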
This was referenced Jan 8, 2024
github-merge-queue bot pushed a commit that referenced this issue on Jan 11, 2024:
This PR is one in a series of PRs for refactoring and fixing the `languages` parameter so it can handle incorrect input from users. #2293 This PR adds a dictionary that maps fully spelled-out language names to Tesseract language codes. --------- Co-authored-by: Roman Isecke <136338424+rbiseck3@users.noreply.github.com>
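A minimal sketch of what such a lookup table can look like, assuming lookups should ignore case and surrounding whitespace. `LANGUAGE_NAME_TO_TESSERACT` and `name_to_tesseract_code` are hypothetical names, and the five entries are only a sample; the dictionary added in the PR is not reproduced here.

```python
# Hypothetical subset of a "language name -> Tesseract code" table.
LANGUAGE_NAME_TO_TESSERACT = {
    "english": "eng",
    "french": "fra",
    "german": "deu",
    "spanish": "spa",
    "chinese (simplified)": "chi_sim",
}


def name_to_tesseract_code(name):
    """Map a spelled-out language name (case and outer spaces ignored) to a Tesseract code."""
    key = name.strip().lower()
    if key not in LANGUAGE_NAME_TO_TESSERACT:
        raise ValueError(f"No Tesseract code known for language name {name!r}")
    return LANGUAGE_NAME_TO_TESSERACT[key]
```

Normalizing the key before lookup means inputs like `" English "` and `"english"` resolve to the same code, which is the kind of loosely formatted user input this series of PRs targets.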
github-merge-queue bot pushed a commit that referenced this issue on Jan 16, 2024:
This PR is one in a series of PRs for refactoring and fixing the `languages` parameter so it can handle incorrect input from users. #2293

Refactor `_convert_language_code_to_pytesseract_lang_code` and extract `_get_iso639_language_object` into its own function:

```python
from unstructured.partition.lang import _convert_language_code_to_pytesseract_lang_code as convert

convert("English")  # this will raise an error on both main and this branch
convert("en")  # this will return "eng" on both branches
```
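To illustrate why pulling the ISO 639 lookup into its own function helps, here is a small self-contained sketch of the same split: one function resolves a code to a language object, and a second converts that object to a Tesseract code. Everything here (`Iso639Language`, `get_iso639_language_object`, `convert_language_code_to_tesseract`, and the three-row table) is hypothetical and is not the `unstructured` implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Iso639Language:
    name: str
    part1: str  # two-letter ISO 639-1 code, e.g. "en"
    part3: str  # three-letter ISO 639-3 code, e.g. "eng"


# Hypothetical three-entry table; a real version would use a full ISO 639 dataset.
_LANGUAGES = [
    Iso639Language("English", "en", "eng"),
    Iso639Language("French", "fr", "fra"),
    Iso639Language("German", "de", "deu"),
]


def get_iso639_language_object(code: str) -> Optional[Iso639Language]:
    """Look up a language by ISO 639-1 or ISO 639-3 code; return None if unknown."""
    code = code.strip().lower()
    for lang in _LANGUAGES:
        if code in (lang.part1, lang.part3):
            return lang
    return None


def convert_language_code_to_tesseract(code: str) -> str:
    """Convert an ISO 639 code to a Tesseract code, raising ValueError on unknown input."""
    lang = get_iso639_language_object(code)
    if lang is None:
        raise ValueError(f"Unrecognized language code: {code!r}")
    # For these three languages the Tesseract code equals the ISO 639-3 code;
    # that is not true in general (e.g. Tesseract's "chi_sim").
    return lang.part3
```

As in the commit's example, a code such as `"en"` converts to `"eng"`, while a spelled-out name such as `"English"` is not a code and raises. Keeping the lookup separate lets other callers reuse it without triggering the conversion.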
github-merge-queue bot pushed a commit that referenced this issue on Jan 19, 2024:
This PR is one in a series of PRs for refactoring and fixing the `languages` parameter so it can handle incorrect input from users. #2293 This PR adds `_clean_ocr_languages_arg`. Nothing calls this function yet; later PRs in this series will.
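The commit message does not describe what the cleaning does. As a hedged illustration only: Tesseract combines multiple languages with `+` (for example `eng+fra`), so a cleaner might normalize lists and comma- or space-separated strings into that form. `clean_ocr_languages_arg` below is a hypothetical function, not the one added in the PR.

```python
import re
from typing import List, Union


def clean_ocr_languages_arg(ocr_languages: Union[str, List[str]]) -> str:
    """Normalize an ocr_languages value into Tesseract's "+"-joined form, e.g. "eng+fra".

    Hypothetical sketch: accepts a list of codes, or a string separated by
    "+", commas, or whitespace, and drops empty entries.
    """
    if isinstance(ocr_languages, str):
        parts = re.split(r"[+,\s]+", ocr_languages)
    else:
        parts = list(ocr_languages)
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return "+".join(cleaned)
```

For example, `"eng, fra"` and `["eng", " fra "]` both come out as `"eng+fra"`.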
github-merge-queue bot pushed a commit that referenced this issue on Jan 29, 2024:
This PR is the last in a series of PRs for refactoring and fixing the language parameters (`languages` and `ocr_languages`) so we can handle incorrect input from users. See #2293. It is recommended to go through this PR commit by commit and note each commit message. The most significant commit is "update check_languages..."
Closing, as the final PR for this issue has been merged.
The API is hitting this error from Unstructured. We had decided that this shouldn't throw an error and should instead just take the value from `languages` if both are provided. Some example inputs that cause this:
So it seems we need to:

- Allow `ocr_languages` to be sent when `languages` is empty. We can log a warning that this is deprecated.
- Use `languages` if both are set.