Commit
LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. (huggingface#14514)

* Added the lang argument to apply_tesseract in feature_extraction_layoutlmv2.py, which is used in pytesseract.image_to_data.
* Added the ocr_lang argument to LayoutLMv2FeatureExtractor.__init__, which is used when calling apply_tesseract.
* Updated the documentation of the LayoutLMv2FeatureExtractor.
* Specified in the documentation of the LayoutLMv2FeatureExtractor that the ocr_lang argument should be a language code.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
* Split comment into two lines to adhere to the max line size limit.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
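As a rough illustration of the change described above, the sketch below shows how a language code could be threaded from the extractor's constructor down to the OCR call. It is not the code from this commit: `apply_tesseract_sketch`, `FeatureExtractorSketch`, and the `ocr_backend` parameter are hypothetical names introduced here so the flow can be followed (and exercised) without Tesseract installed. Only the `lang` and `output_type` keywords of `pytesseract.image_to_data` and the `ocr_lang` constructor argument come from the commit message and the pytesseract API.

```python
from typing import Any, Callable, List, Optional


def apply_tesseract_sketch(
    image: Any,
    lang: Optional[str] = None,
    ocr_backend: Optional[Callable[..., dict]] = None,
) -> List[str]:
    """Hypothetical, simplified stand-in for apply_tesseract.

    Runs OCR on `image` with the given language code and returns the
    recognised words, dropping empty or whitespace-only entries.
    """
    if ocr_backend is None:
        # Imported lazily so the sketch can be read and tested without
        # pytesseract or the Tesseract binary being available.
        import pytesseract

        ocr_backend = pytesseract.image_to_data

    # `lang` is forwarded unchanged; None lets Tesseract use its default.
    data = ocr_backend(image, lang=lang, output_type="dict")
    return [word for word in data["text"] if word.strip()]


class FeatureExtractorSketch:
    """Hypothetical miniature of an extractor that accepts `ocr_lang`."""

    def __init__(self, apply_ocr: bool = True, ocr_lang: Optional[str] = None):
        self.apply_ocr = apply_ocr
        # Stored once at construction and reused for every OCR call.
        self.ocr_lang = ocr_lang

    def __call__(
        self, image: Any, ocr_backend: Optional[Callable[..., dict]] = None
    ) -> List[str]:
        if not self.apply_ocr:
            return []
        return apply_tesseract_sketch(
            image, lang=self.ocr_lang, ocr_backend=ocr_backend
        )
```

With the real class, the equivalent call would presumably look like `LayoutLMv2FeatureExtractor(apply_ocr=True, ocr_lang="fra")`, where the value is a Tesseract language code and the matching Tesseract language data must be installed on the machine.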