Perform OCR for more than one language by fscrawler #626

Closed
mohsen-zanjani opened this issue Nov 11, 2018 · 2 comments

@mohsen-zanjani

It seems fscrawler cannot perform OCR for more than one language at the same time.

I have also described the problem at the link below:
Perform OCR for more than one language by fscrawler

It would be very nice if this feature were added to fscrawler.
Something like this:

"fs" : {
    "ocr" : {
      "language": "eng+fra"
    }
  }

OR this:

"fs" : {
    "ocr" : {
      "language": "eng"
    },
    "ocr" : {
      "language": "fra"
    }
  }

Thanks ...

@mohsen-zanjani
Author

After some investigation, I found that Tika already allows multiple languages to be specified for OCR.
To do this, simply join the desired language codes with the '+' sign.
For example: "eng+fas+fra"

The same string can be used in fscrawler as the value of the "language" attribute:

"fs" : {
    "ocr" : {
      "language": "eng+fas+fra"
    }
  }
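
As a minimal sketch of the '+' convention described above, the settings fragment could be generated rather than written by hand. The helper names `ocr_language` and `fs_ocr_settings` are hypothetical and are not part of fscrawler or Tika; the only thing taken from this thread is the "fs" / "ocr" / "language" layout and the '+'-joined value.

```python
import json


def ocr_language(*codes):
    """Join OCR language codes with '+' (hypothetical helper, not an fscrawler API)."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def fs_ocr_settings(*codes):
    """Build the "fs" settings fragment shown in this thread (layout assumed from it)."""
    return {"fs": {"ocr": {"language": ocr_language(*codes)}}}


if __name__ == "__main__":
    # Print the fragment for English, Persian and French.
    print(json.dumps(fs_ocr_settings("eng", "fas", "fra"), indent=2))
```

Whether a given code such as "fas" works depends on the matching language data being installed for the OCR engine, which this sketch does not check.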

@dadoonet
Owner

Thanks! I added some documentation based on your feedback.
