Hello,
as I was having a lot of issues installing all the dependencies on our system, I tried to provide OCRmyPDF via its Docker image. Can you please check whether this is done the right way and whether it's compatible with workflow_ocr? Things are working, but it's not fully tested. Maybe this can spare some people from worrying about so many dependencies. Thank you very much for the amazing work!
Providing OCRmyPDF with docker
/opt/ocrmypdf/dockerfile

```dockerfile
FROM jbarlow83/ocrmypdf
# Refresh package lists and install non-interactively (-y); replace "yourlang" with a Tesseract language code.
RUN apt-get update && apt-get install -y tesseract-ocr-yourlang
```

Build docker image

/usr/bin/ocrmypdf

Chmod

On our system I had to add the apache user to the docker group.

Hi @l00v3, sounds like an interesting idea. In general it is of course possible to replace the `ocrmypdf` command-line binary with a Docker call, as long as that call can read the input from stdin and write the output to stdout (the app relies on this feature).

Regarding your recipe, I would give you the following feedback:

1. I'd suggest tagging your custom image with `docker build -t jbarlow83/ocrmypdf-custom .` (see here). You could then run the command like `docker run ... jbarlow83/ocrmypdf-custom "$@"` (see here), and you don't have to keep track of your image ID.
2. The command issued by the app is defined here and mainly runs `ocrmypdf --redo-ocr -q - -`. As you can see, no language information (`-l eng+yourlang`) is currently passed. This could become a feature in the future and might then conflict with your `docker run` command if the parameter is added twice (once by your script and once by the app itself).
3. Depending on the system you're running, the user executing the webserver process might not be `apache`. On Debian systems it is most likely `www-data`. In any case, this user has to be added to the `docker` group to run Docker containers.

From a security point of view, point 3 could be quite dangerous, because an attacker who takes over your webserver would then be able to control your complete Docker ecosystem via the Docker daemon. This is something I'd really like to avoid, even though I find your idea very interesting. Maybe we have some ideas to get this under control, @bahnwaterter? Then we could of course offer this as an alternative installation method.
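Following the suggestions above (tag the image, then call `docker run ... jbarlow83/ocrmypdf-custom "$@"`), here is a minimal sketch of what `/usr/bin/ocrmypdf` could contain. It assumes the image was tagged `jbarlow83/ocrmypdf-custom` and that its entrypoint is `ocrmypdf`, as in the upstream `jbarlow83/ocrmypdf` image. The function name `ocrmypdf_via_docker` and the `DOCKER_BIN` override are hypothetical and are not part of workflow_ocr or OCRmyPDF. The key detail is `-i` without `-t`. The app streams the PDF through stdin and stdout, so stdin must stay attached and no pseudo-TTY should be allocated.

```shell
#!/bin/sh
# Sketch of a /usr/bin/ocrmypdf wrapper that forwards every call to the custom image.
# Assumption: the image is tagged jbarlow83/ocrmypdf-custom and its entrypoint is ocrmypdf.
IMAGE="jbarlow83/ocrmypdf-custom"

ocrmypdf_via_docker() {
    # --rm : remove the container once OCRmyPDF exits.
    # -i   : keep stdin open so the "-" input argument can read the PDF from stdin.
    #        No -t: a pseudo-TTY would corrupt the binary PDF stream.
    # "$@" : pass the app's arguments through unchanged. No -l is added here,
    #        so a language flag added later by the app is not duplicated.
    # DOCKER_BIN is a hypothetical override (defaults to "docker"), useful for dry runs.
    exec "${DOCKER_BIN:-docker}" run --rm -i "$IMAGE" "$@"
}

# The app always passes arguments (e.g. --redo-ocr -q - -), so forward only
# when some are given; calling the script without arguments does nothing.
if [ "$#" -gt 0 ]; then
    ocrmypdf_via_docker "$@"
fi
```

After building the image (for example with `docker build -t jbarlow83/ocrmypdf-custom -f /opt/ocrmypdf/dockerfile /opt/ocrmypdf`) and making the script executable (`chmod 755 /usr/bin/ocrmypdf`), you can reproduce the app's call pattern by hand with `ocrmypdf --redo-ocr -q - - < input.pdf > output.pdf`. Because of `exec`, the shell is replaced by the `docker` client, which inherits the script's stdin and stdout directly.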
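To verify that the account running the webserver (`www-data` on Debian, `apache` on some other distributions) really ended up in the `docker` group, a small check like the sketch below can help. The helper name `user_in_group` is hypothetical. Supplementary groups are assigned when a process starts, so after `usermod -aG docker www-data` the webserver service has to be restarted before the new membership takes effect. The security caveat raised in the thread still applies: membership in the `docker` group effectively grants root-level control over the host through the Docker daemon.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if user $1 is a member of group $2.
# `id -nG USER` prints all of USER's group names (primary and supplementary),
# separated by spaces; split them one per line and look for an exact match.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example: check the Debian webserver user (use "apache" on other systems).
if user_in_group www-data docker; then
    echo "www-data is in the docker group"
else
    echo "www-data is NOT in the docker group (or the user does not exist)"
fi
```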