Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate document_ocr v10 #140

Closed

Conversation

cmsalmeida
Copy link

@cmsalmeida cmsalmeida commented Jun 1, 2017

Migration of module document_ocr to 10.0
With some enhancements:

  1. System parameters to control DPI and Quality of conversion from PDF to image
  2. System parameter to define default language for OCR
  3. OnChange() language in attachment form, apply OCR again
  4. OCR of PDF one page at a time, this allow one to process big PDF files
  5. All PDFs will pass through OCR

@pedrobaeza pedrobaeza mentioned this pull request Jun 1, 2017
12 tasks
@cmsalmeida
Copy link
Author

It's failing in tesseract dependency.
It must be installed like: apt-get install tesseract-ocr tesseract-ocr-eng
Can someone help me how to configure repo to pass this requirement?
Thanks in advance

@pedrobaeza
Copy link
Member

Just put a requirements.txt.

@cmsalmeida
Copy link
Author

I did, it's there

requirements.txt Outdated
@@ -0,0 +1,2 @@
tesseract-ocr
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here!

requirements.txt Outdated
@@ -0,0 +1,2 @@
tesseract-ocr
tesseract-ocr-eng
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

@pedrobaeza
Copy link
Member

Then everything is OK in that part. The problem should be another.

@pedrobaeza
Copy link
Member

Maybe an outdated .travis.yml? Check latest template on https://github.com/OCA/maintainer-quality-tools/blob/master/sample_files/.travis.yml and compare with this.

@cmsalmeida
Copy link
Author

This module needs tesseract.
In Ubuntu one can install it issuing:
sudo apt-get install tesseract-ocr tesseract-ocr-eng
How to make it work in travis?

@cmsalmeida cmsalmeida closed this Jun 5, 2017
@pedrobaeza
Copy link
Member

How is this install in previous versions? Use the same technique.

@cmsalmeida
Copy link
Author

cmsalmeida commented Jun 5, 2017

I didn't did the previous versions, I got the code from 8.0, did the migration and enhancements, it's working in my servers.
I did this PR to OCA to share, because now it's really good, and it works for big PDFs (many pages).
It's shared, if someone can merge it would be great.

@cmsalmeida cmsalmeida reopened this Jun 5, 2017
@cmsalmeida
Copy link
Author

I can't spend more time on this module, sorry....
It's demotivator trying to contribute, and not being able to :(
Any way you have the code there, or if someone interested in use it it's available for all at:
https://github.com/thinkopensolutions/oca-knowledge
Thanks any way.

@pedrobaeza
Copy link
Member

OK, as you consider. I think knowing all the internals of the processes, like Travis and so on, make you better later facing other developments, but if you don't want to expense more time, we can't force you.

@cmsalmeida
Copy link
Author

Can you please point me to documentation on that?
Or any code/configuration that works?

@pedrobaeza
Copy link
Member

In 8.0, the dependency is added via Travis packages: https://github.com/OCA/knowledge/blob/8.0/.travis.yml#L10.

You can make the same here.

@ovnicraft
Copy link
Member

👍

@cmsalmeida cmsalmeida force-pushed the migrate_document_ocr_v10 branch 4 times, most recently from f1c5c82 to a58c406 Compare June 6, 2017 11:40
@cmsalmeida
Copy link
Author

cmsalmeida commented Jun 6, 2017

Hi,
For the tests to run I need to force the import of PdfImagePlugin and PalmImagePlugin from PIL.
But there is no statement where to use that import. Flake8 is giving import not used error.
Can one tell me how to solve this?

@cmsalmeida
Copy link
Author

I found that, if one use pylint and flake8 one can ignore unused import warning in both tools in this way:
import IMPORT # noqa # pylint: disable=unused-import
This way you explicitly tell pylint and flake8 that this unused import in intended.

@pedrobaeza
Copy link
Member

Yeah, that's it. I don't understand why don't you use it if you import it, but you can bypass it that way.

Copy link
Member

@hbrunn hbrunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, you didn't follow the guidelines for migrations, so previous commits are not preserved and it's quite inefficient to review this way (there's no way to tell what you added and what existed before), that's why I'm not going to go into details right now. I'd suggest to follow the above and squash all your commits to one, then reviewing this commit is sufficient.
I understand newcomer's frustration, but most of this can be avoided by reading the new project's contribution guidelines before contributing.

@pedrobaeza
Copy link
Member

I also add the link to the specific migration guide to 10.0. As this module is not in 9.0, you should replace 9.0 by 8.0: https://github.com/OCA/maintainer-tools/wiki/Migration-to-version-10.0

@cmsalmeida
Copy link
Author

cmsalmeida commented Jun 6, 2017

Thank you for your help.
Unfortunately I really can't spend more time in this process.
This module is now publicly available at: https://github.com/thinkopensolutions/tko-addons/tree/10.0/tko_document_ocr
It is available on Odoo apps also.

@cmsalmeida cmsalmeida closed this Jun 6, 2017
@cmsalmeida
Copy link
Author

I will try this again soon...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants