trying to upload the image and while generating the hocr format getting this issue #15

guptaaman2011 · 2018-03-08T19:15:57Z

python gcv2hocr.py Capture.jpg.json > capture.hocr
Traceback (most recent call last):
  File "gcv2hocr.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'
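
The traceback shows that `curline` is still `None` when a word is processed. One possible cause (an assumption, not confirmed in this thread) is that the JSON file holds an error response or no detected text. A minimal sketch for inspecting the file before running gcv2hocr, assuming the usual Google Cloud Vision REST layout (`responses[0].textAnnotations`); the helper names `summarize_gcv_response` and `summarize_file` are hypothetical and not part of gcv2hocr:

```python
import json


def summarize_gcv_response(data):
    """Return a short description of what a Vision API response contains."""
    # The REST API wraps results in a "responses" list; accept a bare response too.
    responses = data.get("responses", [data])
    if not responses:
        return "empty 'responses' list"
    first = responses[0]
    if "error" in first:
        return "API error: %s" % first["error"].get("message", "unknown")
    annotations = first.get("textAnnotations", [])
    if not annotations:
        return "no textAnnotations (no text detected?)"
    return "%d textAnnotations found" % len(annotations)


def summarize_file(path):
    """Load a saved response (for example Capture.jpg.json) and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_gcv_response(json.load(fh))
```

Calling `summarize_file("Capture.jpg.json")` would then show whether the file contains any annotations at all.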

dinosauria123 · 2018-03-08T23:32:27Z

Thank you for using gcv2hocr.

Please upload your Capture.jpg.json.

How to use makepdf.sh:

1. Go to the directory that contains makepdf.sh.
2. Execute " sh ./makepdf.sh ".

You have to edit makepdf.sh before executing it.
The first line of makepdf.sh, "while [ $a -le 32 ]", means that you have page001.jpg to page032.jpg.
If you want to convert a different number of JPEGs, change that number. For example, if you have only one JPEG,
edit the first line of makepdf.sh to "while [ $a -le 1 ]".
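
As an illustration of the naming scheme only, the files that a given loop limit covers can be listed with a short Python sketch. It assumes the three-digit zero padding shown above (page001.jpg to page032.jpg); `page_names` is a hypothetical helper, not part of gcv2hocr:

```python
def page_names(last_page, first_page=1):
    """Return the file names page001.jpg ... pageNNN.jpg covered by the loop limit."""
    return ["page%03d.jpg" % n for n in range(first_page, last_page + 1)]
```

With a limit of 1, the list contains only page001.jpg, which matches the single-JPEG case.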

guptaaman2011 · 2018-03-08T23:36:03Z

Thanks for the quick update. I am new to OCR technology and am just checking its scope. I found it very interesting.


guptaaman2011 · 2018-03-08T23:39:30Z

Hi dinosauria123, I want to convert the hOCR output to other formats (xls, xml, pdf, docx). Is there a tool or script for that?


dinosauria123 · 2018-03-08T23:45:25Z

Is this what you want?

https://www.zotero.org/support/dev/translators

dinosauria123 · 2018-03-08T23:47:39Z

Or this one?

https://hub.docker.com/r/ubma/ocr-fileformat/

guptaaman2011 · 2018-03-08T23:47:54Z

I don't get it. It doesn't have the hOCR format in it.


dinosauria123 · 2018-03-08T23:52:03Z

Do you want to convert images to hOCR?

You can use Tesseract OCR.

https://github.com/tesseract-ocr/tesseract

dinosauria123 · 2018-03-08T23:53:24Z

Check here.

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output

guptaaman2011 · 2018-03-08T23:54:22Z

No, I already have the hOCR output. I see that I can convert it to PDF, but the challenge now is converting this hOCR to other formats such as xml, txt, docx, and xls.


dinosauria123 · 2018-03-09T00:00:33Z

I think you have to use multiple tools.
For example, converting hOCR to PDF is possible with hocr-tools.
https://github.com/tmbdev/hocr-tools#hocr-pdf

There are many tools for converting PDF to other formats...
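
For plain text or a spreadsheet-friendly file, going through PDF may not be needed, because hOCR is HTML. A minimal sketch using only the Python standard library, assuming the hOCR file marks lines with the `ocr_line` class and words with the `ocrx_word` class on `span` elements; the names `HocrLineExtractor`, `hocr_to_lines`, and `lines_to_csv` are hypothetical:

```python
import csv
import io
from html.parser import HTMLParser


class HocrLineExtractor(HTMLParser):
    """Collect the words of each ocr_line element in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.lines = []        # one list of words per ocr_line
        self._in_word = False  # True while inside an ocrx_word span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            self.lines.append([])
        elif "ocrx_word" in classes:
            self._in_word = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_word = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_word and text:
            if not self.lines:
                self.lines.append([])  # word found outside any ocr_line
            self.lines[-1].append(text)


def hocr_to_lines(hocr_text):
    """Return the recognized text as a list of lines."""
    parser = HocrLineExtractor()
    parser.feed(hocr_text)
    parser.close()
    return [" ".join(words) for words in parser.lines if words]


def lines_to_csv(lines):
    """Write one text line per CSV row; spreadsheet programs can open the result."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in lines:
        writer.writerow([line])
    return buf.getvalue()
```

This keeps only the text, one row per recognized line; it does not reconstruct table columns, so a tool that understands layout is still needed for real spreadsheets.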

guptaaman2011 · 2018-03-09T00:02:03Z

Yes, I was trying that, but when I tried to convert the recognized PDF into Excel format online, it said it can't detect the file and did not convert it to xls, so I am stuck here.


dinosauria123 · 2018-03-09T00:16:37Z

Do you know ALTO?
https://en.wikipedia.org/wiki/ALTO_(XML)

If you want to work with OCR formats, ALTO is better than hOCR.

https://github.com/altoxml/documentation/wiki/Software

guptaaman2011 · 2018-03-09T01:03:08Z

Got this, FYI:

Dear User, Your file "scanned.pdf" contains scanned or image textual data. Converting this PDF requires OCR to successfully complete the conversion and retrieve the text. This feature is exclusively available to our Cometdocs Premium Users. Learn more about how to become a premium user here: http://www.cometdocs.com/user/subscriptions Best Regards, Cometdocs Team.

On Fri, Mar 9, 2018 at 5:53 AM, dinosauria123 wrote: An easier way: Google Drive converts PDF to Excel files. https://techtites.com/convert-pdf-google-drive/

dinosauria123 · 2018-03-09T02:21:40Z

I have never used this, but I think it is what you want...
https://github.com/tabulapdf/tabula-extractor

http://tabula.technology/

I think this topic is not related to gcv2hocr. May I close this issue?
