trying to upload the image and while generating the hocr format getting this issue #15
Comments
Thank you for using gcv2hocr. Please upload your Capture.jpg.json. How to use makepdf.sh:
You have to edit makepdf.sh before executing it. |
Thanks for the quick update. I am new to OCR technology and am just checking its scope. I found it very interesting.
On Fri, Mar 9, 2018 at 5:02 AM, dinosauria123 wrote:
Thank you for using gcv2hocr.
Please upload your Capture.jpg.json.
How to use makepdf.sh:
1. Go to the directory that contains makepdf.sh.
2. Execute "sh ./makepdf.sh".
You have to edit makepdf.sh before executing it.
The first line of makepdf.sh, "while [ $a -le 32 ]", assumes that you have page001.jpg through page032.jpg.
If you want to convert a different number of JPEGs, change that number. For example, if you have only one JPEG, edit the first line of makepdf.sh to read "while [ $a -le 1 ]".
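To make the naming scheme concrete, here is a minimal Python sketch of the file names that the loop bound implies. It is not part of gcv2hocr, `page_names` is a hypothetical helper, and it assumes the counter starts at 1 as the page001.jpg example suggests:

```python
def page_names(last_page):
    """Return page001.jpg .. page<last_page>.jpg, mirroring `while [ $a -le N ]`."""
    # Zero-pad the page number to three digits, as in page001.jpg.
    return [f"page{n:03d}.jpg" for n in range(1, last_page + 1)]


print(page_names(1))       # ['page001.jpg']
print(page_names(32)[-1])  # page032.jpg
```

So a single image should be named page001.jpg, and the bound in the first line of makepdf.sh should be 1.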
|
Hi dinosauria123, I want to convert the hOCR output to other formats (xls, xml, pdf, docx). Is there a tool or script for that?
|
Is this what you want? https://www.zotero.org/support/dev/translators |
Or this one? |
I don't get it. It doesn't cover the hOCR format.
On Fri, Mar 9, 2018 at 5:15 AM, dinosauria123 wrote:
Is this what you want?
https://www.zotero.org/support/dev/translators
|
Do you want to convert images to hOCR? You can use Tesseract OCR. |
No, I already have the hOCR output, and I see that I can convert it to PDF. The challenge now is that I want to convert this hOCR to other formats such as xml, txt, docx, and xls.
On Fri, Mar 9, 2018 at 5:22 AM, dinosauria123 wrote:
Do you want to convert images to hOCR?
You can use Tesseract OCR.
https://github.com/tesseract-ocr/tesseract
|
I think you have to use multiple tools. There are many tools for converting PDF to other formats... |
Yes, I was trying that, but when I tried to convert the recognized PDF to Excel format with an online tool, it said it could not detect the file and did not produce an xls file, so I am stuck here.
On Fri, Mar 9, 2018 at 5:30 AM, dinosauria123 wrote:
I think you have to use multiple tools.
For example, hOCR to PDF is possible with hocr-tools.
https://github.com/tmbdev/hocr-tools#hocr-pdf
There are many tools for converting PDF to other formats...
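For plain text, an intermediate PDF may not be needed, because hOCR is HTML and can be read directly. Here is a minimal sketch using only the Python standard library. It assumes the hOCR file marks lines with the class `ocr_line` and words with `ocrx_word`, which is common but not guaranteed. `HocrTextExtractor` and `hocr_to_text` are hypothetical names, not part of gcv2hocr or hocr-tools:

```python
from html.parser import HTMLParser


class HocrTextExtractor(HTMLParser):
    """Collect the words of each ocr_line element in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.lines = []        # one list of words per ocr_line
        self._in_word = False  # True while inside an ocrx_word span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            self.lines.append([])
        elif "ocrx_word" in classes:
            if not self.lines:  # word found before any line element
                self.lines.append([])
            self._in_word = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_word = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_word and text:
            self.lines[-1].append(text)


def hocr_to_text(hocr):
    """Return the recognized text, one output line per ocr_line."""
    parser = HocrTextExtractor()
    parser.feed(hocr)
    parser.close()
    return "\n".join(" ".join(words) for words in parser.lines if words)


# Hand-written sample, not real gcv2hocr output.
sample = """
<div class='ocr_page' title='bbox 0 0 200 100'>
  <span class='ocr_line' title='bbox 10 10 190 30'>
    <span class='ocrx_word' title='bbox 10 10 90 30'>Hello</span>
    <span class='ocrx_word' title='bbox 100 10 190 30'>world</span>
  </span>
  <span class='ocr_line' title='bbox 10 40 190 60'>
    <span class='ocrx_word' title='bbox 10 40 90 60'>second</span>
    <span class='ocrx_word' title='bbox 100 40 190 60'>line</span>
  </span>
</div>
"""
print(hocr_to_text(sample))
```

The per-line word lists in `parser.lines` could also be written out with the `csv` module to get something a spreadsheet can open. Recovering real table structure would need the `bbox` coordinates in the `title` attributes, which this sketch ignores.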
|
Do you know ALTO? If you want to work with OCR formats, ALTO is better than hOCR. |
Dear User,
Your file "scanned.pdf" contains scanned or image textual data. Converting
this PDF requires OCR to successfully complete the conversion and retrieve
the text. This feature is exclusively available to our Cometdocs Premium
Users.
Learn more about how to become a premium user here:
http://www.cometdocs.com/user/subscriptions
Best Regards,
Cometdocs Team.
I got this message, FYI.
On Fri, Mar 9, 2018 at 5:53 AM, dinosauria123 wrote:
An easier way: Google Drive can convert PDF to Excel files.
https://techtites.com/convert-pdf-google-drive/
|
I have never used this, but I think it is what you want ... I think this topic is not related to gcv2hocr. May I close this issue? |
python gcv2hocr.py Capture.jpg.json > capture.hocr
Traceback (most recent call last):
  File "gcv2hocr.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'
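The traceback shows that one of the objects used on line 99 (`page` or `curline`) was still `None` when a word was processed. One possible cause is a JSON file that does not have the layout the script expects. Here is a minimal sketch for inspecting the file before running gcv2hocr. It assumes a standard Cloud Vision `images:annotate` response, meaning a `responses` list whose entries carry `textAnnotations`. `summarize_gcv_json` is a hypothetical helper, not part of gcv2hocr:

```python
import json


def summarize_gcv_json(text):
    """Report whether a Cloud Vision JSON document contains text annotations."""
    data = json.loads(text)
    # Accept both the full {"responses": [...]} envelope and a bare response.
    responses = data.get("responses", [data]) if isinstance(data, dict) else []
    first = responses[0] if responses else {}
    annotations = first.get("textAnnotations", [])
    return {
        "has_error": "error" in first,
        "annotation_count": len(annotations),
        "first_text": annotations[0].get("description", "") if annotations else "",
    }


# Hand-written sample with the assumed layout, not a real API response.
sample = json.dumps({
    "responses": [{
        "textAnnotations": [
            {"description": "Hello\nworld"},
            {"description": "Hello"},
            {"description": "world"},
        ]
    }]
})
print(summarize_gcv_json(sample))
```

To check a real file, pass `open("Capture.jpg.json", encoding="utf-8").read()` instead of `sample`. An `annotation_count` of 0 or `has_error` set to `True` would suggest that the problem lies in the API response and not in the conversion step.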