Error when start=0 #25

Closed
R4yL-dev opened this issue Jul 21, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@R4yL-dev

Hello,

First, thanks for your amazing work.
I have a problem with the library when I try to use it in a Python script.

I am trying to convert a 14-page PDF and I can't convert the first page. When I pass start=0 to parse(), I get this error:

>>> parse(pdf_file, docx_file, start=0, end=13)
Processing 0/14...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/main.py", line 35, in parse
    layout = pdf.parse(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 126, in parse
    layout = self.layout(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 108, in layout
    raw_layout['rects'] = self.rects(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 89, in rects
    rects = pdf_shape.rects_from_source(page_content, height)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/pdf_shape.py", line 194, in rects_from_source
    if  lines[i+1] in ('f', 'F', 'f*'):
IndexError: list index out of range
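
The last frame of the traceback above reads `lines[i+1]`, a one-token lookahead that fails when `i` is already the last index. The following is only a minimal sketch of that failure mode and of a bounds-checked alternative. The function names, the `' re'` rectangle check, and the sample token list are hypothetical and are not taken from the pdf2docx source; only the `('f', 'F', 'f*')` fill-operator tuple comes from the traceback.

```python
# Hypothetical illustration; this is NOT the pdf2docx implementation.
FILL_OPS = ('f', 'F', 'f*')  # fill operators, as listed in the traceback


def filled_rects_unsafe(lines):
    """Collect rectangle lines followed by a fill operator, without a bounds check."""
    rects = []
    for i, line in enumerate(lines):
        if line.endswith(' re'):
            # Raises IndexError when the rectangle is the last token.
            if lines[i + 1] in FILL_OPS:
                rects.append(line)
    return rects


def filled_rects_safe(lines):
    """Same lookahead, but only when a next token actually exists."""
    rects = []
    for i, line in enumerate(lines):
        has_next = i + 1 < len(lines)
        if line.endswith(' re') and has_next and lines[i + 1] in FILL_OPS:
            rects.append(line)
    return rects


# Made-up content stream tokens: the last rectangle has nothing after it.
sample = ['0 0 10 10 re', 'f', '5 5 20 20 re']
```

With this sample, the unsafe version raises `IndexError: list index out of range` on the trailing rectangle, while the guarded version skips it.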

If I call parse() with start=1, it works, but I don't get the first page of the document. I also tried calling parse() without the start and end parameters and got the same error.

Am I using it incorrectly? Am I missing something?

@dothinking
Collaborator

Hi Ray-cmd, thanks for your comments.

The way you are using this lib is correct, but the released version 0.1.0 is a little out of date. It can't handle the layout of your first page, especially its shapes.

I'm releasing version 0.2.0 with the latest commits from the master branch. It should fix your issue, though I'm not sure it will convert your PDF perfectly. Please upgrade the lib and give it a try.

pip install --upgrade pdf2docx
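
After upgrading, one way to confirm that the active environment really picked up 0.2.0 or newer is to query the installed package metadata. This is a small sketch using the standard-library `importlib.metadata` module (Python 3.8+). The helper names `version_tuple` and `installed_at_least` are made up for illustration, and the parser assumes a plain dotted numeric version such as `0.2.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a plain dotted version such as '0.2.0' into (0, 2, 0).

    Assumption: purely numeric parts; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split('.'))


def installed_at_least(package, minimum):
    """Return True if `package` is installed at version `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Package is not installed in this environment at all.
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == '__main__':
    print(installed_at_least('pdf2docx', '0.2.0'))
```

Running this inside the same virtualenv as the script matters, since the traceback shows the library was loaded from a virtualenv's site-packages.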

@dothinking dothinking self-assigned this Jul 21, 2020
@dothinking dothinking added the bug Something isn't working label Jul 21, 2020
@R4yL-dev
Author

I just updated to 0.2.0 and now it works!

Thank you for your availability.
