Error when start=0 #25

R4yL-dev · 2020-07-21T13:23:33Z

Hello,

First, thanks for your amazing work.
I have a problem with the lib when I try to use it in a Python script.

I'm trying to convert a 14-page PDF, but I can't convert the first page. When I pass start=0 to the parse() method, I get this error:

>>> parse(pdf_file, docx_file, start=0, end=13)
Processing 0/14...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/main.py", line 35, in parse
    layout = pdf.parse(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 126, in parse
    layout = self.layout(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 108, in layout
    raw_layout['rects'] = self.rects(page)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 89, in rects
    rects = pdf_shape.rects_from_source(page_content, height)
  File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/pdf_shape.py", line 194, in rects_from_source
    if  lines[i+1] in ('f', 'F', 'f*'):
IndexError: list index out of range
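The last frame shows where it breaks: `rects_from_source` looks one line ahead with `lines[i+1]` to check whether the next operator is a fill (`f`, `F` or `f*`). If `i` is already the last index (for example, if a rectangle happens to be the final line of the page's content stream), that lookahead runs past the end of the list and raises `IndexError`. The sketch below reproduces the pattern with hypothetical helper names (`next_is_fill_unchecked`, `next_is_fill`) and made-up content-stream lines. It is not the actual pdf2docx code, only an illustration of the failure mode and a bounds-checked alternative.

```python
# Fill operators checked by the failing line in the traceback.
FILL_OPERATORS = ('f', 'F', 'f*')


def next_is_fill_unchecked(lines, i):
    """Same lookahead as the failing line: assumes lines[i+1] exists."""
    return lines[i + 1] in FILL_OPERATORS


def next_is_fill(lines, i):
    """Bounds-checked lookahead: returns False when i is the last index."""
    return i + 1 < len(lines) and lines[i + 1] in FILL_OPERATORS


if __name__ == '__main__':
    # Hypothetical content-stream lines: the second rectangle is the last line.
    lines = ['10 10 100 50 re', 'f', '20 20 30 30 re']

    print(next_is_fill(lines, 0))  # True: 'f' follows the first rectangle
    print(next_is_fill(lines, 2))  # False: nothing follows the last line

    try:
        next_is_fill_unchecked(lines, 2)
    except IndexError as exc:
        print('IndexError:', exc)  # same error as in the traceback
```

Checking the index before reading `lines[i+1]` lets the parser treat a trailing rectangle as "not followed by a fill" instead of crashing the whole page.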

If I call parse() with start=1, it works, but then I don't get the first page of the document. I also tried calling parse() without the start/end parameters and got the same error.

Am I using it incorrectly? Am I missing something?


dothinking · 2020-07-21T16:02:09Z

Hi Ray-cmd, thanks for your comments.

You are using this lib correctly, but the released version 0.1.0 is a bit out of date. It can't handle the layout on your first page, especially the shapes.

I'm releasing version 0.2.0 with the latest commits from the master branch. It should fix your issue, though I'm not sure it will convert your PDF perfectly. Please upgrade the lib and give it a try.

pip install --upgrade pdf2docx
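To confirm the upgrade actually landed in the virtualenv that runs the script, one option is to read the installed distribution version with the standard library's `importlib.metadata` (Python 3.8+). This is a small sketch, not part of pdf2docx: `installed_version` and `is_at_least` are hypothetical helper names, and the comparison only handles plain numeric `X.Y.Z` version strings.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_at_least(installed, required):
    """Compare plain numeric versions such as '0.1.0' and '0.2.0'."""
    def as_tuple(v):
        return tuple(int(part) for part in v.split('.'))
    return as_tuple(installed) >= as_tuple(required)


if __name__ == '__main__':
    current = installed_version('pdf2docx')
    if current is None:
        print('pdf2docx is not installed in this environment')
    elif is_at_least(current, '0.2.0'):
        print(f'pdf2docx {current}: version 0.2.0 or later')
    else:
        print(f'pdf2docx {current}: run pip install --upgrade pdf2docx')
```

Comparing integer tuples rather than raw strings avoids treating `0.10.0` as older than `0.2.0`.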

R4yL-dev · 2020-07-21T16:27:58Z

I just updated to 0.2.0 and now it works!

Thank you for your responsiveness.

dothinking self-assigned this Jul 21, 2020

dothinking added the bug Something isn't working label Jul 21, 2020

R4yL-dev closed this as completed Jul 21, 2020
