Hello,
First, thanks for your amazing work.
I have a problem with the library when I try to use it in a Python script.
I am trying to convert a 14-page PDF, and I can't convert the first page. When I pass start=0 to the parse() method, I get this error:
>>> parse(pdf_file, docx_file, start=0, end=13)
Processing 0/14...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/main.py", line 35, in parse
layout = pdf.parse(page)
File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 126, in parse
layout = self.layout(page)
File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 108, in layout
raw_layout['rects'] = self.rects(page)
File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/reader.py", line 89, in rects
rects = pdf_shape.rects_from_source(page_content, height)
File "/home/luca/.local/share/virtualenvs/taptouche_account_creator-2TJMnHyn/lib/python3.8/site-packages/pdf2docx/pdf_shape.py", line 194, in rects_from_source
if lines[i+1] in ('f', 'F', 'f*'):
IndexError: list index out of range
If I call parse() with start=1, it works, but I don't get the first page of the document. I also tried calling parse() without those parameters and got the same error.
Am I using it incorrectly? Am I missing something?
The way you are using the library is correct, but the released version 0.1.0 is somewhat out of date. It cannot handle the layout of your first page, especially the shapes.
I'm releasing version 0.2.0 with the latest commits from the master branch. It should fix your issue, though I'm not sure it will convert your PDF perfectly. Please upgrade and give it a try.
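
For anyone curious about the failure mode in the traceback above: the error is raised on a look-ahead of the form `lines[i+1]`, which goes out of range if the loop reaches the last element. The snippet below is only a minimal, self-contained sketch of that pattern and one way to guard it. The function names and the sample tokens are hypothetical, and this is not the actual pdf2docx code or its fix.

```python
# Hypothetical illustration of the look-ahead pattern seen in the traceback.
# 'f', 'F' and 'f*' are PDF path-filling operators; the token list is made up.
FILL_OPS = ('f', 'F', 'f*')


def find_fill_ops_unguarded(lines):
    """Look ahead without a bounds check; fails on the last token."""
    hits = []
    for i in range(len(lines)):
        if lines[i + 1] in FILL_OPS:  # IndexError when i is the last index
            hits.append(i)
    return hits


def find_fill_ops(lines):
    """Return indices of tokens that are immediately followed by a fill operator."""
    hits = []
    for i in range(len(lines)):
        # Guard the look-ahead: the last token has no successor.
        if i + 1 < len(lines) and lines[i + 1] in FILL_OPS:
            hits.append(i)
    return hits


if __name__ == "__main__":
    tokens = ['0 0 10 10 re', 'f', '1 0 0 RG']
    print(find_fill_ops(tokens))
    try:
        find_fill_ops_unguarded(tokens)
    except IndexError as exc:
        print("unguarded version failed:", exc)
```

Whether the real parser hits exactly this case on your first page is an assumption. The sketch only shows why an unguarded `i+1` index can produce `IndexError: list index out of range`.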