read_pdf fails with an index out of range #306

Closed
sweco-sekrsv opened this issue Apr 10, 2019 · 4 comments

@sweco-sekrsv

I'm using Camelot 0.7.2. The PDF file is attached.

Running this:
tables = camelot.read_pdf('3713-B31-24-04401_error.pdf', flavor='lattice', line_scale=30)

results in this error:

File "camelot_test04.py", line 205, in <module>
    tables = camelot.read_pdf(filename,flavor='lattice', line_scale=30)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\camelot\io.py", line 106, in read_pdf
    layout_kwargs=layout_kwargs, **kwargs)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\camelot\handlers.py", line 156, in parse
    self._save_page(self.filepath, p, tempdir)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\camelot\handlers.py", line 109, in _save_page
    layout, dim = get_page_layout(fpath)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\camelot\utils.py", line 689, in get_page_layout
    interpreter.process_page(page)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdfinterp.py", line 852, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdfinterp.py", line 862, in render_contents
    self.init_resources(resources)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdfinterp.py", line 362, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdfinterp.py", line 197, in get_font
    font = PDFTrueTypeFont(self, spec)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdffont.py", line 594, in __init__
    PDFSimpleFont.__init__(self, descriptor, widths, spec)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\pdffont.py", line 560, in __init__
    CMapParser(self.unicode_map, BytesIO(strm.get_data())).run()
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\cmapdb.py", line 287, in run
    self.nextobject()
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\psparser.py", line 616, in nextobject
    self.do_keyword(pos, token)
  File "C:\Users\seks13473\AppData\Local\Programs\Python\Python36\lib\site-packages\pdfminer\cmapdb.py", line 393, in do_keyword
    self.cmap.add_cid2unichr(s1+i, code[i])
IndexError: list index out of range

Any ideas?
3713-B31-24-04401_error.pdf

@vinayak-mehta
Contributor

Looks like a pdfminer bug. Let me try to reproduce it.
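
The last frame of the traceback, `self.cmap.add_cid2unichr(s1+i, code[i])` in pdfminer's `cmapdb.py`, indexes a list `code` with a loop counter `i`. One plausible cause (an assumption, not confirmed in this thread) is a font CMap whose range declares more character codes than its destination array actually holds. The sketch below reproduces that failure mode in plain Python and shows a guarded variant; `map_bfrange` and `map_bfrange_guarded` are hypothetical names used for illustration and are not pdfminer functions.

```python
# Illustrative sketch only: this is NOT pdfminer's actual code.
# It mimics a loop that maps a range of character codes [start, end]
# onto a destination array, one entry per code.


def map_bfrange(start, end, codes):
    """Unguarded: assumes `codes` has one entry per code in [start, end]."""
    mapping = {}
    for i in range(end - start + 1):
        mapping[start + i] = codes[i]  # IndexError if `codes` is too short
    return mapping


def map_bfrange_guarded(start, end, codes):
    """Guarded: maps only as many codes as the array actually provides."""
    mapping = {}
    for i, code in zip(range(end - start + 1), codes):
        mapping[start + i] = code
    return mapping


# A range covering 3 codes (0x10..0x12) but only 2 destination entries,
# as a malformed font table might declare (hypothetical data).
short_codes = ["A", "B"]

try:
    map_bfrange(0x10, 0x12, short_codes)
    raised = False
except IndexError:
    raised = True

guarded = map_bfrange_guarded(0x10, 0x12, short_codes)
```

If this is indeed the cause, the guarded variant trades a crash for a partially populated mapping, so text from the affected font could still come out incomplete.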

@sweco-sekrsv
Author

Thanks! I have a few more PDF files that trigger this error. I can provide them if that helps.

@vinayak-mehta
Contributor

Yes, please do.

@sweco-sekrsv
Author

I'm attaching another 10 files that seem to have the same problem.
list_index_bug_pdfs.zip
