IndexError: list index out of range #357

heixincai · 2019-07-09T05:52:46Z

When I try to read this PDF, I get the error below.
I don't know how to solve it, thanks.
AreaPercent-lc-multPageTable1.pdf

D:\LES\Environment\Python\python.exe D:/LES/Python/Code/ReadNoBorderTable.py
Traceback (most recent call last):
  File "D:/LES/Python/Code/ReadNoBorderTable.py", line 7, in <module>
    tables = camelot.read_pdf(pdfPath,flavor=tableType,strip_text=' .\n',columns=['58,107,139,189,327,258'],split_text=True)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\io.py", line 106, in read_pdf
    layout_kwargs=layout_kwargs, **kwargs)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\handlers.py", line 162, in parse
    layout_kwargs=layout_kwargs)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\parsers\stream.py", line 425, in extract_tables
    cols, rows = self._generate_columns_and_rows(table_idx, tk)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\parsers\stream.py", line 321, in _generate_columns_and_rows
    if self.columns is not None and self.columns[table_idx] != "":
IndexError: list index out of range

Process finished with exit code 1

heixincai · 2019-07-09T06:14:41Z

When I use Excalibur, I find that Autodetect Tables also detects a small extra table (see the picture).

If I remove the small table, Camelot can extract the tables. Maybe this is the same problem; this is all I know. Thanks~
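The last frame of the traceback shows the mechanism: the stream parser indexes columns by the position of each detected table (self.columns[table_idx]), so a list with one column spec fails as soon as a second table, such as the small extra one Excalibur shows, is detected. The minimal pure-Python sketch below mimics that lookup; lookup_columns is a hypothetical stand-in written for illustration, not Camelot's actual code.

```python
def lookup_columns(columns, table_idx):
    # Same condition as the failing line in the traceback: one column
    # string per detected table, looked up by the table's position.
    if columns is not None and columns[table_idx] != "":
        return columns[table_idx]
    return None  # no spec given: the parser would guess the columns itself


columns = ['58,107,139,189,327,258']  # one spec, as in the report
detected = 2  # e.g. the main table plus the small extra table

for table_idx in range(detected):
    try:
        print(table_idx, lookup_columns(columns, table_idx))
    except IndexError as exc:
        print(table_idx, "IndexError:", exc)

# Output:
# 0 58,107,139,189,327,258
# 1 IndexError: list index out of range
```

With two entries in columns, both lookups succeed, which is why removing the small table (or supplying more specs) avoids the error.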

pachacamac · 2019-08-13T18:01:13Z

Pretty sure I have the same problem.

I'm running with flavor='stream', columns=['62,105,185,252'] and get:

  File "/home/user/.local/lib/python3.7/site-packages/camelot/io.py", line 117, in read_pdf
    **kwargs
  File "/home/user/.local/lib/python3.7/site-packages/camelot/handlers.py", line 172, in parse
    p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
  File "/home/user/.local/lib/python3.7/site-packages/camelot/parsers/stream.py", line 458, in extract_tables
    cols, rows = self._generate_columns_and_rows(table_idx, tk)
  File "/home/user/.local/lib/python3.7/site-packages/camelot/parsers/stream.py", line 336, in _generate_columns_and_rows
    if self.columns is not None and self.columns[table_idx] != "":
IndexError: list index out of range

Is there any way to just ignore tables that don't fit the provided columns and move on? I get this on a lot of PDFs, but each PDF has slightly different content surrounding the table I'm interested in. I would just like to move on without failing and later filter the tables that fit my scheme.

PS: I have already limited parsing to the right pages, but I can't (or don't know how to) give the stream parser a concrete starting and ending point.
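Once parsing no longer fails, one way to filter afterwards for tables that fit a known scheme is to compare each table's first row with an expected header. This is a sketch under the assumption that the tables of interest share a header row; the helper names and header values are hypothetical. With Camelot, each table's cells can be turned into a list of rows with table.df.values.tolist(), since table.df is a pandas DataFrame.

```python
def fits_scheme(rows, expected_header):
    """Return True if the first row matches expected_header, ignoring surrounding whitespace."""
    if not rows:
        return False
    header = [cell.strip() for cell in rows[0]]
    return header == [h.strip() for h in expected_header]


def select_tables(tables_as_rows, expected_header):
    # Keep only the tables whose first row is the expected header.
    return [rows for rows in tables_as_rows if fits_scheme(rows, expected_header)]


# Hypothetical example: two parsed tables, only the first has the wanted header.
header = ["ID", "Name", "Start", "End", "Total"]
parsed = [
    [["ID", "Name", "Start", "End", "Total"], ["1", "foo", "0.5", "1.0", "12"]],
    [["Page 2 of 3", "", "", "", ""]],
]
print(len(select_tables(parsed, header)))  # 1
```

Matching on content rather than on shape matters here: if every detected table is forced into the same column spec, all of them end up with the same number of columns.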

pachacamac · 2019-08-13T19:59:56Z

Somewhat dirty workaround that works for my case:

import camelot
cols = ['62,105,185,252']
# Workaround: repeat the same column spec so every table that gets detected has an entry,
# e.g. ['62,105,185,252', '62,105,185,252', ...]
cols *= 128
camelot.read_pdf(pdf_file, flavor='stream', columns=cols)

@heixincai let me know if it helps you too :)
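The fixed multiplier works because, judging by the failing line, extra entries in columns are simply never looked up. If you would rather not guess a number like 128, a sketch like the one below retries with more copies of the same spec whenever an IndexError comes back. It assumes parse is any callable accepting a columns keyword, for example functools.partial(camelot.read_pdf, pdf_file, flavor='stream'); it treats every IndexError as "too few column specs", which could mask an unrelated IndexError, and each retry re-parses the document. The helper names are hypothetical.

```python
def read_with_enough_columns(parse, spec, copies=8, max_copies=1024):
    """Call parse(columns=[spec] * copies), doubling copies after each IndexError."""
    while True:
        try:
            return parse(columns=[spec] * copies)
        except IndexError:
            # Assumed cause: more tables detected than column specs supplied.
            if copies >= max_copies:
                raise
            copies *= 2


def fake_parse(columns, detected=20):
    # Stand-in for camelot.read_pdf: fails like the traceback when there are
    # fewer column specs than detected tables.
    return [columns[i] for i in range(detected)]


tables = read_with_enough_columns(fake_parse, '62,105,185,252')
print(len(tables))  # 20

# With Camelot (not run here):
# parse = functools.partial(camelot.read_pdf, pdf_file, flavor='stream')
# tables = read_with_enough_columns(parse, '62,105,185,252')
```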

vinayak-mehta · 2019-08-27T13:43:22Z

@heixincai @pachacamac If you know the approximate location of the table in your PDF (assuming the table always lies in this general area in all PDFs that you have), you can specify table_regions to make Camelot look for tables only in these regions.
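For the table_regions suggestion, each region is a string "x1,y1,x2,y2", where (x1, y1) is the top-left and (x2, y2) the bottom-right corner in PDF coordinate space (points, origin at the bottom-left of the page, so y1 > y2). The sketch below builds and sanity-checks such strings; the helper names and coordinates are hypothetical, and Camelot is imported only inside read_in_regions so the formatting helper runs without it. A region limits where detection looks, so it may still yield more than one table.

```python
def region(x1, y1, x2, y2):
    """Format 'x1,y1,x2,y2' with (x1, y1) top-left and (x2, y2) bottom-right."""
    if not (x1 < x2 and y1 > y2):
        raise ValueError("expected top-left (x1, y1) and bottom-right (x2, y2)")
    return f"{x1},{y1},{x2},{y2}"


def read_in_regions(pdf_path, regions, pages="1"):
    import camelot  # imported lazily; requires Camelot to be installed
    return camelot.read_pdf(pdf_path, flavor="stream", pages=pages,
                            table_regions=regions)


print(region(30, 700, 560, 120))  # 30,700,560,120

# Usage with Camelot (hypothetical coordinates, not run here):
# tables = read_in_regions("AreaPercent-lc-multPageTable1.pdf", [region(30, 700, 560, 120)])
```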

pachacamac · 2019-08-27T14:11:12Z

@vinayak-mehta For me the problem is that in my PDFs I'm interested in tables by structure (same columns, etc.), but they differ in height, y-position, etc., and span multiple pages (an unknown number of them).

vinayak-mehta · 2019-08-27T14:23:26Z

Perhaps we can add another filter, exposed as a parameter inside the library, to weed out tables that do not have a certain width/height.
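The proposed width/height filter (later tracked as camelot-dev/camelot#50, not an existing option in this thread) could look roughly like this as a post-processing step. It is a hypothetical sketch that assumes you already have each table's bounding box as (x1, y1, x2, y2) in PDF points; how those boxes are obtained is outside its scope.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in PDF points


def filter_by_size(bboxes: List[BBox], min_width: float = 0.0,
                   min_height: float = 0.0) -> List[BBox]:
    """Keep boxes that are at least min_width wide and min_height tall."""
    kept = []
    for x1, y1, x2, y2 in bboxes:
        width = abs(x2 - x1)
        height = abs(y1 - y2)
        if width >= min_width and height >= min_height:
            kept.append((x1, y1, x2, y2))
    return kept


boxes = [(30, 700, 560, 120), (400, 100, 450, 80)]  # a large table and a small one
print(filter_by_size(boxes, min_width=100, min_height=50))
# [(30, 700, 560, 120)]
```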

heixincai · 2019-09-20T00:21:22Z

Sorry, I have been busy with my project these days. My solution is to get the location information for the entire PDF page and then filter the parsed data. My method may only work for my case, but the problem has been solved.
Thanks~
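One possible reading of this approach is to pass a single table_areas entry covering the whole page: each area is then parsed as exactly one table, so a single columns entry is enough (Camelot expects columns and table_areas to have the same length), and unwanted rows can be dropped afterwards. This is a sketch under that interpretation, not heixincai's actual code; the helper names are hypothetical, and the page size must match your PDF (A4 is roughly 595 x 842 points).

```python
def full_page_area(page_width, page_height):
    # Top-left corner (0, page_height), bottom-right corner (page_width, 0).
    return f"0,{page_height},{page_width},0"


def read_full_page(pdf_path, spec, page_width, page_height, pages="1"):
    import camelot  # imported lazily; requires Camelot to be installed
    return camelot.read_pdf(pdf_path, flavor="stream", pages=pages,
                            table_areas=[full_page_area(page_width, page_height)],
                            columns=[spec])  # one area, one column spec


def drop_empty_rows(rows):
    # Example post-filter: remove rows in which every cell is blank.
    return [row for row in rows if any(cell.strip() for cell in row)]


print(full_page_area(595, 842))  # 0,842,595,0
```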

pachacamac · 2019-10-14T15:39:30Z

Has this been fixed now?

vinayak-mehta · 2019-10-15T06:28:54Z

@pachacamac Opened an issue for it here: camelot-dev/camelot#50

helpgodsg · 2023-09-23T09:39:30Z

@pachacamac Great, it really works.

vinayak-mehta mentioned this issue Aug 27, 2019

min_height/width to filter out tables? camelot-dev/camelot#50

Open

vinayak-mehta closed this as completed Oct 14, 2019

JosePVB mentioned this issue Feb 1, 2020

[parsers.stream] - Fix IndexError when extracting more tables than there are columns camelot-dev/camelot#112

Open
