IndexError: list index out of range #357
Pretty sure I have the same problem. Running with
Is there any way to just ignore tables that don't fit the provided columns and move on? I get this error on a lot of PDFs, but each PDF has slightly different content surrounding the table I'm interested in. I would like to move on without failing and later filter for the tables that fit my schema. PS: I have already limited parsing to the right pages, but I can't (or don't know how to) give the stream parser a concrete start and end point.
Somewhat dirty workaround that works for my case:
@heixincai let me know if it helps you too :)
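One way to "move on and filter later" is to call the stream parser without `columns` (per the condition in the traceback below, `self.columns[table_idx]` is only looked up when `columns` is not None) and then keep only the tables whose shape matches the expected schema. A minimal sketch, assuming each Camelot table has first been converted to a list of rows, for example with `table.df.values.tolist()`; `matching_tables` and its header check are hypothetical helpers, not part of Camelot.

```python
from typing import List, Optional, Sequence

TableData = List[List[str]]  # one table as a list of rows of cell strings


def matching_tables(
    tables: Sequence[TableData],
    n_cols: int,
    header: Optional[Sequence[str]] = None,
) -> List[TableData]:
    """Keep tables whose every row has exactly n_cols cells and, if a
    header is given, whose first row equals it (ignoring outer whitespace)."""
    kept = []
    for rows in tables:
        # Skip empty tables and tables with the wrong number of columns.
        if not rows or any(len(row) != n_cols for row in rows):
            continue
        if header is not None:
            first = [cell.strip() for cell in rows[0]]
            if first != [h.strip() for h in header]:
                continue
        kept.append(rows)
    return kept


candidates = [
    [["Name", "Qty"], ["a", "1"]],
    [["Name", "Qty", "Price"], ["a", "1", "2.0"]],
]
kept = matching_tables(candidates, n_cols=3)  # keeps only the 3-column table
```

Filtering on column count alone is loose; adding the header check helps when several unrelated tables on a page happen to share a width.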
@heixincai @pachacamac If you know the approximate location of the table in your PDF (assuming the table always lies in this general area in all of your PDFs), you can specify table_regions to make Camelot look for tables only in these regions.
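Camelot documents `table_regions` (and `table_areas`) entries as strings of the form "x1,y1,x2,y2", where (x1, y1) is the top-left and (x2, y2) the bottom-right corner in PDF coordinate space. Because the PDF origin is at the bottom-left of the page, the top y value is the larger one, which is easy to get backwards. A small sketch of a hypothetical helper, `region_string`, that builds and sanity-checks such a string:

```python
def region_string(left: float, top: float, right: float, bottom: float) -> str:
    """Format a region as "x1,y1,x2,y2" with (x1, y1) = top-left and
    (x2, y2) = bottom-right in PDF coordinates (origin at bottom-left)."""
    if not (left < right and top > bottom):
        raise ValueError("expected left < right and top > bottom in PDF coordinates")
    # %g drops a trailing ".0" from whole numbers, e.g. 50.0 -> "50".
    return ",".join("%g" % v for v in (left, top, right, bottom))
```

With illustrative coordinates, the result would be passed as `camelot.read_pdf("file.pdf", flavor="stream", table_regions=[region_string(50, 700, 560, 100)])`.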
@vinayak-mehta For me the problem is that I'm interested in tables by structure (same columns, etc.), but those tables have different heights, y-positions, etc., across multiple pages (an unknown number of pages).
Perhaps we could add another parameter to the library that filters out tables which do not meet a certain width/height.
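Until such a parameter exists, a size filter can live outside the library. A minimal sketch of a hypothetical helper, `large_enough`, that returns the indices of bounding boxes meeting a minimum width and height. Where the boxes come from is an assumption: Camelot `Table` objects in the versions I'm aware of keep their box on a private `_bbox` attribute, which is not a public API and may change between releases.

```python
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # two opposite corners: (x1, y1, x2, y2)


def large_enough(
    bboxes: Iterable[BBox], min_width: float = 0.0, min_height: float = 0.0
) -> List[int]:
    """Return indices of boxes at least min_width wide and min_height tall.

    abs() makes the check independent of which corner pair the box uses
    (top-left/bottom-right or bottom-left/top-right)."""
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(bboxes):
        if abs(x2 - x1) >= min_width and abs(y2 - y1) >= min_height:
            keep.append(i)
    return keep
```

Returning indices rather than boxes lets the caller pick the matching entries out of whatever table list the boxes were taken from.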
Sorry, I have been busy with my project these days. My solution is simply to get the location information for the entire PDF page and then filter the parsed data. My method may only suit my own case, but it solved the problem.
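One reading of this approach: pass a single table area that covers the whole page, so the stream parser works with exactly one table per page and a single `columns` entry always has a matching index, then filter the parsed rows afterwards. A sketch of a hypothetical helper, `full_page_area`, assuming the page origin is at (0, 0) and that width and height are known in PDF points (US Letter, for example, is 612 x 792 points); reading the size from the file is left out.

```python
def full_page_area(width: float, height: float) -> str:
    """Return an "x1,y1,x2,y2" area spanning the whole page: top-left is
    (0, height) and bottom-right is (width, 0) in PDF coordinates."""
    if width <= 0 or height <= 0:
        raise ValueError("page dimensions must be positive")
    return "0,%g,%g,0" % (height, width)
```

The string would go in as `table_areas=[full_page_area(612, 792)]` alongside a one-element `columns` list.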
Has this been fixed now?
@pachacamac Opened it here: camelot-dev/camelot#50
@pachacamac Great, it really works.
When I try to read this PDF, I get the error below and don't know how to solve it. Thanks!
AreaPercent-lc-multPageTable1.pdf
D:\LES\Environment\Python\python.exe D:/LES/Python/Code/ReadNoBorderTable.py
Traceback (most recent call last):
  File "D:/LES/Python/Code/ReadNoBorderTable.py", line 7, in <module>
    tables = camelot.read_pdf(pdfPath,flavor=tableType,strip_text=' .\n',columns=['58,107,139,189,327,258'],split_text=True)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\io.py", line 106, in read_pdf
    layout_kwargs=layout_kwargs, **kwargs)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\handlers.py", line 162, in parse
    layout_kwargs=layout_kwargs)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\parsers\stream.py", line 425, in extract_tables
    cols, rows = self._generate_columns_and_rows(table_idx, tk)
  File "C:\Users\suyongdeng.RD\AppData\Roaming\Python\Python37\site-packages\camelot\parsers\stream.py", line 321, in _generate_columns_and_rows
    if self.columns is not None and self.columns[table_idx] != "":
IndexError: list index out of range

Process finished with exit code 1
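The last frame points at the cause: the stream parser looks up `self.columns[table_idx]`, so it expects one `columns` entry per table. The call above passes a single entry and no `table_areas`, so if stream detects a second table on a page, `table_idx` reaches 1 and the lookup fails. A minimal pre-flight check, written as a hypothetical helper `pair_columns` (not a Camelot API), makes that pairing explicit. It also sorts each separator list, since the spec above lists 327 before 258; whether Camelot sorts these values itself is not confirmed here.

```python
from typing import List, Sequence


def pair_columns(table_areas: Sequence[str], columns: Sequence[str]) -> List[str]:
    """Return a `columns` list aligned one-to-one with `table_areas`.

    A single spec is reused for every area; otherwise the counts must match.
    Each spec's x-coordinates are sorted ascending. An empty string is kept
    as-is: per the condition in the traceback, it skips the user-supplied
    separators for that table.
    """
    if len(columns) == 1:
        columns = list(columns) * len(table_areas)
    if len(columns) != len(table_areas):
        raise ValueError(
            "got %d column specs for %d table areas" % (len(columns), len(table_areas))
        )
    aligned = []
    for spec in columns:
        if spec == "":
            aligned.append(spec)
            continue
        xs = sorted(float(x) for x in spec.split(","))
        aligned.append(",".join("%g" % x for x in xs))
    return aligned
```

With explicit areas, the call would look like `camelot.read_pdf(pdfPath, flavor="stream", table_areas=areas, columns=pair_columns(areas, ["58,107,139,189,327,258"]), split_text=True)`, where `areas` is a list of "x1,y1,x2,y2" strings you supply.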