tabula.convert_into(r"C:\Meher\pricelist.pdf", r"C:\Meher\pricelistoutput.csv", spreadsheet=True, output_format="csv", pages="all")
May 16, 2017 6:12:14 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at technology.tabula.ObjectExtractor.processTextPosition(ObjectExtractor.java:329)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:504)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:56)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:562)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:269)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:236)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:216)
    at technology.tabula.ObjectExtractor.drawPage(ObjectExtractor.java:153)
    at technology.tabula.ObjectExtractor.extractPage(ObjectExtractor.java:10
    at technology.tabula.PageIterator.next(PageIterator.java:29)
    at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:160)
    at technology.tabula.CommandLineApp.extractFileInto(CommandLineApp.java:
    at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:128)
    at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:10
    at technology.tabula.CommandLineApp.main(CommandLineApp.java:74)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\tabula\wrapper.py", line 114, in convert_into
    subprocess.check_output(args)
  File "C:\Python27\lib\subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['java', '-jar', 'C:\Python27\lib\site-packages\tabula\tabula-0.9.2-jar-with-dependencies.jar', '--pages', 'all', '--guess', '--format', 'CSV', '--outfile', 'C:\Meher\pricelistoutput.csv', '--spreadsheet', 'C:\Meher\pricelist.pdf']' returned non-zero exit status 1
What did you intend to be?
But I can't guarantee that this will work. tabula-py is a simple wrapper around tabula-java, and this is only a basic tuning option for Java. If you need further tuning, you can file an issue in the tabula-java repository.
Thank you, Chezou. With 4080m it was able to process 3000 PDF pages in a single pass, and very quickly. I was able to get all 9000 pages done with a for loop. I found just one more problem, and I will open a new issue for it.
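For anyone who lands here with the same OutOfMemoryError, the workaround described in this thread (more Java heap plus a for loop over page ranges) might look roughly like the sketch below. It is not the exact code used here. It assumes that "4080m" refers to the JVM maximum heap flag `-Xmx4080m` passed through tabula-py's `java_options` argument, and that `pages` accepts a range string such as "1-3000". The helper names `page_chunks` and `convert_in_chunks`, the output file naming, and the chunk size of 3000 are all hypothetical. The `spreadsheet=True` option from the original call is left out because newer tabula-py releases name it differently.

```python
def page_chunks(total_pages, chunk_size):
    """Return page-range strings such as '1-3000' that cover 1..total_pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A chunk holding a single page is written as just that page number.
        ranges.append(str(start) if start == end else "{0}-{1}".format(start, end))
    return ranges


def convert_in_chunks(pdf_path, out_prefix, total_pages,
                      chunk_size=3000, heap="-Xmx4080m"):
    """Convert a large PDF to several CSV files, one per page range.

    Hypothetical helper. The heap value is an assumption based on the
    "4080m" mentioned in this thread; tune it for your machine.
    """
    import tabula  # imported here so page_chunks works without tabula-py or Java

    outputs = []
    for index, pages in enumerate(page_chunks(total_pages, chunk_size), start=1):
        out_path = "{0}_{1}.csv".format(out_prefix, index)
        tabula.convert_into(pdf_path, out_path, output_format="csv",
                            pages=pages, java_options=heap)
        outputs.append(out_path)
    return outputs
```

Calling `convert_in_chunks(r"C:\Meher\pricelist.pdf", r"C:\Meher\pricelistoutput", 9000)` would then write three CSV files, one per 3000-page range, instead of asking Java to hold all 9000 pages in a single run.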
Summary of your issue
My input PDF file is very large, around 9000 pages. (It works fine if I select only a few pages.)
Environment
Tried on both Windows and Linux.
Write and check your environment.
python --version: 2.7.13
java -version: ? 1.7 (tried only python)
What did you do when you faced the problem?
//write here
Example code:
tabula.convert_into(r"C:\Meher\pricelist.pdf", r"C:\Meher\pricelistoutput.csv", spreadsheet=True, output_format="csv", pages="all")