Turn table extraction off by default for PDFs and images #3021

Closed
MthwRobinson opened this issue May 15, 2024 · 0 comments · Fixed by #3035

Comments

@MthwRobinson
Contributor

The goal of this issue is to change the default value of `skip_infer_table_types` to `skip_infer_table_types=["pdf", "jpg", "png", "xls", "xlsx", "heic"]`. Prior to version 0.13, table extraction was off by default. Since that default changed, we've found that some users don't know the setting exists, and we've observed slowdowns in PDF processing times.
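
To illustrate what the proposed default would mean in practice, here is a minimal sketch of how a skip list like this could gate table inference. The helper `should_infer_tables` and the constant `DEFAULT_SKIP_INFER_TABLE_TYPES` are hypothetical names made up for this example; they are not the library's actual implementation, and the sketch assumes the setting behaves as a plain list of file types to skip.

```python
from typing import Iterable, Optional

# Default proposed in this issue: table inference is skipped for these types.
DEFAULT_SKIP_INFER_TABLE_TYPES = ["pdf", "jpg", "png", "xls", "xlsx", "heic"]


def should_infer_tables(
    file_type: str,
    skip_infer_table_types: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical helper: return True if table inference would run.

    Assumes a file type is skipped exactly when it appears in the skip list.
    """
    if skip_infer_table_types is None:
        skip_infer_table_types = DEFAULT_SKIP_INFER_TABLE_TYPES
    # Normalize so that ".PDF" and "pdf" are treated the same way.
    skipped = {t.lower().lstrip(".") for t in skip_infer_table_types}
    return file_type.lower().lstrip(".") not in skipped


# With the proposed default, PDFs and images skip table inference.
print(should_infer_tables("pdf"))
# Passing an empty skip list opts back in to table inference for PDFs.
print(should_infer_tables("pdf", skip_infer_table_types=[]))
```

Under this assumption, users who want tables from PDFs or images would opt in by passing a skip list that omits those types, while everyone else gets the faster path by default.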

github-merge-queue bot pushed a commit that referenced this issue May 17, 2024
…images (#3035)

### Summary

Closes #3021. Turns table extraction for PDFs and images off by
default. The default behavior originally changed in #2588. The reason
for the reversion is that some users did not realize turning off table
extraction was an option and experienced long processing times for PDFs
and images with the new default behavior.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>