Concat multi page tables #8

Open
vinayak-mehta opened this issue Jul 4, 2019 · 5 comments
Labels
enhancement New feature or request
Comments

@vinayak-mehta
Member

Would be nice to have a way to merge tables which span multiple pages.

@vinayak-mehta vinayak-mehta added the enhancement New feature or request label Jul 4, 2019
@akshowhini

@vinayak-mehta I would like to contribute to this. However, I would like to know your expectations on the scenarios and how to handle those.

@c0nb4

c0nb4 commented Oct 5, 2019

The way I'm doing this in my personal project is with:

pd.concat(self.list_of_dfs)

The only problem I see is when the tables have different column names, so I just rename them:

names = self.list_of_dfs[0].columns.tolist()
for df in self.list_of_dfs:
    df.columns = names
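
For anyone who wants to try the idea above outside a class, here is a minimal self-contained sketch. It assumes `list_of_dfs` is a plain list of pandas DataFrames that all have the same number of columns, and that the first table carries the correct header. The function name `concat_with_first_header` is hypothetical and not part of Camelot.

```python
import pandas as pd


def concat_with_first_header(list_of_dfs):
    """Stack DataFrames row-wise, reusing the first frame's column names.

    Assumption: every frame has the same number of columns, in the same order.
    """
    if not list_of_dfs:
        return pd.DataFrame()
    names = list_of_dfs[0].columns.tolist()
    renamed = []
    for df in list_of_dfs:
        if len(df.columns) != len(names):
            raise ValueError("all tables must have the same number of columns")
        # Work on a copy so the caller's frames keep their original headers.
        copy = df.copy()
        copy.columns = names
        renamed.append(copy)
    # ignore_index gives the merged table one continuous row index.
    return pd.concat(renamed, ignore_index=True)
```

Renaming a copy instead of the original keeps the per-page tables untouched, which helps if you still need to inspect them after merging.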

@vinayak-mehta
Member Author

vinayak-mehta commented Oct 15, 2019

@akshowhini This is easy to do when the number and names of the columns are the same on every page, which doesn't happen very often. A robust way to do this would be to group tables from different pages by partially matching their column names (based on some threshold) and then concatenate each group.
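
One possible reading of the grouping idea above, as a rough sketch rather than anything Camelot implements: compare headers position by position with `difflib.SequenceMatcher` from the standard library, and start a new group whenever the average similarity to the current group's first header drops below a threshold. The function names, the position-wise comparison, and the default threshold of 0.8 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def header_similarity(cols_a, cols_b):
    """Average position-wise similarity of two headers, from 0.0 to 1.0.

    Returns 0.0 when the column counts differ (an assumption of this sketch).
    """
    if len(cols_a) != len(cols_b) or not cols_a:
        return 0.0
    ratios = [
        SequenceMatcher(None, str(a).strip().lower(), str(b).strip().lower()).ratio()
        for a, b in zip(cols_a, cols_b)
    ]
    return sum(ratios) / len(ratios)


def group_by_header(headers, threshold=0.8):
    """Group consecutive table indices whose headers partially match.

    `headers` is a list of column-name lists in page order. Each table is
    compared against the first header of the current group.
    """
    groups = []
    for i, cols in enumerate(headers):
        if groups and header_similarity(headers[groups[-1][0]], cols) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

Each resulting group of indices could then be concatenated with `pd.concat` after renaming the columns to the group's first header. Only consecutive tables are grouped here, on the assumption that a table spanning pages is not interrupted by an unrelated one.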

@vinayak-mehta vinayak-mehta added this to Backlog in TODO! Jul 9, 2020
@AnnasMazhar
Copy link

Has this thread seen any progress? I have been looking into the same issue and haven't found a permanent solution for merging tables that span multiple pages.

@vinayak-mehta
Member Author

This issue is low priority as there's no general way to merge tables spanning multiple pages across millions of different types of table structures.

tomprogrammer pushed a commit to tomprogrammer/camelot that referenced this issue May 10, 2023
…ror_text_in_bbox

Fixed ZeroDivisionError in text_in_bbox