DRILL-7514: Update Apache POI to Latest Version #1991

Closed; wants to merge 1 commit.

cgivre (Contributor) commented Feb 20, 2020

Description

Drill's Excel Format Plugin uses Apache POI to parse Excel files. Although this reader correctly handles formulas and data types, it uses memory inefficiently and can struggle to read very large Excel files.
The latest version of POI addresses some of these memory issues, which should let Drill query larger Excel files without running out of memory.

This PR updates Drill to use the latest version of Apache POI and also updates the User Agent Parser to a more recent version.

POI's behavior with empty sheets changed slightly, so I had to wrap one line in a try/catch block.

Documentation

No user-visible changes.

Testing

All relevant unit tests were run and passed.

cgivre (Contributor, Author) commented Feb 20, 2020

@vvysotskyi
Addressed review comments and squashed commits.

vvysotskyi (Member) commented:

@cgivre, please address this comment: #1991 (comment)

vvysotskyi (Member) left a review comment:

@cgivre, thanks for the PR and for making the changes, +1.
It is great that the newer version will consume memory more efficiently.

Also, please squash the commits.

cgivre (Contributor, Author) commented Feb 20, 2020

@vvysotskyi Thanks for the review. Commits squashed.

asfgit closed this in 8a8e58b on Feb 21, 2020.
cgivre deleted the poi_upgrade branch on March 22, 2020 00:44.