Add openpyxl dependency #404

Merged
merged 2 commits into from May 26, 2020

Conversation

dgdekoning
Member

When reading large .xlsx files, openpyxl is dramatically faster than xlrd because it does not have to read the entire file to determine the sheet names.
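
For context on why the sheet names can be cheap to obtain: an .xlsx file is a zip archive, and the sheet list is stored in a small workbook part that is separate from the cell data. The sketch below uses only the standard library to show this. It is not how openpyxl or xlrd are implemented, the `sheet_names` helper is hypothetical, and it assumes the workbook part sits at the conventional `xl/workbook.xml` path and uses the common (transitional) spreadsheetml namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the common "transitional" spreadsheetml namespace.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names(path):
    """Return the sheet names of an .xlsx file without parsing any cell data.

    Hypothetical helper for illustration only. Only the small workbook
    part of the archive is decompressed; the worksheet parts are not opened.
    """
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook part lives at its conventional location.
        with archive.open("xl/workbook.xml") as handle:
            root = ET.parse(handle).getroot()
    # Each <sheet> element carries the sheet name as an attribute.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

The worksheet XML parts, which hold the bulk of a large file, are never touched here, which is the property the description above relies on.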

@dgdekoning added the feature label (Issues/PRs related to a new feature) May 25, 2020
@dgdekoning requested a review from bsteubing May 25, 2020 09:00
@coveralls

Coverage Status

Coverage increased (+0.02%) to 56.546% when pulling 9d33207 on openpyxl into cd34dd3 on master.

@dgdekoning
Member Author

Found out that xlrd's 'on_demand' option has no effect on .xlsx files: https://stackoverflow.com/a/31187609.

Note that the openpyxl Workbook object can be fed into pandas.read_excel by using the engine='openpyxl' argument. Doing so gives a speed increase, but a minor one compared to the other two methods where openpyxl was used instead of xlrd.
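
A minimal sketch of the two uses mentioned in this thread, listing sheet names with openpyxl and reading a sheet through pandas with `engine='openpyxl'`. The function names are hypothetical and are not taken from this PR. Two assumptions are made to keep the example simple: the workbook is opened with `read_only=True`, and the file path (rather than a Workbook object) is passed to `pandas.read_excel`.

```python
import pandas as pd
from openpyxl import load_workbook


def list_sheet_names(path):
    """Return the sheet names of an .xlsx file using openpyxl.

    Hypothetical helper. read_only=True is an assumption of this sketch;
    it asks openpyxl to load worksheet contents lazily.
    """
    workbook = load_workbook(path, read_only=True)
    try:
        return list(workbook.sheetnames)
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        workbook.close()


def read_sheet(path, sheet_name):
    """Read one sheet into a DataFrame, forcing the openpyxl engine."""
    return pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
```

Usage would look like `names = list_sheet_names("data.xlsx")` followed by `read_sheet("data.xlsx", names[0])`, where `data.xlsx` is a placeholder file name.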

@dgdekoning merged commit 496943f into master May 26, 2020
@dgdekoning deleted the openpyxl branch May 26, 2020 12:53
Labels
feature Issues/PRs related to a new feature

2 participants