Cache XLSX workbook in Region10SpreadsheetConverter. #1675

Merged
toolness merged 3 commits into develop from cache-r10-xlsx on Mar 27, 2018

2 participants
@toolness
Contributor

toolness commented Mar 27, 2018

When attempting to diagnose #1601, I noticed that an inordinate amount of time was being spent in xlrd.open_workbook(): upwards of 2 minutes for Kelly's region 10 XLSX from October 2017, which has around 57,000 rows. Then I realized that we were making multiple calls to this function during region 10 upload jobs.

This improves the situation by caching the result of the call, which seems to speed up r10 bulk upload by 2-4 minutes.
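
For readers who want a picture of the idea, here is a minimal sketch of caching an opened workbook on a converter object. It is not the code from this PR: the class name `CachedWorkbookConverter`, the `get_workbook()` method, and the injectable `open_workbook` argument are all hypothetical. Only `xlrd.open_workbook()` comes from the description above.

```python
class CachedWorkbookConverter:
    """Hypothetical converter that parses its workbook at most once."""

    def __init__(self, path, open_workbook=None):
        self.path = path
        # Assumption: an injectable opener, so the sketch can be exercised
        # without xlrd installed. Defaults to xlrd.open_workbook when omitted.
        self._open_workbook = open_workbook
        self._book = None

    def get_workbook(self):
        # Only the first call pays the (potentially multi-minute) parsing
        # cost; later calls return the object that is already in memory.
        if self._book is None:
            opener = self._open_workbook
            if opener is None:
                import xlrd  # imported lazily; only needed for real files
                opener = xlrd.open_workbook
            self._book = opener(self.path)
        return self._book
```

With something like this, code that previously called the opener in several places would call `get_workbook()` instead, so a single upload job parses the file once. The trade-off is that the parsed workbook stays in memory for as long as the converter object is alive.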

It's also possible this might help with the memory issue mentioned in #1601, but I'm not sure. If we really want to reduce memory usage and speed things up, we might want to require Kelly to upload a CSV instead of an XLSX: my guess is that whatever Microsoft tool he's using on his end will have no problem exporting one, and it would certainly cut resource consumption on our end.
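
As a rough illustration of why a CSV could be lighter, the sketch below reads rows one at a time with the standard library's `csv` module instead of loading a whole workbook up front. The function name `iter_csv_rows` and the column names in the sample data are made up for illustration; nothing here reflects the actual region 10 file layout.

```python
import csv
import io


def iter_csv_rows(fileobj):
    """Yield each data row as a dict keyed by the header row.

    Rows are produced lazily, so only one row needs to be held in
    memory at a time, however many rows the file contains.
    """
    for row in csv.DictReader(fileobj):
        yield dict(row)


# Usage with an in-memory file; the columns are hypothetical.
sample = io.StringIO("labor_category,price\nEngineer,100.00\nAnalyst,80.50\n")
for row in iter_csv_rows(sample):
    print(row["labor_category"], row["price"])
```

Whether this would help in practice depends on how the upload job consumes the rows; if it collects them all into a list anyway, most of the memory benefit is lost.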

@toolness toolness requested a review from jseppi Mar 27, 2018

@jseppi

jseppi approved these changes Mar 27, 2018

Wow! Thanks for the detective work and improvement!

toolness added some commits Mar 27, 2018

@toolness toolness merged commit e13ea62 into develop Mar 27, 2018

2 checks passed

ci/circleci: build - Your tests passed on CircleCI!
codeclimate - All good!

@toolness toolness deleted the cache-r10-xlsx branch Mar 27, 2018
