Process bulk uploads in batches #1714

Merged
merged 1 commit into develop from batched-bulk-upload on Apr 5, 2018

@toolness
Contributor

toolness commented Apr 5, 2018

This potentially fixes #1712 by chopping up the insertion of new contracts into batches of 5000 labor categories each. It also reduces the lifetime of the Region10SpreadsheetConverter instance, so it is no longer in memory by the time the search indexes are updated.
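For reference, here's a minimal sketch of the batching idea, assuming a hypothetical generator that yields unsaved Contract instances (the names, import path, and bad-row bookkeeping in the actual diff may differ):

```python
import itertools
import logging

from contracts.models import Contract  # import path is an assumption

logger = logging.getLogger('contracts')

BATCH_SIZE = 5000

def save_in_batches(contracts, batch_size=BATCH_SIZE):
    """Insert contracts in fixed-size batches so that at most
    batch_size unsaved model instances are held in memory at once."""
    total = 0
    while True:
        # islice pulls at most batch_size items off the generator,
        # keeping peak memory proportional to the batch, not the file.
        batch = list(itertools.islice(contracts, batch_size))
        if not batch:
            break
        Contract.objects.bulk_create(batch)
        total += len(batch)
        logger.info('Saved %d contracts so far.', total)
    return total
```

Because the converter only needs to live as long as the generator it feeds, it can be garbage-collected before the search indexes are rebuilt.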

While I wasn't able to definitively profile memory usage in #1601, I'm hoping this reduces memory usage enough that our rq_workers no longer crash.

This PR also brings in the process_bulk_upload management command from #1601 to make future manual testing and profiling easier.
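For anyone who hasn't written one before, a management command is just a class in the app's management/commands directory; here's a rough sketch of what process_bulk_upload could look like (the argument name and the process_file helper are assumptions for illustration, not the PR's actual code):

```python
# contracts/management/commands/process_bulk_upload.py (path assumed)
from django.core.management.base import BaseCommand

# Hypothetical stand-in for the bulk-upload pipeline's real entry point.
from contracts.bulk_upload import process_file


class Command(BaseCommand):
    help = 'Process an R10 bulk upload spreadsheet synchronously.'

    def add_arguments(self, parser):
        parser.add_argument('filename', help='path to the R10 export (.xlsx)')

    def handle(self, *args, **options):
        with open(options['filename'], 'rb') as f:
            process_file(f)
```

Running the upload synchronously like this, outside of rq, makes it easy to wrap in a profiler or watch with standard memory tools.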

Note that this PR doesn't have ideal test coverage. I don't want to write tests if this doesn't actually fix #1712, so I'm going to hold off on writing them until we actually try this out on cloudfoundry.

Here's example log output from running manage.py process_bulk_upload with Kelly's latest R10 export:

[05/Apr/2018 12:07:45] INFO [contracts:45] Deleting contract objects related to region 10.
[05/Apr/2018 12:07:49] INFO [contracts:53] Generating new contract objects.
[05/Apr/2018 12:08:04] INFO [contracts:63] Saved 5000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:07] INFO [contracts:63] Saved 10000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:10] INFO [contracts:63] Saved 15000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:13] INFO [contracts:63] Saved 20000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:16] INFO [contracts:63] Saved 25000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:19] INFO [contracts:63] Saved 30000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:22] INFO [contracts:63] Saved 35000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:25] INFO [contracts:63] Saved 40000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:29] INFO [contracts:63] Saved 45000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:32] INFO [contracts:63] Saved 50000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:35] INFO [contracts:63] Saved 55000 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:36] INFO [contracts:63] Saved 55672 contracts so far (0 bad rows found).
[05/Apr/2018 12:08:36] INFO [contracts:67] Updating full-text search indexes.
55672 contracts successfully processed, 0 failed.
@hbillings

Atul and I reviewed this in a screenshare. LGTM!

@toolness toolness merged commit ec872c8 into develop Apr 5, 2018

3 checks passed

ci/circleci: build - Your tests passed on CircleCI!
codeclimate - All good!
codeclimate/total-coverage - 92% (0.0% change)

@toolness toolness deleted the batched-bulk-upload branch Apr 5, 2018

@toolness

Contributor

toolness commented Apr 18, 2018

To follow up, I did a Kibana visualization of memory usage on calc-rqworker around the time this problem occurred:

[Image: memory-used - Kibana graph of memory usage on calc-rqworker]

I think that the spike on the left is from April 4 when we tried uploading the R10 spreadsheet using the old code, and the one on the right is from April 6 when we successfully pushed this PR to production and re-uploaded. I'm not sure what the three spikes shortly after the huge one are, though.

In any case, the intent of this comment is really just to highlight that these metrics can be monitored and visualized in Kibana, which can be useful for diagnosis in the future.
