This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

Check and Possibly Reduce Memory usage of Pre-built PDF #1280

Closed
bengolder opened this issue Oct 31, 2017 · 8 comments

Comments

@bengolder
Contributor

This issue is the result of investigating #1275.

If San Francisco does not log in for long enough, their prebuilt PDF keeps accumulating new applications in memory, and may grow big enough to exceed our memory quota while the concatenated PDF is being built.

Assess the memory usage of a large prebuilt PDF, determine how we can reduce it, and make changes as needed to prevent exceeding our memory quota.

@bengolder
Contributor Author

This function sets the .pdf column of a PrebuiltPDFBundle instance to bytes that are a concatenation of all the individual filled PDFs of each of the apps contained in the bundle.
https://github.com/codeforamerica/intake/blob/develop/intake/services/pdf_service.py#L41-L50

Here is the PrebuiltPDFBundle method that sets its own bytes in the .pdf column:
https://github.com/codeforamerica/intake/blob/develop/intake/models/prebuilt_pdf_bundle.py#L19-L28

This is the method in the PDF-handling Python code that joins multiple PDFs into one and returns the resulting bytes:
https://github.com/codeforamerica/intake/blob/develop/intake/pdfparser.py#L114-L122
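To make the memory concern concrete, here is a minimal sketch of why this kind of in-memory concatenation is expensive. This is not the repo's actual code (the real join goes through the pdfparser module above, and real PDF merging must rewrite the document structure, not just append bytes); the function name and blob sizes are illustrative assumptions. The point is that every input blob plus the growing output buffer are resident simultaneously, so peak memory scales with the total size of the bundle.

```python
import io

def join_pdf_bytes(pdf_blobs):
    """Hypothetical sketch: concatenate PDF byte strings in memory.

    Every input blob stays referenced while the output buffer grows,
    so peak memory is roughly the sum of all inputs plus the output.
    (Real PDF merging needs a proper merger; this only models memory.)
    """
    out = io.BytesIO()
    for blob in pdf_blobs:
        out.write(blob)
    return out.getvalue()

# With N filled PDFs of ~170 KB each, peak usage grows linearly with N.
blobs = [b"%PDF-1.4 " + bytes(170_000) for _ in range(10)]
joined = join_pdf_bytes(blobs)
```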

@bengolder
Contributor Author

bengolder commented Oct 31, 2017

The largest file size of any PrebuiltPDFBundle.pdf file in cmr-prod is 1.68 MB (well within memory limits), so file size alone is not the likely cause.
The largest bundle contains 10 applications.
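A figure like that largest-file-size number can be checked with a single aggregate query over the blob column. The sketch below uses an in-memory SQLite stand-in; the table name, column name, and sizes are assumptions based on the model described above (the real check would run against the cmr-prod database, e.g. via the Django ORM).

```python
import sqlite3

# In-memory stand-in for the production database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE prebuilt_pdf_bundle (id INTEGER PRIMARY KEY, pdf BLOB)"
)
con.executemany(
    "INSERT INTO prebuilt_pdf_bundle (pdf) VALUES (?)",
    [(bytes(500_000),), (bytes(1_680_000),)],  # fake bundle blobs
)
# LENGTH() on a BLOB returns its size in bytes.
largest = con.execute(
    "SELECT MAX(LENGTH(pdf)) FROM prebuilt_pdf_bundle"
).fetchone()[0]
print(largest / 1e6, "MB")
```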

@bengolder
Contributor Author

It's not clear if the memory issues occur from Python or Java processes. It could be either one.

@bengolder
Contributor Author

Here is a script that should be able to reproduce the memory issues caused by pdfs: https://gist.github.com/bengolder/ec437271f0ea5f3050b15ba3082ff983
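For a reproduction script like that gist, the standard-library resource module gives a cheap way to watch the process's peak resident set size from inside Python (a sketch, not taken from the gist; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB (approximate)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    divisor = 1024 if sys.platform.startswith("linux") else 1024 * 1024
    return peak / divisor

before = peak_rss_mb()
big = bytes(50 * 1024 * 1024)  # hold a 50 MB buffer, like a large bundle
after = peak_rss_mb()
```

Because ru_maxrss is a high-water mark, it never decreases, which is handy for spotting the kind of "memory jumps up and stays up" behavior described below.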

@glassresistor
Contributor

After managing to SSH into the active web and worker dynos, I was able to determine that in a zero-traffic state the app uses ~250 MB of memory, made up of 4-8 worker processes of ~50 MB each.

We can see from the outside that when traffic causes SF to generate a bundled PDF (on stage), memory usage can jump by ~20 MB and not go down for a long period of time, and subsequent requests will increase it by another ~20 MB.

When testing on stage, it takes longer than 5-10 minutes to generate a PDF containing thousands of filled PDFs. So it is plausible that 5-10 submissions to SF over the course of a day, if the bundled PDFs are really big, could cause this problem.

An easy solution is to reduce the number of Celery worker processes per dyno to just 2, which should cut the baseline down to ~100 MB. If the problem still keeps happening, we can look into other ways to decrease the amount of memory left over and how long it persists.
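On Heroku, Celery worker concurrency is typically capped with the `--concurrency` flag on the worker process in the Procfile. The fragment below is a sketch: the app module name and process type are assumptions, not this repo's actual Procfile.

```shell
# Procfile (hypothetical): limit each worker dyno to 2 Celery processes
worker: celery -A project worker --concurrency=2 --loglevel=info
```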

@glassresistor
Contributor

I was not able to build PDFs while SSHed into the dynos, though, because Heroku breaks PDF creation when in Java remote debug mode.

@glassresistor
Contributor

Because Lambda runs one instance per task, this problem should go away once we are on Lambda.

@glassresistor
Contributor

I was able to trigger a memory overload by creating 300 submissions and triggering a PDF build, but was not able to trigger one using a PDF with 100 submissions, even running it 20 times in a row.

This should be good enough for our needs.

[screenshot from 2017-11-01 12-03-44]
