Reintroduction of Cursor code to optimize memory usage of CoExpression Service #6834

n1zea144 · 2019-11-18T18:59:04Z

As a follow-up to the investigation of crashes on the public portal backend (the source of which has now been identified as requests to the CoExpression service), this PR reintroduces the use of Cursors to limit the amount of heap used by the CoExpression service implementation.

The following screenshots were made by profiling the CoExpression service while it satisfied requests against the Cancer Cell Line Encyclopedia (Broad, 2019) with a query gene of SCML2.

On multiple occasions, at its peak memory usage, the CoExpression service uses over 7GB to satisfy a request. This is the result of an accumulation of GeneMolecularAlteration instances and a pileup of calls to string splitting. In this case, we have ~33k GeneMolecularAlteration instances, each of which contains > 1500 alteration measurements (string splitting results in close to 50 million strings, one for each entity-sample measurement). The GeneMolecularAlteration instances (and the resultant genetic alteration strings due to string splitting) are accumulated in memory before any Spearman correlation computations are made.
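The "close to 50 million strings" figure follows directly from the counts above. The snippet below is only a rough sketch of that arithmetic and of the eager pattern being described; the class and method names are hypothetical, and the comma delimiter is an assumption about how the measurements are stored, not something taken from the service code.

```java
import java.util.ArrayList;
import java.util.List;

public class EagerLoadEstimate {

    /** Value strings alive at once if every entity is split before any computation starts. */
    public static long stringsHeldEagerly(long entityCount, long samplesPerEntity) {
        return entityCount * samplesPerEntity;
    }

    /** Splits one delimited row of measurements; each sample becomes its own String. */
    public static String[] splitValues(String delimitedValues) {
        return delimitedValues.split(","); // delimiter is an assumption
    }

    public static void main(String[] args) {
        // Approximate figures from the profiling run: ~33k entities, > 1500 samples each.
        System.out.println(stringsHeldEagerly(33_000, 1_500)); // prints 49500000

        // Eager pattern in miniature: every row is split and retained up front.
        List<String[]> retained = new ArrayList<>();
        for (String row : new String[] {"0.1,0.2,0.3", "1.5,2.0,2.5"}) {
            retained.add(splitValues(row));
        }
        System.out.println(retained.size() + " rows held before any correlation is computed");
    }
}
```

Since each entity has more than 1500 measurements, 49.5 million is a lower bound on the number of strings held at the peak.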

The following screenshot is an example of memory telemetry that highlights the peak memory consumption (7.89GB) of a CoExpression service call (this was the greatest peak captured during profiling):

With the introduction of cursors, only a single GeneMolecularAlteration instance is in memory at any one time (well, two, because the one representing the "query" entity is kept in memory for the correlation computation). Once the Spearman correlation has been computed for this GeneMolecularAlteration instance, the instance is discarded and the next one is fetched from the cursor.
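The one-instance-at-a-time processing described above can be sketched roughly as follows. This is not the code in this PR: `Row`, `correlateAll`, and the other names are hypothetical, a plain `Iterator` stands in for the cursor (assuming the real cursor can be iterated the same way), and the values are assumed to be comma-delimited and fully numeric.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamingCoExpression {

    /** Hypothetical stand-in for one fetched row: an entity id plus its delimited values. */
    public static final class Row {
        final String entityId;
        final String delimitedValues;

        public Row(String entityId, String delimitedValues) {
            this.entityId = entityId;
            this.delimitedValues = delimitedValues;
        }
    }

    /** Parses a comma-delimited row; assumes every value is numeric (no "NA" handling). */
    static double[] parse(String delimitedValues) {
        return Arrays.stream(delimitedValues.split(","))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    /** Assigns 1-based ranks; tied values share the average of their positions. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] result = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double averageRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                result[order[k]] = averageRank;
            }
            start = end + 1;
        }
        return result;
    }

    /** Spearman correlation, computed as the Pearson correlation of the ranks. */
    public static double spearman(double[] x, double[] y) {
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double meanX = Arrays.stream(rx).average().orElse(0.0);
        double meanY = Arrays.stream(ry).average().orElse(0.0);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < rx.length; i++) {
            double dx = rx[i] - meanX;
            double dy = ry[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Consumes rows one at a time; only the query values and the current row are live. */
    public static Map<String, Double> correlateAll(double[] queryValues, Iterator<Row> rows) {
        Map<String, Double> correlations = new LinkedHashMap<>();
        while (rows.hasNext()) {
            Row row = rows.next();
            correlations.put(row.entityId, spearman(queryValues, parse(row.delimitedValues)));
            // 'row' and its split strings become unreachable here, before the next fetch.
        }
        return correlations;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("A", "10,20,30,40"), new Row("B", "9,7,5,3"));
        System.out.println(correlateAll(new double[] {1, 2, 3, 4}, rows.iterator()));
    }
}
```

The demo in `main` iterates over an in-memory list only to keep the example self-contained; the memory benefit comes from the iterator being backed by a cursor that fetches rows lazily, so that only one double per entity (the correlation) is retained rather than every measurement string.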

This screenshot is an example of memory telemetry that highlights the peak memory consumption (5.73GB) with the introduction of cursors:

The timings captured indicate that cursors do not add any overhead to satisfying the CoExpression request. In all captured cases, the cursor implementation outperformed the non-cursor implementation.

In this example, the pre-cursor code completes in ~41 seconds:

With cursor code, the request takes 38 seconds:

While results will probably vary based on the host environment and the datasets evaluated, the introduction of cursors does seem to benefit the CoExpression service implementation.


n1zea144 requested a review from alisman November 18, 2019 18:59

n1zea144 added performance api backend labels Nov 18, 2019

Reintroduction of Cursor code to optimzation memory usage of CoExpres…

0300536

…sion service.

n1zea144 force-pushed the reintroduce-cursors-coexpression-service branch from 8108e40 to 0300536 Compare November 19, 2019 15:38

n1zea144 merged commit 18f1d98 into cBioPortal:master Nov 22, 2019
