Reintroduction of Cursor code to optimize memory usage of CoExpression Service #6834

Merged


n1zea144
Contributor

As a follow-up to the investigation of crashes on the public portal backend (the source of which has now been identified as requests to the CoExpression service), this PR reintroduces the use of cursors as a way to limit the amount of heap used by the CoExpression service implementation.

The following screenshots were made by profiling the CoExpression service while it satisfied requests against the Cancer Cell Line Encyclopedia (Broad, 2019) with a query gene of SCML2.

On multiple occasions, at its peak memory usage, the CoExpression service uses over 7 GB to satisfy a request. This is the result of an accumulation of GeneMolecularAlteration instances and a pileup of calls to string splitting. In this case, we have ~33k GeneMolecularAlteration instances, each of which contains > 1500 alteration measurements (string splitting results in close to 50 million strings, one for each entity-sample measurement). The GeneMolecularAlteration instances (and the genetic alteration strings that result from string splitting) are accumulated in memory before any Spearman correlation computations are made.

The following screenshot is an example of memory telemetry that highlights the peak memory consumption (7.89 GB) of a CoExpression service call (this was the highest peak captured during profiling):
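To make the "close to 50 million strings" figure concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the comma delimiter is an assumption made for illustration, not something stated in this PR.

```java
public class SplitCountSketch {

    // Lower bound on the number of strings created when every entity's
    // measurement list is split into one string per sample.
    static long stringsFromSplitting(long entities, long samplesPerEntity) {
        return entities * samplesPerEntity;
    }

    // Splitting one delimited value list yields one String per measurement.
    // The comma delimiter is an assumption made for this sketch.
    static int countTokens(String delimitedValues) {
        return delimitedValues.split(",").length;
    }

    public static void main(String[] args) {
        // ~33k entities with at least 1500 measurements each
        System.out.println(stringsFromSplitting(33_000L, 1_500L));
        System.out.println(countTokens("0.12,1.5,-0.3"));
    }
}
```

Since each entity has more than 1500 measurements, 33,000 x 1,500 = 49,500,000 is a floor, and all of those strings are live at once when the instances are accumulated before any correlation is computed.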

image

With the introduction of cursors, only a single GeneMolecularAlteration instance is in memory at any one moment in time (well, two, because the one representing the "query" entity is kept in memory for the correlation computation). Once the Spearman correlation has been computed for this GeneMolecularAlteration instance, the instance is discarded and the next one is fetched through the cursor.
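The streaming pattern described above can be sketched roughly as follows. This is not the code in this PR: `AlterationRow`, `correlate`, and the comma-delimited value format are hypothetical stand-ins, and a plain `java.util.Iterator` takes the place of the database cursor so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CursorCoExpressionSketch {

    // Hypothetical stand-in for one GeneMolecularAlteration row.
    static final class AlterationRow {
        final String entityId;
        final String values; // assumed comma-delimited measurements
        AlterationRow(String entityId, String values) {
            this.entityId = entityId;
            this.values = values;
        }
    }

    static double[] parse(String csv) {
        return Arrays.stream(csv.split(",")).mapToDouble(Double::parseDouble).toArray();
    }

    // Ranks starting at 1; tied values share the average of their ranks.
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(v[x], v[y]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[order[j + 1]] == v[order[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) r[order[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    // Spearman correlation is the Pearson correlation of the ranks.
    static double spearman(double[] a, double[] b) {
        return pearson(ranks(a), ranks(b));
    }

    // Only the query values and the current row are referenced at any time;
    // each row becomes garbage as soon as its correlation is stored.
    static Map<String, Double> correlate(AlterationRow query, Iterator<AlterationRow> rows) {
        double[] queryValues = parse(query.values);
        Map<String, Double> result = new LinkedHashMap<>();
        while (rows.hasNext()) {
            AlterationRow row = rows.next();
            if (row.entityId.equals(query.entityId)) continue; // skip the query entity itself
            result.put(row.entityId, spearman(queryValues, parse(row.values)));
        }
        return result;
    }

    public static void main(String[] args) {
        AlterationRow query = new AlterationRow("Q", "1,2,3,4");
        List<AlterationRow> rows = List.of(
            new AlterationRow("A", "10,20,30,40"),
            new AlterationRow("B", "4,3,2,1"));
        System.out.println(correlate(query, rows.iterator()));
    }
}
```

The point of the design is that the result map holds one number per entity instead of one parsed row per entity, so peak heap no longer grows with the number of rows fetched.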

This screenshot is an example of memory telemetry that highlights the peak memory consumption (5.73 GB) with the introduction of cursors:

image

The timings captured indicate that cursors do not add any overhead to satisfying the CoExpression request. In all captured cases, the cursor implementation outperformed the non-cursor implementation.

In this example, the pre-cursor code completes in ~41 seconds:

image

With cursor code, the request takes 38 seconds:

image

While results will probably vary based on the host environment and the datasets evaluated, the introduction of cursors does seem to benefit the CoExpression service implementation.

@n1zea144 force-pushed the reintroduce-cursors-coexpression-service branch from 8108e40 to 0300536 on November 19, 2019 15:38
@n1zea144 merged commit 18f1d98 into cBioPortal:master on Nov 22, 2019