
Tweak MetricsUpdateTask performance #1481

Merged

Conversation

Member

@nscuro nscuro commented Mar 19, 2022

I was profiling the MetricsUpdateTask class due to some folks reporting performance concerns, and managed to identify a few screws we could tighten to make the task more resource-efficient.

Profiling

Setup

  • DT 4.4.2 running on Temurin 11 on macOS.
  • Postgres database running in Docker, using the postgres:14.2-alpine image.
  • 50 projects, all using the DT API Server 4.4.2 BOM. 7850 components in total.
  • MetricsUpdateTask running in a JerseyTest without any resources deployed. The task runs essentially in isolation.
  • Because no other tasks are running, no data is changing, thus the metrics update task won't perform any INSERTs, just UPDATEs.
  • Using Async Profiler in IntelliJ.

Results

The current implementation is performing a lot of allocations in the updateComponentMetrics method. Profiling results showed that the biggest chunk is allocated in AbstractAlpineQueryManager.persist. More specifically, in DataNucleus' preCommit method, where DN maintains its L1 cache:

[Screenshot: allocation profile, dominated by AbstractAlpineQueryManager.persist]

This is also where a lot of CPU time is spent:

[Screenshot: CPU time profile of the L1 cache maintenance]

The DataNucleus docs say the following about the L1 cache:

Each PersistenceManager maintains a cache of the objects that it has encountered (or have been "enlisted") during its lifetime. [...]
Objects are placed in the L1 cache (and updated there) during the course of the transaction. This provides rapid access to the objects in use in the users application and is used to guarantee that there is only one object with a particular identity at any one time for that PersistenceManager. When the PersistenceManager is closed the L1 cache is cleared.

Because DT currently uses a single PersistenceManager for the entire portfolio metrics update run, the L1 cache keeps on growing, and DN has to do a lot of work in order to maintain it. Performance deteriorates with every project being processed. So far, this matches what @hostalp was reporting in #1210 (comment).
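The effect of the shared cache can be sketched in a few lines of self-contained Java. The `L1Cache` class below is a hypothetical stand-in for DataNucleus' per-PersistenceManager cache, and the project/component counts mirror the profiling setup above; none of this is real DataNucleus API:

```java
import java.util.HashMap;
import java.util.Map;

public class L1CacheGrowth {
    /** Hypothetical stand-in for DataNucleus' per-PersistenceManager L1 cache. */
    static class L1Cache {
        final Map<String, Object> enlisted = new HashMap<>();
        void enlist(String id) { enlisted.put(id, new Object()); }
        int size() { return enlisted.size(); }
    }

    public static void main(String[] args) {
        final int projects = 50, componentsPerProject = 157; // ~7850 components, as in the setup above

        // Current behavior: one PersistenceManager (thus one L1 cache) for the whole run.
        L1Cache shared = new L1Cache();
        for (int p = 0; p < projects; p++)
            for (int c = 0; c < componentsPerProject; c++)
                shared.enlist("project-" + p + "/component-" + c);

        // Tweaked behavior: a fresh PersistenceManager per project keeps the cache bounded.
        int maxPerProject = 0;
        for (int p = 0; p < projects; p++) {
            L1Cache perProject = new L1Cache();
            for (int c = 0; c < componentsPerProject; c++)
                perProject.enlist("project-" + p + "/component-" + c);
            maxPerProject = Math.max(maxPerProject, perProject.size());
        }

        System.out.println(shared.size());  // grows with the whole portfolio: 7850
        System.out.println(maxPerProject);  // stays bounded per project: 157
    }
}
```

The shared cache ends up enlisting every component in the portfolio, so every commit has more cache to maintain than the previous one.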

Another factor here is what AbstractAlpineQueryManager.persist does with objects passed to it:

  • Start a transaction
  • Make the object persistent (it may or may not be transient until now)
  • Commit the transaction (this detaches the object again)
  • Refresh the detached object, making it persistent again

[Screenshot: call breakdown of AbstractAlpineQueryManager.persist]

Especially for larger objects, that's quite a bit of work being performed. In cases like metrics updates, we only want to INSERT or UPDATE something, but we don't care about the object after that, we don't need it to be persistent.
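The four steps above can be mimicked with a tiny state machine. This is an illustrative simulation only; the real transitions are managed by DataNucleus/JDO, and `TrackedObject` and its counters are invented for the sketch:

```java
public class PersistCycle {
    enum State { TRANSIENT, PERSISTENT, DETACHED }

    static class TrackedObject {
        State state = State.TRANSIENT;
        int transitions = 0;
        void to(State next) { state = next; transitions++; }
    }

    /** Mimics the object lifecycle inside AbstractAlpineQueryManager.persist. */
    static void persist(TrackedObject obj) {
        // 1. Start a transaction (no state change for the object itself).
        // 2. Make the object persistent (it may have been transient until now).
        obj.to(State.PERSISTENT);
        // 3. Commit the transaction; this detaches the object again.
        obj.to(State.DETACHED);
        // 4. Refresh the detached object, making it persistent again.
        obj.to(State.PERSISTENT);
    }

    public static void main(String[] args) {
        TrackedObject component = new TrackedObject();
        persist(component);
        System.out.println(component.state);       // PERSISTENT
        System.out.println(component.transitions); // 3 state changes for a single write
    }
}
```

Three lifecycle transitions per call is exactly the overhead that is wasted when the caller never reads the object afterwards.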

There are cases in the current logic where persist is called even if it isn't necessary, causing some additional overhead. One of these cases is when updating the lastInheritedRiskScore field of Component, when the metrics themselves didn't change (see here). Semantically, this is (almost?) always a no-op. But in the background, the persistence layer is producing garbage.

When working on objects that already are persistent we could achieve the desired effect using something like this as well:

qm.getPersistenceManager().currentTransaction().begin();
component.setLastInheritedRiskScore(last.getInheritedRiskScore());
qm.getPersistenceManager().currentTransaction().commit();

That saves us the unnecessary calls to makePersistent and refresh.
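Independent of how the write is performed, the no-op itself can be avoided with a plain equality guard. The sketch below strips away the persistence layer entirely; `Component` and its `writes` counter are simplified stand-ins for DT's real model class and the transaction/persist/refresh work it would trigger:

```java
import java.util.Objects;

public class RiskScoreUpdate {
    static class Component {
        Double lastInheritedRiskScore;
        int writes = 0; // stand-in for "transactions + persist->refresh cycles"
        void setLastInheritedRiskScore(Double score) { lastInheritedRiskScore = score; writes++; }
    }

    /** Only touch the persistence layer when the value actually changed. */
    static void updateRiskScore(Component component, Double newScore) {
        if (!Objects.equals(component.lastInheritedRiskScore, newScore)) {
            component.setLastInheritedRiskScore(newScore);
        }
    }

    public static void main(String[] args) {
        Component c = new Component();
        updateRiskScore(c, 42.0); // value changed: one write
        updateRiskScore(c, 42.0); // unchanged: skipped
        updateRiskScore(c, 42.0); // unchanged: skipped again
        System.out.println(c.writes); // 1
    }
}
```

Since metrics rarely change between runs, the guard turns the common case into zero persistence work.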

Tweaks

I didn't want to change too much here, so I only introduced two changes:

  • Every updateProjectMetrics invocation gets its own QueryManager. This ensures that resources like the L1 cache are cleared regularly. We still want to benefit from DN's caching to some degree, so using a separate QueryManager for each updateComponentMetrics call would not be reasonable. According to DN's docs, creating PersistenceManager instances is a cheap operation anyway.
  • lastInheritedRiskScore fields of projects and components are only updated when they actually changed. In the majority of cases the score is unchanged, so this saves quite a bit of computing power.
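The per-project scoping from the first tweak can be sketched with try-with-resources, assuming a QueryManager that is AutoCloseable (as Alpine's is). The stub class, counters, and project names below are illustrative, not the real Alpine implementation:

```java
import java.util.List;

public class PerProjectQueryManager {
    static int opened = 0, closed = 0;

    /** Illustrative stub; the real QueryManager wraps a JDO PersistenceManager. */
    static class QueryManager implements AutoCloseable {
        QueryManager() { opened++; }
        void updateProjectMetrics(String project) { /* metrics work for one project */ }
        @Override public void close() { closed++; } // releases the PersistenceManager, clearing its L1 cache
    }

    public static void main(String[] args) {
        List<String> projects = List.of("project-a", "project-b", "project-c");
        for (String project : projects) {
            // One QueryManager (and thus one L1 cache) per project, released immediately after.
            try (QueryManager qm = new QueryManager()) {
                qm.updateProjectMetrics(project);
            }
        }
        System.out.println(opened + " " + closed); // "3 3": no cache outlives a single project
    }
}
```

Scoping the manager to the loop body guarantees the cache is released even if one project's update throws.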

Results

Allocations are reduced by more than 50%:

[Screenshot: allocation profile after the changes]

CPU time spent is drastically reduced as well:

[Screenshot: CPU time profile after the changes]

With the original code, the task ran for 3m30s. With the changes outlined above, it finished in just under 2m30s!

There are probably a few more places throughout the codebase where similar minor changes could be made to achieve such results.

…update run

This ensures that the L1 and L2 cache of the DN persistence manager are regularly cleared and other related resources are released.

Profiling indicates that this cuts the amount of object allocations roughly in half. CPU time spent on cache maintenance is also drastically reduced.

Signed-off-by: nscuro <nscuro@protonmail.com>
…actually changed.

Especially for components, this is the most common scenario. By not performing this operation when nothing changed, we're saving unnecessary transactions and persist->refresh cycles.

Signed-off-by: nscuro <nscuro@protonmail.com>
Member Author

nscuro commented Mar 19, 2022

Will hopefully also have an impact on #1370.

@stevespringett stevespringett merged commit 7c483b5 into DependencyTrack:master Mar 31, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 30, 2022
@nscuro nscuro deleted the tweak-metrics-update-performance branch May 8, 2022 13:28
@nscuro nscuro added this to the 4.5 milestone May 10, 2022