Profile and optimize BOM ingestion performance #635

nscuro · 2023-06-28T12:30:18Z

We found that ingestion of large BOMs (>10k components) can take a long time, especially when several of them are uploaded concurrently.

We have not touched the ingestion code so far, so there is likely still plenty of headroom for performance optimizations.

- Profile ingestion with large BOMs and identify potential bottlenecks, redundant queries, etc.
- Implement performance optimizations where practical
- Bonus: Look into how feasible it is to make the ingestion process atomic
  - If ingestion fails halfway through, we should not leave an incomplete state behind
  - Complication: A single, large DB transaction will cripple performance

nscuro · 2023-06-30T13:55:52Z

Transaction size is a big concern, but the approach may still be worth building a PoC for, which should be tested with very large BOMs.

Relying on database transactions may become feasible again if the ingestion process itself is less heavy. In the current state it might be problematic.

Another way of emulating transactional behavior is to perform compensating actions in the application instead of in the database. Simple example: deleting all previously created components if processing fails halfway through (only works for new projects).
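To illustrate the compensating approach, here is a minimal sketch in Python using an in-memory SQLite database. It is not the Dependency-Track implementation: the `component` table, its columns, and the `ingest_with_compensation` function are hypothetical names made up for this example. Components are written in small transactions, and if a later batch fails, everything created for the project so far is deleted again.

```python
import sqlite3


def ingest_with_compensation(conn, project_id, names, batch_size=2):
    """Persist components in small transactions; undo them if a later batch fails.

    Hypothetical sketch: only safe for a brand-new project, because the
    compensation step deletes *all* components of the project.
    """
    try:
        for start in range(0, len(names), batch_size):
            batch = names[start:start + batch_size]
            with conn:  # commits on success, rolls back this batch on error
                conn.executemany(
                    "INSERT INTO component (project_id, name) VALUES (?, ?)",
                    [(project_id, name) for name in batch],
                )
    except sqlite3.Error:
        # Compensating action: remove what earlier batches already committed.
        with conn:
            conn.execute(
                "DELETE FROM component WHERE project_id = ?", (project_id,)
            )
        raise


comp_conn = sqlite3.connect(":memory:")
comp_conn.execute(
    "CREATE TABLE component (project_id INTEGER, name TEXT NOT NULL)"
)

try:
    # The final entry violates NOT NULL, so the second batch fails.
    ingest_with_compensation(comp_conn, 1, ["a", "b", "c", None])
except sqlite3.IntegrityError:
    pass

remaining = comp_conn.execute("SELECT COUNT(*) FROM component").fetchone()[0]
```

The first batch commits successfully, the second one fails, and the cleanup removes the rows left behind by the first. As noted above, this only works for new projects; for existing projects the cleanup would also delete components that were there before the upload.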

Once we have a PoC, it should be tested for a longer period of time (multiple days) in a test environment.

nscuro · 2023-07-03T10:23:23Z

Flame graph when importing bloated.bom.json into the in-memory H2 database.

The majority of the time is spent persisting components, and a good chunk of that time is spent checking the L1 cache (java.util.HashSet.<init> above TransactionImpl.internalPreCommit).

At the moment, each component is inserted / updated in its own transaction. For the bloated BOM with 9056 components, this results in at least 9056 transactions. If running the entire thing in a single transaction is not feasible, we could at the very least look into "batching" multiple components per transaction.
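The batching idea can be sketched as follows. This is an illustrative Python example against an in-memory SQLite database, not the actual ingestion code; the table layout, the `persist_in_batches` function, and the batch size of 500 are assumptions chosen for the example.

```python
import sqlite3
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def persist_in_batches(conn, components, batch_size=500):
    """Insert components with one transaction per batch; return the commit count."""
    commits = 0
    for chunk in batched(components, batch_size):
        with conn:  # one transaction per batch instead of one per component
            conn.executemany(
                "INSERT INTO component (name, version) VALUES (?, ?)", chunk
            )
        commits += 1
    return commits


batch_conn = sqlite3.connect(":memory:")
batch_conn.execute("CREATE TABLE component (name TEXT, version TEXT)")

# Same component count as the bloated BOM mentioned above.
bom_components = [(f"lib-{i}", "1.0.0") for i in range(9056)]
commit_count = persist_in_batches(batch_conn, bom_components, batch_size=500)
```

With a batch size of 500, the 9056 components are written in 19 transactions rather than 9056. The right batch size is a trade-off between per-transaction overhead and transaction size, and would need to be determined by profiling.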

The entire import takes about 20 seconds on my laptop.

nscuro · 2023-07-03T10:50:50Z

There are also some bugs in the current logic, mostly triggered when BOMs include duplicate components:

- DependencyTrack hangs when uploading a large SBOM to a project a second time (dependency-track#1905)
- Error while uploading same SBOM Second time (dependency-track#2131)

One of the suggestions made is that converting from CycloneDX to the DT model should be done entirely without database interactions. This would allow for an additional de-duplication step before anything is read from or written to the DB.
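A de-duplication step of this kind could look roughly like the sketch below. It works purely in memory, with no database involved. The `Component` class and the choice of fields that define component identity (group, name, version, PURL) are assumptions for illustration; the thread does not specify which fields Dependency-Track uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical in-memory component; all fields together form its identity."""
    group: str
    name: str
    version: str
    purl: str


def deduplicate(components):
    """Return components with duplicates removed, keeping first-seen order."""
    # Frozen dataclasses are hashable, and dicts preserve insertion order.
    return list(dict.fromkeys(components))


parsed = [
    Component("org.example", "lib-a", "1.0.0", "pkg:maven/org.example/lib-a@1.0.0"),
    Component("org.example", "lib-b", "2.1.0", "pkg:maven/org.example/lib-b@2.1.0"),
    Component("org.example", "lib-a", "1.0.0", "pkg:maven/org.example/lib-a@1.0.0"),
]
unique = deduplicate(parsed)
```

Because the conversion result is plain data, duplicates can be dropped before the first query is issued, which avoids the duplicate-component code paths that trigger the bugs linked above.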

nscuro added the enhancement, size/M, and component/api-server labels Jun 28, 2023

nscuro mentioned this issue Jun 28, 2023

Deletion of (large) projects takes too long #636

Closed

nscuro added the performance and p2 labels Jun 29, 2023

nscuro added this to Hyades Jun 30, 2023

nscuro moved this to Todo in Hyades Jun 30, 2023

nscuro self-assigned this Jul 3, 2023

nscuro moved this from Todo to In Progress in Hyades Jul 3, 2023

nscuro mentioned this issue Jul 5, 2023

Improve BOM processing performance DependencyTrack/hyades-apiserver#218

Merged

2 tasks

nscuro added the "in review" label Jul 6, 2023

nscuro closed this as completed in DependencyTrack/hyades-apiserver#218 Jul 10, 2023

github-project-automation bot moved this from In Progress to Done in Hyades Jul 10, 2023

nscuro removed the "in review" label Jul 17, 2023
