Profile and optimize BOM ingestion performance #635

Closed
nscuro opened this issue Jun 28, 2023 · 3 comments · Fixed by DependencyTrack/hyades-apiserver#218
Assignees
Labels
component/api-server enhancement New feature or request p2 Non-critical bugs, and features that help organizations to identify and reduce risk performance size/M Medium effort

Comments

@nscuro
Member

nscuro commented Jun 28, 2023

We found that ingestion of large BOMs (>10k components) can take a long time, especially when several of them are uploaded concurrently.

We have not touched the ingestion code so far, so there is likely still plenty of headroom for performance optimizations.

  • Profile ingestion with large BOMs and identify potential bottlenecks, redundant queries, etc.
  • Implement performance optimizations where practical
  • Bonus: Look into how feasible it is to make the ingestion process atomic
    • If ingestion fails halfway through, we should not leave an incomplete state behind
    • Complication: A single, large DB transaction will cripple performance
@nscuro nscuro added enhancement New feature or request size/M Medium effort component/api-server labels Jun 28, 2023
@nscuro nscuro added performance p2 Non-critical bugs, and features that help organizations to identify and reduce risk labels Jun 29, 2023
@nscuro
Member Author

nscuro commented Jun 30, 2023

Transaction size is a big concern, but the approach may still be worth a PoC, which should be tested with very large BOMs.

Relying on database transactions may become feasible again if the ingestion process itself becomes less heavy. In its current state, it might be problematic.

Another way of emulating transactional behavior is to perform compensating actions in the application instead of in the database. A simple example: delete all previously created components if processing fails halfway through (this only works for new projects).
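
A minimal sketch of the compensation idea, assuming hypothetical `persist` and `delete` callbacks stand in for the actual database operations (the class and method names here are illustrative, not part of the existing code). It tracks what was created and undoes it, newest first, if a later item fails:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CompensatingIngestSketch {

    /**
     * Persists each item via the (hypothetical) persist callback. If any call
     * throws, the delete callback is invoked for everything created so far,
     * newest first, and the original exception is rethrown.
     */
    static <T> void ingestOrCompensate(List<T> items, Consumer<T> persist, Consumer<T> delete) {
        List<T> created = new ArrayList<>();
        try {
            for (T item : items) {
                persist.accept(item);
                created.add(item);
            }
        } catch (RuntimeException e) {
            for (int i = created.size() - 1; i >= 0; i--) {
                try {
                    delete.accept(created.get(i));
                } catch (RuntimeException cleanupFailure) {
                    // Keep cleaning up; record the failure on the original exception.
                    e.addSuppressed(cleanupFailure);
                }
            }
            throw e;
        }
    }
}
```

Note that this only restores a clean state when nothing existed before the import, which matches the "new projects only" limitation above; updates to existing components would need their previous values to be restored instead.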

Once we have a PoC, it should be tested for a longer period of time (multiple days) in a test environment.

@nscuro nscuro added this to Hyades Jun 30, 2023
@nscuro nscuro moved this to Todo in Hyades Jun 30, 2023
@nscuro nscuro self-assigned this Jul 3, 2023
@nscuro nscuro moved this from Todo to In Progress in Hyades Jul 3, 2023
@nscuro
Member Author

nscuro commented Jul 3, 2023

Flame graph when importing bloated.bom.json into the in-memory H2 database.

[Image: flame graph of the bloated.bom.json import]

The majority of the time is spent persisting components, and a good chunk of that time is spent checking the L1 cache (java.util.HashSet.<init> above TransactionImpl.internalPreCommit).

At the moment, each component is inserted or updated in its own transaction. For the bloated BOM with 9056 components, this results in at least 9056 transactions. If running the entire import in a single transaction is not feasible, we could at the very least look into batching multiple components per transaction.
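
A rough sketch of what batching could look like. The `persistBatch` callback is an assumption: it stands in for "begin one transaction, persist every element of the batch, commit", and the class name and batch size are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedPersistSketch {

    /** Splits items into consecutive sublists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to persistBatch, which is assumed to wrap the whole
     * batch in a single transaction. Returns the number of batches, i.e. the
     * number of transactions that would be used.
     */
    static <T> int persistInBatches(List<T> items, int batchSize, Consumer<List<T>> persistBatch) {
        int transactions = 0;
        for (List<T> batch : partition(items, batchSize)) {
            persistBatch.accept(batch);
            transactions++;
        }
        return transactions;
    }
}
```

With an arbitrarily chosen batch size of 500, the 9056 components of the bloated BOM would be persisted in 19 transactions instead of at least 9056. A failure would then only roll back the current batch, so this does not make the import atomic on its own.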

The entire import takes about 20 seconds on my laptop.

@nscuro
Member Author

nscuro commented Jul 3, 2023

There are also some bugs in the current logic, mostly triggered when BOMs include duplicate components:

One suggestion is that the conversion from CycloneDX to the DT model should be done entirely without database interactions. This would allow for an additional de-duplication step before anything is read from or written to the DB.
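
A minimal sketch of such an in-memory de-duplication step. What makes two components "the same" is left to a caller-supplied `identity` function, which is an assumption here; the real identity rules and the class name are not taken from the existing code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class DedupSketch {

    /**
     * Keeps the first occurrence of each identity and preserves the order in
     * which components appeared in the BOM. No database access is involved.
     */
    static <T, K> List<T> deduplicate(List<T> components, Function<T, K> identity) {
        Map<K, T> unique = new LinkedHashMap<>();
        for (T component : components) {
            unique.putIfAbsent(identity.apply(component), component);
        }
        return new ArrayList<>(unique.values());
    }
}
```

Because this runs on already converted model objects, only the de-duplicated list would need to be matched against existing rows and persisted.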

@nscuro nscuro added the in review Implementation is complete and currently in review label Jul 6, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Hyades Jul 10, 2023
@nscuro nscuro removed the in review Implementation is complete and currently in review label Jul 17, 2023