Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and speed up VulnerabilityMetricsUpdateTask #2481

Merged
merged 9 commits into from Feb 20, 2023

Conversation

valentijnscholten
Copy link
Contributor

Description

This PR simplifies the VulnerabilityMetricsUpdateTask by letting the database handling the counting. The runtime is reduced from ~50 seconds to about 4 seconds.

Addressed Issue

I noticed the VulnerabilityMetricsUpdateTask was taking around 50 seconds to complete on my instance. The only thing that task does is to count the vulnerabilities that were published each year/month. The old code was loading all ~300k vulnerabilities into memory, in batches of 500. So this meant around 600 queries and 300000 object instantiations and other Datanucleus "overhead".

The PR now performs 2 queries to get all the counts from the database. Then it does some juggling to match up the published and created fields. And then the old metrics are replaced by the new ones.

Additional Details

During the refactoring I found out this data is not even used by the DT front-end, so it might not be on top of the priority list. But it is exposed through the API. Either way, I had fun optimizing it. First time doing something with Datanucleus as well as Java Streams.

Question: The old code had some logic in place to keep the old database ID values in place for each Year-Month combination. This PR just removes old metrics and creates new ones to keep it simple. Not sure if there are any clients/users that expect the ID to be constant? I hope they look at the Year-Month combination...

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
@valentijnscholten valentijnscholten changed the title Metrics opt vulns Simplify and speed up VulnerabilityMetricsUpdateTask Feb 10, 2023
Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Feb 11, 2023

Test failure:
Looks like DN is generating the wrong SQL for H2, even though H2 should be supported as well as the getYear() and getMonth()methods:

"javax.jdo.JDOException: Exception thrown when executing query : SELECT YEAR(A0.PUBLISHED) AS YEAR,((MONTH(A0.PUBLISHED) - 1) + 1) AS MONTH,COUNT(A0.ID) AS COUNT FROM VULNERABILITY A0 WHERE A0.PUBLISHED IS NOT NULL AND A0.CREATED IS NULL GROUP BY YEAR(A0.PUBLISHED),((MONTH(A0.PUBLISHED) - 1) + 1)
NestedThrowables:
org.h2.jdbc.JdbcSQLSyntaxErrorException: Syntax error in SQL statement "SELECT YEAR(A0.PUBLISHED) AS [*]YEAR,((MONTH(A0.PUBLISHED) - 1) + 1) AS MONTH,COUNT(A0.ID) AS COUNT FROM VULNERABILITY A0 WHERE A0.PUBLISHED IS NOT NULL AND A0.CREATED IS NULL GROUP BY YEAR(A0.PUBLISHED),((MONTH(A0.PUBLISHED) - 1) + 1)"; expected "identifier"; SQL statement:
SELECT YEAR(A0.PUBLISHED) AS YEAR,((MONTH(A0.PUBLISHED) - 1) + 1) AS MONTH,COUNT(A0.ID) AS COUNT FROM VULNERABILITY A0 WHERE A0.PUBLISHED IS NOT NULL AND A0.CREATED IS NULL GROUP BY YEAR(A0.PUBLISHED),((MONTH(A0.PUBLISHED) - 1) + 1) [42001-214]"

@nscuro
Copy link
Member

nscuro commented Feb 11, 2023

@valentijnscholten I think H2 is stumbling here because MONTH and YEAR are reserved function names, and you use those names for variables (AS MONTH).

@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Feb 11, 2023

Yeah I'm testing a fix and adding some tests. At first it looked like wrong SQL as the "official" way is not YEAR but EXTRACT ...

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great optimization, thanks @valentijnscholten!

I made some suggestions on how the code could be simplified even more, but those are really just of cosmetic nature.

Question: The old code had some logic in place to keep the old database ID values in place for each Year-Month combination. This PR just removes old metrics and creates new ones to keep it simple. Not sure if there are any clients/users that expect the ID to be constant? I hope they look at the Year-Month combination...

To be honest I don't think there was a specific reason for keeping it, other than perhaps the assumption that deleting and inserting it all would be more expensive that updating.

IDs are never exposed to clients, so this change should not matter to users.

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
…s fetch group

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @valentijnscholten!

Any more pending changes you were planning to add, or is this ready to merge from your side?

@valentijnscholten
Copy link
Contributor Author

I think it's good to go.

@nscuro nscuro added this to the 4.8 milestone Feb 20, 2023
@nscuro nscuro merged commit 240ff8d into DependencyTrack:master Feb 20, 2023
6 checks passed
stephan-wolf-ais pushed a commit to AISAutomation/dependency-track that referenced this pull request Mar 1, 2023
…#2481)

* Simplify and speedup VulnerabilityMetricsUpdateTask

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask with safe aliases

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask suggested changes

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask suggested changes

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask remove superfluous fetch group

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

* Simplify and speedup VulnerabilityMetricsUpdateTask remove casts

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>

---------

Signed-off-by: Valentijn Scholten <valentijnscholten@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants