Believe this is an issue ... findings change with each scan of same SBOM #2519
Comments
Can you check if there is any error logged in the logfile after uploading (could be several minutes later, as processing the content is asynchronous)? We saw similar stuff in #2131 when re-uploading SBOMs to the same project. We fixed that in 4.7, but if the SBOM processing fails for your SBOM at some point, that could be a reason. |
@charbl Those are Jenkins logs; we would need the Dependency-Track logfiles. Also: after uploading the BOM it takes a while for it to process, and during this time the number of components will change several times as well. Is it constant already when you check it? |
as soon as I can get access, will check logs |
We see this when our job runs: 2023-02-27 18:24:13,172 INFO [BomUploadProcessingTask] Processing CycloneDX BOM uploaded to project: 3d127377-ffd6-4974-90dc-c58407cdc178 |
Any thoughts? |
Any idea? Is this the issue? |
Looks wrong to me. Any idea why "Underlying file changed by an external force"? Do you have two instances running at once? Or is it something in your SBOM? Maybe attach it to this issue? |
I think the underlying issue is #2131, as pointed out by @rkg-mm. Until now there's only a workaround in place that prevents the SBOM processing from failing entirely, but the root cause is still unaddressed. |
So, I redid our test and was watching the logs. I did see the AlreadyClosedException on each BOM upload, and captured the logs as each BOM was uploaded and processed. I will attach the document that has the log entries as each BOM was processed. This is the org (after all the number changing). So, are we calling this a bug? I have a customer that wants answers as to why the numbers are different for the same SBOM, and I am trying to answer them. DT-BASE.txt |
@charbl another shot: Does your BOM contain duplicates, in terms of multiple components with identical PURLs? |
would it be helpful if I uploaded an scrubbed copy of the SBOM? Masked the Application Name and some of the url paths related to local resolution. |
@charbl Yes, as long as the PURLs remain the same. You can remove internal components (but maybe double-check that you can still reproduce it with the scrubbed BOM in the end). |
Ran the test on the scrubbed BOM and got the same results. |
I think this confirms my assumption. For example, look at array-unique@0.2.1: you find it in many variations as bom-refs, while the PURL is always pkg:npm/array-unique@0.2.1?vcs_url=git://github.com/jonschlinkert/array-unique.git

This is due to the way the NPM CycloneDX tool creates those BOMs now: it does not treat the same sub-component within different NPM components as an identical entry, because in theory it "could" not be the same component (a modified one). We had a lot of discussions about whether it should be done this way, but as of now it is.

On first upload, an entry for each of the duplicates is created (you can confirm this by searching for an example component like the one above in the component list; you will find multiple entries). On second upload, Dependency-Track first looks for identical components in the database for the project, to know whether it should add a new one or can re-use an existing one. It will find the existing ones and not create new ones; further, this leads to a cleanup of the data as well, where duplicate components get removed. Not ideal this way, but the result is not wrong, it's just de-duplicated.

Related to:
Background discussions can be found in: |
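The upload-then-de-duplicate behaviour described above can be sketched roughly as follows. This is a simplified Python illustration, not Dependency-Track's actual code; the `ingest` function and the dict shapes are hypothetical, and only PURL matching is modeled.

```python
def ingest(existing, bom_components):
    """Match incoming BOM components against a project's existing components
    by PURL. Components with no match are created as-is, so duplicate PURLs
    survive a first upload into an empty project; on a re-upload, every
    duplicate matches the same existing entry and collapses onto it, which
    de-duplicates the project."""
    existing_by_purl = {}
    for comp in existing:
        existing_by_purl.setdefault(comp["purl"], comp)  # first entry wins

    matched, created = [], []
    for comp in bom_components:
        match = existing_by_purl.get(comp["purl"])
        if match is None:
            created.append(comp)           # no match: create a new component
        elif match not in matched:
            matched.append(match)          # match: re-use the existing entry
    return matched + created
```

With a BOM containing the same PURL under two different bom-refs, a first upload yields two components and a second upload of the same BOM yields one, mirroring the changing counts reported in this issue.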
so best practice for now might be to load the BOM twice at first to get more accurate findings? I have no problem with this as "this is how it's done for now" and just want to ensure this is passed to the applications we support. |
@charbl Probably. I don't know a better way for pure NPM projects as of now. But I am not 100% confident the duplicates are completely cleaned up after 2nd upload already, maybe needs confirmation. |
10-4 .... will try recommendations and re-run to come up with best practices for where we are. thanx |
There is an additional complication in the BOM provided by @charbl:
<component type="library" bom-ref="xmlbuilder2@3.0.2">
<author>Ozgur Ozcitak &lt;oozcitak@gmail.com&gt;</author>
<name>xmlbuilder2</name>
<version>3.0.2</version>
<description>An XML builder for node.js</description>
<hashes>
<hash alg="SHA-512">8783146b0198db5093761578c66dc31bd760b2aca10e466f549071f3c6dea97f3026cdd58321908009f956b787b9a7baba74d0c61d75e5a30ae489c25663882b</hash>
</hashes>
<licenses>
<license>
<id>MIT</id>
</license>
</licenses>
<purl>pkg:npm/xmlbuilder2@3.0.2</purl>
<externalReferences>
<reference type="vcs">
<url>git://github.com/oozcitak/xmlbuilder2.git</url>
<comment>as detected from PackageJson property "repository.url"</comment>
</reference>
<reference type="website">
<url>http://github.com/oozcitak/xmlbuilder2</url>
<comment>as detected from PackageJson property "homepage"</comment>
</reference>
<reference type="issue-tracker">
<url>http://github.com/oozcitak/xmlbuilder2/issues</url>
<comment>as detected from PackageJson property "bugs.url"</comment>
</reference>
<reference type="distribution">
<url>https://registry.npmjs.org/xmlbuilder2/-/xmlbuilder2-3.0.2.tgz</url>
<comment>as detected from npm-ls property "resolved"</comment>
</reference>
</externalReferences>
<properties>
<property name="cdx:npm:package:path">node_modules/xmlbuilder2</property>
</properties>
</component>
<component type="library" bom-ref="xmlbuilder2@3.0.2">
<author>Ozgur Ozcitak &lt;oozcitak@gmail.com&gt;</author>
<name>xmlbuilder2</name>
<version>3.0.2</version>
<description>An XML builder for node.js</description>
<licenses>
<license>
<id>MIT</id>
</license>
</licenses>
<purl>pkg:npm/xmlbuilder2@3.0.2?vcs_url=git://github.com/oozcitak/xmlbuilder2.git</purl>
<externalReferences>
<reference type="vcs">
<url>git://github.com/oozcitak/xmlbuilder2.git</url>
<comment>as detected from PackageJson property "repository.url"</comment>
</reference>
<reference type="website">
<url>http://github.com/oozcitak/xmlbuilder2</url>
<comment>as detected from PackageJson property "homepage"</comment>
</reference>
<reference type="issue-tracker">
<url>http://github.com/oozcitak/xmlbuilder2/issues</url>
<comment>as detected from PackageJson property "bugs.url"</comment>
</reference>
</externalReferences>
<components>
<component type="library" bom-ref="xmlbuilder2@3.0.2|js-yaml@3.14.0">
<author>Vladimir Zapparov &lt;dervus.grim@gmail.com&gt;</author>
<name>js-yaml</name>
<version>3.14.0</version>
<description>YAML 1.2 parser and serializer</description>
<licenses>
<license>
<id>MIT</id>
</license>
</licenses>
<purl>pkg:npm/js-yaml@3.14.0</purl>
<externalReferences>
<reference type="vcs">
<url>nodeca/js-yaml</url>
<comment>as detected from PackageJson property "repository"</comment>
</reference>
<reference type="website">
<url>https://github.com/nodeca/js-yaml</url>
<comment>as detected from PackageJson property "homepage"</comment>
</reference>
<reference type="distribution">
<url>https://registry.npmjs.org/js-yaml/-/js-yaml-3.14.0.tgz</url>
<comment>as detected from npm-ls property "resolved"</comment>
</reference>
</externalReferences>
<properties>
<property name="cdx:npm:package:path">node_modules/xmlbuilder2/node_modules/js-yaml</property>
</properties>
</component>
</components>
<properties>
<property name="cdx:npm:package:path">node_modules/xmlbuilder2</property>
</properties>
</component>

In both cases, there's also an … Given this (#2131 (comment)) query performed during BOM processing:

project == :project
  && ((purl != null && purl == :purl)
    || (purlCoordinates != null && purlCoordinates == :purlCoordinates)
    || (swidTagId != null && swidTagId == :swidTagId)
    || (cpe != null && cpe == :cpe)
    || (group == :group && name == :name && version == :version))

only ever one of the two occurrences would match after the first BOM upload, because the purlCoordinates of both entries are identical (pkg:npm/xmlbuilder2@3.0.2).

There's a trade-off here: if we want accurate matching, the correlation query above needs to be stricter, more along the lines of:

project == :project
  && ((purl != null && purl == :purl)
    && (group == :group && name == :name && version == :version))

In other words, all identities must match, not only individual parts. However, this will make "updating" of existing components with new identities impossible. If this component already exists:

{
  "name": "foo",
  "version": "1.2.3"
}

then uploading the following component:

{
  "name": "foo",
  "version": "1.2.3",
  "purl": "pkg:maven/acme/foo@1.2.3"
}

will cause a new component to be created (and the old one to be deleted), instead of merging them. IMO, stricter matching is the more desirable option. Merging incomplete identities makes the whole process hard to reason about, and the state of the project is not really deterministic. |
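The difference between the two matching strategies discussed above can be illustrated with a small Python sketch. This is not the actual JDOQL query, just its OR- versus AND-semantics; the helper names and dict shapes are hypothetical.

```python
IDENTITY_FIELDS = ("purl", "cpe", "swidTagId")


def matches_lax(existing, candidate):
    """Lax matching: any single non-null identity field in common, or
    matching group/name/version coordinates, is enough (OR semantics)."""
    if any(existing.get(f) is not None and existing.get(f) == candidate.get(f)
           for f in IDENTITY_FIELDS):
        return True
    return all(existing.get(f) == candidate.get(f)
               for f in ("group", "name", "version"))


def matches_strict(existing, candidate):
    """Strict matching: every identity field must agree (AND semantics)."""
    return all(existing.get(f) == candidate.get(f)
               for f in IDENTITY_FIELDS + ("group", "name", "version"))


old = {"name": "foo", "version": "1.2.3"}
new = {"name": "foo", "version": "1.2.3", "purl": "pkg:maven/acme/foo@1.2.3"}
```

Under lax matching, `old` and `new` match (so the upload merges them); under strict matching they do not (so a new component is created and the old one deleted), which is exactly the trade-off described above.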
Thinking about this more, I believe there are certain cases where one might want component identity matching to be "lax", e.g. when searching for components in the portfolio, and others where matching should be "strict", e.g. when importing BOMs. Any thoughts? |
Import & Tree: Search: |
To address DependencyTrack/dependency-track#2519 (comment). Also add regression test for this specific issue. Signed-off-by: nscuro <nscuro@protonmail.com>
* Add bloated BOM for ingestion performance testing
* Prevent query compilation cache being bypassed for `matchSingleIdentity` queries (see DependencyTrack/dependency-track#2540). This also cleans the query from containing weird statements like `(cpe != null && cpe == null)` in case a component does not have a CPE.
* WIP: Improve BOM processing performance
* Handle dependency graph
* Improve dependency graph assembly: instead of using individual bulk UPDATE queries, use setters on persistent components. This way we can again make use of batched flushing.
* Completely replace old processing logic. Also decompose the large processing method into multiple smaller ones, and re-implement notifications.
* Fix not all BOM refs being updated with new component identities
* Be smarter about indexing component identities and BOM refs; also add more documentation
* Reduce logging noise
* Mark new components as such via a new transient field. Required for compatibility with #217
* Compatibility with #217
* Cleanup tests
* Reduce code duplication
* Cleanup; process services
* Finishing touches 🪄
* Make flush threshold configurable. The optimal value could depend on how beefy the database server is, and how much memory is available to the API server.
* Clarify `warn` log when rolling back active transactions
* Log number of consumed components and services before and after de-dupe
* Extend BOM processing test with bloated BOM
* Make component identity matching strict, to address DependencyTrack/dependency-track#2519 (comment). Also add a regression test for this specific issue.
* Add regression test for DependencyTrack/dependency-track#1905
* Clarify why "reachability on commit" is disabled; add assertion for persistent object state
* Add tests for `equals` and `hashCode` of `ComponentIdentity`
* Address review comments

Signed-off-by: nscuro <nscuro@protonmail.com>
@nscuro Has this been addressed per DependencyTrack/hyades-apiserver#218? And would this be a potential fix for 4.10? |
@melba-lopez Yes, this should be fixed in the PR you linked (we have a regression test specifically for this issue). We're planning to backport it (and multiple other improvements we made for Hyades) to DT. |
Signed-off-by: mehab <meha.bhargava@citi.com>
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Current Behavior
So, we have a program using our Jenkins. They produce an SBOM using CycloneDX integrated into their build process. Actually, it's three SBOMs, which they then merge into one SBOM with the CycloneDX CLI. Long story short: they were reporting the results into a parent project in Dependency-Track, and we asked them to report the results into a child project instead. They reran the job, pointed it at the child project, and the rerun created an SBOM that was submitted to the child project. The child project reported different results (components and vulnerabilities), which raised the question: why? We ran the Jenkins job again (publishing to the child project) and the numbers changed again; they started to align closer to the parent project numbers (we still have not removed the parent analysis). We started to question the SBOMs generated from the two different builds, so we pulled them and did a compare: identical, aside from the metadata, which had different serial numbers.
So, to really test this out, we created a dummy Jenkins job that just reports the same SBOM to two different test projects in Dependency-Track (two projects under the parent project). We ran the test for project one, sending the SBOM over, and looked in Dependency-Track. Project one was now visible, and we compared the reported numbers to the original project ...
This is what we expected:
Components - 1749
Vulnerabilities: Crit: 11, High: 33, Med: 28
Risk Score: 364
This is what we got on first submission:
Components - 2498
Vulnerabilities: Crit: 23, High: 48, Med: 37
Risk Score: 725
Now this was kind of shocking; we poked around and double-checked the settings and the SBOM diff.
We decided to run our test job and report to test job two .... here are the results we got:
This is what we got on first submission:
Components - 2312
Vulnerabilities: Crit: 28, High: 43, Med: 35
Risk Score: 745
Again, shocking that these two were different. And again, we double-checked everything ... same conclusion ... config looks good.
We then decided to run our two jobs and re-report the same SBOM to their DependencyTrack projects ... that already had data!
Both runs now reported new numbers for the second submission:
Job 1
Components - 1764
Vulnerabilities: Crit: 12, High: 35, Med: 29
Risk Score: 387
Job 2
Components - 1754
Vulnerabilities: Crit: 11, High: 33, Med: 28
Risk Score: 364
They pulled closer to the expected numbers but are still different ... I can't explain it. Same SBOM ...
Thoughts? We have read all the documentation; are we doing something wrong?
Steps to Reproduce
Expected Behavior
See above
Dependency-Track Version
4.7.1
Dependency-Track Distribution
Container Image
Database Server
PostgreSQL
Database Server Version
RDS 13.7
Browser
Microsoft Edge