Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SBOMs are not the same on multiple runs of syft #1944

Closed
tomersein opened this issue Jul 17, 2023 · 12 comments
Closed

SBOMs are not the same on multiple runs of syft #1944

tomersein opened this issue Jul 17, 2023 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@tomersein
Copy link
Contributor

What happened:
I did some tests to the convert endpoint in order to check results are the same -
1st output:

  1. I run syft -o json
  2. I run syft convert cyclonedx
    2nd output:
  3. I run syft -o cyclonedx-json

I compared the results and saw a difference in the purl line:

image

What you expected to happen:
same output

Steps to reproduce the issue:
I've attached the .war file to the slack channel since I can't put it here.
here is the link to the slack thread in order to reproduce:
https://anchorecommunity.slack.com/archives/C019BUXV7R6/p1689149422624669

Anything else we need to know?:

Environment:

  • Output of syft version: 0.84.1
  • OS (e.g: cat /etc/os-release or similar): alpine
@tomersein tomersein added the bug Something isn't working label Jul 17, 2023
@spiffcs spiffcs self-assigned this Jul 17, 2023
@spiffcs
Copy link
Contributor

spiffcs commented Jul 17, 2023

Thanks for filing the issue @tomersein - let me try and reproduce on my end and I'll write up what I think could be the issue here

@tomerse-sg
Copy link

hi, were you able to reproduce? @spiffcs

@spiffcs
Copy link
Contributor

spiffcs commented Jul 19, 2023

👋 hi @tomerse-sg - I saw the difference in the PURL that you had mentioned in the output files on Slack, however when building the image locally using the provided casb.war and Dockerfile I only saw the following:

syft version
Application:        syft
Version:            0.85.0
JsonSchemaVersion:  9.0.0
BuildDate:          2023-07-12T17:14:54Z
GitCommit:          4fc17edd146af34ab06f5b0443ef8ddac3aaf076
GitDescription:     [not provided]
Platform:           darwin/amd64
GoVersion:          go1.20.6
Compiler:           gc

Dockerfile downloaded - built 1944:test

FROM alpine:latest

RUN apk update && apk add python3

COPY casb.war . 

Generate json sbom

syft -o json 1944:test > 1944.json

convert to cyclonedx

syft convert ./1944.json -o cyclonedx-json=1944.cdx

Screenshot of same PURL (top cyclonedx bottom original json output)
Screenshot 2023-07-19 at 9 41 20 AM

Was there a reproduction step I might have missed here? I never saw the org.w3c.dom purl value generated

@tomerse-sg
Copy link

weird,
I've tried to follow again my steps and received the same results:
image

Not sure why I see the diff.
In the flow, are they any different between converting the sbom via -o and converting it via the endpoint command?
If I understood correctly they are both build from the syft-json

@spiffcs
Copy link
Contributor

spiffcs commented Jul 19, 2023

In the flow, are they any different between converting the sbom via -o and converting it via the endpoint command?
If I understood correctly they are both build from the syft-json

Can you copy paste the exact syft commands you're referring to that produced the above output? It might help me find the path you're going down =)

Do you see no issue if you use something like syft convert ./1944.json -o cyclonedx-json=1944.cdx?

@tomerse-sg
Copy link

➜ compare-convert syft war -o cyclonedx-json &> cyclone-war.json
➜ compare-convert syft war -o json &> json-war.json
➜ compare-convert syft convert json-war.json -o cyclonedx-json &> convert-cyclone-war.json
➜ compare-convert diff convert-cyclone-war.json cyclone-war.json

@spiffcs
Copy link
Contributor

spiffcs commented Jul 20, 2023

@tomerse-sg - I reproduced this yesterday thanks to your commands - I was pulled into a quick other thing but am now taking a look again at what a fix looks like for this

@tomersein
Copy link
Contributor Author

can you share what is the root cause? just curious since I don't see any differences in the flow

@spiffcs
Copy link
Contributor

spiffcs commented Jul 24, 2023

@tomersein

can you share what is the root cause? just curious since I don't see any differences in the flow

I'm still trying to find the root cause - when I ran the commands on Friday I was able to reproduce the PURL diff for the XML-API package.

The other diffs I'm not concerned about as those are values that will change for newly generated documents.

I tried to run this again this morning through my debugger and am getting equal PURL strings for the commands supplied above.

 just curious since I don't see any differences in the flow

The original difference in our flows was that I was comparing the PURL from json --> cdx to see if it had been mangled during the conversion.

In the flow provided by @tomerse-sg a cyclonedx document was generated by syft as the original comparison. This would lead to exploring if the PURL generation went through different paths when doing convert vs doing the original document.

Are you using syft v0.85.0 for the above - I'm looking through our updates now to see if there is a chance this was "fixed" or modified at all between versions since I saw the original output was for v0.84.1

Screenshot 2023-07-24 at 12 04 17 PM

Update: There is definitely some non deterministic behavior here which is causing the confusion. The good news is that I've been able to reproduce the diff without doing the convert step:

json-war.json ----> pkg:maven/org.xml.sax/xml-apis@1.0.b2
cyclone-war.json ---> pkg:maven/org.w3c.dom/xml-apis@1.0.b2

Different PURL for the same package here across formats is not what we want.

Getting closer to the root cause, it looks like there are java packages (in this case jaxb-core and xml-api) that can sometimes have multiple groupID values. I'm working through right now if we should create separate packages or find a more deterministic way to pick the groupID winner in this case.

Thanks for all the information and for filing the issue! Non deterministic behavior is being seen in v0.85.0 and is not specific to a package

@kzantow
Copy link
Contributor

kzantow commented Jul 26, 2023

It looks like this is either related to, or a duplicate of: #1464

@spiffcs
Copy link
Contributor

spiffcs commented Jul 26, 2023

Quick update on this one - I've fixed the PURL/GroupID association to be deterministic across package generation:
#2033

The final thing being looked at is when we add pomProperties to the package metadata details

For the one of the above cases we've found that it can be matched on one of the following:

META-INF/maven/com.sun.xml.bind/jaxb-core/pom.properties
META-INF/maven/org.glassfish.jaxb/jaxb-core/pom.properties

The bundle license value for the manifest is http://glassfish.java.net/public/CDDL+GPL_1_1.html so it's my current understanding that we should source the pom proeprties from the 2nd path in the above case.

Basically the update here consists of making the archive parser a bit smarter so that rather than selecting at random from multiple glob matches, we're putting the correct properties/manifest together for a single package.

Here is the uniqueness check that's causing packages to be merged:

// the pom artifactId is the parent name
// note: you CANNOT use name-is-subset-of-artifact-id or vice versa --this is too generic. Shaded jars are a good
// example of this: where the package name is "cloudbees-analytics-segment-driver" and a child is "analytics", but
// they do not indicate the same package.
// NOTE: artifactId might not be a good indicator of uniqueness since archives can contain forks with the same name
// from different groups (e.g. "org.glassfish.jaxb.jaxb-core" and "com.sun.xml.bind.jaxb-core")
// we will use this check as a last resort
if metadata.PomProperties != nil {
if metadata.PomProperties.ArtifactID != "" && parentPkg.Name == metadata.PomProperties.ArtifactID {
return true
}
}

@wagoodman wagoodman changed the title SBOMs are not the same if I run syft & convert SBOMs are not the same on multiple runs of syft Jul 27, 2023
@spiffcs
Copy link
Contributor

spiffcs commented Sep 12, 2023

@tomersein as of the latest release of syft we should be deterministically generating the PURL and generating a package for both orgs given your example. If this is not the case let me know and I will reopen the issue.

Related PR:
#2075
#2080
#2033

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
4 participants