-
Notifications
You must be signed in to change notification settings - Fork 576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pacman (alpm) parser support #943
Conversation
@Foxboron nice addition! We've been wanting to add this for a while now 🙌 .
If getting the mtree parsing implemented is difficult, the simple file list would be awesome to get in as a first pass. As for when to/not-to pull in a go dependency, our rule of thumbs are a) as long as it's not too big (subjective hand wave... it's important to check to see what the impacts here are have a conversation) , b) there is no shelling out for core functionality, and c) must not require CGO. From a cursory check, it looks like
Typically this is a "it depends", in this case I think it makes sense to do some filtering (short answer is "yes", include directories). Let me drop an example in: kmod file listing...``` [root@ba0c9a920794 kmod-29-2]# cat files %FILES% etc/ etc/depmod.d/ etc/modprobe.d/ usr/ usr/bin/ usr/bin/depmod usr/bin/insmod usr/bin/kmod usr/bin/lsmod usr/bin/modinfo usr/bin/modprobe usr/bin/rmmod usr/include/ usr/include/libkmod.h usr/lib/ usr/lib/depmod.d/ usr/lib/depmod.d/search.conf usr/lib/libkmod.so usr/lib/libkmod.so.2 usr/lib/libkmod.so.2.3.7 usr/lib/modprobe.d/ usr/lib/pkgconfig/ usr/lib/pkgconfig/libkmod.pc usr/share/ usr/share/bash-completion/ usr/share/bash-completion/completions/ usr/share/bash-completion/completions/kmod usr/share/libalpm/ usr/share/libalpm/hooks/ usr/share/libalpm/hooks/60-depmod.hook usr/share/libalpm/scripts/ usr/share/libalpm/scripts/depmod usr/share/man/ usr/share/man/man5/ usr/share/man/man5/depmod.d.5.gz usr/share/man/man5/modprobe.d.5.gz usr/share/man/man5/modules.dep.5.gz usr/share/man/man5/modules.dep.bin.5.gz usr/share/man/man8/ usr/share/man/man8/depmod.8.gz usr/share/man/man8/insmod.8.gz usr/share/man/man8/kmod.8.gz usr/share/man/man8/lsmod.8.gz usr/share/man/man8/modinfo.8.gz usr/share/man/man8/modprobe.8.gz usr/share/man/man8/rmmod.8.gz ```It seems that the owned files/directories are really the leaves of the filetree. In this case that would mean the final file / directory list should be:
Additionally, we also filter for package manager files claims against what is actually exist in the image/directory -- we drop files/directories that we can't find evidence of. (see an example in the RPMDB parser: syft/syft/pkg/cataloger/rpmdb/parse_rpmdb.go Lines 98 to 99 in 1c2254f
A good edge case, probably not. Shout out if you have any additional questions or want to pair on the PR at all 🙌 |
Hm, actually. Arch heavily uses The question could be framed if syft intends to index the installed package as it's installed or just record the known metadata of the installed package?
So just to be clear the file should be |
I misunderstood! Thanks for reasking/rephrasing -- yes, my preference would be to include
The package catalogers are really meant to raise up raw information from packaging information and we specifically don't validate or check these claims within the package cataloger itself (such as verifying digests or getting file metadata information). Instead we have separate file catalogers where their only job is to collect this kind of information for all files within the image/directory (e.g. file digests cataloger and the file metadata cataloger... note: neither of these are switched on by default, but can be switched on via application configuration. We're thinking about potentially turning these on by default in the future ). So the short answer is we shouldn't use |
EDIT: New revisions implements a full Cataloger like how the rpmdb does. More flexibility in the future |
Reimplemented the catalog, should be better now. I haven't quite figured out how to filter the file listing in a good way. But could maybe be deferred to another pr? Not sure what the file listing if used for currently. The relationships is also not implemented so just some cleanup left probably. EDIT: Well, all the tests are missing as well :) |
I have not filtered the file listing as we discussed. I thought about it but didn't see a simple way to implement this, and the same problem exists in the apk, rpm and deb parser so I don't think it's super important. Thus I think the PR is ready for review :) |
Almost forgot, this also includes an extension to the os-release parser to include
|
Fixes anchore#241 Signed-off-by: Morten Linderud <morten@linderud.pw>
Signed-off-by: Morten Linderud <morten@linderud.pw>
Signed-off-by: Morten Linderud <morten@linderud.pw>
This includes 4 new fields from the os-release specification. The important one is BuildID as some rolling release distros doesn't include a VersionID. This also includes another version fallback on BuildID which is present on Arch Linux. Signed-off-by: Morten Linderud <morten@linderud.pw>
We refactor the parser functions for easier testing. This allows us to compare the ALPM metadata struct instead of only having the Package struct returned. Implement file database into the standard parser. There is strictly no need to have two parser functions. There are only a few different fields in each database and they can both just have the entire package struct returned. Internally we never really care about the file database, but we need the backup array. Signed-off-by: Morten Linderud <morten@linderud.pw>
I have a few changes locally for this, been writing up the purl spec for ALPM/Pacman and partially looking at a grype implementation (neat practical test of Is there anything I should be doing to get attention on this pull-request or is it just currently a lack of time? |
Hey @Foxboron! Sorry for the delay in getting to this PR. Do you mind if I check out your branch and make a few commits to help get this over the line and cleaned it up for the CI checks? While doing that I'll use that time to get familiar with the changes. Thanks again for all the help and work you've already done on this PR. |
Signed-off-by: Morten Linderud <morten@linderud.pw>
Yooo @spiffcs Sure, that is fine with me. Feel free to ask if you have any questions :) |
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
@Foxboron sorry for the delay here. I was out sick for the past week+. I made some updates that fixed the static analysis and am doing the unit/integration tests now. After all the status checks are passed I'll provide PR comments and we'll get this moving in the right direction. Thanks again for the contribution here! Can't wait to see it in syft. |
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
No worries :) Thanks for working on this! If you need to understand anything alpm and/or Arch specific about the patch set feel free to reach out and I'll do my best to explain how everything works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass - I still need to touch the parse_alpm_db.go
and brush up on the specification here to give the best review I can. The tests make it super easy to read BTW so thanks for that.
The other changes I'm not so sure about is the url.go
change. with the qualifiers.
Why is the update there to change it so we're appending multiple separate fields of the release?
Thanks again so much for getting the feature underway here!
} | ||
|
||
// PackageURL returns the PURL for the specific Arch Linux package (see https://github.com/package-url/purl-spec) | ||
func (m AlpmMetadata) PackageURL(distro *linux.Release) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just was trying to update the integration tests here and found a panic where distro
was nil, but ID
was being accessed. I'm not sure if it's a case where we have to update the image we're testing, but it got me wondering if we should protect here against distro detection failing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unsure if this is a case that could happen in the wild? We can probably guard it but to me it sounds like passing a nil distro
variable would should just be a hard error somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking with @wagoodman yesterday and I think he has seen cases where images we have pulled in do not have any detectable distro. Let me see if I can update the release detection code here so that if that's the case we don't return a nil pointer but instead an empty release:
https://github.com/anchore/syft/blob/706f2916791acd8b7355126c9b2b166d0dfcf543/syft/linux/identify_release.go
Opinion: we might want to still try to get an SBOM in cases where this release struct cannot be generated instead of a hard error, but I defer to @wagoodman there.
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great; I also ran it on archlinux:latest
and it worked as expected 👍
Version string `mapstructure:"version" json:"version"` | ||
Description string `mapstructure:"desc" json:"description"` | ||
Architecture string `mapstructure:"arch" json:"architecture"` | ||
Size int `mapstructure:"size" json:"size" cyclonedx:"size"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there only one cyclonedx
tag here? I don't see where this metadata type is outputting these values elsewhere in CDX
} | ||
|
||
// nolint:funlen | ||
func parseDatabase(b *bufio.Scanner) (*pkg.AlpmMetadata, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to have a comment with an example entry to more easily understand what's being parsed
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
} | ||
|
||
return packageurl.NewPackageURL( | ||
"alpm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
under https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst there is no pURL type for alpm
(which is fine, this is pretty common), however, under "other candidate types to define" there is "arch
for Arch Linux packages".
I think I like alpm
better, but I wanted to call out that we should probably be defaulting to the candidate arch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wagoodman do you think we should just change this over to "arch" in anticipation of the definition being added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's my vote, but don't feel too strongly. I sense that alpm
is a more accurate name and falls more in line with the same ecosystem decision points made for the existing purl types (e.g. there is maven
not java
), however, my default is leaning towards the initial indications in the purl spec (though it is pretty gray).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested namespace of arch
is incorrect as arch
isn't a package type. alpm
is the framework and pacman
is one of multiple implementations of a frontend.
I believe alpine
used in syft
suffers from the same issue above.
Thus I proposed both the alpm
type and the apk
type to the purl-spec
repositories.
I'm also partially against having a fallback to arch
as any poorly made derivative of Arch should not be tagged as arch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ That's enough for me to be convinced - I'm keeping this as ALPM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the discussion, I'm a-ok with alpm
😎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huge addition @Foxboron -- thanks for all the hard work!
@spiffcs I can pull the branch and just triple check the changes over the weekend. I can also do the remaining comments :) |
Hey @Foxboron! Just let me know if the digest portion has to change. I don't see anything wrong with the current approach (including both backup ( |
Signed-off-by: Morten Linderud <morten@linderud.pw>
@spiffcs I've reviewed the change and realized that you do want the slice for the digests. So that change is fine with me :) Included the requested permanent URL to the APK catalog code i borrowed as well. Tested the PR locally and the output looks great as well. All should be fine on my end. Thank you :)! |
🥳 merging after CI |
Brilliant! Great to see this feature added to Syft :) |
* main: (70 commits) fix: add php catalogers to all catalogers (anchore#1065) feat: add use-all-catalogers flag (anchore#1050) Updates parsing of `yarn.lock` to use `resolved` URLs that are pulled from yarn and npm registries (anchore#926) remove OSS Meetup message (anchore#1057) add pom.xml cataloger (anchore#1055) Add support for CBL-Mariner distroless images (anchore#1045) Add catalogers configuration (anchore#1038) add template output (anchore#1051) update stereoscope to latest version (anchore#1052) update zip_read_closer to incorporate zip64 support (anchore#1041) Add pacman (alpm) parser support (anchore#943) Update of README.md (anchore#1027) bump cosign to v1.9.0 to resolve reporting of GHSA-66x3-6cw3-v5gj (anchore#1025) add workflows to test new project automation (anchore#1023) improve LanguageByName and add unit tests (anchore#1034) Read Description from dpkg status files (anchore#996) Add announcement for Anchore OSS Virtual Meetup (anchore#1033) add main module field to go bin metadata (anchore#1026) Add filters to package cataloger (anchore#1021) change draft to false for release process (anchore#1016) ... Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Co-authored-by: Christopher Phillips <christopher.phillips@anchore.com>
Co-authored-by: Christopher Phillips <christopher.phillips@anchore.com>
This is a WIP parser for alpm/pacman packages.
alpm also doesn't list any id/gid nor permissions in the files database. It's purely a file list. There is an
mtree
file that contains all this information, but we need a parser for that. Would something like this be fine? https://github.com/vbatts/go-mtreeOpened as a draft as this is just a quick hack on my end. Would be nice to have an ack/nack if I'm on the right path with this implementation.
Example artifacts
Fixes #241