Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser for dotnet_portable_executable using wrong attribute name. #2029

Closed
Roxedus opened this issue Aug 14, 2023 · 9 comments · Fixed by #2133
Closed

Parser for dotnet_portable_executable using wrong attribute name. #2029

Roxedus opened this issue Aug 14, 2023 · 9 comments · Fixed by #2133
Assignees
Labels
bug Something isn't working

Comments

@Roxedus
Copy link

Roxedus commented Aug 14, 2023

What happened:

As part of v0.86.0 the release, the dotnet portable executable parser were introduced, the implementation does seem to be somewhat flawed, as it uses the dontnet projects FileDescription attribute as the name for the item. This does not seem like the best attribute to use, as both ProductName and InternalName fits better.

This results in scans reporting "names" that seems to require immediate action to remove.

What you expected to happen:

I would expect any project that depends on cli-prompt as PHP Composer dependency to have Hidden Input or hiddeninput as a item, rather than Reads from stdin without leaking info to the terminal and outputs back to stdout

Steps to reproduce the issue:

Scan a image with said Composer dependency:

docker run --rm \
  ghcr.io/anchore/syft:v0.86.0 \
    ghcr.io/linuxserver/grav:1.7.42.3-ls124

Which results in the offending result as:

NAME                                                                              VERSION                 TYPE         
Reads from stdin without leaking info to the terminal and outputs back to stdout  1, 0, 0, 0              dotnet

Anything else we need to know?:

Assumed line causing issues: parse_dotnet_portable_executable.go#L43

Published SBOM from scan: package_versions.txt

Tracked-down project with the description: hidden-input

@Roxedus Roxedus added the bug Something isn't working label Aug 14, 2023
@SuglBug2k
Copy link

SuglBug2k commented Aug 16, 2023

I think i have a similar problem with the prometheus-net library. The published dll has close to no meta data.
Despite the missing metadata the output in v.0.85.0 is fine:
musl-utils 1.2.4-r1 apk
prometheus-net 8.0.0 dotnet
prometheus-net.AspNetCore 8.0.0 dotnet
scanelf 1.3.7-r1 apk

and in v.0.86.0 the name is empty what cases errors in my further processing:
NAME VERSION TYPE
8.0.0 dotnet
BouncyCastle.NET Cryptography (net6.0) 2.2.1.47552 dotnet
Dapper 2.0.143.55328 dotnet

@kzantow
Copy link
Contributor

kzantow commented Aug 16, 2023

@SuglBug2k are you able to provide an image or dockerfile that could be used to get some of these dotnet artifacts exhibiting this behavior?

@mytracks
Copy link

I have exactly the same problem. In my case the problematic assemblies are Prometheus.NetStandard.dll and Prometheus.AspNetCore.dll. They are part of this NUGET https://www.nuget.org/packages/prometheus-net.AspNetCore/8.0.1

@nemchik
Copy link

nemchik commented Aug 17, 2023

In case you need specific links for the OP

The dockerfile
https://github.com/linuxserver/docker-grav/blob/1.7.42.3-ls124/Dockerfile
Example build command
https://github.com/linuxserver/docker-grav/blob/1.7.42.3-ls124/Jenkinsfile#L481-L495
The command we're using to run syft
https://github.com/linuxserver/docker-grav/blob/1.7.42.3-ls124/Jenkinsfile#L586-L590
The syft output
https://github.com/linuxserver/docker-grav/blob/1.7.42.3-ls124/package_versions.txt

The commit where we noticed the difference
linuxserver/docker-grav@dc14e83
This made most lines in the file appear "changed" because the column width changed

@kzantow kzantow self-assigned this Aug 21, 2023
@selzoc
Copy link
Contributor

selzoc commented Aug 25, 2023

I have seen something related when scanning a Windows Server 2019 system (not an image). The code seems to rely on the fact that a FileDescription and FileVersion is either returned or blank from the pe module. However, I have encountered ~600 examples on a running Windows system that are missing either or both of those values. According to the Microsoft docs, the returned value may be null.

I'm not sure of the internals of the pe module, but the behavior for missing attributes seems to be that it returns " ", not "" as is being checked in the code. This leads to name being assigned to a blank character, which is then fed into the purl generator and escaped as %20 and resulting in invalid purls (name is required) such as pkg:nuget/%20@1.0.0.0.

I originally was going to propose a pr along the lines of:

diff --git a/syft/pkg/cataloger/dotnet/parse_dotnet_portable_executable.go b/syft/pkg/cataloger/dotnet/parse_dotnet_portable_executable.go
index b57b2f06..20f2c6e3 100644
--- a/syft/pkg/cataloger/dotnet/parse_dotnet_portable_executable.go
+++ b/syft/pkg/cataloger/dotnet/parse_dotnet_portable_executable.go
@@ -3,6 +3,7 @@ package dotnet
 import (
 	"fmt"
 	"io"
+	"strings"
 
 	"github.com/saferwall/pe"
 
@@ -41,13 +42,13 @@ func parseDotnetPortableExecutable(_ file.Resolver, _ *generic.Environment, f fi
 	}
 
 	name := versionResources["FileDescription"]
-	if name == "" {
+	if strings.TrimSpace(name) == "" {
 		log.Tracef("unable to find FileDescription in PE file: %s", f.RealPath)
 		return nil, nil, nil
 	}
 
 	version := versionResources["FileVersion"]
-	if version == "" {
+	if strings.TrimSpace(name) == "" {
 		log.Tracef("unable to find FileVersion in PE file: %s", f.RealPath)
 		return nil, nil, nil
 	}

to handle this case.

However, it seems to me that this cataloger is very overly broad - it seems to look for literally any file that ends in .dll or .exe. Needless to say, there are many of these such files that are not .net (and especially not nuget) packages.

I see tha this is in progress, happy to help test anything out on some live Windows systems or such - let me know!

@kzantow
Copy link
Contributor

kzantow commented Sep 14, 2023

Hi all (@selzoc @Roxedus @nemchik @mytracks) - I don't happen to have a large set of files to check, but there doesn't seem to be a great answer to definitively fix this.

There are a few fields that seem to be filled in some of the time, but it seems fairly arbitrary what the best field to use is for any given file. A couple examples of this:

The original image in the description (ghcr.io/linuxserver/grav:1.7.42.3-ls124) has the Hidden Input, which includes this information:

FileDescription -> Reads from stdin without leaking info to the terminal and outputs back to stdout
FileVersion -> 1, 0, 0, 0
InternalName -> hiddeninput
LegalCopyright -> Jordi Boggiano - 2012
OriginalFilename -> hiddeninput.exe
ProductName -> Hidden Input
ProductVersion -> 1, 0, 0, 0

... in this case, ProductName seems like the best name here, possibly InternalName.

In the set of samples from the portable executable parsing library we use, a number of these are similar to:

CompanyName -> Microsoft Corporation
FileDescription -> Microsoft Direct2D Debug Layer Library
FileVersion -> 6.3.9600.16384 (winblue_rtm.130821-1623)
InternalName -> D2D1Debug2
LegalCopyright -> © Microsoft Corporation. All rights reserved.
OriginalFilename -> D2D1Debug2
ProductName -> Microsoft® Windows® Operating System
ProductVersion -> 6.3.9600.16384

... in this case, the ProductName doesn't make a lot of sense. The InternalName seems reasonable, and maybe the FileDescription is what could be preferred. And another example:

CompanyName -> Microsoft Corporation
FileDescription -> Windows NT BASE API Client DLL
FileVersion -> 10.0.20152.1000 (WinBuild.160101.0800)
InternalName -> Kernelbase.dll
LegalCopyright -> © Microsoft Corporation. All rights reserved.
OriginalFilename -> Kernelbase.dll
ProductName -> Microsoft® Windows® Operating System
ProductVersion -> 10.0.20152.1000

... in this case, the FileDescription seems less useful than either the InternalName or the OriginalFilename, possibly, but maybe is better by including the ProductName as a part of the package description.

I would also point out that the version to use is equally as confusing which to pick -- the last example seems to have a ProductVersion that is better, but entries like this seem to have a better FileVersion:

Comments -> System.Buffers
FileDescription -> System.Buffers
FileVersion -> 7.0.923.36201
InternalName -> System.Buffers.dll
CompanyName -> Microsoft Corporation
LegalCopyright -> © Microsoft Corporation. All rights reserved.
OriginalFilename -> System.Buffers.dll
ProductName -> Microsoft® .NET
ProductVersion -> 7.0.9+8e9a17b2216f51a5788f8b1c467a4cf3b769e7d7
Assembly Version -> 7.0.0.0

I've made a draft PR to try to improve this, but would definitely like more examples / feedback, if possible!

Here's some example output from the draft PR:

Scanning https://www.nuget.org/packages/prometheus-net.AspNetCore/8.0.1:

NAME                       VERSION  TYPE   
Prometheus.AspNetCore.dll  8.0.1    dotnet

Scanning ghcr.io/linuxserver/grav:1.7.42.3-ls124:

NAME                                  VERSION                 TYPE         
Hidden Input                          1, 0, 0, 0              dotnet        
...

Scanning the aforementioned test data directory:

NAME                                                                    VERSION                                              TYPE   
Brave Browser                                                           80.1.7.92                                            dotnet  
Internet Explorer                                                       11.00.18362.1                                        dotnet  
Microsoft (R) Visual C++ MFCDLL                                         4.1.001                                              dotnet  
Microsoft(R) Windows NT(R) Operating System Active Setup Job Executer   5.00.0000.1                                          dotnet  
Microsoft® .NET mscorlib                                                5.0.0+cf258a14b70ad9069470a108f13765e0e5988f51       dotnet  
Microsoft® Visual Studio® 2017 MFC140U.DLL                              14.16.26905.0                                        dotnet  
Microsoft® Windows® Operating System D2D1Debug2                         6.3.9600.16384                                       dotnet  
Microsoft® Windows® Operating System Kernelbase.dll                     10.0.20152.1000                                      dotnet  
Microsoft® Windows® Operating System Microsoft UYVY Video Decompressor  10.0.20152.1000                                      dotnet  
Microsoft® Windows® Operating System SgrmEnclave                        10.0.18362.145                                       dotnet  
Microsoft® Windows® Operating System Shim Engine DLL (IAT)              10.0.17763.1                                         dotnet  
Microsoft® Windows® Operating System WDF Coinstaller                    1.11.9200.16384                                      dotnet  
Microsoft® Windows® Operating System kernel32                           10.0.19041.1348                                      dotnet  
Microsoft® Windows® Operating System pspluginwkr.dll                    6.1.7600.16385                                       dotnet  
PowerShell                                                              7.3.4 SHA: b59f05d5a1b2fceca231f75c53c203a02edf6203  dotnet  
PuTTY suite                                                             Release 0.73                                         dotnet

Does this look reasonable? Should we not include the Microsoft prefixes? Any feedback is greatly appreciated!

@Roxedus
Copy link
Author

Roxedus commented Sep 14, 2023

The suggested output looks good to me. I wouldnt strip the prefixes, removes some of the transparency this tool should bring

@youngcm2
Copy link

A good example of FileVersion (2.80.3.0) vs ProductVersion (2.80.3) for SkiaSharp.

So scanning the sbom will not find the entry on osv.

OSV has SkiaSharp as 2.80.3

SBOM Entry

{
      "name": "SkiaSharp",
      "SPDXID": "SPDXRef-Package-dotnet-SkiaSharp-3f9d462e5630d346",
      "versionInfo": "2.80.3.0",
      "supplier": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "filesAnalyzed": false,
      "sourceInfo": "acquired package info from dotnet project assets file: /app/Api/SkiaSharp.dll",
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "copyrightText": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "SECURITY",
          "referenceType": "cpe23Type",
          "referenceLocator": "cpe:2.3:a:SkiaSharp:SkiaSharp:2.80.3.0:*:*:*:*:*:*:*"
        },
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceType": "purl",
          "referenceLocator": "pkg:nuget/SkiaSharp@2.80.3.0"
        }
      ]
    },

Incorrect Curl

curl -d \
  '{"package": {"purl": "pkg:nuget/SkiaSharp@2.80.3.0"}, "version": ""}' \
  "https://api.osv.dev/v1/query"

Correct Curl

curl -d \
  '{"package": {"purl": "pkg:nuget/SkiaSharp@2.80.3"}, "version": ""}' \
  "https://api.osv.dev/v1/query"

@spiffcs
Copy link
Contributor

spiffcs commented Oct 31, 2023

We're keeping this issue open as the main issue for discussion what the future implementation should be for getting the correct data.

The cases provided by these two issues were added as tests and fixed in that their packages should no longer contribute to invalid SBOM. PR is listed above this comment for the bug fix while we keep this open for better solutions.
#2241
#2255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

9 participants