Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing Javascript files may extract incorrect (and very long) pkg name/version #723

Open
marob opened this issue Nov 17, 2023 · 6 comments
Labels
Consider Funding enhancement New feature or request

Comments

@marob
Copy link
Contributor

marob commented Nov 17, 2023

In my use case, I have a projekktor-1.3.09.min.js file (N.B.: this project looks dead, its website links to dead site/advert spam) that starts with the following line (yes, the version doesn't match, but that's not the issue here):

/*! Projekktor v1.2.37 -jarisflash | http://www.projekktor.com | Copyright 2010, 2011, Sascha Kluger, Spinning Airwhale Media, http://www.spinningairwhale.com | GNU General Public License - http://www.projekktor.com/license/
 */

This first line is generated via Grunt.

When parsed through cdxgen parseMinJs function, delimiter is identified as - because of the one at the end of the line (...License - http://...), which leads to a mis-identification of the name and version.

In the end, cdxgen generates:

    {
      "group": "",
      "name": "projekktor-v1.2.37--jarisflash-|-http://www.projekktor.com-|-copyright-2010,-2011,-sascha-kluger,-spinning-airwhale-media,-http://www.spinningairwhale.com-|-gnu-general-public-license",
      "version": "http://www.projekktor.com/license/",
      "scope": "optional",
      "purl": "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http:%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http:%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http:%2F%2Fwww.projekktor.com%2Flicense%2F",
      "type": "library",
      "bom-ref": "pkg:npm/projekktor-v1.2.37--jarisflash-%7C-http://www.projekktor.com-%7C-copyright-2010%2C-2011%2C-sascha-kluger%2C-spinning-airwhale-media%2C-http://www.spinningairwhale.com-%7C-gnu-general-public-license@http://www.projekktor.com/license/",
      "evidence": {... },
      "properties": [...]
    },

This incorrect name/version leads to generating an error in Dependency Track when feeding it that SBOM because of the purl exceeding 255 chars (which is the current size limit):

ERROR [BomUploadProcessingTask] Error while processing bom
javax.jdo.JDOFatalUserException: Attempt to store value "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http%3A%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http%3A%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http%3A%2F%2Fwww.projekktor.com%2Flicense%2F" in column ""PURLCOORDINATES"" that has maximum length of 255. Please correct your data!
@prabhu
Copy link
Contributor

prabhu commented Nov 17, 2023

@marob, good investigation.

cdxgen/utils.js

Line 1365 in b397bae

const tmpPV = l.split(delimiter);

Could we change this line to below to see if it works? You can send a PR including this and the version fix.

const tmpPV = l.split(delimiter).split("--")[0];

@marob
Copy link
Contributor Author

marob commented Nov 17, 2023

@prabhu That would'n work for 2 reasons:

  • l.split(delimiter) returns an array that doesn't have a split function
  • the -- String would only appear after executing .replace(/ /g, "-") several lines below

I was wondering on which specification you based your code to try to parse the package name and version? I didn't find any specification on how those comments should be written in JavaScript files (minified or not).
Whatever change we'll add in the code, I think we should have several example files to add as unit tests to make sure the code is parsing the data as expected.

@prabhu
Copy link
Contributor

prabhu commented Nov 17, 2023

@marob There are no specifications like you figured. So this will be identifying things based on some patterns so there is some visibility into the min files that are being stored or bundled. I am open to improving the logic.

@prabhu
Copy link
Contributor

prabhu commented Nov 20, 2023

@marob, are you planning to send a pull request for this?

@marob
Copy link
Contributor Author

marob commented Nov 20, 2023

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it.
In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

@prabhu
Copy link
Contributor

prabhu commented Nov 20, 2023

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it.

In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

Same. Let's keep this open and see if anyone else contributes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Consider Funding enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants