Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: requirements.txt - return unicode only letter/num for version #1361

Merged
merged 3 commits into from
Nov 22, 2022

Conversation

spiffcs
Copy link
Contributor

@spiffcs spiffcs commented Nov 22, 2022

Summary

Closes #1360 - Trim space fails to cut the end of a string given a blocking \

This fix updates the code to use strings.TrimFunc to only return strings with valid Unicode letter/number.

cc @robcresswell

Signed-off-by: Christopher Phillips christopher.phillips@anchore.com

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs added the bug Something isn't working label Nov 22, 2022
Copy link
Contributor

@kzantow kzantow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- just a thought: it looks like the backslash is a typical line continuation. I wonder if this would work if we replaced \\\n with empty string first and then processed versions, or if the --hash=... stuff would muck that all up?

@github-actions
Copy link

github-actions bot commented Nov 22, 2022

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                       old time/op    new time/op    delta
ImagePackageCatalogers/alpmdb-cataloger-2                    13.3ms ± 1%    13.7ms ± 5%     ~     (p=0.056 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2              1.69ms ±18%    1.69ms ±17%     ~     (p=0.841 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2            3.89ms ± 2%    3.94ms ± 4%     ~     (p=0.548 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2    1.31ms ± 5%    1.33ms ± 2%     ~     (p=0.690 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         947µs ± 4%     936µs ± 5%     ~     (p=0.690 n=5+5)
ImagePackageCatalogers/node-binary-cataloger-2               8.21µs ± 4%    7.39µs ± 4%  -10.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                    1.11ms ± 3%    1.08ms ± 1%     ~     (p=0.056 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                    1.55ms ± 1%    1.56ms ± 3%     ~     (p=0.730 n=4+5)
ImagePackageCatalogers/java-cataloger-2                      18.4ms ± 4%    17.3ms ± 2%   -6.32%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                     1.64ms ± 5%    1.59ms ± 6%     ~     (p=0.421 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2          8.02µs ± 2%    7.48µs ± 4%   -6.80%  (p=0.008 n=5+5)
ImagePackageCatalogers/dotnet-deps-cataloger-2               1.62ms ± 2%    1.68ms ± 2%   +4.04%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    867µs ± 2%     895µs ± 1%   +3.15%  (p=0.008 n=5+5)
ImagePackageCatalogers/sbom-cataloger-2                      5.27ms ± 3%    5.23ms ± 1%     ~     (p=0.841 n=5+5)

name                                                       old alloc/op   new alloc/op   delta
ImagePackageCatalogers/alpmdb-cataloger-2                    5.26MB ± 0%    5.27MB ± 0%     ~     (p=0.056 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               205kB ± 0%     205kB ± 0%     ~     (p=0.690 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2             961kB ± 0%     961kB ± 0%     ~     (p=1.000 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     217kB ± 0%     217kB ± 0%     ~     (p=0.690 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         159kB ± 0%     159kB ± 0%   +0.07%  (p=0.008 n=5+5)
ImagePackageCatalogers/node-binary-cataloger-2               1.12kB ± 0%    1.12kB ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                     199kB ± 0%     199kB ± 0%   +0.06%  (p=0.032 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                     302kB ± 0%     302kB ± 0%   -0.15%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                      3.49MB ± 0%    3.49MB ± 0%     ~     (p=0.222 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                     1.26MB ± 0%    1.26MB ± 0%     ~     (p=0.310 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2          1.12kB ± 0%    1.12kB ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                374kB ± 0%     375kB ± 0%   +0.13%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    138kB ± 0%     138kB ± 0%     ~     (p=0.690 n=5+5)
ImagePackageCatalogers/sbom-cataloger-2                       722kB ± 0%     722kB ± 0%   +0.02%  (p=0.008 n=5+5)

name                                                       old allocs/op  new allocs/op  delta
ImagePackageCatalogers/alpmdb-cataloger-2                     85.7k ± 0%     85.7k ± 0%     ~     (p=0.167 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               4.24k ± 0%     4.24k ± 0%     ~     (p=0.444 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2             16.5k ± 0%     16.5k ± 0%     ~     (p=0.857 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     5.50k ± 0%     5.50k ± 0%     ~     (p=0.881 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         3.33k ± 0%     3.33k ± 0%     ~     (all equal)
ImagePackageCatalogers/node-binary-cataloger-2                 38.0 ± 0%      38.0 ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                     4.46k ± 0%     4.46k ± 0%     ~     (all equal)
ImagePackageCatalogers/rpm-db-cataloger-2                     8.11k ± 0%     8.11k ± 0%     ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                       57.5k ± 0%     57.5k ± 0%     ~     (p=0.889 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                      5.45k ± 0%     5.45k ± 0%     ~     (p=0.722 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2            38.0 ± 0%      38.0 ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                7.12k ± 0%     7.12k ± 0%     ~     (all equal)
ImagePackageCatalogers/portage-cataloger-2                    3.58k ± 0%     3.58k ± 0%     ~     (all equal)
ImagePackageCatalogers/sbom-cataloger-2                       24.4k ± 0%     24.4k ± 0%     ~     (all equal)

@spiffcs
Copy link
Contributor Author

spiffcs commented Nov 22, 2022

LGTM -- just a thought: it looks like the backslash is a typical line continuation. I wonder if this would work if we replaced \\\n with empty string first and then processed versions, or if the --hash=... stuff would muck that all up?

for scanner.Scan() in the changed code block only knows about line = "argh==0.26.2. \"
With our current implementation for scanner.Scan() will do 3 passes and not treat \ as a line continuation.

line = argh==0.26.2 \
line =  --hash=sha256:a9b3aaa1904eeb78e32394cd46c6f37ac0fb4af6dc488daa58971bdc7d7fcaf3 \
line =  --hash=sha256:e9535b8c84dc9571a48999094fda7f33e63c3f1b74f3e5f3ac0105a58405bb65

Are you asking if instead we zipped those above 3 into a single line and parsed that?

line = argh==0.26.2 --hash=sha256:a9b3aaa1904eeb78e32394cd46c6f37ac0fb4af6dc488daa58971bdc7d7fcaf3 --hash=sha256:e9535b8c84dc9571a48999094fda7f33e63c3f1b74f3e5f3ac0105a58405bb65

@kzantow
Copy link
Contributor

kzantow commented Nov 22, 2022

@spiffcs yeah, exactly -- I'm mostly wondering if all requirements.txt will have this exact split or some might put them on a single line, so if we pre-processed the continued lines, in this case as you say to:

argh==0.26.2 --hash=sha256:a9b3aaa1904eeb78e32394cd46c6f37ac0fb4af6dc488daa58971bdc7d7fcaf3 --hash=sha256:e9535b8c84dc9571a48999094fda7f33e63c3f1b74f3e5f3ac0105a58405bb65

And then extracted the version, this may make processing more consistent for other cases.

I'm fine with the change as is, though, especially if this suggestion would make it a significantly larger change (e.g. we're always looking for version at the end of a line and now we need to do other stuff).

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
@spiffcs
Copy link
Contributor Author

spiffcs commented Nov 22, 2022

@kzantow I added a case to cover the negative here:
if a version string exists where the line continuation is not used, we are still able to select the version correctly

I'm going to read more about requirements.txt to understand the edge cases of what else could be affected if we remove \\\n and flatten the continuations. I want to be sure if we go that route that we handle other common line constructs correctly

@spiffcs spiffcs changed the title fix: return unicode version only letter/num fix: requirements.txt return unicode version only letter/num Nov 22, 2022
@spiffcs spiffcs changed the title fix: requirements.txt return unicode version only letter/num fix: requirements.txt - return unicode only letter/num for version Nov 22, 2022
@spiffcs spiffcs merged commit bcfe38c into main Nov 22, 2022
@spiffcs spiffcs deleted the 1360-python-requirements-bug branch November 22, 2022 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Python cataloging fails to remove some non-version characters from version string
2 participants