Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cue: unnecessary scan of every directory when loading any package #3155

Closed
Bryant-Cockcroft-IBM opened this issue May 17, 2024 · 11 comments
Assignees
Labels
modules Issues related to CUE modules and the experimental implementation NeedsInvestigation

Comments

@Bryant-Cockcroft-IBM
Copy link

What version of CUE are you using (cue version)?

$ cue version
cue version v0.9.0-alpha.5

go version go1.21.0
      -buildmode exe
       -compiler gc
     CGO_ENABLED 1
          GOARCH ppc64le
            GOOS linux
         GOPPC64 power8
cue.lang.version v0.9.0

Does this issue reproduce with the latest stable release?

No, only on pre-release. Also happens with CUE_EXPERIMENT=modules on earlier v0.9.0 alpha versions. Does not reproduce with the new modules disabled.

What did you do?

cue eval --out json example.com/pkg/foo

I have a large project where cue is only a small component. So the cue module root contains many directories and sub-directories un-related to the cue module. But these directories are separate from the foo package above, so I'd expect them to not effect this evaluation.

What did you expect to see?

I expected similar evaluation time to doing cue eval --out json test.cue Where test.cue just imports the exact same foo package. Which takes less than 1 second in this case.

I expect cue commandline to only look a the foo directory and directories leading to the module root, and nothing else, as described in the package instances https://cuelang.org/docs/concept/modules-packages-instances/#instances

What did you see instead?

The command-line version recursively walks every directory starting at the module root looking for cue.mod. Since my project has 100s if not 1000s of directories in total, this now takes ~14 seconds. (note: it does find one in the actual root, but does not stop until it's searched every directory)

used strace -f -t -e trace=file cue eval --out json example.com/pkg/foo to determine this behavior.

I'm unable to share my actual data, but I can probably create reproducer if necessary.

@Bryant-Cockcroft-IBM Bryant-Cockcroft-IBM added NeedsInvestigation Triage Requires triage/attention labels May 17, 2024
@mvdan
Copy link
Member

mvdan commented May 18, 2024

Thanks for the report. I can reproduce in our repository as well:

$ go install ./cmd/cue && CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
8
$ go install ./cmd/cue && strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
360

Note how the same command without the experiment tries to stat eight cue.mod directories (althoug that is already somewhat wasteful), and the new experiment default statst as many as 360 cue.mod directories - most of which are missing.

@mvdan
Copy link
Member

mvdan commented May 18, 2024

I also don't think this is limited to cue.mod files; when the new modules mode is enabled, we also open (and read and parse) pretty much every CUE file in the entire module, even when they aren't related to a package pattern like ./internal/ci:

$ go install ./cmd/cue && CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& wc -l
51
$ go install ./cmd/cue && strace -f -t -e trace=file cue fmt ./internal/ci |& wc -l
924

So I think this is a larger problem with the new cue/load and not just about statting cue.mod files to find module boundaries.

@mvdan
Copy link
Member

mvdan commented May 18, 2024

I believe https://review.gerrithub.io/c/cue-lang/cue/+/1194921 should fix the issue with cue.mod directories; @Bryant-Cockcroft-IBM let me know how this affects your cue eval times.

cueckoo pushed a commit that referenced this issue May 18, 2024
As reported by a user on a large repository with many hundreds
of directories, CUE_EXPERIMENT=modules now being the default
caused a sudden increase in the number of file operations being done
by cmd/cue when loading packages, causing a slow-down.

We can easily reproduce the issue via strace in our CUE repository,
using `cue fmt ./internal/ci` as an example to load one CUE package.
Where we used to only stat or open 8 cue.mod files, we now do 360:

    $ CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8
    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    360

The culprit turned out to be AllModuleFiles; the way it needs to skip
walking nested CUE modules did not fit well with the fs.WalkDir API.
Using fs.ReadDir and recursive func calls instead works better:

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I32dc0c39ea795f795077feff88475d38cb324433
@mvdan
Copy link
Member

mvdan commented May 18, 2024

The number of syscalls across our repo is nearly halved, from about 920 to 540:

$ go install ./cmd/cue && strace -f -t -e trace=file cue fmt ./internal/ci |& wc -l
543

Most of the remaining syscalls relate to us walking the entire module directory tree to list all package imports in CUE files upfront, which is technically not necessary in most cases.

cueckoo pushed a commit that referenced this issue May 18, 2024
As reported by a user on a large repository with many hundreds
of directories, CUE_EXPERIMENT=modules now being the default
caused a sudden increase in the number of file operations being done
by cmd/cue when loading packages, causing a slow-down.

We can easily reproduce the issue via strace in our CUE repository,
using `cue fmt ./internal/ci` as an example to load one CUE package.
Where we used to only stat or open 8 cue.mod files, we now do 360:

    $ CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8
    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    360

The culprit turned out to be AllModuleFiles; the way it needs to skip
walking nested CUE modules did not fit well with the fs.WalkDir API.
Using fs.ReadDir and recursive func calls instead works better:

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I32dc0c39ea795f795077feff88475d38cb324433
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1194921
Reviewed-by: Paul Jolly <paul@myitcv.io>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
@mvdan mvdan removed the Triage Requires triage/attention label May 19, 2024
@myitcv myitcv added the modules Issues related to CUE modules and the experimental implementation label May 20, 2024
@mvdan mvdan changed the title cmd/cue: unnecessary scanning every directory for cue.mod cmd/cue: unnecessary scanning every directory when loading any package May 20, 2024
@rogpeppe rogpeppe changed the title cmd/cue: unnecessary scanning every directory when loading any package cmd/cue: unnecessary scan of every directory when loading any package May 20, 2024
@Bryant-Cockcroft-IBM
Copy link
Author

Bryant-Cockcroft-IBM commented May 20, 2024

I do see that about half the elapsed time with that change, but still significantly slower than CUE_EXPERIMENT=modules=0.

0.61s CUE_EXPERIMENT=modules=0
14.07s v0.9.0-alpha.5
7.76s Your change

Most of the remaining syscalls relate to us walking the entire module directory tree

I'm not sure if this is a general problem, or if we're just doing something weird here with our directory structure. But we have a ton of directories that are unrelated to cue that look as if they're in the module.

Taking some guesses at the strace with your change. About 11s total (some overhead from strace itself). 9s of that is spent searching our unrelated directories. When it switches to actually looking at the imports for the files it found, that only takes 1-2s.

For the record, if we decide my use-case is too odd, I think I can workaround this by specifying the files directly instead of the package (because with the choices we made, restructuring might not be an option)

@mvdan
Copy link
Member

mvdan commented May 20, 2024

Thanks @Bryant-Cockcroft-IBM for your update, that halving of wall time makes sense to me - if you have a repository with many directories that don't contain CUE files, the new modules mode used to do two syscalls per directory (one to read its contents, another to check if cue.mod exists inside), and now we do one syscall per directory after my partial fix.

Traversing the entire CUE module directory tree is something that has to happen in some form for commands like cue mod tidy or cue vet ./..., as those need to find all CUE packages. However, for any command involving just one package like cue eval ./some/package, there is a regression with the new CUE_EXPERIMENT=modules default where we still find all CUE packages upfront, even though it's not actually needed.

@rogpeppe is looking into a fix for this for the upcoming release, as we are treating this as a significant enough regression - it will affect any CUE user with a large enough directory structure. It just won't be a trivial fix so it might take us a couple of days to have something ready.

@Bryant-Cockcroft-IBM
Copy link
Author

Thanks that all sounds good to me!

@rogpeppe
Copy link
Member

rogpeppe commented May 21, 2024

In order to fix this issue, we need to evaluate exactly the packages
that are specified on the command line and no more. That's
straightforward when the root packages are listed explicitly, because
we can then trace out their dependencies. But it's a harder job when
there are patterns involved.

Package pattern matching

The cue command supports package patterns of the form xxx...yyy. The
... matches any sequence of characters in a package import path.

In order to do a complete job, we would need to list an arbitrary
number of repositories from the registry, but OCI registries do not in
general support repository enumeration.

Instead of enumerating all possible match candidates, we can use the
build graph itself as the source of possible candidates. This is what
Go does.

However, that approach introduces a potential feedback loop: because
the build graph is limited to modules that contain packages that are
actually used by the main module, the build graph can change if there
are patterns that match packages outside of that set, which can in
turn affect the set of matched packages.

A way forward

The above consideration implies a rather deep change to the code and
is a bigger change than we'd like to introduce at this late stage in
the v0.9.0 release cycle.

Instead, we propose that for now all patterns must be rooted inside
the main module only. That is, either a pattern of the form
./xxx...yyy for some xxx and yyy or of the form
$module/xxx...yyy where $module is the name of the main module.
All other pattern matches would be rejected.

This leaves us open to applying the proper fix at a later date, and
hopefully covers the by-far-most-common use cases.

Possible regression

Note that this approach seems to imply a regression in existing supported behavior.
If a set of packages is present in cue.mod/{gen,usr,pkg} one might think that
we could use a pattern to package packages in those directories.
However, this turns out not to be the case. This functionality was never
actually implemented. The following testscript fails currently with modules disabled
and failed on previous versions too, so the above proposal should not result in
an actual regression.

external-package-patterns.txtar
exec cue vet foo.example/...

-- cue.mod/module.cue --
module: "github.com/foo/bar"
language: {
version: "v0.9.0"
}
-- cue.mod/pkg/foo.example/x/x.cue --
package x

x: int
-- p.cue --
package m

import "foo.example/x"

z: x.x & 123

cueckoo pushed a commit that referenced this issue May 24, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 24, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 26, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 27, 2024
DO NOT REVIEW

This change changes the cue/load logic to expand package
wildcards before invoking the `modpkgload.LoadPackages`
resolution logic. This involves moving the wildcard expansion
code into a new place, independent of the `cue/load.loader`
which is created after doing that. We use the `modimports.AllModuleFiles`
logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`,
which expose a significant flaw in the above approach:
`modimports.AllModuleFiles` only looks for CUE files, but
`cue import` relies on the fact that package patterns will currently
match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 27, 2024
This adds some test cases listed in issue #1138 to make some of the
package resolution semantics explicit in the tests.

 For #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I935e91658b84f419e040c7d9ee845944e8b542e9
cueckoo pushed a commit that referenced this issue May 28, 2024
This test will fail if the package resolution logic scans all packages
in the module, as reported in https://cuelang.org/issue/3155.

We add the test now so we can see that it's fixed in a subsequent CL.

For #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I57d320cb0f4fdf656fdd5f5f41d1cf150fe6813d
cueckoo pushed a commit that referenced this issue May 28, 2024
This test will fail if the package resolution logic scans all packages
in the module, as reported in https://cuelang.org/issue/3155.

We add the test now so we can see that it's fixed in a subsequent CL.

For #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I57d320cb0f4fdf656fdd5f5f41d1cf150fe6813d
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195297
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Chief Cueckoo <chief.cueckoo@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue May 28, 2024
This change changes the cue/load logic to expand package wildcards
before invoking the `modpkgload.LoadPackages` resolution logic.
This involves moving the wildcard expansion code into a new place,
independent of the `cue/load.loader` which is created after doing that.
We use the `modimports.AllModuleFiles` logic to enumerate
the packages matched by a wildcard.

This then removes the need to enumerate all packages in the entire
module, because we know all the packages up front. This only works
because we do not allow wildcard matching in external packages
(something that is not currently supported anyway) which would require
integrating the matching logic directly into `modload` which is a
considerably larger refactor which we'll punt on for now.

It would be nice to remove the previous wildcard matching logic,
but unfortunately the current behavior relies on wildcards matching
directories with no CUE files, which doesn't fit well with the way
that the module file enumeration happens, so leave the old logic in
place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 28, 2024
This change changes the cue/load logic to expand package wildcards
before invoking the `modpkgload.LoadPackages` resolution logic.
This involves moving the wildcard expansion code into a new place,
independent of the `cue/load.loader` which is created after doing that.
We use the `modimports.AllModuleFiles` logic to enumerate
the packages matched by a wildcard.

This then removes the need to enumerate all packages in the entire
module, because we know all the packages up front. This only works
because we do not allow wildcard matching in external packages
(something that is not currently supported anyway) which would require
integrating the matching logic directly into `modload` which is a
considerably larger refactor which we'll punt on for now.

It would be nice to remove the previous wildcard matching logic,
but unfortunately the current behavior relies on wildcards matching
directories with no CUE files, which doesn't fit well with the way
that the module file enumeration happens, so leave the old logic in
place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 29, 2024
This reduces the number of syscalls we do when loading one or all
packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
cueckoo pushed a commit that referenced this issue May 29, 2024
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases
where a local package being loaded was a regular file rather than a
directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work
appears to no longer be necessary, as we just capture missing files,
which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
cueckoo pushed a commit that referenced this issue May 29, 2024
This avoids repeated calls to os.Stat via fileSystem.isDir.
It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
cueckoo pushed a commit that referenced this issue May 29, 2024
In particular, to test all the scenarios that we explicitly do not wish
to support right now, or whose potential behavior isn't obvious at all.

For #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ic866bb078ca0f36dc2615606fc65b790a0c5c9a0
cueckoo pushed a commit that referenced this issue May 29, 2024
This change changes the cue/load logic to expand package wildcards
before invoking the `modpkgload.LoadPackages` resolution logic.
This involves adding new wildcard matching logic in a new place,
independent of the `cue/load.loader` which is created after doing that.
We use the `modimports.AllModuleFiles` logic to enumerate
the packages matched by a wildcard.

This then removes the need to enumerate all packages in the entire
module, because we know all the packages up front. This only works
because we do not allow wildcard matching in external packages
(something that is not currently supported anyway) which would require
integrating the matching logic directly into `modload` which is a
considerably larger refactor which we'll punt on for now.

It would be nice to remove the previous wildcard matching logic,
but unfortunately the current behavior relies on wildcards matching
directories with no CUE files, which doesn't fit well with the way
that the module file enumeration happens, so leave the old logic in
place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 29, 2024
This reduces the number of syscalls we do when loading one or all
packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
cueckoo pushed a commit that referenced this issue May 29, 2024
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases
where a local package being loaded was a regular file rather than a
directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work
appears to no longer be necessary, as we just capture missing files,
which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
cueckoo pushed a commit that referenced this issue May 29, 2024
This avoids repeated calls to os.Stat via fileSystem.isDir.
It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
cueckoo pushed a commit that referenced this issue May 29, 2024
This change changes the cue/load logic to expand package wildcards
before invoking the `modpkgload.LoadPackages` resolution logic.
This involves adding new wildcard matching logic in a new place,
independent of the `cue/load.loader` which is created after doing that.
We use the `modimports.AllModuleFiles` logic to enumerate
the packages matched by a wildcard.

This then removes the need to enumerate all packages in the entire
module, because we know all the packages up front. This only works
because we do not allow wildcard matching in external packages
(something that is not currently supported anyway) which would require
integrating the matching logic directly into `modload` which is a
considerably larger refactor which we'll punt on for now.

It would be nice to remove the previous wildcard matching logic,
but unfortunately the current behavior relies on wildcards matching
directories with no CUE files, which doesn't fit well with the way
that the module file enumeration happens, so leave the old logic in
place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 29, 2024
This reduces the number of syscalls we do when loading one or all
packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
cueckoo pushed a commit that referenced this issue May 29, 2024
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases
where a local package being loaded was a regular file rather than a
directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work
appears to no longer be necessary, as we just capture missing files,
which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
cueckoo pushed a commit that referenced this issue May 29, 2024
This avoids repeated calls to os.Stat via fileSystem.isDir.
It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
cueckoo pushed a commit that referenced this issue May 30, 2024
In particular, to test all the scenarios that we explicitly do not wish
to support right now, or whose potential behavior isn't obvious at all.

For #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ic866bb078ca0f36dc2615606fc65b790a0c5c9a0
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195470
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Paul Jolly <paul@myitcv.io>
cueckoo pushed a commit that referenced this issue May 30, 2024
This change changes the cue/load logic to expand package wildcards
before invoking the `modpkgload.LoadPackages` resolution logic.
This involves adding new wildcard matching logic in a new place,
independent of the `cue/load.loader` which is created after doing that.
We use the `modimports.AllModuleFiles` logic to enumerate
the packages matched by a wildcard.

This then removes the need to enumerate all packages in the entire
module, because we know all the packages up front. This only works
because we do not allow wildcard matching in external packages
(something that is not currently supported anyway) which would require
integrating the matching logic directly into `modload`.
This is a considerably larger refactor which we'll punt on for now,
and is being tracked at https://cuelang.org/issue/3183.

It would be nice to remove the previous wildcard matching logic,
but unfortunately the current behavior relies on wildcards matching
directories with no CUE files, which doesn't fit well with the way
that the module file enumeration happens, so leave the old logic in
place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
cueckoo pushed a commit that referenced this issue May 30, 2024
This reduces the number of syscalls we do when loading one or all
packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
cueckoo pushed a commit that referenced this issue May 30, 2024
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases
where a local package being loaded was a regular file rather than a
directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work
appears to no longer be necessary, as we just capture missing files,
which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
cueckoo pushed a commit that referenced this issue May 30, 2024
This avoids repeated calls to os.Stat via fileSystem.isDir.
It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
cueckoo pushed a commit that referenced this issue May 30, 2024
This reduces the number of syscalls we do when loading one or all
packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195412
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
cueckoo pushed a commit that referenced this issue May 30, 2024
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases
where a local package being loaded was a regular file rather than a
directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work
appears to no longer be necessary, as we just capture missing files,
which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195413
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue May 30, 2024
This avoids repeated calls to os.Stat via fileSystem.isDir.
It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195415
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
@mvdan
Copy link
Member

mvdan commented May 30, 2024

@Bryant-Cockcroft-IBM we are done with our first round of fixes for this performance regression. The original number of syscalls for loading a single package (and its few dependencies) in our repo ballooned by about 6x with the experiment as of May 20th, as I explained above:

$ go install ./cmd/cue && CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue eval ./internal/ci/repo |& grep '/home/mvdan/src' | wc -l
81
$ go install ./cmd/cue && strace -f -t -e trace=file cue eval ./internal/ci/repo |& grep '/home/mvdan/src' | wc -l
517

With all the changes above, we are now down to a modest ~35% increase in syscalls:

$ go install ./cmd/cue && strace -f -t -e trace=file cue eval ./internal/ci/repo |& grep '/home/mvdan/src' | wc -l
112

This is still not perfect by any means, and we will continue to make improvements to the loading code to be less wasteful, but I think we're most of the way there in terms of fixing the performance regression.

Could you please try the latest master and let me know what the performance is like for your repository?

@Bryant-Cockcroft-IBM
Copy link
Author

Yes, it looks so much better! I'm now seeing a net performance increase from v0.8 too!

2.86s v0.8.0
0.61s CUE_EXPERIMENT=modules=0 v0.9.0-alpha.5
14.07s v0.9.0-alpha.5
7.76s Your change (from May 18th)
0.92s latest master on May 30th

@mvdan
Copy link
Member

mvdan commented May 30, 2024

Oh, excellent :) There were some other speed-ups for cmd/cue along the way. It looks like CUE_EXPERIMENT=modules is still giving you a minor slow-down, similar to what I showed above, but overall we're still in net benefit territory from the last stable release, and the performance still seems reasonable :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
modules Issues related to CUE modules and the experimental implementation NeedsInvestigation
Projects
Archived in project
Status: v0.9.0-rc.1
Development

No branches or pull requests

4 participants