cmd/cue: unnecessary scan of every directory when loading any package #3155
Comments
Thanks for the report. I can reproduce in our repository as well:
Note how the same command without the experiment tries to stat eight cue.mod directories (although that is already somewhat wasteful), and the new experiment default stats as many as 360 cue.mod directories - most of which are missing.
I also don't think this is limited to cue.mod files; when the new modules mode is enabled, we also open (and read and parse) pretty much every CUE file in the entire module, even when they aren't related to a package pattern like
So I think this is a larger problem with the new
I believe https://review.gerrithub.io/c/cue-lang/cue/+/1194921 should fix the issue with
The number of syscalls across our repo is nearly halved, from about 920 to 540:
Most of the remaining syscalls relate to us walking the entire module directory tree to list all package imports in CUE files upfront, which is technically not necessary in most cases.
As reported by a user on a large repository with many hundreds of directories, CUE_EXPERIMENT=modules now being the default caused a sudden increase in the number of file operations being done by cmd/cue when loading packages, causing a slow-down.

We can easily reproduce the issue via strace in our CUE repository, using `cue fmt ./internal/ci` as an example to load one CUE package. Where we used to only stat or open 8 cue.mod files, we now do 360:

    $ CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8
    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    360

The culprit turned out to be AllModuleFiles; the way it needs to skip walking nested CUE modules did not fit well with the fs.WalkDir API. Using fs.ReadDir and recursive func calls instead works better:

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I32dc0c39ea795f795077feff88475d38cb324433
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1194921
Reviewed-by: Paul Jolly <paul@myitcv.io>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
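To illustrate the approach the commit message describes, here is a minimal Go sketch of a recursive fs.ReadDir walk that skips nested CUE modules. It is not the actual AllModuleFiles code: the names cueFilesInModule, walk and exampleTree are hypothetical, and the rule that a subdirectory containing a cue.mod entry marks a nested module is an assumption made for the example. The point it tries to show is that one directory read already reveals whether cue.mod is present, so no separate stat per directory is needed.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// cueFilesInModule lists the .cue files under dir without descending into
// nested modules. Hypothetical helper, not the real modimports API.
func cueFilesInModule(fsys fs.FS, dir string) ([]string, error) {
	var files []string
	err := walk(fsys, dir, true, &files)
	return files, err
}

func walk(fsys fs.FS, dir string, isRoot bool, files *[]string) error {
	// A single directory read gives us every entry, including cue.mod.
	entries, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return err
	}
	if !isRoot {
		for _, e := range entries {
			if e.Name() == "cue.mod" {
				return nil // assumed marker of a nested module: skip the subtree
			}
		}
	}
	for _, e := range entries {
		name := path.Join(dir, e.Name())
		switch {
		case e.IsDir() && e.Name() != "cue.mod":
			if err := walk(fsys, name, false, files); err != nil {
				return err
			}
		case !e.IsDir() && strings.HasSuffix(e.Name(), ".cue"):
			*files = append(*files, name)
		}
	}
	return nil
}

// exampleTree builds a small in-memory module with one nested module.
func exampleTree() fs.FS {
	return fstest.MapFS{
		"cue.mod/module.cue":        {Data: []byte("module: \"example.com/root\"\n")},
		"a.cue":                     {Data: []byte("package a\n")},
		"pkg/b.cue":                 {Data: []byte("package b\n")},
		"docs/readme.md":            {Data: []byte("not CUE\n")},
		"nested/cue.mod/module.cue": {Data: []byte("module: \"example.com/nested\"\n")},
		"nested/c.cue":              {Data: []byte("package c\n")},
	}
}

func main() {
	files, err := cueFilesInModule(exampleTree(), ".")
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}
```

With fs.WalkDir, the callback sees a directory before its entries are known, so deciding whether to skip it would need an extra lookup for cue.mod; the recursive form avoids that by inspecting the entries it has already read.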
I do see about half the elapsed time with that change, but it is still significantly slower than with the experiment disabled:

0.61s CUE_EXPERIMENT=modules=0
I'm not sure if this is a general problem, or if we're just doing something weird here with our directory structure. But we have a ton of directories that are unrelated to CUE yet look as if they're in the module.

Taking some guesses at the strace output with your change: it takes about 11s in total (with some overhead from strace itself). 9s of that is spent searching our unrelated directories. When it switches to actually looking at the imports for the files it found, that only takes 1-2s.

For the record, if we decide my use case is too odd, I think I can work around this by specifying the files directly instead of the package (because with the choices we made, restructuring might not be an option).
Thanks @Bryant-Cockcroft-IBM for your update, that halving of wall time makes sense to me - if you have a repository with many directories that don't contain CUE files, the new modules mode used to do two syscalls per directory (one to read its contents, another to check if

Traversing the entire CUE module directory tree is something that has to happen in some form for commands like

@rogpeppe is looking into a fix for this for the upcoming release, as we are treating this as a significant enough regression - it will affect any CUE user with a large enough directory structure. It just won't be a trivial fix so it might take us a couple of days to have something ready.
Thanks, that all sounds good to me!
In order to fix this issue, we need to evaluate exactly the packages

Package pattern matching

The cue command supports package patterns of the form

In order to do a complete job, we would need to list an arbitrary

Instead of enumerating all possible match candidates, we can use the

However, that approach introduces a potential feedback loop: because

A way forward

The above consideration implies a rather deep change to the code and

Instead, we propose that for now all patterns must be rooted inside

This leaves us open to applying the proper fix at a later date, and

Possible regression

Note that this approach seems to imply a regression in existing supported behavior.

external-package-patterns.txtar

exec cue vet foo.example/...
DO NOT REVIEW

This change changes the cue/load logic to expand package wildcards before invoking the `modpkgload.LoadPackages` resolution logic. This involves moving the wildcard expansion code into a new place, independent of the `cue/load.loader` which is created after doing that.

We use the `modimports.AllModuleFiles` logic to enumerate the packages matched by a wildcard.

All tests pass apart from the ones involving `cue import`, which expose a significant flaw in the above approach: `modimports.AllModuleFiles` only looks for CUE files, but `cue import` relies on the fact that package patterns will currently match directories that do not contain any CUE files.

Fixes #3155

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
This test will fail if the package resolution logic scans all packages in the module, as reported in https://cuelang.org/issue/3155. We add the test now so we can see that it's fixed in a subsequent CL.

For #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I57d320cb0f4fdf656fdd5f5f41d1cf150fe6813d
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195297
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Chief Cueckoo <chief.cueckoo@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
This change changes the cue/load logic to expand package wildcards before invoking the `modpkgload.LoadPackages` resolution logic. This involves moving the wildcard expansion code into a new place, independent of the `cue/load.loader` which is created after doing that.

We use the `modimports.AllModuleFiles` logic to enumerate the packages matched by a wildcard. This then removes the need to enumerate all packages in the entire module, because we know all the packages up front.

This only works because we do not allow wildcard matching in external packages (something that is not currently supported anyway) which would require integrating the matching logic directly into `modload` which is a considerably larger refactor which we'll punt on for now.

It would be nice to remove the previous wildcard matching logic, but unfortunately the current behavior relies on wildcards matching directories with no CUE files, which doesn't fit well with the way that the module file enumeration happens, so leave the old logic in place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
In particular, to test all the scenarios that we explicitly do not wish to support right now, or whose potential behavior isn't obvious at all.

For #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ic866bb078ca0f36dc2615606fc65b790a0c5c9a0
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195470
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Paul Jolly <paul@myitcv.io>
This change changes the cue/load logic to expand package wildcards before invoking the `modpkgload.LoadPackages` resolution logic. This involves adding new wildcard matching logic in a new place, independent of the `cue/load.loader` which is created after doing that.

We use the `modimports.AllModuleFiles` logic to enumerate the packages matched by a wildcard. This then removes the need to enumerate all packages in the entire module, because we know all the packages up front.

This only works because we do not allow wildcard matching in external packages (something that is not currently supported anyway) which would require integrating the matching logic directly into `modload`. This is a considerably larger refactor which we'll punt on for now, and is being tracked at https://cuelang.org/issue/3183.

It would be nice to remove the previous wildcard matching logic, but unfortunately the current behavior relies on wildcards matching directories with no CUE files, which doesn't fit well with the way that the module file enumeration happens, so leave the old logic in place which considerably reduces the number of visible behavior changes.

Fixes #3155.

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Change-Id: I9f4b210eb0588e21ae942a97d7cf34c9887363dc
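As a rough illustration of expanding a wildcard up front, the Go sketch below filters an already enumerated list of package directories against a pattern rooted in the main module. The function expandPattern and its simplified Go-style `...` semantics are assumptions made for this example; the real cue/load code handles more cases (package qualifiers, directories without CUE files, and so on).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandPattern returns the entries of pkgDirs matched by pattern.
// pkgDirs holds module-relative directories such as "internal/ci".
// Hypothetical helper with simplified semantics:
//   - "./x"     matches only the directory x
//   - "./x/..." matches x and everything below it
//   - "./..."   matches every directory
func expandPattern(pattern string, pkgDirs []string) []string {
	rest, wildcard := strings.CutSuffix(pattern, "...")
	prefix := path.Clean(rest) // "./" becomes ".", "./x/" becomes "x"

	var out []string
	for _, d := range pkgDirs {
		switch {
		case d == prefix:
			out = append(out, d)
		case wildcard && (prefix == "." || strings.HasPrefix(d, prefix+"/")):
			out = append(out, d)
		}
	}
	return out
}

func main() {
	dirs := []string{".", "internal/ci", "internal/ci/base", "pkg/tool"}
	fmt.Println(expandPattern("./internal/...", dirs))
	fmt.Println(expandPattern("./internal/ci", dirs))
	fmt.Println(expandPattern("./...", dirs))
}
```

Because the full set of matched packages is known before resolution starts, only those packages and their imports need to be loaded, which is the property the commit message relies on.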
This reduces the number of syscalls we do when loading one or all packages by about 10% and 2% respectively, going from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    108
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5365

to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8a5bbe33b0d9656805a20a082be90faedd679122
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195412
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
Calling fs.Stat before fs.ReadDir used to be crucial to catch cases where a local package being loaded was a regular file rather than a directory, and we have had test cases for that scenario for some time.

Since the recent cue/load and modules refactors, this extra io/fs work appears to no longer be necessary, as we just capture missing files, which can be done with fs.ReadDir easily.

This reduces the number of syscalls for a single and all packages from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    96
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5256

by about 5% and 0.5% to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id706a0ef83e7bf29869c8a99faeb7ae5db1537e9
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195413
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
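The idea of relying on fs.ReadDir alone can be sketched as follows. This is an illustrative Go example, not the cue/load code: classify and sampleFS are hypothetical names, and the three result strings are chosen only for the demonstration. It shows that the error from a single fs.ReadDir call is enough to tell a missing path apart from other failures, without a preceding fs.Stat.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// classify inspects dir with one fs.ReadDir call and no prior fs.Stat.
// Hypothetical helper for illustration only.
func classify(fsys fs.FS, dir string) string {
	entries, err := fs.ReadDir(fsys, dir)
	switch {
	case errors.Is(err, fs.ErrNotExist):
		return "missing"
	case err != nil:
		// For example, the path names a regular file.
		return "not a readable directory"
	default:
		return fmt.Sprintf("directory with %d entries", len(entries))
	}
}

// sampleFS is a tiny in-memory tree used by the example.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"pkg/x.cue": {Data: []byte("package x\n")},
	}
}

func main() {
	fsys := sampleFS()
	for _, p := range []string{"pkg", "nope", "pkg/x.cue"} {
		fmt.Printf("%s: %s\n", p, classify(fsys, p))
	}
}
```

On a real file system this saves one syscall per directory compared with a stat followed by a directory read, which is consistent with the small reduction reported in the commit message.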
This avoids repeated calls to os.Stat via fileSystem.isDir. It reduces the number of syscalls from

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    90
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    5224

by about 3% and 20% respectively to

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep '/home/mvdan/src' | wc -l
    87
    $ strace -f -t -e trace=file cue fmt ./... |& grep '/home/mvdan/src' | wc -l
    4162

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I51e09a0adf233788c5c92c62c20a743b7873e73e
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195415
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Paul Jolly <paul@myitcv.io>
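One generic way to avoid repeated stat calls for the same path is to memoize the answer. The Go sketch below shows that technique only; the commit message does not say how the actual change avoids the calls, so this should not be read as its implementation. The dirCache type, its fields and sampleTree are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// dirCache remembers isDir answers so each path is stat'ed at most once.
// Hypothetical type; not the cue/load fileSystem.
type dirCache struct {
	fsys  fs.FS
	seen  map[string]bool
	stats int // number of underlying fs.Stat calls, for demonstration
}

func newDirCache(fsys fs.FS) *dirCache {
	return &dirCache{fsys: fsys, seen: make(map[string]bool)}
}

// isDir reports whether name is an existing directory.
func (c *dirCache) isDir(name string) bool {
	if v, ok := c.seen[name]; ok {
		return v // answered from the cache, no file system access
	}
	c.stats++
	info, err := fs.Stat(c.fsys, name)
	v := err == nil && info.IsDir()
	c.seen[name] = v
	return v
}

// sampleTree is a tiny in-memory tree used by the example.
func sampleTree() fs.FS {
	return fstest.MapFS{
		"pkg/x.cue": {Data: []byte("package x\n")},
	}
}

func main() {
	c := newDirCache(sampleTree())
	for i := 0; i < 3; i++ {
		fmt.Println(c.isDir("pkg"), c.isDir("pkg/x.cue"), c.isDir("missing"))
	}
	fmt.Println("stat calls:", c.stats)
}
```

A cache like this trades a little memory for fewer syscalls and assumes the tree does not change during a single command invocation.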
@Bryant-Cockcroft-IBM we are done with our first round of fixes for this performance regression. The original number of syscalls for loading a single package (and its few dependencies) in our repo ballooned by about 6x with the experiment as of May 20th, as I explained above:
With all the changes above, we are now down to a modest ~35% increase in syscalls:
This is still not perfect by any means, and we will continue to make improvements to the loading code to be less wasteful, but I think we're most of the way there in terms of fixing the performance regression. Could you please try the latest master and let me know what the performance is like for your repository?
Yes, it looks so much better! I'm now seeing a net performance increase from v0.8 too!

2.86s v0.8.0
Oh, excellent :) There were some other speed-ups for cmd/cue along the way. It looks like
What version of CUE are you using (`cue version`)?

Does this issue reproduce with the latest stable release?

No, only on pre-release. It also happens with `CUE_EXPERIMENT=modules` on earlier v0.9.0 alpha versions. It does not reproduce with the new modules disabled.

What did you do?

`cue eval --out json example.com/pkg/foo`

I have a large project where CUE is only a small component, so the CUE module root contains many directories and subdirectories unrelated to the CUE module. These directories are separate from the foo package above, so I'd expect them not to affect this evaluation.

What did you expect to see?

I expected a similar evaluation time to running

`cue eval --out json test.cue`

where test.cue just imports the exact same foo package, which takes less than 1 second in this case.

I expect the cue command line to only look at the `foo` directory and the directories leading to the module root, and nothing else, as described in the package instances documentation: https://cuelang.org/docs/concept/modules-packages-instances/#instances

What did you see instead?

The command-line version recursively walks every directory starting at the module root looking for `cue.mod`. Since my project has hundreds if not thousands of directories in total, this now takes ~14 seconds. (Note: it does find one in the actual root, but does not stop until it has searched every directory.)

I used `strace -f -t -e trace=file cue eval --out json example.com/pkg/foo` to determine this behavior.

I'm unable to share my actual data, but I can probably create a reproducer if necessary.