Skip to content

Commit

Permalink
internal/mod/modimports: avoid extra fs.Stat calls in AllModuleFiles
Browse files Browse the repository at this point in the history
As reported by a user on a large repository with many hundreds
of directories, CUE_EXPERIMENT=modules now being the default
caused a sudden increase in the number of file operations being done
by cmd/cue when loading packages, causing a slow-down.

We can easily reproduce the issue via strace in our CUE repository,
using `cue fmt ./internal/ci` as an example to load one CUE package.
Where we used to only stat or open 8 cue.mod files, we now do 360:

    $ CUE_EXPERIMENT=modules=0 strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8
    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    360

The culprit turned out to be AllModuleFiles; the way it needs to skip
walking nested CUE modules did not fit well with the fs.WalkDir API.
Using fs.ReadDir and recursive func calls instead works better:

    $ strace -f -t -e trace=file cue fmt ./internal/ci |& grep 'cue\.mod"' | wc -l
    8

Updates #3155.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I32dc0c39ea795f795077feff88475d38cb324433
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1194921
Reviewed-by: Paul Jolly <paul@myitcv.io>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
  • Loading branch information
mvdan committed May 18, 2024
1 parent 51cd5ca commit 97bf1a6
Showing 1 changed file with 39 additions and 38 deletions.
77 changes: 39 additions & 38 deletions internal/mod/modimports/modimports.go
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
package modimports

import (
"errors"
"fmt"
"io/fs"
"path"
Expand Down Expand Up @@ -88,47 +87,49 @@ func PackageFiles(fsys fs.FS, dir string, pkgQualifier string) func(func(ModuleF
// module at the given root.
func AllModuleFiles(fsys fs.FS, root string) func(func(ModuleFile, error) bool) {
return func(yield func(ModuleFile, error) bool) {
fs.WalkDir(fsys, root, func(fpath string, d fs.DirEntry, err error) (_err error) {
if err != nil {
if !yield(ModuleFile{
FilePath: fpath,
}, err) {
return fs.SkipAll
}
return nil
}
if path.Base(fpath) == "cue.mod" {
return fs.SkipDir
yieldAllModFiles(fsys, root, true, yield)
}
}

// yieldAllModFiles implements AllModuleFiles by recursing into directories.
//
// Note that we avoid [fs.WalkDir]; it yields directory entries in lexical order,
// so we would walk `foo/bar.cue` before walking `foo/cue.mod/` and realizing
// that `foo/` is a nested module that we should be ignoring entirely.
// That could be avoided via extra `fs.Stat` calls, but those are extra fs calls.
// Using [fs.ReadDir] avoids this issue entirely, as we can loop twice.
func yieldAllModFiles(fsys fs.FS, fpath string, topDir bool, yield func(ModuleFile, error) bool) bool {
entries, err := fs.ReadDir(fsys, fpath)
if err != nil {
if !yield(ModuleFile{
FilePath: fpath,
}, err) {
return false
}
}
// Skip nested submodules entirely.
if !topDir {
for _, entry := range entries {
if entry.Name() == "cue.mod" {
return true
}
if d.IsDir() {
if fpath == root {
return nil
}
base := path.Base(fpath)
if strings.HasPrefix(base, ".") || strings.HasPrefix(base, "_") {
return fs.SkipDir
}
_, err := fs.Stat(fsys, path.Join(fpath, "cue.mod"))
if err == nil {
// TODO is it enough to have a cue.mod directory
// or should we look for cue.mod/module.cue too?
return fs.SkipDir
}
if !errors.Is(err, fs.ErrNotExist) {
// We haven't got a package file to produce with the
// error here. Should we just ignore the error or produce
// a ModuleFile with an empty path?
yield(ModuleFile{}, err)
return fs.SkipAll
}
return nil
}
}
for _, entry := range entries {
name := entry.Name()
fpath := path.Join(fpath, name)
if entry.IsDir() {
if name == "cue.mod" || strings.HasPrefix(name, ".") || strings.HasPrefix(name, "_") {
continue
}
if !yieldPackageFile(fsys, fpath, "*", yield) {
return fs.SkipAll
if !yieldAllModFiles(fsys, fpath, false, yield) {
return false
}
return nil
})
} else if !yieldPackageFile(fsys, fpath, "*", yield) {
return false
}
}
return true
}

func yieldPackageFile(fsys fs.FS, fpath, pkgQualifier string, yield func(ModuleFile, error) bool) bool {
Expand Down

0 comments on commit 97bf1a6

Please sign in to comment.