Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File filtering improvements #332

Merged
merged 1 commit into from
Sep 20, 2023
Merged

File filtering improvements #332

merged 1 commit into from
Sep 20, 2023

Conversation

anderseknert
Copy link
Member

  • Ignore .git and .idea directories
  • Use more efficient algorithm for filtering than the one provided by OPA

Fixes #330

@anderseknert
Copy link
Member Author

Heh, pretty much all of my findings from this experiment summarized here: https://benhoyt.com/writings/go-readdir/
It would have saved me some time finding that before I did this, but good to get some validation nonetheless.

Copy link
Member

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

.idea makes we wonder if we should read .gitignore, or introduce .regalignore.

@anderseknert
Copy link
Member Author

Insipired by that blog, I rewrote the implementation using filepath.WalkDir. It performs about the same, but it feels more idiomatic than that recursive algo I took from OPA. It moves a little more responsibility over to the filter function, but the upside is that it also provides a whole lot more control 😃 The pattern they use for the callback function here, which now returns an error rather than a bool is quite nice — by returning a filepath.SkipDir "error", the user can stop the walk from traversing a directory, and return filepath.SkipAll to stop the traversal entirely.

@srenatus please take another look and let me know what you think :)

Using .gitignore is a good idea! I'll create an issue for that 👍

// to work with FileInfo objects, but will rather work directly with the files as
// string paths. Finally, this function differs from OPA in that we never will
// traverse down into .git or .idea directories, as that's always a waste of time.
func walkPaths(paths []string, filter DirEntryFilter) []error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] It doesn't matter, but golang can join errors now: https://pkg.go.dev/errors#Join

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's interesting! Perhaps we could skip that whole errorsToString function entirely then? I'll play around with it.

Comment on lines 160 to 157
buf := make([]string, len(e))
for i := range buf {
buf[i] = e[i].Error()
}

return fmt.Sprintf("%v errors occurred during loading:\n", len(e)) + strings.Join(buf, "\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Inconsequential, but I've played with it so I might as well post the comment 🙃

Suggested change
buf := make([]string, len(e))
for i := range buf {
buf[i] = e[i].Error()
}
return fmt.Sprintf("%v errors occurred during loading:\n", len(e)) + strings.Join(buf, "\n")
buf := make([]string, len(e)+1)
buf[0] = fmt.Sprintf("%v errors occurred during loading:", len(e))
for i := range buf {
buf[i+1] = e[i].Error()
}
return strings.Join(buf, "\n")

)

func FilterIgnoredPaths(paths, ignore []string, checkFileExists bool) ([]string, error) {
policyPaths := paths
type DirEntryFilter = func(path string, info os.DirEntry, err error) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] DirEntryFilter doesn't need to be exported. I wonder if it even warrants its own type, but it won't hurt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not at this point. I guess we really should expose something like lint.WithFilter(...).Lint(ctx) or else users of the Go API wouldn't be able to tell us what they consider to be Rego files. Not important right now, so I'll un-export this with a note.

* Ignore .git and .idea directories
* Use more efficient algorithm for filtering than
  the one provided by OPA

Fixes #330

Signed-off-by: Anders Eknert <anders@styra.com>
return filterPaths(paths, ignore)
}

func walkPaths(paths []string, filter func(path string, info os.DirEntry, err error) error) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2023-09-20 at 09 42 21

LOL, errlang strikes again! But you're right, that type was probably superfluous.

@anderseknert anderseknert merged commit 82ebbe6 into main Sep 20, 2023
1 check passed
@anderseknert anderseknert deleted the efficient-filter branch September 20, 2023 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Long time spent filtering .git when linting a project root directory
2 participants