Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchexec takes 8s to start, does 2M syscalls across 148 threads #377

Open
kentonv opened this issue Aug 9, 2022 · 15 comments
Open

watchexec takes 8s to start, does 2M syscalls across 148 threads #377

kentonv opened this issue Aug 9, 2022 · 15 comments
Labels
bug Something's not right!

Comments

@kentonv
Copy link

kentonv commented Aug 9, 2022

I'm using watchexec 1.20.5 on Linux. It works great! Except for one thing...

I have a directory structure like this:

src/   # my code
deps/  # literally like 200,000 files

When I run watchexec I do it like this:

watchexec --watch src make

watchexec spends about 8 seconds scanning through the entirety of deps. The -vvvv log, which is 528 megabytes long, suggests it is looking for "ignore files". The various --no-*-ignore options don't seem to change anything.

During this time, according to strace, watchexec apparently creates 148 threads which perform 2,458,067 system calls, of which 2,337,801 are futex operations. It also stat()s 44,270 files and opens 14,634 of them. After all that, it sets up 26 inotify watches.

Could watchexec be made to scan only the directory I asked it to watch?

@kentonv kentonv added the bug Something's not right! label Aug 9, 2022
@passcod
Copy link
Member

passcod commented Aug 10, 2022

That's Known Issue Number 2! You can work around for now with --project-origin src, or by starting watchexec in src and setting the cwd of the command with --workdir ...

@kentonv
Copy link
Author

kentonv commented Aug 10, 2022

Oops, sorry I missed that! Thanks for the tip, --project-origin does seem to solve the problem.

The excessive context switching via futex operations might be worth looking into as an optimization opportunity -- it seems to me that even without behavioral changes, some time could probably be saved there. But feel free to close this if you prefer.

@passcod
Copy link
Member

passcod commented Aug 10, 2022

That's true, yes, we could likely get much better performance by optimising where file I/O happens. I'll have a specific look at this.

@AaronFriel
Copy link

AaronFriel commented Sep 5, 2022

@passcod is .gitignore parsing currently a depth-first search, when it should be breadth-first starting at the root for project origin detection, and then recursively applying ignore files in descent?

Ignore files only apply to their subtrees, yet after installing an NPM package, I see traces like the following:

  2022-09-05T20:14:42.704608Z TRACE watchexec::ignore::files: found a file, path: "/home/friel/c/github.com/pulumi/pulumi/.gitignore"
  2022-09-05T20:14:42.704642Z TRACE watchexec::ignore::filter: reading ignore file, file: IgnoreFile { path: "/home/friel/c/github.com/pulumi/pulumi/.gitignore", applies_in: Some("/home/friel/c/github.com/pulumi/pulumi"), applies_to: Some(Git) }
  2022-09-05T20:14:42.704761Z TRACE watchexec::ignore::filter: adding ignore line, line: "**/node_modules/"
...
    in watchexec::ignore::files::visit_path with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__"
    in watchexec::ignore::filter::check_dir with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__/data"
    in watchexec::ignore::files::dir_entry with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__/data"
    in watchexec::ignore::files::visit_path with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__"

If ignore files follow the man gitignore spec, then there's no point in crawling any folder that matches a previously-discovered filter.

An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.

This seems like it would be possible for me to make a contribution here. The product I work on, Pulumi, recently shipped a a change to embed watchexec to support pulumi watch functionality. See: https://github.com/pulumi/watchutil-rs

@passcod
Copy link
Member

passcod commented Sep 5, 2022

Yes, that's absolutely the case (well, kind of, it's complicated, there's actually two layers of figuring out ignore files and that's where the issues arise; happy to chat if you want to know more). I'd love a contribution! The work would be in the ignore-files crate and filterer, which get used in the relevant places in watchexec.

@matu3ba
Copy link

matu3ba commented Sep 14, 2022

I have a similar issue with fd -tf . 'src/' | wc -l returning 75.
However, jabbascript (angular in my case) has a giantic filesize.

I did use watchexec -w src/ 'ng build && stuff_to_install'.

@Shados
Copy link

Shados commented Nov 21, 2022

Aside from any other possible optimisations, shouldn't it be fairly trivial to make watchexec skip looking for ignore files if all of --no-default-ignore --no-global-ignore --no-project-ignore --no-vcs-ignore are set?

@passcod
Copy link
Member

passcod commented Nov 21, 2022

Yes. That would be a conditional around here, if you want to PR this: https://github.com/watchexec/watchexec/blob/main/crates/cli/src/filterer/globset.rs#L23

@pesterhazy
Copy link

I also ran into this (very significant) problem

watchexec takes 8s to start, but when using --project-origin $PWD, it's down to 1s

@infogulch
Copy link

It seems "scanning and watching the entire filesystem for files that could slightly modify the behavior of the program" is more anti-feature than feature for a small utility like this.

@passcod passcod added this to the Ignore handling issues milestone Aug 3, 2023
@passcod
Copy link
Member

passcod commented Aug 30, 2023

While not fully fixed, 1.23 is more clever around this, and you can use --no-discover-ignore to disable looking for ignore files altogether.

@pesterhazy
Copy link

While a more thorough fix would be great, --no-discover-ignore works well for us in the meantime as a workaround

@t3hmrman
Copy link
Contributor

t3hmrman commented Jan 9, 2024

Hey @kentonv a fix for this issue made it into watchexec v1.25.0+, would you mind trying the newer version and see if your problem is fixed?

@passcod
Copy link
Member

passcod commented Jan 9, 2024

I've measured a bit myself and it's not really fixed, though your improvements did help.

However, one larger issue than the ignores, which I hadn't twigged on until I started measuring in earnest, is the notify watches, which are recursed by notify and not by us.

For example looking at watchexec with RUST_LOG=notify=trace on Linux shows notify adding watches for the entire .git folder recursively, among other things; fixing that involves writing the watch recursion into watchexec and using non-recursive watches at the notify boundary; I've written up surface notes on the complexities therein: https://cohost.org/watchexec/post/4070897-early-feature-concep

It turns out that this particular issue here is an onion of causes adding up to a large negative effect, rather than a single buggy layer to be fixed!

@t3hmrman
Copy link
Contributor

t3hmrman commented Jan 9, 2024

Ahh thanks for checking this -- sad that this issue isn't fixed completely, but thanks for linking to the explanation/your thoughts on it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something's not right!
Projects
None yet
Development

No branches or pull requests

8 participants