Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transient read failure may cause permanent failure to watch file #9072

Closed
willdean opened this issue Aug 10, 2022 · 38 comments · Fixed by #9173
Closed

Transient read failure may cause permanent failure to watch file #9072

willdean opened this issue Aug 10, 2022 · 38 comments · Fixed by #9173
Assignees

Comments

@willdean
Copy link

V3.1.18
Tailwind CLI, Visual Studio 2022 as editor.
Node v14.18.1
Chrome
Window 10 (No Docker or WSL)

This issue is a very close relative of #7759, but I think it might be quite specialised so I don't want to pollute that one.

I have a rather odd web project, which has a single HTML file. So the content section of tailwind.config.js looks like this:

  content: [
      "./wwwroot/index.html"
  ],

I run a tailwind CLI watcher with a command like this:

npx tailwindcss --output wwwroot/css/tailwind.css --watch -i css/twbase.css

I am suffering from the problem described in #7759 where the watcher stops watching after a while - anywhere between almost immediately and after lots and lots of successful rebuilds.

By using the "Process Monitor" utility to observe the filesystem activity on the index.html file, I have observed that when Visual Studio (devenv.exe) is saving index.html, it goes through a rename/replace process which means there is a window of time in which the index.html file does not actually exist. It appears that node.exe (i.e. Tailwind) sometimes tries to read the file during this window and finds it to be missing.

This apparently causes Tailwind to stop watching that file.

image

The blue selected line in the image is Node getting a file-not-found failure doing a CreateFile (it's actually an open-to-read) on index.html in the middle of devenv doing a rename dance. I think that's a smoking gun.

If I change the content file specification to be a wildcard, then my first impression is that things are much more robust:

  content: [
      "./wwwroot/*.html"
  ],

So:

  1. Is it possible/likely that TW is converting a transient read-failure on a watched file into a permanent failure to watch that file?
  2. If so, could this be fixed?

Even without the specific Visual Studio rename thing, transient read failures are probably an inescapable feature of Windows, because it's so common that people have tools like anti-virus and cloud-backup and indexers which do lots of automatic file opening.

@thecrypticace thecrypticace self-assigned this Aug 10, 2022
@thecrypticace
Copy link
Contributor

@willdean Could you provide some details on the specs of your system? Like are you running off of a hard drive instead of an SSD? I'm trying but I can't get this to fail and I'm betting it's a super weird timing thing.

@thecrypticace
Copy link
Contributor

@willdean Okay I did a bit more testing. I can reproduce something similar-ish to this when using Visual Studio. But other processes writing files in a similar manner do not cause issues. I'm not sure why this is but it seems specific to Visual Studio and how it's writing files. Doing a write + atomic rename does not appear to cause problems generally.

@willdean
Copy link
Author

Thanks for looking at this - I'm on a high-end SSD dev machine, but that doesn't mean that there aren't weird timing problems.

My own naïve approach to this would be to massively boost-up the logging in Tailwind, so that we could actually see A. what change events it's getting (that might mess-up the timing) and B. see when it suffers failures to open files, because it seems possible to me that one of those fs.readFileSync calls in, say, getChangedContent is failing in this situation, but I'm unclear where the exception from that ends-up - I'm not really clear what the call stack into those watcher event handlers looks like.

Like I said elsewhere, I know zip about Node, and I'm not even clear how I would modify the source (which I have) and run a modified version of the cli, but I'll have a play.

Is there some simple node ./src/cli.js sort of thing I could be running, or is there a build process I'd need to go through?

@thecrypticace
Copy link
Contributor

yeah you have to run a build process generally but, if you want, you can directly edit node_modules/tailwindcss/lib/cli.js — though it may be a bit weird to do so but yeah that's exactly what I'm gonna do. I suspect that we're probably not getting notified at all after a point but I'm not certain yet. And thanks for the details.

@willdean
Copy link
Author

willdean commented Aug 11, 2022

I've just had a bit of a play with Chokidar by itself, to see what events it's seeing when VS edits that same index.html file.

I ran a minimal example of this code:

const chokidar = require('chokidar');
chokidar.watch('n:/xxx/wwwroot/index.html').on('all', (event, path) => {
  console.log(event, path);
});

This reports a quite interesting sequence of events:
Upon starting the Chokidar test app:

add n:\xxx\wwwroot\index.html

The first time I save the file in VS:

unlink n:\xxx\wwwroot\index.html

The second time I save the file in VS:

change n:\xxx\wwwroot\index.html
add n:\xxx\wwwroot\index.html

All subsequence saves in VS:

change n:\xxx\wwwroot\index.html

Next I will try reading the file in the event handler, to see what happens.

@thecrypticace
Copy link
Contributor

thecrypticace commented Aug 11, 2022

Yeah I just poked around and chokidar is not firing change events when saved by Visual Studio. It is however firing a "raw" event with details which I can see:

IMG_3395

It should be firing a change event but I'm not sure why it's not. I'd imagine this is a chokidar bug but maybe we can do something to work around it. I'll keep poking around.

@willdean
Copy link
Author

Looking through the issues on Chokidar's GH, it may just not work very well on Windows - apparently VS Code tried to use it and gave up for Windows and did their own thing.

@thecrypticace
Copy link
Contributor

So it appears that this fixes it for the CLI:

watcher.on("unlink", (file) => watcher.add(file))

The rename operation firing an unlink causes it to be removed from the list of watched files.

@thecrypticace
Copy link
Contributor

the problem is that, for renames, it's supposed to fire an "add" event internally which it is not doing for some reason. Chokidar uses this unlink / add dance with a timer to detect renames and it doesn't appear to work here for some reason.

@willdean
Copy link
Author

Unfortunately your fix doesn't work for me - I added it (with some logging), and I get an unlink event for every save, but nothing else.

I have some other lines of enquiry, because my little tester is showing a different series of events to those that I get in TW when I remove all of TW's event handlers and just log Chokidar events. This may be something about Chokidar configuration.

@thecrypticace
Copy link
Contributor

What version of chokidar are you testing?

@willdean
Copy link
Author

3.5.3. I think that's the same as TW's using.
I think the differences might be to do with the write-event-coalescence stuff, but I'm still seeing my simple three line test keep working after unlink events without needing to re-add the file.
The last difference is the exact list of files passed in, so I'll look at that.

@thecrypticace
Copy link
Contributor

@willdean Can you clone this repo: https://github.com/thecrypticace/tw-cli-read-failure-repo

  1. Run node .\watch.mjs
  2. Open the folder and repeatedly save src/index.html in Visual Studio

And paste the output here? I see a few changes and unlinks for the first couple of saves (depends on how much and how quickly you save) and then they stop being delivered. I also left the "fix" that I mentioned commented. Could you run it with it not commented too and post those as well? Really appreciate your time here — I know this is a lot :)

This is what I see for both of these cases:

Windows.PowerShell.2022-08-11.18-12-30.mp4

@willdean
Copy link
Author

@thecrypticace

The blank lines represent each separate ctrl-s in VS.
This is pretty much what I see running a similar trial with logging in TW: the raw change events are sometimes translated into actual change events, but frequently not.

I spent a lot of time in the debugger last night trying to work out where and why the conversion of raw into "cooked" events is supressed, but it got very late and I gave up. I might have another go at this.

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }
Event [change] { file: 'src\\index.html' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }
Event [unlink] { file: 'src\\index.html' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }
Event [change] { file: 'src\\index.html' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }
Event [change] { file: 'src\\index.html' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }

Event [raw] { event: 'rename' }
Event [raw] { event: 'rename' }
Event [raw] { event: 'change' }
Event [change] { file: 'src\\index.html' }

Really appreciate your time here — I know this is a lot :)

Not at all - I've maintained OSS too, so I know how much one needs the people with the unusual repro scenarios to stick around for the debugging!

I don't see this problem nearly so much on a much larger project I also use TW on, but after a while you get all the selectors you're likely to use already compiled-in, so flaky rebuilds are less noticeable - my guess would be that this issue might be quite prevalent, at least on Windows.

@thecrypticace
Copy link
Contributor

Wow that's wild :/ — mine consistently fails after the first two or three and never comes back no matter what I do (unless I re-add the file). The other thing that's super weird is my raw event is always 100% rename and never change like from your list when using visual studio (not VS code — I get change when saving files in node I think). That could very well be part of the issue.

Unfortunately my windows rig is water cooled and the pump just stopped working last night so it's out of commission for like a week until my new one comes in (though I might try to take it apart and clean it if I can but that will not be today). But I'll keep investigating as much as I can. And I might throw the unlink -> re-add fix in there (not quite as simple as always doing that I don't think but I'll have to see). Who knows maybe it'll help even if it's not 100% reliable.

@willdean
Copy link
Author

It's bizarrely difficult to work out what's going on. I would have said the "awaitWrite" stuff might be a cause if the inconsistency, but removing it from the config makes no difference. Changing from "index.html" to "*.html" in the search patterns does make a big difference.

Nothing about the Chokidar project gives me much encouragement that a fix will be imminent, so I think trying to fix (work around) it from this side is worthwhile.

@willdean
Copy link
Author

I think it's a race in _throttle in chokadar's index.js. I've just put a bunch of console logging right through that function and it's now raising change events every time.

We can have a race to understand it....

@willdean
Copy link
Author

OK, last entry in the monologue.

The problem (or at least, one of them) is in nodefs-handler.js, in _handleFile.

The anon function assigned to listener contains a try/catch block which is catching an exception thrown from the await stat(file) call if that happens to occur while the file is missing. This seems to cause problems.

I changed the function to do multiple retries, with a 10ms delay between them, and that's a considerable improvement - adding logging shows that there's always a retry, but other experimentation shows that this is extremely sensitive to timing, so other people may not be seeing this in the same way.

Here's my modified code - I have no idea what the logistics of (or wisdom of) trying to incorporate this into TW would be.

const maxAttempts = 5;
      for(let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          const newStats = await stat(file);
          if (this.fsw.closed) return;
          // Check that change event was not fired because of changed only accessTime.
          const at = newStats.atimeMs;
          const mt = newStats.mtimeMs;
          if (!at || at <= mt || mt !== prevStats.mtimeMs) {
            this.fsw._emit(EV_CHANGE, file, newStats);
          }
          if (isLinux && prevStats.ino !== newStats.ino) {
            this.fsw._closeFile(path)
            prevStats = newStats;
            this.fsw._addPathCloser(path, this._watchWithNodeFs(file, listener));
          } else {
            prevStats = newStats;
          }
          break;
        } catch (error) {
          if(attempt >= maxAttempts-1) {
            // Fix issues where mtime is null but file is still present
            this.fsw._remove(dirname, basename);
          }
          await new Promise(resolve => setTimeout(resolve, 10));
        }
      }

@thecrypticace
Copy link
Contributor

@willdean hey really, really appreciate your help here. I did a bit of work that uses my unlink / re-add solution in addition to raw event rename detection which should hopefully work in your case. As far as I can tell it does for me in Visual Studio. Could you give the insiders build a kick to see if it works for you reliably? npm install tailwindcss@insiders

@willdean
Copy link
Author

@thecrypticace I've just tried the insiders build (as a local install, not a -g which is how I normally have TW). And it gives me this exception, which I suspect is unrelated to this work?

Rebuilding...
TypeError: Cannot read property 'split' of undefined
    at getClassCandidates (\EmbeddedWebApp\node_modules\tailwindcss\lib\lib\expandTailwindAtRules.js:86:30)
    at \EmbeddedWebApp\node_modules\tailwindcss\lib\lib\expandTailwindAtRules.js:184:13
    at \EmbeddedWebApp\node_modules\tailwindcss\lib\processTailwindFeatures.js:51:53
    at Object.Once (\EmbeddedWebApp\node_modules\tailwindcss\lib\cli.js:682:27)
    at LazyResult.runOnRoot (\EmbeddedWebApp\node_modules\postcss\lib\lazy-result.js:337:23)
    at LazyResult.runAsync (\EmbeddedWebApp\node_modules\postcss\lib\lazy-result.js:393:26)
    at async rebuild (\EmbeddedWebApp\node_modules\tailwindcss\lib\cli.js:739:26)
    at async \EmbeddedWebApp\node_modules\tailwindcss\lib\cli.js:845:17

(I've trimmed-out some confidential stuff from the file paths.)

@thecrypticace
Copy link
Contributor

Oh interesting. I have seen that before but i have to spam rebuilds for a while before it happens. I'll see if I can figure out what causes that. I figured it was unrelated. Does it happen on the first rebuild for you?

@willdean
Copy link
Author

Yes - the initial build was without problem, then this exception happens on each save of the html file.

@thecrypticace
Copy link
Contributor

I think I know what the problem is. Working on a solution

@thecrypticace
Copy link
Contributor

thecrypticace commented Aug 26, 2022

@willdean Heh apparently visual studio's atomic saves can even break Visual Studio if you spam them enough. It'll get confused and eventually open the save as dialog. Anyway, I pushed what is hopefully a fix. This is actually another bug in chokidar itself that I had copied the original code from and it seemed to work but I think I must've done something wrong when testing. Hopefully it's good now. Can you update your build and test it again?

@willdean
Copy link
Author

Hi @thecrypticace - Looks like that's working, many thanks.

In testing this I have noticed an unrelated 'feature' where TW does not remove selectors that are no longer needed until the CLI is restarted (i.e. regenerations of the file are not as 'clean' as the first generation). Is this intended? If not I'll file a separate issue with explicit repro steps.

@thecrypticace
Copy link
Contributor

That is indeed intended. It's a performance optimization so we don't have to rescan everything over and over.

@willdean
Copy link
Author

OK, that makes sense. I'd guess that cache is also the basis for the bug I see where we end up with lots of duplicates in the Tailwind output until we restart - I've no idea how to reliably reproduce that though, so I've only mentioned in passing on #7449.

@willdean
Copy link
Author

willdean commented Aug 29, 2022

@thecrypticace Unfortunately while running with the insiders build and Jetbrains Rider, I've noticed an intermittent failure which gives the following message:

(node:59920) UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open '[Redacted Filename].razor~'
(Use `node --trace-warnings ...` to show where the warning was created)
(node:59920) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)

I think the file with the ~ on the end of it is Rider doing some kind of safe-save, and we're into the same sort of problem where files appear briefly and then disappear again. It feels to me that when this crash doesn't occur, we instead get a double-rebuild by TW (which is benign, of course).

Edit: Here's the same thing with '--trace-warnings` turned on:

(node:20444) Error: ENOENT: no such file or directory, open '[RedactedFilename].razor~'
(node:20444) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
    at emitDeprecationWarning (internal/process/promises.js:180:11)
    at processPromiseRejections (internal/process/promises.js:249:13)
    at processTicksAndRejections (internal/process/task_queues.js:96:32)

@thecrypticace
Copy link
Contributor

Hmm interesting. Yeah some apps do an occasional backup of sorts and append ~ to the beginning or end of the file name. Any details you could provide there would be helpful. I imagine there's another race condition somewhere.

@willdean
Copy link
Author

I am still trying to make sense of this, but one way it fails is in the new readFileWithRetries, where it's trying to read the file with the ~ on the end. This file is very short-lived, so this read inevitably fails, and no amount of retrying is going to make it work because it's gone.

What I don't understand is why anything is watching this file or trying to process change events for it, given that my content spec is:

        "./Views/**/*.cshtml",
        "./Views/**/*.razor",
        "./Pages/**/*.cshtml",
        "./Components/**/*.razor"

None of which should match a filename which ends .razor~

Globbing problem?

@thecrypticace
Copy link
Contributor

Oh… that's a really good point. We shouldn't be getting an event for that file at all. But if chokidar delivers one that's a problem. I have a feeling I know what this is related to. When files are removed we add them back to the watcher. This makes complete sense when watching individual files but should be unnecessary when watching directories.

So we likely need to compare the file we got an event for with our globs and skip it if it's a glob instead of an individual file path.

@thecrypticace
Copy link
Contributor

Gonna re-open this so I don't forget to add that.

@thecrypticace thecrypticace reopened this Aug 29, 2022
@willdean
Copy link
Author

willdean commented Aug 29, 2022

OK - so the re-adding has added something which never matched the filter in the first place? Not quite sure why we got on event on it if that's the case, but the rest of it fits. There's certainly a world of hurt if we start watching stuff we're not supposed to.

This is a show-stopper as it is because I can't work with Rider at all (and I suspect this will break all the umpteen other Jetbrains IDEs (Jira, Webstorm, etc) because they probably all use the same save process.

Edit: I guess this might be one of the reasons Chokidar warns against the RAW event: that it might return things which don't actually pass the filename filtering.

@thecrypticace
Copy link
Contributor

Yep, we'll definitely need to do glob-matching & filtering. In this case it's not the raw event though. I think it's just watch directory + unlink that's the issue. I don't believe that the raw event is the culprit here — it just happens to trigger the error. I could be wrong tho — will have to check. One of the main problems is that chokidar, when watching individual files, stops watching after an individual file is removed. This isn't a problem when watching directories.

@willdean
Copy link
Author

It's the raw "rename" event I'm seeing the .razor~ filename come through. I don't think Chokidar does glob filtering on raw events.

As a quick fix I returned out of the raw rename handler if the filename ended with a ~ and that "fixed" the problem here. I am aware that's not a general solution!

@thecrypticace
Copy link
Contributor

Yeah, you're totally right. The unlink event won't fire for files that don't match the glob. However, unconditionally adding files back from an unlink is unnecessary so I've optimized that by only doing it for "static" globs. And I've also filtered out the rename events that don't match the defined content paths.

@willdean can you give the insiders build a check (it'll take a few minutes to publish)? Hopefully this will work… 🤞

@willdean
Copy link
Author

LGTM (so far!)

@thecrypticace
Copy link
Contributor

O_O YES 🎉 Gonna re-close this but let me know if you run into anything else. Seriously, thank you for all of your help. It's very appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants