ANCM: Randomly fails to load with White ANCM Error Page due to Directory Iteration on Startup #55216

Closed
1 task done
RickStrahl opened this issue Apr 19, 2024 · 9 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-iis Includes: IIS, ANCM

Comments

@RickStrahl

RickStrahl commented Apr 19, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I'm running into an intermittent issue with an ASP.NET Core 8.0.4 app published on IIS. The app also uses Shadow Copy deployment.

What I see:

  • Publish Application using dotnet-publish with WebDeploy options
  • App publishes
  • Usually it works right after publish
  • Some time later the app restarts and fails to reload, showing the white ANCM error page
  • I reset the app pool
  • App usually starts back up, but sometimes it continues to fail

The error I get looks like this:

[image: screenshot of the ANCM error page]

I have many errors in the Windows error logs, with different files pointing at the problem, including runtime folders that don't exist anywhere on the dev install that is publishing the app:

[image: screenshot of the Windows error log entries]

In neither case does the folder exist. It also doesn't exist on dev (it used to), and I searched the code: there's no reference anywhere to this folder, so there should be no reason for the app to be looking there in the first place. The odd thing is that it works initially but then fails at a later restart. It's not consistent - I've tried manually restarting the app pool and doing an IIS reset, and in most cases the app works fine. But at a later time the app - without any new updates - will then fail. IOW - it works at first, and then all of a sudden fails.

As mentioned, the app uses Shadow Copy to ensure that deployment doesn't fail due to locked files. To be sure, I cleared out all Shadow Copy folders (so there's nothing left over), but that didn't seem to help.

This problem started recently - this same app (and also another app) have been running without issues for 2 years and now 3 of my applications are starting to fail with these errors. Initially I was on 8.0.2 (which was out of sync with dev) and I updated to latest (8.0.4) on the server to match dev and republished. At first this seemed to fix it, but the problem has now returned.

Expected Behavior

A site that doesn't crash when the app pool recycles 😄

Steps To Reproduce

There's no direct repro scenario. See notes above.

Exceptions (if any)

No response

.NET Version

8.0.4

Anything else?

All runtimes are up to date with 8.0.4 on both dev and live.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Apr 19, 2024
@gfoidl gfoidl added the feature-iis Includes: IIS, ANCM label Apr 19, 2024
@BrennanConroy
Member

Looks similar to #48233
Try the workaround of setting cleanShadowCopyDirectory to true.
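
For reference, a minimal web.config sketch of that workaround. The setting names (`enableShadowCopy`, `shadowCopyDirectory`, `cleanShadowCopyDirectory`) are the documented ANCM handler settings for .NET 7 and later; the `processPath`, `arguments` and directory values are placeholders and are not taken from this issue:

```xml
<!-- Sketch only: MyApp.dll and the directory value are hypothetical placeholders -->
<aspNetCore processPath="dotnet" arguments=".\MyApp.dll" hostingModel="inprocess">
  <handlerSettings>
    <!-- Turn on shadow copying of the app's binaries -->
    <handlerSetting name="enableShadowCopy" value="true" />
    <!-- Folder the binaries are copied into before the app is loaded -->
    <handlerSetting name="shadowCopyDirectory" value="../ShadowCopyDirectory/" />
    <!-- Workaround: empty the shadow copy folder on every app start -->
    <handlerSetting name="cleanShadowCopyDirectory" value="true" />
  </handlerSettings>
</aspNetCore>
```

With this switch on, the shadow copy folder is cleaned whenever the app loads, so the binaries are copied again on each start.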

@RickStrahl
Author

RickStrahl commented Apr 20, 2024

Thanks @b-long - I threw that in and so far it appears to be working, but I haven't had enough time to verify.

But... I'm wondering why this is happening in the way it does:

  • I publish my app
  • It works
  • App restarts a few times
  • It works
  • App restarts again
  • It stops working

IOW - without any changes in the app - it stops working from a base that was working before. The question I have is why would this break any time after initial publish?

This behavior is new. The same apps (there are 3 that started failing this way) have been running with ShadowCopy for - well, as long as ShadowCopy has been around in preview state (.NET 6.0) - and never had these issues. Oddly, though, this issue started with 8.0.1 installed on the server for me (recently). I then updated the server runtime to 8.0.4 thinking it was related to the version mismatch between dev and live, but it made no difference. Based on that, the ANCM has not changed on the server (unless Windows Update updates just that but not the runtime?).

I hope cleanShadowCopyDirectory fixes this because these random failures are highly disruptive because they don't recover.

Also, if I might make a suggestion: when the ANCM fails while Shadow Copy is enabled, it seems like it would be a good idea to automatically clean the folder and try again.

This would avoid the overhead of always re-copying the bin folder even when nothing has changed, which makes startup slightly slower.

@RickStrahl
Author

RickStrahl commented Apr 20, 2024

So after running with the cleanShadowCopyDirectory switch for a bit, I've now started to see my apps not restarting, but in a different way - I end up with HTTP 500 errors. It happens on all 3 apps. The apps run fine after the initial deploy, but when I force a recycle or touch web.config they won't start back up.

The errors in the event log look a little different for these:

[image: screenshot of the event log errors]

If I remove the cleanShadowCopyDirectory flag then I get back to the ANCM error page (with the directory traversal failure).

Only way to get the app to run then is to remove ShadowCopy entirely.

@BrennanConroy
Member

Is it possible to try the 9.0 preview version of ASP.NET Core? We fixed the bug in the issue I linked above which was causing weird directory walking, so hopefully it will just fix your issue, but if it still persists then hopefully it'll show a different error.

Another potential with the latest log you showed is #50531. You mentioned 3 apps, are they on the same machine? Do you have multiple apps sharing the same shadow directory?
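
One way to answer that question across a server is to compare the `shadowCopyDirectory` value in each app's web.config. The Python sketch below is only an illustration: the function names are made up for this example, and it assumes that a relative `shadowCopyDirectory` is resolved against the app's own folder.

```python
import ntpath
import xml.etree.ElementTree as ET
from collections import defaultdict


def shadow_copy_dir(app_root, web_config_xml):
    """Return the normalized shadowCopyDirectory for one app, or None if unset."""
    root = ET.fromstring(web_config_xml)
    for setting in root.iter("handlerSetting"):
        if setting.get("name", "").lower() == "shadowcopydirectory":
            value = setting.get("value", "")
            if not value:
                return None
            # Assumption: relative values are resolved against the app folder.
            full = ntpath.normpath(ntpath.join(app_root, value))
            # Windows paths are case-insensitive, so compare lower-cased.
            return ntpath.normcase(full)
    return None


def find_shared_shadow_dirs(apps):
    """apps maps app folder -> web.config text; returns dirs used by 2+ apps."""
    by_dir = defaultdict(list)
    for app_root, xml_text in apps.items():
        directory = shadow_copy_dir(app_root, xml_text)
        if directory is not None:
            by_dir[directory].append(app_root)
    return {d: sorted(roots) for d, roots in by_dir.items() if len(roots) > 1}
```

Any entry in the returned dictionary is a shadow copy folder that more than one app points at, which is the setup described in #50531.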

@RickStrahl
Author

I'd have to install 9.0 for production and rebuild for 9.0 - let me see what I can do.

Isn't this an issue with the IIS module rather than the runtime? Is there any way I can update just the ANCM? That would be much more palatable to replace in production.

But... this seems like a major regression and bug for the current release and should probably be fixed for this release. At this point I have to turn Shadow Copy off because I just could not get it to work. It's frustrating because I can turn it on and it works briefly before it starts failing a little later on - even though I've tried to manually cycle the app pool, restart IIS, etc. and it works fine. It always fails only hours later.

@RickStrahl
Author

RickStrahl commented Apr 22, 2024

> Another potential with the latest log you showed is #50531. You mentioned 3 apps, are they on the same machine? Do you have multiple apps sharing the same shadow directory?

Yes, I was using a single folder for a number of apps.

I set this up by using the following scheme now:

../ShadowCopyDirectories/WebStore
../ShadowCopyDirectories/LicensingService
../ShadowCopyDirectories/WebSurgeServer

Initial tests seem to confirm this works. I have to give this a day or two before I can be sure, although I did try a number of AppPool recycles, restarts and republishes without any issues so far. We'll see.
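
As a sketch, the web.config for one of these apps could then look like the fragment below. The directory value follows the scheme above; `processPath` and `arguments` are placeholders, and the other two apps would differ only in the folder name:

```xml
<!-- Sketch only: WebStore.dll is a hypothetical assembly name -->
<aspNetCore processPath="dotnet" arguments=".\WebStore.dll" hostingModel="inprocess">
  <handlerSettings>
    <handlerSetting name="enableShadowCopy" value="true" />
    <!-- One shadow copy folder per app, never shared between apps -->
    <handlerSetting name="shadowCopyDirectory" value="../ShadowCopyDirectories/WebStore" />
  </handlerSettings>
</aspNetCore>
```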

FWIW, I see the behavior that is described in #50531, where all directories are getting nuked when the AppPool is nuked and cleanShadowCopyDirectory is set. If that is the case, it seems the logic needs to ensure that only folders for the same application are nuked.

I'm hoping that cleanShadowCopyDirectory is not actually required, since things had been working without it for a long time (since the .NET 6.0 betas) prior to the recent incidents. Now I'm wondering if there wasn't some sort of directory monitoring cross talk that was causing the missing file issues that started this discussion off in the first place. I'm now running without cleanShadowCopyDirectory in the distinct folders and will see how that goes first. If that fails I'll add it back, and I suspect it will then work for sure, but I'd rather see it work without.

For now it looks promising as a workaround.

@RickStrahl
Author

RickStrahl commented Apr 23, 2024

So after letting this run over the last day I've not seen any failures in any of the 3 apps that are active, even with several re-publishes and several forced App Pool recycles.

Separating out by folder seems to be doing the trick. All apps are now back to running without the explicit cleanShadowCopyDirectory switch.

@BrennanConroy
Member

Ok, so sounds like you are hitting the issue in #50531. Shall we close this then in favor of that one?

@RickStrahl
Author

Yes!
