Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shadow copy directory in inconsistent state after reboot #45017

Open
adityamandaleeka opened this issue Nov 11, 2022 · 14 comments
Open

Shadow copy directory in inconsistent state after reboot #45017

adityamandaleeka opened this issue Nov 11, 2022 · 14 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-iis Includes: IIS, ANCM

Comments

@adityamandaleeka
Copy link
Member

adityamandaleeka commented Nov 11, 2022

(Reported by @RichardD2 here: dotnet/AspNetCore.Docs#23733 (comment))

We have a Windows Server 2019 VM hosting (currently) 15 instances of an API, all based on identical code. They're all configured with separate AppPools, all set to "no managed code", and all configured for shadow-copying.

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <location path="." inheritInChildApplications="false">
    <system.webServer>
      <handlers>
        <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModuleV2" resourceType="Unspecified" />
      </handlers>
      <aspNetCore processPath=".\Assembly.exe" stdoutLogEnabled="false" stdoutLogFile=".\logs\stdout" hostingModel="inprocess">
        <handlerSettings>
          <handlerSetting name="experimentalEnableShadowCopy" value="true" />
          <handlerSetting name="shadowCopyDirectory" value="../shadow/api/" />
          <!-- Only enable handler logging if you encounter issues-->
          <!--<handlerSetting name="debugFile" value=".\logs\aspnetcore-debug.log" />-->
          <!--<handlerSetting name="debugLevel" value="FILE,TRACE" />-->
        </handlerSettings>
      </aspNetCore>
    </system.webServer>
  </location>
</configuration>

ANCM version is 16.0.22232.9.

Mostly this works fine. But for the last two months, when the server reboots to install updates, one AppPool seemingly chosen at random ends up with its shadow copy being corrupted. Specifically, the .deps.json file is filled with nulls, and the main assembly .dll is 0kb in size. There are three errors logged in the application event log by the ANCM:

1027: Unable to locate application dependencies. Ensure that the versions of Microsoft.NetCore.App and Microsoft.AspNetCore.App targeted by the application are installed.

1027: Could not find 'aspnetcorev2_inprocess.dll'. Exception message:
A JSON parsing exception occurred in C:\inetpub\Instance\shadow\api\99\Assembly.deps.json, offset 0 (line 1, column 1): The document is empty.
Error initializing the dependency resolver: An error occurred while parsing: C:\inetpub\Instance\shadow\api\99\Assembly.deps.json

1010: Failed to start application '/LM/W3SVC/9/ROOT/api', ErrorCode '0x8000ffff'.

After stopping the affected AppPool, deleting the shadow folder, and starting the AppPool again, everything springs back into life.

There don't appear to be any other relevant events in the event log around the time of the restart.

I don't know whether this is a bug in the .NET 6 shadow copying option, or a problem with the VM, or whether something else is interfering when the server restarts.

Has anyone else seen this, or have any suggestions on how to diagnose and/or fix it? (The debug file won't help, since the issue occurs before the application has a chance to start.)

Originally posted by @RichardD2 in dotnet/AspNetCore.Docs#23733 (comment)

@adityamandaleeka
Copy link
Member Author

@RichardD2 Does this happen after every reboot? Can you reproduce the issue with a simple Hello World type app?

@HaoK Any thoughts about what could cause this?

@HaoK
Copy link
Member

HaoK commented Nov 11, 2022

@RichardD2 so there's always one and only one app pool that is corrupted? Can you think of any other changes that happened around the time this started happening 2 months ago)?

Since you don't see any messages around the error I'm guessing this is something that the file watcher caused issues with, it looks like the file watcher calling CopyToDirectoryInner which looks at the last modified time of the file before updating it, I wonder if these files are locked so it copies over munged / null files.

void Environment::CopyToDirectoryInner(const std::filesystem::path& source, const std::filesystem::path& destination, const std::filesystem::path& directoryToIgnore, int& copiedFileCount)

One potential issue I see is that we don't check whether CopyFile (which appears to be the CopyFileW function https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-copyfilew) succeeded, at a minimum we should probably check and log an error here:

CopyFile(path.path().wstring().c_str(), destinationPath.wstring().c_str(), FALSE);

@adityamandaleeka adityamandaleeka added the Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. label Nov 14, 2022
@ghost
Copy link

ghost commented Nov 14, 2022

Hi @adityamandaleeka. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

@RichardD2
Copy link

@HaoK It always seems to affect only one app pool; but it's been a different one each time.

It looks like there was a gap in the Windows Updates due to an issue with the hosting company's WSUS server, so October would have been the first month of updates since we moved this application to .NET 6. The October update took it from 6.0.6 to 6.0.9. I don't know whether that would update the ANCM at the same time.

@adityamandaleeka So far it's happened twice, immediately after installing Windows updates and restarting. It's a live server, so I haven't tried rebooting without pending updates. I also haven't tried a simple "hello world" app - I'm not sure how much use that would be, since the affected app pool seems to be random, so the chances of it affecting a specific target app pool are quite low.

@ghost
Copy link

ghost commented Nov 21, 2022

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

@RichardD2
Copy link

What further information is required?

@ghost
Copy link

ghost commented Nov 25, 2022

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

@RichardD2
Copy link

Again, what further information is required?

Is this bot just going to keep threatening to close the issue unless I post a pointless comment every few days?!

@adityamandaleeka adityamandaleeka removed the Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. label Nov 26, 2022
@adityamandaleeka
Copy link
Member Author

@RichardD2 I just removed the tag that was causing that.

@adityamandaleeka adityamandaleeka added this to the .NET 8 Planning milestone Dec 7, 2022
@ghost
Copy link

ghost commented Dec 7, 2022

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@adityamandaleeka
Copy link
Member Author

Moving to .NET 8 Planning to address @HaoK's comment above: #45017 (comment)

@RichardD2
Copy link

Just had a repeat of this issue - this time it happened almost immediately after installing .NET 6.0.13 and 7.0.1 on the server, without waiting for a reboot.

Log Name: Application
Source: IIS AspNetCore Module V2
Date: 16/01/2023 12:15:18
Event ID: 1027
Level: Error
Description:
Unable to locate application dependencies. Ensure that the versions of Microsoft.NetCore.App and Microsoft.AspNetCore.App targeted by the application are installed.

Followed by:

Log Name: Application
Source: IIS AspNetCore Module V2
Date: 16/01/2023 12:15:18
Event ID: 1027
Level: Error
Description:
Could not find 'aspnetcorev2_inprocess.dll'. Exception message:
A JSON parsing exception occurred in [C:\inetpub..._shadow\api\502\OurApp.deps.json], offset 0 (line 1, column 1): The document is empty.
Error initializing the dependency resolver: An error occurred while parsing: C:\inetpub..._shadow\api\502\OurApp.deps.json

And then:

Log Name: Application
Source: IIS AspNetCore Module V2
Date: 16/01/2023 12:15:18
Event ID: 1010
Level: Error
Description:
Failed to start application '/LM/W3SVC/8/ROOT/api', ErrorCode '0x8000ffff'.

The updates were installed at around 12:07. The error started on the first request to this specific AppPool after the update was installed. The other 14 AppPools running the same code were not affected.

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
@RichardD2
Copy link

We're still seeing this issue on a semi-regular basis. Should we try installing the .NET 8 ANCM, or would we need to upgrade the application itself to .NET 8 as well?

@RichardD2
Copy link

We've installed the 8.0.6 ANCM, and upgraded the applications to .NET 8. Whilst it seems slightly better, we're still seeing the same issue - something triggers a new shadow copy, even though nothing has changed, and one of the .deps.json files (looks like the last one to be copied) ends up filled with NULL bytes rather than the expected JSON content.

The affected application instance then fails to load with a 500.31 - Failed to load ASP.NET Core runtime error until we stop the app pool, delete the shadow copy folders, and restart the app pool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-iis Includes: IIS, ANCM
Projects
None yet
Development

No branches or pull requests

4 participants