Connectivity issue when upgrading from .NET 7.0.1 to .NET 7.0.4 on Windows Server #48965

Open
1 task done
haludi opened this issue Jun 22, 2023 · 9 comments
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Needs: Attention 👋 This issue needs the attention of a contributor, typically because the OP has provided an update.

Comments

@haludi

haludi commented Jun 22, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Our product is a distributed system consisting of multiple ASP.NET Core servers.
There are open TCP connections between the servers.
The servers also communicate over HTTP (version 1.1).

The issue:

The server stops accepting HTTP requests from the other servers,
and even from itself.
The Microsoft logs (configuration below) on the target server show no errors.
On the source server, we get:

Raven.Client.Exceptions.RavenException: An exception occurred while contacting ***URL***.
System.Net.Http.HttpRequestException: No connection could be made because the target machine actively refused it. (***URL***:443)
---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.

(full stack trace below)
The issue starts between one and a few hours after a restart.
We checked the Azure firewall and the OS firewall.

  • Is there a known issue with .NET versions later than 7.0.1?
  • Were any configuration options added, or was an existing option that was previously not respected fixed so that it now takes effect?
  • What additional information can we collect to identify the issue?

Some details:

One of our customers hit the issue in a production environment after upgrading our product.
The customer had tested the upgrade in a test environment, where the issue did not occur.
We upgraded the servers again to collect more information.
We collected Microsoft logs (configuration below) and a tcpdump capture.
The Microsoft logs showed no errors.
In the tcpdump capture (3 minutes, taken for the target server), none of the HTTPS packets sent to the port the server listens on received a response packet [Conversation completeness: Incomplete, SYN_SENT (1)].
Between these product versions we made no changes to how the servers communicate, other than upgrading from .NET 6 to .NET 7. So we created the same build with one difference, targeting .NET 6 instead, and the issue was resolved.
Since this is a production environment, we deployed the .NET 7 version twice after the first incident to collect logs, and the issue recurred both times.
We also saw the issue in another customer's system; this time the upgrade was from .NET 7.0.1 to .NET 7.0.4. Again, switching to the .NET 6 build resolved the issue.

Both customers’ servers were on:

  • Windows Server 2019 Datacenter.
  • WindowsVersion:10.0 BuildVersion:17763 (1809)
  • We tried to reproduce it on Azure with no success.
  • We don't see it in our Linux customers.

Additional findings:

  • The TCP connections between the nodes seem to be working fine.
  • The Microsoft logs contain no errors or warnings.
  • In the Microsoft.AspNetCore.Hosting.Diagnostics logs, we see only HTTP/2 requests (this is external communication, typically from a browser accessing the server).
  • The Microsoft logs show no sign of any other requests, even those that come from the server itself.
  • In the tcpdump capture, no packets go out from the HTTPS port even though packets come in.
  • Requests from outside the cluster seem to work. (The first time the issue occurred, requests from outside started to fail after one hour, but we have no logs or tcpdump capture from that time.)
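One way to gather more data between incidents is a small connect probe that records how each connection attempt ends. The exception above reports an active refusal (10061), while the capture shows connection attempts with no reply, so logging which outcome occurs over time may help narrow things down. The following is only a sketch under assumptions: the function name `classify_connect` and the 3-second default timeout are illustrative and are not part of the product or of .NET.

```python
import socket


def classify_connect(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and report how it ended.

    Returns "connected", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer rejected the attempt (ECONNREFUSED, Winsock error 10061).
        return "refused"
    except socket.timeout:
        # The handshake did not complete within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else (DNS failure, unreachable network, ...).
        return f"error: {exc}"
```

Calling `classify_connect(host, 443)` periodically from each node, including against the node's own address, and logging the result with a timestamp would show whether the failure starts as a refusal or as a timeout, and whether it starts on all nodes at once.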

The tcpdump capture was collected for 3 minutes while the issue was occurring.

Microsoft log configuration:

{
    "Microsoft.AspNetCore.Server.Kestrel": "Debug",
    "Microsoft.AspNetCore.Server.Kestrel.BadRequests": "Debug",
    "Microsoft.AspNetCore.Server.Kestrel.Connections": "Debug",
    "Microsoft.AspNetCore.Server.Kestrel.Http2": "Debug",
    "Microsoft.AspNetCore.Server.Kestrel.Http3": "Debug",

    "Microsoft.AspNetCore.Server.Kestrel.Transport.Quic": "Debug",
    "Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets": "Debug",

    "Microsoft.AspNetCore.Hosting.WebHost": "Debug",
    "Microsoft.AspNetCore.Hosting.Diagnostics": "Information"
}

Expected Behavior

No response

Steps To Reproduce

We could not reproduce the issue.

Exceptions (if any)

Exception Stack Trace:

Raven.Client.Exceptions.RavenException: An exception occurred while contacting ***URL***.
System.Net.Http.HttpRequestException: No connection could be made because the target machine actively refused it. (***URL***:443)
---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|281_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.HttpConnectionWaiter`1.WaitForConnectionAsync(Boolean async, CancellationToken requestCancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Raven.Client.Http.RequestExecutor.SendAsync[TResult](ServerNode chosenNode, RavenCommand`1 command, SessionInfo sessionInfo, HttpRequestMessage request, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1098
   at Raven.Client.Http.RequestExecutor.SendRequestToServer[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, HttpRequestMessage request, String url, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1050.
The server at ***URL*** responded with status code: ServiceUnavailable.
---> System.Net.Http.HttpRequestException: No connection could be made because the target machine actively refused it. (***URL***:443)
---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|281_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.HttpConnectionWaiter`1.WaitForConnectionAsync(Boolean async, CancellationToken requestCancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Raven.Client.Http.RequestExecutor.SendAsync[TResult](ServerNode chosenNode, RavenCommand`1 command, SessionInfo sessionInfo, HttpRequestMessage request, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1098
   at Raven.Client.Http.RequestExecutor.SendRequestToServer[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, HttpRequestMessage request, String url, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1050
   --- End of inner exception stack trace ---
   at Raven.Client.Http.RequestExecutor.ThrowFailedToContactAllNodes[TResult](RavenCommand`1 command, HttpRequestMessage request) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1177
   at Raven.Client.Http.RequestExecutor.SendRequestToServer[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, HttpRequestMessage request, String url, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1050
   at Raven.Client.Http.RequestExecutor.ExecuteAsync[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 919
   at Raven.Client.Http.RequestExecutor.HandleServerDown[TResult](String url, ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, HttpRequestMessage request, HttpResponseMessage response, Exception e, SessionInfo sessionInfo, Boolean shouldRetry, RequestContext requestContext, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1560
   at Raven.Client.Http.RequestExecutor.SendRequestToServer[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, HttpRequestMessage request, String url, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 1050
   at Raven.Client.Http.RequestExecutor.ExecuteAsync[TResult](ServerNode chosenNode, Nullable`1 nodeIndex, JsonOperationContext context, RavenCommand`1 command, Boolean shouldRetry, SessionInfo sessionInfo, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Http\RequestExecutor.cs:line 919
   at Raven.Server.Utils.ReplicationUtils.GetTcpInfoAsync(String url, String databaseName, String databaseId, Int64 etag, String tag, X509Certificate2 certificate, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Server\Utils\ReplicationUtils.cs:line 51
   at Raven.Server.Utils.ReplicationUtils.GetTcpInfoAsync(String url, String databaseName, String tag, X509Certificate2 certificate, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Server\Utils\ReplicationUtils.cs:line 36
   at Raven.Client.Util.AsyncHelpers.RunSync[T](Func`1 task) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Client\Util\AsyncHelpers.cs:line 135
   at Raven.Server.Utils.ReplicationUtils.GetTcpInfo(String url, String databaseName, String tag, X509Certificate2 certificate, CancellationToken token) in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Server\Utils\ReplicationUtils.cs:line 30
   at Raven.Server.ServerWide.Maintenance.ClusterMaintenanceSupervisor.ClusterNode.ListenToMaintenanceWorker() in C:\Builds\RavenDB-Stable-5.4\54038\src\Raven.Server\ServerWide\Maintenance\ClusterMaintenanceSupervisor.cs:line 274

.NET Version

.NET 7.0.4

Anything else?

No response

@ayende

ayende commented Jul 10, 2023

Hi,
Is there any additional information that we can provide?

@adityamandaleeka
Member

Sorry for the delay getting to this. This is interesting... I'm fairly sure none of the changes between 7.0.1 and 7.0.4 (at least on the Kestrel side) would affect this. I see you mentioned that downgrading to 6.0 solved it. How confident are you that 7.0.1 doesn't have the issue?

It might be helpful to see 'Trace' level logs for "Microsoft.AspNetCore". And maybe a dump while the server is in the bad state?

@haludi
Author

haludi commented Aug 1, 2023

Hi, thank you for the reply.
We had two such incidents, at two separate customers.
One of them upgraded our product from a version built on .NET 6, and the other upgraded from a version built on .NET 7.0.1.
For both customers, the issue happened only in the production environment and did not happen in the test environment.
For both of them, installing our latest version built on .NET 6 resolved the issue.
Because this is a production environment, a dump is not possible.
The next time it happens, I will collect Trace-level logs for Microsoft.AspNetCore.
Asking our customer to reproduce the issue on purpose to collect the data can be problematic (we have already asked once), but I can check.
Just to be sure, you want 'Trace' for all sub-loggers of Microsoft.AspNetCore, right?

@ayende

ayende commented Aug 6, 2023

Is there any additional information that we can provide? This has caused us to downgrade several customers to .NET 6.0, which is not a long term strategy, obviously.

@adityamandaleeka
Member

As mentioned above, Trace level logs for "Microsoft.AspNetCore" would be the best next step (since @haludi said dumps are not going to be possible).
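A minimal sketch of what that setting could look like, assuming the same flat logger-name-to-level map shown in the original report, and assuming it follows the usual Microsoft.Extensions.Logging prefix matching (where an entry for "Microsoft.AspNetCore" applies to every category under that prefix unless a more specific entry overrides it):

```json
{
    "Microsoft.AspNetCore": "Trace"
}
```

If the more specific Kestrel entries from the original configuration are kept alongside this one, they would likely need to be raised to "Trace" as well, since a more specific entry would otherwise take precedence over the prefix entry.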

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 25, 2023
@adityamandaleeka adityamandaleeka added the Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. label Mar 5, 2024
@danmoseley
Member

Also curious whether you can test on .NET 8

@haludi
Author

haludi commented Mar 5, 2024

There is a plan to do so, but since we have seen this only in production environments and could not reproduce it in a test environment, we are very limited.

@dotnet-policy-service dotnet-policy-service bot added Needs: Attention 👋 This issue needs the attention of a contributor, typically because the OP has provided an update. and removed Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. labels Mar 5, 2024
@danmoseley
Member

In many/most cases the update is just to flip the target framework and rebuild. Do you have the ability to do that, and deploy (perhaps temporarily and limited, just enough to see whether it's fixed)?

That would give another data point and, depending on your processes and limitations, might be a way to get the fix.

@haludi
Author

haludi commented Mar 5, 2024

Currently, the affected customers are running on .NET 6.
We created a custom build for them as a workaround until we find another solution.
Our stable release currently runs on .NET 7 and should be upgraded to .NET 8 in the upcoming two months.
Our customers don't experience issues on .NET 6.
Asking them to upgrade to .NET 8 and potentially break their production is problematic.
But we will gather more information if another incident happens.
