HTTP2: Hang when concurrent calls with cancellation #39608
Comments
Tagging subscribers to this area: @dotnet/ncl
@JamesNK Just to clarify... it's not always hanging, right? Only after some period of time/load? I can see one possible issue introduced by my change. If we get stuck waiting for a write on the connection to complete, then we will fail to cancel the outstanding request write. Do you know if that could be happening here? I.e. is the server itself possibly reading from the connection very slowly, causing backpressure on the client side of the connection?
@JamesNK I'm trying to repro and I get this from the client:
I assume I need to set up certs somehow?
You need to trust the ASP.NET Core dev cert: https://docs.microsoft.com/en-us/aspnet/core/grpc/troubleshoot?view=aspnetcore-3.1#call-a-grpc-service-with-an-untrustedinvalid-certificate
You could also change the client to use http://localhost:5000 (if you do this, you'll need the AppContext switch to support h2c).
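If you take the http://localhost:5000 route, a minimal sketch of setting that AppContext switch (the switch name below is the one documented for SocketsHttpHandler in .NET Core 3.x / early .NET 5; the `GrpcChannel` line mirrors the repro):

```csharp
using System;

class H2cSwitchDemo
{
    static void Main()
    {
        // SocketsHttpHandler only allows HTTP/2 without TLS (h2c) when this
        // AppContext switch is set, and it must be set before the first request.
        AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);

        // Confirm the switch took effect.
        AppContext.TryGetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", out bool enabled);
        Console.WriteLine(enabled); // True

        // With the switch set, the gRPC client can target the plain-HTTP endpoint:
        // var channel = GrpcChannel.ForAddress("http://localhost:5000");
    }
}
```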
Correct
It might not necessarily be caused by your change. I have only just started running gRPC unit tests against net5.0.
@JamesNK You also need to put the server into h2c mode for this to work, right? E.g.:

```csharp
webBuilder.ConfigureKestrel(options =>
{
    options.Listen(IPAddress.Any, 5000, listenOptions =>
    {
        // Set protocol to HTTP/2-only to put Kestrel in h2c prior knowledge mode.
        listenOptions.Protocols = HttpProtocols.Http2;
        // ...
    });
});
```
Not needed. The gRPC template has HTTP on 5000 and HTTPS on 5001, and both are HTTP/2-only because of appsettings.json:

```json
"Kestrel": {
  "EndpointDefaults": {
    "Protocols": "Http2"
  }
}
```
#39654 does not fix this. I suspect it is unrelated to my change. |
@JamesNK I investigated your repro and it appears to me there isn't any network-related issue. The timeouts you observed are almost surely caused by tasks deadlocking due to missing ConfigureAwait(false) calls:

```csharp
if (waitForHeaders)
{
    await call.ResponseHeadersAsync.DefaultTimeout().ConfigureAwait(false);
}
try
{
    await call.RequestStream.WriteAsync(new DataMessage
    {
        Data = ByteString.CopyFrom(data)
    }).DefaultTimeout().ConfigureAwait(false);
}
catch (Exception ex)
{
    Console.WriteLine(ex);
    throw;
}
```
cc: @geoffkizer |
That's odd. This is a console app; I thought there was no sync context in console apps? Also, the console app is correctly awaiting all tasks. I will double check what happens on my computer.
I added ConfigureAwait(false):

```csharp
if (waitForHeaders)
{
    await call.ResponseHeadersAsync.DefaultTimeout().ConfigureAwait(false);
}
try
{
    await call.RequestStream.WriteAsync(new DataMessage
    {
        Data = ByteString.CopyFrom(data)
    }).DefaultTimeout().ConfigureAwait(false);
}
catch (Exception ex)
{
    Console.WriteLine(ex);
    throw;
}
```
Sorry, my mistake, there should be one more ConfigureAwait(false). Full version:

```csharp
class Program
{
    static async System.Threading.Tasks.Task Main(string[] args)
    {
        Console.WriteLine("Go!");
        var handler = new HttpClientHandler
        {
            ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
        };
        GrpcChannel c = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
        {
            HttpHandler = handler,
            ThrowOperationCanceledOnCancellation = true
        });
        Greeter.GreeterClient client = new Greeter.GreeterClient(c);

        // Arrange
        var data = new byte[1024 * 64];
        const int interations = 20000;
        bool waitForHeaders = true;

        await RunParallel(count: 20, async () =>
        {
            for (int i = 0; i < interations; i++)
            {
                if (i % 50 == 0)
                {
                    Console.WriteLine(i);
                }
                var cts = new CancellationTokenSource();
                var headers = new Metadata();
                if (waitForHeaders)
                {
                    headers.Add("flush-headers", bool.TrueString);
                }
                var call = client.EchoAllData(cancellationToken: cts.Token, headers: headers);
                if (waitForHeaders)
                {
                    await call.ResponseHeadersAsync.DefaultTimeout().ConfigureAwait(false);
                }
                try
                {
                    await call.RequestStream.WriteAsync(new DataMessage
                    {
                        Data = ByteString.CopyFrom(data)
                    }).DefaultTimeout().ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex);
                    throw;
                }
                cts.Cancel();
            }
        }).ConfigureAwait(false);

        Console.WriteLine("Done");
        Console.ReadKey();
    }

    public static Task RunParallel(int count, Func<Task> action)
    {
        var actionTasks = new Task[count];
        for (int i = 0; i < actionTasks.Length; i++)
        {
            actionTasks[i] = action();
        }
        return Task.WhenAll(actionTasks);
    }
}
```
@JamesNK Could you please confirm whether the full version above fixes the issue on your machine? |
It still fails:

```csharp
await RunParallel(count: 20, async () =>
{
    for (int i = 0; i < interations; i++)
    {
        if (i % 50 == 0)
        {
            Console.WriteLine(i + " ConfigureAwait + RunParallel");
        }
        var cts = new CancellationTokenSource();
        var headers = new Metadata();
        if (waitForHeaders)
        {
            headers.Add("flush-headers", bool.TrueString);
        }
        var call = client.EchoAllData(cancellationToken: cts.Token, headers: headers);
        if (waitForHeaders)
        {
            await call.ResponseHeadersAsync.DefaultTimeout().ConfigureAwait(false);
        }
        try
        {
            await call.RequestStream.WriteAsync(new DataMessage
            {
                Data = ByteString.CopyFrom(data)
            }).DefaultTimeout().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            throw;
        }
        cts.Cancel();
    }
}).ConfigureAwait(false);
```
Hmm, you are right, it started happening again on my machine too. That's really strange, because earlier I ran it to completion several times without any issues. I will continue investigating.
It seems I found the root cause: there is a deadlock between processing WindowUpdateFrame and cancelling Http2Stream.
Description
Grpc.Net.Client (which uses HttpClient internally) hangs when there are 20 concurrent calls with cancellation happening in a loop.
Repro: repos.zip
Changing the ConsoleApp50 target framework from net5.0 to netcoreapp3.1 fixes the timeout exceptions.
Normally I'd attempt to remove the Grpc.Net.Client library from the repro, but in this case the gRPC call uses a number of fairly advanced features (bidi streaming, cancellation), so I have left it in.
What the console app is doing (20 parallel threads on one HTTP/2 connection):

```csharp
var call = client.EchoAllData(cancellationToken: cts.Token, headers: headers);
await call.ResponseHeadersAsync.DefaultTimeout();
await call.RequestStream.WriteAsync(...); // <- this is where the hang happens
cts.Cancel();
```
Configuration
Using nightly SDK: 5.0.100-rc.1.20367.2
Regression?
Yes, this worked in netcoreapp3.1