Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync methods seems to hang #5417

Open
bjornbouetsmith opened this issue Feb 5, 2024 · 5 comments
Open

Sync methods seems to hang #5417

bjornbouetsmith opened this issue Feb 5, 2024 · 5 comments
Assignees
Labels

Comments

@bjornbouetsmith
Copy link

bjornbouetsmith commented Feb 5, 2024

Describe the bug
When running version 6.2 of this library using net.tcp and having a sync service interface, then calling the methods from a task based application makes the code seem to hang.

To Reproduce
Hi,
This seems to still be an issue.

In NET framework 4.8 if you call a sync method from a Task based method everthing just works.
If you do the same in a net6 app, the application seems to hang while the threadpool slowly ramps up its worker threads.

i.e.

using System.Threading.Tasks;
using System;
using System.Collections.Generic;
using System.ServiceModel;

namespace Whatever
{
    [ServiceContract]
    public interface IWcfClient
    {
        [OperationContract]
        string SyncMethod();
    }
    static class Program
    {
        static async Task<int> Main(string[] args)
        {
            await RunTest();
            return 0;
        }
        private static async Task RunTest()
        {
            List<Task> tasks = new List<Task>();
            List<string> results = new List<string>();
            IWcfClient wcfClient = default; // Create this via ChannelFactory
            for (int i = 0; i < 100; i++)
            {
                Console.WriteLine($"Start new [{i}] {DateTime.Now}");
                var iLocal = i;
                var task = Task.Run(async () =>
                {

                    string data = wcfClient.SyncMethod();
                    lock (results)
                    {
                        results.Add(data);
                    }
                    Console.WriteLine($"##Got result [{iLocal}] " + DateTime.Now);
                });
                tasks.Add(task);
            }

            Task.WaitAll(tasks.ToArray());
            Console.WriteLine($"##Got results [{results.Count}] " + DateTime.Now);
        }
    }
}

Expected behavior
Code runs as in net framework 4.8, so its possible to migrate from net framework 4.8 to net 6 in a client

Additional context
If you run the above example code in NET Framework 4.8 it runs very quickly - if you run the same code in net6 and most likely also higher, it seems to just hang.

This behavior breaks legacy applications that tries to convert from net framework to a net6 wcf client using the same sync api.

Running in NET6 - the first results comes in quickly, and then it just hangs and after a while the remaining tasks completes when the threadpool has ramped up enough threads to continue.

We have this issue in a service that works as a "proxy" between a cloud environment using grpc and an onprem legacy application in WCF.

When the service starts it immediately receives a lot of requests - many more than it has cores (default min threads in threadpool) - this is not a problem in net framework 4.8, but in NET6 the WCF client it uses to communicate with the legacy WCF service exhibits this behavior of hanging - and because the service continues to receive requests, it snowballs into many thousands of tasks queued that will most likely never finish because threadpool is ramping too slowly up and because of this seemingly "new" change in behavior in between net framework 4.8 and net6.

You can "fix" this behavior by forcing the thread pool to start with many more worker threads, but I think that is a work around and I would prefer a better solution if any exist.

I know its a bad pattern to call a sync method from a task based code base, but you cannot magically switch entire codebases to Task based overnight - its much easier to change from target framework net48 -> net6.0.

@bjornbouetsmith
Copy link
Author

Possibly related to #1153 #4946

@mconnew
Copy link
Member

mconnew commented Feb 6, 2024

We only have an async implementation in .NET6, so when you call a synchronous operation it does sync over async and it basically blocks the calling thread waiting for the background task to complete. On .NET Framework there was a sync implementation (when we started the port there were zero sync IO api's at the time so it didn't make sense to implement sync all the way down until the final IO call as that would be duplicating code.) This means we use more threads as you have the blocked calling thread, and a background thread being used. Because the implementation is async, you don't need 2 threads for the entire duration of the call, only until we're waiting for IO completion. The problem you're seeing is when that IO completes, there isn't a thread available to resume the call.
As you noted, you can increase the number of worker threads, but you don't need to force more initial threads. If you increae the min thread setting in the thread pool, it will only create threads when needed. It does this anyway, but once min threads has been reached, it only adds one thread every half a second. Increasing min threads allows it to create those extra threads immediately. You might try increasing the min IO threads too. We tend to try to stay on those threads as much as possible to prevent thread starvation like you are seeing, but underlying libraries like HttpClient automatically dispatch completions to a worker thread and there's nothing we can do about that.

One thing you might not be aware of is that the async/sync pattern on the client and the server don't need to be the same. If your concern is it's a lot of work to modify your service implementation to be Task based, you don't need to do that. As long as the service and client operations are equivalent (same base name, same parameters, sync return type the same on async but wrapped in a Task etc), then the client can use a contract interface which is async without needing to change anything on the service side. So you can do this:

Service Contract

    [ServiceContract]
    public interface IWcfClient
    {
        [OperationContract]
        string SyncMethod();
    }

Client Contract

    [ServiceContract]
    public interface IWcfClient
    {
        [OperationContract]
        Task<string> SyncMethodAsync();
    }

You just add Async to the end of the method name and wrap the return type in a Task. If it returns void, then the return type is just Task. This will have the same on-wire communication. If you are using out or ref in your parameters, it's a bit more work and you need to use message contracts. Let me know if that's the case and I can provide details how to do that.

@bjornbouetsmith
Copy link
Author

One thing you might not be aware of is that the async/sync pattern on the client and the server don't need to be the same. If your concern is it's a lot of work to modify your service implementation to be Task based, you don't need to do that.

I am aware that this is a possibility, but we still have even older clients expecting the sync client methods to work as well - and we have built a lot of code into our client to facilitate load balancing, sharding etc and all that code is also sync - so basically it would require an almost full rewrite of our client side code to switch to an async pattern, with code duplicated for both sync and async clients.

My real concern with this change in the WCF client framework is that its a "non breaking" change that has been implemented, but it breaks in a subtle way because the "WCF Client framework" still supports sync methods on the surface, but only if you have a "normal" usage pattern.

Huge bursts in calls before the threadpool is "warmed" up will make it seems like something broke, and if you have a continuous traffic thats much greater than the initial threadpool size you can get a snowball effect that will cause the application to never recover because new tasks will be spawned in a much greater rate than the threadpool will create new threads to handle the io completions.

In my opinion it actually would have been better if the WCF Client library did not support sync wcf contracts at all, since its only "half supported" - and then people would notice this when upgrading to NET6 and revert their change until they have converted their code base to a async version.

I get that my use case might be a niche usage pattern, but we have services handling almost two billion requests per day which ends up with a lot of concurrent calls

(when we started the port there were zero sync IO api's at the time so it didn't make sense to implement sync all the way down until the final IO call as that would be duplicating code.)

Does this mean that there are sync io api's now and in theory it would be possible to implement a full sync path for client methods that use a sync service interface?

As you noted, you can increase the number of worker threads, but you don't need to force more initial threads. If you increae the min thread setting in the thread pool

That is what I meant and for now I guess that is what we will do - call SetMinThreads with a sufficiently high number.

I think it might be worth having a BIG warning for people updating from net framework -> NET6 that if your load is sufficiently high and you are using sync clients, you might experience this issue and a work around for that until you have converted the codebase to task based.

@ptixed
Copy link

ptixed commented Apr 19, 2024

@mconnew Do you know if the behavior you described also applies to versions before NET6? I am seeing similar behavior on NET Core 3.1. Threads are getting blocked at this line likely due to thread pool starvation.

@mconnew
Copy link
Member

mconnew commented Jun 18, 2024

Yes, this behavior has been there since the very first WCF Client release for .NET Core. When we first released, there was no sync api's that we could even consume. There's been some improvements and optimizations to reduce how much this is happening so it has got better in some scenarios since 3.1. For example, for net.tcp, if you are using localhost, we no skip the async DNS lookup as we know it's just IPAddress.Loopback for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants