OutOfMemoryException in Thread.StartInternal() on Unix with plenty of mem free #13062
Comments
|
Assuming you're using https://github.com/dotnet/wcf, it confirms that the IOThreadScheduler uses the It's very likely that those threads pile up for some reason until you reach the system limit. |
I don't know if it makes a difference to this issue or not, but you linked/referred to the CoreRT source code. Did you mean to refer to the CoreCLR source instead, e.g. Thread.CoreCLR.cs? Or did you mean to open this issue on the CoreRT repo? |
I am, in the case above, the length of |
I'm embarrassed to say that I didn't notice, sorry about that. I'm not using CoreRT (or at least I don't think so? - the application is started using The CoreCLR repo does not seem to contain the source code of However, I still think, like kevingosse says, that I'm likely hitting some other limit than free system memory. My reason for posting here was that I think it could be a bug, or at least misleading, that an |
It's an
Which, if I've got it right, means the |
I am actually calling a WCF service once every minute, to maintain an internal lookup table of user id=>name mapping.
Where I may try rewriting this to a REST service, to fully remove WCF from the project, or remove the wrapping code so I can call
The length of I'm considering trying to regularly log the contents of some of the "files" under the |
Digging even deeper, we end up in And then into Here I'm starting to lose the thread (pun intended), either At some point, we end up in https://github.com/dotnet/coreclr/blob/835836c9d34409af0f31529201dfd57cb2bd053c/src/pal/src/thread/thread.cpp#L596 In any case, I am beginning to see why it's hard to report all the way to the application code exactly which resource is missing :-) |
Is it possible that your system has a limit imposed on the amount of virtual memory space it is allowed to use? You can use |
It doesn't seem so, this is the output of
|
@ThomasHjorslevFcn strace output from the run could show us where the issue stems from. Would you be able to capture and share it? Please note that it will be really huge after 30..60 minutes of running. |
The ECS Docker instances and attached storage are de-allocated as soon as the instance crashes. |
We don't have a Linux server with the 30-40GB free storage required, so I'm currently running it like this on our Linux build server, hoping to see it crash:
In other news, I noticed that our logging system often logs this error shortly before a crash:
Sometimes the exception message is this instead:
|
It's still running, but now that it's running on a Linux server, not Docker, I have a bit more opportunity to examine it.
The number 3040 is rising steadily and roughly corresponds to the number of requests currently received by the application (3288). I'm not sure if the last part is a coincidence.
I'm guessing that at some point it runs against some limit, like max_map_count. |
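The thread does not show which command produced the rising figure, so the following is only a minimal sketch of how such limits could be watched from inside the process. It assumes Linux, where each line of /proc/self/maps is one memory mapping, /proc/sys/vm/max_map_count holds the per-process mapping limit, and /proc/self/fd has one entry per open descriptor (sockets included). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the application discussed in this thread.
public static class LinuxLimits
{
    // Each line of /proc/self/maps describes one memory mapping of this process.
    public static int CountMappings(string mapsPath = "/proc/self/maps")
        => File.ReadLines(mapsPath).Count();

    // The kernel limit on the number of mappings a single process may hold.
    public static long ReadMaxMapCount(string path = "/proc/sys/vm/max_map_count")
        => long.Parse(File.ReadAllText(path).Trim());

    // Each entry under /proc/self/fd is one open file descriptor.
    public static int CountOpenFileDescriptors(string fdDir = "/proc/self/fd")
        => Directory.EnumerateFileSystemEntries(fdDir).Count();

    // One log line that can be emitted periodically to spot a steady climb.
    public static string Describe()
        => $"mappings={CountMappings()}/{ReadMaxMapCount()}, open fds={CountOpenFileDescriptors()}";
}
```

Logging `LinuxLimits.Describe()` at a fixed interval should make a steadily climbing count visible well before thread creation starts to fail. The path parameters exist only so the helpers can be pointed at ordinary files when testing.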
I was able to provoke the OOM by executing
Confirming that |
The sockets leak part of this issue should probably move to corefx. @davidsh @stephentoub |
@ThomasHjorslevFcn the trace log confirms that the OOM exception stems from that:
|
After finding a way to monitor and reproduce, I think I have isolated the problem to our home-grown logging system, specifically one which ships logs every second to an HTTPS endpoint. After disabling it,
public static HttpResponseMessage PostAsJson<T>( HttpClient httpClient, string url, T data )
{
    var dataAsString = JsonConvert.SerializeObject( data );
    var content = new StringContent( dataAsString );
    content.Headers.ContentType = new MediaTypeHeaderValue( "application/json" );
    return httpClient.PostAsync( url, content ).Result;
}
I'm guessing the problem is using |
@ThomasHjorslevFcn, what does the caller of PostAsJson look like? Is it creating a new HttpClient for every request and then not disposing of it? What is it doing with the HttpResponseMessage it gets back? |
Ok, this is embarrassing.
protected override void WriteEntries( List<LogEntry> entries )
{
    var client = new HttpClient( );
    client.DefaultRequestHeaders.Accept.Clear( );
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue( "application/json" ) );
    foreach( var entries2 in Batch( entries, 100 ) )
    {
        PostAsJson( client, EndpointUrl, entries2 );
    }
}
This library was recently ported to .NET Standard and I think I even code reviewed this part :-( So it seems it all boils down to a missing using/Dispose(). I'm sorry for wasting your time here! |
It should definitely be disposed... or better yet, rather than disposing it, storing it as a singleton that all requests use (and then definitely not disposing it :). As currently written, each WriteEntries call is going to need to establish a new connection to the server, since connection pooling is done at the level of the HttpClient (or, more specifically, at the level of the handler it wraps and instantiates in its default ctor if one isn't explicitly provided). If the client/handler were shared across WriteEntries calls, then the connection pooling would then enable connections to be reused as well. By not disposing of the handler, each WriteEntries call is creating a new connection and then stranding it in the HttpClient's connection pool. It'll eventually be cleaned up, but it could take some non-trivial period of time to happen. Note that there was a related bug here, which could be impacting you further: https://github.com/dotnet/corefx/issues/37044. We rely on Socket's finalizer to close the connection in a case like this, but for various reasons that wasn't working. It was fixed for 3.0 in dotnet/corefx#38499. |
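As a rough illustration of the singleton approach described above, here is a minimal sketch rather than the poster's actual fix. The class name `LogShipping` and the method `PostJsonAsync` are hypothetical. The JSON is passed in as a string so that the sketch does not depend on a particular serializer. Returning the status code and awaiting instead of blocking on `.Result` are choices made for this sketch, not something the thread prescribes.

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper illustrating one shared HttpClient for all requests.
public static class LogShipping
{
    // Created once and never disposed, so the handler's connection pool
    // is reused across calls instead of being recreated and stranded.
    public static readonly HttpClient SharedClient = CreateClient();

    private static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/json"));
        return client;
    }

    // Posts an already-serialized JSON payload; the per-request content and
    // response are disposed, the shared client is not.
    public static async Task<HttpStatusCode> PostJsonAsync(string url, string json)
    {
        using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
        using (var response = await SharedClient.PostAsync(url, content).ConfigureAwait(false))
        {
            return response.StatusCode;
        }
    }
}
```

A `WriteEntries` implementation along these lines would serialize each batch and call `LogShipping.PostJsonAsync(EndpointUrl, json)` instead of constructing a new `HttpClient` per call.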
Now converted to a singleton, the application has been running for 24 hours without crashing, thanks everyone! |
I'm experiencing an ASP.NET Core application crashing regularly (every 30-60 minutes) with System.OutOfMemoryException. It seems to only happen on the Unix platform.
The stack trace is always:
StartInternal() can throw OOM when CreateThread() returns false.
CreateThread() can only return false if RuntimeThread_CreateThread() does.
RuntimeThread_CreateThread() in turn returns false if pthread_attr_init(&attrs), pthread_attr_setstacksize(&attrs, stackSize) or pthread_create(&threadId, &attrs, startAddress, parameter) does not return 0 (SUCCESS).
Since the server is far from out of memory, I suspect some other limit has been reached, but since StartInternal() throws OOM no matter what non-success code is returned, it's hard to know what and why.
The application in question is running in a Docker environment with 16GB memory. Because of the crashes, I have added a per-5-seconds logging of some key process figures. I never see Process.WorkingSet64 go over 2-3GB.
Here is an example logging a few seconds before a crash:
concurrency is a counter I increment/decrement on web service call start/end (it varies between a few and around 100)
proc thread count is the length of Process.GetCurrentProcess().Length
avail worker threads and avail compl port threads are from ThreadPool.GetAvailableThreads()
ws is Process.GetCurrentProcess().WorkingSet64
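The figures listed above could be collected roughly as follows. This is a sketch under assumptions, not the code used in the application: the class `ProcessDiagnostics` and its members are hypothetical, and the thread count is read from `Process.Threads.Count`, which is my guess at what the "proc thread count" figure refers to.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical diagnostics helper mirroring the figures described in the issue.
public static class ProcessDiagnostics
{
    private static int _concurrency;

    // Call at the start and end of each web service call.
    public static void RequestStarted() => Interlocked.Increment(ref _concurrency);
    public static void RequestEnded() => Interlocked.Decrement(ref _concurrency);

    // Builds one log line with the current process figures.
    public static string Snapshot()
    {
        ThreadPool.GetAvailableThreads(out int workers, out int completionPorts);
        using (Process process = Process.GetCurrentProcess())
        {
            return $"concurrency={Volatile.Read(ref _concurrency)}, " +
                   $"proc thread count={process.Threads.Count}, " +
                   $"avail worker threads={workers}, " +
                   $"avail compl port threads={completionPorts}, " +
                   $"ws={process.WorkingSet64}";
        }
    }

    // Emits a snapshot every 5 seconds; keep the returned Timer referenced
    // for as long as logging should continue.
    public static Timer StartLogging(Action<string> log)
        => new Timer(_ => log(Snapshot()), null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
}
```

Calling `ProcessDiagnostics.StartLogging(Console.WriteLine)` at startup would print one line every 5 seconds. A thread count that keeps climbing while concurrency stays flat would hint at threads piling up rather than memory pressure.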
I have been able to reproduce in our test env a few times, under synthetic load, but only after about 1 hour of heavy concurrent load.
Some facts that may be relevant:
microsoft/dotnet:2.2-aspnetcore-runtime
Environment.Is64BitOperatingSystem is true
Environment.Is64BitProcess is true
IntPtrSize is 8
uname -a returns Linux ip-172-xxx-xxx-xxx.eu-central-1.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 GNU/Linux
WebRequest HTTP or FTP fetch.
async and all MongoDB access and HTTP/FTP requests make full use of async/await.