Profile memory usage on Linux #315

Open
Ernest314 opened this issue Sep 27, 2022 · 16 comments
Labels
area: build Build toolchain and testing type: bug Defective or unintended behavior

Comments

@Ernest314
Contributor

For some reason, memory usage on Linux is significantly higher than on Windows. Not enough to be an issue (yet?), but something seems wrong.

This may be a known issue with OpenCvSharp?

@Ernest314 Ernest314 self-assigned this Sep 27, 2022
@Ernest314
Contributor Author

Ernest314 commented Nov 21, 2022

Possible workaround: use P/Invoke to call mallopt and lower the trim threshold (M_TRIM_THRESHOLD). Alternatively, the same thing can be accomplished by setting the environment variable MALLOC_TRIM_THRESHOLD_. The threshold should be set to something like 20k? The author used "40k~80k". See: dotnet/runtime#13301

Alternative workaround: call malloc_trim(0) manually (again via pinvoke) to free unreleased memory. See: mruby/mruby#5047
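
A minimal sketch of what both P/Invoke workarounds might look like. The class and method names (`GlibcMalloc`, `TrySetTrimThreshold`, `TryTrim`) are made up for illustration, and binding to `libc.so.6` assumes a glibc-based distro. `M_TRIM_THRESHOLD = -1` is the value from glibc's `<malloc.h>`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; names are illustrative, not from the project.
public static class GlibcMalloc
{
    // Value of M_TRIM_THRESHOLD in glibc's <malloc.h>.
    private const int M_TRIM_THRESHOLD = -1;

    // Assumption: glibc is present as "libc.so.6". musl-based distros
    // (e.g. Alpine) do not provide a usable malloc_trim/mallopt.
    [DllImport("libc.so.6")]
    private static extern int mallopt(int param, int value);

    [DllImport("libc.so.6")]
    private static extern int malloc_trim(UIntPtr pad);

    // Returns true if glibc accepted the new trim threshold.
    public static bool TrySetTrimThreshold(int bytes)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return false;
        try
        {
            return mallopt(M_TRIM_THRESHOLD, bytes) == 1;
        }
        catch (Exception e) when (e is DllNotFoundException || e is EntryPointNotFoundException)
        {
            return false;
        }
    }

    // Returns true if glibc actually released memory back to the OS.
    public static bool TryTrim()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return false;
        try
        {
            return malloc_trim(UIntPtr.Zero) == 1;
        }
        catch (Exception e) when (e is DllNotFoundException || e is EntryPointNotFoundException)
        {
            return false;
        }
    }
}
```

Calling `GlibcMalloc.TrySetTrimThreshold(20000)` early in startup and `GlibcMalloc.TryTrim()` before measuring memory would be one way to test both ideas. Both only affect memory allocated through glibc's malloc, not the managed heap.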

@Ernest314
Contributor Author

Emgu CV is an alternative to OpenCVSharp if this really becomes an issue...

@Ernest314
Contributor Author

Placing:

// Tune memory management for linux. Without this, OpenCvSharp
// causes "fake" (un-freed) memory usage in the range of 150+MB.
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) {
	// By default, this is set for the current process only.
	Environment.SetEnvironmentVariable("MALLOC_TRIM_THRESHOLD_", "20000");
}

inside of the Irene.Program static constructor does not seem to fix this issue, so I'm temporarily commenting it out.

@Ernest314
Contributor Author

Note: Emgu CV has a different license (GPL) than OpenCvSharp.

@Ernest314
Contributor Author

Even after verifying that the service is running with the correctly reduced environment variable (MALLOC_TRIM_THRESHOLD_), there didn't seem to be an effect on memory usage. (Values tried: 60000, 20000, 2000.)

The code for setting it at runtime has been removed, since it probably wasn't actually taking effect (it needs to be set before the program runs).
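
One way to double-check from inside the process is to read `/proc/self/environ`, which on Linux holds the environment the process was started with and does not reflect later changes made at runtime. The sketch below is only an illustration; `StartupEnvironment` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; names are illustrative, not from the project.
public static class StartupEnvironment
{
    // Parses the NUL-separated "KEY=value" format of /proc/<pid>/environ.
    public static Dictionary<string, string> Parse(string raw)
    {
        var result = new Dictionary<string, string>();
        foreach (var entry in raw.Split('\0', StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = entry.IndexOf('=');
            if (eq > 0)
            {
                result[entry.Substring(0, eq)] = entry.Substring(eq + 1);
            }
        }
        return result;
    }

    // Looks up a variable as it was when the process was started.
    // Returns false if it was absent or /proc is unavailable (non-Linux).
    public static bool TryGetStartupValue(string name, out string value)
    {
        const string path = "/proc/self/environ";
        value = "";
        if (!File.Exists(path)) return false;

        if (Parse(File.ReadAllText(path)).TryGetValue(name, out var found))
        {
            value = found;
            return true;
        }
        return false;
    }
}
```

If `StartupEnvironment.TryGetStartupValue("MALLOC_TRIM_THRESHOLD_", out var v)` returns false while `Environment.GetEnvironmentVariable` returns a value, the variable was set after startup, too late for glibc to pick it up.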

@Ernest314
Contributor Author

Commenting out the use of OpenCvSharp (and also ScottPlot, NLua, and HtmlAgilityPack, for good measure) had no effect on memory usage, so it's probably something else causing all the memory usage?

@Ernest314
Contributor Author

Manually calling large object compaction seemed to have no effect.

https://learn.microsoft.com/en-us/dotnet/api/system.runtime.gcsettings.largeobjectheapcompactionmode?view=net-7.0

(The role of large object compaction is explained here: https://www.jetbrains.com/help/dotmemory/NET_Memory_Management_Concepts.html#large-object-heap)

Calling

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

before checking memory usage also seemed to have minimal effect (<10 MB).
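
For reference, a sketch of how the two steps above might be combined into one probe: request a one-time large object heap compaction, then run the collect/finalize/collect sequence. `GcProbe` and `ForceFullCollection` are hypothetical names. `GC.GetTotalMemory` only reports the managed heap, so this cannot show anything about native allocations.

```csharp
using System;
using System.Runtime;

// Hypothetical helper; names are illustrative, not from the project.
public static class GcProbe
{
    // Returns managed heap size in bytes before and after a full collection
    // that also compacts the large object heap.
    public static (long Before, long After) ForceFullCollection()
    {
        long before = GC.GetTotalMemory(forceFullCollection: false);

        // Ask the next blocking gen 2 collection to compact the LOH as well.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long after = GC.GetTotalMemory(forceFullCollection: false);
        return (before, after);
    }
}
```

Usage would be along the lines of `var (before, after) = GcProbe.ForceFullCollection();`, logging both values next to the process-level numbers.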

@Ernest314
Contributor Author

Ernest314 commented Dec 24, 2022

This memory difference has increased after upgrading to .NET 7 (although there have also been lots of code changes too). Formerly the usage was around 50~60 MB on Windows, and 140~200 MB on Ubuntu. Now it's more like 45~50 MB on Windows, and 270~310 MB on Ubuntu.

This is getting increasingly concerning, but perhaps still manageable? (One avenue is that maybe the swapfile size can be increased?) The difference could also come down to .NET Core taking more resources to run on Linux in the first place (this is my best guess atm), since presumably .NET was written for Windows first.

There is also a possibility that the garbage collector is aware of overall memory pressure, and just hasn't kicked in (maybe it's waiting until 90% memory utilization): https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md#gc-is-per-process-but-is-aware-of-physical-memory-load-on-the-machine
However, this would be strange, since it still should not be using up that much memory, especially when compared to Windows.

@Ernest314
Contributor Author

JetBrains dotMemory doesn't profile unmanaged memory, making it useless for tracking this down. Or rather, it confirmed that the issue is with unmanaged memory (some 200+ MB), and while there is some on Windows too, it isn't nearly that much. (This is where my conjecture about .NET Core utilization fits in.)

Tracking this down further would require using ANTS or a different memory profiler which supports diagnosing unmanaged memory.

@Ernest314
Contributor Author

ANTS also has issues analyzing dumps from Linux, and technically only has .NET 5 support. (It might be possible to get it to work, but I didn't look far enough into it.) Another option is ".NET Memory Profiler" from SciTech.

Also, dumps can be created by gcore and dotnet-dump, but I'm not quite sure how to analyze the resulting files. This article might be helpful: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak

@Ernest314
Contributor Author

My current best guess is a combination of two things: the runtime/GC requesting that much memory on startup for allocation later, and the .NET runtime requiring more memory on Linux (presumably because Windows has better native .NET support).

Microsoft provides settings here to tune GC allocation: https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#heap-limit

I was able to bring the heap limit down pretty far, and I saw a significant reduction in RAM usage, but the reduction wasn't linear, and it leveled off before hitting 100MB. With the limit anywhere from 100MB to 150MB, the RAM usage stayed around 220MB, and at some point past that (with an even lower limit) the bot wouldn't even run.

(To do this, I edited the service file to add an entry under [Service]. The value is in hexadecimal bytes; 0x7800000 is 120 MiB.)

Environment="DOTNET_GCHeapHardLimit=0x7800000"
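
A small sketch for sanity-checking the limit from inside the bot. It converts the hex value to bytes and reports what the GC believes is available. `HeapLimit` and its methods are hypothetical names, and the expectation that `TotalAvailableMemoryBytes` reflects the hard limit is my understanding rather than something verified here.

```csharp
using System;

// Hypothetical helper; names are illustrative, not from the project.
public static class HeapLimit
{
    // Parses a hex string such as "0x7800000" into a byte count.
    // Convert.ToInt64 accepts an optional "0x" prefix when the base is 16.
    public static long ParseHexBytes(string hex) => Convert.ToInt64(hex, 16);

    // Describes the configured limit and the memory the GC reports as available.
    public static string Describe()
    {
        string raw = Environment.GetEnvironmentVariable("DOTNET_GCHeapHardLimit") ?? "(not set)";
        long available = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        double availableMiB = available / (1024.0 * 1024.0);
        return $"DOTNET_GCHeapHardLimit={raw}, GC sees {availableMiB:F0} MiB available";
    }
}
```

For the value above, `HeapLimit.ParseHexBytes("0x7800000")` gives 125829120 bytes, i.e. 120 MiB. Logging `HeapLimit.Describe()` at startup would show whether the service file entry is being picked up.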

@Ernest314
Contributor Author

The D#+ discord thinks this is because .NET Core needs to be loaded from scratch on Linux (as I conjectured previously), and that additional bots may or may not also require this much memory. I've pretty much resigned myself to hoping that's the case, and to just crank up the swap size as needed. :/

@Ernest314
Contributor Author

Ernest314 commented Dec 25, 2022

Possible things left to try:

  • Chart the memory usage over time. There was a freak occurrence where the memory usage showed up as 53MB, despite earlier calls to /about (on the same commit) returning a usage of ~260MB. (b578bf9) Maybe this was because the command was issued as the program was shutting down, so it didn't actually capture full memory usage? There's also a slim chance that memory usage starts high and sometimes dips (this would be worth investigating).
  • Try measuring different metrics of memory usage (and over time), such as the working set, as opposed to private memory size. (A preliminary investigation seems to indicate this doesn't affect anything, but this was very preliminary, and this additionally could interact with additional bots using the .NET runtime.) See: https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process?view=net-7.0
  • Investigate something like GCCollectionMode, see if it has any effect: https://learn.microsoft.com/en-us/dotnet/api/system.gccollectionmode?view=net-7.0 (specifically GCCollectionMode.Aggressive)
  • Server vs Workstation GC? Although because we're running on a single core, we should be using "workstation" GC by default (which is the more aggressive kind).
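
A rough sketch for the first two items: sample several metrics at once and emit a CSV line that can be charted later. `MemorySampler` and its methods are hypothetical names, and the CSV layout is just one possible choice. It also records which GC flavor is active, which would settle the server vs workstation question.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

// Hypothetical helper; names are illustrative, not from the project.
public static class MemorySampler
{
    // Captures working set, private bytes, and managed heap size in bytes.
    public static (long WorkingSet, long PrivateBytes, long ManagedHeap) Sample()
    {
        // Process is IDisposable, so release it after reading the counters.
        using var process = Process.GetCurrentProcess();
        return (
            process.WorkingSet64,
            process.PrivateMemorySize64,
            GC.GetTotalMemory(forceFullCollection: false));
    }

    // Formats one row: timestamp, working set, private bytes, managed heap, GC flavor.
    public static string FormatCsvLine(DateTimeOffset time)
    {
        var (workingSet, privateBytes, managedHeap) = Sample();
        string gcFlavor = GCSettings.IsServerGC ? "server" : "workstation";
        return $"{time:O},{workingSet},{privateBytes},{managedHeap},{gcFlavor}";
    }
}
```

Appending `MemorySampler.FormatCsvLine(DateTimeOffset.UtcNow)` to a log file from a periodic timer would give enough data to see whether usage starts high and dips, or grows over time.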

@Ernest314
Contributor Author

Also apparently Process (Process.GetCurrentProcess()) is IDisposable and should probably be disposed?

@Ernest314
Contributor Author

Putting this on hold until I can test with additional bots (and see if memory consumption is shared), or see if the memory consumption keeps increasing.

Projects
Status: On hold