Massive allocations in unmanaged memory on Dictionary creation #54688

Closed
Takoooooo opened this issue Jun 24, 2021 · 9 comments

Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance: Performance related issue
untriaged: New issue has not been triaged by the area owner

Comments

@Takoooooo

Description

I wanted to use the https://github.com/thecoderok/Unidecode.NET project in my code, but I realized that it allocates a lot of memory. After some investigation, I found that the reason for all those allocations is this file:
https://github.com/thecoderok/Unidecode.NET/blob/master/src/Unidecoder.Characters.cs
At peak, Dictionary initialization consumes ~65 MB of memory (.NET 5, x64, Release), and ~99% of this memory is unmanaged, which seems quite strange to me. For comparison, a "Hello World" console app (.NET 5, x64, Release) on my machine consumes ~8 MB of memory.
Steps to reproduce: initialize the dictionary with the data from https://github.com/thecoderok/Unidecode.NET/blob/master/src/Unidecoder.Characters.cs

Configuration

.NET 5, x64, Windows 10

Data

[image: with dictionary initialization]
[image: plain "hello world" app, for comparison]

Analysis

From dotMemory, I can see that when the app starts, it begins allocating memory on the managed heap (probably initializing the dictionary) and also starts to massively allocate unmanaged memory. When the heap allocation ends, some of the unmanaged memory is released and usage stabilizes at 30 MB.
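The managed/unmanaged split described above can also be checked roughly from code, without a profiler. The following is a minimal sketch, not part of the original report: `MemoryProbe`, `Take`, and `Measure` are hypothetical names, and the action passed in stands for whatever first touches the dictionary. It compares the managed heap size reported by the GC with the private bytes of the whole process.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // Takes one sample: bytes the GC thinks are allocated on the managed heap,
    // and private bytes of the whole process (managed + unmanaged).
    public static (long ManagedBytes, long PrivateBytes) Take()
    {
        using Process self = Process.GetCurrentProcess();
        return (GC.GetTotalMemory(false), self.PrivateMemorySize64);
    }

    // Runs the action and returns how much each figure changed across it.
    // This is a before/after difference, not a peak: memory that was
    // allocated and released while the action ran is not visible here.
    public static (long ManagedDelta, long PrivateDelta) Measure(Action action)
    {
        var before = Take();
        action();
        var after = Take();
        return (after.ManagedBytes - before.ManagedBytes,
                after.PrivateBytes - before.PrivateBytes);
    }
}
```

If the private-bytes delta is much larger than the managed delta, most of the growth came from outside the managed heap. A profiler remains the better tool for seeing the peak.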

Takoooooo added the tenet-performance label Jun 24, 2021
dotnet-issue-labeler bot added the untriaged label Jun 24, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@huoyaoyuan
Member

It seems that diagnostics tools show string content as unmanaged memory.
However, the type has ~1.5K strings. Certainly a string shouldn't consume 30 KB of memory.

@GrabYourPitchforks
Member

I temporarily assigned this to the VM area because those folks probably have the best awareness of what might be reporting this, or at least can route it appropriately. :)

@jkotas
Member

jkotas commented Jun 25, 2021

The dictionary initialization method in the library is huge. The JITed code for it is about 1.5 MB.

The unmanaged memory allocation that you are seeing is coming from the JIT. The JIT needs the unmanaged memory to create the intermediate representation of the method. It is expected that the intermediate representation for a 1.5 MB method is going to take tens of MB.

Large auto-generated collection initializers are a known source of bad performance and crashes. See for example: #8980.

You should open an issue against the library instead. The library should be fixed to use a data-driven approach for initialization of the Dictionary.

jkotas closed this as completed Jun 25, 2021
jkotas added the area-CodeGen-coreclr label and removed the area-VM-coreclr label Jun 25, 2021
@jkotas
Member

jkotas commented Jun 25, 2021

Looks like there is an issue on this already: thecoderok/Unidecode.NET#14

@Takoooooo
Author

> The dictionary initialization method in the library is huge. The JITed code for it is about 1.5 MB.
>
> The unmanaged memory allocation that you are seeing is coming from the JIT. The JIT needs the unmanaged memory to create the intermediate representation of the method. It is expected that the intermediate representation for a 1.5 MB method is going to take tens of MB.
>
> Large auto-generated collection initializers are a known source of bad performance and crashes. See for example: #8980.
>
> You should open an issue against the library instead. The library should be fixed to use a data-driven approach for initialization of the Dictionary.

I'm sorry, but what do you mean by a "data-driven approach for initialization of the Dictionary"?

@huoyaoyuan
Member

huoyaoyuan commented Jun 25, 2021

Store the corresponding data in an embedded binary file, a primitive array of constants, or a ReadOnlySpan<byte> that is backed by a constant array. Then use a loop to convert the data into a dictionary.
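As a rough illustration of that suggestion, here is a minimal sketch that keeps the table as constant bytes and builds the dictionary in a loop. The packed layout, the three sample entries, and the names `DataDrivenTable`, `Packed`, and `Build` are assumptions made for this example; they are not taken from Unidecode.NET, whose real table has a different shape and far more entries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DataDrivenTable
{
    // Hypothetical packed format, repeated once per entry:
    //   2 bytes: key (little-endian 16-bit code point)
    //   1 byte : length N of the value in bytes
    //   N bytes: UTF-8 value
    // Because every element is a constant byte, the C# compiler can store the
    // data in the assembly and hand out a span over it, instead of emitting
    // code that fills an array element by element.
    private static ReadOnlySpan<byte> Packed => new byte[]
    {
        0xE9, 0x00, 1, (byte)'e',             // U+00E9 -> "e"
        0xDF, 0x00, 2, (byte)'s', (byte)'s',  // U+00DF -> "ss"
        0xC6, 0x00, 2, (byte)'A', (byte)'E',  // U+00C6 -> "AE"
    };

    // One small loop decodes every entry, so the size of this method does not
    // grow with the number of entries in the table.
    public static Dictionary<int, string> Build()
    {
        ReadOnlySpan<byte> data = Packed;
        var map = new Dictionary<int, string>();
        int pos = 0;
        while (pos < data.Length)
        {
            int key = data[pos] | (data[pos + 1] << 8);
            int len = data[pos + 2];
            map[key] = Encoding.UTF8.GetString(data.Slice(pos + 3, len));
            pos += 3 + len;
        }
        return map;
    }
}
```

The same loop would work over an embedded resource stream; the point is only that the table lives in data rather than in the method body the JIT has to compile.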

@Takoooooo
Author

Takoooooo commented Jun 25, 2021

> Store the corresponding data in an embedded binary file, a primitive array of constants, or a ReadOnlySpan<byte> that is backed by a constant array. Then use a loop to convert the data into a dictionary.

I tried making one array of integers and a jagged array of strings and adding them to the dictionary in a for loop, but it doesn't really solve the issue. Still ~65 MB.

@huoyaoyuan
Member

huoyaoyuan commented Jun 25, 2021

> a jagged array of strings

This still generates a lot of code. String is not considered a primitive in this case.

You can inspect the output assembly with ILSpy and examine the IL size of the method body.
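Besides ILSpy, the IL size of a method body can be read through reflection. The sketch below is an illustration only: `IlSizeProbe` and its members are hypothetical names, and the two tiny tables stand in for the real one. It compares a collection initializer with a loop over constant bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class IlSizeProbe
{
    // Collection initializer: the compiler emits one Add call per entry,
    // so this method body grows with every entry.
    public static Dictionary<int, string> CodeDriven() => new Dictionary<int, string>
    {
        { 1, "a" }, { 2, "b" }, { 3, "c" }, { 4, "d" },
        { 5, "e" }, { 6, "f" }, { 7, "g" }, { 8, "h" },
        { 9, "i" }, { 10, "j" }, { 11, "k" }, { 12, "l" },
        { 13, "m" }, { 14, "n" }, { 15, "o" }, { 16, "p" },
    };

    // Loop over constant bytes (ASCII 'a'..'p'): the body stays the same size
    // no matter how many entries the data holds.
    public static Dictionary<int, string> DataDriven()
    {
        ReadOnlySpan<byte> letters = new byte[]
        {
            97, 98, 99, 100, 101, 102, 103, 104,
            105, 106, 107, 108, 109, 110, 111, 112,
        };
        var map = new Dictionary<int, string>();
        for (int i = 0; i < letters.Length; i++)
            map[i + 1] = ((char)letters[i]).ToString();
        return map;
    }

    // Returns the size in bytes of the IL body of a public static method on this type.
    public static int IlSize(string methodName)
    {
        MethodInfo method = typeof(IlSizeProbe).GetMethod(methodName)
            ?? throw new ArgumentException("No such method: " + methodName);
        return method.GetMethodBody()?.GetILAsByteArray()?.Length ?? 0;
    }
}
```

Calling `IlSizeProbe.IlSize("CodeDriven")` and `IlSizeProbe.IlSize("DataDriven")` shows the difference; for a static field initializer, the body to inspect would be the one returned by `typeof(T).TypeInitializer`.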

ghost locked as resolved and limited conversation to collaborators Jul 25, 2021