Proposal: TryForSufficientStack method to support stackalloc usage #24963

jkotas · 2018-02-08T12:25:34Z

From @kkokosa on February 8, 2018 12:8

Due to changes in C# 7.2 and Span, more and more stackalloc usages may become popular like:

Span<byte> span = stackalloc byte[1000];

However, this call will end up with unrecoverable StackOverflowException if there is not enough stack space left. We can do nothing in such situation which makes this approach not useful at some point. It is now just completely unreliable to guess what stackalloc size may end up with such a catastrophe.

@jkotas pointed out in dotnet/corefx#14675 that RuntimeHelpers.EnsureSufficientExecutionStack is a reliable solution for handling stack overflow in general but as MSDN says, this method "ensures that the remaining stack space is large enough to execute the average .NET Framework function". However, probing for average .NET framework function is not very helpful as stackalloc makes it not average for sure.

I propose to add a new helper method which gives at least some clue whether our stackalloc may end up with StackOverflowException:

public static bool RuntimeHelpers.TryForSufficientStack(long size)

I believe returning bool instead of throwing an exception (like InsufficientExecutionStackException from above method) is better because stackalloc is most probably used in hot paths already and adding exception handling there is rather undesirable.

As far as I understand this method seems to be quite simple in terms of implementation as all necessary data are there already. My naive implementation proposal:

FCIMPL1(FC_BOOL_RET, ReflectionInvocation::TryForSufficientStack, INT64 size)
{
    FCALL_CONTRACT;

    Thread *pThread = GetThread();

    UINT_PTR current = reinterpret_cast<UINT_PTR>(&pThread);
    UINT_PTR limit = reinterpret_cast<UINT_PTR>(pThread->GetCachedStackLimit());

    FC_RETURN_BOOL(current >= (limit + size));
}
FCIMPLEND

PS. I am not sure whether stack guard size should be taken into consideration here or not...

Copied from original issue: dotnet/coreclr#16277

The text was updated successfully, but these errors were encountered:

svick · 2018-02-08T12:44:35Z

I think the name of this method should be close to EnsureSufficientExecutionStack, to indicate that the two methods do very similar things. Maybe CheckSufficientExecutionStack or HasSufficientExecutionStack?

Also, I think Try makes sense when it's combined with a verb, so if you want to use it here, the name would be TryEnsureSufficientExecutionStack.
Should there be a way to use the new method to check for stack space for that "average function" without throwing? That could be achieved by adding default value to the size parameter (probably 0 or -1), or by adding a parameterless overload.

stephentoub · 2018-02-08T15:26:08Z

Should there be a way to use the new method to check for stack space for that "average function" without throwing?

FYI, there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

svick · 2018-02-08T15:33:11Z

there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

Then I think the method proposed here should be added as its overload.

kkokosa · 2018-02-08T15:57:28Z

there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

Then I think the method proposed here should be added as its overload.

Then probably complementary EnsureSufficientExecutionStack overload (throwing InsufficientExecutionStackException) should be added?

jkotas · 2018-02-08T16:38:55Z

If the proposed EnsureSufficientExecutionStack is added what the typical usage is going to look like?

Note that stackalloc should be only used for relatively small buffers (say up to 1kB). Array pool is more appropriate above this limit. The framework is using stackalloc in number of places and the typical pattern looks like this:

https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Text/ValueStringBuilder.cs
https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Number.Formatting.cs#L297

Would it be better to work on making this pattern easier instead?

kkokosa · 2018-02-08T17:29:37Z

For example, I was thinking about various scenarios where some temporary data collections have to be calculated. Instead of using an array or even ArrayPool, we could utilize stack with almost no time overhead:

public double Process(List<BigData> list)
{
	Span<DataStruct> preprocessedList = stackalloc DataStruct[list.Count];
	for (int i = 0; i < list.Count; ++i)
	{
		// Calculate preprocessedList [i] fields from list items
	}
	return ProcessData(new ReadOnlySpan<DataStruct>(preprocessedList ));
}

But currently every usage of stackallow above MinExecutionStackSize (from Thread::SetStackLimits method) - which is 128 kB (or 64 kB on 32-bit) - may end up with killing entire app with StackOverflowException.

I mean, I know it is quite a big size guaranteed already by EnsureSufficientExecutionStack call but... would not be nice to have some more size-aware way of checking this? Currently, people using stackalloc may be even not aware of those deep hidden implementation-related limitations. In other words, deciding whether we are using "relatively small buffers" may be too subjective when writing general-purpose library and there is no way currently to ensure that stackalloc won't fail.

jkotas · 2018-02-08T17:39:59Z

So what would your code example look like if you had EnsureSufficientExecutionStack API?

jkotas · 2018-02-08T17:43:48Z

Instead of using an array or even ArrayPool, we could utilize stack with almost no time overhead:

Large stack buffers do have overhead: The stack memory has to be faulted in (one page at a time typically), and then it tends to be stuck in the working set.

kkokosa · 2018-02-08T17:55:54Z

So what would your code example look like if you had EnsureSufficientExecutionStack API?

Something like:

public double Process(List<BigData> list)
{
	if (RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE))
	{
		Span<DataStruct> preprocessedList = stackalloc DataStruct[list.Count];
		for (int i = 0; i < list.Count; ++i)
		{
			// Calculate preprocessedList [i] fields from list items
		}
		return ProcessData(new ReadOnlySpan<DataStruct>(preprocessedList ));
	}
	else
	{
		// fallback to ArrayPool or other way of creating preprocessed list
	}
}

Large stack buffers do have overhead: The stack memory has to be faulted in (one page at a time typically), and then it tends to be stuck in the working set.

Isn't the same for faulting in memory from ArrayPool? I mean, if Process is on hot path called multiple times, stucking in working set is some kind of overhead we should be aware of but maybe worth considering? If Process was one-shot call, probably nobody would care about optimizing it in the first place.

Additionally, the point is that even with small buffers of 1 kB the TryEnsureSufficientExecutionStack says nothing whether it is sufficient for that 1 kB or maybe some other value.

jkotas · 2018-02-08T18:03:26Z

// fallback to ArrayPool or other way of creating preprocessed list

I do not think I would want to use this pattern because of it makes me duplicate the code.

Isn't the same for faulting in memory from ArrayPool?

No - if the buffer is reused on multiple threads, or different callers on the same thread.

There needs to be best practice for allocating large buffers. If half of the folks are going to do it via ArrayPool and other half with large stack allocs, everybody loses at the end.

kkokosa · 2018-02-08T18:51:32Z

I do not think I would want to use this pattern because of it makes me duplicate the code.

Ok, this was a poorly prepared example. I was thinking more about something like 'generic temporary collection factory' without repeating any processing logic:

public double Process(List<BigData> list)
{
	Span<SomeStruct> span;
	if (RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE))
	{
		span = stackalloc SomeStruct[size];
	}
	else
	{
		span = ArrayPool<SomeStruct>.Shared.Rent(size);
	}
	
	for (int i = 0; i < list.Count; ++i)
	{
		// Calculate span[i] fields from list items
	}	
	return ProcessData(new ReadOnlySpan<DataStruct>(span));
}

However, this currently does not compile with "A result of a stackalloc expression of type 'Span' cannot be used in this context because it may be exposed outside of the containing method" error. Is it supposed to be ok? Because https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md says that "A stackalloc expression is an rvalue that is safe-to-escape to the top-level scope of the method (but not from the entire method itself)".

There needs to be best practice for allocating large buffers. If half of the folks are going to do it via ArrayPool and other half with large stack allocs, everybody loses at the end.

Here I agree fully and I see your point. What still bothers me is the fact that stackalloc usage may kill any app with stack overflow and the defensive programmer has no way to proactively prevent this.

jkotas · 2018-02-08T19:19:23Z

Right, it does not compile. Also, missing in your example is code to return the buffer to the pool once you are done. The pattern that works today looks like this:

https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Text/ValueStringBuilder.cs
https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Number.Formatting.cs#L297

Or another example:

https://github.com/dotnet/corefx/pull/26942/files#diff-a07e741d448a2771beaea007ab99a266
https://github.com/dotnet/corefx/pull/26942/files#diff-00816d3a470b2701102e6be001c6e4d3R23

benaadams · 2018-02-09T14:04:12Z

Though its likely unsensible to do on a different thread than the one you are running on (unless in a Wait?); how about some utility apis for querying stack size instead?

So currently there is a maxStackSize parameter for creating a thread:

public Thread (ParameterizedThreadStart start, int maxStackSize);

But you can't query it back to find out what it was set to; so maybe something like the following would be better?

public partial class Thread
{
    public int TotalStackSize { get; }
    public int AvailableStackSize { get; }
    public int UsedStackSize { get; }
}

Its also less ambiguous with guarantees that xSufficientStack would suggest

benaadams · 2018-02-09T15:03:10Z

Or perhaps in System.Diagnostics?

namespace System.Diagnostics.ThreadInfo
{
    public static class ThreadInfoExtensions
    {
        public static int TotalStackSize(this Thread thread);
        public static int AvailableStackSize(this Thread thread);
        public static int UsedStackSize(this Thread thread);
    }
}

kkokosa · 2018-02-09T15:22:32Z

Or perhaps in System.Diagnostics?

Moving it away from "mainstream" APIs like Thread seems to be a good idea. And it seems to be a better place than quite exotic System.Runtime.CompilerServices.

jkotas · 2018-02-09T16:28:40Z

It does not make sense for these methods to take Thread because of they are applicable for current thread only. For historic reasons, there are number of methods like that on Thread that throw when called on anything but current thread. We do not want to be growing this set.

svick · 2018-02-28T12:11:19Z

Would it be more convenient if the user didn't have to compute the required size in bytes, they would just specify the type and the count of items of that type to allocate? So instead of:

RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE)

You would have:

RuntimeHelpers.TryEnsureSufficientExecutionStack<SomeStruct>(list.Count)

Though being able to specify the size in bytes is probably still useful, so this could be just a convenience overload (possibly added later than the one proposed here).

ygc369 · 2018-03-01T00:59:12Z

I like this idea.

airbreather · 2018-03-01T14:06:58Z

I do appreciate this idea, but my intuition suggests that it's a lot less useful than it seems on its face.

Thinking about when you'd need a really big buffer (like, bigger than a 4K OS page) that could legally fit on the stack, allocating that buffer on the stack feels like it could very often cause a page fault when trying to return to the caller. Since we're probably talking about situations where these scratch buffers are of input-dependent lengths, then it seems to follow that larger buffers will correlate with more time spent in the method, which means that I see a potential for a bit of a "perfect storm":

memory-constrained system that aggressively swaps out memory pages that aren't used
method that hyper-aggressively allocates whatever it possibly can on the stack
caller that invokes the method multiple times, perhaps in a loop, in such a way that the caller directly or indirectly pushes the stack off onto another page and works exclusively with those addresses.

I'd imagine that someone could produce a contrived microbenchmark that shows this as being even worse than unconditionally allocating on the heap; more likely, each method will have its own threshold where it starts making sense to use ArrayPool<T>, and I imagine that this point will usually be well below "however much stack space there currently is".

Furthermore, it feels like as soon as we write a TryEnsureSufficientExecutionStack-guarded stackalloc, everything after that becomes "serious business" until the stack-allocated buffer goes out of scope:

If buffers are sometimes going to fit on the stack and sometimes not, then it follows that there are going to be situations where the buffer only barely fits on the stack.
This means that the code in question needs to consider not just the size of their desired scratch buffer, but also the size of everything else that will be pushed onto the stack after it (including other methods being called, and their own calls, etc.) and encode all of that knowledge into a number that goes into the TryEnsureSufficientExecutionStack call.
This feels fragile: that number can depend on code that's "far away" and can change without warning.

I only bring up the above because several examples in the thread use TryEnsureSufficientExecutionStack as the only thing that determines stack vs. heap.

Finally, what are some actual situations that we're worried about here?

It seems unlikely that a non-recursive leaf method would fear allocating a 1 KiB buffer on the stack, at least at a 1MiB stack size (the default on Windows - other platforms seem to have not-smaller defaults).
- If we're worried about methods running on a thread with a lower stack size, then it feels like a better option would be to give them the ability to encode a heuristic like "stackalloc if <1 KiB and also <0.1% of max stack, otherwise grab from ArrayPool<T>".
I guess a possibility is that some method stack-allocates a buffer and passes its Span<T> down into another method that does the same, and it goes down like that, but I don't think the proposed guard really helps much for that situation (it has all the concerns from earlier in my comment, magnified a ton).

Again, though, I do appreciate the idea, and the sample in the original issue description seems to suggest that it's easy to implement. I just wanted to make sure to share my thoughts on it (it's been on my mind a ton since @svick linked it from dotnet/roslyn#25118), even if all it does is inspire a "Remarks" section in the docs.

jkotas · 2018-03-01T16:50:33Z

Completely agree with @airbreather . Similar discussion at dotnet/csharplang#1344

kkokosa · 2018-03-01T19:45:05Z

I really like the direction of this discussion. My primary reasons for this proposal were twofold:

with expected popularity growth of constructs like Span<byte> span = stackalloc byte[1000] in safe code, the actual size of the array that is safe to allocate is now quite an internal magic. More and more people may wonder what size is "safe"
stackalloc is special because it may throw uncatchable exception killing everything. Thus, it should be specially treated probably

At the moment, I fully agree with the arguments presented by you. However, the fact that such stackalloc usage is most probably, in most cases, not optimal does not incur that people will not do it. Maybe a good remark in a documentation should state clearly what this magic number is and it will be enough. Currently, I would be a little afraid to advise the client to use stackalloc and when asked what size does not kill his application, the only answer is "well... probably something around 1 KiB".

I like @jkotas idea of LocalBuffer<T> a lot as a way of addressing this problem. It would leave stackallocating-people on their own but at least they would have a safer alternative.

svick · 2018-03-01T19:57:12Z

@airbreather

This means that the code in question needs to consider not just the size of their desired scratch buffer, but also the size of everything else that will be pushed onto the stack after it (including other methods being called, and their own calls, etc.) and encode all of that knowledge into a number that goes into the TryEnsureSufficientExecutionStack call.

EnsureSufficientExecutionStack already has an opinion about how much stack is required to execute "average .NET Framework function". Maybe you could tell TryEnsureSufficientExecutionStack how many functions you need to fit on the stack after this allocation? That way, you wouldn't be encoding it yourself and any magic numbers would be hidden from you. The code could look like:

RuntimeHelpers.TryEnsureSufficientExecutionStack(
    size: list.Count * STRUCT_SIZE, calledMethods: 10)

Drawaes · 2018-03-01T23:18:55Z

There are a couple of issues that feed into this. One is that the stack size varies widely on platforms now (for instance 2mb on windows and a default of 8mb on Ubuntu). Also I am not sure about allocating large chunks of stack until we get the "don't zero the stack". Without that its faster to pull from the array pool because zeroing a large array will cost you more and as said it will need to be faulted in anyway.

airbreather · 2018-03-03T11:32:35Z

EnsureSufficientExecutionStack already has an opinion about how much stack is required to execute "average .NET Framework function". Maybe you could tell TryEnsureSufficientExecutionStack how many functions you need to fit on the stack after this allocation? That way, you wouldn't be encoding it yourself and any magic numbers would be hidden from you.

@svick looking through the code, the rule-of-thumb used by EnsureSufficientExecutionStack seems like it's already accounting for methods calling other methods at a "typical" rate (plus the overhead of CLR services), so I'm guessing that the implementation would be more like "return true iff the stack budget burned so far, plus the directly desired size, plus that rule-of-thumb value for the current thread, would be less than the current thread's max stack size".

On the one hand, that certainly feels like it would be safe enough, but on the other hand, is it maybe a bit too safe? If I'm a leaf method that's 50KiB away from overflowing the stack (which might actually start happening because of callers over-eagerly stack-allocating because of this guard), and this guard stops me from allocating 400B, then I'm not happy.

Currently, I would be a little afraid to advise the client to use stackalloc and when asked what size does not kill his application, the only answer is "well... probably something around 1 KiB".

@kkokosa Without further context, if all you're given is a question phrased like "how much can I stackalloc before my application explodes?", then the perfect answer seems to depend both on how much stack is already burned by the time your method is called and how much stack the method will burn after the stackalloc (including any of the CLR services that might get invoked automatically like the GC).

Anyway, without that context, isn't the best answer still going to be something like that? "Probably something around 1 KiB"? The helper utility would only help to remove the "stack burned so far" part, not the "stack burned after the stackalloc" part.

It's almost like the more useful helper would be to just give a utility method that answers "how many more bytes can I push onto the stack before I hit the guard page?" (edit: or was it the page that comes immediately "before" the true guard page?) and just leave it up to the caller to interpret that according to their specific knowledge of what they intend to do in the future.

iSazonov · 2018-04-27T07:38:46Z

In PowerShell repo we got first experience of using Span and stackalloc. We found that we use one pattern:

const stackallocTheshold = 120;

Span<int> array;

int size = GetArraySize();
if (size < stackallocTheshold)
{
    array = stackalloc int[size];
}
else
{
    array = new int[size]
}

Based on the discussion we would have to implement:

const RedLineStackallocTheshold = 1Kb; // ???
const StackallocTheshold = 120;

Span<int> array;

int size = GetArraySize();
if (size < StackallocTheshold && AvailableStackSize() > RedLineStackallocTheshold)
{
    array = stackalloc int[size];
}
else
{
    array = new int[size]
}

RedLineStackallocTheshold protects from stack overflow.
StackallocTheshold is important - it guarantees optimal use of the stack because we use stackalloc in nested functions. This value can only be selected based on the properties of the application itself. In Powershell repo we use 120 - this is the maximum width of the screen by default and we are sure that in most scenarios the application will use the stack, but if the user will set extremely large parameters, the application will continue work using heap.

Fallback to heap can be interesting proposal for the discussion.

GSPP · 2019-10-16T09:30:59Z

Maybe the method should return the number of bytes left. That way the caller can apply any logic they like.

This would simplify the API as well. No need for a size parameter and comparison.

jkotas · 2020-03-23T13:03:12Z

#25423 is a better proposal solving the same problem.

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

jkotas closed this as completed Mar 23, 2020

ghost locked as resolved and limited conversation to collaborators Dec 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: TryForSufficientStack method to support stackalloc usage #24963

Proposal: TryForSufficientStack method to support stackalloc usage #24963

jkotas commented Feb 8, 2018

svick commented Feb 8, 2018

stephentoub commented Feb 8, 2018 •

edited

Loading

svick commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018 •

edited

Loading

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018 •

edited by danmoseley

Loading

benaadams commented Feb 9, 2018 •

edited

Loading

benaadams commented Feb 9, 2018

kkokosa commented Feb 9, 2018

jkotas commented Feb 9, 2018

svick commented Feb 28, 2018

ygc369 commented Mar 1, 2018

airbreather commented Mar 1, 2018 •

edited

Loading

jkotas commented Mar 1, 2018

kkokosa commented Mar 1, 2018

svick commented Mar 1, 2018

Drawaes commented Mar 1, 2018

airbreather commented Mar 3, 2018 •

edited

Loading

iSazonov commented Apr 27, 2018

GSPP commented Oct 16, 2019 •

edited

Loading

jkotas commented Mar 23, 2020

Proposal: TryForSufficientStack method to support stackalloc usage #24963

Proposal: TryForSufficientStack method to support stackalloc usage #24963

Comments

jkotas commented Feb 8, 2018

svick commented Feb 8, 2018

stephentoub commented Feb 8, 2018 • edited Loading

svick commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018 • edited Loading

jkotas commented Feb 8, 2018

kkokosa commented Feb 8, 2018

jkotas commented Feb 8, 2018 • edited by danmoseley Loading

benaadams commented Feb 9, 2018 • edited Loading

benaadams commented Feb 9, 2018

kkokosa commented Feb 9, 2018

jkotas commented Feb 9, 2018

svick commented Feb 28, 2018

ygc369 commented Mar 1, 2018

airbreather commented Mar 1, 2018 • edited Loading

jkotas commented Mar 1, 2018

kkokosa commented Mar 1, 2018

svick commented Mar 1, 2018

Drawaes commented Mar 1, 2018

airbreather commented Mar 3, 2018 • edited Loading

iSazonov commented Apr 27, 2018

GSPP commented Oct 16, 2019 • edited Loading

jkotas commented Mar 23, 2020

stephentoub commented Feb 8, 2018 •

edited

Loading

kkokosa commented Feb 8, 2018 •

edited

Loading

jkotas commented Feb 8, 2018 •

edited by danmoseley

Loading

benaadams commented Feb 9, 2018 •

edited

Loading

airbreather commented Mar 1, 2018 •

edited

Loading

airbreather commented Mar 3, 2018 •

edited

Loading

GSPP commented Oct 16, 2019 •

edited

Loading