
Tail opcode being emitted for normal methods, destroys JIT optimizations #6329

Open · NinoFloris opened this issue Mar 13, 2019 · 29 comments
NinoFloris commented Mar 13, 2019

After a benchmarking discussion in the FSSF Slack we found the main culprit for the C#/F# speed differences: tail. was being emitted for a non-recursive method, which turns off JIT inlining and other optimizations.

Repro can be found here.

https://sharplab.io/#v2:DYLgZgzgNALiCWwA+B7ADgUwHYAIDKAnhDBgLYCwAUOtvkSaQHQBKArljPKRowMIqk0iDACc8ogG7wAxhghUqMAphwAZFAHMNo1RgkZgOALw4AgsZwAGHEhwAhHBYCMOBZSUr1W0QAoAlMZUjo7AGDA43BAQAIbaEBaExGR8KMCh0pwoWBCMAOLYojKMqvDEADzEIvBYGgB8Pk6WTZZ+ANxBwR2OANplALJhABYoACYAkoLAPgMww+OTAPJomdmMploichDw+mNYwNXVGn61ALpdOKHhWOg4/hb+F9ykAEaiOLOlxZo+ofqGIDUmm0Il0/ygEQgGhwgMqRwCRguwQiW1icjWIxGPlIUL8SOCNzQj0oTzIbxEH0GXy8vz0BhhQO8oLpwAhKBeACsGQByaII/GOSIxOIYrHsjmMAAqKDwMCqNX8eMoyORhOJimUGBwkuiiC8IPuiOVwSuKKiaPiJkSDBSaQwGXgWRy+SwhWkxVKMAqcqO9UazTaF1Jr3enxyNL+9MB+p0LIhOOhsJ9NX5xpVZuF6NMmOxuLcyOe5Mp1J+kYBjJBYIMbM5PL5gTTKqFFtFPnFUplyY0iqoQA===

Another, smaller codegen issue can be seen in the IL for ObjLog<'a>:

            .locals init (
                [0] !!a
            )

            IL_0000: ldarg.0
            IL_0001: ldfld class [mscorlib]System.Collections.Generic.List`1<string> _/TailLogger::messages
            IL_0006: ldarg.2
            IL_0007: stloc.0
            IL_0008: ldloca.s 0
            IL_000a: constrained. !!a
            [...]

The C# IL for the same method does away with the arg-to-local dance entirely and just uses the argument directly.

These extraneous load/store patterns confuse the JIT's dataflow analysis about whether values are being modified, hampering enregistration.

https://sharplab.io/#v2:C4LglgNgNAJiDUAfAAgBgATIIwBYDcAsAFDHIDM6ApgHYCuAtugDID2A5m5QE5OUBulCOmIBvYunQBBKOPQAhYgF9ipCsgBM8gIYBnSqw7dRsiQAcuYPluCVmYHcAA82VAD509Sjp1bOO9AC86NSUAO52Ds5YbgAU0agJAJSERBISJphqOMzsMQacPPyC6BBF0JjRHjpsiRliqWmNnt6+XgB0kjAwMfTVyRnKJA0S5JjZBo4AKq557AW8AkKli1Dok+gsAEYAVrXD6PWNTV4+fh1dMVvbbZMsAMrAFtRsMYn9+4OKQA=

If anything, the focus should be on making sure tail. is only ever emitted for genuinely recursive methods/functions.
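A minimal sketch of the shape of the problem (hypothetical type and member names, not the exact repro): a method whose last action is a call into a library method gets the tail. prefix under --tailcalls+, even though nothing recurses:

```fsharp
open System.Collections.Generic

type TailLogger(messages: List<string>) =
    // The call to messages.Add is in tail position, so with --tailcalls+ the
    // compiler emits `tail. callvirt` here, even though no recursion is possible.
    member this.Log(msg: string) =
        messages.Add(msg)
```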

EBrown8534 commented Mar 13, 2019

As a note, this is the performance benchmark that was being used:

https://github.com/EBrown8534/Logging-Benchmark

Given a workaround from @NinoFloris the results looked as follows:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT


|                  Method |      Mean |      Error |     StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.14 us |  0.3343 us |  0.3127 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  33.86 us |  0.6621 us |  0.9496 us |  0.62 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  54.51 us |  1.0658 us |  1.6276 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  59.06 us |  1.1748 us |  2.4781 us |  1.08 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.44 us |  0.3609 us |  0.3200 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  50.51 us |  0.9964 us |  0.9320 us |  0.93 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 650.95 us |  6.5345 us |  5.7927 us | 11.95 |    0.42 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 682.77 us | 13.3395 us | 22.2873 us | 12.46 |    0.54 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 694.41 us | 13.6506 us | 13.4067 us | 12.76 |    0.53 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 715.00 us | 15.6588 us | 18.6406 us | 13.11 |    0.53 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 678.59 us | 13.1145 us | 13.4676 us | 12.47 |    0.56 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 689.77 us | 13.4700 us | 19.7442 us | 12.65 |    0.45 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  33.13 us |  0.3786 us |  0.3356 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  35.31 us |  0.7056 us |  1.2359 us |  0.65 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  34.65 us |  0.6788 us |  1.3239 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  35.03 us |  0.6944 us |  1.3543 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  53.60 us |  1.0659 us |  2.0020 us |  0.98 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  57.05 us |  1.1371 us |  3.2443 us |  1.08 |    0.07 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  81.92 us |  1.6557 us |  3.1099 us |  1.51 |    0.07 |     18.7988 |      3.0518 |           - |            78.21 KB |

However, pre-workaround we were seeing large disparities between C# and F# functions:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT


|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  32.95 us |  0.4434 us |  0.4148 us |  32.86 us |  0.61 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  53.02 us |  0.4130 us |  0.3864 us |  52.95 us |  0.98 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  54.23 us |  1.0741 us |  1.8527 us |  53.39 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  59.85 us |  0.4193 us |  0.3502 us |  59.91 us |  1.11 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  46.98 us |  0.3562 us |  0.2974 us |  46.96 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  58.36 us |  0.5470 us |  0.5116 us |  58.20 us |  1.07 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 641.96 us |  4.9715 us |  4.4071 us | 640.42 us | 11.85 |    0.38 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 688.81 us | 13.4879 us | 15.5327 us | 682.40 us | 12.60 |    0.60 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 686.75 us |  6.9659 us |  6.5159 us | 685.58 us | 12.63 |    0.46 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 707.65 us |  8.0404 us |  7.5210 us | 705.40 us | 13.01 |    0.42 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 676.84 us | 13.3201 us | 21.5095 us | 669.99 us | 12.45 |    0.47 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 676.51 us |  8.0330 us |  7.5141 us | 674.65 us | 12.44 |    0.42 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  32.70 us |  0.1734 us |  0.1353 us |  32.66 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  47.08 us |  0.4349 us |  0.4068 us |  47.03 us |  0.87 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  32.95 us |  0.4549 us |  0.4255 us |  32.83 us |  0.61 |    0.02 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  48.79 us |  0.9598 us |  1.6299 us |  48.11 us |  0.90 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  47.75 us |  1.0404 us |  1.2385 us |  47.08 us |  0.87 |    0.04 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  53.19 us |  0.3635 us |  0.3223 us |  53.14 us |  0.98 |    0.03 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  73.80 us |  0.8627 us |  0.7648 us |  73.76 us |  1.36 |    0.04 |     18.7988 |      3.0518 |           - |            78.21 KB |

You'll note that in benchmark set 1, LogDirect and LogDirectFs are almost identical: 33.14 us vs 33.86 us. In the second set they differ substantially: 32.95 us vs 53.02 us. The only difference was the following:

    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    let nop () = ()
    member this.Log(level : LoggerLevel, msg : string) =
        messages.Add(msg)
        nop()

For the first, and:

    member this.Log(level : LoggerLevel, msg : string) =
        messages.Add(msg)

For the second. (Each method was modified to include or exclude the nop as appropriate; the only change was whether it was included.)

Across all the benchmarks this one difference changed execution times dramatically.

manofstick commented Mar 17, 2019

This has been a bugbear of mine for many years! Mix the tail. instruction with value-type arguments larger than 64 bits and performance is terrible (larger than 64 bits because such arguments can't be passed in registers, so the JIT goes to a lot of effort to set up a stack frame that can be reused).

I think it was a mistake to insert tail. by default. We turn it off for all projects. For simple tail calls the F# compiler converts them into loops anyway, so it's only really of use for mutually recursive functions, which I assume are the exception rather than the norm for many codebases... For those cases the use of an attribute to output tail. would be preferable.
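To illustrate the distinction (a sketch, not code from the thread): simple self tail recursion like countDown below is compiled to a loop by fsc regardless of the tail. setting, while the mutually recursive pair relies on tail. to run in constant stack space:

```fsharp
// Self tail recursion: fsc rewrites this into a loop, so no tail. prefix is needed.
let rec countDown n =
    if n = 0 then 0 else countDown (n - 1)

// Mutual tail recursion: fsc does not turn this into a loop. Without tail.
// (e.g. under --tailcalls-) each call consumes a stack frame, so a large n
// can overflow the stack.
let rec isEven n = if n = 0 then true else isOdd (n - 1)
and isOdd n = if n = 0 then false else isEven (n - 1)
```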

Here's a mention back from 4 years ago...

EBrown8534 commented Mar 18, 2019

@manofstick How would I go about turning that off for a project?

I never have mutually recursive functions, so it'd be great if I could turn this off: we have some F# code on very hot paths where I'd like to take advantage of this without needing to add a manual nop.

7sharp9 commented Mar 18, 2019

--tailcalls-

manofstick commented Mar 18, 2019

As well as fsc's --tailcalls- argument mentioned by @7sharp9, if you're using Visual Studio (or at least fsproj files), the project Properties under Build have a "Generate tail calls" checkbox, which by default is off for Debug and on for Release builds (which has the perverse effect that, under some conditions, your Debug build runs faster!). Turning it off for Release modifies the .fsproj file by adding <Tailcalls>false</Tailcalls> to the Release PropertyGroup:

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'">
    ....
    <Tailcalls>false</Tailcalls>
</PropertyGroup>

NinoFloris commented Mar 18, 2019

@manofstick the problem is that these also get output during Debug (see the sharplab example: just switch to Release and check the bottom IL methods).

In this case the compiler is just wrong: there is nothing to tail-call optimize, as there is no recursion.

KevinRansom commented Mar 18, 2019

@NinoFloris
If the JIT generates less well-optimized code in the presence of a tail. instruction, the performance change seems like a JIT bug. tail. is documented as a hint to the runtime to try for tail-call optimizations.

However, it also appears that the compiler can help the JIT out by not generating unnecessary tail. prefixes.

Kevin

manofstick commented Mar 18, 2019

Check out @mrange's spiel on tail calls here...

forki commented Mar 18, 2019

NinoFloris commented Mar 18, 2019

@KevinRansom the JIT has never worked well in the presence of tail.... Sharplab runs netfx, but how I came to know of tail. woes was through CoreCLR issues like dotnet/coreclr#18406.

But searching for "tail" in coreclr shows a ton of other issues like
dotnet/coreclr#4971
dotnet/coreclr#2556

And quite a few closed issues, many of which are only fixed in the upcoming 3.0.

Search link
https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=is%3Aissue+tail+

tail. is a bug farm, and it'll take years before it'll be as fast as just doing callvirt nop instead of tail. callvirt.

@cartermp cartermp added this to the Unknown milestone Mar 18, 2019

mrange commented Mar 18, 2019

It's not only that .tail turns off optimizations; it also inserts a call to JIT_TailCall (or something like that) when calling a function in tail position. I suspect this function collapses the stack frame, which is beneficial for not running out of stack space but really poor for performance.

On rare occasions I have seen the jitter replace a call with jmp, and I throw a party every time! (I guess this happens if all arguments can be passed in registers and no stack frame is created.)

I have upvoted issues, I have spammed chat on Twitch during .NET community standups about giving .tail some love, but I too have come to the conclusion that as long as C# doesn't generate .tail we won't see any investment here. So:

  1. Turn off tail call by default?
  2. Make tail call generation suppressible in code for those performance sensitive cases?
  3. Add .tail call generation to C# to create some demand to invest :)?
  4. Change JIT to try to do tail call whenever it sees an opportunity creating a massive demand to invest in better tail calls ;)
  5. Have a beer and forget about it?

PS: @manofstick now I am in AVX + struct type performance hell. The jitter does a lot of "funny" stuff :)

NinoFloris commented Mar 18, 2019

@mrange yeah, that helper is for the "slow tail call" path, and the jmp is emitted when the JIT decides a "fast tail call" is possible (which indeed depends on argument sizes; also, neither Linux nor the ARM architectures have fast tail call support yet).

zpodlovics commented Mar 19, 2019

@AndyAyersMS would it be possible to keep some or most of the optimizations and inlining with .tail? Would the AggressiveOptimization attribute (dotnet/coreclr#20009) help here?

AndyAyersMS commented Mar 19, 2019

I'll have to read the notes above more carefully, but a few quick replies:

  • Jit_TailCall is generally only used when there isn't enough space in the caller's frame to accommodate the callee's args. So you should only see it on x64 Windows if the callee has more than 4 args and also more args than the caller. I'm not as sure about x86; it always has odd quirks. And yes, x64 SysV support is missing, and there are limitations on arm64 too.
  • The JIT is in a bit of a dilemma about deciding whether to honor an explicit tail. or inline -- if we inline, we may blow up the stack unless we are careful. Today we never inline, as we don't know how to be suitably careful. As dotnet/coreclr#18406 notes, this is something we can improve on in some cases. Aggressive opt is unrelated.
  • The JIT will attempt to identify tail calls even without an explicit tail. prefix; there, inlining takes precedence.

7sharp9 commented Mar 19, 2019

@forki Yeah, the other implementations like Xamarin don't do anything with tailcalls; the ARM code generated by AOT compilation just ignores them. When I worked for Xamarin we used to recommend that tailcalls always be disabled in mobile solutions for that reason.

dsyme commented Mar 19, 2019

@forki Yeah, the other implementations like Xamarin don't do anything with tailcalls; the ARM code generated by AOT compilation just ignores them.

My understanding is that this is inaccurate and out of date.

Please see mono/mono#6914 (comment), e.g. "pretty much everything is known to work". There were some subsequent patches/fixes, but my understanding is that, based on the work by @jaykrell, the comment above is no longer accurate.

NinoFloris commented Mar 19, 2019

@AndyAyersMS we did notice that even with noinline added to the C# version, it was still faster by quite a bit. It was only once we removed the explicit tail. (and removed noinline on the C# method) that we got comparable results, within noise range.

Are there more optimizations in the JIT besides inlining that work less well, or not at all, in the presence of an explicit tail.?

AndyAyersMS commented Mar 19, 2019

Not many, no -- there is a restriction on the unboxing optimization (see dotnet/coreclr#21890).

zpodlovics commented Mar 19, 2019

@NinoFloris Check and compare the generated C# and F# IL. By default the F# compiler will inline small methods; however, this IL-level inlining is not always profitable because of the extra instructions, and the JIT may be able to make a better inlining decision (#5178).

jaykrell commented Mar 19, 2019

.tail is a dilemma. On desktop, .tail can tailcall anything; sometimes that's a perf gain, but sometimes a large perf loss. You sometimes have to choose between stack and cycles.

NinoFloris commented Mar 19, 2019

@zpodlovics I've obviously been looking at the IL ;) F# doesn't inline top-level methods and functions; AFAIK that only happens if they're marked inline. F# does do some optimizations for local functions and closures, indeed.

dsyme commented Mar 19, 2019

Something of interest is that the F# compiler does have logic to eliminate the emission of "unnecessary" tailcall instructions even when --tailcalls is enabled; you can search the compiler source for it.

In this particular case, the optimization is not triggered because the call to List.Add is a virtual call.

            IL_0007: tail.
            IL_0009: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<string>::Add(!0)

There are cases where we want to make tailcalls through virtual calls into .NET libraries - for example when implementing deep calling control structures like IStructuralComparable. However, this is not such a case (adding something to a list is never going to result in unbounded stack usage), and in the vast majority of cases we really don't need to make tailcalls to .NET virtual methods.

I think it would be quite reasonable to whitelist essentially every virtual call into the .NET core libraries apart from the IComparable/IEqualityComparer/IStructuralComparable/... APIs.
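For contrast, a sketch (illustrative, hypothetical interface and names) of the kind of call where tail. through a callvirt does matter: unbounded recursion flowing through an interface method.

```fsharp
// Hypothetical: recursion through an interface (virtual) call. If the call
// back through the interface is not honored as a tail call, the recursion
// depth is bounded by the stack; with tail. honored it runs in constant stack.
type IStep =
    abstract Next: int -> int

let rec makeStep () =
    { new IStep with
        member this.Next n =
            if n = 0 then 0
            else (makeStep ()).Next (n - 1) }  // tail call through callvirt
```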

AndyAyersMS referenced this issue Mar 20, 2019: "General inlining of tail calls" #23370 (open, 0 of 6 tasks complete)

AndyAyersMS commented Mar 21, 2019

I think we can improve how RyuJIT handles the interaction of tail. calls and inlining; see dotnet/coreclr#23370. It is too late to get any changes from this into .NET Core 3.0, but addressing it in 3.1 seems quite possible, assuming we can develop reasonable heuristics. I also expect us to fix some of the tail call limitations we see on some ABIs in .NET Core 3.1.

And perhaps there are ways to reduce the need for cases that use the Jit_TailCall helper. The JIT may be able to artificially increase the size of the root method's argument area via a special prolog so that fast tail calls to callees with more arg space than their callers are possible. Also, the inlining changes will be done so as not to increase the need for helper-assisted calls, and inlining some tail calls may eliminate cases that would otherwise have needed the helper.

dsyme commented Mar 22, 2019

The jit may be able to artificially increase the size of the root method's argument area via a special prolog so that fast tail calls to callees with more arg space than callers is possible.

That sounds like a very important case, though I suppose to be complete it would have to be transitive to some depth, e.g. A --> B --> C --> A --> B --> C ad infinitum, where A requires 10, B 20, and C 30...
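The transitivity concern sketched in F# (hypothetical functions, with the argument counts shrunk for brevity): each function tail-calls the next with a larger argument list, so no single caller's frame has room for every callee's arguments.

```fsharp
// a -> b -> c -> a -> ... : each tail call needs more argument space than
// its caller's frame provides, so fixing up only a's prolog is not enough.
let rec a n x1 =
    if n = 0 then x1
    else b (n - 1) x1 x1
and b n x1 x2 = c (n - 1) x1 x2 x2
and c n x1 x2 x3 = a (n - 1) (x1 + x2 + x3)
```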

AndyAyersMS commented Mar 23, 2019

it would have to be transitive to some depth

Since the jit is the one issuing the slow tail calls, it is not quite that bad. Once we're done inlining we know all the tail call candidate sites in the method, and we should be able to compute how much arg stack each needs, as well as what the method would normally provide. If there's a mismatch, we just need to boost the method's arg area size to the maximum needed by any of the calls, and make sure the prolog does the right magic.

If anyone's up for benchmarking a very early version of the jit changes, feel free to build from my fork on the InlineExplicitTailCalls branch. This doesn't have the right heuristic yet, so it may be too aggressive. And it does not try to remove the need for helper-assisted tail calls. Let me know if you run into issues with it, or have interesting results.

EBrown8534 commented Mar 25, 2019

@AndyAyersMS How difficult would that be to set up? We have an extremely hot path of F# code on our website that I could spin up another cluster node for and load this version into. We already have timings across all nodes in the cluster, so at the very least I could give you a (relative) performance differential for a real-world use case (at least, our real-world use case), i.e. whether it is overall faster, slower, or unchanged.

AndyAyersMS commented Mar 25, 2019

If you can run against the most recent 3.0 preview SDK, and can build the coreclr repo, then it should not be too bad. You can follow the Using Your Build instructions.

If you want something that works with 2.2 or 2.1 the changes in my branch should port over fairly cleanly. I can prop something up for this if you like but it might take a few days.

jdh30 commented Apr 7, 2019

@NinoFloris
"tail. was being emitted for a non recursive method"
"If anything the focus should be on making sure tail. is only ever emitted for real recursive methods/functions"
"In this case the compiler is just wrong, there is nothing to tailcall optimize as there is no recursion"

Consider this code:

let f1 x = x()
let f2 x = f1 x
let rec f3() = f2 f3

Is that call from f2 to f1 a "recursive call"? I'm guessing not, but if you don't honor that tail call then f3 will overflow the stack.

NinoFloris commented Apr 8, 2019

@jdh30 Yes it really only gets problematic when dynamic dispatch is involved.

#6329 (comment)
