How do you run/update the numbers on Mono? #3

Closed
migueldeicaza opened this Issue Apr 2, 2018 · 10 comments

migueldeicaza commented Apr 2, 2018

Hello Aras,

How are you running/updating the numbers in Mono, given that there is no MathF there yet?

I did a port from MathF to the DesktopCLR API, which is supported in Mono, and put my changes here:

https://github.com/migueldeicaza/ToyPathTracer/tree/desktop-clr

The odd thing is that on my system, .NET Core 2.1.4 seems slower. Perhaps we have different versions?

$ dotnet --version
2.1.4
$ mono --version
Mono JIT compiler version 5.13.0 (master/4723e6603e6 Sun Apr  1 21:34:34 EDT 2018)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	Interpreter:   yes
	LLVM:          supported, not enabled.
	GC:            sgen (concurrent by default)
$ dotnet run
2752.30ms 4.2Mrays/s 11.47Mrays/frame frames 1
^c
$ mono demo.exe
1819.82ms 6.3Mrays/s 11.47Mrays/frame frames 1
^c
$ mono -O=float32 demo.exe
1462.56ms 7.8Mrays/s 11.47Mrays/frame frames 1
^c
$
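
As a sanity check on the output above, the Mrays/s column looks like it is simply Mrays/frame divided by the frame time in seconds. A minimal sketch, assuming that relationship holds (the helper name `mrays_per_sec` is made up for illustration and is not part of the project):

```shell
# Recompute throughput from a ToyPathTracer-style output line.
# Assumes column 1 is the frame time in ms and column 3 is Mrays per frame.
mrays_per_sec() {
  # "$1 + 0" makes awk keep only the leading number, dropping the unit suffix.
  awk '{ ms = $1 + 0; mrays = $3 + 0; printf "%.1f\n", mrays / (ms / 1000) }'
}

mrays_per_sec <<'EOF'
2752.30ms 4.2Mrays/s 11.47Mrays/frame frames 1
1819.82ms 6.3Mrays/s 11.47Mrays/frame frames 1
1462.56ms 7.8Mrays/s 11.47Mrays/frame frames 1
EOF
```

This prints 4.2, 6.3 and 7.8, matching the second column reported by the three runs.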

That said, to make it more apples to apples, I just submitted a pull request to Mono to get MathF:

mono/mono#7941

With that patch, I can compare apples of the same species, and now I get:

$ dotnet run
2876.87ms 4.1Mrays/s 11.76Mrays/frame frames 1
2874.08ms 4.1Mrays/s 11.76Mrays/frame frames 2
2876.00ms 4.1Mrays/s 11.76Mrays/frame frames 3
2860.35ms 4.1Mrays/s 11.76Mrays/frame frames 4
2852.96ms 4.1Mrays/s 11.75Mrays/frame frames 5
2842.87ms 4.1Mrays/s 11.76Mrays/frame frames 6
2840.46ms 4.1Mrays/s 11.76Mrays/frame frames 7
2835.01ms 4.1Mrays/s 11.76Mrays/frame frames 8
2829.51ms 4.2Mrays/s 11.76Mrays/frame frames 9
2829.30ms 4.2Mrays/s 11.75Mrays/frame frames 10
2833.15ms 4.2Mrays/s 11.76Mrays/frame frames 11
2831.47ms 4.2Mrays/s 11.76Mrays/frame frames 12
2841.67ms 4.1Mrays/s 11.76Mrays/frame frames 13
2851.48ms 4.1Mrays/s 11.76Mrays/frame frames 14
2857.88ms 4.1Mrays/s 11.76Mrays/frame frames 15
2881.28ms 4.1Mrays/s 11.75Mrays/frame frames 16
2888.21ms 4.1Mrays/s 11.76Mrays/frame frames 17
2894.16ms 4.1Mrays/s 11.76Mrays/frame frames 18
2896.44ms 4.1Mrays/s 11.75Mrays/frame frames 19
2904.69ms 4.0Mrays/s 11.76Mrays/frame frames 20
2937.43ms 4.0Mrays/s 11.76Mrays/frame frames 21
2936.99ms 4.0Mrays/s 11.76Mrays/frame frames 22
2937.48ms 4.0Mrays/s 11.76Mrays/frame frames 23
2935.11ms 4.0Mrays/s 11.76Mrays/frame frames 24
2933.07ms 4.0Mrays/s 11.76Mrays/frame frames 25
2932.50ms 4.0Mrays/s 11.76Mrays/frame frames 26
2928.50ms 4.0Mrays/s 11.76Mrays/frame frames 27
2928.81ms 4.0Mrays/s 11.76Mrays/frame frames 28
2927.48ms 4.0Mrays/s 11.76Mrays/frame frames 29
2925.44ms 4.0Mrays/s 11.76Mrays/frame frames 30
$ mono mathf.exe
1815.47ms 6.5Mrays/s 11.76Mrays/frame frames 1
1825.87ms 6.4Mrays/s 11.76Mrays/frame frames 2
1813.91ms 6.5Mrays/s 11.76Mrays/frame frames 3
1836.47ms 6.4Mrays/s 11.76Mrays/frame frames 4
1849.84ms 6.4Mrays/s 11.75Mrays/frame frames 5
1843.00ms 6.4Mrays/s 11.76Mrays/frame frames 6
1870.65ms 6.3Mrays/s 11.76Mrays/frame frames 7
1873.14ms 6.3Mrays/s 11.76Mrays/frame frames 8
1871.27ms 6.3Mrays/s 11.76Mrays/frame frames 9
1873.10ms 6.3Mrays/s 11.75Mrays/frame frames 10
1871.02ms 6.3Mrays/s 11.76Mrays/frame frames 11
1868.86ms 6.3Mrays/s 11.76Mrays/frame frames 12
1870.36ms 6.3Mrays/s 11.76Mrays/frame frames 13
1872.45ms 6.3Mrays/s 11.76Mrays/frame frames 14
1871.38ms 6.3Mrays/s 11.76Mrays/frame frames 15
1870.67ms 6.3Mrays/s 11.76Mrays/frame frames 16
1873.83ms 6.3Mrays/s 11.76Mrays/frame frames 17
1876.38ms 6.3Mrays/s 11.76Mrays/frame frames 18
1878.16ms 6.3Mrays/s 11.75Mrays/frame frames 19
1879.80ms 6.3Mrays/s 11.76Mrays/frame frames 20
1880.18ms 6.3Mrays/s 11.76Mrays/frame frames 21
1882.34ms 6.2Mrays/s 11.76Mrays/frame frames 22
1878.91ms 6.3Mrays/s 11.76Mrays/frame frames 23
1880.97ms 6.3Mrays/s 11.76Mrays/frame frames 24
1879.36ms 6.3Mrays/s 11.76Mrays/frame frames 25
1879.97ms 6.3Mrays/s 11.76Mrays/frame frames 26
1878.91ms 6.3Mrays/s 11.76Mrays/frame frames 27
1878.21ms 6.3Mrays/s 11.76Mrays/frame frames 28
1879.25ms 6.3Mrays/s 11.76Mrays/frame frames 29
1879.26ms 6.3Mrays/s 11.76Mrays/frame frames 30
$ mono -O=float32 mathf.exe
1633.95ms 7.2Mrays/s 11.76Mrays/frame frames 1
1545.29ms 7.6Mrays/s 11.76Mrays/frame frames 2
1509.87ms 7.8Mrays/s 11.76Mrays/frame frames 3
1550.70ms 7.6Mrays/s 11.76Mrays/frame frames 4
1565.38ms 7.5Mrays/s 11.75Mrays/frame frames 5
1551.46ms 7.6Mrays/s 11.76Mrays/frame frames 6
1567.24ms 7.5Mrays/s 11.76Mrays/frame frames 7
1579.61ms 7.4Mrays/s 11.76Mrays/frame frames 8
1565.49ms 7.5Mrays/s 11.76Mrays/frame frames 9
1558.81ms 7.5Mrays/s 11.75Mrays/frame frames 10
1570.96ms 7.5Mrays/s 11.76Mrays/frame frames 11
1583.83ms 7.4Mrays/s 11.76Mrays/frame frames 12
1587.42ms 7.4Mrays/s 11.76Mrays/frame frames 13
1590.42ms 7.4Mrays/s 11.76Mrays/frame frames 14
1596.51ms 7.4Mrays/s 11.76Mrays/frame frames 15
1596.27ms 7.4Mrays/s 11.75Mrays/frame frames 16
1591.59ms 7.4Mrays/s 11.76Mrays/frame frames 17
1591.84ms 7.4Mrays/s 11.76Mrays/frame frames 18
1585.52ms 7.4Mrays/s 11.75Mrays/frame frames 19
1578.45ms 7.4Mrays/s 11.76Mrays/frame frames 20
1575.85ms 7.5Mrays/s 11.76Mrays/frame frames 21
1571.86ms 7.5Mrays/s 11.76Mrays/frame frames 22
1568.69ms 7.5Mrays/s 11.76Mrays/frame frames 23
1565.06ms 7.5Mrays/s 11.76Mrays/frame frames 24
1562.82ms 7.5Mrays/s 11.76Mrays/frame frames 25
1560.53ms 7.5Mrays/s 11.76Mrays/frame frames 26
1558.02ms 7.5Mrays/s 11.76Mrays/frame frames 27
1555.59ms 7.6Mrays/s 11.76Mrays/frame frames 28
1554.55ms 7.6Mrays/s 11.76Mrays/frame frames 29
1552.10ms 7.6Mrays/s 11.76Mrays/frame frames 30
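
To boil a run like the ones above down to one number, the Mrays/s column can be averaged. A rough sketch, shown on the first three frames of the float32 run only (the helper name `avg_mrays` is hypothetical):

```shell
# Average the Mrays/s column (field 2) over all lines read from stdin.
avg_mrays() {
  awk '{ sub(/Mrays\/s/, "", $2); sum += $2; n++ }
       END { if (n) printf "%.2f\n", sum / n }'
}

avg_mrays <<'EOF'
1633.95ms 7.2Mrays/s 11.76Mrays/frame frames 1
1545.29ms 7.6Mrays/s 11.76Mrays/frame frames 2
1509.87ms 7.8Mrays/s 11.76Mrays/frame frames 3
EOF
```

For these three frames it prints 7.53. Piping a full run into the same function, for example `mono -O=float32 mathf.exe | avg_mrays`, would average every frame printed before the program exits.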

xoofx commented Apr 3, 2018

@migueldeicaza you need to run in Release mode: dotnet run --framework netcoreapp2.0 -c Release. Otherwise everything runs in Debug for .NET Core and it is uber slow! 😛 With Release, you should get a 5x speedup compared to Mono (with or without -O=float32).

xoofx commented Apr 3, 2018

Note: I have added multi-targeting with the net461 framework moniker as part of PR #4, to make it possible to compile with dotnet build and run with mono.

Owner

aras-p commented Apr 3, 2018

@migueldeicaza I just ran from VS Mac, with Release config and without debugging. Don't know what exactly it ends up doing, will investigate.

migueldeicaza commented Apr 4, 2018

Thanks xoofx! That does explain the difference!

With this change, on my machine I get 19.5 Mrays/s for .NET Core and 7.3 Mrays/s for Mono when using -O=float32.

Update: I tried to use the LLVM backend for Mono. The numbers cannot be compared with the numbers above, as I do not have a compiled version of Mono/LLVM with MathF support, but things improve a bit: mono/llvm/float32: 14.5, dotnet -c Release: 19.5.

Owner

aras-p commented Apr 4, 2018

Ok, phew, so I did not do a gross misrepresentation of Mono; it is close to 3x slower on this particular workload than .NET Core. I'll add a note on float32 & LLVM options to the post.

Why is 32-bit floating point not the default, by the way? (My guess would be "it's a long story...", which is fine :))

migueldeicaza commented Apr 4, 2018

It is a long story - let me write a blog post for posterity, I cannot let you have all the fun!

migueldeicaza commented Apr 10, 2018

Ok, this has turned out to be a good exercise!

Our LLVM support was still doing 32->64 casts, so we were not really getting the full gains of 32-bit float when using LLVM!

So we just fixed that too. We also increased the inline limit (pending pull request) to a value that is more suitable for LLVM.

The current inline limit was suitable for Mono's JIT compiler, but it was too small for LLVM, so the new value is closer to what C++ compilers use. The value can currently be tuned by hand using the MONO_INLINELIMIT environment variable.
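
Tuning by hand here means prefixing the command with the variable (spelled MONO_INLINELIMIT in the results below), which sets it for that one process only. A small sketch of that shell behaviour, using `sh -c` as a stand-in for the mono process so it runs without Mono installed:

```shell
# Stand-in for `mono --llvm -O=float32 demo.exe`: a child process that just
# reports the inline limit it finds in its environment.
child='echo "limit=${MONO_INLINELIMIT:-unset}"'

unset MONO_INLINELIMIT
sh -c "$child"                       # no override: prints limit=unset
MONO_INLINELIMIT=100 sh -c "$child"  # override for this command: prints limit=100
sh -c "$child"                       # the override did not persist: limit=unset
```

Because the assignment is scoped to a single command, tuned and untuned runs can be alternated in the same terminal without exporting or unsetting anything.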

This is running on a different computer, so I cannot compare apples to apples against the above, but the results are now as follows:

| Runtime | Results |
|---|---|
| dotnet -c Release | 18.3 Mray/sec |
| mono --llvm -O=float32 | 12.4 Mray/sec |
| MONO_INLINELIMIT=100 mono --llvm -O=float32 | 22.5 Mray/sec |

migueldeicaza commented Apr 10, 2018

Ok, just got home, and got numbers for the same machine that I originally tested on.

The .NET Core -c Release numbers do not match the original results (they are slightly better), but I did reboot the machine since then, so probably something changed in my environment.

So in this one, I am testing .NET Core 2.1.4 against Mono/master, which includes the LLVM float32 support, both with the default inline limit and with more aggressive inlining:

| Runtime | Results |
|---|---|
| dotnet -c Release | 21.0 Mray/sec |
| mono --llvm -O=float32 | 16.0 Mray/sec |
| MONO_INLINELIMIT=100 mono --llvm -O=float32 | 29.1 Mray/sec |
| mono defaults, unoptimized | 6.7 Mray/sec |
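
To read these results as ratios, each one can be divided by the .NET Core figure. A quick sketch of that arithmetic (the helper name `relative_to` is made up, and the input lines are just the numbers above retyped with the value first):

```shell
# Print each throughput as a multiple of a baseline value (first argument).
# Input lines: "<Mray/sec> <label...>".
relative_to() {
  awk -v base="$1" '{
    v = $1; $1 = ""; sub(/^ /, "")   # strip the value, keep the label
    printf "%.2fx %s\n", v / base, $0
  }'
}

relative_to 21.0 <<'EOF'
21.0 dotnet -c Release
16.0 mono --llvm -O=float32
29.1 MONO_INLINELIMIT=100 mono --llvm -O=float32
6.7 mono defaults, unoptimized
EOF
```

This prints 1.00x, 0.76x, 1.39x and 0.32x in that order, so on this machine the tuned LLVM configuration comes out roughly 39% ahead of .NET Core, and default Mono at about a third of it.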

migueldeicaza commented Apr 12, 2018

Blogged the story of float32 and the numbers:

http://tirania.org/blog/archive/2018/Apr-11.html

aras-p added a commit that referenced this issue Apr 16, 2018

C#: add "regular" .NET 4.6 project too (TestCsRegular).
I would have used one of them "multi-targeting" thingies in one .csproj, but VSMac does not support them, i.e. no UI to pick which one to target from the IDE.

So a separate .csproj it is. Which of course is complicated in something called project.json (whatever that is, I've no idea), and needs a custom <BaseIntermediateOutputPath> tag in the new .csproj. See NuGet/Home#5126

Computers were a mistake!

Anyhoo, numbers in Mray/s on Mac (see issue #3):
- dotnet (2.1.4): 13.0
- mono (5.8.1): 5.5
- mono -O=float32: 6.0
- mono --llvm: 7.8
- mono --llvm -O=float32: 10.5
- MONO_INLINELIMIT=100 mono --llvm -O=float32: 12.7

Owner

aras-p commented Apr 16, 2018

I just wrote an update on the Mono performance tweaks, including a link to your blog post, here: http://aras-p.info/blog/2018/04/16/Daily-Pathtracer-10-Update-CsharpGPU/

Will close this issue; thanks for the investigation & suggestions!

aras-p closed this Apr 16, 2018