New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should JIT recognize and hoist NumberFormatInfo.CurrentInfo calls when parsing numbers? #20938

Open
adamsitnik opened this Issue Nov 11, 2018 · 2 comments

Comments

Projects
None yet
3 participants
@adamsitnik
Member

adamsitnik commented Nov 11, 2018

I am currently profiling ML.NET to search for some perf bottlenecks and places where we could do a better job.

One of the patterns I have found is following: parsing numbers in a loop causes a call to NumberFormatInfo.CurrentInfo per loop iteration if the FormatInfo is not provided in explicit way.

image

I have done manual hoisting in ML.NET and I saved 8% for huge file reads (gigabytes of data) (more info here dotnet/machinelearning#1599)

My proposal: JIT could recognize such pattern and hoist the result of NumberFormatInfo.CurrentInfo.

Some benchmarks results:

Method Mean Error StdDev Ratio
NotHoisted 39.45 us 0.1046 us 0.0874 us 1.00
Hoisted 37.37 us 0.1850 us 0.1730 us 0.95

Benchmark code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
using System.Threading;

namespace ParsingFloats
{
    class Program
    {
        static void Main(string[] args) => BenchmarkRunner.Run<ParsingFloats>();
    }

    // [BenchmarkDotNet.Diagnostics.Windows.Configs.EtwProfiler(performExtraBenchmarksRun: false)] install BenchmarkDotNet.Diagnostics.Windows and uncomment to get a profile
    public class ParsingFloats
    {
        // the input uses . as separator which is not true for all cultures
        [GlobalSetup] public void Setup() => Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

        [Benchmark(Baseline = true)]
        public float NotHoisted()
        {
            float result = 0;

            foreach (var toParse in numbers)
                if (float.TryParse(toParse, out float value))
                    result += value;

            return result;
        }

        [Benchmark]
        public float Hoisted()
        {
            float result = 0;

            NumberFormatInfo info = NumberFormatInfo.CurrentInfo;
            foreach (var toParse in numbers)
                if (float.TryParse(toParse, NumberStyles.Float | NumberStyles.AllowThousands, info, out float value))
                    result += value;

            return result;
        }

        private readonly string[] numbers = new[] // this is real input line from wiki.en.vec file
        {
            "-0.023167", "-0.0042483", "-0.10572", "0.042783", "-0.14316", "-0.078954", "0.078187", "-0.19454", "0.022303", "0.31207",
            "0.057462", "-0.11589", "0.096633", "-0.093229", "-0.034229", "-0.14652", "-0.11094", "-0.11102", "0.067728", "0.10023",
            "-0.067413", "0.23761", "-0.13105", "-0.0083979", "-0.10593", "0.24526", "0.065903", "-0.2374", "-0.10758", "0.0057082",
            "-0.081413", "0.26264", "-0.052461", "0.20306", "0.05062", "-0.18866", "-0.11494", "-0.25752", "0.046799", "-0.050525",
            "0.06265", "0.15433", "-0.056289", "-0.048437", "-0.099688", "-0.035332", "-0.091647", "-0.081151", "-0.0010844", "-0.08414",
            "-0.13026", "0.01498", "-0.086276", "-0.053041", "-0.10644", "-0.042314", "0.086469", "0.22614", "-0.16078", "0.18845", "0.053098",
            "-0.21475", "0.16699", "-0.14442", "-0.1593", "0.0062456", "-0.07663", "-0.091568", "-0.28984", "0.027078", "0.021275", "0.023939",
            "0.14903", "-0.33062", "-0.097811", "-0.033814", "0.070587", "0.023294", "0.065382", "0.18716", "-0.13444", "0.14431", "-0.0268",
            "-0.022903", "0.097554", "-0.032909", "-0.027827", "-0.068771", "0.17053", "-0.05946", "0.020424", "-0.077589", "0.1216", "-0.077437",
            "0.10665", "0.051087", "0.0076379", "-0.064936", "0.09031", "0.059447", "0.0048881", "0.078309", "-0.012163", "0.062155", "-0.072664",
            "0.17857", "-0.22874", "0.066397", "-0.039295", "-0.027717", "0.061571", "0.072824", "-0.092512", "-0.087984", "-0.12753", "-0.0018705",
            "0.18689", "0.0051173", "-0.0013532", "0.043246", "0.10867", "-0.12209", "-0.0091676", "0.23938", "-0.059501", "-0.0010456", "0.086584",
            "0.020238", "0.21686", "0.16495", "0.037256", "0.12343", "0.17706", "0.075777", "0.031022", "-0.12948", "0.030936", "0.096897", "-0.10793",
            "0.12644", "-0.056489", "0.082232", "0.20679", "0.11679", "0.13965", "0.26362", "0.037603", "-0.003105", "-0.089501", "-0.0076969", "-0.11654",
            "-0.28567", "0.046616", "-0.0082062", "0.15621", "-0.14641", "0.064561", "-0.1133", "0.27129", "0.14532", "-0.021773", "0.23305", "-0.1617",
            "0.15705", "0.13845", "0.022417", "-0.10982", "-0.049431", "0.076855", "-0.0453", "-0.19029", "0.011183", "-0.010393", "0.0016916", "0.089407",
            "-0.051022", "-0.086066", "0.083933", "-0.0081962", "-0.0077321", "0.033991", "-0.20092", "0.03328", "0.062224", "0.016121", "0.27143", "-0.19754",
            "-0.15222", "-0.015345", "-0.063907", "-0.098597", "-0.20162", "0.14004", "-1.1533e-05", "-0.18928", "0.12253", "-0.0070378", "0.0864", "-0.30255",
            "-0.03908", "0.045517", "-0.16449", "-0.23548", "0.052781", "0.13847", "-0.20022", "-0.015974", "0.027137", "0.18287", "-0.02389", "0.22072",
            "-0.04271", "-0.075939", "-0.087386", "-0.049337", "0.047824", "-0.059078", "-0.15181", "-0.21229", "-0.054944", "-0.011453", "-0.11996", "-0.15307",
            "-0.054828", "-0.053217", "-0.048546", "0.028856", "-0.094537", "0.27144", "0.054638", "0.059727", "0.061772", "0.009259", "-0.12032", "-0.16646",
            "-0.029087", "0.0028752", "-0.16076", "-0.1371", "-0.18988", "0.022857", "0.18455", "-0.018236", "-0.0060562", "0.14302", "0.032535", "0.14333",
            "-0.030871", "-0.15218", "0.092813", "0.066358", "0.018316", "-0.24143", "0.0054391", "-0.064479", "-0.08596", "0.030446", "0.082157", "0.026093",
            "0.058985", "0.0051085", "0.089127", "-0.018164", "-0.077821", "0.0034232", "0.13892", "0.046106", "-0.05417", "0.0084399", "-0.15362", "-0.14735",
            "0.065191", "-0.022883", "-0.14498", "-0.16917", "-0.19215", "0.10611", "0.001678", "-0.16331", "-0.07307", "0.11576", "0.083567", "-0.060317",
            "-0.064714", "0.15305", "-0.11949", "0.16684", "0.14109", "0.046036", "-0.060393", "0.046595", "-0.11558", "0.044184", "-0.023124", "0.02586",
            "-0.11653", "0.010936", "0.089398", "-0.0159", "0.14866"
        };
    }
}

cc @danmosemsft @tannergooding @AndyAyersMS @CarolEidt @davidwrighton

@mikedn

This comment has been minimized.

Contributor

mikedn commented Nov 11, 2018

My proposal: JIT could recognize such pattern and hoist the result of NumberFormatInfo.CurrentInfo.

It's obvious to us humans that the parsing code won't change the current culture but this isn't something that the JIT can figure out without interprocedural analysis. In general, calls cannot be hoisted, with the exception of a few special helper calls that the JIT knows that they're "pure".

@jkotas

This comment has been minimized.

Member

jkotas commented Nov 12, 2018

It's obvious to us humans that the parsing code won't change the current culture

Other threads can change the current culture though. The current culture is a very complex algorithm with many fallbacks.

I think we can make this better by:

  • Streamlining the CurrentCulture getter
  • Caching NumberFormatInfo per thread to avoid unnecessary indirections
  • Ideally, the JIT should be able to hoist the pointer to thread local block that contains the cached NumberFormatInfo. It is an optimization that the JIT does already. The code just needs to be structured properly to take advantage of it.

Also, you may want to consider using the newer Utf8 parsers for any high-performance parsing since you probably got the input as Utf8 anyway and do not care about any of the current sensitive parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment