Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The results of the parse_int microbenchmark are misleading #4662

Closed
vitaut opened this issue Oct 27, 2013 · 5 comments
Closed

The results of the parse_int microbenchmark are misleading #4662

vitaut opened this issue Oct 27, 2013 · 5 comments
Labels
performance Must go faster
Milestone

Comments

@vitaut
Copy link
Contributor

vitaut commented Oct 27, 2013

The results of the parse_int benchmark microbenchmark reported on http://julialang.org/ are somewhat misleading because the C version used as a baseline is written in an inefficient way. I see at least two obvious problems with the C implementation:

  1. strlen is unnecessarily called in https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.c#L30. Moreover it is called at every iteration! An idiomatic C code for that would be something along the lines of:
long parse_int(const char *s, long base) {
    long n = 0;
    for (; *s; ++s) { // note that strlen is not called
        char c = *s;
        long d = 0;
        if (c >= '0' && c <= '9') d = c-'0';
        else if (c >= 'A' && c <= 'Z') d = c-'A' + (int) 10;
        else if (c >= 'a' && c <= 'z') d = c-'a' + (int) 10;
        else exit(-1);

        if (base <= d) exit(-1);
        n = n*base + d;
    }
    return n;
}

Note that the code is not only more efficient but also smaller and doesn't contain useless calls to strlen.

  1. The benchmark reports not only string to integer conversion (parsing) time but also integer to string conversion time. But that's not really a problem if the tests for other languages are done in the same way. The problem is that conversion from integer to string in the C benchmark is done using sprintf which is one of the most inefficient ways to it. In http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html I made a comparison of several methods (I used C++ but the point is true for C as well excluding the methods that use std::string). Even a naive implementation of ltoa (https://github.com/vitaut/format-benchmark/blob/master/ltoa.c) is ~70% faster than sprintf. The reason is that sprintf is not an integer to string conversion function, but a general formatting function. So comparing its performance to that of integer to string conversion functions in other languages such as hex in Python or toString in Javascript doesn't make much sense. For instance, the most direct counterpart to sprintf in Python is str.format and not hex.
@jiahao
Copy link
Member

jiahao commented Oct 27, 2013

I like the improvement described in (1). As for (2), my understanding is that sprintf is pretty much the only portable idiomatic way to convert an integer to a string in standard C without writing a custom routine to do it.

@ViralBShah
Copy link
Sponsor Member

This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!

@StefanKarpinski
Copy link
Sponsor Member

I see a 16% improvement in parse_int performance with this change. I'm gonna commit it.

StefanKarpinski pushed a commit that referenced this issue Oct 28, 2013
This results in a 16% speedup on my machine. Thanks @vitaut. [#4662]
@StefanKarpinski
Copy link
Sponsor Member

Feel free to submit a pull request for a better ltoa implementation.

@StefanKarpinski
Copy link
Sponsor Member

I see from your blog, @vitaut, that you've done a lot of work on fast formatting and integer printing in various languages. If you're interested, I think that our routines are respectable, but we always value performance and I'm sure they could use some further tuning since they were originally written.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Development

No branches or pull requests

4 participants