The results of the parse_int microbenchmark are misleading #4662

Closed
vitaut opened this Issue Oct 27, 2013 · 5 comments

Comments

Projects
None yet
4 participants
@vitaut
Contributor

vitaut commented Oct 27, 2013

The results of the parse_int benchmark microbenchmark reported on http://julialang.org/ are somewhat misleading because the C version used as a baseline is written in an inefficient way. I see at least two obvious problems with the C implementation:

  1. strlen is unnecessarily called in https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.c#L30. Moreover it is called at every iteration! An idiomatic C code for that would be something along the lines of:
long parse_int(const char *s, long base) {
    long n = 0;
    for (; *s; ++s) { // note that strlen is not called
        char c = *s;
        long d = 0;
        if (c >= '0' && c <= '9') d = c-'0';
        else if (c >= 'A' && c <= 'Z') d = c-'A' + (int) 10;
        else if (c >= 'a' && c <= 'z') d = c-'a' + (int) 10;
        else exit(-1);

        if (base <= d) exit(-1);
        n = n*base + d;
    }
    return n;
}

Note that the code is not only more efficient but also smaller and doesn't contain useless calls to strlen.

  1. The benchmark reports not only string to integer conversion (parsing) time but also integer to string conversion time. But that's not really a problem if the tests for other languages are done in the same way. The problem is that conversion from integer to string in the C benchmark is done using sprintf which is one of the most inefficient ways to it. In http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html I made a comparison of several methods (I used C++ but the point is true for C as well excluding the methods that use std::string). Even a naive implementation of ltoa (https://github.com/vitaut/format-benchmark/blob/master/ltoa.c) is ~70% faster than sprintf. The reason is that sprintf is not an integer to string conversion function, but a general formatting function. So comparing its performance to that of integer to string conversion functions in other languages such as hex in Python or toString in Javascript doesn't make much sense. For instance, the most direct counterpart to sprintf in Python is str.format and not hex.
@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Oct 27, 2013

Member

I like the improvement described in (1). As for (2), my understanding is that sprintf is pretty much the only portable idiomatic way to convert an integer to a string in standard C without writing a custom routine to do it.

Member

jiahao commented Oct 27, 2013

I like the improvement described in (1). As for (2), my understanding is that sprintf is pretty much the only portable idiomatic way to convert an integer to a string in standard C without writing a custom routine to do it.

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Oct 28, 2013

Member

This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!

Member

ViralBShah commented Oct 28, 2013

This certainly needs to be fixed for 0.2 - especially since the git blame finds me as the author of this!

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Oct 28, 2013

Member

I see a 16% improvement in parse_int performance with this change. I'm gonna commit it.

Member

StefanKarpinski commented Oct 28, 2013

I see a 16% improvement in parse_int performance with this change. I'm gonna commit it.

StefanKarpinski added a commit that referenced this issue Oct 28, 2013

C parse_int micro benchmark: avoid calling strlen on every iteration.
This results in a 16% speedup on my machine. Thanks @vitaut. [#4662]
@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Oct 28, 2013

Member

Feel free to submit a pull request for a better ltoa implementation.

Member

StefanKarpinski commented Oct 28, 2013

Feel free to submit a pull request for a better ltoa implementation.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Oct 28, 2013

Member

I see from your blog, @vitaut, that you've done a lot of work on fast formatting and integer printing in various languages. If you're interested, I think that our routines are respectable, but we always value performance and I'm sure they could use some further tuning since they were originally written.

Member

StefanKarpinski commented Oct 28, 2013

I see from your blog, @vitaut, that you've done a lot of work on fast formatting and integer printing in various languages. If you're interested, I think that our routines are respectable, but we always value performance and I'm sure they could use some further tuning since they were originally written.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment