Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Premature) optimization of nqp::join() on JVM #106

Merged
merged 1 commit into from Jan 19, 2015

Conversation

gerdr
Copy link
Contributor

@gerdr gerdr commented Jul 10, 2013

Instead of using a StringBuilder (which may need to repeatedly re-allocate its internal buffer), we allocate a character array of correct size. This improves the performance of micro-benchmarks, but the effect on setting compilation is minimal.

@Benabik
Copy link
Contributor

Benabik commented Jul 10, 2013

StringBuilder does have a constructor that takes a capacity. Have you compared timings with using that?

@gerdr
Copy link
Contributor Author

gerdr commented Jul 10, 2013

@Benabik I did not compare timings against a properly-sized StringBuilder, but if there's any performance difference at all, this will be slightly faster as I basically manually inlined AbstractStringBuilder.append(), getting rid of the overflow checks and saving the allocation of the StringBuilder object.

The JIT can potentially do exactly the same thing, but I don't think it's worth the effort to de-optimize the code now that I've already written it ;)

@Benabik
Copy link
Contributor

Benabik commented Jul 10, 2013

Agressive manual inlining leads to harder to read code. I had wondered if you could get much of the speed improvement without adding so much code.

It's also possible that the implementation of StringBuilder uses lower level loops for speed. memcpy can often be faster than manual loops, and I imagine that would be even worse when the loop is in bytecode.

What micro-benchmark improves? I could try it myself, since I suggested it.

@gerdr
Copy link
Contributor Author

gerdr commented Jul 10, 2013

Ahem, StringBuilder uses String.getChars() to copy the data, same as I do. The additional loops are there to precompute the length.

Anyway, if you want to do this, more idiomatic code that does the same thing (untested, beware typos) is

public static String join(String delimiter, SixModelObject arr, ThreadContext tc) {
    final int prim = arr.st.REPR.get_value_storage_spec(tc, arr.st).boxed_primitive;
    if (prim != StorageSpec.BP_NONE && prim != StorageSpec.BP_STR)
        ExceptionHandling.dieInternal(tc, "Unsupported native array type in join");

    final int numElems = (int) arr.elems(tc);
    if (numElems == 0)
        return "";

    String[] strings = new String[numElems];
    int totalLength = delimiter.length() * (numElems - 1);

    for (int i = 0; i < numElems; ++i) {
        if (prim == StorageSpec.BP_STR) {
            arr.at_pos_native(tc, i);
            strings[i] = tc.native_s;
            totalLength += tc.native_s.length();
        } else {
            String s = arr.at_pos_boxed(tc, i).get_str(tc);
            strings[i] = s;
            totalLength += s.length();
        }
    }

    StringBuilder sb = new StringBuilder(totalLength);
    sb.append(string[0]);

    for(int i = 1; i < strings.length; ++i) {
        sb.append(delimiter);
        sb.append(strings[i]);
    }

    return sb.toString();
}

As a micro-benchmark, just call nqp::join() on an array that's 'big enough'.

@Benabik
Copy link
Contributor

Benabik commented Jul 10, 2013

I appear to have mis-read your commit, my apologies. And looking at the code to still use StringBuilder, I see there's not a lot of difference in complexity anyway. When I read "premature optimization" and "manual inlining of library code", I worry a bit. Thank you for explaining things.

In the meantime, I have to figure out how I broke my nqp.jvm...

@diakopter
Copy link
Contributor

looks like my emailed reply didn't make it an hour ago; oh well; here it is:

It's worth testing at least, since it's also conceivable the JIT can optimize the usual factoring better.

@gerdr
Copy link
Contributor Author

gerdr commented Jul 10, 2013

I agree that it's worth looking into, but mostly for academic reasons. I would be very surprised if the JIT could optimize the less explicit code any better as StringBuilder is essentially a thin wrapper for exactly the same code.

@teodozjan
Copy link
Contributor

Hi,

IMHO it requires benchark.

Also there is temptation to make StringBuilder static. See this stackoverflow question but may become memory hungry solution if someone uses this in bad way

@moritz
Copy link
Contributor

moritz commented Jan 19, 2015

I've done some rudimentary benchmarks in https://gist.github.com/moritz/ab4e7cd01c36032e4d6b

Results:

  • startup time unchanged (2.02 +- 0.06 => 2.08 +- 0.15)
  • performance improvements of 9.6%, but in my numbers that's only three standard deviations (so don't trust it too much).

So a decent improvement in a micro benchmark while only slightly increasing code size and complexity sounds like an overall win to me.

@moritz moritz merged commit 52733c5 into Raku:master Jan 19, 2015
@rurban
Copy link

rurban commented Jan 21, 2015

See also parrot/parrot#1123 to fix this in the parrot backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants