New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some simple std.regex optimization #1553
Conversation
The inliner isn't as bad as you suggest. If merely doing what you did by turning simple functions into no-arg templates affects inlining, then please produce a separate bug report against dmd. That needs to be solved generally, not by working around it like this. |
It does else I wouldn't spent a better part of an evening to figure it out.
Here you go: The worst thing would be if that was the intended behaviour and modules that are merely imported are not usable for inlining (they are not used ATM see the report). |
I'd be glad to have this tiny workaround pulled untill we can have: Which AFAICT doesn't help any compiler other then DMD(?). |
Removed 2nd change. Now this pull is all about working around the bug. Marked all cases as such. |
@andralex FYI |
@blackwhale noice Not sure what's the best route to take right now. One thing I've learned is that if one one wants maximum speed, gdc and ldc must be tried. Does this change affect them? |
It does. Originally I tried this patch with LDC and got a nice speed up. Can't find exact figures but it's around 30-50%. What I do recall is that it puts us back in the game in one particular benchmark (linked in pull description). P.S. The pull that would address the bug in the compiler |
@blackwhale - that's right. As you've noticed with the LDC speed up, cross module inlining does not happen in GDC either (we may have the code loaded up in the frontend AST, but that is no good if your inliner is in the backend). |
@andralex, @blackwhale: As I noted in that Bugzilla issue, I'm not even sure what the best way to address this issue would even be. If we generate code for all the important modules just so that we might be able to inline a few functions later, we'd effectively make separate compilation impossible/pointless. I've been meaning to experiment a bit more with LTO in LDC (shipping druntime/Phobos as bitcode libraries, …), now would probably be a good point for somebody to do so. |
@klickverbot From this I gather that we ought to make 1-liners of Phobos inlinable at all costs as it's not clear when/if other compilers are going to be able to inline normal functions across modules. At the moment it appears to me that the only reliable way to do so is 0-arg template workaround applied judiciously throughout Phobos. We surely need a better solution long-term. |
@blackwhale: Speaking of other compilers, in optimized builds LDC actually uses the DMD inliner to determine which extra functions from imported modules to emit, so that they can then be inlined by the LLVM inliner. So, if DMD is able to inline a Phobos function that LDC doesn't catch, please report this as a performance problem for the time being. In the long term, however, I'd really like to get away from this model, as running the "late" semantic3 passes on modules that are not actually emitted has been a source of AST breakage problems in the past, and of course it's somewhat pointless to use an inferior inliner to determine what the better inliner is able to catch… |
I'm fine with pulling this in unless there are significant objections to the contrary. Definitely there's some TODO that has been formalized as a bug report. @ibuclaw @klickverbot? |
I'm fine with merging this too. |
Some simple std.regex optimization
no mercy was offered in merging this request |
This picks a couple of low-hanging fruits to optimize R-T & (a bit) C-T regex.
First is inlining. DMD basically never inlines stuff that is in the library and is not a template. It doesn't matter if it has the source for it (obviously DMD has all of Phobos source).
Currently the only way out is marking everything as empty-arg template and this gets us the rough analog of 'explicitly inlined' in C++.
Anyhow - around 30% of speed gain is obtained by adding a couple of empty brackets here and there alone. Or rather I got them back (in the days of FReD it was always inlined).
UPDATE:
The latter was addressed in a separate pull.
Second is removal of indirect call in the outer loop search loop. Basically it had to pick if it can do fast (kicked) search or if it should just use char by char crawling. Being outer loop I thought it wouldn't cost much to use an indirect call. Yet test runs on line by line matching indicate a noticeble speed up of around ~2% in the new version.P.S. See e.g. this sample (one of examples that are currently difficult for std.regex) https://gist.github.com/blackwhale/6470498