Skip to content
This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Additional optimizations #22

Merged
merged 7 commits into from May 13, 2014

Conversation

kanielc
Copy link
Collaborator

@kanielc kanielc commented May 13, 2014

After some serious Valgrind profiling love:

This PR also includes the optimize strings PR. Kept that open because I put a couple additional comments there.

  • Use more moves to save string and Error objects copying
  • Avoid some unnecessary string copying (in lint functions)
  • Use unordered_map and unordered_set when not needing order
    Unordered_map is faster for lookup, O(1) vs O(logn). We profiled them previously with GCC 4.8.1 to make sure, it holds even for small collections.
  • StartsWith works directly with a C-String
  • Replace atSequence's find_if with std::equal. This gave a massive cut in computation. Also got to get rid of pre-call check, as equal already does it.

1. Move strings we no longer need
2. Append char to whitespace instead of string (of 1 char)
3. Clear whitespace instead of ``` = "" ``` (tiny speed difference from
benchmarking, nice to know anyway)
4. Use ``` const char * ``` for startsWith, avoiding the creation of
temporary string objects.
@JossWhittle
Copy link
Owner

Looking really good here :) Do you have any timings for before this PR vs after? I'm getting really good speeds on a 50k+ codebase but I don't have a way of testing against the original flint atm.

return mismatch(begin(prefix), end(prefix), str_iter).first == end(prefix);
bool startsWith(string::const_iterator str_iter, const char *prefix) {
while (*prefix != '\0' && *prefix == *str_iter) {
++prefix, ++str_iter;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had no idea you could do comma separated expressions of this form. Does this actually provide a performance boost?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No performance boost, no value to doing it that way either to be honest :)

@kanielc
Copy link
Collaborator Author

kanielc commented May 13, 2014

On the folly + gtest + double-conversion code base, with stats of

Lint Summary: 518 files
Errors: 63 Warnings: 61 Advice: 817

Estimated Lines of Code: 519422 (seems high? I think something is wrong with the estimate)

Flint++: 0m2.918s
Flint D: 0m26.623s ( a bunch of exceptions, so I'm not even sure it can handle this code base)
Flint C++: 0m3.747s

I'm giving the D version a bit of a pass here, but I'd love to see what they tested on to determine that the D version was faster than the C++ version. In any case, Flint++ is faster than both now and there are still a good amount of optimizations to do (for instance, we're allocating too many strings still).

@JossWhittle JossWhittle merged commit 9f039fe into JossWhittle:master May 13, 2014
@JossWhittle
Copy link
Owner

The estimate was really an estimate haha. It was just the number of \n found during tokenization. I only added it to help debug a bug yesterday which only occurred when the VS2013 debugger was observing, but not when running in release mode.

I'll work on a better metric if we are going to keep it in.

@JossWhittle
Copy link
Owner

Additionally facebook have mentioned here and there in comments their Tokenizer gets confused by complicated template<> usage. Something gtest has a lot of. I fixed a couple of these issue early on in the rewrite so I think we're doing well compared to them atm.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants