
Fixing charconv static assert for Microsoft STL C++20. #16

Merged
merged 4 commits into biojppm:master
Sep 29, 2020
Conversation

can1357
Contributor

@can1357 can1357 commented Sep 28, 2020

Hey!

I just found your library for parsing YAML files after getting tired of waiting 10 seconds to parse a 20 MB YAML file with another library. I've been trying to build it with Clang in C++20 mode against the Microsoft STL, but ran into an issue: enum class chars_format is not strictly char-sized, so it breaks this static assert (sizeof(...) ends up 16). After the fix it builds for C++20 as well; C++11 was working fine either way, it seems.
C4_STATIC_ASSERT(sizeof(fmt) == _FTOA_COUNT);

P.S. I had to include the <iterator> header for std::size. I'm not sure whether the changes fit the coding style of this repo, since I've only just cloned it, so my apologies if they don't.

@coveralls

coveralls commented Sep 28, 2020

Pull Request Test Coverage Report for Build 380

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 95.664%

Totals (Coverage Status):
  • Change from base Build 376: 0.0%
  • Covered Lines: 1699
  • Relevant Lines: 1776

💛 - Coveralls

@biojppm
Owner

biojppm commented Sep 29, 2020

Hey @can1357 - thanks for addressing this!

Actually, I'd like to avoid the <iterator> include, as I've seen some reports that it is costly to compile. So instead of std::size, let's use C4_COUNTOF() as the alternative.

If you make that change, I'd be happy to merge this in.

@biojppm
Owner

biojppm commented Sep 29, 2020

Thanks!

@biojppm biojppm merged commit 2fe974d into biojppm:master Sep 29, 2020
@can1357
Contributor Author

can1357 commented Sep 29, 2020

Since std::size is constexpr, it shouldn't really matter at runtime, but STL iterators are indeed slow in general, so I understand avoiding anything related to them entirely. I've pushed the style change; should be good to go.

P.S. A little off topic, but just so you have another reference: I'm dealing with some large YAML files, so I switched from yaml-cpp to ryml, and throughput went from approximately 0.5 MB/s to 111 MB/s, a total life saver. Thank you for your work; it has been very helpful.

@biojppm
Owner

biojppm commented Sep 29, 2020

You are right that std::size is lightweight and indeed it would be the ideal solution here.

It's the <iterator> include itself that I'd prefer not to have here. For application-level code, I guess it could be fine, but not here.

As for the speedup, that's incredible. On my benchmarks the best I've seen was ~100x speedup, but you got twice that! If possible, could you share your file(s) with me? I'd be interested in profiling the parse sometime in the future.

@biojppm
Owner

biojppm commented Sep 29, 2020

I'm really intrigued by the 200x speedup. Is it apples to apples? Eg, maybe you are comparing optimized (ryml) vs non-optimized (yaml-cpp)?

@biojppm biojppm mentioned this pull request Sep 30, 2020
@can1357
Contributor Author

can1357 commented Oct 1, 2020

I'm really intrigued by the 200x speedup. Is it apples to apples? Eg, maybe you are comparing optimized (ryml) vs non-optimized (yaml-cpp)?

Of course, both are built at O3. The only difference from your tests would be that I also process the data, I suppose (though even the initial parsing time is similar), including random access into large arrays through indexes stored as references in other fields. I'm processing llvm-pdbutil's YAML output on some PDBs, if you want the samples. It is indeed pretty incredible, because previously I couldn't test any changes in release builds without waiting for minutes, with debug builds totally out of the question, haha.

This was of course without any changes to my code; I simply implemented a NodeRef::as mapped to operator>> to match the yaml-cpp API, and replaced the types.

@biojppm
Owner

biojppm commented Oct 1, 2020

So let me get this right:

  • the parsing speedup is in line with the overall speedup (~200x)?
  • the subsequent traversal and deserialization speedups are also in line with the overall speedup?

If so, it is almost too good to be true, and I'd like to investigate. Sorry for insisting, but could you provide the largest file you have? A link is fine, even a temporary one. I'd like to profile, and of course I can generate one myself, but it will be different from yours and then the numbers will be different, whereas I'd prefer to be able to reproduce the numbers you're seeing.

@can1357
Contributor Author

can1357 commented Oct 1, 2020

Here's the sample.

combase.pdb.zip

@biojppm
Owner

biojppm commented Oct 1, 2020

I just ran the rapidyaml parse benchmark (parse only). The results are more in line with what I usually see:

On vs2019:

file: D:/proj/rapidyaml/bm/cases/combase.pdb.yaml
Run on (12 X 3400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 15360 KiB (x1)
---------------------------------------------------------------------------
Benchmark                 Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------
yamlcpp_ro       23089836500 ns   23093750000 ns            1 bytes_per_second=6.05868M/s
libyaml_ro       4727608400 ns   4718750000 ns            1 bytes_per_second=29.6514M/s
libyaml_ro_reuse 4725386900 ns   4718750000 ns            1 bytes_per_second=29.6514M/s
ryml_ro          2854529200 ns   2859375000 ns            1 bytes_per_second=48.9329M/s items_per_second=1.40594M/s
ryml_rw          2715757900 ns   2718750000 ns            1 bytes_per_second=51.464M/s items_per_second=1.47866M/s
ryml_ro_reuse    2479569100 ns   2484375000 ns            1 bytes_per_second=56.319M/s items_per_second=1.61816M/s
ryml_rw_reuse    1685076200 ns   1687500000 ns            1 bytes_per_second=82.9141M/s items_per_second=2.38228M/s

So that's a ~8x-14x speedup over yaml-cpp and a ~1.7x-2.8x speedup over libyaml.

Of course, if the tree traversal has significant logic, including lookups of remote nodes, then the figures will improve a lot for rapidyaml: the data is contiguous, so the number of cache misses will be significantly lower than with yaml-cpp. 200x could be possible in such situations. But can you confirm the figures you reported above? I want to be able to brag :-)

@can1357
Contributor Author

can1357 commented Oct 1, 2020

I just ran the rapidyaml parse benchmark (parse only). [...] 200x could be possible to see in such situations, but can you confirm the figures you reported above?

If you've looked at the data structure, most of it is just indexes back into the root, so there are definitely a LOT of random lookups, and I can assure you it's nowhere near the parse-only figures when doing real work with it. It's no exaggeration by any means, either: it made me angry enough to switch libraries mid-project, because I couldn't debug in release(!) builds, having to sit in front of the computer waiting for the file to load. Ever since, I've been dealing with even more files without any issues.

I did a few speed measurements before switching completely to ryml, and those were the figures I got. The other library is long gone from the project by now, so I'd have to integrate it again to get a new measurement. I can probably try isolating a benchmark from the project I'm working on at some point, but I have a bit too much to do right now, so it would take a while. With any kind of basic processing of the data, though, I'm somewhat certain you'd get similar results.

@biojppm
Owner

biojppm commented Oct 1, 2020

Thanks for replying.

I can probably try isolating a benchmark from the project I'm working on sometime

Please don't, there's no point going through that just to put some figures down. I just wanted to confirm that the numbers you gave were right. Usually I see between a 10x and 70x speedup vs yaml-cpp, which, if you think about it, is already amazing: just imagine that instead of buying one house, you could now buy 70 houses! The best I had seen before was 100x. But 200x is almost unbelievable; we tend to be skeptical of such results: while our ego wants to believe them, our reasoning starts trying to poke holes.

Again, thanks for being patient with my prompting. Now the ego has free rein!

3 participants