Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add roaring_bitmap_portable_deserialize_frozen #421

Merged

Conversation

andreas
Copy link
Contributor

@andreas andreas commented Jan 21, 2023

This PR adds the function roaring_bitmap_t *roaring_bitmap_portable_deserialize_frozen(const char *buf), as discussed in #352. This allows deserializing a buffer in the portable format into a frozen bitmap, which is faster than regular deserialization with roaring_bitmap_portable_deserialize.

The implementation is a combination of roaring_bitmap_frozen_view and ra_portable_deserialize. The function makes a single call to malloc and uses the "arena allocator" approach from roaring_bitmap_frozen_view. Data for the containers point directly into the provided buffer (buf). I've included unit tests adapted from testing roaring_bitmap_portable_deserialize.

To gauge the performance impact, I've added deserialization to the real_bitmaps_benchmark benchmark. The chart below shows the number of cycles spent deserializing all bitmaps in each dataset using roaring_bitmap_portable_deserialize (portable), roaring_bitmap_portable_deserialize_frozen (portable frozen), and roaring_bitmap_frozen_view (frozen view). Seemingly, portable frozen is very competitive with frozen view, while sticking to the standardised format 🙂

Total cycles to deserialize all bitmaps in dataset (1)

In terms of stylistic choices, variables names are in snake case and attempt to use the terminology from the format spec (e.g. run_flag_bitset instead of bitmapOfRunContainers).

Lastly, I should caveat this by saying I very rarely program in C, so apologies in advance for stupid mistakes or oversights 😅 I've only tested this on a mac x86.

benchmarks/real_bitmaps_benchmark.c Fixed Show fixed Hide fixed
benchmarks/real_bitmaps_benchmark.c Fixed Show fixed Hide fixed
benchmarks/real_bitmaps_benchmark.c Fixed Show fixed Hide fixed
@andreas
Copy link
Contributor Author

andreas commented Jan 22, 2023

I've addressed the above warnings in f2db5d3.

The build failure is related to unaligned access, which I can replicate locally when building with cmake -DROARING_SANITIZE=on .. I assume it ties in with the conversation in #352 on misaligned values. I'm not sure whether this means it's acceptable or not. If acceptable, I can try add suppression for roaring_bitmap_portable_deserialize_frozen as described here.

@lemire
Copy link
Member

lemire commented Jan 24, 2023

@andreas Can you write a comment as part of the function definition that this function may execute unaligned memory accesses?

@lemire
Copy link
Member

lemire commented Jan 24, 2023

It would make a lot of sense to add __attribute__((no_sanitize("undefined"))) as a function attribute for this one function.

Do something like this...

#if defined(__GNUC__) || defined(__clang__)
#define ALLOW_UNDEFINED  __attribute__((no_sanitize("undefined")))
#else
#define ALLOW_UNDEFINED
#endif

in portability.h and then right before your function just add ALLOW_UNDEFINED. This should do it.

Note that this is only sane if you have documented the fact that the function does unaligned accesses. Basically, we are pushing the responsibility with the user. There are systems where an unaligned access would crash. It is possible for the compiler to make assumptions... and crash things... so we expect the user of the library to check that it does not crash.

@lemire
Copy link
Member

lemire commented Jan 24, 2023

Right, so we are getting these warnings which are expected...

/home/runner/work/CRoaring/CRoaring/src/roaring.c:3307:28: runtime error: member access within misaligned address 0x616000035e79 for type 'struct bitset_container_t', which requires 8 byte alignment
[643](https://github.com/RoaringBitmap/CRoaring/actions/runs/3978487097/jobs/6846981837#step:3:644)
0x616000035e79: note: pointer points here
[644](https://github.com/RoaringBitmap/CRoaring/actions/runs/3978487097/jobs/6846981837#step:3:645)
 be be be  be be be be be be be be  be be be be be be be be  be be be be be be be be  be be be be be
[645](https://github.com/RoaringBitmap/CRoaring/actions/runs/3978487097/jobs/6846981837#step:3:646)
              ^ 
[646](https://github.com/RoaringBitmap/CRoaring/actions/runs/3978487097/jobs/6846981837#step:3:647)

@lemire
Copy link
Member

lemire commented Jan 24, 2023

Let us run the tests.

@andreas
Copy link
Contributor Author

andreas commented Jan 24, 2023

Can you write a comment as part of the function definition that this function may execute unaligned memory accesses?

Added in 2b2d5a9.

It would make a lot of sense to add __attribute__((no_sanitize("undefined"))) as a function attribute for this one function. [...]

I've defined ALLOW_UNALIGNED and applied it to roaring_bitmap_portable_deserialize_frozen (c73b8d3). This only suppresses misaligned memory access (no_sanitize("alignment")), rather than undefined behavior altogether. However, the tests then fail with another error:

CRoaring/include/roaring/containers/bitset.h:237:20: runtime error: member access within misaligned address 0x616000036179 for type 'const bitset_container_t' (aka 'const struct bitset_container_s'), which requires 8 byte alignment
0x616000036179: note: pointer points here
 01 01 01  02 5b 55 00 00 be be be  be a0 08 22 0b 01 00 00  00 55 55 00 00 be be be  be a0 28 22 0b
              ^
SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use CRoaring/include/roaring/containers/bitset.h:237:20

Note that this is a new occurrence of the error. Basically it seems like I need to add ALLOW_UNALIGNED to all functions exercised by test_portable_deserialize_frozen for the tests to pass with sanitization on 😬 Is this an acceptable path forward? Do you have any other ideas on how to resolve this?

@lemire
Copy link
Member

lemire commented Jan 24, 2023

Is this an acceptable path forward? Do you have any other ideas on how to resolve this?

I think so, yes. Sadly, this might require the ALLOW_UNALIGNED to spread further than we'd like.

There is one more problem. Under Visual Studio, in debug mode, this may also fail... but they have an __unaligned qualifier that we might use, as needed. Sadly, it appears that we only test in release mode under Visual Studio. Let me change that.

@lemire
Copy link
Member

lemire commented Jan 24, 2023

Sorry, we also need to change ALLOW_UNALIGNED for CROARING_ALLOW_UNALIGNEDso that we don't clash with other systems.

And we would need to do something like...

#if CROARING_REGULAR_VISUAL_STUDIO
#define croaring_unaligned __unaligned
#else 
#define croaring_unaligned
#endif   

and then we will need to add croaring_unaligned as a qualifier to a bunch of pointers.

@lemire
Copy link
Member

lemire commented Jan 24, 2023

@lemire
Copy link
Member

lemire commented Jan 24, 2023

This is damn annoying, I realize it. It is more work than you'd think. :-)

@andreas
Copy link
Contributor Author

andreas commented Jan 24, 2023

Np, I'll merge master and add annotations as needed 👍

It looks like -fsanitize-ignorelist (docs) could be an alternative to in-source annotations. Let me know if you'd prefer this instead.

@andreas
Copy link
Contributor Author

andreas commented Jan 24, 2023

With the 10 ALLOW_UNALIGNED annotations in the diff below, the build passes locally for me with ROARING_SANITIZE=on:

https://gist.github.com/andreas/e7ad3a03eeaa1a30e270785699dd7588

Additionally, there would need to be annotations for Visual Studio. I'm wondering if we're better off just excluding the test from the sanitize builds?

@lemire
Copy link
Member

lemire commented Jan 25, 2023

Push your changes into this PR. If you have concerns, just open a second PR.

I can examine Visual Studio manually if needed.

We want to test with sanitizers because our users might do so. We cannot prevent folks form running our tests with sanitizers.

We don’t want to rely on the build system because we don’t know how CRoaring is built. Not everyone uses CMake.

@andreas
Copy link
Contributor Author

andreas commented Jan 25, 2023

Got it. Merged master and pushed changes to this branch.

@lemire
Copy link
Member

lemire commented Jan 25, 2023

@andreas Let us see what CI does.

@lemire
Copy link
Member

lemire commented Jan 26, 2023

Merging. This great work.

@tudor
Copy link
Contributor

tudor commented Jan 9, 2024

I sent a PR to add portable_deserialize_frozen to the C++ API: #544

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants