Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mem: Align data to natural alignment #416

Merged
merged 2 commits into from Jul 17, 2022

Conversation

Lastique
Copy link
Contributor

Ensure that the pointer returned by mem_alloc & co. is properly aligned similar to malloc on a given platform. This is important e.g. for SIMD processing, including libc string operations.

Also, made mem_realloc optimize away realloc call if there is enough storage in the allocated buffer. This only works if the input pointer is the only reference to the memory block.

Also, fixed formatting in mem_status.

@Lastique Lastique force-pushed the feature/mem_align_data branch 2 times, most recently from 8f3c320 to f3c6dac Compare June 25, 2022 17:11
@Lastique
Copy link
Contributor Author

Also, a question not directly related to this PR, but to mem. Would you be interested in a patch that makes mem::nrefs atomic?

src/mem/mem.c Outdated Show resolved Hide resolved
@Lastique
Copy link
Contributor Author

Lastique commented Jun 26, 2022 via email

@sreimers
Copy link
Member

Ok then I would say we should only align the header offset, looks like the max. diff between size and capacity is only 15 bytes and mem_realloc is mostly called with much higher diffs I think.

@Lastique
Copy link
Contributor Author

Sorry, I don't understand. The mem header is always located at the beginning of the buffer returned by malloc, which allows to pass the pointer to free. If you're going to move mem towards the data then you need additional metadata to find the malloc pointer. That requires space and gives nothing in return - the memory is still allocated by malloc either way and not being reused.

and mem_realloc is mostly called with much higher diffs I think.

Do you mean mem_realloc optimization is not useful and should be removed?

@sreimers
Copy link
Member

sreimers commented Jul 6, 2022

Do you mean mem_realloc optimization is not useful and should be removed?

Yes I think so.

src/mem/mem.c Show resolved Hide resolved
src/mem/mem.c Outdated Show resolved Hide resolved
src/mem/mem.c Outdated Show resolved Hide resolved
@alfredh
Copy link
Contributor

alfredh commented Jul 6, 2022

I think we have to be very careful with making changes here.

The mem system is used by almost all the code in re.

Please split up the PR into logical PRs.

@Lastique
Copy link
Contributor Author

Lastique commented Jul 7, 2022

Do you mean mem_realloc optimization is not useful and should be removed?

Yes I think so.

Ok. But then there's a correctness issue in mem_realloc, if it is called on a memory block with nrefs > 1. The previous implementation would realloc the memory, possibly changing the pointer to it. All references except the one used by the caller would become invalid and cause use-after-free, and since nrefs is still > 1, this new memory block will leak. I fixed this as part of the optimization (the memory block will be copied in this case). Should this fix remain?

@Lastique
Copy link
Contributor Author

Lastique commented Jul 7, 2022

I've removed mem_realloc optimization but kept the correctness fix.

@Lastique Lastique requested a review from sreimers July 7, 2022 11:35
@alfredh
Copy link
Contributor

alfredh commented Jul 9, 2022

we also have this problem that we should try to fix:

swscale: created SwsContext: 'yuv420p' 320 x 240 --> 'yuv420p' 642 x 488
[swscaler @ 0x7fc47af52000] Warning: data is not aligned! This can lead to a speed loss

the vidframe data is sometimes allocated by mem_alloc and it is used by FFmpeg.
FFmpeg recommend that a buffer is aligned to fit the SIMD optimalisations.
The alignment here is typically 16, 32, 64 bytes

Either we make mem_alloc aligned to this, or a new function for special alignment needs.

Regarding the struct mem overhead:

  1. it should be as small as possible in order to avoid wasting memory
  2. it should be as close to the natural alignment as possible (i.e. 8 bytes, 16 bytes, etc).

Regarding changes in the mem modules we have to be very careful here,
and analyze the impact and test the changes really well.
It is used by virtuall all the re code and has been rock solid for a decade.

We could also make a new testcase in retest that verifies the alignment from mem_alloc

@Lastique
Copy link
Contributor Author

Lastique commented Jul 9, 2022 via email

@alfredh
Copy link
Contributor

alfredh commented Jul 9, 2022

after some thinking, I think we can allocate aligned memory for struct vidframe,
so that it works with ffmpeg (16, 32, 64 etc).

for mem_alloc, the alignment should be the same as the malloc implementation,
i.e. 4 or 8 bytes.

most of the CPUs supported are either 32-bit or 64-bit:

32-bit: 4 bytes alignment
64-bit: 8 bytes alignment

by using types like function pointers and size_t in struct mem,
the size will fit naturally for both architectures.

For 64-bit the current overhead is 2x8=16 bytes.
We could change nrefs to uint32_t but it would still be a 4-bytes hole
at the end of the struct.

I dont understand the mem_realloc problem. Could you please write some demonstration code
or make a test in retest ?

@Lastique
Copy link
Contributor Author

Lastique commented Jul 9, 2022

most of the CPUs supported are either 32-bit or 64-bit:

32-bit: 4 bytes alignment 64-bit: 8 bytes alignment

This is not true, at least not on x86.

by using types like function pointers and size_t in struct mem,
the size will fit naturally for both architectures.

It doesn't matter which types you use, mem alignment will match the strictest alignment of each of its members (e.g. a pointer), and its size will be a multiple of its alignment.

We could change nrefs to uint32_t but it would still be a 4-bytes hole
at the end of the struct.

Not at the end of struct but after nrefs. Compilers do align members of the struct according to their alignment requirements, which means padding is introduced between members as needed.

The idea was to fit 32-bit size right after 32-bit nrefs so that there is no padding between size and dh and mem is exactly 16 bytes on a 64-bit target in the release mode. Currently, in this PR, it is 24 bytes, which is rounded up to 32 in order to align the data to 16 bytes.

I dont understand the mem_realloc problem. Could you please write some demonstration code or make a test in retest ?

void* p1 = mem_alloc(10, NULL);
assert(p1 != NULL);
void* p2 = mem_ref(p1);
p1 = mem_realloc(p1, 1000); // reallocs the buffer, p2 becomes dangling
assert(p1 != NULL);
mem_deref(p1); // leaks p1 as nrefs was 2
mem_deref(p2); // use after free

@sreimers
Copy link
Member

sreimers commented Jul 10, 2022

A quick test with retest and baresip shows we never call mem_realloc with multiple nrefs.
So I wonder if its enough to return NULL and print a warning (and consider this a bad use case/undefined behavior, that must be handled by the application):

diff --git a/src/mem/mem.c b/src/mem/mem.c
index 1f08e18c..74b33952 100644
--- a/src/mem/mem.c
+++ b/src/mem/mem.c
@@ -194,6 +194,11 @@ void *mem_realloc(void *data, size_t size)
 
        MAGIC_CHECK(m);
 
+       if (m->nrefs > 1) {
+               DEBUG_WARNING("realloc: called with multiple nrefs!\n");
+               return NULL;
+       }
+
 #if MEM_DEBUG
        mem_lock();

@Lastique
Copy link
Contributor Author

I think we did call it like that at some point (which originated the bug fix and the optimization), but we have removed all uses of mem_realloc since then. That is to say, we don't care much in our code what happens with mem_realloc, but it is possible other users might call it like that.

Returning NULL (i.e. reporting failure) is one option, but it seems unnecessary and not intuitive. I'd still prefer copying the data, but if you choose so, I can change to return NULL. Please let me know what you decide.

@Lastique
Copy link
Contributor Author

Of course, yet another option would be to remove mem_realloc completely.

Ensure that the pointer returned by mem_alloc & co. is properly aligned
similar to malloc on a given platform. This is important e.g. for SIMD
processing, including libc string operations.

Also, fixed mem_realloc behavior when there are multiple references to
the memory buffer being reallocated. In this case, the implementation
now allocates a new buffer and copies the contents and decrements the
reference counter on the original memory buffer. This way the other
references for the old buffer remain valid.

Also, fixed formatting in mem_status.
@Lastique Lastique force-pushed the feature/mem_align_data branch 2 times, most recently from 20e2ab2 to 051de66 Compare July 15, 2022 17:20
src/mem/mem.c Outdated Show resolved Hide resolved
src/mem/mem.c Outdated Show resolved Hide resolved
This allows to fit mem header in 16 bytes on 64-bit targets in
release build and save 16 bytes of overhead due to data alignment.
@sreimers sreimers merged commit 31ee066 into baresip:main Jul 17, 2022
@Lastique Lastique deleted the feature/mem_align_data branch July 17, 2022 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants