Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fake warning on Fedora 34 (in fact newer gcc version 11.2.1 20210728) #189

Closed
fcorbelli opened this issue Aug 10, 2021 · 12 comments
Closed

Comments

@fcorbelli
Copy link

I found "classical" fake warning on newer gcc

[root@fedora 47]# make
g++ -Dunix -O3 -march=native zpaqfranz.cpp -o zpaqfranz -pthread -static
In function ‘size_t compress_parents_parallel(const uint8_t*, size_t, const uint32_t*, uint8_t, uint8_t*)’,
    inlined from ‘void compress_subtree_to_parent_node(const uint8_t*, size_t, const uint32_t*, uint64_t, uint8_t, uint8_t*)’ at zpaqfranz.cpp:9425:34,
    inlined from ‘void blake3_hasher_update.part.0(blake3_hasher*, const void*, size_t)’ at zpaqfranz.cpp:9609:38:
zpaqfranz.cpp:9313:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
 9313 |     memcpy(&out[parents_array_len * BLAKE3_OUT_LEN],
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 9314 |            &child_chaining_values[2 * parents_array_len * BLAKE3_OUT_LEN],
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 9315 |            BLAKE3_OUT_LEN);
      |            ~~~~~~~~~~~~~~~
zpaqfranz.cpp: In function ‘void blake3_hasher_update.part.0(blake3_hasher*, const void*, size_t)’:
zpaqfranz.cpp:9422:11: note: at offset 32 into destination object ‘out_array’ of size 32
 9422 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
      |           ^~~~~~~~~
In function ‘void compress_subtree_to_parent_node(const uint8_t*, size_t, const uint32_t*, uint64_t, uint8_t, uint8_t*)’,
    inlined from ‘void blake3_hasher_update.part.0(blake3_hasher*, const void*, size_t)’ at zpaqfranz.cpp:9609:38:
zpaqfranz.cpp:9426:11: warning: ‘void* memcpy(void*, const void*, size_t)’ reading between 64 and 96 bytes from a region of size 32 [-Wstringop-overread]
 9426 |     memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zpaqfranz.cpp: In function ‘void blake3_hasher_update.part.0(blake3_hasher*, const void*, size_t)’:
zpaqfranz.cpp:9422:11: note: source object ‘out_array’ of size 32
 9422 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
      |           ^~~~~~~~~

may I suggest some code refactoring (just to do not get those spurious warnings)?

It happens quite frequently that I have to change perfectly functional code so as not to have to "fight" with compilers often full of bugs in detecting "memory" errors.
It's not that important, just a hint

PS Newer Linux-based g++ runs BLAKE3 two time faster then "not so old" g++, on AMD.
Yes, two time faster, not 2%!

@fcorbelli
Copy link
Author

fcorbelli commented Aug 12, 2021

By the way I made two candidate fixes
pointer math instead of []

INLINE size_t compress_parents_parallel(const uint8_t *child_chaining_values,
                                        size_t num_chaining_values,
                                        const uint32_t key[8], uint8_t flags,
                                        uint8_t *out)

The compiler does not like out accessed as a vector (how big? Maybe only 0 bytes-long... WARNINGGGGGGGG!!!)

  if (num_chaining_values > 2 * parents_array_len) {
	  /*
	  franzfix to avoid strange warning
	  */
	  
	  memcpy(&out+(parents_array_len * BLAKE3_OUT_LEN),
           &child_chaining_values+(2 * parents_array_len * BLAKE3_OUT_LEN),
           BLAKE3_OUT_LEN);
/*
    memcpy(&out[parents_array_len * BLAKE3_OUT_LEN],
           &child_chaining_values[2 * parents_array_len * BLAKE3_OUT_LEN],
           BLAKE3_OUT_LEN);
*/

A bigger buffer, just to make sure the compiler is happy with MAX_SIMD_DEGREE 1

// If MAX_SIMD_DEGREE is greater than 2 and there's enough input,
  // compress_subtree_wide() returns more than 2 chaining values. Condense
  // them into 2 by forming parent nodes repeatedly.
  uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN /*franzfix / 2*/];
  while (num_cvs > 2) {
    num_cvs =
        compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
    memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
  }```

@oconnor663
Copy link
Member

Related to: #183

Thanks for digging into this. I'm realizing now that the warnings aren't related to the compiler version, but instead to the value of MAX_SIMD_DEGREE set in blake3_impl.h. With the following diff, I can get the warnings to repro on regular x86-64 Arch Linux:

diff --git a/c/blake3_impl.h b/c/blake3_impl.h
index 86ab6aa..0bf8c5e 100644
--- a/c/blake3_impl.h
+++ b/c/blake3_impl.h
@@ -45,13 +45,7 @@ enum blake3_flags {
 #include <immintrin.h>
 #endif
 
-#if defined(IS_X86)
-#define MAX_SIMD_DEGREE 16
-#elif defined(BLAKE3_USE_NEON)
-#define MAX_SIMD_DEGREE 4
-#else
 #define MAX_SIMD_DEGREE 1
-#endif
 
 // There are some places where we want a static size that's equal to the
 // MAX_SIMD_DEGREE, but also at least 2.
❯ gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S
In function ‘compress_parents_parallel’,
    inlined from ‘compress_subtree_to_parent_node’ at blake3.c:350:9,
    inlined from ‘blake3_hasher_update.part.0’ at blake3.c:534:7:
blake3.c:238:5: warning: ‘memcpy’ writing 32 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
  238 |     memcpy(&out[parents_array_len * BLAKE3_OUT_LEN],
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  239 |            &child_chaining_values[2 * parents_array_len * BLAKE3_OUT_LEN],
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  240 |            BLAKE3_OUT_LEN);
      |            ~~~~~~~~~~~~~~~
blake3.c: In function ‘blake3_hasher_update.part.0’:
blake3.c:347:11: note: at offset 32 into destination object ‘out_array’ of size 32
  347 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
      |           ^~~~~~~~~
In function ‘compress_subtree_to_parent_node’,
    inlined from ‘blake3_hasher_update.part.0’ at blake3.c:534:7:
blake3.c:351:5: warning: ‘memcpy’ reading between 64 and 96 bytes from a region of size 32 [-Wstringop-overread]
  351 |     memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: In function ‘blake3_hasher_update.part.0’:
blake3.c:347:11: note: source object ‘out_array’ of size 32
  347 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
      |           ^~~~~~~~~

A value of 2 also seems to work to repro this. I haven't gotten a chance to play with your proposed fixes yet, and I'll follow up again once I do.

@oconnor663
Copy link
Member

oconnor663 commented Aug 12, 2021

EDIT: Woops, this whole comment is a silly mistake, see below.

Also notably, if I compile with the diff above and -fsanitize=address, I do get a crasher. An input of size 2047 works fine:

❯ head -c 2047 /dev/zero | ./example
5bea1ede30f4389bdeac72799d266dd054d35a3eb89e154d217bd582a21fd0c2

But an input of size 2048 fails:

❯ head -c 2048 /dev/zero | ./example
=================================================================
==13063==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd2b7b9a58 at pc 0x5568611217cd bp 0x7ffd2b7b9790 sp 0x7ffd2b7b9780
WRITE of size 8 at 0x7ffd2b7b9a58 thread T0
    #0 0x5568611217cc in compress_chunks_parallel /home/jacko/BLAKE3/c/blake3.c:181
    #1 0x5568611217cc in blake3_compress_subtree_wide /home/jacko/BLAKE3/c/blake3.c:274
    #2 0x55686112c802 in compress_subtree_to_parent_node /home/jacko/BLAKE3/c/blake3.c:341
    #3 0x55686112c802 in blake3_hasher_update /home/jacko/BLAKE3/c/blake3.c:534
    #4 0x55686112140e in main /home/jacko/BLAKE3/c/example.c:14
    #5 0x7f121ace0b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #6 0x55686112122d in _start (/home/jacko/BLAKE3/c/example+0x222d)

Address 0x7ffd2b7b9a58 is located in stack of thread T0 at offset 56 in frame
    #0 0x5568611215e6 in blake3_compress_subtree_wide /home/jacko/BLAKE3/c/blake3.c:268

So GCC might be onto something here. If I change MAX_SIMD_DEGREE to 2, then the cutoff for the same effect moves to 4095/4096 bytes of input.

@oconnor663
Copy link
Member

Ah woops, silly mistake on my part. The above is a crasher because the diff is bad. The blake3_simd_degree function obviously must not return a value greater than MAX_SIMD_DEGREE, so that function needs to be patched at the same time. With a diff like this, I still get the compiler warnings, but I don't get the crasher:

diff --git a/c/blake3_dispatch.c b/c/blake3_dispatch.c
index 6518478..278a299 100644
--- a/c/blake3_dispatch.c
+++ b/c/blake3_dispatch.c
@@ -245,6 +245,7 @@ void blake3_hash_many(const uint8_t *const *inputs, size_t num_inputs,
 
 // The dynamically detected SIMD degree of the current platform.
 size_t blake3_simd_degree(void) {
+  return 1;
 #if defined(IS_X86)
   const enum cpu_feature features = get_cpu_features();
   MAYBE_UNUSED(features);
diff --git a/c/blake3_impl.h b/c/blake3_impl.h
index 86ab6aa..0bf8c5e 100644
--- a/c/blake3_impl.h
+++ b/c/blake3_impl.h
@@ -45,13 +45,7 @@ enum blake3_flags {
 #include <immintrin.h>
 #endif
 
-#if defined(IS_X86)
-#define MAX_SIMD_DEGREE 16
-#elif defined(BLAKE3_USE_NEON)
-#define MAX_SIMD_DEGREE 4
-#else
 #define MAX_SIMD_DEGREE 1
-#endif
 
 // There are some places where we want a static size that's equal to the
 // MAX_SIMD_DEGREE, but also at least 2.

@sneves
Copy link
Collaborator

sneves commented Aug 12, 2021

See also #94

@oconnor663
Copy link
Member

Ugh yeah, I think I understood more of this at one point and just forgot it in the meantime.

@oconnor663
Copy link
Member

@fcorbelli could you test out #191 in your environment and confirm that it silences these warnings?

@fcorbelli
Copy link
Author

fcorbelli commented Aug 13, 2021

In fact

  1. warnings are compiler-version related. The newer one does, the older does not

  2. the compiler... is right

  3. with just this (#define MAX_SIMD_DEGREE 1)

 assert(num_cvs <= MAX_SIMD_DEGREE_OR_2);
  uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
  while (num_cvs > 2) {
    num_cvs =
        compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
    memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
  }

the warning is triggered.
Why?
Because MAX_SIMD_DEGREE_OR_2 =>2 * 32 / 2 == 32 (the out_array FIXED size)
But num_cvs is variable and every chunk is 32-bytes-long. So num_cvs is at least 3, then can write 3*32= 96 bytes (bigger than 32).

And

  memcpy(out, cv_array, 2 * BLAKE3_OUT_LEN);

write 64-bytes (2*32) into a 32 bytes long size.

Therefore, for me, a bigger out_array is needed (the last memcpy warning) and better a variable-sized one (num_cvs is not fixed).

=> some reworking is needed, not just an assert()

$ make
g++ -O3  zpaqfranz.cpp -o zpaqfranz -pthread -static
In function 'void compress_subtree_to_parent_node(const uint8_t*, size_t, const uint32_t*, uint64_t, uint8_t, uint8_t*)',
    inlined from 'void blake3_hasher_update.part.0(blake3_hasher*, const void*, size_t)' at zpaqfranz.cpp:9621:38:
zpaqfranz.cpp:9438:11: warning: 'void* memcpy(void*, const void*, size_t)' reading between 64 and 96 bytes from a region of size 32 [-Wstringop-overflow=]
 9438 |     memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

=> something like this (entering num_cvs in the array size)

  uint8_t out_array[num_cvs*MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];

(just a snippet, because warning: ISO C++ forbids variable length array 'out_array').
malloc() and free() or whatever

=> and num_cvs is changed inside the loop, so what is the MAX num_cvs (to pre-allocate the array)?

@fcorbelli
Copy link
Author

The second (first) warning is, for me, a spurious one, just because the compiler cannot "know" how big is the memory @ uint8_t out, accessed by [], switching from "[" to "+(" (yes, very quick and very dirty, but much bigger changes are needed if you define a 32-bytes-long pointer structure etc etc instead of the uint8_t)

Therefore some pointer-math is my suggestion (some side-effects? I do not think so)

@fcorbelli
Copy link
Author

Just to be clear something like this will not work

  // If MAX_SIMD_DEGREE is greater than 2 and there's enough input,
  // compress_subtree_wide() returns more than 2 chaining values. Condense
  // them into 2 by forming parent nodes repeatedly.
/*franzfix*/  
  uint8_t *out_array;
  
  out_array=(uint8_t*)malloc(num_cvs*sizeof(MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2));

  while (num_cvs > 2) {
    num_cvs =
        compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
/// num_cvs can grow or is assured always num_cvs (i)>num_cvs (i+1) ?
    memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
  }
  memcpy(out, cv_array, 2 * BLAKE3_OUT_LEN);
  free(out_array);

@fcorbelli
Copy link
Author

 while(num_cvs > 2 && num_cvs_is_reasonable) {
  // while (num_cvs > 2 && num_cvs <= MAX_SIMD_DEGREE_OR_2) {
    num_cvs =
        compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
    memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
  }

Does not work on 8.5, unless you delete the /2 in the array definition (or at least so it seems to me)

@oconnor663
Copy link
Member

I believe this is fixed by b8e2dda, but please reopen if I've missed something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants