Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

void * vs char * vs uint8_t * #5497

Closed
kaspar030 opened this issue Jun 2, 2016 · 54 comments
Closed

void * vs char * vs uint8_t * #5497

kaspar030 opened this issue Jun 2, 2016 · 54 comments
Labels
Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Type: question The issue poses a question regarding usage of RIOT

Comments

@kaspar030
Copy link
Contributor

We do use the three inconsistently.

I would like to collect facts on what we know about each other, then make an informed decision on when to use what, then make the uses consistent.

@kaspar030 kaspar030 added Type: question The issue poses a question regarding usage of RIOT Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR labels Jun 2, 2016
@kaspar030
Copy link
Contributor Author

I'll start collecting what I know:

  • C allows use of any type pointer as void * and the other way around without casting
  • C using void* and char* has different semantics than using other pointer types regarding to strict aliasing rules. As uint8_t is just a typedef, is it equivalent to the others in that regard?
  • char c; if (c == 0xff) ... is always false, which leads to unexpected bugs

Subjective stuff:

  • using uint8_t when handling byte data feels more natural compared to char*
  • using void* improves readability a lot as no casting is needed
  • dealing with a mixed char * and uint8_t codebase sucks because of needed casts

@kaspar030
Copy link
Contributor Author

  • standard C code usually uses char* for everything string related (printf, strcmp/len/tok/...) and void* for everything where buffers are supplied (read/write/memcpy/memset/...)

@Kijewski
Copy link
Contributor

Kijewski commented Jun 2, 2016

Fun fact: char != signed char != unsigned char. Whether c == 0xff can be true is implementation defined.

@kYc0o
Copy link
Contributor

kYc0o commented Jun 2, 2016

I think we should stick to the common sense and try to be consistent with the standard C coding style.
For me a char* is always a pointer to string related (character buffers and so on), uint8_t* to integer buffers or addresses and void* to function pointers, undefined return type callbacks... I don't know what is the current use of such pointer types on RIOT though...

@kaspar030
Copy link
Contributor Author

kaspar030 commented Jun 2, 2016

Fun fact: char != signed char != unsigned char. Whether c == 0xff is implementation defined.

They revised that in C99:

C99 N1256 draft 6.2.5/15 "Types" has this to say about the signed-ness of type char:

    The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

and in a footnote:

    CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

So char for us (--std=c99) is either signed char or unsigned char, or is it really a third type despite having the same range as signed or unsigned char? edit it is a third type, just read my own quote. ;)

@kaspar030
Copy link
Contributor Author

For me a char* is always a pointer to string related (character buffers and so on),

What about if (c == 0xff)? When using char*, that code is not portable.

@kYc0o
Copy link
Contributor

kYc0o commented Jun 2, 2016

do you have a more concrete example for that case?

@kaspar030
Copy link
Contributor Author

[kaspar@booze src]$ cat test.c
#include <stdio.h>

int main(void)
{
    char c = 0xff;
    if (c==0xff) {
        printf("c==0xff\n");
    } else {
        printf("c!=0xff\n");
    }
    return 0;
}
[kaspar@booze src]$ gcc -std=c99 -Wall test.c -o testx
[kaspar@booze src]$ ./testx
c!=0xff
[kaspar@booze src]$

-pedantic would catch the assignment, though.

@kYc0o
Copy link
Contributor

kYc0o commented Jun 2, 2016

yeah I understand the behaviour, the thing is that I cannot see how char* can be affected by such statement... you think that could affect a pointer that is in the upper or lower limits of the value?

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

My usage of these types of parameters is atm:

  • char *: strings (and only strings),
  • void *: any buffer, type, thing that can be anything, I usually use that to type input or output data, but I also use
  • uint8_t * for byte-arrays, especially if in the function I iterate over the buffer or if I edit parts of it.

@kaspar030
Copy link
Contributor Author

how char* can be affected by such statement...

Ah, I mean for example if we'd use char* for incoming serial bytes, and wait for a control character (like, 0xff), this would need not always obvious casting when comparing to one of the "characters" the pointer points to.

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

Just to make my conventions a little clearer (and I argue that at least in my work I'm pretty consistent with that): unless the type is 100% clear (e.g. I expect an IPv6 header) I use void *. The only exception for code-readability and to not having to need to store a temporary variable is when I do byte-related work on the data (e.g. iteration for checksums) in the function. Of the c == 0xff I did not know before, but I'm not sure either if I ran into any problems with that with uint8_t *. char * I only use in string related contexts (e.g. ipv6_addr_to_str()).

@kaspar030
Copy link
Contributor Author

kaspar030 commented Jun 2, 2016

Of the |c == 0xff| I did not know before, but I'm not sure I either I
ran into any problems with that with |uint8_t *|. [sic]

Doesn't affect uint8_t as 0xff is a perfectly valid value there.

@kaspar030
Copy link
Contributor Author

kaspar030 commented Jun 2, 2016 via email

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

unless the type is 100% clear (e.g. I expect an IPv6 header) I use |void *|.

Do you mean the expected type within the function or at the call site?
e.g., for UART, SPI, ... the callee always expects a byte array, but the
caller might send any type.

Both, but primarily on the caller side. A case were it even isn't that clear for the callee would be netdev_driver_t::get()

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

You do monospace by using foobar btw ;-) (edit: backticks in backticks is a little tricky ^^)

@kaspar030
Copy link
Contributor Author

kaspar030 commented Jun 2, 2016

You do monospace by using ````` btw ;-)

does it work when replying via email: foobar? edit nope, thought so. ;(

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

No, doesn't seem like it.

@kaspar030
Copy link
Contributor Author

kaspar030 commented Jun 2, 2016

Both, but primarily on the caller side.

So whenever a buffer (which is equivalent to byte array) is expected, we'd use void*, even if within the function we iterate over it? That would basically mean we use uint8_t as type for byte arrays but void* to pass them around, unless we're 100% sure that the caller will pass a definitely untyped byte array, in which case we'd use uint8_t* for the parameter type?

@LudwigKnuepfer
Copy link
Member

I would opt for keeping all three around (and use them reasonably), too.

@LudwigKnuepfer
Copy link
Member

PS:
If anything I'd lose the void * - explicit casting is more reasonable in general.

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

If anything I'd lose the void * - explicit casting is more reasonable in general.

👎 in any case I use it anything else doesn't make sense.

So whenever a buffer (which is equivalent to byte array) is expected, we'd use void*, even if within the function we iterate over it?

Huh? That was exactly the opposite of what I said, nor do I propose anything. I use uint8_t * when a byte-array is expected by the caller because e.g. it want's to iterate over it.

That would basically mean we use uint8_t as type for byte arrays but void* to pass them around, unless we're 100% sure that the caller will pass a definitely untyped byte array, in which case we'd use uint8_t* for the parameter type?

For the first part I agree, but I'm not quite sure what we need to be "100% sure". When the situation arises, one look into the function's code makes clear, why I used a uint8_t at that point.

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

Oh btw, just remember another thing I generally use uint8_t * for: link-layer addresses.

@kaspar030
Copy link
Contributor Author

Huh? That was exactly the opposite of what I said, nor do I propose anything.

I know. ;) But you also said that you'd make the decision primarily based on the caller side expected type, which is contradicting. I thought we could clear that up.

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

I reject your reality and substitute my own ;-).

No but seriously, I thought it was clear: When the type from the caller can be expected to be anything, than I use void *. That's what I wanted to say, but maybe I wasn't clear enough 💃

@kaspar030
Copy link
Contributor Author

I feel we're getting somewhere:

  • use char* for strings and only for strings, because C + libc do it that way
  • use uint8_t as type for untyped byte buffers, but use void * to pass them around. uint8_t because we're dealing with bytes and not characters, void* to avoid unnecessary casting
  • use uint8_t* to pass "typed" byte buffers, e.g., link-layer addresses, where it avoids unnecessary temporary variable
  • break those guidelines wherever reasonable ;)

@miri64
Copy link
Member

miri64 commented Jun 2, 2016

Is not wanting to convert all the usages of msg.content.ptr a reasonable reason to break these guidelines? Because I made multiple efforts to do so, but its just too much ^^.

@kaspar030
Copy link
Contributor Author

@Stargateur @liuziangexit of course you're right, C defines CHAR_BIT >= 8.

Is anyone trying to use RIOT on such a platform?

@Stargateur
Copy link

Stargateur commented Jul 1, 2018

@kaspar030 I was just talking about semantic, if uint8_t is available, CHAR_BIT would be 8. But keep in mind that uint8_t do not allow you to alias other type contrary to char.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

bool check(uint8_t *a, uint32_t *b)
{
    *a = 5;
    *b = 6;
    if (*a == 5) {
        return true;
    }
    return false;
}

int main(void)
{
    uint32_t foo;
    // This is undefined behavior
    printf("%s", check((uint8_t *)&foo, &foo) ? "true" : "false");
}

However a lot of compiler do not optimize this cause a lot of people think uint8_t is a character type. But nothing in the standard prevent compiler to return true for this snipped.

@liuziangexit
Copy link

liuziangexit commented Jul 1, 2018 via email

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@kaspar030
Copy link
Contributor Author

But keep in mind that uint8_t do not allow you to alias other type contrary to char.

I thought "uint8_t" doesn't have to be a character type. It is when typedef'ed to "(signed) char", right?

@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Aug 19, 2019
@Stargateur
Copy link

But keep in mind that uint8_t do not allow you to alias other type contrary to char.

I thought "uint8_t" doesn't have to be a character type. It is when typedef'ed to "(signed) char", right?

Yes but you will rely on implemented behavior, in my projet I avoid that, and personally I think it's a very bad way to implement uint8_t, this disallow a lot of optimization.

@kaspar030
Copy link
Contributor Author

Yes but you will rely on implemented behavior, in my projet I avoid that, and personally I think it's a very bad way to implement uint8_t, this disallow a lot of optimization.

Just to be clear, it only disallows optimization in case uint8_t is not a typedef'ed char?

@Stargateur
Copy link

Yes but you will rely on implemented behavior, in my projet I avoid that, and personally I think it's a very bad way to implement uint8_t, this disallow a lot of optimization.

Just to be clear, it only disallows optimization in case uint8_t is not a typedef'ed char?

no that the contrary, that will not allow optimisation, if uint8_t is a typedef to char, the compiler will assume the data can alias anything, so it will disallow optimization on some case. Like a function that take a pointer to structure and a pointer to char, the compiler assume both could be alias, so if uint8_t is a typedef to char the same rule will apply. That why I don't like this way of implementation. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110, we can see that gcc make a bad decision but as always they stick with it.

You can see https://stackoverflow.com/questions/57259126/why-does-the-rust-compiler-not-optimize-code-assuming-that-two-mutable-reference, Rust should assume nothing alias on most case too. But does not currently.

@stale
Copy link

stale bot commented Feb 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Feb 23, 2020
@miri64 miri64 removed the State: stale State: The issue / PR has no activity for >185 days label Feb 23, 2020
@stale
Copy link

stale bot commented Aug 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 27, 2020
@stale stale bot closed this as completed Sep 27, 2020
@miri64 miri64 reopened this Sep 27, 2020
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Sep 27, 2020
@stale
Copy link

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jun 3, 2021
@stale stale bot closed this as completed Jul 8, 2021
@miri64 miri64 reopened this Jul 8, 2021
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jul 8, 2021
@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Mar 2, 2022
@miri64 miri64 removed the State: stale State: The issue / PR has no activity for >185 days label Mar 2, 2022
@maribu
Copy link
Member

maribu commented May 17, 2023

Looking at the code base and recent changes, there seems to be a consensus now:

  • Use void * as return type / argument type for opaque data (such as data to transmit, store, read, encode as base64, ...) - which is consistent with the standard C lib and POSIX
  • Use char * for strings only
  • Use uint8_t [] for network addresses

If anyone disagrees, please reopen.

@maribu maribu closed this as completed May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Type: question The issue poses a question regarding usage of RIOT
Projects
None yet
Development

No branches or pull requests

9 participants