Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

described enums numeric mode #866

Open
KostinPavel opened this issue Mar 3, 2023 · 81 comments
Open

described enums numeric mode #866

KostinPavel opened this issue Mar 3, 2023 · 81 comments

Comments

@KostinPavel
Copy link

KostinPavel commented Mar 3, 2023

Version of Boost

1.81

Description

Extraction from "underlying" value does not work. Also not covered are "signed underlying" and "unsigned underlying".

Steps necessary to reproduce the problem

#include <boost/describe.hpp>
#include <boost/json.hpp>
#include <boost/json/src.hpp>

namespace my_namespace
{
enum class enum_class_use_underlying_type : uint8_t
{
  var0 = 0,
  var1 = 1
};
BOOST_DESCRIBE_ENUM(enum_class_use_underlying_type)
}  // namespace my_namespace

#include <iostream>

int main()
{
  try {
    const my_namespace::enum_class_use_underlying_type
        enum_class_use_underlying_value {
            my_namespace::enum_class_use_underlying_type::var0};

    const boost::json::value json_value =
        boost::json::value_from(enum_class_use_underlying_value);

    std::cout << "enum_class_use_underlying_value: " << json_value
              << std::endl;  // OK

    const auto enum_class_use_underlying_value2 =
        boost::json::value_to<my_namespace::enum_class_use_underlying_type>(
            json_value);  // Bug

    return 0;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
  }
  return 1;
}

Output:

enum_class_use_underlying_value: 0
value is not a string [boost.json:30 at /usr/include/boost/json/detail/value_to.hpp:558:9 in function 'boost::json::result<T> boost::json::detail::value_to_impl(boost::json::try_value_to_tag<T>, const boost::json::value&, described_enum_conversion_tag)']

Because in "boost/json/detail/value_from.hpp" function value_from_helper correct use condition if( name ):

// described enums
template<class T>
void
value_from_helper(
    value& jv,
    T from,
    described_enum_conversion_tag)
{
    (void)jv;
    (void)from;
#ifdef BOOST_DESCRIBE_CXX14
    char const* const name = describe::enum_to_string(from, nullptr);
    if( name )
    {
        string& str = jv.emplace_string();
        str.assign(name);
    }
    else
    {
        using Integer = typename std::underlying_type< remove_cvref<T> >::type;
        jv = static_cast<Integer>(from);
    }
#endif
}

but in "boost/json/detail/value_to.hpp" function value_to_impl not using condition if( name ):

// described enums
template<class T>
result<T>
value_to_impl(
    try_value_to_tag<T>,
    value const& jv,
    described_enum_conversion_tag)
{
    T val = {};
    (void)jv;
#ifdef BOOST_DESCRIBE_CXX14
    error_code ec;

    auto str = jv.if_string();
    if( !str )
    {
        BOOST_JSON_FAIL(ec, error::not_string);
        return {system::in_place_error, ec};
    }

    if( !describe::enum_from_string(str->data(), val) )
    {
        BOOST_JSON_FAIL(ec, error::unknown_name);
        return {system::in_place_error, ec};
    }
#endif

    return {system::in_place_value, val};
}
@grisumbras
Copy link
Member

First, what is "'underlying' value"?

Second, at first I thought that I forgot to document that conversion happens from described enumerators. But I did document it. So, everything works as documented. Did you forget to describe the enumerators? What are you describing the enum for, if not for enumerators?

What is the behaviour you expected?

@KostinPavel
Copy link
Author

In declare: "enum class enum_class_use_underlying_type : uint8_t"
uint8_t - is underlying

@grisumbras
Copy link
Member

That's a type, not a value. What's the behaviour you expected?

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Non-descriptive names should be serialized and deserialized as numbers.
Serialization works as expected, but deserialization doesn't work as expected.

@grisumbras
Copy link
Member

Ok, so currently the library behaves as documented. And, I guess you ask for a different behaviour?

I'll explain the rationale for our current behaviour. Library-provided conversions are generic and thus try to be the most logical for the types they support. It's logical that for described enums users want to convert to and from strings denoting enumerators.

Converting from a random number to some generic enum is problemtaic, because you cannot programmatically determine the allowed range of values for the enum (it's not always the full domain of the underlying type). Thus we decided to not allow such conversion.

Converting from an enum to its underlying type is never a problem, and we did not want to throw from value_from, so we decided to allow such a conversion.

So, now that you know the rationale, what's your rationale for describing an enum, but not its enumerators? If you want
to just represent enum as a number, you can always provide value_to/from overloads

namespace my_namespace
{

void value_from( ::boost::json::value_from_tag, value& jv, enum_class_use_underlying_type e )
{
    using UnderlyingType = std::underlying_type< enum_class_use_underlying_type >::type;
    jv = static_cast< UnderlyingType >(e);
}

enum_class_use_underlying_type
value_from( ::boost::json::value_to_tag<enum_class_use_underlying_type>, value const& jv )
{
    using UnderlyingType = std::underlying_type< enum_class_use_underlying_type >::type;
    auto const n = jv.to_number< UnderlyingType >();
    return static_cast<enum_class_use_underlying_type>(n);
}

}

@KostinPavel
Copy link
Author

All right. I have to use a solution similar to yours. Yes, I had to enter ranges and/or lists of valid values for "enum". But still there should be a built-in way for the non-string representation of "enum" to ensure compatibility with previous solutions.

@grisumbras
Copy link
Member

What previous solutions? We never provided a generic conversion for described (or any other kind of) enums with a different behaviour.

@KostinPavel
Copy link
Author

Suppose we are expecting a string, but there is a number. It is necessary to at least try to return a value, even without checking the validity of the value. Top-level code can additionally check if the value is correct.
While in "describe" I did not see the built-in possibility of such a solution.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

What previous solutions? We never provided a generic conversion for described (or any other kind of) enums with a different behaviour.

For example, the client and server parts may use different libraries. And for compatibility it is required to be able to work with "enum" as with numbers in JSON.

@grisumbras
Copy link
Member

Ok, so this does not sound like a generic conversion. This is a specific conversion for specific protocol. So, you will have to implement it yourself. Since you already have unearthed implementation for our described enum conversions, you can use it as the basis for your implementation.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

I have my own solution, and I have been successfully using it for a long time. But then I think that people would not be confused, it is necessary to remove the direct conversion to a number.
Then they will immediately understand that they need a specific solution for enums as numbers.

@grisumbras
Copy link
Member

I shared your concern when we implemented this. But currently we don't throw from our generic conversions to value (unless it's a memory allocation error), so we do not have a way to communicate the error to the user. Maybe I should add a note in the docs to explain the situation with described enums?

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

For example, you can do something like this:

// described enums
template<class T>
result<T>
value_to_impl(
    try_value_to_tag<T>,
    value const& jv,
    described_enum_conversion_tag)
{
    T val = {};
    (void)jv;
#ifdef BOOST_DESCRIBE_CXX14
    error_code ec;

    auto str = jv.if_string();
    auto num = jv.if_int64();
    auto unum = jv.if_uint64();
    if(str)
    {
      if( !describe::enum_from_string(str->data(), val) )
      {
          BOOST_JSON_FAIL(ec, error::unknown_name);
          return {system::in_place_error, ec};
      }
    }
    else if( num )
    {
       char const* const name = describe::enum_to_string(*num, nullptr);
       if( name )
       {
         BOOST_JSON_FAIL(ec, error::not_string);
         return {system::in_place_error, ec};
       }
       val =  static_cast<T>(*num);
    }
    else if( unum )
    {
       char const* const name = describe::enum_to_string(*unum, nullptr);
       if( name )
       {
         BOOST_JSON_FAIL(ec, error::not_string);
         return {system::in_place_error, ec};
       }
       val = static_cast<T>(*unum);
    }
    else
    {
        BOOST_JSON_FAIL(ec, error::exhausted_variants);
        return {system::in_place_error, ec};
    }

#endif

    return {system::in_place_value, val};
}

@KostinPavel
Copy link
Author

Worse, I think it's not being done. Those who need to check the values for validity will do this separately.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Alternatively, you can add a #define that will serialize and deserialize as a number. In the #define description, you can specify features...
Or combine these two methods.

@grisumbras
Copy link
Member

This

val =  static_cast<T>(*num);

is undefined behaviour for many enums. And we don't have a way to check if it is or isn't.

@KostinPavel
Copy link
Author

Yes, I know. But it comes from compilers and language. You use pointers - although they can also fail. This feature is not easy to take into account - it should not be an obstacle to the operation of the code.

@KostinPavel
Copy link
Author

Someday the check will be built into the language, then everything will work as it should. And now it's stupid to stop the development of the code because of this reason.

@KostinPavel
Copy link
Author

Those who need more guarantees now can take care of it without your checking.

@grisumbras
Copy link
Member

No, it's completely different from pointers. There's no way to defend against UB in that case, thus we will not implement this behaviour.

The point of having generic conversions is that they are uncontroversial. Users can always provide their preferred conversions for their types.

@KostinPavel
Copy link
Author

Checks can and do for pointers as well. A value missing from the enum will not crash the program. This value will be treated as other values. Programmers in code may or may not handle other values. It's not bad - it's just a special case.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

the only thing to worry about is the size of the type. Might need to add
std::numeric_limits<std::underlying_type_t<T>>::min()
std::numeric_limits<std::underlying_type_t<T>>::max()

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

For example like this:

// described enums
template<class T>
result<T>
value_to_impl(
    try_value_to_tag<T>,
    value const& jv,
    described_enum_conversion_tag)
{
    T val = {};
    (void)jv;
#ifdef BOOST_DESCRIBE_CXX14
    error_code ec;

    auto str = jv.if_string();
    auto num = jv.if_int64();
    auto unum = jv.if_uint64();
    if(str)
    {
      if( !describe::enum_from_string(str->data(), val) )
      {
          BOOST_JSON_FAIL(ec, error::unknown_name);
          return {system::in_place_error, ec};
      }
    }
    else if( num )
    {
       char const* const name = describe::enum_to_string(*num, nullptr);
       if( name )
       {
         BOOST_JSON_FAIL(ec, error::not_string);
         return {system::in_place_error, ec};
       }
       if(*num < std::numeric_limits<std::underlying_type_t<T>>::max() || *num > std::numeric_limits<std::underlying_type_t<T>>::max())
       {
         BOOST_JSON_FAIL(ec, error::size_mismatch);
         return {system::in_place_error, ec};
       }       
       val =  static_cast<T>(*num);
    }
    else if( unum )
    {
       char const* const name = describe::enum_to_string(*num, nullptr);
       if( name )
       {
         BOOST_JSON_FAIL(ec, error::not_string);
         return {system::in_place_error, ec};
       }
       if(*unum < std::numeric_limits<std::underlying_type_t<T>>::max() || *unum > std::numeric_limits<std::underlying_type_t<T>>::max())
       {
         BOOST_JSON_FAIL(ec, error::size_mismatch);
         return {system::in_place_error, ec};
       }
       val = static_cast<T>(*unum);
    }

    BOOST_JSON_FAIL(ec, error::exhausted_variants);
    return {system::in_place_error, ec};

#endif

    return {system::in_place_value, val};
}

@grisumbras
Copy link
Member

There are actual compiler optimizations that take advantage of the allowed enum range. Your program may crash. It may behave erratically. That's what "undefined behaviour" means.

@KostinPavel
Copy link
Author

The above example addresses this issue.

@grisumbras
Copy link
Member

There's no point in explaining to me how to amend the current implementation to get this or that behaviour. I've wrote those functions, I know how to change them. But I'm telling you, we're not adding behaviour that silently enables UB.

@grisumbras
Copy link
Member

No, it does not

@grisumbras
Copy link
Member

enum E {a=1, b=2};
int n = 4;
E e = static_cast<E>(n); // UB

@KostinPavel
Copy link
Author

Explain to me, please, why?
What am I not considering?

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

enum E {a=1, b=2};
int n = 4;
E e = static_cast<E>(n); // UB

Please add std::numeric_limits<std::underlying_type_t>::min() std::numeric_limits<std::underlying_type_t>::max() test to this code.

@KostinPavel
Copy link
Author

Fine. I tried to justify all my thoughts. You decide. The developer is you.
My solution uses the described approach.
Whether or not this approach is available to others is up to you.

@KostinPavel
Copy link
Author

The only other way I can help is to make a branch with a pull request.

@grisumbras
Copy link
Member

There's no way to check if that conversion will be UB or not. If you know how to do it, please change the example I provided (the one with a crashing program) so that it does not crash. Then we can discuss whether your change is possible to integrate into the library. Things you've suggested so far would not have solved the problem, because they do not address the problem in any way.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

static_assert is almost the same as "throw" but the handler is not defined by the programmer. All you have to do is check the range when the values come in at runtime.
It is important that the proposal does not make it worse!

@grisumbras
Copy link
Member

I repeat, I can't check the range. Unless I'm missing something. Please, show me in CE how to do that, by making the program to not crash.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

#include <limits>
#include <cstdio>
#include <stdexcept>

enum E {a=1, b};

char const* message(E e) {
    switch( e ) {
    case a: return "a";
    case b: return "b";
    //case E(0): return "0"; // this programmer error use sanitazer to detect this
    //case E(3): return "3"; // this programmer error use sanitazer to detect this
    default: return "something else";
    }
}

int main(int argc, char**)
{    
    try
    {
    argc += 3;     
    if(argc < std::numeric_limits<std::underlying_type_t<E> >::min() || argc > std::numeric_limits<std::underlying_type_t<E> >::max())
    {
        throw std::runtime_error("overflow");
    }
    E e = static_cast<E>(argc);
    std::printf(message(e));
    } catch (const std::exception& e) {
        std::printf(e.what());        
    }
    return 0;
}

@KostinPavel
Copy link
Author

In this library, there is no need to iterate over the values in switch!

@grisumbras
Copy link
Member

grisumbras commented Mar 3, 2023

You did not catch the the value being outside of enum's range. You just changed the code so that UB would not manifest. So, you don't know how to check the range either. Of course, it's because there's no way to do that.

Just to reiterate, I will not add silent undefined behaviour to the library. Users are free to write their own conversions, which can have as much UB as they want. They also may solve the issue by only using enums with fixed underlying types. But as generic conversions have to be generic, there's no acceptable solution on our end.

@KostinPavel
Copy link
Author

I think you are wrong.
In the proposed solution, UB also remain on the side of the developer. But they do not need to finish your decision with their crutches.

@KostinPavel
Copy link
Author

The code remains transparent within the capabilities of the language and the compiler.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Do you think someone expects that it will be converted to a number, but not back?
This is truly hidden behavior!

@grisumbras
Copy link
Member

It's defined by definition. Because it is documented. I will consider adding a note to make this more explicit.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

This is code from the library's user space!

char const* message(E e) {
    switch( e ) {
    case a: return "a";
    case b: return "b";
    //case E(0): return "0"; // this programmer error use sanitazer to detect this
    //case E(3): return "3"; // this programmer error use sanitazer to detect this
    default: return "something else";
    }
}

@KostinPavel
Copy link
Author

"-fstrict-enums" also set by the library user

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

example of a warning to a library user:

C:\msys64\mingw64\bin\g++.exe main.cpp -o d:\VSCode\CPP\enum_test\output\main.exe -Wall -Wextra -g3 

main.cpp: In function 'const char* message(E)':
main.cpp:10:5: warning: case value '0' not in enumerated type 'E' [-Wswitch]
   10 |     case E(0): return "0";
      |     ^~~~
main.cpp:11:5: warning: case value '3' not in enumerated type 'E' [-Wswitch]
   11 |     case E(3): return "3";
      |     ^~~~

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Here, a crash. https://godbolt.org/z/s8sjqxqa6

The library will not protect against such an error.

Maybe a bug in https://godbolt.org/
Because it worked for me as expected.

D:\VSCode\CPP\enum_test\output> gcc --version
gcc.exe (Rev10, Built by MSYS2 project) 12.2.0
C:\msys64\mingw64\bin\g++.exe main.cpp -o d:\VSCode\CPP\enum_test\output\main.exe -std=c++20 -O3 -fstrict-enums 
PS D:\VSCode\CPP\enum_test> cd 'd:\VSCode\CPP\enum_test\output'
PS D:\VSCode\CPP\enum_test\output> & .\'main.exe'
3
PS D:\VSCode\CPP\enum_test\output>

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Crashes because argc is not set or optimized in emulator in site. Try this.

int main(int argc, char**)
{
    argc = 0;
    E e = static_cast<E>(argc + 3);
    std::printf(message(e));
    return 0;
}

or this

int main(int argc, char**)
{
    argc = 10;
    E e = static_cast<E>(argc + 3);
    std::printf(message(e));
    return 0;
}

it Works.

@KostinPavel
Copy link
Author

Please let us know if it worked for you.

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

I also tried it:

#include <limits>
#include <cstdio>


enum E {a=1, b};

char const* message(E e) {
    switch( e ) {
    case a: return "a";
    case b: return "b";
    case E(0): return "0";
    case E(3): return "3";
    default: return "something else";
    }
}

int main(int argc, char**)
{
    argc = std::numeric_limits<int>::max();
    std::printf("%i\n",argc);
    std::printf("%i\n",argc+3);
    E e = static_cast<E>(argc + 3);
    std::printf(message(e));
    return 0;
}

It works too:

ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 0
2147483647
-2147483646
something else

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

I don't think you'll find a correct case that doesn't work.

@pdimov
Copy link
Member

pdimov commented Mar 3, 2023

This

val =  static_cast<T>(*num);

is undefined behaviour for many enums.

In this specific case not, because the value has been checked to have a corresponding enumerator (via enum_to_string).

@pdimov
Copy link
Member

pdimov commented Mar 3, 2023

enum E {a=1, b};

char const* message(E e) {
    switch( e ) {
    case a: return "a";
    case b: return "b";
    case E(0): return "0";
    case E(3): return "3";
    default: return "something else";
    }
}

This is OK because 3 is in the range of the enum (which is 0..3 in this case because it has to contain the values 1 and 2). But E(4) would be UB.

@pdimov
Copy link
Member

pdimov commented Mar 3, 2023

The rule is, if the value can't fit in the allowed range of the enum, behavior is undefined. The allowed range is: if the enum has a fixed underlying type, the range is that of the type; otherwise, the range is the range of the hypothetical integral type with the smallest number of bits that can fit all the enumerators. So

enum E1 { a = 1, b = 5 }; // range is 0..7
enum E2: int {}; // range is that of int
enum class E3: int {}; // range is that of int
enum class E4 {}; // range is that of int, because enum class always has a fixed underlying type

So in principle, we can check is_scoped_enum (which is implementable as in https://github.com/boostorg/endian/blob/develop/include/boost/endian/detail/is_scoped_enum.hpp), and there's no UB there for any underlying_type value.

But there's no reliable way to detect is_fixed_enum (E2 above.) I have a trait that kind of does (https://github.com/boostorg/endian/blob/feature/is-fixed-enum/include/boost/endian/detail/is_fixed_enum.hpp) but it requires C++17 and doesn't quite work correctly on all compilers.

And there's absolutely no way to obtain the range of E1 (except if it's correctly described, in which case we can compute it from the enumerator descriptors).

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

Perhaps you need to make implementations for each of the four options.

enum E1 { a = 1, b = 5 }; // range is 0..7 // ??? most compilers will choose a chunk of memory that is a multiple of a byte for this type
enum E2: int {}; // range is that of int //(std::is_scoped_enum) does not require discussion, since the solution is proposed 
enum class E3: int {}; // range is that of int // does not require discussion, since the solution is proposed
enum class E4 {}; // range is that of int, because enum class always has a fixed underlying type //does not require discussion, since the solution is proposed

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

It's a good idea to compare values from string descriptions, but it's not clear how to convert to a number then?
Maybe some kind of template specifier or define is required?

if( num )
{
       char const* const name = describe::enum_to_string(*num, nullptr);
       if( name )
       {
         ...

@pdimov
Copy link
Member

pdimov commented Mar 3, 2023

// ??? most compilers will choose a chunk of memory that is a multiple of a byte for this type

It's still undefined behavior. Bad things will happen if not today, tomorrow.

In fairness, -fsanitize=undefined doesn't diagnose this for both GCC and Clang, and it seems to be unspecified before C++17, not undefined.

@KostinPavel
Copy link
Author

to detect the numeric mode, you can use:

//default
template<typename T>
constexpr bool need_numeric_mode = false;

//specialization:
template<my_enum_type>
constexpr bool need_numeric_mode = true;

@KostinPavel
Copy link
Author

KostinPavel commented Mar 3, 2023

The structure of the code is something like this:

// described enums
template<class T>
result<T>
value_to_impl(
    try_value_to_tag<T>,
    value const& jv,
    described_enum_conversion_tag)
{
    T val = {};
    (void)jv;
#ifdef BOOST_DESCRIBE_CXX14
    error_code ec;

    if constexpr (need_numeric_mode<T>)
    {
        auto num = jv.if_int64();
        auto unum = jv.if_uint64();
        if( num )
        {
            char const* const name = describe::enum_to_string(*num, nullptr);
            if( !name ) // detection unnamed variants!!!
            {
                BOOST_JSON_FAIL(ec, error::value);
                return {system::in_place_error, ec};
            }
            val =  static_cast<T>(*num);
        }
        else if( unum )
        {
            char const* const name = describe::enum_to_string(*unum, nullptr);
            if( !name )
            {
                BOOST_JSON_FAIL(ec, error::value);
                return {system::in_place_error, ec};
            }
            val =  static_cast<T>(*unum);
        }
        else
        {
            BOOST_JSON_FAIL(ec, error::not_number);
            return {system::in_place_error, ec};
        }
    }
    else
    {
        auto str = jv.if_string();
        if(!str)
        {
            BOOST_JSON_FAIL(ec, error::not_string);
            return {system::in_place_error, ec};
        }
        if( !describe::enum_from_string(str->data(), val) )
        {
            BOOST_JSON_FAIL(ec, error::unknown_name);
            return {system::in_place_error, ec};
        }
    }

#endif

    return {system::in_place_value, val};
}

@KostinPavel
Copy link
Author

Something similar is needed when converting to a number.

@KostinPavel KostinPavel changed the title described enums underlying mode described enums numeric mode Mar 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants