Non-C enums #428

lerno · 2022-04-12T20:49:11Z

Currently enums use (and allow) C style enum value definition:

enum Foo
{
    BAR = 123,
    BAZ = 44,
}

With associated values, an enum may look like this:

enum Foo : int (int offset, char* extra_name, double x)
{
	BAZ(12, "hello", 3.0),
	BOO(33, "oekfe", 4.0) = 3,
}

Here we can do things like:

Foo f = (Foo)3;
int x = f.offset; // 33
puts(f.extra_name); // Prints "oekfe"
int ordinal = f.ordinal; // 1

Some things are a bit problematic: the associated values need to compile either to an array indexed by ordinal or to a function with a switch statement. Neither is particularly fast. It's also easy to get invalid values with non-ordinals.
The mapping between ordinal <=> value for enums with defined ordinal values needs to be either a hash map or a switch lookup, again a fairly unattractive solution speed-wise.
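To make the cost concrete, here is a rough C sketch of what a compiler might have to emit for the C-style Foo above. The names FooData, foo_data, foo_ordinal and foo_lookup are made up for illustration and are not taken from the C3 compiler.

```c
#include <stddef.h>

/* C-style enum with user-defined values, as in the Foo example above. */
typedef enum { FOO_BAZ = 0, FOO_BOO = 3 } Foo;

/* Hypothetical layout for the associated values, one row per ordinal. */
typedef struct {
    int offset;
    const char *extra_name;
    double x;
} FooData;

static const FooData foo_data[] = {
    { 12, "hello", 3.0 }, /* ordinal 0: BAZ */
    { 33, "oekfe", 4.0 }, /* ordinal 1: BOO */
};

/* value -> ordinal needs a switch (or a hash map) because the values have gaps.
   Returns -1 for values such as (Foo)123 that name no enum member. */
static int foo_ordinal(Foo f)
{
    switch ((int)f) {
        case FOO_BAZ: return 0;
        case FOO_BOO: return 1;
        default:      return -1;
    }
}

/* Every associated-value access pays for the lookup above first. */
static const FooData *foo_lookup(Foo f)
{
    int ordinal = foo_ordinal(f);
    return ordinal < 0 ? NULL : &foo_data[ordinal];
}
```

With this shape, something like f.offset turns into foo_lookup(f)->offset, plus a decision about what to do when the lookup fails.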

In the compiler too, care must be taken since something like (Foo)123 should be valid even if there is no such enum value defined. It can't be assumed to be incorrect under C semantics.

The alternative here is to leverage the associated values for non-C enums while retaining the ability to do much of what C offers:

For our example above, we would instead define this:

enum Foo (int value, int offset, char* extra_name, double x)
{
	BAZ(0, 12, "hello", 3.0),
	BOO(3, 33, "oekfe", 4.0),
}

As we see, the user-defined ordinal values are now one of the associated values. We can easily construct a macro that uses linear search to look up an enum by value (more optimized versions can of course be made):

Foo f = @enum_find(Foo, value, 3); // Will return Foo.BOO

Even better we can allow it to return an optional:

Foo! f = @enum_find(Foo, value, 3); // Will return Foo.BOO

Which is much safer than just doing (Foo)some_int.
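As a rough illustration of what such a lookup boils down to, here is a C sketch of a linear search over the associated values. The bool-plus-out-parameter return stands in for the optional, and foo_find_by_value, FooData and FOO_COUNT are hypothetical names, not the actual @enum_find macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Ordinals are contiguous: BAZ = 0, BOO = 1. FOO_COUNT is a helper, not a member. */
typedef enum { FOO_BAZ, FOO_BOO, FOO_COUNT } Foo;

typedef struct {
    int value;              /* the former user-defined enum value */
    int offset;
    const char *extra_name;
    double x;
} FooData;

/* Associated values, indexed directly by ordinal. */
static const FooData foo_data[FOO_COUNT] = {
    [FOO_BAZ] = { 0, 12, "hello", 3.0 },
    [FOO_BOO] = { 3, 33, "oekfe", 4.0 },
};

/* Linear search by 'value'. Returns false (the "empty optional")
   when no member carries that value; *out is left untouched then. */
static bool foo_find_by_value(int value, Foo *out)
{
    for (int i = 0; i < FOO_COUNT; i++) {
        if (foo_data[i].value == value) {
            *out = (Foo)i;
            return true;
        }
    }
    return false;
}
```

The caller is forced to handle the not-found case, which is the point of returning an optional instead of casting.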

This allows us a better distinction between serialized "stable" versions of an enum and the ordinal.

Actually writing searches over enums is easy thanks to foreach in C3:

foreach (Foo f : Foo.elements) { /* search */ }

The advantage here is also more solid semantics, with enums mapping directly to ordinals, enforcing more robust conversions and in general being more well defined.

The disadvantage is that it adds a little friction when converting from C.

Something like

typedef enum
{
   INVALID = -1,
   RUNNING = 0,
   SHUTDOWN = 1,
} State;

Would need to - if values are important - be converted to:

enum State (int code)
{
    INVALID(-1),
    RUNNING(0),
    SHUTDOWN(1),
}

And then use state.code when interfacing with C, rather than just state.

sirwhinesalot · 2022-04-13T16:17:53Z

So one very strong point about this solution is that foreach loop: not only is it efficient, since it's just iterating over the arrays, it also lets enums be what they're supposed to be, enumerations. Efficiency in general is pretty good too: the indirection can be optimized out most of the time, and the additional space wasted is irrelevant in the grand scheme of things (who has enums with thousands of cases? Even then it would be nothing). C interoperability is easy too, as mentioned.

This to me is the right way to implement enums, the C style is wrong, and the Rust/Swift style is even worse (they're tagged unions / sum types / variant types, not enumerations).

The find macro, which I personally find unnecessary, doesn't cause any issues and doesn't interfere with any other language features, which is nice. But my only note on it is that the return type of that macro should be an array or iterator of some kind, not a single value, since multiple members of the enumeration may share the same value in the same field.
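A small C sketch of the multi-match variant suggested here: the search writes every matching member into a caller-supplied buffer and returns how many matched. The Key enum, its codes and key_find_all are invented for the example.

```c
#include <stddef.h>

/* Hypothetical enum where two members share the same associated code. */
typedef enum { KEY_ENTER, KEY_RETURN, KEY_ESCAPE, KEY_COUNT } Key;

/* Associated 'code' field, indexed by ordinal. */
static const int key_code[KEY_COUNT] = { 13, 13, 27 };

/* Collects every member whose code matches. Writes at most 'cap' results
   to 'out' and returns the total number of matches found. */
static size_t key_find_all(int code, Key *out, size_t cap)
{
    size_t n = 0;
    for (int i = 0; i < KEY_COUNT; i++) {
        if (key_code[i] == code) {
            if (n < cap) out[n] = (Key)i;
            n++;
        }
    }
    return n;
}
```

A single-result find would silently pick KEY_ENTER for code 13 here, which is the ambiguity being pointed out.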

data-man · 2022-04-13T16:33:49Z

who has enums with thousands of cases?

It's me. :)

enum with all colors from Wikipedia's lists of colors
enum with all Unicode codepoints

sirwhinesalot · 2022-04-13T16:43:03Z

Hahaha, well, guess there's always one case where it is needed. For the unicode codepoints, there are 16 bits worth of values per plane (and 17 planes), so 1,114,112 total possible codepoints. With either the C solution or this solution, the codepoint is technically the ordinal of the enum, no associated value is necessary. So the associated data would be empty and no array would be generated.

Colors though, there are only like... 7 colors, what is Wikipedia talking about... Alabaster? Man what are artists smoking these days. Assuming that the color is desired though, having associated RGB values is probably quite useful I'm guessing.

lerno · 2022-04-13T17:03:35Z

Another drawback that should be noted is API stability in the case it's stored in a struct:

enum Abc
{
   FOO,
   BAR
}
struct Foo
{
    Abc x;
}

However, one should also note that structs are not stable across versions if their alignment, size or field order changes either. API stability is an important aspect that perhaps requires its own separate issue.
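One way to read the drawback, sketched in C: if a later version of the library inserts a member, every ordinal after it shifts, so a struct written by the old version is misread by the new one. The V1/V2 enums, the BAZ member and the save/load helpers are hypothetical and only illustrate that reading.

```c
/* Version 1 of a hypothetical library. */
typedef enum { ABC_V1_FOO, ABC_V1_BAR } AbcV1;

/* Version 2 inserts BAZ in the middle: BAR's ordinal moves from 1 to 2. */
typedef enum { ABC_V2_FOO, ABC_V2_BAZ, ABC_V2_BAR } AbcV2;

/* Stand-in for a struct that stores the raw ordinal of the enum. */
typedef struct { int x; } Stored;

/* Written by code compiled against version 1. */
static Stored save_v1(AbcV1 v)
{
    Stored s = { (int)v };
    return s;
}

/* Read by code compiled against version 2: same bits, different meaning. */
static AbcV2 load_v2(Stored s)
{
    return (AbcV2)s.x;
}
```

Here a stored BAR comes back as BAZ. With explicit C-style values (BAR = 1 kept fixed), the new member could have been given an unused value instead.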

sirwhinesalot · 2022-04-13T17:33:17Z

API stability is definitely its own challenge. If your enum ordinals are always size_t sized then at least the struct remains the same size across versions, assuming you only store the ordinal (and you should only store the ordinal, the data is in a global array). So here I don't think storing them in structs causes any specific issues, beyond the standard issues with structs.

A bigger issue is the assumption that enums are total, i.e. that any switch statement that covers all enum cases doesn't need a default case. Rust has a special annotation (#[non_exhaustive]) that rules out this assumption, to allow for future-proofing.

lerno · 2022-04-13T18:38:31Z

@sirwhinesalot If ordinals are strictly ordered, I find that it makes sense to define them to be char by default, and only make it bigger if explicitly asked for. An alternative to annotation on the type is to have annotation on the switch. Both have advantages.

sirwhinesalot · 2022-04-13T19:58:34Z

I see the char thing as an optimization; it makes sense but is bad for future-proofing (as discussed). I have no strong opinion on the matter; dynamic library compatibility across versions is very hard unless you go fully dynamic.

For the switch, my preference is on the type, because if you make the annotation on the switch, it's easy to trigger a bug by passing a new enum case to an older version of a function that has no branch to deal with it. To avoid that you at the very least need to track in the function signature that it only deals with one specific version of the enum, not just on the switch itself. It gets messy fast and is not worth it.
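The failure mode described above can be sketched in C. The State enum with a later-added PAUSED member and both describe functions are hypothetical; the switch is done on an int so the example compiles without exhaustiveness warnings.

```c
#include <stddef.h>

/* PAUSED is assumed to have been added in a later version of the library. */
typedef enum { STATE_RUNNING, STATE_SHUTDOWN, STATE_PAUSED } State;

/* Written against the older two-member enum, where this switch was
   exhaustive and therefore had no default branch. */
static const char *describe_old(State s)
{
    switch ((int)s) {
        case STATE_RUNNING:  return "running";
        case STATE_SHUTDOWN: return "shutdown";
    }
    return NULL; /* only reached for members this function never knew about */
}

/* What a "not total" annotation on the type would force callers to write. */
static const char *describe_defensive(State s)
{
    switch ((int)s) {
        case STATE_RUNNING:  return "running";
        case STATE_SHUTDOWN: return "shutdown";
        default:             return "unknown";
    }
}
```

Passing STATE_PAUSED to describe_old falls out of the switch entirely; nothing at the call site hints that the function predates the new member.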

In a language with structural types, you could track in the function signature exactly which cases it expects, with an "..." to say that it can also accept arbitrary cases to some extent, similar to row-polymorphic variants in OCaml. But now you're in fancy language feature territory.

lerno · 2022-04-14T08:49:39Z

struct FooData
{
  int value;
  int offset;
  char* extra_name;
  double x;
}
enum Foo : FooData
{
    BAZ = { 0, 12, "hello", 3.0 },
    BOO = { 3, 33, "oekfe", 4.0 },
}

I'm just going to leave this alternative syntax here, which I'm not fond of. It has a single advantage: it would allow simple enums to have a built-in "byValue" function, which does a comparison on the value (something that would not apply to structs).

I also dislike this syntax because it does not seem to scale well and leaves the enum definition outside of the enum.

lerno · 2022-05-07T19:47:34Z

So at this point I've gone ahead with working on the Non-C enum. For the case where enums need to be stable, typed consts wrapped in a sub module are preferred:

// Old:
enum Errno : ErrnoType
{
  EPERM = 1,
  ENOENT = 2,
  ...
}
// New
module libc::errno;
define Errno = distinct ErrnoType;

const Errno EPERM = 1;
const Errno ENOENT = 2;
const Errno ESRCH = 3;

This seems to create a good distinction between grouped constants and enums in practice. As can be seen, there is still type safety, while at the same time there's API stability when using them.

Incidentally, enums becoming integral values means they are similar to fault, so long term they might be conflated with fault, allowing one to use:

enum Foo : anyerr
{
  ...
}

For optional return values, which just wasn't reasonable while enums were still C style.

lerno · 2022-05-07T19:51:42Z

The following rules / functions may possibly be available:

Foo f1 = (Foo)some_value; // Panic if out of range in safe, otherwise UB-ish
Foo! f2 = Foo.from(some_value); // Err if missing. Should possibly be written as a macro
int x = (int)f1; // Always ok
int y = f1.ordinal; // Same as the cast
int z = Foo.count; // Number of enums
Foo[*] foos = Foo.elements; // All enums in an array
// Alternatively call it Foo.all
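For comparison, a C sketch of what these rules amount to once enums are plain contiguous ordinals. It assumes Foo.from takes an ordinal and is the checked counterpart of the cast; foo_from, foo_elements and FOO_COUNT are illustrative names, not the C3 implementation.

```c
#include <stdbool.h>

/* FOO_COUNT plays the role of Foo.count; it is not a real member. */
typedef enum { FOO_BAZ, FOO_BOO, FOO_COUNT } Foo;

/* Plays the role of Foo.elements: all members, in ordinal order. */
static const Foo foo_elements[FOO_COUNT] = { FOO_BAZ, FOO_BOO };

/* Checked conversion: a single range test, since ordinals have no gaps.
   Returns false instead of producing an out-of-range Foo. */
static bool foo_from(int ordinal, Foo *out)
{
    if (ordinal < 0 || ordinal >= FOO_COUNT) return false;
    *out = (Foo)ordinal;
    return true;
}
```

The range check is the whole validation, which is what makes a safe-mode panic on (Foo)some_value cheap to implement.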

lerno · 2022-05-11T18:55:39Z

Implemented.

lerno closed this as completed May 11, 2022

lerno mentioned this issue Feb 13, 2024

enum with values / distinct const #1129

Open
