Non-C enums #428

Closed
lerno opened this issue Apr 12, 2022 · 11 comments

@lerno
Collaborator

lerno commented Apr 12, 2022

Currently enums use (and allow) C style enum value definition:

enum Foo
{
    BAR = 123,
    BAZ = 44,
}

With associated values, an enum may look like this:

enum Foo : int (int offset, char* extra_name, double x)
{
	BAZ(12, "hello", 3.0),
	BOO(33, "oekfe", 4.0) = 3,
}

Here we can do things like:

Foo f = (Foo)3;
int x = f.offset; // 33
puts(f.extra_name); // Prints "oekfe"
int ordinal = f.ordinal; // 1

Some things are a bit problematic: the associated values need to compile either to an array indexed by ordinal or to a function with a switch statement. Neither is particularly fast. It's also easy to get invalid values with non-ordinals.
The mapping between ordinal <=> value for enums with defined ordinal values needs to be either a hash map or a switch lookup, which again is a fairly unattractive solution speed-wise.
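
To make the cost concrete, here is a minimal C sketch of roughly what the C-style version with associated values would have to lower to. It is an illustration under assumptions, not what the compiler actually emits: FooAssoc, foo_assoc, foo_ordinal and foo_lookup are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* C-style enum with explicit, sparse values (mirrors BAZ and BOO above). */
typedef enum { FOO_BAZ = 0, FOO_BOO = 3 } Foo;

/* Hypothetical layout for the associated values, one row per ordinal. */
typedef struct {
    int offset;
    const char *extra_name;
    double x;
} FooAssoc;

static const FooAssoc foo_assoc[] = {
    { 12, "hello", 3.0 }, /* ordinal 0: BAZ */
    { 33, "oekfe", 4.0 }, /* ordinal 1: BOO */
};

/* value -> ordinal needs a switch (or a hash map) because the values are
   sparse. Returns -1 for values with no enum constant, e.g. (Foo)123. */
static int foo_ordinal(Foo f)
{
    switch ((int)f) {
        case FOO_BAZ: return 0;
        case FOO_BOO: return 1;
        default:      return -1;
    }
}

/* Every associated-value access pays for the switch before the array read. */
static const FooAssoc *foo_lookup(Foo f)
{
    int ordinal = foo_ordinal(f);
    if (ordinal < 0) return NULL;
    assert((size_t)ordinal < sizeof foo_assoc / sizeof foo_assoc[0]);
    return &foo_assoc[ordinal];
}
```

Note that the caller also has to deal with the NULL result for values like (Foo)123, which is the "invalid values with non-ordinals" problem.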

In the compiler too, care must be taken since something like (Foo)123 should be valid even if there is no such enum value defined. It can't be assumed to be incorrect under C semantics.

The alternative here is to leverage the associated values for non-C enums while retaining the ability to do much of what C offers:

For our example above, we would instead define this:

enum Foo (int value, int offset, char* extra_name, double x)
{
	BAZ(0, 12, "hello", 3.0),
	BOO(3, 33, "oekfe", 4.0),
}

As we see, the user-defined ordinal values are now one of the associated values. We can easily construct a macro that looks up an enum by value using linear search (more optimized versions can of course be made):

Foo f = @enum_find(Foo, value, 3); // Will return Foo.BOO

Even better we can allow it to return an optional:

Foo! f = @enum_find(Foo, value, 3); // Will return Foo.BOO

Which is much safer than just doing (Foo)some_int.
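
For comparison, a rough C sketch of the same idea, assuming dense ordinals that index a data table directly. The optional is approximated with a bool return plus an out parameter; foo_data, foo_value and foo_find_by_value are hypothetical names and not part of any real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordinals are dense and start at 0, so they index the table directly. */
typedef enum { FOO_BAZ, FOO_BOO, FOO_COUNT } Foo;

typedef struct {
    int value;              /* the user-defined "stable" value */
    int offset;
    const char *extra_name;
    double x;
} FooData;

static const FooData foo_data[FOO_COUNT] = {
    [FOO_BAZ] = { 0, 12, "hello", 3.0 },
    [FOO_BOO] = { 3, 33, "oekfe", 4.0 },
};

/* Associated-value access is a plain array read, no switch needed. */
static int foo_value(Foo f)
{
    assert((unsigned)f < FOO_COUNT);
    return foo_data[f].value;
}

/* Linear search over all members by the user-defined value.
   Returns false (the "empty optional") when no member matches;
   *out is only written on success. */
static bool foo_find_by_value(int value, Foo *out)
{
    for (int i = 0; i < FOO_COUNT; i++) {
        if (foo_data[i].value == value) {
            *out = (Foo)i;
            return true;
        }
    }
    return false;
}
```

The failure case is explicit here, which is the point of returning an optional instead of casting.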

This allows us a better distinction between serialized "stable" versions of an enum and the ordinal.

Actually writing searches over enums is easy thanks to foreach in C3:

foreach (Foo f : Foo.elements) { /* search */ }

The advantage here is also more solid semantics, with enums mapping directly to ordinals, enforcing more robust conversions and in general being more well defined.

The disadvantage is that it adds a little friction when converting from C.

Something like

typedef enum
{
   INVALID = -1,
   RUNNING = 0,
   SHUTDOWN = 1,
} State;

Would need to - if values are important - be converted to:

enum State (int code)
{
    INVALID(-1),
    RUNNING(0),
    SHUTDOWN(1),
}

And then use state.code when interfacing with C, rather than just state.

@sirwhinesalot

sirwhinesalot commented Apr 13, 2022

One very strong point of this solution is the foreach loop: it is efficient, since it just iterates over the arrays, and it lets enums be what they're supposed to be, enumerations. Efficiency in general is pretty good too: the indirection can be optimized out most of the time, and the additional space is irrelevant in the grand scheme of things (who has enums with thousands of cases? Even then it would be nothing). C interoperability is easy too, as mentioned.

This to me is the right way to implement enums, the C style is wrong, and the Rust/Swift style is even worse (they're tagged unions / sum types / variant types, not enumerations).

The find macro, which I personally find unnecessary, doesn't cause any issues and doesn't interfere with any other language features, which is nice. But my only note on it is that the return type of that macro should be an array or iterator of some kind, not a single value, since multiple members of the enumeration may share the same value in the same field.
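
A small C sketch of what a multi-result lookup could look like, assuming the caller supplies the output buffer. The Op enum and its precedence field are invented for illustration; they only serve to give two members the same field value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum where two members share the same associated value. */
typedef enum { OP_PLUS, OP_MINUS, OP_STAR, OP_COUNT } Op;

static const int op_precedence[OP_COUNT] = {
    [OP_PLUS]  = 1,
    [OP_MINUS] = 1,
    [OP_STAR]  = 2,
};

/* Writes up to `capacity` matching members into `out`, in ordinal order,
   and returns the total number of matches (which may exceed capacity). */
static size_t op_find_all_by_precedence(int precedence, Op *out, size_t capacity)
{
    assert(out != NULL || capacity == 0);
    size_t found = 0;
    for (int i = 0; i < OP_COUNT; i++) {
        if (op_precedence[i] != precedence) continue;
        if (found < capacity) out[found] = (Op)i;
        found++;
    }
    return found;
}
```

A buffer of OP_COUNT elements is always large enough, since no search can match more members than the enum has.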

@data-man
Contributor

who has enums with thousands of cases?

It's me. :)

@sirwhinesalot

Hahaha, well, guess there's always one case where it is needed. For the Unicode codepoints, there are 16 bits worth of values per plane (and 17 planes), so 1,114,112 total possible codepoints. With either the C solution or this solution, the codepoint is technically the ordinal of the enum, so no associated value is necessary. The associated data would be empty and no array would be generated.

Colors though, there are only like... 7 colors, what is Wikipedia talking about.... Alabaster? Man what are artists smoking these days. Assuming that the color is desired though, having associated RGB values is probably quite useful I'm guessing.

@lerno
Collaborator Author

lerno commented Apr 13, 2022

Another drawback that should be noted is API stability when the enum is stored in a struct:

enum Abc
{
   FOO,
   BAR
}
struct Foo
{
    Abc x;
}

However, one should also note that structs are not stable across versions either if their alignment, size or field order changes. API stability is an important aspect that perhaps deserves its own issue.

@sirwhinesalot

API stability is definitely its own challenge. If your enum ordinals are always size_t sized then at least the struct remains the same size across versions, assuming you only store the ordinal (and you should only store the ordinal, the data is in a global array). So here I don't think storing them in structs causes any specific issues, beyond the standard issues with structs.

A bigger issue is the assumption that enums are total, i.e. that any switch statement that covers all enum cases doesn't need a default case. Rust has a special annotation, #[non_exhaustive], that prevents this assumption and allows for future-proofing.
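
A minimal C sketch of the failure mode, assuming a function compiled against an older version of a library enum that later grows a case. State, Action and next_action are hypothetical names. The ordinal is taken as a plain int to model a value arriving from newer code.

```c
#include <assert.h>

/* Version 1 of a hypothetical library enum: only two cases exist. */
typedef enum { STATE_RUNNING, STATE_SHUTDOWN } State;

typedef enum { ACTION_CONTINUE, ACTION_STOP, ACTION_REJECT } Action;

/* Covers every case known in version 1. The default branch is what keeps
   this defined when a newer version passes an ordinal this code has never
   seen; treating the switch as total would leave that path unhandled. */
static Action next_action(int ordinal)
{
    assert(ordinal >= 0); /* ordinals are never negative */
    switch (ordinal) {
        case STATE_RUNNING:  return ACTION_CONTINUE;
        case STATE_SHUTDOWN: return ACTION_STOP;
        default:             return ACTION_REJECT; /* case added later */
    }
}
```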

@lerno
Collaborator Author

lerno commented Apr 13, 2022

@sirwhinesalot If ordinals are strictly ordered, I find that it makes sense to define them to be char by default, and only make it bigger if explicitly asked for. An alternative to annotation on the type is to have annotation on the switch. Both have advantages.

@sirwhinesalot

I see the char thing as an optimization, it makes sense but is bad for future-proofing (as discussed). I have no hard thoughts on the matter, dynamic library compatibility across versions is very hard unless you go fully dynamic.

For the switch, my preference is on the type, because if you make the annotation on the switch, it's easy to trigger a bug by passing a new enum case to an older version of a function that has no branch to deal with it. To avoid that you at the very least need to track in the function signature that it only deals with one specific version of the enum, not just on the switch itself. It gets messy fast and is not worth it.

In a language with structural types, you could track in the function signature exactly which cases it expects, with an "..." to say that it can also accept arbitrary cases to some extent, similar to row-polymorphic variants in OCaml. But now you're in fancy language feature territory.

@lerno
Collaborator Author

lerno commented Apr 14, 2022

struct FooData
{
  int value;
  int offset;
  char* extra_name;
  double x;
}
enum Foo : FooData
{
    BAZ = { 0, 12, "hello", 3.0 },
    BOO = { 3, 33, "oekfe", 4.0 },
}

I'm just going to leave this alternative syntax here, which I'm not fond of. It has a single advantage: it would allow simple enums to have a built-in "byValue" function, which does a comparison on the value (something that would not apply to structs).

I also dislike this syntax because it does not seem to scale well and leaves the enum definition outside of the enum.

@lerno
Collaborator Author

lerno commented May 7, 2022

So at this point I've gone ahead with working on the Non-C enum. For the case where enum values need to be stable, typed consts wrapped in a submodule are preferred:

// Old:
enum Errno : ErrnoType
{
  EPERM = 1,
  ENOENT = 2,
  ...
}
// New:
module libc::errno;
define Errno = distinct ErrnoType;

const Errno EPERM = 1;
const Errno ENOENT = 2;
const Errno ESRCH = 3;

This seems to create a good distinction between grouped constants and enums in practice. As can be seen, there is still type safety, while at the same time there's API stability when using them.

Incidentally, enums becoming integral values means they are similar to fault, so in the long term they might be conflated with fault, allowing one to use:

enum Foo : anyerr
{
  ...
}

For optional return values, which just wasn't reasonable while enums were still C style.

@lerno
Collaborator Author

lerno commented May 7, 2022

The following rules / functions may possibly be available:

Foo f1 = (Foo)some_value; // Panic if out of range in safe, otherwise UB-ish
Foo! f2 = Foo.from(some_value); // Err if missing. Should possibly be written as a macro
int x = (int)f1; // Always ok
int y = f1.ordinal; // Same as the cast
int z = Foo.count; // Number of enums
Foo[*] foos = Foo.elements; // All enums in an array
// Alternatively call it Foo.all
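
As a rough C analogue of these rules, assuming dense ordinals starting at 0: the sketch below is not the C3 implementation, and foo_cast, foo_from and foo_elements are names invented here. The assert stands in for the safe-mode panic and the bool return for the optional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* FOO_COUNT plays the role of Foo.count. */
typedef enum { FOO_BAZ, FOO_BOO, FOO_COUNT } Foo;

/* Plays the role of Foo.elements: every member, in ordinal order. */
static const Foo foo_elements[FOO_COUNT] = { FOO_BAZ, FOO_BOO };

/* Analogue of (Foo)some_value: traps on out-of-range input when asserts
   are enabled; with NDEBUG the result is simply an invalid Foo. */
static Foo foo_cast(int ordinal)
{
    assert(ordinal >= 0 && ordinal < FOO_COUNT);
    return (Foo)ordinal;
}

/* Analogue of Foo.from(some_value): reports failure instead of trapping.
   *out is only written on success. */
static bool foo_from(int ordinal, Foo *out)
{
    if (ordinal < 0 || ordinal >= FOO_COUNT) return false;
    *out = (Foo)ordinal;
    return true;
}
```

Because ordinals are dense, both conversions reduce to a single range check against the count.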

@lerno
Collaborator Author

lerno commented May 11, 2022

Implemented.


3 participants