Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymorphic Maybe [Feature Request] #65

Open
ghost opened this issue Oct 3, 2013 · 8 comments
Open

Polymorphic Maybe [Feature Request] #65

ghost opened this issue Oct 3, 2013 · 8 comments
Labels
enhancement LBNF Concerning the LBNF syntax and its checking

Comments

@ghost
Copy link

ghost commented Oct 3, 2013

To make something like this possible :

CU. CompilationUnit ::= [PackageDecl] {ImportDeclaration} {TypeDeclaration}

giving rise to

data CompilationUnit = CU [PackageDecl] (Maybe ImportDeclaration) (Maybe TypeDeclaration)
@gdetrez
Copy link
Contributor

gdetrez commented May 8, 2015

This came up again in the google group. One difficulty is to figure out how to best implement it in all target languages.

I'm not sure about C, C++ and C#.

@gdetrez
Copy link
Contributor

gdetrez commented Jul 9, 2015

I have two open questions before I can try to work on this:

What is the most idiomatic way to implement this in different languages?
I have an idea for Haskell, OCaml and Java but not for the C-like languages (C, C++ and C#).

What syntax to use?
If I understood correctly the example in #150, ebnf uses [...] but this already means list in lbnf. The comment above suggest {...} but for me it would be more natural to use a regex-like syntax like Foo? meaning 0 or 1 occurrence of Foo.

@forflo
Copy link
Contributor

forflo commented Aug 8, 2016

Maybe that's too simple, but for C/C++ you could just use null pointers. At least that's the approach IcarusVerilog's vhdlpp uses. Vhdlpp is a VHDL to Verilog transpiler which I'm currently using to parse VHDL...

An object for say

fooLabel . foo_baz ::= <"bla"> "foobar";

would look like

class foo_baz {/*...*/};
class fooLabel : foo_baz {
    symbol_ptr *m1; //might be NULL!
    symbol_ptr *m2;
};

This approach, however, has one obvious drawback: Another layer of indirection must be introduced in order to point to terminal symbols too.

But I think there is another, much more elegant solution of the problem!

Our current goal was to directly map rules containing optional parts onto an appropriate ADT or class hierarchy. But what if we'd use a transformation on the AST of the LBNF in order to expand, or better "desugar away" these optionals.

Simply put, the following LBNF

AllSuffix . Suffix ::= <"foobar"> "all";
NameSuffix . Suffix ::= Simple_name;

should be expanded (on AST level) to the desugared LBNF

AllSuffix . Suffix ::= "all";
AllSuffix_foo . Suffix ::= "foobar" "all";
NameSuffix . Suffix ::= Simple_name;

before any further processing.

This has the charming side effect, that none of the existing backends have to be touched.

A general algorithm for this expansion is very easy to see, if you consider a statements containing two or more optionals. For example an AllSuffix additionally could contain a begin delimiter and a end label.

AllSuffix . Suffix ::= <"?"> <"foobar"> "all" <"?">;

This desugars to every possible combination

AllSuffix000 . Suffix ::=              "all"    ;
AllSuffix001 . Suffix ::=              "all" "?";
AllSuffix010 . Suffix ::=     "foobar" "all"    ;
AllSuffix011 . Suffix ::=     "foobar" "all" "?";
AllSuffix100 . Suffix ::= "?"          "all"    ;
AllSuffix101 . Suffix ::= "?"          "all" "?";
AllSuffix110 . Suffix ::= "?" "foobar" "all"    ;
AllSuffix111 . Suffix ::= "?" "foobar" "all" "?";

The binary numbers are not really intended to go into an implementation and only serve as an analogy that each optional can be either present 1 or not 0.

@forflo
Copy link
Contributor

forflo commented Aug 8, 2016

The new approach to optionals could also fairly easy be extended to support nested optionals too. A snippet like

Foobar . FooFoo ::= < <Label> "-" <Label> ":" > "foobar

is quite relevant for practial use, because VHDL-2008's grammar makes heavy use of those constructs in EBNF.

This piece of grammar

block_header ::=
[ generic_clause [ generic_map_aspect ; ] ]
[ port_clause [ port_map_aspect ; ] ]

for instance was just copy pasted from the language's official reference manual.

@gdetrez
Copy link
Contributor

gdetrez commented Aug 9, 2016

Interesting idea @forflo. I realize now that what you want to do and what was suggested in this issue are actually two different features.

There is actually already a mechanism similar to what you're suggesting in BNFC in the rules macro. We could extend this macro to allow nested disjonctions, then you could write your example as

rules FooFoo = ( ( Label |) "-" ( Label |) ":" |) "foobar" ;

This has the advantage that it doesn't introduce new symbols in the language.

@forflo
Copy link
Contributor

forflo commented Aug 9, 2016

BNFC in the rules macro.

Exactly. I like your proposed syntax.

There are a number of drawbacks even with this solution.

  1. A rules macro containing n optional symbols would generate 2^n different rules whose labels also have to be distinct. Hence, we need a way to produce label prefixes that are meaningful.
  2. For each rule with more than one label, the C/C++ backend produces a subclass for each distinct label of the rule. Hence, the AST structure will be kind of bloated.

The second point is not really a drawback of the current approach since it is present already. If you manually expand optional grammar symbols, you have to face the exact same issue.

Let me clarify point one by means of a previous example.

AllSuffix . Suffix ::= <"?"> <"foobar"> "all" <"?">;

translates to your poposed syntax as follows

rules Suffix = ( "?" |) ( "foobar" |) "all" ( "?" |) ;

which should expand to

Suffix_opt000 . Suffix ::=              "all"    ;
Suffix_opt001 . Suffix ::=              "all" "?";
Suffix_opt010 . Suffix ::=     "foobar" "all"    ;
Suffix_opt011 . Suffix ::=     "foobar" "all" "?";
Suffix_opt100 . Suffix ::= "?"          "all"    ;
Suffix_opt101 . Suffix ::= "?"          "all" "?";
Suffix_opt110 . Suffix ::= "?" "foobar" "all"    ;
Suffix_opt111 . Suffix ::= "?" "foobar" "all" "?";

As an interpreter writer, you'd have to do a lot pattern matching on the different constructors of data Suffix, but I think that's more elegant than the need to handle Maybe Monads or NULL pointers, since - in a way - the parser does these decisions for you and only produces a slightly "richer" AST.

I really do think, that this is the most feasible and general solution for our problem since we (1) don't have to touch backends (2) don't have to introduce new symbols to LBNF. We'd only take the next logical step. That is, from macros helping with lists and operator precedence to macros helping with optionals.

Moreover, I don't think there really is a general, non-ugly method of mapping optionality onto type systems of all target languages. Also, I don't think that we need this kind of functionality, because there is at least one major language out there whose grammar does not serve well as the basis for later processing -- of course I'm talking about VHDL...

@dkasak
Copy link

dkasak commented Nov 20, 2017

I was planning on using BNFC and this feature would really make my life easier. Is this still planned?

From experience with tavor, its ?(category) syntax works really well and is easy to read (IMO) so I think a regex-like syntax would be best for this, i.e. (category)?.

Taking further inspiration from it, its repeated and permutation groups are really nice and work like this:

  • +n(category) specifies that category should repeat exactly n times in sequence, where n is a natural number
  • +n,m(category) is similar to the above, except it specifies that category should repeat between n and m times (inclusive)
  • @(cat1 | cat2 | cat3) specifies that cat1, cat2 and cat3 all have to occur in sequence, but in any order

These probably aren't in the scope of this particular issue, but I think they are nice potential additions to consider.

@andreasabel andreasabel added the LBNF Concerning the LBNF syntax and its checking label Feb 9, 2023
@ScottFreeCode
Copy link

Late, but I want to mention as far as Maybe types in the backends:

  • Modern C++ has Optional. I remember it being added around a decade ago.
  • For C, the simplest solution is a pointer for its nullability, and this is idiomatic as far as I know.

It's possible to implement an Optional in C without pointers or dynamic allocation (and I think the C++ Optional template does this under the hood), but it gets complicated and questionable pretty quickly:

  • You need a struct, with a boolean for whether it's present or Nothing, and with its other field being the actual type.
  • Either the actual type "inside" the Optional needs to have a default value it can be initialized to when the Optional starts as Nothing, or you have to be careful about it being uninitialized in that case, or it has to be safely initialized to all zero bytes if that's possible, or something. (You could make it a union of the actual type and some irrelevant type with no meaningful value, though if I'm not mistaken you still need the boolean to track which of those types the union currently actually is, so I'm not sure that actually helps anything unless all the other options are invalid.) Screw this up and you get undefined behavior and the entire program could behave unexpectedly under optimization.
  • C has no templates or generics, you can't write straight code that defines or can use the Optional type regardless of the actual type "inside" it. Either you'll need to duplicate the code that deals with Optional (maybe that's fine if it's generated as a backend!) or the only way I know of to deduplicate the code is to squish it into C's text-munging macros, which can be messy and hard to work with.

Compared to all those issues, the only drawback to using a pointer instead (which is a natural, built-in feature of the language), is that the memory for it will usually need to be dynamically allocated rather than stack allocated.

Of course, when you're parsing text input to generate a data structure, dynamic allocation is probably the norm anyway… I guess the one drawback really is that in some cases a null pointer would be an error and in others it would be a valid state meaning no value, and the type system won't tell you which; but C programmers have to get used to that in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement LBNF Concerning the LBNF syntax and its checking
Projects
None yet
Development

No branches or pull requests

5 participants