Templates/memory views: Resolve parser ambiguities at a later stage #904

robertwb · 2009-06-30T08:01:53Z

Especially if we want to introduce templates, the scheme below should be used to resolve a syntax ambiguity. This holds whether `[]` or `()` is selected for templates:

 * `A[B]` can (in type context) mean either a C array of size `B`, or a template with `B` as argument if `[]` is chosen.
 * `A(B)` can (in type context) mean either an unnamed C function returning type `A` and taking an argument of type `B` (yes, really!), or a template with `B` as argument if `()` is chosen.

Both of these are only a problem where the declarator name can be dropped, i.e. inside `sizeof` or in `cdef extern` function arguments.
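A tiny sketch that only enumerates the readings listed above for an unnamed declarator. It shows that whichever bracket templates end up using, that bracket form gains a second meaning. The function `readings` and its string results are illustrative assumptions, not anything in Cython.

```python
def readings(base, bracket, arg, template_bracket):
    """List the possible meanings of an unnamed type-context declarator.

    base, arg         -- the names A and B in A[B] or A(B)
    bracket           -- the bracket actually written: "[" or "("
    template_bracket  -- the bracket chosen for template syntax: "[" or "("
    """
    if bracket == "[":
        # Existing C meaning: an unnamed array.
        meanings = [f"C array of {base} with size {arg}"]
    elif bracket == "(":
        # Existing C meaning: an unnamed function type.
        meanings = [f"C function returning {base} taking {arg}"]
    else:
        raise ValueError(f"unsupported bracket {bracket!r}")
    if bracket == template_bracket:
        # Template syntax reuses this bracket, so a second reading appears.
        meanings.append(f"template {base} with argument {arg}")
    return meanings
```

For either choice of `template_bracket`, exactly one of the two forms returns two meanings.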

Extract from a conversation (Dag to Kurt):

`SomeName[OtherName]` is actually *not* ambiguous; it only looks ambiguous to the parser! Later on, `SomeName` can be resolved, and it will then be known whether `SomeName` is a Cython type (=> buffer) or a struct/typedef/C type (=> C array without a name).

So:

a) Forget about deciding this at parse time. Instead, parse into a much rawer `BracketTypeNode` (containing `base_type` and `axes`) and defer the decision to Cython's declaration analysis phase, where `base_type` can be analysed before the axes and therefore tells us what to do with them.

b) However, this requires that the axes are also parsed without making too many assumptions, which is potentially hard. It calls for a new parser method alongside `p_expr` and `p_c_declarator` that parses something which can be "either an expression or a declarator", i.e. `p_expr_or_c_declarator` (covering only the `empty=True` case of `p_c_declarator`).
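A minimal sketch of idea (a), assuming a toy symbol table that maps each name to `"cython"` (a Cython/Python object type) or `"c"` (a struct, typedef or other C type). `BracketTypeNode` reuses the name from the proposal, but `analyse`, `BufferType`, `CArrayType`, the symbol-table layout and the example names are hypothetical, not Cython's real classes. The point is only the ordering: the base type is resolved first, and its kind decides how the raw axes are read.

```python
from dataclasses import dataclass


@dataclass
class BufferType:
    base: str        # Cython/Python object type
    options: list    # bracket contents read as buffer options


@dataclass
class CArrayType:
    base: str        # struct/typedef/C type
    size: str        # bracket contents read as the array size


@dataclass
class BracketTypeNode:
    base_type: str   # the name before the brackets, e.g. "SomeName"
    axes: list       # raw bracket contents, left uninterpreted by the parser

    def analyse(self, symbols):
        # Resolve the base type first; its kind decides what the axes mean.
        kind = symbols.get(self.base_type)
        if kind is None:
            raise NameError(f"unknown type {self.base_type!r}")
        if kind == "cython":
            return BufferType(self.base_type, list(self.axes))
        if kind == "c":
            if len(self.axes) != 1:
                raise TypeError("a C array declarator takes exactly one size")
            return CArrayType(self.base_type, self.axes[0])
        raise ValueError(f"unexpected kind {kind!r} for {self.base_type!r}")
```

With `symbols = {"ndarray": "cython", "mystruct": "c"}`, the same node shape yields a buffer for `ndarray[double, ndim=2]` and an unnamed C array for `mystruct[10]`.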

Now:
 - Some things must be type declarations, like `a*`, `(a*)()`, `unsigned int`.
 - Some things must be expressions, like `a+b`, `a::b`, etc.
 - Some things are ambiguous:
    - `somename` can of course be either;
    - `a(b)` can be either a function call or a declaration like this:

```cython
# takes a function returning a and taking b as argument:
cdef extern foo(a(b))
# if giving the argument a name, it is written like this:
cdef extern foo(a(argname)(b))
# weird stuff...
```
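A minimal sketch of the neutral parsing step from (b), restricted to a toy grammar: names, parentheses, calls `a(b)`, subscripts `A[B]`, a trailing `*`, and the binary operators `+` and `*`. Multi-word types such as `unsigned int` and the `a::b` form are not handled. The method is named after the proposed `p_expr_or_c_declarator`, but the `Parser` class and the tuple node format are hypothetical and unrelated to Cython's real parser. Note that `a(b)` and `a(argname)(b)` both come out as plain `"call"` nodes; whether they are calls or function declarators is left open.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")


def is_name(tok):
    return tok is not None and NAME.fullmatch(tok) is not None


class Parser:
    """Toy recursive-descent parser that builds a neutral (unresolved) tree."""

    def __init__(self, text):
        self.toks = TOKEN.findall(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.toks[i] if i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def p_expr_or_c_declarator(self):
        # Binary operators only occur in expressions, but they are merely
        # recorded here; expression vs. type is decided in a later phase.
        node = self.p_postfix()
        while self.peek() in ("+", "*"):
            op = self.take()
            node = ("binop", op, node, self.p_postfix())
        return node

    def p_postfix(self):
        node = self.p_primary()
        while True:
            tok = self.peek()
            if tok in ("(", "["):
                # "a(b)": call or function declarator; "A[B]": index or array.
                close = ")" if tok == "(" else "]"
                self.take()
                args = self.p_items(close)
                self.take(close)
                node = ("call" if tok == "(" else "index", node, args)
            elif tok == "*" and not (self.peek(1) == "(" or is_name(self.peek(1))):
                # A '*' with no right operand can only be a pointer declarator.
                self.take()
                node = ("ptr", node)
            else:
                return node

    def p_items(self, close):
        items = []
        if self.peek() != close:
            items.append(self.p_expr_or_c_declarator())
            while self.peek() == ",":
                self.take()
                items.append(self.p_expr_or_c_declarator())
        return items

    def p_primary(self):
        tok = self.peek()
        if tok == "(":
            self.take()
            node = self.p_expr_or_c_declarator()
            self.take(")")
            return node
        if is_name(tok):
            self.take()
            return ("name", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    node = parser.p_expr_or_c_declarator()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return node
```

For example, `parse("(a*)()")` gives `("call", ("ptr", ("name", "a")), [])`, while `parse("a*b")` gives a `"binop"` node.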

So the strategy would be to have `p_expr_or_c_declarator` return an "unresolved" parse tree (say, an `ExprOrTypeNode`). Afterwards, once it is known what to expect, one would call either `analyse_as_expr` or `analyse_as_type` on the tree. If the tree then contained something that could only be interpreted as an expression and `analyse_as_type` was called, an error would be raised at that point.
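A minimal sketch of this strategy. It reuses the names `ExprOrTypeNode`, `analyse_as_expr` and `analyse_as_type` from the proposal, but the tuple tree format, the string results and `AnalysisError` are hypothetical; a real compiler would return type and expression nodes rather than strings and would analyse against a scope.

```python
class AnalysisError(Exception):
    """Raised when a tree cannot be read in the requested context."""


class ExprOrTypeNode:
    """Unresolved parse tree, interpreted only once the context is known.

    Trees are nested tuples: ("name", x), ("ptr", t), ("call", f, [args]),
    ("binop", op, lhs, rhs).
    """

    def __init__(self, tree):
        self.tree = tree

    def analyse_as_type(self):
        return self._as_type(self.tree)

    def analyse_as_expr(self):
        return self._as_expr(self.tree)

    def _as_type(self, node):
        tag = node[0]
        if tag == "name":
            return node[1]
        if tag == "ptr":
            return self._as_type(node[1]) + "*"
        if tag == "call":
            # Type context: an unnamed function returning node[1], taking node[2].
            args = ", ".join(self._as_type(arg) for arg in node[2])
            return f"function({args}) -> {self._as_type(node[1])}"
        raise AnalysisError(f"{tag!r} can only be interpreted as an expression")

    def _as_expr(self, node):
        tag = node[0]
        if tag == "name":
            return node[1]
        if tag == "call":
            args = ", ".join(self._as_expr(arg) for arg in node[2])
            return f"{self._as_expr(node[1])}({args})"
        if tag == "binop":
            return f"({self._as_expr(node[2])} {node[1]} {self._as_expr(node[3])})"
        raise AnalysisError(f"{tag!r} can only be interpreted as a type")
```

The same `a(b)` tree reads as `"function(b) -> a"` in type context and as `"a(b)"` in expression context, while a `"binop"` tree raises `AnalysisError` only when it is analysed as a type.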

This seems like quite a big task, and I'm unsure about spending time on it. But the result is much more "correct", in that the parser no longer makes decisions it cannot actually make. It also helps move logic out of the parser in general. What do you think?

Migrated from http://trac.cython.org/ticket/342


robertwb · 2009-06-30T08:02:43Z

@dagss changed the description.

robertwb · 2009-06-30T08:03:54Z

@dagss changed the description.

robertwb · 2009-07-02T05:09:29Z

scoder commented

Why a new node type? Isn't IndexNode enough to deal with this in the parser? There would then be a transform after (early?) type analysis that would replace it with the right implementation node depending on the object it operates on.
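scoder's alternative could look roughly like this: the parser always emits an indexing node, and a pass after type analysis swaps it for a type-context node once the base has been resolved. The `IndexNode` here is a toy stand-in, not Cython's class, and `TypeIndexNode`, `replace_type_indexing` and the flat list used in place of a real syntax tree are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    base: str    # name being subscripted
    index: str   # raw bracket contents


@dataclass
class TypeIndexNode:
    base_type: str   # base that analysis resolved to a type
    arg: str         # bracket contents, now read in type context


def replace_type_indexing(nodes, type_names):
    """Replace IndexNodes whose base is a known type; keep everything else.

    `nodes` is a flat list standing in for the syntax tree, and `type_names`
    is the set of names that (early) type analysis resolved to types.
    """
    return [
        TypeIndexNode(node.base, node.index)
        if isinstance(node, IndexNode) and node.base in type_names
        else node
        for node in nodes
    ]
```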

robertwb · 2009-07-05T14:10:07Z

@dagss commented

`IndexNode` belongs to an expression context; this is a type context. While reusing it is technically possible, `IndexNode` inherits from `ExprNode`, which `BracketTypeNode` definitely wouldn't.

robertwb · 2010-02-04T06:46:49Z

@robertwb changed component from Parsing to C++
milestone from wishlist to 0.13
owner from somebody to robertwb

robertwb · 2010-02-04T06:47:42Z

@robertwb changed resolution to fixed
status from new to closed

I believe I resolved this when I wrote support for declaring templated C++ types.

robertwb closed this as completed Feb 4, 2010

robertwb added C++ defect labels Aug 16, 2016

robertwb added this to the 0.13 milestone Aug 16, 2016
