Templates/memory views: Resolve parser ambiguities at a later stage #904

Closed
robertwb opened this issue Jun 30, 2009 · 6 comments
@robertwb
Contributor

Especially if we want to introduce templates, the scheme below should be used to resolve a syntax ambiguity. This holds whether [] or () is selected:

  • A[B] can (in type context) mean either a C array of size B, or a template with B as argument if [] is chosen.
  • A(B) can (in type context) mean either an unnamed C function returning type A and taking an argument of type B (yes, really!), or a template with B as argument if () is chosen.

Both of these are only a problem where the declarator name can be dropped though, i.e. inside sizeof or for cdef extern function arguments.

Extract from conversation from Dag to Kurt:

SomeName[OtherName] is actually not ambiguous, it's just that it is ambiguous in the parser! Later on, SomeName can be resolved, and it will be known whether SomeName is a Cython type (=>buffer) or a struct/typedef/C type (=> C array without name).

So:

a) Forget about deciding this at parse time. Instead parse to a much rawer "BracketTypeNode" (containing base_type and axes), and leave the decision until Cython's declaration analysis phase (where the base_type can be analysed before axes, so base_type will tell what needs to be done with axes).

b) However, this requires that the axes are also parsed without making too many assumptions -- which is potentially hard. This calls for a method, in addition to p_expr and p_c_declarator, that parses something which can be "either an expression or a declarator", i.e. p_expr_or_c_declarator (with only the empty=True case for p_c_declarator).
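As a rough illustration of point a), the deferred decision could look something like the sketch below. It is not Cython's actual implementation: the `scope` dictionary, the `CYTHON_TYPE`/`C_TYPE` tags, the tuple results, and the function name `analyse_bracket_type` are all assumptions made for the example; only the `BracketTypeNode` name and its `base_type`/`axes` fields come from the text above. The axes are kept as plain strings here, which sidesteps the harder problem raised in b).

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tags for what a name resolves to during declaration analysis.
CYTHON_TYPE = "cython_type"   # a Cython-level type (=> buffer)
C_TYPE = "c_type"             # a struct/typedef/C type (=> C array without name)


@dataclass
class BracketTypeNode:
    """Raw parse result for `base[axes]` in type context; no decision made yet."""
    base_type: str    # name exactly as written in the source
    axes: List[str]   # bracket contents, left uninterpreted by the parser


def analyse_bracket_type(node: BracketTypeNode, scope: Dict[str, str]):
    """Resolve base_type first, then let it decide what the axes mean."""
    kind = scope.get(node.base_type)
    if kind == CYTHON_TYPE:
        return ("buffer", node.base_type, tuple(node.axes))
    if kind == C_TYPE:
        if len(node.axes) != 1:
            raise TypeError("a C array declarator takes exactly one size")
        return ("c_array", node.base_type, node.axes[0])
    raise NameError("unknown type name: %r" % node.base_type)


# Hypothetical scope contents, for illustration only.
scope = {"SomeBufferType": CYTHON_TYPE, "some_struct": C_TYPE}
```

The same bracket syntax yields a different result depending only on what the base name resolves to, which is why the parser itself does not need to choose.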

Now:

  • Some things must be type declarations -- like "a*", "(a*)()", "unsigned int".
  • Some things must be expressions -- like "a+b", "a::b" etc.
  • Some things are ambiguous:
    • "somename" can of course be either
    • "a(b)" can either be a function call, or a declaration like this:
# takes a function returning a and taking b as argument:
cdef extern foo(a(b))
# If giving the argument a name, it is written like this:
cdef extern foo(a(argname)(b))
# weird stuff...

So the strategy would be to have p_expr_or_c_declarator return a parse tree which was "unresolved" (like, ExprOrTypeNode). And then one could afterwards call either analyse_as_expr or analyse_as_type on the tree (when one knew what to expect). If the tree then e.g. contained something which could only be interpreted as an expression, and one called analyse_as_type, an error would be raised at that point.
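A minimal sketch of that idea follows, assuming a toy tree with four node kinds. The names `ExprOrTypeNode`, `analyse_as_expr` and `analyse_as_type` are taken from the proposal; the `kind` field, the string results and the `AnalysisError` exception are invented for illustration and do not reflect Cython's real node classes.

```python
from dataclasses import dataclass, field
from typing import List


class AnalysisError(Exception):
    """Raised when an unresolved tree cannot be read in the requested way."""


@dataclass
class ExprOrTypeNode:
    # kind is one of "name", "call", "pointer", "binop" (hypothetical set)
    kind: str
    name: str = ""   # identifier for "name", operator for "binop"
    children: List["ExprOrTypeNode"] = field(default_factory=list)

    def analyse_as_expr(self) -> str:
        if self.kind == "name":
            return self.name
        if self.kind == "call":          # a(b) read as a function call
            callee, *args = self.children
            return "%s(%s)" % (callee.analyse_as_expr(),
                               ", ".join(a.analyse_as_expr() for a in args))
        if self.kind == "binop":         # a+b is always an expression
            left, right = self.children
            return "(%s %s %s)" % (left.analyse_as_expr(), self.name,
                                   right.analyse_as_expr())
        raise AnalysisError("%s cannot be an expression" % self.kind)

    def analyse_as_type(self) -> str:
        if self.kind == "name":
            return self.name
        if self.kind == "pointer":       # a* is always a type
            return self.children[0].analyse_as_type() + "*"
        if self.kind == "call":          # a(b) read as an unnamed function type
            ret, *args = self.children
            return "function(%s) -> %s" % (
                ", ".join(a.analyse_as_type() for a in args),
                ret.analyse_as_type())
        raise AnalysisError("%s cannot be a type" % self.kind)


def name(n: str) -> ExprOrTypeNode:
    return ExprOrTypeNode("name", n)


# The ambiguous "a(b)" from the list above, parsed once and read both ways.
a_of_b = ExprOrTypeNode("call", children=[name("a"), name("b")])
a_plus_b = ExprOrTypeNode("binop", "+", [name("a"), name("b")])
```

Calling `a_plus_b.analyse_as_type()` raises `AnalysisError`, which corresponds to the "error would be raised at that point" behaviour described above, while `a_of_b` is accepted by both methods.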

This seems like quite a big task which I'm unsure about spending time on. But the result is much more "correct", in that the parser doesn't make decisions it really can't make. It also helps move logic out of the parser in general. What do you think?

Migrated from http://trac.cython.org/ticket/342

@robertwb
Contributor Author

@dagss changed description (wiki markup only: list bullets, code formatting and bracket escaping; the wording was unchanged and at this point still listed "d()" (?) among the things that must be expressions)

@robertwb
Contributor Author

@dagss changed description (markup changes, plus dropping "d()" (?) from the list of things that must be expressions)

@robertwb
Contributor Author

robertwb commented Jul 2, 2009

scoder commented

Why a new node type? Isn't IndexNode enough to deal with this in the parser? There would then be a transform after (early?) type analysis that would replace it with the right implementation node depending on the object it operates on.

@robertwb
Contributor Author

robertwb commented Jul 5, 2009

@dagss commented

IndexNode is in an expression context, this is in type context. While technically possible, IndexNode inherits from ExprNode, which BracketTypeNode definitely wouldn't.

@robertwb
Contributor Author

robertwb commented Feb 4, 2010

@robertwb changed component from Parsing to C++
milestone from wishlist to 0.13
owner from somebody to robertwb
commented

@robertwb
Contributor Author

robertwb commented Feb 4, 2010

@robertwb changed resolution to fixed
status from new to closed
commented

I believe I resolved this when I wrote support for declaring templated C++ types.

@robertwb robertwb closed this as completed Feb 4, 2010
@robertwb robertwb added this to the 0.13 milestone Aug 16, 2016