Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Using functor to store position information #176

Closed
Centril opened this issue Apr 18, 2016 · 10 comments
Closed

Question: Using functor to store position information #176

Centril opened this issue Apr 18, 2016 · 10 comments
Assignees
Labels
enhancement Haskell position Concerning position information in parsed AST
Milestone

Comments

@Centril
Copy link

Centril commented Apr 18, 2016

According to bnfc --help:
--functor Make the AST a functor and use it to store the position of the nodes

How would you go about storing the position of the nodes?
The type that

parseProgram = pProgram . myLexer

yields is:

parseProgram :: String -> Err (Program ())

I.e: no position information is added... So does that mean that you have to use something else than happy, or can you get it to add that position information for you?

@pascalh
Copy link
Contributor

pascalh commented Apr 29, 2016

If I read the bnfc sources correctly source positions are not supported by the happy backend of bnfc. Actually the lexer generated by alex yields a list of tokens which are annotated with their source positions, but the happy parser ignores them.

@gdetrez
Copy link
Contributor

gdetrez commented Apr 29, 2016

@pascalh is correct, the help comment describe the intended behavior but I still need to find time to complete the feature. The functor part works but not the position part...

Sorry for the inconsistent documentation 😕.

@gdetrez gdetrez modified the milestone: 2.9 Sep 7, 2016
@Centril
Copy link
Author

Centril commented Nov 1, 2016

Any updates on this @gdetrez ?

@gdetrez
Copy link
Contributor

gdetrez commented Nov 2, 2016

@Centril partly: I have done some work on this but I was interrupted by other things...

I'll try to rebase and push what I already have this week and see if I can finish it soon-ish. Would you be able to provide some feedback?

@gdetrez gdetrez self-assigned this Nov 2, 2016
@Centril
Copy link
Author

Centril commented Nov 3, 2016

@gdetrez Sure, what type of feedback do you want? Peer review or testing (or both)?

Just give me a link to the commit at which point to test it

gdetrez added a commit that referenced this issue Nov 4, 2016
When using bnfc to generate haskell code with the --functor option, the
parser will no store the begining position of the corresponding code in
each node. This might be useful, for instance, to report errors.

Note that this is possibly a larger change than it had to be because I
tried something new: instead of generating code by putting together
strings (or the slightly fancier PrettyPrint.Doc), I refactored the code
that makes happy production to generate an AST which is then
pretty-printed. Both the AST types and the pretty printer being of
course generated by BNFC itself! This seems like an obvious thing to do
but for some reason we never did.

The main advantage is that it makes much better use of the type
checker (whene everything is a string, it doesn't mean much that the
type checking passes...).

In addition, you get much cleaner code by getting rid of all the
separators, keywords, symbols and parentheses that are now inserted by the
pretty printer.
mlazowik pushed a commit to mlazowik/bnfc that referenced this issue Dec 7, 2017
When using bnfc to generate haskell code with the --functor option, the
parser will no store the begining position of the corresponding code in
each node. This might be useful, for instance, to report errors.

Note that this is possibly a larger change than it had to be because I
tried something new: instead of generating code by putting together
strings (or the slightly fancier PrettyPrint.Doc), I refactored the code
that makes happy production to generate an AST which is then
pretty-printed. Both the AST types and the pretty printer being of
course generated by BNFC itself! This seems like an obvious thing to do
but for some reason we never did.

The main advantage is that it makes much better use of the type
checker (whene everything is a string, it doesn't mean much that the
type checking passes...).

In addition, you get much cleaner code by getting rid of all the
separators, keywords, symbols and parentheses that are now inserted by the
pretty printer.
@woehr
Copy link

woehr commented Apr 23, 2018

Is there any chance of getting this merged? Can I help somehow?

Wolf480pl pushed a commit to Wolf480pl/bnfc that referenced this issue Dec 7, 2018
When using bnfc to generate haskell code with the --functor option, the
parser will no store the begining position of the corresponding code in
each node. This might be useful, for instance, to report errors.

Note that this is possibly a larger change than it had to be because I
tried something new: instead of generating code by putting together
strings (or the slightly fancier PrettyPrint.Doc), I refactored the code
that makes happy production to generate an AST which is then
pretty-printed. Both the AST types and the pretty printer being of
course generated by BNFC itself! This seems like an obvious thing to do
but for some reason we never did.

The main advantage is that it makes much better use of the type
checker (whene everything is a string, it doesn't mean much that the
type checking passes...).

In addition, you get much cleaner code by getting rid of all the
separators, keywords, symbols and parentheses that are now inserted by the
pretty printer.
Wesmania pushed a commit to Wesmania/bnfc that referenced this issue Dec 31, 2018
When using bnfc to generate haskell code with the --functor option, the
parser will no store the begining position of the corresponding code in
each node. This might be useful, for instance, to report errors.

Note that this is possibly a larger change than it had to be because I
tried something new: instead of generating code by putting together
strings (or the slightly fancier PrettyPrint.Doc), I refactored the code
that makes happy production to generate an AST which is then
pretty-printed. Both the AST types and the pretty printer being of
course generated by BNFC itself! This seems like an obvious thing to do
but for some reason we never did.

The main advantage is that it makes much better use of the type
checker (whene everything is a string, it doesn't mean much that the
type checking passes...).

In addition, you get much cleaner code by getting rid of all the
separators, keywords, symbols and parentheses that are now inserted by the
pretty printer.
Wesmania pushed a commit to Wesmania/bnfc that referenced this issue Dec 31, 2018
When using bnfc to generate haskell code with the --functor option, the
parser will no store the begining position of the corresponding code in
each node. This might be useful, for instance, to report errors.

Note that this is possibly a larger change than it had to be because I
tried something new: instead of generating code by putting together
strings (or the slightly fancier PrettyPrint.Doc), I refactored the code
that makes happy production to generate an AST which is then
pretty-printed. Both the AST types and the pretty printer being of
course generated by BNFC itself! This seems like an obvious thing to do
but for some reason we never did.

The main advantage is that it makes much better use of the type
checker (whene everything is a string, it doesn't mean much that the
type checking passes...).

In addition, you get much cleaner code by getting rid of all the
separators, keywords, symbols and parentheses that are now inserted by the
pretty printer.
@JKTKops
Copy link

JKTKops commented Oct 15, 2019

Any news on this? This would be a very useful feature!

@andreasabel
Copy link
Member

Seems like this feature sits on an unmerged branch https://github.com/BNFC/bnfc/compare/176-source-position which my predecessor left behind. After three years of course it does not merge cleanly any more. I don't know if there was a reason this was not finished.

I'd be happy for someone to pick it up and integrate it with the current source of BNFC.

@andreasabel andreasabel added position Concerning position information in parsed AST enhancement and removed bug labels Jan 4, 2020
@andreasabel andreasabel modified the milestones: 2.9, 3.0 Nov 25, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 4, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 5, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 7, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 10, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 10, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 10, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 13, 2020
Commelina added a commit to Commelina/bnfc that referenced this issue Dec 15, 2020
andreasabel pushed a commit that referenced this issue Jan 12, 2021
This is PR #327.

Effects and TODO (Andreas):
#327 (comment)

We can now look at the user experience (UX).  I tried `--functor` on a tiny grammar for Hutton's razor:
```
EPlus. Exp  ::= Exp "+" Exp1;
EInt.  Exp1 ::= Integer;
```
Then I looked at the products to discover how to use the new feature.

- `Test.hs` is unchanged w.r.t. the version without `--functor`.  This is good.  Yet it did not give me any hints how to use positions.

- `Abs.hs` places an additional parametric value at each AST constructor
  ```haskell
  data Exp a = EInt a Integer | EPlus a (Exp a) (Exp a)
    deriving (C.Eq, C.Ord, C.Show, C.Read)

  instance C.Functor Exp where
  ```
  I think the `Functor` instance can and should now be automatically generated by GHC using the `DeriveFunctor` extension. Also, `Foldable` and `Traversable` can be derived.
  The file `Abs.hs` should give me some clues about the positions that the parser delivers.  I'd favor something along
  ```haskell
  data Position = Position { startLine :: Int, startColumn :: Int }
  data Exp' a = EInt a Integer | EPlus a (Exp' a) (Exp' a) ...
  type Exp = Exp' Position
  ```

- `Par.y`:
  ```haskell
  %name pExp_internal Exp
  %name pExp1_internal Exp1
  ...
  %%

  Integer :: { (Maybe (Int, Int), Integer) }
  Integer  : L_integ  { (Just (tokenLineCol $1), (read ((tokenText $1))) :: Integer) }

  Exp :: { (Maybe (Int, Int),  (Razor.Abs.Exp (Maybe (Int, Int))) ) }
  Exp : Exp '+' Exp1 { (fst $1, Razor.Abs.EPlus (fst $1) (snd $1) (snd $3)) }

  Exp1 :: { (Maybe (Int, Int),  Razor.Abs.Exp (Maybe (Int, Int)) ) }
  Exp1 : Integer { (fst $1, Razor.Abs.EInt (fst $1) (snd $1)) }
  {
  ...
  myLexer = tokens
  pExp = (>>= return . snd) . pExp_internal
  pExp1 = (>>= return . snd) . pExp1_internal
  }
  ```

Issues with the generated parser:

- Concerning the last lines: The simpler code is
  ```haskell
  pExp = fmap snd . pExp_internal
  ```

- `pExp` etc. should have a type signature

- There should be a comment to point out the entrypoints at the end of the file (for better UX).

- (Minor issue, not sure: `pExp_internal`:  better name scheme would be `pExpWithPosition` etc.)

- The `Maybe (Int, Int)` shouldn't be duplicated everywhere.  See comment on `Abs.hs` how to factor out a type of `Position` and types for AST trees with position information.

- Can we get `Position` instead of `Maybe Position`?  I see the `Nothing` comes from the empty list case, but maybe there we could return the position of the following token?
@andreasabel andreasabel modified the milestones: 3.0, 2.9.1 Jan 21, 2021
@andreasabel andreasabel assigned andreasabel and unassigned gdetrez Jan 21, 2021
@andreasabel
Copy link
Member

This has been implemented by @Commelina in #327 and will be released with 2.9.1.

andreasabel added a commit that referenced this issue Jan 21, 2021
Functor as well as Foldable and Traversable can be derived using the
proper LANGUAGE extensions.
andreasabel added a commit that referenced this issue Jan 21, 2021
Also: parameterized version of AST is primed, unprimed is
position-carrying:

  type Exp = Exp' BNFC'Position
  data Exp' a = EPlus (Exp' a) (Exp' a) ...
andreasabel added a commit that referenced this issue Jan 21, 2021
Also abbreviate 'Either String' to 'Err' in generated happy file.
andreasabel added a commit that referenced this issue Jan 21, 2021
If one runs `bnfc --functor` on two grammars, one does not want two
different (yet isomorophic) definitions of the position
type (generativity problem).

As type synonyms expand to their definition, it is better to use
Maybe(Int,Int) as position type, and only define a synonym.
andreasabel added a commit that referenced this issue Jan 22, 2021
whenever BNFC'Position is defined.

Fix-up to d90e876
andreasabel added a commit that referenced this issue Jan 22, 2021
Input is not always 'String', can also be 'Text' or 'ByteString'.

Fix-up 7f21679.
@andreasabel
Copy link
Member

Released in 2.9.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Haskell position Concerning position information in parsed AST
Projects
None yet
Development

No branches or pull requests

6 participants