Skip to content

feat: add greedy newline tokenizer#25

Merged
bcoe merged 7 commits into
mainfrom
@bycedric/feat/newline
Jan 7, 2021
Merged

feat: add greedy newline tokenizer#25
bcoe merged 7 commits into
mainfrom
@bycedric/feat/newline

Conversation

@byCedric
Copy link
Copy Markdown
Contributor

@byCedric byCedric commented Dec 25, 2020

Part of #16 and building on top of #24, using the same greedy principle as whitespace nodes.

Screenshot 2020-12-25 at 15 34 14

It's still a draft, at least until we decided to merge or change #24. Also running into this issue where we trim the outer newlines/whitespaces. Might be better to either not-do that, to maintain positioning within the raw input.

Screenshot 2020-12-25 at 15 38 15

@byCedric byCedric requested review from bcoe and wesleytodd December 25, 2020 14:38
@byCedric byCedric changed the title feat: add greedy whitespace tokenizer feat: add greedy newline tokenizer Dec 25, 2020
@byCedric byCedric force-pushed the @bycedric/feat/newline branch 3 times, most recently from 9adefc0 to 2beccd0 Compare December 25, 2020 14:57
@bcoe
Copy link
Copy Markdown
Member

bcoe commented Dec 25, 2020

@byCedric in general this looks great to me.

One question it begs, is do we consider footers part of the body, where this complicates things a bit is that we end up with the newline token between the body and first footer.

Thinking out loud, how hard would it be to pull the footer and dividing newline up to the same level as message.

@bcoe
Copy link
Copy Markdown
Member

bcoe commented Dec 25, 2020

It's a little bit uglier of a grammar, but does this get the point across and give a better tree structure?

<message>       ::= <summary>, <newline>*, <body>, <newline>*, <footer>*
                 |  <summary>

<body>          ::= <any text except footer*>, <newline>, <body>
                 | <any text except footer*>

@byCedric
Copy link
Copy Markdown
Contributor Author

It's a little bit uglier of a grammar, but does this get the point across and give a better tree structure?

<message>       ::= <summary>, <newline>*, <body>, <newline>*, <footer>*
                 |  <summary>

<body>          ::= <any text except footer*>, <newline>, <body>
                 | <any text except footer*>

Yes, I think so! Although I think we need to use the + instead of * here. If there is no newline, it's part of the summary.text. Same for the footer, if there is no newline there it's parsed as the body.

Both footer and body are optional, maybe something like this? I'm not too familiar with (e)BNF structures.

<message ::= <summary> [<newline>+, <body>], [<newline>+ <footer>]

@bcoe
Copy link
Copy Markdown
Member

bcoe commented Dec 26, 2020

@byCedric what about:

<message ::= <summary> [<newline>+, <body>], [<newline> <footer>+]

<body>   ::= <any text except newline or footer+>, <newline>, <body>
                 | <any text except newline or footer+>

<footer>        ::= <token>, <separator>, <whitespace>*, <value>, [<newline>]

should we do this in a separate PR.

@byCedric byCedric force-pushed the @bycedric/feat/whitespace branch from d4c8cf7 to 1529eb1 Compare December 27, 2020 15:03
@byCedric byCedric force-pushed the @bycedric/feat/newline branch from 2beccd0 to c9eee03 Compare December 27, 2020 15:13
@byCedric

This comment has been minimized.

bcoe pushed a commit that referenced this pull request Dec 27, 2020
Prior to this, footer was a child node of body which feels off conceptually.
This also makes it more difficult to add a newline node that falls between
the body and footer.

Refs: #25
bcoe pushed a commit that referenced this pull request Dec 27, 2020
Prior to this, footer was a child node of body which feels off conceptually.
This also makes it more difficult to add a newline node that falls between
the body and footer.

Refs: #25
@bcoe
Copy link
Copy Markdown
Member

bcoe commented Dec 27, 2020

Just to double check if I understand what you mean @bcoe, you prefer something like this right?

I think we're pretty much on the same page conceptually, I opened up #26 as a potential approach.

Not sure if I did the body text right here, or if we should leave them as empty "" text nodes.

The problem with including the \n in the text nodes is that we define the next nodes as:

<any UTF8-octets except newline>*

i.e., they explicitly don't contain a newline. Could we just add a note for implementers that the original body can be by joining with \n?

Base automatically changed from @bycedric/feat/whitespace to main December 27, 2020 23:41
bcoe pushed a commit that referenced this pull request Dec 27, 2020
Prior to this, footer was a child node of body which feels off conceptually.
This also makes it more difficult to add a newline node that falls between
the body and footer.

Refs: #25
Copy link
Copy Markdown
Member

@wesleytodd wesleytodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to add for me, but I am following along on the other conversations which I think are good to figure out.

As for adding a note that you need to add a newline which would not be represented in the tree, I think this is not a desirable solution. I also think that we should not say "anything but a new line" and also, "you need to add a new line". So I am not sure where I land on this yet.

@byCedric byCedric force-pushed the @bycedric/feat/newline branch from c9eee03 to 9dcc412 Compare December 31, 2020 14:15
@byCedric
Copy link
Copy Markdown
Contributor Author

@bcoe @wesleytodd I just updated the PR to include the latest main. Because of some weird/complex merge conflicts, I opted for a reimplementation. Right now it parses commits like the image below. In a follow-up PR, I'll implement @bcoe's suggestion to pull newline from footer and put in in the tree as sibling (as children of message).

I agree with your comments about concatenating the newlines in text, but with the current state people can easily merge the body back to a string with:

const tree = parser(<message>)
const body = tree.children.find(node => node.type === 'body')

// this will join `<text>` and `<newline>*` together without handling newline or text node types
const text = body.children.map(node => node.value).join('') 

Current state of newlines
Screenshot 2020-12-31 at 15 17 09

@byCedric byCedric marked this pull request as ready for review December 31, 2020 14:21
@byCedric byCedric requested review from bcoe and wesleytodd December 31, 2020 14:22
Copy link
Copy Markdown
Member

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in the grammar in the README, there are a few places where we have <newline>, I think this should always be <newline>+, or <newline>*, depending on whether it's optional.

@bcoe bcoe merged commit 5b12227 into main Jan 7, 2021
@bcoe bcoe deleted the @bycedric/feat/newline branch January 7, 2021 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants