impl: Add sed parser #3

Merged (2 commits) on Dec 13, 2018

tomsmeding
Collaborator

This doesn't include a regex parser yet; regexes are just parsed as bare
strings without further processing.

This also includes a basic AST definition. I'm not all that happy with the
representation of addresses, but it does work (it seems).

@tomsmeding
Collaborator Author

tomsmeding commented Dec 11, 2018

In relation to the addresses as well: the POSIX standard seems to suggest that all regular expressions, even those in addresses, can be surrounded by any pair of matching characters as delimiters. So while slashes are the usual choice, the following would in my reading constitute a valid command according to the standard:

^addr^ s,pat,repl,g

Now having different delimiters in a substitute command is quite useful, especially when e.g. working with paths that already contain a lot of slashes (it saves some escaping). In addition, using them there is never ambiguous as far as I can see.

However, allowing different delimiters in an address can be ambiguous; see the following:

sxsb xreplx

This is both a valid s command (with regex sb followed by a space, and replacement string repl; the delimiter is x), and a valid b command with label xreplx and a one-part regex address with the regex x (where the delimiter is s).
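
To make the ambiguity concrete, here is a minimal sketch using ReadP from base. It is not the parser from this pull request: the Cmd type and all function names are hypothetical, only the s and b commands are modelled, and escaped delimiters are ignored. Because ReadP explores every alternative, it returns both readings of the input.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- Hypothetical, simplified AST for this illustration only.
data Cmd
  = Subst String String           -- regex, replacement
  | Branch (Maybe String) String  -- optional regex address, label
  deriving (Eq, Show)

-- Text up to (not including) the delimiter d; the delimiter is consumed.
upTo :: Char -> ReadP String
upTo d = munch (/= d) <* char d

-- s command with an arbitrary delimiter character.
subst :: ReadP Cmd
subst = do
  _ <- char 's'
  d <- get
  Subst <$> upTo d <*> upTo d

-- b command whose optional address regex may use ANY delimiter
-- (the permissive reading of the standard discussed above).
branch :: ReadP Cmd
branch = do
  addr <- option Nothing (Just <$> (get >>= upTo))
  _ <- char 'b'
  skipSpaces
  Branch addr <$> munch1 (not . isSpace)

-- All parses that consume the whole input.
parses :: String -> [Cmd]
parses s = [c | (c, "") <- readP_to_S ((subst +++ branch) <* eof) s]

main :: IO ()
main = mapM_ print (parses "sxsb xreplx")
```

Running this prints two results, one Subst and one Branch, which is exactly the ambiguity described above.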

Ambiguity in the grammar is undesirable, and GNU sed rightly avoids it: it allows fully arbitrary characters as s and y delimiters, but requires / for addresses. It does allow an arbitrary delimiter in an address regex if the first delimiter is preceded by a \, as in the following:

\sxsb xreplx

This is the previous example, but with a backslash added. GNU sed should parse this as a b command with the parts mentioned above. Without the backslash, it parses it as the s command. In my parser, I copied this behaviour from GNU sed.
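
The GNU rule can be sketched as follows. Again this is a simplified illustration with ReadP from base rather than the code in this pull request; the Cmd type and the function names are hypothetical, and escaped delimiters inside a regex are not handled. An address regex must start with / or with a backslash followed by the chosen delimiter, so each input has exactly one parse.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- Hypothetical, simplified AST for this illustration only.
data Cmd
  = Subst String String           -- regex, replacement
  | Branch (Maybe String) String  -- optional regex address, label
  deriving (Eq, Show)

-- Text up to (not including) the delimiter d; the delimiter is consumed.
upTo :: Char -> ReadP String
upTo d = munch (/= d) <* char d

-- Address regex, GNU style: /regex/ or \cregexc for any other delimiter c.
addrRegex :: ReadP String
addrRegex = (char '/' *> upTo '/') +++ ((char '\\' *> get) >>= upTo)

-- s command: the delimiter may still be any character.
subst :: ReadP Cmd
subst = do
  _ <- char 's'
  d <- get
  Subst <$> upTo d <*> upTo d

-- b command with an optional GNU-style regex address.
branch :: ReadP Cmd
branch = do
  addr <- option Nothing (Just <$> addrRegex)
  skipSpaces
  _ <- char 'b'
  skipSpaces
  Branch addr <$> munch1 (not . isSpace)

-- All parses that consume the whole input.
parses :: String -> [Cmd]
parses s = [c | (c, "") <- readP_to_S ((subst +++ branch) <* eof) s]

main :: IO ()
main = do
  print (parses "sxsb xreplx")    -- [Subst "sb " "repl"]
  print (parses "\\sxsb xreplx")  -- [Branch (Just "x") "xreplx"]
```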

Please, before merging, make sure this is written down somewhere more findable than a random pull request. I'm tired now.

@Lambdara Lambdara self-assigned this Dec 13, 2018
@Lambdara
Owner

I'm not sure why, but the "a" command doesn't seem to work.

> parse parseProgram "repl" "a lol"
Left "repl" (line 1, column 3):
unexpected "l"
expecting "\\\n" or lf new-line

@tomsmeding
Collaborator Author

tomsmeding commented Dec 13, 2018

@Uwila That's because I read the standard to mean that a takes its argument with a different syntax, as follows:

a \
lol

And indeed, while GNU sed normally accepts the a lol syntax, it accepts only the above when passed --posix as a flag.
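
The difference between the two forms can be sketched like this. It is a rough illustration with ReadP from base, not this project's parser: the Append type and the names posixAppend, gnuAppend and run are hypothetical, optional blanks after a are accepted to match the example above, and multi-line text continued with further backslash-newlines is not handled.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical result type: the text to append.
newtype Append = Append String deriving (Eq, Show)

-- Zero or more spaces or tabs.
blanks :: ReadP ()
blanks = () <$ munch (`elem` " \t")

-- POSIX form: "a", a backslash, a newline, then one line of text.
posixAppend :: ReadP Append
posixAppend = do
  _ <- char 'a'
  blanks
  _ <- string "\\\n"
  Append <$> munch (/= '\n')

-- GNU one-line form: "a text" on the same line.
gnuAppend :: ReadP Append
gnuAppend = do
  _ <- char 'a'
  blanks
  c <- satisfy (`notElem` "\\\n")
  Append . (c :) <$> munch (/= '\n')

-- First parse that consumes the whole input, if any.
run :: ReadP a -> String -> Maybe a
run p s = case [x | (x, "") <- readP_to_S (p <* eof) s] of
  (x : _) -> Just x
  []      -> Nothing

main :: IO ()
main = do
  print (run posixAppend "a \\\nlol")  -- Just (Append "lol")
  print (run posixAppend "a lol")      -- Nothing
  print (run gnuAppend "a lol")        -- Just (Append "lol")
```

A parser that only implements posixAppend rejects a lol at the first character of the text, which matches the error reported above.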

But actually, the backslash-newline syntax doesn't seem to be accepted by my parser either, so let me look at that.

Thanks @Uwila for the report
@Lambdara Lambdara mentioned this pull request Dec 13, 2018
@Lambdara Lambdara merged commit 58d6a94 into master Dec 13, 2018
@Lambdara Lambdara deleted the impl-parser branch December 13, 2018 10:24