Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shell substitutions with parentheses fail #118

Closed
arowla opened this issue Dec 24, 2013 · 12 comments
Closed

Shell substitutions with parentheses fail #118

arowla opened this issue Dec 24, 2013 · 12 comments

Comments

@arowla
Copy link
Contributor

arowla commented Dec 24, 2013

The example below fails due to the closing parenthesis in the command, as the parser will consider the first closing parenthesis it finds to be the end of the shell substitution statement.

DATE := $(python -c "import datetime; print datetime.datetime.now()")
@myronahn
Copy link
Contributor

@dirtyvagabond @aboytsov @egamble I was working on this bug a bit and came across a big problem in the parser.

The monads in the parser are backtracking monads so they are not supposed to have side effects, however the monad that handles any shell substitution (command-sub) actually shells out right in the monad. When I was testing a fix to this bug, I noticed that it was shelling out twice because of the backtracking.

Unfortunately, being able to do command-line substitution right in the parser is critical for defining variables and in-line shell commands.

So this bug revealed a pretty deep problem. This may be the impetus we need to separate the lexer from the parser.

@dirtyvagabond
Copy link
Contributor

@myronahn oh man, this sounds like a nice find. so... are you volunteering to do some hard core lexer/parser development?

@aboytsov
Copy link
Contributor

I agree that a proper parser/lexer separation is the right way to go about fixing it. But in the meantime, we can easily make sure every shell substitution is executed once by maintaining a global hashmap that stores the result of each run (and caches it for backtracking). Thoughts?

@egamble
Copy link

egamble commented Jan 21, 2014

I use memoization with fnparse in the parser for my language Timeless. It
works pretty well:
https://github.com/egamble/timeless/blob/master/src/clj/timeless/bootstrap/parser.clj

On Tue, Jan 21, 2014 at 11:53 AM, Artem Boytsov notifications@github.comwrote:

I agree that a proper parser/lexer separation is the right way to go about
fixing it. But in the meantime, we can easily make sure every shell
substitution is executed once by maintaining a global hashmap that stores
the result of each run (and caches it for backtracking). Thoughts?


Reply to this email directly or view it on GitHubhttps://github.com//issues/118#issuecomment-32956284
.

@myronahn
Copy link
Contributor

Good idea - hash map/memoization sounds good except for a couple of things:

  • I'm not sure if there is a chance that backtracking will cause the parser to choose an entirely different path, that is, it might try the shell expansion and, after backtracking, decide that it is actually not a shell expansion after all. I wouldn't be surprised if this were a possibility.
  • Backtracking also might cause the shell expansion to parse differently, or might cause a different portion of the text to be parsed as the shell expansion.

However, these are probably very rare cases that can be ignored for now until the full parser/lexer separation. I might be able to detect when this happens and error out - but chances are, it would error out anyway.

We'll probably have to memoize the line number (or something) with the shell command as well since it could be that the user actually wants to run the same shell command multiple times in different parts of the workflow.

I'll play around with this to see if I can come up with a quick fix.

myronahn pushed a commit that referenced this issue Jan 22, 2014
Also memoize the calls to shell (along with file line/column)
Clear memoize map when new workflow is run
myronahn pushed a commit that referenced this issue Jan 22, 2014
@dirtyvagabond
Copy link
Contributor

@myronahn is this close-able?

@myronahn
Copy link
Contributor

@dirtyvagabond Well the immediate issue is hack-fixed but I wouldn't call it a long-term fix. The deeper issue I brought up is still an issue.

@dirtyvagabond
Copy link
Contributor

@amalloy may find this interesting, we recently talked about the parser/lexer separation considerations

@amalloy
Copy link
Contributor

amalloy commented May 27, 2014

@myronahn I'm curious what sort of input causes the parser to backtrack enough to execute a shell expansion twice. I don't see any rules that would obviously cause this sort of issue.

@amalloy
Copy link
Contributor

amalloy commented May 29, 2014

Anyway, we should be able to parse these just as well as bash does, and support ' and " within the command list. Supporting nested shell escapes might be a bit ambitious, but is probably not impossible either?

@myronahn
Copy link
Contributor

@amalloy It's been a while since I've worked on this, but if I'm not mistaken, it was the original test case at the beginning of this bug that caused the backtrack (and I believe I put it in the regression test suite as well).

@amalloy
Copy link
Contributor

amalloy commented Jul 17, 2014

We've been able to handle shell expansions containing ) for a while; I've added a test confirming this. Avoiding double-expansion of ambiguous shell commands is still an open question, but I don't think it's part of issue #118, so we should close this one.

@amalloy amalloy closed this as completed Jul 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants