encourage dataflow programmers to switch to streem from shell #36

Open
ekg opened this issue Dec 17, 2014 · 4 comments
Comments

@ekg

ekg commented Dec 17, 2014

I do a lot of flow-oriented programming in shell, so I'm interested in streem as a replacement for the traditional unix shell. (I use zsh, but the differences are minimal enough in this sense that we can talk about shell generally.)

Unix shell scripting was designed from the very beginning with flow-oriented programming in mind. However, while it is trivial to make pipelines in shell, it is not easy to construct logically-driven stream processing functions, where data flows are managed directly using code written in shell. There are a number of workarounds. Often it's easiest to write such functions in other languages and glue them together in shell. Take, for instance, the universe of perl, awk, and sed one-liners that litter an old web of ~user pages and persist in the various stackexchanges and github. I think this is very much a path-dependent situation that relates to the difficulty of writing an efficient, interactive, and interpreted language.

Despite its age and cruft, shell has a huge following in data-oriented science, where I imagine streem would have the greatest impact. These are exactly the users we'd like to attract, and the most complete form of capture would be if they switched from using bash, zsh, and tcsh to using streem itself. I don't think this would be too difficult to achieve, and it would have little bearing on the functionality delivered by the language. It would, however, have an effect on its usability.

I propose that streem adopts command syntax that would enable a streem REPL to be used in place of a traditional unix shell. In my mind, this would imply a few basic considerations:

  • It should use, or at least allow for [command] [arg]* syntax. In other words, it shouldn't require the use of parentheses for every function call.
  • It should employ shell-like semantics for stream redirection. This already seems to be the case for pipes. Writing to a file, or into a named streem function, should be as easy as [command] [arg]* >[file-or-function].
  • STDIN and STDOUT should be synonymous with /dev/stdin and /dev/stdout.
  • streem should have a PATH that describes where non-streem system commands can be found. Some efforts should be made to not clobber functions that are considered standard (cd, ls, pwd, ...), but it should also be OK for streem functions to redefine these commands (seq being an example that has already been shown in the examples).
  • streem should adopt syntax that eases the kinds of patterns that are met by functions in moreutils, such as tee (write a stream to multiple sinks/files), and pee (push a data stream into multiple other pipelines or functions). One approach would be to use > rather than | to allow for forks and clarify the type of the sink that is being used (is it conceptually a named pipe, a file, or a function). For example, >@ could indicate forking/splitting a pipeline into a named streem function or command.
  • Perhaps streem could employ the shell syntax & for spawning subprocesses and job control.
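
To make the tee/pee bullet above a bit more concrete, here is a minimal sketch, in Python rather than streem, of the fan-out behaviour that a `>`-style fork would need to provide: every item from one source is delivered to each of several sinks. The names `fork`, `raw`, and `shouted` are hypothetical and only serve the illustration; nothing here reflects actual streem syntax or its implementation.

```python
from typing import Callable, Iterable, List


def fork(source: Iterable[str], *sinks: Callable[[str], None]) -> None:
    """Deliver every item of `source` to each sink, in order (tee/pee style)."""
    for item in source:
        for sink in sinks:
            sink(item)


# Two hypothetical sinks: one keeps lines unchanged, one keeps them upper-cased.
raw: List[str] = []
shouted: List[str] = []

fork(["a", "b", "c"], raw.append, lambda line: shouted.append(line.upper()))
```

A sink here is just a callable, so a file writer, another pipeline stage, or a named function could all be plugged in the same way, which is roughly the uniformity the bullet asks for.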

The idea isn't to remain compliant with *nix standards, but to make an environment which is attractive to the largest group of people who are currently using flow-oriented patterns.

@alexispurslane

👍 This is a good idea

@nickmccurdy
Contributor

@ekg

I propose that streem adopts command syntax that would enable a streem REPL to be used in place of a traditional unix shell.

While I don't like a lot of syntax in standard shells, and while Streem would make things a lot nicer syntactically, I think this is a pretty big goal. In my opinion, this would make Streem too complicated. This would bring a lot of reserved words and weird syntax into Streem (for example, the tons of different conditional statements you can have in bash, and weird semantics with spaces in variables). I think that being able to have nicer syntax for common tasks would be more useful than having full shell support, especially since more explicit syntax sugar could easily be added for calling shell commands (like system() or the backticks in Ruby).

I like the idea of having a shell for Streem. However, pulling commands that could conflict with Streem functions could be confusing, especially if something is actually a shell builtin but seems like a normal command. I personally disagree with the goal of making Streem a full shell compatible with bash/zsh/etc. For example, what happens if someone calls if (the bash builtin) in Streem? What happens if someone calls a bash function and expects to be able to use certain escaped characters supported by bash, but Streem's escapes are used instead and do something different (or vice versa)? Additionally, this gets more complicated when people have their PATH set up in unexpected ways, or may be using a system shell with slightly different syntax than bash. If we try to make Streem backward compatible with system shells, we're limited in what syntax we can implement, or we'll have to find detours around clashes in syntax.

It should use, or at least allow for [command] [arg]* syntax. In other words, it shouldn't require the use of parentheses for every function call.

This could make syntax more confusing, as parentheses (while somewhat ugly) make things more obvious. However, we could theoretically do something like what Haskell does (where arguments don't need parentheses or commas, but you can use currying). That could be pretty interesting, especially if Streem would curry by default (like Haskell).
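
As a rough illustration of what "curry by default" could mean for a command-style function, here is a small Python sketch. The `curry` helper and the `grep` function are hypothetical names made up for this example; they are not part of Streem or of any shell, and Haskell's own currying is built into the language rather than added by a wrapper like this.

```python
from inspect import signature
from typing import Any, Callable, List


def curry(func: Callable[..., Any]) -> Callable[..., Any]:
    """Return a version of `func` that can be given its arguments a few at a time."""
    arity = len(signature(func).parameters)

    def curried(*args: Any) -> Any:
        if len(args) >= arity:
            # All arguments are present, so call the original function.
            return func(*args)
        # Otherwise wait for more arguments and try again.
        return lambda *more: curried(*args, *more)

    return curried


@curry
def grep(pattern: str, lines: List[str]) -> List[str]:
    """Keep only the lines that contain `pattern`."""
    return [line for line in lines if pattern in line]


log = ["ok", "error: disk full", "done"]
grep_error = grep("error")   # partially applied: still waiting for `lines`
matches = grep_error(log)
```

With this shape, `grep("error")` behaves like a reusable pipeline stage, which is close to how a shell user thinks of `grep error` in the middle of a pipe.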

It should employ shell-like semantics for stream redirection...

While > and < are less obvious than file IO functions, I think having that syntax would be pretty interesting, since it's very stream friendly and concise.

streem should adopt syntax that eases the kinds of patterns that are met by functions in moreutils...

This sounds like a pretty flexible idea.

@ekg
Author

ekg commented Dec 18, 2014

@nicolasmccurdy

I'm going to attempt to clarify, as I think there is some confusion about what I'm proposing. For instance, I didn't intend to imply this at all:

For example, what happens if someone calls if (the bash builtin) in Streem? What happens if someone calls a bash function and expects to be able to use certain escaped characters supported by bash, but Streem's escapes are used instead and do something different (or vice versa)?

I am not suggesting some kind of merge of the reserved words and syntax from shell. Streem should stand as its own language. My suggestion is that it borrow the most-useful stream and dataflow-oriented idioms from shell. These do not include logical control structures in shells, which are so annoying to use that even experienced users of unix shells typically drop into more syntactically consistent languages when writing applications that include more than a few if statements. I agree this would make things very complicated, but for that reason it's not what I propose.

If we try to make Streem backward compatible with system shells, we're limited in what syntax we can implement, or we'll have to find detours around clashes in syntax.

I agree. Different system shells aren't even completely compatible with each other, so this would be an impossible, Sisyphean goal. What I mean is that streem, with its own syntax and conventions, should still be perfectly usable as a replacement for other system shells.

Let me try to refine the idea:

  1. System commands and files (as they produce or store streams of data) are natural first-class citizens in streem. There should be no need to use backtick escaping, system(), or file.open() syntax to call them or interact with them. The user PATH should be respected in this sense, although it's sensible that streem primitives should take precedence when using streem.
  2. Dataflow-oriented constructs like <, >, |, ( ), and <( ) should be used wholesale in streem, and in ways that are much more uniform than they have been in system shells. For example, > could be used to generate forks, rather than requiring external utility functions to do so.
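
The lookup order described in point 1 (streem primitives first, then the user's PATH) can be sketched as follows. This is Python, not streem, and the `PRIMITIVES` table, its `seq` entry, and the `resolve` function are hypothetical names invented for the illustration; only `shutil.which` is a real standard-library call.

```python
import shutil
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical table of built-in primitives; `seq` stands in for a real one.
PRIMITIVES: Dict[str, Callable[..., object]] = {
    "seq": lambda n: list(range(1, n + 1)),
}

Resolved = Tuple[str, Union[Callable[..., object], str]]


def resolve(name: str, path: Optional[str] = None) -> Optional[Resolved]:
    """Look a name up among primitives first, then among executables on PATH."""
    if name in PRIMITIVES:
        return ("primitive", PRIMITIVES[name])
    # shutil.which searches the given path string, or $PATH when path is None.
    found = shutil.which(name, path=path)
    if found is not None:
        return ("command", found)
    return None


kind, target = resolve("seq")  # the primitive wins even if /usr/bin/seq exists
```

Because the primitive table is consulted first, a system `seq` on PATH is shadowed rather than clobbered, and it would still be reachable by its full path.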

(1) would allow a streem REPL to be used in place of a system shell for basic file management, data inspection, and system navigation. The result would be an interactive setting in which pipelines can be prototyped quickly. As this is currently the case for shell, streem could only do worse by not enabling this kind of work. (2) provides convenience that really matters for interactive work, and it builds up a particular way of working with data that is familiar to almost everyone who seeks to do with shell what they might someday better do with streem.

The basic idea is that someone familiar with bash, tcsh, or zsh could sit down and navigate the filesystem, inspect data, and run some basic processing tasks in streem with exactly the same ease as they currently can in shell. This means that commands in the path are exposed as functions in streem, and "just do" what you expect them to.
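
One possible reading of "commands in the path are exposed as functions" is sketched below in Python. The `command` wrapper is a hypothetical helper written for this example, not a streem feature; it uses the real `subprocess.run` call, and it runs the current Python interpreter as the external program only so that the example does not depend on any particular system utility being installed.

```python
import subprocess
import sys
from typing import Callable, List


def command(executable: str) -> Callable[..., List[str]]:
    """Wrap an external program as a function returning its output lines."""

    def run(*args: str, stdin: str = "") -> List[str]:
        done = subprocess.run(
            [executable, *args],
            input=stdin,          # text fed to the program's standard input
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # work with str rather than bytes
            check=True,           # raise if the program exits with an error
        )
        return done.stdout.splitlines()

    return run


# Expose the current interpreter as a callable, the way `ls` or `cat` might be.
python = command(sys.executable)
lines = python("-c", "print('hello'); print('world')")
```

This version buffers the whole output before returning, which is simpler to show but is not streaming; a real implementation would presumably hand back the process's output as a stream instead.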

I'm focusing on the usability standpoint, which suggests to me a particular, undecorated, command syntax. As you note, currying could be an interesting paradigm to manage this:

This could make syntax more confusing, as parentheses (while somewhat ugly) make things more obvious. However, we could theoretically do something like what Haskell does (where arguments don't need parentheses or commas, but you can use currying). That could be pretty interesting, especially if Streem would curry by default (like Haskell).

This might be an approach to rationalize this kind of organization and improve interactive usability.

On the other hand, in the design I'm describing we are left with the issue of variable escaping. Perhaps a command(...) syntax is a way to mitigate this issue as well.

@alexispurslane

👍👍 When streem is done, I'm not sticking around with bash anymore!


3 participants