A functional DSL that helps data pipelining in Bash

About

Parabash is a language that can be thought of as a Parallel Bash interpreter. It executes Bash commands in parallel, providing support for chaining callbacks on completion of either a collection or a single shell command, as well as conditional error handling.

Why?

Writing large-scale Bash scripts becomes difficult for the following reasons:

  1. Bash scripts are typically not linted prior to execution. Minor typographical errors and mismatched types can cause scripts to exit unexpectedly. if conditions in Bash are notoriously cumbersome, yet they arise frequently.
  2. Bash scripts exhibit poor concurrency. All parallelism comes from running commands in the background, wait blocks the main script, and individual commands cannot register a callback to run on completion.
  3. Bash scripts exhibit poor error handling. Chaining with && is the main way to guarantee that later operations do not run after a failure, and $? holds only the exit status of the most recently run command.
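
Points 2 and 3 can be reproduced in a few lines of plain Bash. This is a minimal sketch for illustration only; `job_ok` and `job_fail` are hypothetical placeholders, not part of Parabash.

```bash
#!/usr/bin/env bash
# Hypothetical jobs standing in for real pipeline steps.
job_ok()   { echo "ok"; }
job_fail() { echo "boom" >&2; return 3; }

# Point 2: parallelism means backgrounding, and `wait` blocks the caller.
job_ok > /dev/null &
pid_ok=$!
job_fail 2> /dev/null &
pid_fail=$!

# No callback fires when a job ends; each PID is waited on by hand.
wait "$pid_ok"
status_ok=$?
wait "$pid_fail"
status_fail=$?
echo "job_ok exited with $status_ok, job_fail exited with $status_fail"

# Point 3: $? reflects only the most recent command, so an earlier
# failure is overwritten by whatever runs next.
job_fail 2> /dev/null
true
last_status=$?
echo "status after a failure followed by true: $last_status"
```

The second half shows why a failure is easy to lose: unless each status is saved immediately, the next command replaces it.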

Bash is not a great data pipelining language. Despite this drawback, it survives and is widely used, particularly in legacy systems. Bash also compares well with some other languages in certain respects, such as regular expression evaluation.

Rather than fix Bash's flaws, it makes more sense to provide a better abstraction over Bash. This is an attempt at that.

Features

Parabash aims to support:

  1. High concurrency for embarrassingly parallel scripts.
  2. Functional-style chaining, filtering by cases, and reducing for outputs.
  3. Strict guarantees about types: all passed values are strings only.

Examples

Hello World

A sample Parabash script:

main = (`echo "Hello world!"`)

Commands to run are enclosed in backticks. A () represents a single unit of parallel execution and may contain more than one command. The entry point of a script is the reserved keyword main.

Concurrent Commands

A more interesting example:

get_logged_in_users = (`who`) -> (
                                  `cut -d' ' -f 1 | sort | uniq`,  # get all users
                                  `cut -d' ' -f 12 | sort | uniq`  # get all timestamps
                                 ) -> (`echo "Done!"`)  # only runs when all the above are finished

get_current_tcp_connections = (`lsof -i TCP`)

main = (
        get_logged_in_users,
        get_current_tcp_connections -> (`grep server`),
        `echo "Hi again!"`
       )

The -> syntax chains commands so that they run in sequence. The , syntax specifies that the comma-separated commands are to be run in parallel.
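
For readers more used to plain Bash, the two operators map roughly onto backgrounding and `wait`. The sketch below is only an analogy for the semantics described above, not a description of how Parabash runs commands. The stand-in commands and the temporary directory are assumptions made so that the example runs anywhere.

```bash
#!/usr/bin/env bash
# Plain-Bash analogy for:  (`A`, `B`) -> (`echo "Done!"`)
# A and B are stand-in commands, not taken from Parabash.
workdir=$(mktemp -d)

# The `,` part: start both commands in the background so they overlap.
printf 'b\na\nb\n' | sort -u > "$workdir/a.out" &
printf '3\n1\n2\n' | sort -n > "$workdir/b.out" &

# The `->` part: block until every background job has finished,
# then run the next step exactly once.
wait
a_lines=$(wc -l < "$workdir/a.out" | tr -d ' ')   # both outputs are complete here
b_first=$(head -n 1 "$workdir/b.out")
done_message=$(echo "Done!")
echo "$done_message"

rm -rf "$workdir"
```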

If the preceding () contains only one command, the stdout and exit code of that command are passed to the next (). They are accessible as ${1} and ${2} respectively inside a Bash statement enclosed in backticks, and both are strings. If the preceding () contains multiple commands, the next () runs only when all of them have completed, and no stdout is passed to it.
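
In plain Bash, the closest analogue to this hand-off is to capture the output and status of one command and supply them as positional parameters to the next. The helper name `chain` is hypothetical, and the sketch illustrates the described semantics only, not Parabash's implementation.

```bash
#!/usr/bin/env bash
# chain FIRST NEXT (hypothetical helper): run FIRST, then run NEXT with
# FIRST's stdout as $1 and FIRST's exit code as $2, both as plain strings.
chain() {
    local first=$1 next=$2 out code
    out=$(bash -c "$first")
    code=$?
    # "chain" fills $0, so the captured values land in $1 and $2.
    bash -c "$next" chain "$out" "$code"
}

# Single quotes keep ${1} and ${2} away from the outer shell.
result=$(chain 'echo "Hello world!"' 'echo "got: ${1} (exit ${2})"')
echo "$result"
```

Here `${1}` and `${2}` are ordinary Bash positional parameters, whereas Parabash is described as substituting them itself.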

As may be obvious, # starts a comment.

You can assign a variable name to a chain of commands and reuse that name anywhere in place of the chain. Variable declarations are scoped to the enclosing (). If declared outside parentheses, they are part of the global scope and are accessible everywhere.

Conditional Handling

main = (
        (`ls file.sh`) [-> (`echo "Success!"`),
                        -!> (`echo "Not success!"`)],
        (`ls other_file.sh`)
       ) [-> (`echo "All tasks completed"`),
          -!> (`echo "A task failed"`)]

The [] indicate case conditions. The -> branch runs if the exit code of the preceding command was zero (success). The -!> branch runs if it was not zero (failure).

If the preceding () contains multiple commands, the ->, -!> conditions only execute for those commands which do not have a dedicated case condition for them. All commands in -!> receive stderr and the exit code, both of which continue to be treated as strings.

This is another way of saying that a single malfunctioning script in a collection of scripts will propagate its error until it finds a relevant [] or escapes to global scope, at which point it will throw an error and stop everything. You cannot chain a -!> to global scope.
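
The branching for a single command with its own [] can be approximated in plain Bash as follows. `run_case` is a hypothetical helper written for this illustration. It covers only the single-command case and does not model error propagation to an enclosing scope.

```bash
#!/usr/bin/env bash
# run_case CMD ON_OK ON_FAIL (hypothetical helper):
#   exit code 0     -> run ON_OK   with stdout as $1 and the code as $2
#   exit code not 0 -> run ON_FAIL with stderr as $1 and the code as $2
run_case() {
    local cmd=$1 on_ok=$2 on_fail=$3 out err code errfile
    errfile=$(mktemp)
    out=$(bash -c "$cmd" 2> "$errfile")
    code=$?
    err=$(cat "$errfile")
    rm -f "$errfile"
    if [ "$code" -eq 0 ]; then
        bash -c "$on_ok" run_case "$out" "$code"
    else
        bash -c "$on_fail" run_case "$err" "$code"
    fi
}

ok=$(run_case 'echo fine' 'echo "Success: ${1}"' 'echo "Not success!"')
bad=$(run_case 'echo oops >&2; exit 2' 'echo "Success!"' 'echo "Not success: ${1} (exit ${2})"')
echo "$ok"
echo "$bad"
```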

Infinite composition

main = (
        (`lsof -i 4`),
        (`lsof -i 6`)
       ) | (`grep server`)

The | operator appends the stdouts of all the commands in the preceding () together and sends them as one to the next (). Note that, unlike ->, all of the commands in the next () receive the joined stdouts. Exit codes, which are also sent along, are always presumed zero - if any script fails inside a preceding (), it will trigger an error and the | operator will never get called.
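
A plain-Bash analogy for this fan-in is sketched below. Two details are assumptions rather than documented behaviour: the joined output is delivered to the next stage on stdin, and the outputs are joined in declaration order. The producer commands are stand-ins chosen so that the example runs anywhere.

```bash
#!/usr/bin/env bash
# Plain-Bash analogy for:  ((`A`), (`B`)) | (`grep server`)
workdir=$(mktemp -d)

# Run the producers in parallel, each writing to its own file.
printf 'web server\ncache\n' > "$workdir/1.out" &
pid1=$!
printf 'db server\nqueue\n' > "$workdir/2.out" &
pid2=$!

# If any producer fails, stop before the | stage, as described above.
if wait "$pid1" && wait "$pid2"; then
    # Join the stdouts and hand them to the next stage as one stream.
    matches=$(cat "$workdir/1.out" "$workdir/2.out" | grep server)
    echo "$matches"
else
    echo "a producer failed; the | stage is skipped" >&2
fi

rm -rf "$workdir"
```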

Variable declaration and usage

As mentioned above, variables may be defined as a simple variable = value within a () scope.

Variables are automatically hoisted and always evaluated first. If a variable points to a (), the rest of the scripts in the enclosing scope will wait until that variable value resolves to just the stdout of that () (exit codes are never assigned to variables). This sort of behaviour is useful for defining a limited block.

Additionally, Parabash supports string interpolation for variable values using ${variable} syntax inside backticks.

main = (
        `echo "${MYNAME}"`,
        # the following is hoisted to the top of the scope, sleeps for five seconds, and resolves to
        # `Parabash`. Only then does `echo "${MYNAME}"` run.
        MYNAME = (`sleep 5`) -> (`echo "Parabash"`)
       )

See Special Dollar Signs for some important notes on the usage of ${}. Variable names cannot be numeric literals such as 1 or 1.023; they must be simple strings without any whitespace.

An undefined variable name will throw a compilation error.
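
For contrast, here is how the hoisting example above would have to be written in plain Bash, which has no hoisting and no compile step. This is a sketch for comparison only. `PARABASH_DEMO_UNSET` is a hypothetical name, and `set -u` is offered as the nearest Bash counterpart to the undefined-variable check, not as what Parabash does.

```bash
#!/usr/bin/env bash
# Plain Bash has no hoisting, so the assignment must be written first.
# Command substitution keeps only stdout, never the exit code.
# (The five-second sleep is shortened to one second here.)
MYNAME=$(sleep 1 && echo "Parabash")
greeting=$(echo "${MYNAME}")
echo "$greeting"

# Nearest Bash counterpart to the undefined-variable error: `set -u`,
# which fails at run time rather than at compile time.
unset PARABASH_DEMO_UNSET   # hypothetical name, guaranteed to be undefined
( set -u; echo "${PARABASH_DEMO_UNSET}" ) 2> /dev/null
undefined_status=$?
echo "exit status with set -u: $undefined_status"
```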

FAQs and Gotchas

Special Dollar Signs

The standard Bash dollar-sign variables carry no meaning across commands in Parabash. You can use them within a single script, but do not expect them to have any semantic meaning beyond that script: $?, for instance, will not work as you expect.

This makes sense because each Bash command is effectively sandboxed from the others. Commands have no knowledge of backgrounded processes, and the shell ID will not be of much use since each process runs in its own shell.

However, Parabash does make exactly one $ construct available to all scripts, though its meaning differs from standard Bash. This is the ${} construct, which is used for string interpolation. The stdout of the previous process (or, for commands chained with -!>, its stderr) is always accessible as ${1}, and the exit code is always accessible as ${2}. Variables defined in the current scope or higher are always accessible by name.

In typical Bash, ${} is shell parameter expansion. That is no longer available in Parabash, because Parabash expands ${} itself before letting Bash take over.
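
The idea of expanding ${} before Bash sees the command can be sketched as a plain text substitution. `expand_template` is a hypothetical helper, and the naive replacement below does no escaping of the substituted values. It shows the order of operations only and makes no claim about how Parabash performs the expansion.

```bash
#!/usr/bin/env bash
# expand_template TEMPLATE STDOUT CODE (hypothetical helper): replace the
# literal text ${1} and ${2} in TEMPLATE before Bash ever sees the command.
expand_template() {
    local template=$1 prev_stdout=$2 prev_code=$3
    local one='${1}' two='${2}'   # literal markers, never expanded by Bash
    template=${template//"$one"/"$prev_stdout"}
    template=${template//"$two"/"$prev_code"}
    printf '%s\n' "$template"
}

command_text=$(expand_template 'echo "got ${1} with status ${2}"' 'hello' '0')
echo "$command_text"   # the text that would be handed to Bash
output=$(bash -c "$command_text")
echo "$output"
```

Because the markers are replaced textually, Bash never gets the chance to treat them as its own parameter expansions.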

Case handling

Sometimes you only want commands to run if the output of the previous () meets some condition. This may be accomplished by using && inside backticks.

main = (`echo $(( ( RANDOM % 9 ) + 1 ))`) -> (`grep 5 <<< ${1} && echo "Yay!"`)

It is actually highly recommended to use grep as needed.
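
The same guard can be tried in plain Bash. In this sketch, `echo_if_five` is a hypothetical helper, and it uses `grep -q` so that only the message is printed, whereas plain `grep` would also print the matching line.

```bash
#!/usr/bin/env bash
# echo_if_five VALUE (hypothetical helper): mirrors
#   grep 5 <<< ${1} && echo "Yay!"
# with grep -q so that only the message is printed.
echo_if_five() {
    grep -q 5 <<< "$1" && echo "Yay!"
}

roll=$(( ( RANDOM % 9 ) + 1 ))   # random integer from 1 to 9
echo "rolled $roll: $(echo_if_five "$roll")"
```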

Implementation details

Parabash will be implemented in Rust using the nom parser combinator library.

Grammar

EBNF format (probably will be revised as I identify any issues with the format)

Variable = [A-Za-z_][A-Za-z0-9_]*
String = "'", .*, "'"
Command = "`", .*, "`"
Unit = Command | Variable, '=', (String | Variable)
Executable = "(", [{Unit, ","}], ")"
ChainedExecutable = CaseExecutable, [{"->", ChainedExecutable}]
CaseExecutable = Executable, ["[", "->", ChainedExecutable, ",", "-!>", ChainedExecutable, "]"]
Composition = Executable, "|" , CaseExecutable