Skip to content

chenguo/parsh

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PARSH README


============
INTRODUCTION
============

  Parsh, the Parallel Dependency Shell, attempts to automate parallel execution
of commands whenever possible, without any extra effort from script writers or
generation programs.

  As a brief architectural summary, Parsh has a central file hash table data
structure and three main threads that operate on this structure: a parser, an
evaluator, and a reaper. These threads all communicate with a central file
hash table structure to determine which commands are runnable and which must wait.

  The file hash is a hash table of files that are currently being accessed or
will be accessed by processed commands. Each file has a single entry in the
table, and each entry contains an "accessor queue," a queue of commands waiting
for access to the file.  When multiple commands need access to a single file,
they are enqueued. The first command in the queue (or commands, if they are all
reads) has access to the file and are considered "runnable," while the
remaining are blocked.

  The parser takes input from the user or a script, parses the commands from
them, and adds them to the hash. At this stage, if a new command accesses a
file that is not already in the hash table, a new file entry will be created.
Otherwise the command will be enqueued in the command queue of an existing
file entry.

  The evaluator makes use of the concept of a "frontier," which is a list of
commands in the file hash table that are runnable. The hash table has taken
care of all matters of dependency and control flow, thus the evaluator only
needs to mindlessly evaluate the next command in the frontier list.

  The reaper catches signals sent by exiting processes, removes the
appropriate commands from the file hash, and re-evaluates the dependency status
of related commands, possibly marking them as runnable.


============
Dependencies
============

  There are three types of dependencies in Parsh:
  1) File dependencies
  2) Control flow dependencies
  3) Variable dependencies


File Dependencies:
  Commands attempting to access the same file are considered to collide if at
least one is not a read access. The structure for a file dependency has the
following fields:

  inode #: primary way of identifying files, since with inode numbers there is
no need to worry about links.
  absolute path: for resolving the inode # of a link, and as a secondary way
of identifying a file when an inode # is not available (i.e. the file has yet
to be created).
  read/write: access type to determine dependency

  For file dependencies, each file is hashed into the file hash table. Each
file entry contains the fields listed above. Each command that accesses a file
(an "accessor") will enqueue itself in the file entry's accessor queue. In the
case of multiple accessors, the head of the queue is runnable, while the rest
are blocked. In the case consecutive read accessors are at the head of the
queue, they are all simultaneously runnable.


Control Flow Dependencies:
  Some commands must know another's exit status to determine whether to run or
not. These maybe a command inside an IF statement, or a command waiting to see
if control flow reaches a BREAK. A simple IF-like dependency (IF, ||, UNTIL and
the such) are handled solely within the dependency hash. A CONTINUE or BREAK
dependency is more complicated, and thus requires its own data structure.

  nest level: the true level the CONTINUE or BREAK statement belongs to. For
example, a BREAK 2 nested 4 levels deep acts on other commands at nest level
2 or greater.
  parent: to follow the nest chain up.
  iteration: loop iteration. For nested loops, this is the iteration of the
innermost loop. If the iteration of an outer loop is needed, the parent
pointer should be followed.

  Note that in order to establish a dependency, a command must be at or below
the same nest level. For CONTINUE, to be at the same iteration is a
requirement, while for BREAK the command needs to be at the same or a later
iteration in the nest level of the BREAK command.

  Control flow dependencies are handled internally in the file hash table. A
control flow command will be considered an accessor of all the subcommands
in it. Upon evaluating the branch decision, the appropriate subcommands will
be inserted into the hash table.

  For example, an IF command is considered an accessor of all files
accessed in the conditional, if, and else part of the command. When the IF
command becomes runnable, the conditional part is evaluated. Depending on the
outcome, either the if or else part will become runnable.


Variable Dependencies:
  In Parsh, variables use static single assignment form, thus there is never
a value collision. Rather, the only variable dependencies in Parsh will be a
command waiting for a delayed value resolution. This will be explained in more
detail in the variables section.

  Variable dependencies are handled by the variable module. For the scope of
this section, it will be sufficient to say that a variable whose value is not
yet ready counts towards a accessor command's dependencies, and upon the value
becoming available the accessor will be updated.

About

Parallel Dependency Shell

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages