Merged
Commits
96 commits
bd00b88
Add support for pipe refs
gliargovas Aug 29, 2023
6cd9325
Remove assertions for command nodes due to PipeNodes
gliargovas Aug 29, 2023
db4ed66
Merge branch 'support-for-local-assignments' into support-for-pipes
gliargovas Aug 29, 2023
ade6fbd
Remove redundant logs
gliargovas Aug 29, 2023
867a88a
Check PipeNodes independently
gliargovas Aug 29, 2023
52b3936
Add first version of report script
gliargovas Aug 30, 2023
2ebcedc
Fix typo in ast PipeNode assertion check
gliargovas Aug 31, 2023
dac2afa
Make plotting individual and improve format
gliargovas Aug 31, 2023
5d4d9da
Add first dgsh benchmark
gliargovas Aug 31, 2023
2c1f699
Improve structure of benchmark report script
gliargovas Aug 31, 2023
735cc97
Merge pull request #57 from binpash/support-for-pipes
angelhof Aug 31, 2023
32ea150
Adjust benchmark report script to work in different directories
gliargovas Aug 31, 2023
1a4031c
Add 2nd dgsh (no func) script
gliargovas Aug 31, 2023
ae4fbd6
Add 2_no_func.sh
gliargovas Aug 31, 2023
d78e3f7
Use difflib to print line diffs
gliargovas Aug 31, 2023
ab1e225
Merge remote-tracking branch 'origin/future' into improve-benchmarks-…
gliargovas Sep 1, 2023
a3c1449
Add timestamp logging functions
gliargovas Sep 1, 2023
e8a051d
Add support for more complex timestamp logging
gliargovas Sep 1, 2023
f932a38
Add timed logs for main scheduler workflow
gliargovas Sep 1, 2023
416bfdd
Improve log messages and add log to loop unrolling
gliargovas Sep 4, 2023
738f48f
Change scheduler debug level
gliargovas Sep 4, 2023
1601a5d
Change config parsing to handle complex commands
gliargovas Sep 4, 2023
44a18fb
Use env for benchmarks instead of input file args
gliargovas Sep 4, 2023
ddb8a52
Call the orch script correctly
gliargovas Sep 4, 2023
21b477f
Count how many times a command was executed along with time
gliargovas Sep 4, 2023
4ef1ab2
Add a gantt chart of benchmark execution
gliargovas Sep 4, 2023
e927c29
Print time lost and other execution statistics
gliargovas Sep 5, 2023
ed095a6
Fix awk parsing bug
gliargovas Sep 6, 2023
0d72535
Add 3.sh benchmark
gliargovas Sep 6, 2023
af0fcee
Uncomment awk commands
gliargovas Sep 6, 2023
98a8078
Add 4.sh benchmark
gliargovas Sep 6, 2023
d208132
Add 5.sh dgsh benchmark
gliargovas Sep 6, 2023
190afbd
Add benchmark config for dgsh scripts 3-5
gliargovas Sep 6, 2023
f04d945
Add 7.sh dgsh benchmark without function calls
gliargovas Sep 7, 2023
eeaa938
Comment-out 7.sh awk command
gliargovas Sep 7, 2023
b39ff74
Add latest benchmark config for 6.sh and 7.sh
gliargovas Sep 7, 2023
743c8f2
Rename benchmark 7 to 8
gliargovas Sep 7, 2023
9880559
Add 9.sh
gliargovas Sep 7, 2023
e5ca215
Add more env variables to keep track of
gliargovas Sep 7, 2023
265909e
Split pre exec cmd outside of run command
gliargovas Sep 7, 2023
4a41309
Remove LC_ALL var export from benchmarks
gliargovas Sep 7, 2023
c3286ed
Adjust gantt plot dimensions dynamically
gliargovas Sep 7, 2023
727eed6
Add 7.sh
gliargovas Sep 8, 2023
9275db1
Correctly resolve all terminal nodes
gliargovas Sep 8, 2023
b441971
Add 17.sh
gliargovas Sep 8, 2023
4452cee
Add 16.sh
gliargovas Sep 8, 2023
00f9b19
Rename benchmark scripts
gliargovas Sep 8, 2023
01faf8f
Update the operation of some scheduling components
gliargovas Sep 12, 2023
aafd376
Add scheduler optimization args
gliargovas Sep 12, 2023
60dc74f
Optional sandbox killing based on arg
gliargovas Sep 12, 2023
63b2acb
Update logs while killing
gliargovas Sep 12, 2023
d4dc63c
Add early rerun first (buggy) implementation
gliargovas Sep 13, 2023
11657b9
Fix early resolution correctness issues
gliargovas Sep 13, 2023
fb4c863
Save orch logs while running benchmarks
gliargovas Sep 13, 2023
ac0720e
Add options to save results
gliargovas Sep 14, 2023
2a7e3e4
Add sandbox killing arg again
gliargovas Sep 14, 2023
00198bd
Add correct argument splitting
gliargovas Sep 14, 2023
42c5a50
Rewrite proc killing commands using psutil
gliargovas Sep 15, 2023
6b3e678
Use most recent process killing methods
gliargovas Sep 15, 2023
492c3ad
Fix typos in 7.sh
gliargovas Sep 15, 2023
a996e51
Fix minor proc killing bug
gliargovas Sep 16, 2023
168c713
Remove redundant pre-commit resolution checks
gliargovas Sep 19, 2023
f240d69
Use recent pash version
gliargovas Sep 19, 2023
4e98790
Add matplotlib dependency for reporting
gliargovas Sep 19, 2023
af0d3f6
Fix trace parsing for mkdir -p
gliargovas Sep 19, 2023
dd5365a
Add the option to run the trace parser independently
gliargovas Sep 19, 2023
7b68326
Add README.md for benchmarks
gliargovas Sep 19, 2023
d6f7f79
Add improved top-level README
gliargovas Sep 19, 2023
53c5ea5
Update README.md
gliargovas Sep 20, 2023
f4880ee
Fix typo in installation script
gliargovas Sep 21, 2023
db75a7c
Remove redundant reporting script
gliargovas Sep 21, 2023
5aa5834
Add option to start speculation on first wait
gliargovas Sep 21, 2023
0bcf463
Add extra check when setting most recent env
gliargovas Sep 21, 2023
f96e20a
Add early on-wait-received env resolution check
gliargovas Sep 21, 2023
b7d6077
Add check for initial env on non-po nodes
gliargovas Sep 21, 2023
eb218df
move env check on wait after loop unrolling
gliargovas Sep 21, 2023
62ae3d8
Simplify Node 0 env initialization
gliargovas Sep 21, 2023
749a77f
Refactor env resolution to improve quality
gliargovas Sep 21, 2023
541873f
Remove sleep while killing processes
gliargovas Sep 21, 2023
ea169bf
Use most recent pash instance
gliargovas Sep 21, 2023
97dea9f
Add a reversed instance of 1.sh
gliargovas Sep 25, 2023
8ba9653
Add improved config for running dgsh benchmarks
gliargovas Sep 25, 2023
74de062
Use correct speedup calculation
gliargovas Sep 25, 2023
09f3579
Move env resolution after loop unrolling but before po progression
gliargovas Sep 25, 2023
25c854d
Make update_and_restart_nodes work with the transitive closure of the…
gliargovas Sep 25, 2023
353101f
Clarify hs is not a shell
gliargovas Sep 26, 2023
f9fb245
Capitalize config global variables
gliargovas Sep 26, 2023
54ec042
Remove redundant commented-out assertion
gliargovas Sep 26, 2023
5c7ef20
Add explanatory comments and future full PO assertions
gliargovas Sep 26, 2023
06c1e07
Refactor dependency resolution by abstracting early and late resolution
gliargovas Sep 26, 2023
2f9372c
Remove redundant arg
gliargovas Sep 26, 2023
929070b
Use most recent pash changes
gliargovas Sep 26, 2023
509b6aa
Merge pull request #59 from binpash/improve-benchmarks-and-reporting
gliargovas Sep 26, 2023
52b0d90
Merge branch 'improve-benchmarks-and-reporting' into future
gliargovas Sep 26, 2023
5a73bd6
Merge branch 'main' into future
gliargovas Sep 26, 2023
8935a84
Resolve merge conflicts with main correctly
gliargovas Sep 26, 2023
90 changes: 51 additions & 39 deletions README.md
@@ -1,66 +1,78 @@
## Dynamic Parallelizer
## hs README

A dynamic parallelizer that optimistically/speculatively executes everything in a script in parallel, and ensures correctness by tracing the execution and re-executing the parts that were erroneous.
### Overview

## Installing
`hs` is a system for executing shell scripts out of order. It achieves this by tracing the script's execution; if an error arises due to speculative execution, `hs` re-executes the necessary parts to ensure correct outcomes. The project aims to boost the parallel execution of shell scripts, reducing their runtime and enhancing efficiency.

```sh
./scripts/install_deps_ubuntu20.sh
```
### Structure

## Tests
The project's top-level directory contains the following:

To run the tests:
```sh
cd test
./test_orch.sh
```
- `deps`: Dependencies required by `hs`.
- `docs`: Documentation and architectural diagrams.
- `model-checking`: Tools and utilities for model checking.
- `parallel-orch`: Main orchestration components.
- `pash-spec.sh`: Entry script to initiate the `hs` process.
- `README.md`: This documentation file.
- `report`: Generated reports related to test runs and performance metrics.
- `requirements.txt`: List of Python dependencies.
- `Rikerfile`: Configuration file for Riker.

### TODO Items
### Installation

#### Complete control flow and complex script support
Install `hs` on your Linux-based machine by following these steps:

Extend the architecture to support complete scripts and not just partial order graphs of commands.
**Note:** Currently works with `Ubuntu 20.04` or later

A potential solution is shown below:
1. Navigate to the project directory:
```sh
cd path_to/dynamic-parallelizer
```

![Architecture Diagram](/docs/handdrawn_architecture.jpeg)
2. Run the installation script:
```sh
./scripts/install_deps_ubuntu20.sh
```

This solution includes a preprocessor that creates two executable artifacts:
- the preprocessed/instrumented script (similar to what the PaSh-JIT preprocessor produces)
- the partial program order graph (a graph of commands that will be speculated and executed with tracing from the orchestrator)
This script will handle all the necessary installations, including dependencies, `try`, Riker, and PaSh.

The graph might contain unexpanded commands, so the orchestrator should support unexpanded strings.
For these commands, the orchestrator can speculate on the values of the unexpanded strings; when they become the frontier (i.e., the preprocessed script has reached them), their actual values are known, and the speculation can be confirmed or aborted.
### Running `hs`

The two executors communicate with each other and progress through the script execution in tandem. The JIT executor (left) also needs to trace execution to inform the orchestrator about changes in the environment.
The main entry script to initiate `hs` is `pash-spec.sh`. This script sets up the necessary environment and invokes the orchestrator in `parallel-orch/orch.py`. It's designed to accept a variety of arguments to customize its behavior, such as setting debug levels or specifying log files.

#### Orchestrator: Partial Program Order Graph
Example of running the script:

**Note:** we have moved to a continuous scheduling implementation. An example explaining its operation can be found [here](/docs/example.md).
```bash
./pash-spec.sh [arguments] script_to_speculatively_run.sh
```

The orchestrator needs to support arbitrary partial program order graphs (instead of just sequences of instructions), to figure out the precise real program order dependencies.
**Arguments**:

- `-d, --debug-level`: Set the debugging level. Default is `0`.
- `-f, --log_file`: Define the logging output file. By default, logs are printed to stdout.
- `--sandbox-killing`: Kill any running overlay instances before committing to the lower layer.
- `--env-check-all-nodes-on-wait`: On a wait, check for environment changes between the current node and all other waiting nodes. (not fully functional yet!)

An instance of a graph is shown below:
### Testing

![Example Partial Program Order Graph](/docs/handdrawn_partial_program_order.jpeg)
To run the provided tests:

One important characteristic of the graph (and the speculative execution algorithm) is that there is a committed, prefix-closed part that has already executed and cannot be affected.
The rest of the graph is uncommitted and therefore might or might not have completed execution. The uncommitted frontier, the part of the graph adjacent to the prefix, is guaranteed to execute and complete without speculation (since we have both the environment and the variables resolved), and this is part of the argument for the termination of the algorithm. At every step the orchestration takes, it can always commit the uncommitted frontier, and therefore the committed prefix grows until it covers the whole graph.
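The committed-prefix argument can be made concrete with a small sketch. The following is a hypothetical illustration (not the hs implementation) of one commit step over a partial program order graph, where `edges` maps each node to its set of predecessors:

```python
# Hypothetical sketch: committing the frontier of a partial-order graph.
# A node may be committed once it has completed execution and all of its
# predecessors are already committed.
def commit_step(edges, committed, completed):
    newly = {n for n, preds in edges.items()
             if n not in committed
             and n in completed
             and preds <= committed}
    return committed | newly

edges = {'a': set(), 'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}}
completed = {'a', 'b', 'c', 'd'}
committed = set()
# Iterating commit_step grows the committed prefix until it covers the graph.
for _ in range(len(edges)):
    committed = commit_step(edges, committed, completed)
```

Because the frontier is always committable, the prefix grows monotonically, which mirrors the termination argument above.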
```bash
./test/test_orch.sh
```

#### Orchestrator: Backward dependencies and Execution Isolation/Aborting/Reverting
For in-depth analysis, set the `DEBUG` environment variable to `2` for detailed logs and redirect logs to a file:

How do we resolve backward dependencies? For example:
```sh
grep foo in1 > out1
grep bar in0 > in1 ## Its write might affect the first command exec.
```

```bash
DEBUG=2 ./test/test_orch.sh 2>logs.txt
```

One solution would be to run the non-frontier (non-root) commands in an isolated environment and only at the end of their execution commit their results. This might have significant overhead, except if we can just write to temporary files and then move them? Or let them work in a temporary directory?
### Contributing and Further Development

Contributions are always welcome! The project roadmap includes extending the architecture to support complete scripts, optimizing the scheduler for better performance, etc.

Another way would be to dynamically track writes of non-frontier commands and stop them when they try to write to something that might be a read dependency of the first, but there are timing issues here that I don't see how to resolve.
For a detailed description of possible optimizations, see the [related issues](https://github.com/binpash/dynamic-parallelizer/issues?q=is%3Aopen+is%3Aissue+label%3Aoptimization).

#### Commands that change current directory
### License

Can we actually trace that and not run these commands? Is that simply a change of an environment variable? They will run in a forked version anyway, but we want to see their results.
`hs` is licensed under the MIT License. See the `LICENSE` file for more information.
2 changes: 1 addition & 1 deletion deps/pash
Submodule pash updated 2 files
+9 −0 compiler/pash.py
+2 −1 requirements.txt
64 changes: 42 additions & 22 deletions parallel-orch/analysis.py
@@ -29,47 +29,67 @@ def parse_shell_to_asts(input_script_path) -> "list[AstNode]":
    except libdash.parser.ParsingException as e:
        logging.error(f'Parsing error: {e}')
        exit(1)


def validate_node(ast) -> bool:
    assert(isinstance(ast, (CommandNode, PipeNode)))
    if isinstance(ast, CommandNode):
        return True
    else:
        for cmd in ast.items:
            assert isinstance(cmd, CommandNode)
        return True

## Returns true if the script is safe to speculate and execute outside
## of the original shell context.
##
## The script is not safe if it might contain a shell primitive. Therefore
## the analysis checks if the command in question is one of the underlying
## shell's primitives (in our case bash) and if so returns False
def safe_to_execute(asts: "list[AstNode]", variables: dict) -> bool:
    ## There should always be a single AST per node and it must be a command
    assert(len(asts) == 1)
    ast = asts[0]
    assert(isinstance(ast, CommandNode))
    logging.debug(f'Ast in question: {ast}')

def is_node_safe(node: CommandNode, variables: dict) -> bool:
    ## Expand and check whether the asts contain
    ## a command substitution or a primitive.
    ## If so, then we need to tell the original script to execute the command.

    ## Expand the command argument
    cmd_arg = node.arguments[0]
    exp_state = expand.ExpansionState(variables)
    ## TODO: Catch exceptions around here
    expanded_cmd_arg = expand.expand_arg(cmd_arg, exp_state)
    cmd_str = string_of_arg(expanded_cmd_arg)
    logging.debug(f'Expanded command argument: {expanded_cmd_arg} (str: "{cmd_str}")')

    ## TODO: Determine if the ast contains a command substitution and if so
    ## run it in the original script.
    ## In the future, we should be able to perform stateful expansion too,
    ## and properly execute and trace command substitutions.

    ## KK 2023-05-26 We need to keep in mind that whenever we execute something
    ##               in the original shell, then we cannot speculate anything
    ##               after it, because we cannot track read-write dependencies
    ##               in the original shell.

    if cmd_str in BASH_PRIMITIVES:
        return False

    return True


def is_pipe_node_safe_to_execute(node: PipeNode, variables: dict) -> bool:
    for cmd in node.items:
        logging.debug(f'Ast in question: {cmd}')
        if not is_node_safe(cmd, variables):
            return False
    return True

## Returns true if the script is safe to speculate and execute outside
## of the original shell context.
##
## The script is not safe if it might contain a shell primitive. Therefore
## the analysis checks if the command in question is one of the underlying
## shell's primitives (in our case bash) and if so returns False
def safe_to_execute(asts: "list[AstNode]", variables: dict) -> bool:
    ## There should always be a single AST per node and it must be a command
    assert(len(asts) == 1)
    if isinstance(asts[0], PipeNode):
        return is_pipe_node_safe_to_execute(asts[0], variables)
    else:
        assert(isinstance(asts[0], CommandNode))
        logging.debug(f'Ast in question: {asts[0]}')
        return is_node_safe(asts[0], variables)

## TODO: Determine if the ast contains a command substitution and if so
## run it in the original script.
## In the future, we should be able to perform stateful expansion too,
## and properly execute and trace command substitutions.


BASH_PRIMITIVES = ["break",
                   "continue",
                   "return"]
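At its core, the safety analysis above reduces to a set-membership test on the expanded command name. A minimal stand-alone sketch, assuming expansion has already produced a plain command string (`is_command_string_safe` is a hypothetical name, not part of this PR):

```python
# Hypothetical sketch of the primitive check in analysis.py.
# Control-flow primitives must run in the original shell, so a command
# whose expanded name is one of them is not safe to speculate on.
BASH_PRIMITIVES = ["break", "continue", "return"]

def is_command_string_safe(cmd_str: str) -> bool:
    return cmd_str not in BASH_PRIMITIVES
```

For a pipeline, this check is simply applied to every command in the pipe, which is what `is_pipe_node_safe_to_execute` does in the real code.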
13 changes: 9 additions & 4 deletions parallel-orch/config.py
@@ -1,6 +1,7 @@
import os
import subprocess
import logging
import time


## TODO: Figure out how logging here plays out together with the log() in PaSh
@@ -27,17 +28,21 @@ def log_root(msg, *args, **kwargs):


## Ensure that PASH_SPEC_TMP_PREFIX is set by pa.sh
assert(not os.getenv('PASH_SPEC_TMP_PREFIX') is None)
PASH_SPEC_TMP_PREFIX = os.getenv('PASH_SPEC_TMP_PREFIX')

SOCKET_BUF_SIZE = 8192

SCHEDULER_SOCKET = os.getenv('PASH_SPEC_SCHEDULER_SOCKET')

MAX_KILL_ATTEMPTS = 10 # Define a maximum number of kill attempts for each process in the partial program order

INSIGNIFICANT_VARS = {'PWD', 'OLDPWD', 'SHLVL', 'PASH_SPEC_TMP_PREFIX', 'PASH_SPEC_SCHEDULER_SOCKET', 'PASH_SPEC_TOP',
'PASH_TOP', 'PASH_TOP_LEVEL','RANDOM', 'LOGNAME', 'MACHTYPE', 'MOTD_SHOWN', 'OPTERR', 'OPTIND',
'PPID', 'PROMPT_COMMAND', 'PS4', 'SHELL', 'SHELLOPTS', 'SHLVL', 'TERM', 'UID', 'USER', 'XDG_SESSION_ID'}

SIGNIFICANT_VARS = {'foo', 'bar', 'baz'}
SIGNIFICANT_VARS = {'foo', 'bar', 'baz', 'file1', 'file2', 'file3', 'file4', 'file5', 'LC_ALL', 'nchars', 'filename'}

START_TIME = time.time()

NAMED_TIMESTAMPS = {}

SANDBOX_KILLING = False
SPECULATE_IMMEDIATELY = False
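The `INSIGNIFICANT_VARS` set suggests how environment diffs can be filtered when deciding whether a node's environment has meaningfully changed. A minimal sketch of such filtering (the helper name is hypothetical, not part of this PR):

```python
# Hypothetical sketch: compute environment changes, ignoring variables
# (like PWD or SHLVL) whose changes should not trigger re-execution.
INSIGNIFICANT_VARS = {'PWD', 'OLDPWD', 'SHLVL', 'RANDOM'}

def significant_env_changes(before: dict, after: dict) -> dict:
    return {k: v for k, v in after.items()
            if before.get(k) != v and k not in INSIGNIFICANT_VARS}
```

An empty result means the two environments agree on everything the scheduler cares about, so speculated work downstream can stand.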
14 changes: 7 additions & 7 deletions parallel-orch/executor.py
@@ -12,18 +12,18 @@ def async_run_and_trace_command_return_trace(command, node_id, latest_env_file,
    trace_file = util.ptempfile()
    stdout_file = util.ptempfile()
    stderr_file = util.ptempfile()
    post_execution_env_file = util.ptempfile()
    logging.debug(f'Scheduler: Stdout file for: {node_id} is: {stdout_file}')
    logging.debug(f'Scheduler: Stderr file for: {node_id} is: {stderr_file}')
    logging.debug(f'Scheduler: Trace file for: {node_id}: {trace_file}')
    process = async_run_and_trace_command_return_trace_in_sandbox(command, trace_file, node_id, stdout_file, stderr_file, latest_env_file, post_execution_env_file, speculate_mode)
    return process, trace_file, stdout_file, stderr_file, post_execution_env_file

def async_run_and_trace_command_return_trace_in_sandbox_speculate(command, node_id, latest_env_file):
    process, trace_file, stdout_file, stderr_file, post_execution_env_file = async_run_and_trace_command_return_trace(command, node_id, latest_env_file, speculate_mode=True)
    return process, trace_file, stdout_file, stderr_file, post_execution_env_file

def async_run_and_trace_command_return_trace_in_sandbox(command, trace_file, node_id, stdout_file, stderr_file, latest_env_file, post_execution_env_file, speculate_mode=False):
    ## Call Riker to execute the command
    run_script = f'{config.PASH_SPEC_TOP}/parallel-orch/run_command.sh'
    args = ["/bin/bash", run_script, command, trace_file, stdout_file, latest_env_file]
@@ -32,7 +32,7 @@ def async_run_and_trace_command_return_trace_in_sandbox(command, trace_file, nod
    else:
        args.append("standard")
    args.append(str(node_id))
    args.append(post_execution_env_file)
    # Save output to temporary files to not saturate the memory
    logging.debug(args)
    process = subprocess.Popen(args, stdout=None, stderr=None)
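The executor's core pattern, launching a command asynchronously with its streams redirected to temporary files so large outputs do not sit in memory, can be sketched independently of Riker (all names here are hypothetical):

```python
import subprocess
import sys
import tempfile

# Hypothetical sketch of the executor pattern: launch a process without
# blocking, sending stdout/stderr to temp files instead of pipes.
def async_run_to_tempfiles(args):
    out = tempfile.NamedTemporaryFile(delete=False)
    err = tempfile.NamedTemporaryFile(delete=False)
    proc = subprocess.Popen(args, stdout=out, stderr=err)
    return proc, out.name, err.name

proc, out_path, err_path = async_run_to_tempfiles(
    [sys.executable, "-c", "print('hello')"])
proc.wait()
with open(out_path) as f:
    output = f.read().strip()
```

The caller keeps only the process handle and file paths, which is also what lets the scheduler later inspect or kill speculated work without holding its output.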