A Unix pipeline debugger, profiler, and optimizer. Give it any shell pipeline string and Pipespy runs each stage while capturing intermediate data, timing, and line/byte counts between stages. It then renders a visual flow report showing where data gets filtered, which stages are bottlenecks, and how much data flows through each pipe.
The novel part: Pipespy also acts as a linter for shell pipelines. It detects common anti-patterns (useless use of cat, sort-before-grep, grep piped to wc -l, redundant sorts, awk-used-as-cut) and suggests concrete, runnable optimizations — rewritten pipeline fragments with estimated speedup. Think of it as a profiler, debugger, and static analyzer combined into one tool for the humblest unit of Unix computing: the pipe.
Pipespy works with any pipeline you can express as a string. It handles quoted arguments, nested subshells, escaped pipes, and environment variable prefixes. It runs each stage sequentially with intercepted I/O, so you get an accurate picture of what happens at every step without modifying your original commands.
- Stage-by-stage profiling: execution time, input/output line counts, byte sizes, and percentage breakdowns
- Data flow visualization: see how many lines and bytes flow between each pair of stages, and where data gets dropped
- Bottleneck detection: automatically identifies the slowest stage
- Filter analysis: finds the stage that removes the most data and shows filter ratios
- Anti-pattern detection: identifies 8 common pipeline inefficiencies
  - Useless use of cat
  - Sort before grep (wasted sort on unfiltered data)
  - Consecutive grep chains that could be combined
  - Redundant sorts
  - Echo piped to a command (use here-strings)
  - grep | wc -l (use grep -c)
  - awk used only as cut
  - Large sorts without prior filtering
- Optimization suggestions: concrete rewrites with estimated speedups
  - Remove useless cat (use direct file read or redirection)
  - Replace grep | wc -l with grep -c
  - Move filters before sorts to reduce data volume
  - Add LC_ALL=C for faster byte-wise sorting/matching
  - Use sort --parallel for large datasets
- Sample data inspection: peek at the actual data flowing through each stage
- JSON output: machine-readable results for integration with other tools
- Static analysis mode: analyze pipelines without executing them
- Timeout control: per-stage timeouts to prevent runaway commands
Requires Python 3.9+. No external dependencies.
# Clone and install
git clone https://github.com/bkmashiro/pipespy.git
cd pipespy
pip install -e .
# Or run directly without installing
python -m pipespy "your | pipeline | here"

# Profile a pipeline and see where time and data go
pipespy "cat /var/log/syslog | grep error | sort | uniq -c | sort -rn | head -20"
# Get optimization suggestions for a messy pipeline
pipespy "cat access.log | sort | grep 404 | sort | grep -v static | wc -l" --no-color
# Static analysis only — no execution, just anti-pattern detection
pipespy "cat file | sort | grep pattern | wc -l" --no-run
# JSON output for scripting
pipespy "echo hello | wc -w" --json
# Inspect actual data at each stage
pipespy "printf 'banana\napple\ncherry\n' | sort | head -2" --samples

pipespy "cat server.log | grep '\" 500 ' | awk '{print \$1}' | sort | uniq -c | sort -rn | head"

Output shows each stage with timing, line counts, filter ratios, and a visual time bar:
Stage 1: cat server.log
Time: 683us ██░░░░░░░░░░░░░░░░░░ 12%
Out: 5,000 lines (609.3 KB)
--------------------------------------------------
Stage 2 [BIGGEST FILTER]: grep '" 500 '
Time: 1.1ms ███░░░░░░░░░░░░░░░░░ 19%
In: 5,000 lines (609.3 KB)
Out: 284 lines (34.7 KB)
Filter: 94.3% removed
...
Pipespy automatically flags common mistakes:
pipespy "cat log | sort | grep ERROR | sort | grep FATAL | wc -l" --no-run

Anti-patterns Detected
------------------------------------------------------------
[i] useless-cat (stage 1)
Useless use of cat — sort can read files directly.
Suggestion: Replace `cat log | sort ...` with `sort ... log`
[~] sort-before-grep (stage 2)
sort runs before grep — the sort processes more data than necessary.
Suggestion: Move grep before sort to filter first, then sort the smaller result set.
[i] grep-wc (stage 6)
grep piped to wc -l — grep has a built-in count flag.
Suggestion: Replace `grep FATAL | wc -l` with `grep -c FATAL`
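A detector of this kind can be quite small. The sketch below is not Pipespy's actual implementation, just an illustration of how a useless-cat check might look once the pipeline has been split into per-stage command strings:

```python
import re

def detect_useless_cat(stages):
    """Flag a first stage of the form `cat FILE` when the next stage
    could read the file directly. A minimal sketch, not Pipespy's code."""
    findings = []
    # Only the single-file form is considered here; `cat a b | ...` is legitimate.
    if stages and re.fullmatch(r"cat\s+\S+", stages[0].strip()):
        filename = stages[0].split()[1]
        nxt = stages[1].split()[0] if len(stages) > 1 else None
        findings.append({
            "pattern": "useless-cat",
            "stage": 1,
            "suggestion": f"drop `cat` and pass {filename} to {nxt} directly",
        })
    return findings

print(detect_useless_cat(["cat log", "sort", "grep ERROR"])[0]["suggestion"])
# -> drop `cat` and pass log to sort directly
```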
pipespy "cat access.log | grep 404 | awk '{print \$1}' | sort | uniq -c | sort -rn | head"

Optimization Suggestions
------------------------------------------------------------
remove-cat: Remove useless cat — grep can read access.log directly.
Before: cat access.log | grep 404 | awk '{print $1}' | sort | uniq -c | sort -rn | head
After: grep 404 access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head
Speedup: eliminates one process + pipe
lc-all-c: Set LC_ALL=C for grep for byte-wise comparison (much faster on ASCII data).
Before: grep 404
After: LC_ALL=C grep 404
Speedup: ~2-5x for sort/grep on ASCII
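The suggested rewrites are behavior-preserving, which is easy to spot-check by hand. The file below is a tiny stand-in for a real access log:

```shell
# Build a three-line stand-in for access.log
printf '1.2.3.4 GET /a 404\n5.6.7.8 GET /b 200\n1.2.3.4 GET /c 404\n' > /tmp/access.log

# Anti-pattern: extra cat process, extra pipe, extra wc process
cat /tmp/access.log | grep 404 | wc -l   # prints 2

# Suggested rewrite: one process, same count
grep -c 404 /tmp/access.log              # prints 2

# LC_ALL=C forces byte-wise matching: same result, often faster on ASCII
LC_ALL=C grep -c 404 /tmp/access.log     # prints 2
```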
pipespy "printf 'a\nb\nc\n' | sort | head -2" --json

Returns a structured JSON object with stages, summary, antipatterns, and optimizations arrays — useful for CI integration or building dashboards.
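A CI consumer can then gate on the report. The payload below is a hypothetical example shaped like the four top-level keys named above; the exact field names inside each stage are an assumption, not the tool's documented schema:

```python
import json

# Hypothetical --json payload (illustrative shape only)
payload = """
{
  "stages": [
    {"cmd": "sort", "time_ms": 1.2, "lines_out": 3},
    {"cmd": "head -2", "time_ms": 0.4, "lines_out": 2}
  ],
  "summary": {"total_time_ms": 1.6},
  "antipatterns": [],
  "optimizations": []
}
"""
report = json.loads(payload)

# Fail a CI job if any anti-pattern was detected
if report["antipatterns"]:
    raise SystemExit("pipeline has anti-patterns")

# Pick out the slowest stage for a dashboard
slowest = max(report["stages"], key=lambda s: s["time_ms"])
print(slowest["cmd"])  # -> sort
```

In practice the payload would come from capturing the stdout of `pipespy ... --json` rather than a literal string.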
The Data Flow section shows exactly how data moves between stages:
Data Flow
------------------------------------------------------------
cat --> 5,000 lines (609.3 KB) (-94.3%) grep
grep --> 284 lines (34.7 KB) awk
awk --> 284 lines (3.7 KB) sort
sort --> 284 lines (3.7 KB) (-82.7%) uniq
uniq --> 49 lines (1.0 KB) sort
sort --> 49 lines (1.0 KB) (-79.6%) head
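The percentages in parentheses are filter ratios, i.e. the share of input lines a stage removes. A one-liner reproduces the figures above:

```python
def filter_ratio(lines_in: int, lines_out: int) -> float:
    """Percentage of input lines removed by a stage, rounded to one decimal."""
    if lines_in == 0:
        return 0.0
    return round(100 * (lines_in - lines_out) / lines_in, 1)

print(filter_ratio(5000, 284))  # -> 94.3  (the grep stage above)
print(filter_ratio(284, 49))    # -> 82.7  (the uniq -c stage above)
```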
# Feed a file as stdin to the first stage
pipespy "grep ERROR | awk '{print \$NF}' | sort | uniq -c" --input server.log

| Flag | Description |
|---|---|
| -s, --samples | Show sample data (first/last 5 lines) at each stage |
| -j, --json | Output results as JSON |
| --no-color | Disable ANSI color codes |
| --no-run | Static analysis only (no execution) |
| -t, --timeout | Per-stage timeout in seconds (default: 60) |
| --keep | Keep intermediate temp files (prints paths to stderr) |
| -i, --input | Feed a file as stdin to the first stage |
| -V, --version | Show version |
src/pipespy/
__init__.py Package metadata and version
__main__.py python -m pipespy entry point
cli.py Argument parsing, orchestration
parser.py Pipeline string -> list of PipelineStage objects
Handles quoting, escapes, subshells, env prefixes
executor.py Runs each stage sequentially with intercepted I/O
Captures timing, byte counts, line counts, samples
analyzer.py Computes aggregate stats: bottleneck, biggest filter,
overall reduction, time fractions, data flow edges
antipatterns.py 8 pattern detectors that flag common pipeline mistakes
optimizer.py 5 optimization passes that suggest concrete rewrites
display.py Renders visual reports (ANSI terminal) or JSON output
The execution model is straightforward: the parser splits the pipeline string by unquoted pipe characters, the executor runs each stage as a subprocess (feeding the previous stage's output file as stdin), and the analyzer/antipatterns/optimizer modules examine the results to produce insights. All intermediate data is written to temporary files that are cleaned up after analysis (unless --keep is passed).
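As a rough illustration of the splitting step (a simplified sketch, not parser.py itself; the real parser also covers subshells and env prefixes):

```python
def split_pipeline(cmdline: str) -> list[str]:
    """Split a pipeline string on pipe characters that are neither
    quoted nor escaped. Simplified sketch of the parsing step."""
    stages, buf = [], []
    quote = None       # current open quote character, if any
    escaped = False
    for ch in cmdline:
        if escaped:                          # previous char was a backslash
            buf.append(ch)
            escaped = False
        elif ch == "\\" and quote != "'":    # backslash escapes outside single quotes
            buf.append(ch)
            escaped = True
        elif quote:                          # inside quotes: copy until close
            buf.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":                    # opening quote
            buf.append(ch)
            quote = ch
        elif ch == "|":                      # unquoted pipe: stage boundary
            stages.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    stages.append("".join(buf).strip())
    return stages

stages = split_pipeline("grep '\" 500 ' access.log | awk '{print $1}' | sort")
print(len(stages))  # -> 3
```

Note how the pipe inside `'{print $1}'` and the quoted `'" 500 '` argument are both left intact, while the unquoted pipes mark stage boundaries.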
pip install pytest
pytest tests/ -v

78 tests covering the parser, executor, analyzer, anti-pattern detector, optimizer, display renderer, and end-to-end CLI integration.
bash demo/run_demo.sh

Generates a 5000-line sample access log, then runs a realistic pipeline through Pipespy with visual output, JSON output, and static analysis modes.
MIT