Pipespy

A Unix pipeline debugger, profiler, and optimizer. Give it any shell pipeline string and Pipespy runs each stage while capturing intermediate data, timing, and line/byte counts between stages. It then renders a visual flow report showing where data gets filtered, which stages are bottlenecks, and how much data flows through each pipe.

The novel part: Pipespy also acts as a linter for shell pipelines. It detects common anti-patterns (useless use of cat, sort-before-grep, grep piped to wc -l, redundant sorts, awk-used-as-cut) and suggests concrete, runnable optimizations — rewritten pipeline fragments with estimated speedup. Think of it as a profiler, debugger, and static analyzer combined into one tool for the humblest unit of Unix computing: the pipe.

Pipespy works with any pipeline you can express as a string. It handles quoted arguments, nested subshells, escaped pipes, and environment variable prefixes. It runs each stage sequentially with intercepted I/O, so you get an accurate picture of what happens at every step without modifying your original commands.
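The core of that parsing step is splitting the pipeline string on pipe characters that are not quoted, escaped, or inside a subshell. A minimal sketch of the idea (not the actual parser.py, which also handles env prefixes and other cases):

```python
def split_pipeline(cmd: str) -> list[str]:
    """Split a pipeline string on pipes that are not inside quotes,
    escaped with a backslash, or nested in a subshell."""
    stages, buf = [], []
    quote = None          # current quote character, if any
    depth = 0             # subshell paren nesting depth
    i = 0
    while i < len(cmd):
        c = cmd[i]
        if c == "\\" and i + 1 < len(cmd):
            buf.append(cmd[i:i + 2])   # keep escape pairs verbatim
            i += 2
            continue
        if quote:
            if c == quote:
                quote = None
            buf.append(c)
        elif c in "'\"":
            quote = c
            buf.append(c)
        elif c == "(":
            depth += 1
            buf.append(c)
        elif c == ")":
            depth -= 1
            buf.append(c)
        elif c == "|" and depth == 0:
            stages.append("".join(buf).strip())
            buf = []
        else:
            buf.append(c)
        i += 1
    stages.append("".join(buf).strip())
    return stages
```

So `grep 'a|b'` stays one stage, `\|` is not a stage boundary, and `(sort a | head)` is treated as a single subshell stage.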

Features

  • Stage-by-stage profiling: execution time, input/output line counts, byte sizes, and percentage breakdowns
  • Data flow visualization: see how many lines and bytes flow between each pair of stages, and where data gets dropped
  • Bottleneck detection: automatically identifies the slowest stage
  • Filter analysis: finds the stage that removes the most data and shows filter ratios
  • Anti-pattern detection: identifies 8 common pipeline inefficiencies
    • Useless use of cat
    • Sort before grep (wasted sort on unfiltered data)
    • Consecutive grep chains that could be combined
    • Redundant sorts
    • Echo piped to a command (use here-strings)
    • grep | wc -l (use grep -c)
    • awk used only as cut
    • Large sorts without prior filtering
  • Optimization suggestions: concrete rewrites with estimated speedups
    • Remove useless cat (use direct file read or redirection)
    • Replace grep | wc -l with grep -c
    • Move filters before sorts to reduce data volume
    • Add LC_ALL=C for faster byte-wise sorting/matching
    • Use sort --parallel for large datasets
  • Sample data inspection: peek at the actual data flowing through each stage
  • JSON output: machine-readable results for integration with other tools
  • Static analysis mode: analyze pipelines without executing them
  • Timeout control: per-stage timeouts to prevent runaway commands

Installation

Requires Python 3.9+. No external dependencies.

# Clone and install
git clone https://github.com/bkmashiro/pipespy.git
cd pipespy
pip install -e .

# Or run directly without installing
python -m pipespy "your | pipeline | here"

Quick Start

# Profile a pipeline and see where time and data go
pipespy "cat /var/log/syslog | grep error | sort | uniq -c | sort -rn | head -20"

# Get optimization suggestions for a messy pipeline
pipespy "cat access.log | sort | grep 404 | sort | grep -v static | wc -l" --no-color

# Static analysis only — no execution, just anti-pattern detection
pipespy "cat file | sort | grep pattern | wc -l" --no-run

# JSON output for scripting
pipespy "echo hello | wc -w" --json

# Inspect actual data at each stage
pipespy "printf 'banana\napple\ncherry\n' | sort | head -2" --samples

Usage

Basic profiling

pipespy "cat server.log | grep '\" 500 ' | awk '{print \$1}' | sort | uniq -c | sort -rn | head"

Output shows each stage with timing, line counts, filter ratios, and a visual time bar:

  Stage 1: cat server.log
    Time:      683us  ██░░░░░░░░░░░░░░░░░░  12%
    Out:       5,000 lines  (609.3 KB)
    --------------------------------------------------
  Stage 2 [BIGGEST FILTER]: grep '" 500 '
    Time:      1.1ms  ███░░░░░░░░░░░░░░░░░  19%
    In:        5,000 lines  (609.3 KB)
    Out:         284 lines  (34.7 KB)
    Filter:    94.3% removed
    ...

Anti-pattern detection

Pipespy automatically flags common mistakes:

pipespy "cat log | sort | grep ERROR | sort | grep FATAL | wc -l" --no-run
 Anti-patterns Detected
------------------------------------------------------------
  [i] useless-cat (stage 1)
      Useless use of cat — sort can read files directly.
      Suggestion: Replace `cat log | sort ...` with `sort ... log`

  [~] sort-before-grep (stage 2)
      sort runs before grep — the sort processes more data than necessary.
      Suggestion: Move grep before sort to filter first, then sort the smaller result set.

  [i] grep-wc (stage 6)
      grep piped to wc -l — grep has a built-in count flag.
      Suggestion: Replace `grep FATAL | wc -l` with `grep -c FATAL`
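Each detector is essentially a pattern over adjacent stages. As an illustration of the shape (the real detectors live in antipatterns.py and may differ), a grep-wc check could look like:

```python
import re

def detect_grep_wc(stages: list[str]) -> list[dict]:
    """Flag `grep ... | wc -l` pairs; grep's -c flag counts matching
    lines directly, saving a process and a pipe."""
    findings = []
    for i in range(len(stages) - 1):
        if stages[i].lstrip().startswith("grep") and re.fullmatch(
            r"wc\s+-l", stages[i + 1].strip()
        ):
            findings.append({
                "pattern": "grep-wc",
                "stage": i + 2,   # report stages as 1-indexed, like the output above
                "suggestion": stages[i].replace("grep", "grep -c", 1),
            })
    return findings
```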

Optimization suggestions

pipespy "cat access.log | grep 404 | awk '{print \$1}' | sort | uniq -c | sort -rn | head"
 Optimization Suggestions
------------------------------------------------------------
  remove-cat: Remove useless cat — grep can read access.log directly.
    Before: cat access.log | grep 404 | awk '{print $1}' | sort | uniq -c | sort -rn | head
    After:  grep 404 access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head
    Speedup: eliminates one process + pipe

  lc-all-c: Set LC_ALL=C for grep for byte-wise comparison (much faster on ASCII data).
    Before: grep 404
    After:  LC_ALL=C grep 404
    Speedup: ~2-5x for sort/grep on ASCII
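Optimization passes like lc-all-c are mechanical rewrites over the stage list. A simplified sketch of that pass (the actual optimizer.py may guard for more cases, e.g. non-ASCII data):

```python
def add_lc_all_c(stages: list[str]) -> list[str]:
    """Prepend LC_ALL=C to sort/grep stages that lack a locale prefix,
    enabling byte-wise comparison on ASCII data."""
    out = []
    for s in stages:
        cmd = s.strip()
        if cmd.split()[0] in ("sort", "grep") and "LC_ALL=" not in cmd:
            cmd = "LC_ALL=C " + cmd
        out.append(cmd)
    return out
```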

JSON output

pipespy "printf 'a\nb\nc\n' | sort | head -2" --json

Returns a structured JSON object with stages, summary, antipatterns, and optimizations arrays — useful for CI integration or building dashboards.
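For example, a CI job could fail a build when a committed pipeline triggers anti-patterns. A sketch using only the top-level keys named above (sub-key names inside each array are assumptions and may vary by version):

```python
import json

def gate_on_antipatterns(report_json: str, max_issues: int = 0) -> bool:
    """Return True if a pipespy --json report contains no more than
    max_issues detected anti-patterns."""
    report = json.loads(report_json)
    return len(report.get("antipatterns", [])) <= max_issues
```

Feed it the stdout of `pipespy "..." --json --no-run` and exit non-zero when it returns False.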

Data flow visualization

The Data Flow section shows exactly how data moves between stages:

 Data Flow
------------------------------------------------------------
           cat -->    5,000 lines (609.3 KB)  (-94.3%)  grep
          grep -->      284 lines (34.7 KB)             awk
           awk -->      284 lines (3.7 KB)              sort
          sort -->      284 lines (3.7 KB)   (-82.7%)   uniq
          uniq -->       49 lines (1.0 KB)              sort
          sort -->       49 lines (1.0 KB)   (-79.6%)   head
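The percentages on each edge are just the fraction of lines dropped between consecutive stages. Given the per-stage output line counts, the computation is:

```python
def filter_ratios(line_counts: list[int]) -> list[float]:
    """Percentage of lines removed at each pipe boundary, given the
    output line count of every stage in order."""
    ratios = []
    for prev, cur in zip(line_counts, line_counts[1:]):
        removed = 100.0 * (prev - cur) / prev if prev else 0.0
        ratios.append(round(removed, 1))
    return ratios
```

Applied to the counts above, 5,000 → 284 lines at the grep stage is a 94.3% reduction, matching the report.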

Feeding input data

# Feed a file as stdin to the first stage
pipespy "grep ERROR | awk '{print \$NF}' | sort | uniq -c" --input server.log

CLI flags

Flag              Description
-s, --samples     Show sample data (first/last 5 lines) at each stage
-j, --json        Output results as JSON
--no-color        Disable ANSI color codes
--no-run          Static analysis only (no execution)
-t, --timeout     Per-stage timeout in seconds (default: 60)
--keep            Keep intermediate temp files (prints paths to stderr)
-i, --input       Feed a file as stdin to the first stage
-V, --version     Show version

Architecture

src/pipespy/
  __init__.py        Package metadata and version
  __main__.py        python -m pipespy entry point
  cli.py             Argument parsing, orchestration
  parser.py          Pipeline string -> list of PipelineStage objects
                     Handles quoting, escapes, subshells, env prefixes
  executor.py        Runs each stage sequentially with intercepted I/O
                     Captures timing, byte counts, line counts, samples
  analyzer.py        Computes aggregate stats: bottleneck, biggest filter,
                     overall reduction, time fractions, data flow edges
  antipatterns.py    8 pattern detectors that flag common pipeline mistakes
  optimizer.py       5 optimization passes that suggest concrete rewrites
  display.py         Renders visual reports (ANSI terminal) or JSON output

The execution model is straightforward: the parser splits the pipeline string by unquoted pipe characters, the executor runs each stage as a subprocess (feeding the previous stage's output file as stdin), and the analyzer/antipatterns/optimizer modules examine the results to produce insights. All intermediate data is written to temporary files that are cleaned up after analysis (unless --keep is passed).
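A condensed sketch of that executor loop (the real executor.py writes intermediate data to temp files and enforces per-stage timeouts; output is kept in memory here for brevity):

```python
import subprocess
import time

def run_pipeline_staged(stages: list[str]) -> list[dict]:
    """Run each stage as its own subprocess, feeding the previous
    stage's captured output as stdin, and record timing and sizes."""
    prev = None            # bytes produced by the previous stage
    results = []
    for cmd in stages:
        start = time.perf_counter()
        proc = subprocess.run(
            cmd, shell=True, input=prev, capture_output=True, timeout=60
        )
        results.append({
            "cmd": cmd,
            "seconds": time.perf_counter() - start,
            "out_bytes": len(proc.stdout),
            "out_lines": proc.stdout.count(b"\n"),
        })
        prev = proc.stdout
    return results
```

Because each stage runs to completion before the next starts, the measurements are per-stage and exact, at the cost of losing the concurrency a real shell pipeline would have.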

Running Tests

pip install pytest
pytest tests/ -v

78 tests covering the parser, executor, analyzer, anti-pattern detector, optimizer, display renderer, and end-to-end CLI integration.

Running the Demo

bash demo/run_demo.sh

Generates a 5,000-line sample access log, then runs a realistic pipeline through Pipespy in visual, JSON, and static-analysis modes.

License

MIT
