# Pipelines

---

Pipelines, often called pipes, are a way to chain commands by connecting the output of one command to the input of the next. A pipeline is written with the pipe character: `|`. Pipes are particularly handy when a command needs long or complex input that another command can generate.

```bash
command1 | command2
```

By default, a pipeline passes on only the standard output. To include the standard error as well, use `|&`, which is Bash shorthand for `2>&1 |`.
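The difference can be sketched with a small helper. `emit_both` is a hypothetical function defined only for this illustration; it writes one line to standard output and one to standard error. The sketch assumes Bash 4 or later, since `|&` is not available in plain `sh`.

```bash
# Hypothetical helper: one line to stdout, one line to stderr
emit_both() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Plain pipe: only stdout reaches wc (stderr is discarded to keep the demo quiet)
stdout_only=$(emit_both 2>/dev/null | wc -l)

# |& sends both stdout and stderr into the pipe
both_streams=$(emit_both |& wc -l)

# The long form behaves the same way
long_form=$(emit_both 2>&1 | wc -l)

echo "lines through |      : $stdout_only"
echo "lines through |&     : $both_streams"
echo "lines through 2>&1 | : $long_form"
```

The plain pipe counts one line, while both combined forms count two.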

## Basic Pipeline Examples

Imagine you quickly want to know the number of entries in a directory. You can use a pipe to send the output of the `ls` command to the `wc` command with the `-l` option, which counts lines.

In [1]:
ls / | wc -l

24


Next, suppose you only want to see the first 10 results:

_Note: `head` outputs the first 10 lines by default; use the `-n` option to change this._

In [2]:
ls / | head

bin
boot
cdrom
dev
etc
home
lib
lib32
lib64
libx32
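As a short sketch of the `-n` option mentioned in the note, the following uses `seq` to generate predictable input. Using `seq` instead of a directory listing is an assumption made here so that the result does not depend on the machine.

```bash
# Generate the numbers 1 to 100, one per line, and keep only the first 3
first_three=$(seq 1 100 | head -n 3)
echo "$first_three"

# The same option applied to the directory listing
ls / | head -n 3
```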


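`grep` searches its input for patterns and prints each line that matches. When no file is given, as in a pipeline, it reads standard input. You can supply one pattern or several patterns separated by newline characters. Patterns should normally be quoted when `grep` is used in a shell command, so that the shell does not interpret special characters in them.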

In [3]:
# Keep only the lines of the listing that contain "bin"
ls / | grep "bin"

[01;31m[Kbin[m[K
s[01;31m[Kbin[m[K
s[01;31m[Kbin[m[K


## More Complex Pipelines

You can chain together more than two commands:

In [4]:
# List files, filter for .conf files, count them
ls /etc | grep "\.conf$" | wc -l

36


In [5]:
# Show processes, filter for bash, show only first 5
ps aux | grep bash | head -5

bigalex+   38969  0.0  0.3 1459550156 106264 ?   Sl   Eki25   0:01 /usr/share/code/code /home/bigalex95/.vscode/extensions/mads-hartmann.bash-ide-vscode-1.43.0/out/server.js --node-ipc --clientProcessId=19844
bigalex+  205949  0.0  0.2 700224 71104 ?        Sl   00:47   0:00 /home/bigalex95/Projects/learn-X/.venv/bin/python3 -m bash_kernel -f /run/user/1000/jupyter/runtime/kernel-v36c44beafa4fdb4aa0b93a3db206e6cdb0682db8b.json
bigalex+  205966  0.0  0.0  14904  6272 pts/0    Ss+  00:47   0:00 /usr/bin/bash --rcfile /home/bigalex95/Projects/learn-X/.venv/lib/python3.13/site-packages/pexpect/bashrc.sh
bigalex+  207034  0.0  0.2 700228 71244 ?        Sl   00:48   0:00 /home/bigalex95/Projects/learn-X/.venv/bin/python3 -m bash_kernel -f /run/user/1000/jupyter/runtime/kernel-v32376d6147823f42099a60ab4d977cc13a304824f.json
bigalex+  207045  0.0  0.0  14904  6400 pts/3    Ss+  00:48   0:00 /usr/bin/bash --rcfile /home/bigalex95/Projects/learn-X/.venv/lib/python3.13/site-packages/pexpect/bashr

## Practical Examples

### Example 1: Find largest files

In [6]:
# Find files in /tmp, sort by size, show the 5 largest
# -print0 and -0 keep file names that contain spaces intact
find /tmp -type f -print0 2>/dev/null | xargs -0 ls -lh 2>/dev/null | sort -k5,5 -hr | head -5

-rw------- 1 bigalex95 bigalex95 3,9M Eki 25 19:49 /tmp/.org.chromium.Chromium.6sHjmu
-rw-rw-r-- 1 bigalex95 bigalex95  49K Eki 25 19:51 /tmp/bigalex95-code-zsh/.zcompdump
-rw------- 1 bigalex95 bigalex95  29K Eki 26 00:59 /tmp/.com.google.Chrome.GU7fCy
-rw-rw-r-- 1 bigalex95 bigalex95  12K Eki 25 22:04 /tmp/dart-code-startup-log-6463.txt
-rw-r--r-- 1 bigalex95 bigalex95 9,5K Eki 26 00:25 /tmp/bigalex95-code-zsh/.zshrc
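An alternative that avoids parsing `ls` output is to let `find` print the size in bytes itself. This is only a sketch: it assumes GNU `find` (the `-printf` option is a GNU extension), and the scratch directory and file names are hypothetical.

```bash
# Scratch directory with three files of known sizes (hypothetical names)
demo_dir=$(mktemp -d)
head -c 300 /dev/zero > "$demo_dir/medium.bin"
head -c 1000 /dev/zero > "$demo_dir/large file.bin"   # name contains a space
head -c 10 /dev/zero > "$demo_dir/small.bin"

# GNU find prints "<size in bytes> <path>"; sort numerically, largest first
largest=$(find "$demo_dir" -type f -printf '%s %p\n' | sort -nr | head -n 2)
echo "$largest"

# Clean up the scratch directory
rm -r "$demo_dir"
```

Because the size is the first field and is a plain number, `sort -nr` is enough and no human-readable suffixes need to be interpreted.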


### Example 2: Count unique entries

In [7]:
# Create test data
cat > /tmp/data.txt << EOF
apple
banana
apple
cherry
banana
apple
EOF

# Count how often each entry occurs, most frequent first
cat /tmp/data.txt | sort | uniq -c | sort -nr

      3 apple
      2 banana
      1 cherry
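The first `sort` in this pipeline matters because `uniq` only collapses duplicate lines that are adjacent. A minimal sketch, with the same fruit names held in a variable so that it runs on its own:

```bash
fruits=$(printf '%s\n' apple banana apple cherry banana apple)

# Without sort: no two equal lines are adjacent, so nothing is collapsed
unsorted_count=$(echo "$fruits" | uniq | wc -l)
echo "$unsorted_count"   # 6

# With sort: equal lines become adjacent and uniq keeps one of each
sorted_count=$(echo "$fruits" | sort | uniq | wc -l)
echo "$sorted_count"     # 3
```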


### Example 3: Text processing

In [8]:
# Create log file
cat > /tmp/access.log << EOF
192.168.1.1 - GET /index.html 200
192.168.1.2 - GET /about.html 200
192.168.1.1 - POST /login 401
192.168.1.3 - GET /index.html 200
192.168.1.2 - GET /contact.html 404
EOF

# Extract the IP address (first field) and count the requests per IP
echo "Top IP addresses:"
cat /tmp/access.log | awk '{print $1}' | sort | uniq -c | sort -nr

Top IP addresses:
      2 192.168.1.2
      2 192.168.1.1
      1 192.168.1.3
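The same pattern works for any column. As a sketch, the following counts requests per HTTP status code by printing the last field (`$NF`) instead of the first. The log lines are repeated in a variable here so that the example is self-contained.

```bash
# Same sample log as above, kept in a variable
log='192.168.1.1 - GET /index.html 200
192.168.1.2 - GET /about.html 200
192.168.1.1 - POST /login 401
192.168.1.3 - GET /index.html 200
192.168.1.2 - GET /contact.html 404'

# $NF is the last field of each line, here the HTTP status code
status_counts=$(echo "$log" | awk '{print $NF}' | sort | uniq -c | sort -nr)
echo "$status_counts"
```

Status `200` comes first with a count of 3, followed by the two codes that occur once.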


### Example 4: File analysis

In [9]:
# Analyze file types in /etc
echo "File type distribution in /etc:"
find /etc -maxdepth 1 -type f 2>/dev/null | xargs file 2>/dev/null | cut -d: -f2 | sort | uniq -c | sort -nr | head -5

File type distribution in /etc:
     10                       ASCII text
      9                     ASCII text
      9                         ASCII text
      8                        ASCII text
      7                      ASCII text
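In the output above, `ASCII text` appears on several lines with different counts. The most likely cause is that `file` pads its descriptions so they line up in a column, so the text after the colon carries a varying number of leading spaces and `uniq` treats each variant as a different line. One way to handle this is to trim the padding before counting. The function name `type_histogram` and the sample paths below are hypothetical.

```bash
# Reads "path:   description" lines on stdin, trims the padding,
# and counts how often each description occurs
type_histogram() {
  cut -d: -f2- | sed 's/^[[:space:]]*//' | sort | uniq -c | sort -nr
}

# Simulated `file` output with uneven padding (hypothetical paths)
histogram=$(printf '%s\n' \
  '/etc/a:        ASCII text' \
  '/etc/longer:   ASCII text' \
  '/etc/b.gz:     gzip compressed data' | type_histogram)
echo "$histogram"
```

On a real system the function could replace the `cut ... | sort -nr` part of the pipeline above. Alternatively, `file -b` (brief mode) omits the file names, so neither `cut` nor the trimming step is needed.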


## Exercise

In this exercise, print the number of processors using the information in the cpuinfo file (`/proc/cpuinfo`).

*Hint 1: each processor has a unique number, for instance the first processor will contain the line `processor: 0`*

*Hint 2: you can chain together more than two commands in a row*

In [10]:
# Your code here
cat /proc/cpuinfo | grep "^processor" | wc -l

16


## Advanced Pipeline Techniques

In [11]:
# Using tee to split output
echo "Testing tee command"
echo "Hello Pipeline" | tee /tmp/output.txt | wc -c
echo "Content saved to:"
cat /tmp/output.txt

Testing tee command
15
Content saved to:
Hello Pipeline
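`tee` is also useful in the middle of a longer pipeline, where it saves an intermediate result without interrupting the flow. A minimal sketch, using a temporary file from `mktemp` and made-up sample words:

```bash
out_file=$(mktemp)

# Save the sorted list to a file while still passing it on to uniq
unique_count=$(printf '%s\n' pear apple pear | sort | tee "$out_file" | uniq | wc -l)
echo "unique lines: $unique_count"

# The file holds the intermediate (sorted, not yet deduplicated) data
saved=$(cat "$out_file")
echo "$saved"

rm "$out_file"
```

The file contains all three sorted lines, while the rest of the pipeline goes on to count two unique ones.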


In [12]:
# Combining stderr and stdout so that grep also sees the error message
# (2>&1 | is the long form of |&)
ls /tmp /nonexistent 2>&1 | grep "cannot access" | wc -l

1
