## Pipeline, Pipes and Filters

The commands seen previously are quite elementary. They are usually combined to perform more complex operations. Run the following code cells and try to guess what the last cell does.

In [None]:
cd data/molecules
ls -lh .

In [None]:
ls -lhS .

In [None]:
ls -lhS . | head -n 2

In [None]:
ls -lhS . | head -n 2 | tail -n 1

In [None]:
ls -lhS . | head -n 2 | tail -n 1 | cut -d ' ' -f 9

Answer: The last output is the name of the largest item in the current working directory.
Details: 

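- `ls -lhS .` lists the items of the current working directory in long format, sorted by size from largest to smallest. The first line of the output is a `total` summary, so the largest item appears on the second line.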

- `head -n 2 file` prints the first two lines of `file`.

- `tail -n 1 file` prints the last line of `file`.

- `cut -d ' ' -f 9` splits the line using ` ` (a single space) as the delimiter and prints the ninth field. Because `ls` pads its columns with extra spaces, the field number that holds the name can differ from one listing to another.

Using bash jargon:
- `ls -lhS . | head -n 2 | tail -n 1 | cut -d ' ' -f 9` is a <b>pipeline</b>.
- each command is a <b>filter</b>.
- `|` is a <b>pipe</b> that connects the output of the command on the left to the input of the command on the right.
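
The stages of such a pipeline can be tried on a scratch directory, as in the minimal sketch below. The directory and the file names (`small.txt`, `medium.txt`, `large.txt`) are made up for illustration. For the last stage, `awk '{print $NF}'` is swapped in for `cut` as an assumption of this sketch: it prints the last whitespace-separated field, so it does not depend on how many padding spaces `ls` inserts.

```shell
# Scratch directory with three files of clearly different sizes
# (all names here are hypothetical, chosen only for this sketch).
workdir=$(mktemp -d)
printf '%10s' ''   > "$workdir/small.txt"    # 10 bytes
printf '%100s' ''  > "$workdir/medium.txt"   # 100 bytes
printf '%1000s' '' > "$workdir/large.txt"    # 1000 bytes

# Stage 1: long listing sorted by size; the first line is the "total" summary.
ls -lS "$workdir"

# Stages 2 and 3: keep the first two lines, then only the last of those,
# which is the entry of the largest file.
largest_line=$(ls -lS "$workdir" | head -n 2 | tail -n 1)
echo "$largest_line"

# Stage 4: keep only the last field of that line, which is the file name
# (this assumes the file name itself contains no spaces).
largest_name=$(echo "$largest_line" | awk '{print $NF}')
echo "$largest_name"

# Remove the scratch directory.
rm -r "$workdir"
```

Each stage only sees the text produced by the previous one, which is why the stages can be added and checked one at a time.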

### Exercise

The following code cell shows how to sort files by the number of words they contain. Modify the code cell to make a pipeline that writes the file with the most words to `biggest.txt`.

> When a command such as `wc` is given a list of arguments instead of one, it returns one output per argument (`wc` also appends a `total` line).

> `cmd2 $(cmd1 arg)` is a command substitution. In other words, it uses the output of `cmd1 arg` as arguments for `cmd2`. The main difference between substitution and piping is that with a pipe the output of the first command is fed to the second command as its input (as if it were the content of a file), whereas with substitution it is passed as arguments.
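
The difference can be seen by giving the same text to `wc -w` both ways. This is a minimal sketch on a scratch directory; the file name `notes.txt` and its content are made up for illustration.

```shell
# Scratch directory containing one file with five words
# (notes.txt is a hypothetical name used only in this sketch).
workdir=$(mktemp -d)
cd "$workdir"
printf 'one two three four five\n' > notes.txt

# Pipe: wc receives the text "notes.txt" as its input,
# so it counts the words of that text, which is 1.
piped=$(echo notes.txt | wc -w)
echo "piped: $piped"

# Substitution: the output of echo becomes an argument of wc,
# so wc opens the file notes.txt and counts its 5 words.
substituted=$(wc -w $(echo notes.txt))
echo "substituted: $substituted"

# Go back to the previous directory and remove the scratch directory.
cd - > /dev/null
rm -r "$workdir"
```

With the pipe, `wc` never opens `notes.txt`; it only sees the file name as text.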

> Using `sort file` without any option sorts the lines in alphanumerical order, e.g. 1, 5, 110, 21 --> 1, 110, 21, 5.

> Using `sort -n file` with the `-n` option sorts the lines in numerical order, e.g. 1, 5, 110, 21 --> 1, 5, 21, 110.
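
The two orderings can be checked directly by piping the four numbers into `sort`. In this sketch, `LC_ALL=C` is an added assumption that pins the collation rules, so the default sort compares the lines character by character regardless of the system language.

```shell
# Default sort: lines are compared as text, character by character,
# so "110" comes before "21" because "1" sorts before "2".
alphabetical=$(printf '1\n5\n110\n21\n' | LC_ALL=C sort)
echo "$alphabetical"

# -n: lines are compared by their numerical value.
numerical=$(printf '1\n5\n110\n21\n' | sort -n)
echo "$numerical"
```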

> `wc file` returns the number of lines, words and bytes in the file (the command name is short for word count).

> `wc -w file` returns only the number of words in the file.
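
The behaviour of `wc` with one file, with `-w`, and with several files can be sketched on two small scratch files. The file names `first.txt` and `second.txt` and their contents are made up for illustration.

```shell
# Two scratch files with known contents
# (hypothetical names used only in this sketch).
workdir=$(mktemp -d)
printf 'alpha beta\ngamma\n' > "$workdir/first.txt"   # 2 lines, 3 words, 17 bytes
printf 'delta\n' > "$workdir/second.txt"              # 1 line, 1 word, 6 bytes

# Default output: lines, words and bytes, followed by the file name.
all_counts=$(wc "$workdir/first.txt")
echo "$all_counts"

# With -w only the word count is printed before the file name.
word_count=$(wc -w "$workdir/first.txt")
echo "$word_count"

# Several files: one line per file, plus a final summary line.
many=$(wc -w "$workdir/first.txt" "$workdir/second.txt")
echo "$many"

# Remove the scratch directory.
rm -r "$workdir"
```

The final summary line is worth keeping in mind when the output of `wc` is passed on to `sort -n`, since it is sorted along with the per-file lines.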

In [None]:
wc -w $(ls .) | sort -n

<span style="color:blue">Double click for answer</span>

<script>
Answer: wc -w $(ls .) | sort -n | tail -n 2 | head -n 1 > biggest.txt
</script>

<br /><div style="text-align: right"> [Next section →](./scripting.ipynb) </div>