# Streams, redirections, pipelines, and tees

One of the central UNIX metaphors is that all data is a byte stream.
A byte stream is simply an ordered sequence of bytes. Programs can read from a stream of data, and
programs can write to a stream of data.

<div class="alert alert-block alert-info">
    The code in this notebook assumes that the current working directory is
    <code>./scripts/streams</code>. Run the next cell once when using this notebook.
</div>

In [None]:
# run this cell exactly once each time you open this notebook
cd ./scripts/streams

## The standard streams

> Another Unix breakthrough was to automatically associate input and output to terminal keyboard and terminal
> display, respectively, by default--the program (and programmer) did absolutely nothing to establish input and
> output for a typical input-process-output program.
>
> https://en.wikipedia.org/wiki/Standard_streams

UNIX has three standard streams called *standard input* (stdin), *standard output* (stdout), and
*standard error* (stderr).

Standard input is a stream from which data can be read. On UNIX-like systems, standard input is connected to the
keyboard by default.

Standard output is a stream to which normal output data can be written. On UNIX-like systems, standard output is
connected to the terminal by default.

Standard error is a stream to which error or diagnostic output data can be written.
On UNIX-like systems, standard error is connected to the terminal by default.

Each standard stream has a unique integer identifier called a *file descriptor*:

* `0` standard input
* `1` standard output
* `2` standard error

## The pipeline analogy

Commands in UNIX are like segments of pipe: Input flows in one end and
output flows out of the other end:

![](images/pipeline1.png)

Input might take the form of a command line argument. For example, the command `ls /bin` uses the command
line argument `/bin` as input to the `ls` command:

![](images/pipeline2.png)

Some commands will accept input from standard input. In such commands, the user types in the command and
presses `Enter`. The user then enters their input data, usually pressing `Enter` to separate lines
of data, and then indicates that they are done entering data by pressing `Ctrl-d`. For example,
we might run the `sort` command:

```sh
sort
```

and then enter some lines of data:

```sh
zebra
elephant
anteater
dog
```

typing `Ctrl-d` after entering the last line. `sort` will then sort the lines of input.

![](images/pipeline3.png)

The `cowsay` program is another program that will read standard input.

Most commands will send their output to standard output, although some might write their output to a file.

If a command encounters an error, then some output usually goes to standard error. For example,
if `ls` is asked to list information about a nonexistent file then it will print an error message to
standard error:

![](images/pipeline5.png)

## Redirection

The user can change where a command gets its input from or sends its output to via *redirection*.
Redirecting standard output or standard error allows the user to send the output of a command to a file.
Redirecting standard input allows the user to send the contents of a file as input to a command.

#### Redirecting standard output

To redirect standard output, place


`1> `*output_filename*

anywhere in a command where *output_filename* is the name of the file that you want to write the output to.
The `1` is the file descriptor to redirect. Redirection creates the file if necessary, or overwrites the contents
of the file if it already exists. Redirecting standard output causes any output that would have appeared
on standard output to be redirected to the file.

For example, to list all of the files except `.` and `..` in the current directory and save the output in
`files.txt`:

In [None]:
ls -A 1> files.txt

Use `cat` to view the contents of `files.txt`:

In [None]:
cat files.txt

When redirecting standard output, the file descriptor is optional. We could have written:

```sh
ls -A > files.txt
```

for the previous example.

Any command that sends its output to standard output can have its output redirected to a file:

In [None]:
cowsay -f dragon "Mmm, crunchy knight" > moo.txt

In [None]:
cat moo.txt

Using `>>` instead of `>` will append to a file instead of overwriting it.
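
As a minimal sketch of the difference between `>` and `>>`, the following lines build up a small file. The file name `notes.txt` is a hypothetical name chosen only for this illustration:

```sh
# notes.txt is a hypothetical scratch file used only for this sketch
echo "first line" > notes.txt      # > creates notes.txt (or overwrites it)
echo "second line" >> notes.txt    # >> appends to the end of notes.txt
echo "third line" >> notes.txt     # appending again keeps the earlier lines
cat notes.txt
```

Running the first line a second time would discard the appended lines, because `>` always starts the file from empty.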

#### Redirecting standard error

Redirecting standard error is identical to redirecting standard output except that we must use the
file descriptor `2`:

In [None]:
ls zzz 2> error.txt

In [None]:
cat error.txt
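
The following sketch shows that `2>` captures only standard error while `>` captures only standard output. The helper function `warn` and the file names are assumptions made up for this example, not part of the notebook's scripts:

```sh
# warn is a hypothetical helper: it writes its arguments to standard error
warn() {
    echo "$@" 1>&2    # send this echo's standard output to file descriptor 2
}

# redirecting file descriptor 2 captures the message in the file
warn "disk almost full" 2> warnings.txt

# redirecting file descriptor 1 does not capture it: the message still
# appears on the terminal and not_captured.txt is left empty
warn "disk almost full" > not_captured.txt

cat warnings.txt
```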

#### Redirecting standard output and standard error to separate files

Many shell commands do not stop running when they encounter an error. For example:

In [None]:
ls -A zzz file??

Both standard output and standard error streams can be redirected for the same command:

In [None]:
ls -A zzz file?? > files.txt 2> error.txt

In [None]:
cat files.txt

In [None]:
cat error.txt

#### Redirecting standard output and standard error to the same file

If you want to redirect all of the output of a command to a file you should use a normal standard output
redirection followed by `2>&1`.

`2>&1` makes file descriptor `2` a duplicate of file descriptor `1`. This causes output
to standard error to be sent to wherever standard output is pointing at that moment:

In [None]:
ls -A zzz file?? > files_all_output.txt 2>&1

In [None]:
cat files_all_output.txt

Note that there are multiple redirections occurring in the previous example. The rule to remember is that
the shell processes the redirections from left to right. If we change the order of the redirections in the
previous example to:

In [None]:
ls -A zzz file?? 2>&1 > files_all_output.txt

In [None]:
cat files_all_output.txt

then the shell first makes file descriptor `2` a duplicate of file descriptor `1` (so standard error now points
to wherever standard output points at that moment, which is still the terminal) and only then redirects
standard output to the file `files_all_output.txt`. The error message is therefore still printed to the
terminal, and only the non-error output of `ls` is written to the file.
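
The left-to-right rule can be checked with a small self-contained sketch that does not depend on which files exist. The function `both_streams` and the output file names are hypothetical and exist only for this illustration:

```sh
# both_streams is a hypothetical helper that writes one line to each stream
both_streams() {
    echo "to stdout"
    echo "to stderr" 1>&2
}

# standard output is sent to the file first, then standard error is
# made a duplicate of it, so both lines end up in ordered.txt
both_streams > ordered.txt 2>&1

# standard error is made a duplicate of the current standard output
# (the terminal) first, then only standard output is sent to the file
both_streams 2>&1 > reversed.txt

cat ordered.txt     # both lines
cat reversed.txt    # only the standard output line
```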

#### Redirecting standard input

To redirect standard input so that it reads the contents of a file (instead of reading the keyboard) write:

`0< `*input_filename*

anywhere in the command where *input_filename* is the name of the file that you want to use as input to the command.
The `0` is the file descriptor for standard input.

Most commands that read standard input already accept files as inputs, but you can still use redirection
if you want. For example, we can sort the lines of the file `unsorted.txt` like so:

In [None]:
sort 0< unsorted.txt

but it is easier to avoid the input redirection:

In [None]:
sort unsorted.txt

The file descriptor is optional when redirecting standard input:

In [None]:
sort < unsorted.txt

If needed, standard output (and standard error) can also be redirected:

In [None]:
sort < unsorted.txt > sorted.txt

In [None]:
cat sorted.txt
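
One visible difference between passing a file name and redirecting standard input is that, with redirection, the command never learns the name of the file. The sketch below uses `wc -l`, which counts lines, to show this. The file `animals.txt` is a hypothetical file created just for this example:

```sh
# animals.txt is a hypothetical file created only for this sketch
printf 'zebra\nelephant\nanteater\ndog\n' > animals.txt

wc -l animals.txt      # wc opens the file itself, so it also prints the name
wc -l < animals.txt    # wc reads standard input, so it prints only the count

# input and output redirection combined in one command
sort < animals.txt > animals_sorted.txt
cat animals_sorted.txt
```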

## Pipelines

Commands accept inputs and have outputs, so it is natural to ask:
can you connect the output of one command to the input of a second command? The answer is "yes"!

Use the symbol `|` to connect the output of one command to the input of the following command. 
For example, we can use the `fortune` command to generate an input string for the `cowsay` command:

```sh
fortune | cowsay
```

![](images/pipeline7.png)


When listing the contents of a directory containing many files, it is often useful to pipe the output
of `ls` to `less` or `more`:

```sh
ls /bin | less
```

![](images/pipeline6.png)

You can connect as many commands as you require. For example, the directories `/bin` and `/usr/bin` on the
author's machine contain many duplicated files. We can see this easily by listing the contents of the
directories and then sorting the result:

```sh
ls /bin /usr/bin | sort | less
```

which produces the (partial) output:

```
[
[
aa-enabled
aa-enabled
aa-exec
aa-exec
aa-features-abi
aa-features-abi
aaflip
aaflip
aconnect
aconnect
...
```

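We can use the `wc` program to count the number of lines in the output, which gives the approximate
total number of files in both directories. The count is slightly inflated because, when given more than one
directory, `ls` also prints a header line for each directory and a blank line between the listings: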

```sh
ls /bin /usr/bin | sort | wc -l
```

which outputs `3621` on the author's machine.

We can use the `uniq` command to remove duplicate adjacent lines to find the total number of unique filenames
in both directories:

```sh
ls /bin /usr/bin | sort | uniq | wc -l
```

which outputs `1812` on the author's machine.

To get the number of duplicated filenames in the two directories we can use:

```sh
ls /bin /usr/bin | sort | uniq -d | wc -l
```

which outputs `1809` on the author's machine.
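
The counts above depend on the machine, so here is a small reproducible sketch of the same pipelines. The file `names.txt` and its contents are made up for this illustration and stand in for the output of `ls`:

```sh
# names.txt is a small, made-up list standing in for a directory listing
printf 'cat\nls\ncat\nsort\nls\nwc\n' > names.txt

sort < names.txt | wc -l              # 6 lines in total
sort < names.txt | uniq | wc -l       # 4 distinct names
sort < names.txt | uniq -d | wc -l    # 2 names occur more than once
sort < names.txt | uniq -d            # prints the repeated names: cat, ls

# uniq only removes adjacent duplicates, so without sort nothing is removed
uniq < names.txt | wc -l              # still 6
```

This is why `sort` comes before `uniq` in each pipeline: sorting places identical lines next to each other so that `uniq` can detect them.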

## Tees

Redirection allows you to send the output of a command to a file, and
piping allows you to send the output of a command to the input of another command.
Can you do both? The answer is "yes!"

The `tee` command reads standard input and copies it to both standard output and to one or more files
(the image shows a single file but you can specify multiple files):

![](./images/tee.png)

The `tee` command is useful for saving intermediate results in a pipeline to a file (or files).
For example, we save the current date and time to a file and print various fields of the output:

In [None]:
# date produces different output on macOS
date

In [None]:
# save output of date to fulldate.txt and print output
date | tee fulldate.txt

In [None]:
# save output of date to fulldate.txt and print the day
date | tee fulldate.txt | cut --delimiter=" " --fields=1

In [None]:
# same as the previous cell, using the short forms of the cut options
date | tee fulldate.txt | cut -d" " -f1

In [None]:
# save output of date to fulldate.txt and print the month
date | tee fulldate.txt | cut -d" " -f3

In [None]:
# save output of date to fulldate.txt and print the year
date | tee fulldate.txt | cut -d" " -f4

`tee` is a command so it can be inserted anywhere in a pipeline:

In [None]:
fortune | tee fortune.txt | cowsay | tee cow.txt | cowsay -n
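
As noted earlier, `tee` accepts more than one file name. The sketch below saves the unsorted input to two files while the pipeline carries on and sorts it. The file names `copy1.txt`, `copy2.txt`, and `sorted_copy.txt` are hypothetical names chosen for this example:

```sh
# copy1.txt, copy2.txt and sorted_copy.txt are hypothetical file names
printf 'zebra\nelephant\nanteater\ndog\n' | tee copy1.txt copy2.txt | sort > sorted_copy.txt

cat copy1.txt          # the lines in their original order
cat sorted_copy.txt    # the same lines after sorting
```

Both copies written by `tee` hold the data as it was at that point in the pipeline, before `sort` reordered it.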