# The `tr` command

The `tr` command translates, squeezes, and/or deletes characters from standard input, writing to standard output.
The command is similar to a character-based search and replace for strings.

The usage of `tr` is:

```sh
tr [option]... seq1 [seq2]
```

where `[option]` are optional flags, `seq1` is a sequence of characters to convert from, and `seq2` is an
optional sequence of characters to convert to.

`tr` performs translation when both sets `seq1` and `seq2` are given. Each character in the set `seq1`
is translated to its corresponding character in `seq2`. Characters not in `seq1` are passed through to the output
unchanged.

`tr` takes its input from standard input, so to apply the command to a string stored in a variable, you
print the string using `printf` (or `echo` if the string contains no backslashes and does not begin with `-`) and
pipe the output to `tr`. For example, to compute the string obtained by replacing all occurrences of `p` with
`t` in a string `x`, you can write:

In [None]:
x="sparring with a purple porpoise"
y=$(printf %s "$x" | tr 'p' 't')       # replace all occurrences of p with t
echo $y

Similarly, to compute the string obtained by replacing all occurrences of `.` with `/` in a string `x` you can write:

In [None]:
x="ca.queensu.cs.cisc124"
y=$(printf %s "$x" | tr '.' '/')       # replace all occurrences of . with /
echo $y
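
The two examples above translate a single character. As a minimal sketch of how the correspondence between `seq1` and `seq2` works with more than one character, the following maps `l` to `0` and `o` to `1`; the input string and the variable names are chosen only for illustration:

```sh
x="hello world"
# l -> 0 and o -> 1; every other character passes through unchanged
y=$(printf %s "$x" | tr 'lo' '01')
echo "$y"                          # he001 w1r0d
```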

### Specifying the sequences `seq1` and `seq2`

The sequences `seq1` and `seq2` can be specified in three different ways:

1. a sequence (string)
    * e.g., `ABCDEFGHIJKLMNOPQRSTUVWXYZ`
2. a range (the characters covered depend on the locale's collating order, so this is not fully portable)
    * e.g., `A-Z`
3. a POSIX character class (on POSIX systems)
    * e.g., `[:upper:]`
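
The three forms listed above can be compared side by side. This sketch converts uppercase letters to lowercase using each form in turn; the range version sets `LC_ALL=C` on the assumption that a range should cover only the ASCII letters, and the variable names `a`, `b`, and `c` are illustrative only:

```sh
x="Hello, World"
# 1. explicit sequences
a=$(printf %s "$x" | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz')
# 2. ranges (LC_ALL=C makes A-Z mean exactly the 26 ASCII uppercase letters)
b=$(printf %s "$x" | LC_ALL=C tr 'A-Z' 'a-z')
# 3. POSIX character classes
c=$(printf %s "$x" | tr '[:upper:]' '[:lower:]')
printf '%s\n' "$a" "$b" "$c"       # prints "hello, world" three times
```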

Usually, the sequences have the same size so that the i'th character in `seq1` is replaced with the i'th character
in `seq2`. For example, to convert letters from lowercase to uppercase:

In [None]:
x="cisc220"
# replace lowercase letters with uppercase
seq1='a-z'
seq2='A-Z'
y=$(printf %s "$x" | tr "$seq1" "$seq2")
echo $y

If `seq1` is shorter than `seq2` then the excess characters in `seq2` are simply ignored:

In [None]:
x="cisc220"
# replace lowercase letters with uppercase
seq1='a-z'
seq2='A-Z0-9'
y=$(printf %s "$x" | tr "$seq1" "$seq2")
echo $y

If `seq1` is longer than `seq2` then `seq2` is extended by copying its last character until `seq2` is the same
length as `seq1`, but this is not portable. 
The GNU version of `tr` performs copying of the last character but POSIX says that the result is undefined:

In [None]:
x="cisc220"
# replace all digits with X (GNU tr)
seq1='0-9'
seq2='X'
y=$(printf %s "$x" | tr "$seq1" "$seq2")
echo $y

A more portable way to repeat a character *c* in `seq2` is to use the syntax `[`*c*`*]`, which repeats
the character *c* until `seq2` has the same length as `seq1`:

In [None]:
x="cisc220"
# replace all digits with X
seq1='[:digit:]'                 # 0123456789
seq2='[X*]'                      # XXXXXXXXXX
y=$(printf %s "$x" | tr "$seq1" "$seq2")
echo $y

If `seq1` contains duplicated characters then the result is undefined. On the author's machine, the following
prints `SISS220`:

In [None]:
x="cisc220"
# uh oh, duplicated c in seq1
seq1='cisc'                       # cisc
seq2='CIS'                        # extended to CISS by GNU tr (probably)
y=$(printf %s "$x" | tr "$seq1" "$seq2")
echo $y

### Deleting characters

Use the `-d` option to delete characters that appear in the sequence `seq1`; the set `seq2` is
not used and an error might result if it is specified:

In [None]:
x="2022-10-21"
seq1='-'
y=$(printf %s "$x" | tr -d "$seq1")
echo $y
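
The sequence given to `-d` can hold several characters or a character class, in which case every character in it is deleted. A small sketch, using made-up input strings:

```sh
# delete every parenthesis, period, and space
x="(613) 555.0123"
y=$(printf %s "$x" | tr -d '(). ')
echo "$y"                          # 6135550123

# delete every digit using a POSIX character class
x="cisc220"
z=$(printf %s "$x" | tr -d '[:digit:]')
echo "$z"                          # cisc
```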

### Squeezing repeated characters

Use the `-s` option to squeeze repeated characters that appear in the set `seq1`; the repeated characters are
replaced with a single instance of the character:

In [None]:
x="a     bcd  ef g"
seq1=' '                          # squeeze repeated spaces into a single space
y=$(printf %s "$x" | tr -s "$seq1")
echo "$y"                         # quoted so the shell itself does not collapse whitespace
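
When `seq1` holds more than one character, each run of a repeated character from the set is squeezed independently. The following sketch squeezes both periods and spaces; the repeated letters are not in `seq1`, so they are left alone (the input string is illustrative only):

```sh
x="aaa...bbb   ccc"
# squeeze runs of periods and runs of spaces; letters are untouched
y=$(printf %s "$x" | tr -s '. ')
echo "$y"                          # aaa.bbb ccc
```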

Squeezing and deletion can both be specified, in which case deletion is performed first using the characters specified in `seq1`, and then squeezing is performed using the characters specified in `seq2`.
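
As a minimal sketch of combining the two options, the following deletes all digits and then squeezes the remaining runs of hyphens; the input string is made up for illustration:

```sh
x="a1--b22---c"
seq1='[:digit:]'                  # characters to delete
seq2='-'                          # characters to squeeze after deletion
y=$(printf %s "$x" | tr -ds "$seq1" "$seq2")
echo "$y"                         # a-b-c
```

Deleting the digits first yields `a--b---c`, and squeezing then reduces each run of hyphens to a single hyphen.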