# Loops

<div class="alert alert-block alert-info">
    You can find all of the scripts used in this notebook in the following subdirectory (relative to the notebook):
    <code>./scripts/loops</code>
</div>

Bash has four types of loops:

1. an `until` loop that repeats as long as its test command has a non-zero exit status
2. a `while` loop that repeats as long as its test command has an exit status of zero
3. a `for` loop that iterates over the elements of a list (somewhat like Python's `for` loop)
4. a `for` loop similar to the C-style `for` loop

The `break` and `continue` commands can be used to control loop execution similar to how `break` and `continue`
behave in Python and Java (although the Bash versions can specify which loop to exit or continue when loops are
nested).
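
As a minimal sketch of the nested case (the variable names `found` and `visited` are illustrative and do not come from the notebook's scripts), `break 2` and `continue 2` act on the loop that encloses the innermost one:

```sh
# break 2: stop both loops at the first pair whose product is 12
found=""
for i in 1 2 3 4 5; do
    for j in 1 2 3 4 5; do
        if (( i * j == 12 )); then
            found="$i x $j"
            break 2        # leaves the inner loop and the enclosing outer loop
        fi
    done
done
echo "first pair: $found"

# continue 2: abandon the inner loop and start the next outer iteration
visited=""
for i in 1 2 3; do
    for j in a b; do
        if (( i == 2 )); then
            continue 2
        fi
        visited+="$i$j "
    done
done
echo "visited: $visited"
```

Without the numeric argument, `break` and `continue` apply only to the innermost loop.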

### `until` and `while`

Recall that in the following `if` statement

```sh
if test-command; then
    commands
fi
```

the exit status of *test-command* is used to determine if the body of the `if` block runs. An exit status 
of zero for *test-command* is considered to be "true" and a non-zero exit status is considered to be "false".

An `until` loop runs *until a test command succeeds* and a `while` loop runs *while a test command succeeds*.
Thus, an infinite loop can be written as:

```sh
until false; do
    # loop body here
done
```

or

```sh
while true; do
    # loop body here
done
```

The `[[ ]]` and `(( ))` constructs can be used as the *test-command*. Thus, an infinite loop can also be 
written as:

```sh
until [[ a == b ]]; do
    # loop body here
done
```

or

```sh
while [[ -n abc ]]; do
    # loop body here
done
```

or 

```sh
until (( 1 + 1 < 1 )); do
    # loop body here
done
```

or

```sh
while (( 1 > 0 )); do
    # loop body here
done
```

Examples of loops that count down to zero starting from 10 are shown below:

In [None]:
count=10
until (( count < 0 )); do
    echo $count
    count=$(( count - 1 ))
done

In [None]:
count=10
while (( count > -1 )); do
    echo $count
    (( count-- ))
done

The [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture) says that if we start with
a positive integer $n$ and repeat the following steps then the value of $n$ will eventually equal $1$:

* if $n$ is even then let $n = n/2$, otherwise
* let $n=3 \times n + 1$

No one knows if the conjecture is true, but mathematicians have used computers to verify it for all
starting values up to $2^{68}$ (as of 2020).

A script that prints the sequence of values that $n$ takes on is shown below:

---
```sh
#!/bin/bash

# collatz.sh

if (( $# == 0 )); then
    echo "collatz.sh: missing positive integer argument" >&2
    exit 1
else
    val=$1
fi
if [[ ! $val =~ ^[-+]?[0-9]+$ ]]; then
    echo "collatz.sh: argument is not an integer" >&2
    exit 1
fi
if (( val < 1 )); then
    echo "collatz.sh: argument is not positive" >&2
    exit 2
fi
echo $val
while (( val != 1 )); do
    if (( val % 2 == 0 )); then
        val=$(( val / 2 ))
    else
        val=$(( 3 * val + 1 ))
    fi
    echo $val
done

```
---

In [None]:
./scripts/loops/collatz.sh 27

### `for` loop over elements in a list

There is no list data type in Bash, but the official Bash documentation often mentions the term "list". In the
context of a `for` loop, a list is simply a sequence of whitespace-separated strings (words). A `for` loop over
the elements of a list looks somewhat similar to a Python `for` loop:

```sh
for name in list; do
    commands
done
```

The above `for` loop executes the loop body *commands* once for each string in *list*. Inside the loop body,
*name* is bound to the current string being processed in *list*. An example of a `for` loop that counts down to zero starting from 10 is shown below:

In [None]:
for i in 10 9 8 7 6 5 4 3 2 1 0; do
    echo $i
done

The list of strings can be obtained from an expansion. For example, word splitting after parameter substitution
produces a list: 

In [None]:
str="this is a string   with words          separated by spaces"
for s in $str; do
    echo $s
done

In the above example, `$str` is intentionally unquoted so that the shell performs word splitting.

Filename expansion produces a list of zero or more filenames (but see the following example for what happens
when a filename expansion produces no filenames):

In [None]:
for script in ./scripts/loops/*sh; do
    echo "$script"
done

A potential source of error occurs when a filename expansion fails to match any filenames; in this case, the
pattern is left unexpanded, so the list contains a single string equal to the text of the pattern itself:

In [None]:
# no pdf files in ./scripts/loops
for script in ./scripts/loops/*pdf; do
    echo "$script"
done
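
One way to avoid iterating over the unmatched pattern is Bash's `nullglob` shell option, which makes a pattern that matches nothing expand to an empty list. The sketch below assumes a scratch directory created with `mktemp -d` so that it does not depend on the contents of `./scripts/loops`; the variable names are illustrative:

```sh
# Create an empty scratch directory so the pattern below matches nothing.
dir=$(mktemp -d)

# Default behaviour: the loop runs once with the literal pattern.
count_default=0
for f in "$dir"/*.pdf; do
    count_default=$(( count_default + 1 ))
done
echo "default : $count_default iteration(s)"

# With nullglob the unmatched pattern expands to nothing, so the body never runs.
shopt -s nullglob
count_nullglob=0
for f in "$dir"/*.pdf; do
    count_nullglob=$(( count_nullglob + 1 ))
done
shopt -u nullglob    # restore the default behaviour
echo "nullglob: $count_nullglob iteration(s)"

rmdir "$dir"
```

An alternative that leaves shell options untouched is to start the loop body with a guard such as `[[ -e $f ]] || continue`.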

Brace expansion produces a list of strings:

In [None]:
for i in {10..0}; do
    echo $i
done

A command substitution is a common source of lists in a `for` loop:

In [None]:
for i in $(seq 10 -1 0); do
    echo $i
done

<div class="alert alert-block alert-warning">
    You should generally avoid using the <tt>seq</tt> command to perform counting. Use a C-style for loop
    instead (see next section). <tt>seq</tt> is an external program, so the shell must spawn a subshell and
    a new process to run it, which is a waste of resources for a task that the shell can perform on its own.
</div>

The following snippet uses `grep` to search for lines containing the keyword `if` in the `collatz.sh` script
and pipes the result to `cut` to keep just the line number:

In [None]:
echo "keyword if appears on lines:"
for linenum in $(grep -En if ./scripts/loops/collatz.sh | cut -f1 -d:); do
    echo $linenum
done

### C-style `for` loop

Bash provides a C-style `for` loop that can be used when the loop control variables are manipulated
using arithmetic:

```sh
for (( expr1; expr2; expr3 )); do
    commands
done
```

*expr1* is an arithmetic expression that is evaluated once before the loop runs. Typically, *expr1* is used
to initialize any loop variables.

*expr2* is an arithmetic expression that is evaluated before each iteration of the loop. Typically, *expr2*
is a condition involving the loop variables. The loop body runs if *expr2* evaluates to a non-zero value
(recall that an arithmetic expression is analogous to `true` if its value is not zero). The loop terminates
when *expr2* evaluates to zero.

*expr3* is an arithmetic expression that is evaluated at the end of each iteration of the loop. Typically,
*expr3* is used to update the loop variables.

An example of a loop that counts down to zero starting from 10 is:

In [None]:
for (( i=10; i >= 0; i-- )); do
    echo $i
done
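
One practical reason to reach for the C-style loop is that its bounds can be held in variables. Brace expansion is performed before parameter expansion, so a sequence such as `{1..$n}` is not turned into numbers. The following sketch (with illustrative variable names) compares the two forms:

```sh
n=3

# Brace expansion runs before $n is substituted, so the loop sees
# the single literal word {1..3} instead of three numbers.
from_braces=""
for i in {1..$n}; do
    from_braces+="$i "
done
echo "brace expansion: $from_braces"

# The C-style loop evaluates n arithmetically on every iteration.
from_c_style=""
for (( i = 1; i <= n; i++ )); do
    from_c_style+="$i "
done
echo "C-style loop   : $from_c_style"
```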

## Looping over command line arguments

The special parameters `*` and `@` both expand to the positional parameters (the command line arguments
provided to the script) starting at `$1`. Using `@` is almost always the correct thing to do when
sequentially processing the command line arguments to a script.

The following script uses a `for` loop to iterate over all of the command line arguments:

---
```sh
#!/bin/bash

# for_each_arg.sh

i=1
for arg in "$@"; do
    echo "\$${i} : $arg"
    (( i++ ))
done
```
---


In [None]:
./scripts/loops/for_each_arg.sh arg1 "arg2 has some spaces" "arg3 has some spaces, too"

The quotes around `$@` are important in this context. `"$@"` expands to `"$1" "$2" ...` which prevents word
splitting of the arguments. 

An unquoted `$@` expands to `$1 $2 ...`, so word splitting occurs for any argument that contains whitespace.
The script `bad_for_each_arg.sh` is identical to `for_each_arg.sh` except that the 
`$@` is not quoted. Running `bad_for_each_arg.sh` with the same command line arguments as above produces
different output:

In [None]:
./scripts/loops/bad_for_each_arg.sh arg1 "arg2 has some spaces" "arg3 has some spaces, too"
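
The difference between the quoted and unquoted forms of `@` and `*` can be checked by counting how many words each expansion produces. In this sketch, `set --` stands in for real command line arguments and `count_args` is a hypothetical helper, not one of the notebook's scripts:

```sh
# Stand-in for real command line arguments (an assumption for this sketch).
set -- arg1 "arg2 has some spaces"

# Hypothetical helper: prints the number of arguments it receives.
count_args() {
    echo $#
}

quoted_at=$(count_args "$@")      # each argument stays intact
unquoted_at=$(count_args $@)      # arguments are split on whitespace
quoted_star=$(count_args "$*")    # all arguments are joined into one string
unquoted_star=$(count_args $*)    # arguments are split on whitespace

echo "\"\$@\" : $quoted_at"
echo "\$@   : $unquoted_at"
echo "\"\$*\" : $quoted_star"
echo "\$*   : $unquoted_star"
```

Only `"$@"` reports the two arguments that were actually supplied, which is why it is the usual choice for a `for` loop over the arguments.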

It is also common to see a `while` loop combined with the `shift` builtin used to sequentially process
command line arguments. `shift n` shifts the positional parameters to the left by *n* where *n* is a non-negative
integer value. *n* is assumed to be equal to $1$ if it is missing. The positional parameters are unchanged
if *n* is zero or greater than `$#`. The value of `#` is updated to reflect the updated number of positional
parameters.

`shift` or `shift 1` shifts the value of `$2` to `$1`, `$3` to `$2`, and so on; the last positional parameter
is unset because its value has moved down by one position. If you imagine that the positional parameters are
stored in a queue, then `shift` or `shift 1` is similar to dequeuing one element from the queue.
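
The behaviour of `shift` can be observed directly. This is a small sketch in which `set --` stands in for real command line arguments and the variable names are illustrative:

```sh
# Stand-in for real command line arguments (an assumption for this sketch).
set -- a b c d

shift 2
after_shift="$# : $*"          # two parameters remain: c and d
echo "after shift 2: $after_shift"

# Asking for more shifts than there are parameters fails
# and leaves the positional parameters unchanged.
shift_failed=no
if ! shift 5; then
    shift_failed=yes
fi
after_bad_shift="$# : $*"
echo "after shift 5: $after_bad_shift (failed: $shift_failed)"
```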

The following script uses a `while` loop combined with `shift` to sequentially process command line arguments:

---
```sh
#!/bin/bash

# shift_arg.sh

while (( $# > 0 )); do
    # do something with first positional parameter
    # e.g., print its value
    echo "$1"

    # now shift positional parameters
    shift
done

```
---

In [None]:
./scripts/loops/shift_arg.sh arg1 "arg2 has some spaces" "arg3 has some spaces, too"
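
The `while` and `shift` combination is convenient when some arguments consume more than one position. The following is a rough sketch only: the options `-o` and `-v`, the variable names, and the arguments supplied through `set --` are all hypothetical and are not part of `shift_arg.sh`:

```sh
# Stand-in for real command line arguments (an assumption for this sketch).
set -- -o out.txt -v input1 input2

outfile=""
verbose=0
while (( $# > 0 )); do
    if [[ $1 == -o ]]; then
        # -o needs a value; without this check, shift 2 would fail
        # and the loop would never make progress.
        if (( $# < 2 )); then
            echo "missing value for -o" >&2
            break
        fi
        outfile=$2
        shift 2        # consume the option and its value
    elif [[ $1 == -v ]]; then
        verbose=1
        shift          # consume the option only
    else
        break          # first non-option argument: stop parsing
    fi
done
remaining=$#
echo "outfile=$outfile verbose=$verbose remaining=$remaining"
```

After the loop, the positional parameters hold only the arguments that were not consumed as options, so they can be processed with `for arg in "$@"`.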

## Reading a file one line at a time

Plain-text files often contain line oriented data where each line contains some structured information and
each line ends with a newline character. Such files can be read one line at a time using a `while` loop,
the builtin `read` command, and a redirection:

```sh
while read -r line; do
    # print the line or do some other processing
    echo "$line"
done < "$file"
```

The `read` builtin command reads one line of text from standard input and stores the text in the variable
`line`. The `-r` option causes `read` to treat the backslash `\` as an ordinary character instead of the
escape character. The exit status of `read` is `0` unless `read` reaches the end of the input (or encounters
an error) before it finds a newline character; thus, the `while` loop runs as long as `read` is able to
successfully read a line of text that is terminated by a newline character.
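
Two details of the loop above are easy to miss. With the default `IFS`, `read` strips leading and trailing whitespace from the line, and a final line that lacks a terminating newline is stored in `line` but makes `read` return a non-zero status, so the loop body never sees it. The following is a minimal sketch of a common workaround; the temporary file and the variable names are assumptions made for the example:

```sh
# Build a small file whose last line has no trailing newline.
file=$(mktemp)
printf '  indented line\nlast line without newline' > "$file"

# Plain loop: the unterminated last line is never processed.
plain_count=0
while read -r line; do
    plain_count=$(( plain_count + 1 ))
done < "$file"

# IFS= keeps leading/trailing whitespace; the [[ -n $line ]] test lets the
# body run one more time when read fails but still stored some text.
robust_count=0
first_line=""
while IFS= read -r line || [[ -n $line ]]; do
    if (( robust_count == 0 )); then
        first_line=$line
    fi
    robust_count=$(( robust_count + 1 ))
done < "$file"

rm -f "$file"
echo "plain loop : $plain_count line(s)"
echo "robust loop: $robust_count line(s), first line is '$first_line'"
```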

Consider the following file (found in `./scripts/loops/students.txt`) that contains student information,
one student per line:

```
# student number, last name, first name
12345,Parr,Jack-Jack
23456,Wazowski,Mike
98765,Best,Lucius
```

The following cell uses the loop shown above to process the file:

In [None]:
file="./scripts/loops/students.txt"
while read -r line; do
    echo "$line"
done < "$file"

We can skip the comment line at the top of the file (or any other line that starts with `#`) using some simple logic and a regular expression inside the loop:

In [None]:
file="./scripts/loops/students.txt"
while read -r line; do
    # ignore lines starting with #
    if [[ $line =~ ^# ]]; then
        continue
    fi
    echo "$line"
done < "$file"

`read` is able to split the contents of the line into separate fields and each field can be stored in
a separate variable. `read` uses the characters in `IFS` to perform the splitting. For example,
we can extract the student number, last name, and first name of each student like so:

In [None]:
file="./scripts/loops/students.txt"
OLDIFS=$IFS
IFS=","
while read -r student_number last_name first_name; do
    # ignore lines starting with #
    if [[ $student_number =~ ^# ]]; then
        continue
    fi
    echo "Student number : $student_number"
    echo "Last name      : $last_name"
    echo "First name     : $first_name"
    echo ""
done < "$file"
IFS=$OLDIFS

Instead of saving and restoring the state of `IFS`, we can set `IFS` only for the duration of the
`read` command using the following:

In [None]:
file="./scripts/loops/students.txt"

while IFS="," read -r student_number last_name first_name; do
    # ignore lines starting with #
    if [[ $student_number =~ ^# ]]; then
        continue
    fi
    echo "Student number : $student_number"
    echo "Last name      : $last_name"
    echo "First name     : $first_name"
    echo ""
done < "$file"