# Writing Bash scripts - Good practices

## 2.20. Track the Progress of Your Script and Redirect Script Outputs and Errors

In this step you will learn some ways to track the progress of your script and to **redirect script output and errors to files**.

### Tracking the progress of your script

There are many different ways in which we can track the progress of our scripts. 

The simplest is to **break your script down into sections and output a progress statement when you start and/or finish each section**.

For example, let’s set our name as a variable and count the number of characters it contains.

In [None]:
#!/usr/bin/env bash
 
# Set your name as a variable
name="Victoria"
 
echo "Counting number of characters in name"
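# Pass the name as an argument, not as the format string, so that
# characters such as % or \ in the name are counted literally
printf '%s' "${name}" | wc -m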

Now, while this may seem excessive for such a simple example, **progress statements become invaluable** as your scripts grow.

This is particularly true for the loops we are discussing this week, where a script can get stuck in an infinite loop and never exit. In those situations, progress statements are **essential for debugging!**
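As a minimal sketch of the "progress statement per section" idea, the example below wraps `echo`-style messages in a small helper that also prints the time. The `log` function and the `char_count` variable are hypothetical names chosen for illustration; they are not part of the course material.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prefix each progress message with the current time
log() {
    printf '[%s] %s\n' "$(date '+%H:%M:%S')" "$*"
}

name="Victoria"

log "Section 1: counting characters in name"
char_count=$(printf '%s' "${name}" | wc -m)
char_count=$((char_count))   # strip any padding that wc adds on some systems
log "Section 1: finished"

echo "${name} has ${char_count} characters"
```

Timestamps are optional, but they make it easier to see which section of a long-running script is taking the time.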

### I/O redirection

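A running script writes to two separate output streams: standard output (**stdout**) and standard error (**stderr**). To start understanding how these streams work, let’s look at redirecting the output from a script into a single file.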

Example script:

In [None]:
#!/usr/bin/env bash
 
# A script that tries to change directory
 
echo "Changing to a directory that doesn't exist"
cd foo

When we run it, our script prints the progress statement followed by an error telling us that the directory we’re trying to change to doesn’t exist on our filesystem.

In [None]:
./script.sh

In the terminal:

```
Changing to a directory that doesn't exist ## ==> delivered via stdout
./script.sh: line 6: cd: foo: No such file or directory ## ==> delivered via stderr
```

These two messages are being delivered to the terminal by two different Linux streams. 

The first message, our progress statement, is delivered via **stdout**. Meanwhile, the error message is delivered via **stderr**.
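Our own messages can be sent to either stream too. The sketch below assumes two hypothetical helper functions, `info` and `error`; the `>&2` at the end of the second one sends the output of `echo` to stderr instead of stdout.

```shell
#!/usr/bin/env bash

# Hypothetical helpers: one writes to stdout, the other to stderr
info()  { echo "INFO: $*"; }
error() { echo "ERROR: $*" >&2; }

info "Starting analysis"
error "Something went wrong"
```

In the terminal both lines look the same, but they travel through different streams, which matters as soon as we start redirecting.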

Now, let’s see what happens when we try to redirect the outputs from that script into a file called output.txt:

In [None]:
./script.sh > output.txt

```
./script.sh: line 6: cd: foo: No such file or directory
```

OK, so we can see that the **stdout** has been **redirected to our output file**, but the *error is still being displayed*.

In [None]:
cat output.txt

```
Changing to a directory that doesn't exist
```

Why is this? Well, when we use `>` to redirect to a file, by default, the system will **only redirect the stdout**.

But **what about the errors delivered via stderr? How can we capture those?**

To simplify things, let’s first look at **how to redirect stdout and stderr to two different files**. 

We’ll use the `>` symbol with our file descriptors (1 for stdout and 2 for stderr) to redirect our outputs to output.txt and our errors to error.txt respectively.

In [None]:
./script.sh 1>output.txt 2>error.txt

This command returns nothing back to our terminal. Using the cat command, we can see that, as expected, our outputs and errors have been written to *output.txt* and *error.txt* respectively.

* Our stdout (progress statement returned using echo): 

In [None]:
cat output.txt 

```
Changing to a directory that doesn't exist
```

* And our stderr (errors):

In [None]:
cat error.txt

```
./script.sh: line 6: cd: foo: No such file or directory
```

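In order to redirect the stdout and the stderr to the same place, we need some new syntax: `2>&1`. This sends stderr (file descriptor 2) to wherever stdout (file descriptor 1) is currently pointing.

When we use this, we redirect stdout using the same syntax as before, but add `2>&1` **to the end of our command**. The order matters: Bash processes redirections from left to right, so `2>&1` must come after the redirection of stdout.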

This is how it works in practice:

In [None]:
./script.sh > combined_output.txt 2>&1

Now, if we look at our combined output file, we can see that we’ve captured both the **stdout** and the **stderr**.

In [None]:
cat combined_output.txt

```
Changing to a directory that doesn't exist ## ==> stdout
./script.sh: line 6: cd: foo: No such file or directory ## ==> stderr
```
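To see why `2>&1` belongs at the end of the command, here is a small sketch that compares both orders. The `demo` function is a hypothetical stand-in for `./script.sh`, and the files are written to a temporary directory so nothing in your working directory is touched.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for ./script.sh: one line on stdout, one on stderr
demo() {
    echo "progress message"
    echo "error message" >&2
}

workdir=$(mktemp -d)

# stdout is sent to the file first, then stderr is pointed at the same place
demo > "${workdir}/both.txt" 2>&1

# Reversed order: stderr is pointed at where stdout goes right now (the terminal),
# and only then is stdout sent to the file, so the error is NOT captured
demo 2>&1 > "${workdir}/stdout_only.txt"

echo "Contents of both.txt:"
cat "${workdir}/both.txt"
```

With the reversed order, the error message still appears in the terminal and `stdout_only.txt` contains only the progress message.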

## 2.21. Writing Robust Bash Scripts 

### Using `set -e` to catch errors

By default, Bash carries on to the next command even when a command fails. Fortunately, the `set -e` command comes to our rescue by making the script exit as soon as a command fails, i.e. returns a non-zero exit code. Try adding `set -e` to the top of your script:

In [None]:
#!/usr/bin/env bash
 
set -e
 
cd foo
ls

Bingo! This time, we can see that the script terminates as soon as it reaches the first error.

```
script.sh: line 5: cd: foo: No such file or directory
```
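Sometimes a failure is expected and we would rather handle it ourselves than have the script stop. One way to do this, sketched below, relies on the fact that a command tested in an `if` condition does not trigger `set -e`. The directory name and the `changed` variable are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash

set -e

# Hypothetical directory name that is assumed not to exist
target_dir="directory_that_does_not_exist"
changed="yes"

# A failing command used as an `if` condition does not stop the script,
# so we can report the problem and decide what to do next
if ! cd "${target_dir}"; then
    changed="no"
    echo "Could not change to ${target_dir}, staying in ${PWD}"
fi

echo "Script continues (changed directory: ${changed})"
```

The error from `cd` is still printed via stderr, but the script carries on because we dealt with the failure explicitly.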

### Using `set -u` to catch variables that don’t exist

Notice that the system outputs a blank line for `echo $foo`.

This is because Bash is ignoring `$foo` as it doesn’t exist.

If we want the script to exit with an error instead of continuing on silently, we can add the `set -u` command **at the top of our script**.

In [None]:
#!/usr/bin/env bash
 
set -u
 
echo $foo
echo bar

This will result in our script exiting with the following error:

```
script.sh: line 5: foo: unbound variable
```

Notice, our script terminates before running the second `echo` command.
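If a variable is genuinely optional, we can give it a fallback value so that `set -u` does not stop the script. This is a minimal sketch using the `${variable:-fallback}` expansion; the `greeting` variable and the fallback text are made up for the example.

```shell
#!/usr/bin/env bash

set -u

unset foo   # make sure foo really is undefined for this demonstration

# ${foo:-fallback} expands to the fallback when foo is unset or empty,
# so this line does not trigger the unbound variable error
greeting="${foo:-no value set}"
echo "${greeting}"

foo="bar"
echo "${foo:-no value set}"
```

The first `echo` prints the fallback text and the second prints `bar`, because by then `foo` has a value.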

### Displaying executed commands while script is running with `set -x`

Another default Bash behaviour is to display only the output of the commands in a script, not the commands themselves. This can be especially frustrating when you need to debug long scripts, because it is hard to tell which command produced which output.

Let’s take an example script that outputs two simple strings, foo and bar.

In [None]:
#!/usr/bin/env bash
 
echo foo
echo bar

The output from this script would be:

```
foo
bar
```

Now, what if we want to know which command is producing each of the results? 

To find this out, we can use the `set -x` command which **outputs the executed command before printing the command result**.

In [None]:
#!/usr/bin/env bash
 
set -x
 
echo foo
echo bar

Running this script would give the following output:

```
+ echo foo
foo
+ echo bar
bar
```

As you can see, before executing each of the echo commands, the script first prints the command to the terminal, **using a `+` to indicate that the output is a command**. This can be especially handy when you want to debug long scripts.
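Tracing does not have to cover the whole script: `set +x` switches it off again. The sketch below traces only one hypothetical function, `noisy_step`. Note that Bash writes the `+` trace lines to stderr, so they can be redirected separately from the normal output.

```shell
#!/usr/bin/env bash

# Hypothetical function: trace only the commands between set -x and set +x
noisy_step() {
    set -x
    echo foo
    set +x
}

echo "before (not traced)"
noisy_step
echo "after (not traced)"
```

This keeps the terminal readable when only one part of a long script needs debugging.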

### Combining set options in a single command: `set -eux`

Most of the time, you will want to use all of these options together. Instead of writing the commands out, one command per line, we can combine the options into a single command:

In [None]:
set -eux

Using the set command is essential to building robust Bash scripts. Not only is it part of good scripting practice, but it will also save you a lot of time and frustration!

## Final Exercise - Use Bash Scripting to Parse Biological Data 

In this exercise we’re going to look at using Bash scripts to parse biological data. 

We’ll walk you through and explain the commands for parsing a single data file. 

Then, it will be up to you to write a Bash script to process all of the example data files.

The aim of this exercise is simply to run a program across three example data files to get the number of records each one contains, using the skills you’ve been developing in Week 1 and Week 2.

### Store the output of a command in a variable

We can store the output of a command as a variable using the following syntax:

In [None]:
variable=$(command)

So, for our example command this would be:

In [None]:
alignments=$(samtools view -c sample_10000_11000.bam)

Now, if we echo our variable, you will see it has the expected value:

In [None]:
echo ${alignments}

```
1947
```
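If you want to try the `variable=$(command)` syntax without samtools, the sketch below does the same thing with `wc -l` on a small throwaway text file. The file contents and the `records` variable are invented for the example and are not real sequencing data.

```shell
#!/usr/bin/env bash

# Create a small throwaway file so the example runs without samtools
tmpfile=$(mktemp)
printf 'read1\nread2\nread3\n' > "${tmpfile}"

# Capture the output of a command (here: the line count) in a variable
records=$(wc -l < "${tmpfile}")
records=$((records))    # strip any padding that wc adds on some systems

echo "The file has ${records} records"

rm -f "${tmpfile}"
```

The pattern is identical for `samtools view -c`: whatever the command prints to stdout ends up in the variable.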

### Task

Using a Bash script, get the number of records for each of the three example data files.

Some hints:

* Use comments
* Use the set command
* Check whether each file is empty before running samtools
* Use a loop – i.e. don’t run three samtools commands with hardcoded filenames, use wildcards (e.g. sample*.bam where * matches any string)
* Return the filename and the number of records back to the user


Option 1:

In [None]:
#!/usr/bin/env bash

# stop at the first error and stop if a variable doesn't exist
set -eu

# first loop: report any empty files
for file in ../data/week2/*.bam
do
    # -s is true if the file exists and its size is > 0
    if [[ ! -s "${file}" ]]; then
        echo "The file ${file} is empty"
    fi
done

# second loop: count the number of alignments in each file using samtools
for file in ../data/week2/*.bam
do
    # skip the empty files already reported above
    if [[ ! -s "${file}" ]]; then
        continue
    fi
    alignments=$(samtools view -c "${file}")
    echo "The file ${file} has ${alignments} alignments"
done

Option 2:

In [None]:
#!/usr/bin/env bash

# Description:
# This script checks if the file is not empty and then
# counts the number of alignments in each bam file.

# stop at the first error and stop if a variable doesn't exist
set -eu

# main code: 2 things will be done in each iteration of the for loop.

for file in ../data/week2/*.bam
do
    # 1. Check if the file is not empty.
    # i.e. if file size is not > 0, print message about that empty file
    # and skip to the next file, so samtools is not run on an empty file.
    if [[ ! -s "${file}" ]]; then
        echo "The file ${file} is empty."
        continue
    fi

    # 2. Count number of alignments in each bam file
    alignments=$(samtools view -c "${file}")
    # message to the user
    echo "The file ${file} has ${alignments} alignments."

done

Option 3: Here, nothing is hardcoded. The script only needs the path to the directory of bam files as its first argument.

In [None]:
#!/usr/bin/env bash

# Description:
# This script takes the path to a directory passed as first argument
# (to avoid having anything hardcoded).
# It checks that the files in that directory are not empty and
# counts the number of alignments in each bam file.

# stop at the first error and stop if a variable doesn't exist
# (set before reading $1, so that a missing argument is reported)
set -eu

# Path to the directory of bam files
directory=$1

# main code: 2 things will be done in each iteration of the for loop.

for file in "${directory}"/*.bam
do
    # 1. Check if the file is not empty.
    # i.e. if file size is not > 0, print message about that empty file
    # and skip to the next file, so samtools is not run on an empty file.
    if [[ ! -s "${file}" ]]; then
        echo "The file ${file} is empty."
        continue
    fi

    # 2. Count number of alignments in each bam file
    alignments=$(samtools view -c "${file}")
    # message to the user
    echo "The file ${file} has ${alignments} alignments."

done

In [None]:
./alig3.sh ../data/week2

```
The file ../data/week2/sample_10000_11000.bam has 1947 alignments.
The file ../data/week2/sample_11000_12000.bam has 123 alignments.
The file ../data/week2/sample_12000_13000.bam has 276 alignments.
```

In [None]:
./alig3.sh "../data/week2"

```
The file ../data/week2/sample_10000_11000.bam has 1947 alignments.
The file ../data/week2/sample_11000_12000.bam has 123 alignments.
The file ../data/week2/sample_12000_13000.bam has 276 alignments.
```

Option 4: Adding a line to track the progress of my script

In [None]:
#!/usr/bin/env bash

# Description:
# This script takes the path to a directory passed as first argument
# (to avoid having anything hardcoded).
# It checks that the files in that directory are not empty and
# counts the number of alignments in each bam file.

# stop at the first error and stop if a variable doesn't exist
# (set before reading $1, so that a missing argument is reported)
set -eu

# Path to the directory of bam files
directory=$1

# main code: 2 things will be done in each iteration of the for loop.

for file in "${directory}"/*.bam
do
    # 1. Check if the file is not empty.
    # i.e. if file size is not > 0, print message about that empty file
    # and skip to the next file, so samtools is not run on an empty file.
    if [[ ! -s "${file}" ]]; then
        echo "The file ${file} is empty."
        continue
    fi

    # 2. Count number of alignments in each bam file
    # Tracking the progress of my script
    echo "Processing file ${file}..."
    alignments=$(samtools view -c "${file}")
    # message to the user
    echo "The file ${file} has ${alignments} alignments."

done
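One more situation worth handling is a directory that contains no `.bam` files at all: in that case the wildcard is left unexpanded and the loop runs once with the literal pattern. The sketch below shows one possible way to guard against this. The `list_nonempty_bams` function is a hypothetical helper, and the two demo files are placeholders created in a temporary directory, not real BAM files.

```shell
#!/usr/bin/env bash

set -eu

# Hypothetical helper: print the non-empty .bam files in a directory,
# warning (via stderr) about empty files and about directories with no matches
list_nonempty_bams() {
    local directory="$1"
    local found=0
    local file
    for file in "${directory}"/*.bam
    do
        # If nothing matches, the pattern stays a literal string; skip it
        [ -e "${file}" ] || continue
        found=1
        if [ ! -s "${file}" ]; then
            echo "Skipping empty file ${file}" >&2
            continue
        fi
        echo "${file}"
    done
    if [ "${found}" -eq 0 ]; then
        echo "No .bam files found in ${directory}" >&2
        return 1
    fi
}

# Demo with placeholder files (not real BAM data)
demo_dir=$(mktemp -d)
printf 'data' > "${demo_dir}/a.bam"
: > "${demo_dir}/b.bam"

list_nonempty_bams "${demo_dir}"
```

Each path printed by the function could then be passed to `samtools view -c`, as in the options above.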