# Let the computer do the work for you

In bioinformatics (and many other fields), there's a lot of tedious data processing on a day-to-day basis. For example, it's not possible for a human to manually look through billions of nucleotide bases to find a single base change.

Even if it were possible, would we really want to spend the time doing the same tedious task again and again? Why not let the computer do five hours' worth of work for you? Then you can go on a well-deserved coffee break.

In addition to saving you time, a script makes your work reproducible, because it records exactly which commands were run and in what order.

To put your computer to work, you'll need to combine a series of commands into a shell script.

# Writing a script

### Bash header

The bash header line is called the *shebang* and it looks like:

```bash
#!/bin/bash
```

This tells your operating system which program (here, bash) to use to run the script.
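As a rough sketch of the shebang in action, the example below writes a two-line script to a temporary directory, marks it executable with `chmod +x`, and runs it directly by its path. The file name `hello.sh` and the temporary directory are placeholders chosen for this illustration, and it assumes bash is installed at `/bin/bash`. Because the script is run by its path rather than with `bash hello.sh`, the shebang decides which program interprets it.

```bash
# Create a scratch directory so the demo does not touch your own files.
demo_dir=$(mktemp -d)

# Write a two-line script: the shebang, then a single command.
printf '%s\n' '#!/bin/bash' 'echo "Hello from bash"' > "${demo_dir}/hello.sh"

# Make the script executable, then run it directly by its path.
chmod +x "${demo_dir}/hello.sh"
shebang_output=$("${demo_dir}/hello.sh")
echo "${shebang_output}"

# Clean up the scratch directory.
rm -r "${demo_dir}"
```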

### Setting behavior rules

You will also want to include the following lines at the top of your shell script:

```bash
set -e
set -u
set -o pipefail
```

`set -e` tells the script to exit as soon as a command fails (returns a non-zero exit status). By default, a shell script keeps running after a command fails, so later commands may end up working with missing or incomplete results.
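To see the difference, the sketch below runs the same two commands twice in child `bash -c` processes, once with the default behavior and once with `set -e`. The `false` command is a stand-in for any command that fails, and the variable names are just for illustration.

```bash
# Default behavior: the failure of `false` is ignored and echo still runs.
without_e=$(bash -c 'false; echo "still running"')
echo "without set -e: ${without_e}"

# With set -e: the child script stops at `false`, so echo never runs.
# The `|| true` only keeps this demo itself going after the child exits.
with_e=$(bash -c 'set -e; false; echo "still running"' || true)
echo "with set -e: ${with_e:-<no output>}"
```

The first run prints "still running" even though a command failed; the second produces no output because the script exited at the failure.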

`set -u` makes the script treat a reference to an unset variable as an error and exit, instead of silently replacing the variable with an empty string. Here is an example of the default behavior (when not using `set -u`):

In [2]:
echo "rm $UNSET/something"

rm /something
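
For comparison, here is a sketch of the same command with `set -u` switched on. It runs the command in a child `bash -c` process (an arrangement chosen only so the demo does not stop your current shell) and records what happened. Bash should refuse to expand `$UNSET`, report an error, and exit with a non-zero status, so the dangerous `rm /something` path is never built.

```bash
# Make sure the variable really is unset for this demo.
unset UNSET

# Run the command under set -u, capturing its messages and exit status.
status=0
output=$(bash -c 'set -u; echo "rm $UNSET/something"' 2>&1) || status=$?

echo "exit status: ${status}"
echo "message: ${output}"
```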


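`set -o pipefail` makes a whole pipeline count as failed if any command in it fails. By default, bash only looks at the exit status of the last command in the pipe, so a failure earlier in the pipe goes unnoticed. Combined with `set -e`, this makes the script exit when any part of a pipe fails.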
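
Here is a small sketch of the difference, using `false | true` as a stand-in for a pipeline whose first command fails. Each version runs in a child `bash -c` process and prints the pipeline's exit status (`$?`); the variable names are chosen for this example only.

```bash
# Default: only the last command in the pipe (true) determines the status.
default_status=$(bash -c 'false | true; echo $?')
echo "without pipefail: ${default_status}"

# With pipefail: the failure of `false` becomes the status of the whole pipe.
pipefail_status=$(bash -c 'set -o pipefail; false | true; echo $?')
echo "with pipefail: ${pipefail_status}"
```

Without `pipefail` the pipeline reports 0 (success) because `true`, the last command, succeeded. With `pipefail` it reports 1, and under `set -e` that would stop the script.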

### Variables

Variables store values that would otherwise have to be hardcoded. For example, a hardcoded line of code would be:

In [5]:
ls /Users/chaochih/GitHub/Bash_Demo

1_bash_commands.ipynb			apt.txt
2_bash_tricks.ipynb			environment.yml
3_bash_and_basic_shell_scripting.ipynb	postBuild
README.md				toy_data


Using a variable to store the path to Bash_Demo:

In [7]:
# Define variable contents
work_dir="/Users/chaochih/GitHub/Bash_Demo"

ls "${work_dir}"

1_bash_commands.ipynb			apt.txt
2_bash_tricks.ipynb			environment.yml
3_bash_and_basic_shell_scripting.ipynb	postBuild
README.md				toy_data


To view content stored in a variable:

In [None]:
echo "${work_dir}"
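
Once a path is stored in a variable, other paths can be built from it, so a change of location only has to be made in one place. The sketch below reuses `work_dir` from above; `data_dir` and `bed_file` are names introduced here for illustration. Note that there are no spaces around `=` in an assignment.

```bash
# Define the project directory once...
work_dir="/Users/chaochih/GitHub/Bash_Demo"

# ...then build other paths from it.
data_dir="${work_dir}/toy_data"
bed_file="${data_dir}/test.bed"

# Print the assembled path.
echo "${bed_file}"
```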

### Command line arguments

Command line arguments are values the user supplies when running the script, in the same way you pass arguments to built-in bash commands. For example:

```bash
bash your_script.sh arg1 arg2 arg3
```
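
Inside the script, the arguments are available as `$1`, `$2`, and so on, with `$#` holding the number of arguments and `$*` holding all of them. The sketch below is one way to try this out: it writes a throwaway script to a temporary file (standing in for `your_script.sh`), runs it with three arguments, and prints what the script saw. The temporary file and the variable names are assumptions made for this example.

```bash
# Write a small throwaway script so the example is self-contained.
demo_script=$(mktemp)
cat > "${demo_script}" << 'EOF'
#!/bin/bash
set -e
set -u
set -o pipefail

echo "Number of arguments: $#"
echo "First argument: $1"
echo "Second argument: $2"
echo "All arguments: $*"
EOF

# Run the script with three arguments and show its output.
demo_output=$(bash "${demo_script}" arg1 arg2 arg3)
echo "${demo_output}"

# Remove the throwaway script.
rm "${demo_script}"
```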

### Conditionals: if else statements

Bash conditionals give you more control over how your data is handled: a command runs only when a condition is met.

General syntax:

```bash
if grep "pattern" file.txt
then
    # commands to run if pattern exists
else
    # commands to run if pattern does not exist
fi
```

Let's say you want to use `grep` to check whether chr2 is in your BED file and, if it isn't, print a message.

In [12]:
if grep "chr2" ~/GitHub/Bash_Demo/toy_data/test.bed > /dev/null
then
    echo "Found chromosome."
else
    echo "Chromosome not found."
fi

Found chromosome.
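
A variation on the same check, sketched with `grep -q`, which suppresses grep's output so the redirect to `/dev/null` is not needed. To keep the example self-contained, it builds a small stand-in BED file in a temporary location; the coordinates are made up for illustration and are not the contents of `test.bed`. The chromosome name is stored in a variable so it is easy to change.

```bash
# Build a small stand-in BED file (made-up coordinates, illustration only).
bed_file=$(mktemp)
printf 'chr1\t100\t200\nchr2\t300\t400\n' > "${bed_file}"

# The chromosome to look for.
chrom="chr2"

# grep -q prints nothing; the if statement only uses its exit status.
if grep -q "${chrom}" "${bed_file}"
then
    message="Found ${chrom}."
else
    message="${chrom} not found."
fi
echo "${message}"

# Remove the stand-in file.
rm "${bed_file}"
```

Changing `chrom` to a name that is not in the file, such as `chr9`, takes the `else` branch instead.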


# Exercise 2

Write a short shell script that takes a list of `.bed` files in the `toy_data` directory and outputs a `.txt` file with the number of lines in each `.bed` file.

You can either write the body of your shell script below or open up your favorite text editor.