In [None]:
# Clean up any old examples

rm -rf Sep2019
rm -f *.sh
rm -f *.out
rm -f *.txt

# Introduction to Shell Scripting

### Resources

https://devhints.io/bash

https://www.shellscript.sh

https://ryanstutorials.net/bash-scripting-tutorial/

## History

The Bourne shell `sh` was developed by Stephen Bourne at Bell Labs in 1979 for Version 7 Unix.

In Lesson 1 of this Command-Line Interface series, this shell was primarily shown as an interactive command interpreter. The Bourne shell was also written as a *scripting language*, and the programming features it provides will be the focus of this lesson.

### Bourne-Again Shell

Released in 1989, the Bourne-Again Shell `bash` was written as a free software replacement for the Bourne shell. Its syntax is a superset of the Bourne shell's and incorporates some elements of other shells, e.g., `csh` and `ksh`. Many GNU/Linux distributions provide `bash` as the default shell today.

#### What shell does the BioHPC run?

In [None]:
# use command substitution $(command) to find the default shell

ls -l $(which sh) 

In [None]:
# backticks, `command`, are an older form of command substitution

ls -l `which sh`

In [None]:
# print version info of bash

bash --version

## Building Your First Script

### Interpreter Directive

Every shell script should begin with an *interpreter directive*.

For shell scripts, this will almost always be:

```
#!/bin/sh
```

The `#!` is called the *shebang*, and it always precedes the path to the *interpreter*.

In [None]:
# building your first script, hello.sh

echo '#!/bin/sh' > hello.sh
echo 'echo Hello, World!' >> hello.sh

In [None]:
# printing contents of our script

cat hello.sh

In [None]:
# running hello.sh with your shell command

sh hello.sh

#### What happens if you try to write your script using echo and double quotation marks?

In [None]:
echo "#!/bin/sh" > hello.sh
echo "echo Hello, World!" >> hello.sh

#### Making shell scripts executable

Shell scripts can be executed as if they were programs in their own right, so that `sh` or `bash` does not need to be invoked explicitly. For this to work, the executable bits of the script have to be set.

In [None]:
# What are the permissions of our script?

ls -l hello.sh

In [None]:
# Set hello.sh to be executable by its owner

chmod u+x hello.sh

In [None]:
# Run hello.sh as an executable
# prefixing the filename with a relative path since it is not in $PATH

./hello.sh

In [None]:
# what type of file is hello.sh

file hello.sh

In [None]:
# Setting a script to be executable is not always necessary
# One can have bash read the script, then execute its contents

bash hello.sh

## Passing arguments to a script

A script can behave like a function in that it accepts arguments. The arguments can be accessed within a script through some special variables.

```
$1 - $9 | The first 9 arguments to the Bash script

$#      | How many arguments were passed to the Bash script

$@      | All the arguments supplied to the Bash script
```


In [None]:
# build a script that echos the arguments passed into it

echo '#!/bin/sh' > echo.sh
echo 'echo $1' >> echo.sh
echo 'echo $2' >> echo.sh
echo 'echo $@' >> echo.sh
echo 'echo $#' >> echo.sh

chmod u+x echo.sh

In [None]:
./echo.sh hello audience
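
One detail the table above does not show is quoting. As a minimal sketch, the snippet below uses `set --` to stand in for arguments passed on the command line (the argument values are made up for illustration) and counts them with and without quotes around `$@`. It borrows a `for` loop and `$(( ))` arithmetic, both covered later in this lesson.

```bash
#!/bin/sh

# set -- replaces the positional parameters, as if the script had been
# called with two arguments: "hello audience" and goodbye
set -- "hello audience" goodbye

echo "Number of arguments: $#"

# quoted "$@" keeps each argument intact, spaces included
QUOTED=0
for ARG in "$@"; do
  QUOTED=$((QUOTED + 1))
done

# unquoted $@ is split again on whitespace
UNQUOTED=0
for ARG in $@; do
  UNQUOTED=$((UNQUOTED + 1))
done

echo "quoted: $QUOTED item(s), unquoted: $UNQUOTED item(s)"
```

With the default field separator, the quoted loop sees two items and the unquoted loop sees three, which is why `"$@"` is usually the safer choice when arguments may contain spaces.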

#### Other special variables

```
$0        | The name of the Bash script
$?        | The exit status of the most recently run process
$$        | The process ID of the current script
$USER     | The username of the user running the script
$HOSTNAME | The hostname of the machine the script is running on
$SECONDS  | The number of seconds since the script was started
$RANDOM   | Returns a different random number each time it is referred to
$LINENO   | Returns the current line number in the Bash script
```

In [None]:
# example
# heredoc format is being used to write the script file -- should be easier to copy and paste into a terminal

cat > special_var.sh << "EOF"
#!/bin/sh
echo 'This script is' $0
echo 'The last process exited with exit code' $?
echo 'The process id is' $$
echo 'The script was run by' $USER
echo 'The name of this node is' $HOSTNAME
echo 'The script was started' $SECONDS 'seconds ago'
echo 'The script generated the random number ' $RANDOM
echo 'This is the' $LINENO'th line of the script'
EOF

chmod u+x special_var.sh

In [None]:
cat special_var.sh

In [None]:
./special_var.sh
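
Because `$?` is overwritten by every command, it is usually copied into a variable straight away if it is needed later. The following is a small sketch of that habit; the path `/no/such/path_for_demo` is an assumed name that is expected not to exist.

```bash
#!/bin/sh

# true and false are standard commands that always succeed and always fail
true
echo "exit status of true: $?"

false
echo "exit status of false: $?"

# ls fails on a path that does not exist (assumed name); errors are discarded
ls /no/such/path_for_demo 2> /dev/null

# save the status immediately, before another command replaces it
STATUS=$?

echo "this echo resets \$? to 0, but STATUS still holds $STATUS"
```

An exit status of `0` means success; any other value signals some kind of failure.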

## Variables

Like other programming languages, data can be saved to the system's memory by assigning variables with the `=` assignment operator.

In [None]:
# Build a script that builds a very small datafile
# The script's single argument is used in both the filename and the content of the file.

cat > boring_data.sh << "EOF"
#!/bin/sh

# Note the lack of white space in the assignment
PREFIX='data'

echo 'Boring Header' > $PREFIX$1.txt
echo $1 >> $PREFIX$1.txt
EOF

In [None]:
chmod u+x boring_data.sh

./boring_data.sh 90210

In [None]:
cat data90210.txt

### Scope of Variables

Variables assigned in a script are local to the instance of that executed script and should not 'leak' out to other environments.

In [None]:
# Locality -- should this work?

echo $PREFIX

In [None]:
# How about using `export` to assign our variable

cat > boring_data.sh << "EOF"
#!/bin/sh

# Note the lack of white space in the assignment
export PREFIX='data'

echo 'Boring Header' > $PREFIX$1.txt
echo $1 >> $PREFIX$1.txt
EOF

chmod u+x boring_data.sh
./boring_data.sh 90210

In [None]:
# Locality -- should this work? How does export work?

echo $PREFIX

`export` makes a variable available in the environment of the shell that sets it and in any subprocesses started from that shell. It never passes the variable back up to the parent shell.

In [None]:
# Locality test -- simple script that contains the variable $PREFIX

cat > prefix.sh << "EOF"
#!/bin/sh
echo $PREFIX
EOF

In [None]:
export PREFIX='UTSW'

bash prefix.sh
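
The direction of inheritance can be checked in a few lines. This is only a sketch: the variable names `LOCAL_ONLY`, `SHARED`, and `FROM_CHILD` are made up for the illustration, and `sh -c` is used to start a child shell.

```bash
#!/bin/sh

# a plain assignment stays in the current shell
LOCAL_ONLY='not exported'

# an exported variable is copied into the environment of child processes
export SHARED='exported'

# single quotes stop the parent from expanding the variables,
# so the child shell reports what it can see
CHILD_VIEW=$(sh -c 'echo "LOCAL_ONLY=[$LOCAL_ONLY] SHARED=[$SHARED]"')
echo "$CHILD_VIEW"

# a variable exported inside the child does not travel back to the parent
sh -c 'export FROM_CHILD=hello'
echo "FROM_CHILD in parent: [${FROM_CHILD:-}]"
```

The child sees only `SHARED`, and the parent never sees `FROM_CHILD`: variables flow from parent to child, not the other way around.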

## Arithmetic expressions

From `man bash`:

```
The  shell allows arithmetic expressions to be evaluated, under certain circumstances (see the let
and declare builtin commands and Arithmetic Expansion).  Evaluation is done in  fixed-width  inte‐
gers  with  no  check  for overflow, though division by 0 is trapped and flagged as an error.  The
operators and their precedence, associativity, and values are the same as in the C language.   The
following  list of operators is grouped into levels of equal-precedence operators.  The levels are
listed in order of decreasing precedence.

id++ id--
      variable post-increment and post-decrement
++id --id
      variable pre-increment and pre-decrement
- +    unary minus and plus
! ~    logical and bitwise negation
**     exponentiation
* / %  multiplication, division, remainder
+ -    addition, subtraction
```

Mathematical expressions can be evaluated in several ways.

 - let
 - expr
 - double round-brackets 

In [None]:
# let is a bash builtin command that only does integer math

cat > let.sh << "EOF"
#!/bin/sh

let a=40+2       #addition
echo $a

let 'a = 40 - 2' #subtraction
echo $a

let a++          #increment
echo $a

let "a = 4 * 5"  #multiplication
echo $a # 20

let "a = 4 / 3"  # division
echo $a

let "a = 2 ** 2" #exponents 
echo $a

let "a = $1 + 30"
echo $a # 30 + first command line argument
EOF

In [None]:
bash let.sh

In [None]:
# expr example - converting Unix time (in seconds) to the number of days since 1970-01-01
# Actually, quite useful for sys admins!

cat > unix_date.sh << "EOF"
#!/bin/sh

# date +%s is the number of seconds since 1970-01-01 "Unix time"
expr $(date +%s) / 86400

EOF

In [None]:
bash unix_date.sh

In [None]:
# double round-brackets

cat > arith.sh << "EOF"
#!/bin/sh

echo $((40 + 2))

echo $((40 - 2))

echo $((4 * 5))

echo $((4 / 3))

a=$((2 ** 2)) 
echo $a

((a++))
echo $a

EOF

In [None]:
bash arith.sh
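
Since all shell arithmetic is done with integers, division discards the fractional part and `%` recovers the remainder. The sketch below applies this to an arbitrary, made-up number of seconds and also shows that the usual precedence rules apply.

```bash
#!/bin/sh

# an arbitrary example value
TOTAL_SECONDS=200000

# integer division drops the fractional part
DAYS=$((TOTAL_SECONDS / 86400))

# the remainder operator gives the seconds left over after whole days
REMAINDER=$((TOTAL_SECONDS % 86400))
HOURS=$((REMAINDER / 3600))

echo "$TOTAL_SECONDS seconds is $DAYS day(s) and $HOURS hour(s)"

# multiplication binds tighter than addition unless brackets say otherwise
PRECEDENCE=$((2 + 3 * 4))
GROUPED=$(( (2 + 3) * 4 ))

echo "2 + 3 * 4 = $PRECEDENCE"
echo "(2 + 3) * 4 = $GROUPED"
```

Inside `$(( ))`, variable names can be written without the leading `$`.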

## Looping

Most programming languages provide loops so that repetitive work does not have to be written out by hand. `bash` implements looping through the `for`, `while`, and `until` statements.

The `for` statement will iterate over a set of elements.

The `while` statement will loop a set of statements until some condition is false.

The `until` statement will loop a set of statements until some condition is true.

### For loops

In [None]:
# for loop over discrete elements

cat > mkdir_train.sh << "EOF"
#!/bin/sh
for A in train01 train02 train03 train04 train05 train06 train07 train08 train09 train10 train11 train12 train13 train14 train15 train16 train17 train18 train19 train20
do
  mkdir -p Sep2019/$A
  chmod 700 Sep2019/$A
done

EOF

In [None]:
bash mkdir_train.sh

ls -la Sep2019

In [None]:
# cleanup training folders

rm -r Sep2019

In [None]:
# for loop over a range of numbers
# some key differences from previous script
# printf is used to format each iterable

cat > mkdir_train.sh << "EOF"
#!/bin/sh

TRAINDIR=Sep2019
PREFIX=$TRAINDIR/train

for A in {1..20}
do
  B=$(printf "%02d\n" $A)

  mkdir -p $PREFIX$B
  chmod 700 $PREFIX$B
done

# add a shared folder
mkdir -p $TRAINDIR/shared
chmod 750 $TRAINDIR/shared

EOF

In [None]:
bash mkdir_train.sh

ls -la Sep2019

In [None]:
# bash also supports a C-like for-loop syntax

rm -r Sep2019

cat > mkdir_train.sh << "EOF"
#!/bin/sh

TRAINDIR=Sep2019
PREFIX=$TRAINDIR/train

for ((A=1; A <= 20 ; A++))
do
  B=$(printf "%02d\n" $A)

  mkdir -p $PREFIX$B
  chmod 700 $PREFIX$B
done

# add a shared folder
mkdir -p $TRAINDIR/shared
chmod 750 $TRAINDIR/shared
EOF

In [None]:
bash mkdir_train.sh

ls -la Sep2019

### While loops

While loops execute commands until some condition is tested and evaluated to be false.

A `test` for a condition occurs within square brackets `[]`.

Some of the more common operators for evaluating a condition are presented below.

<table class="fancy">
    <tbody><tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>! EXPRESSION</td>
        <td>The EXPRESSION is false.</td>
    </tr>
    <tr>
        <td>-n STRING</td>
        <td>The length of STRING is greater than zero.</td>
    </tr>
    <tr>
        <td>-z STRING</td>
        <td>The length of STRING is zero (i.e., it is empty).</td>
    </tr>
    <tr>
        <td>STRING1 = STRING2</td>
        <td>STRING1 is equal to STRING2</td>
    </tr>
    <tr>
        <td>STRING1 != STRING2</td>
        <td>STRING1 is not equal to STRING2</td>
    </tr>
    <tr>
        <td>INTEGER1 -eq INTEGER2</td>
        <td>INTEGER1 is numerically equal to INTEGER2</td>
    </tr>
    <tr>
        <td>INTEGER1 -gt INTEGER2</td>
        <td>INTEGER1 is numerically greater than INTEGER2</td>
    </tr>
    <tr>
        <td>INTEGER1 -lt INTEGER2</td>
        <td>INTEGER1 is numerically less than INTEGER2</td>
    </tr>
    <tr>
        <td>-d FILE</td>
        <td>FILE exists and is a directory.</td>
    </tr>
    <tr>
        <td>-e FILE</td>
        <td>FILE exists.</td>
    </tr>
    <tr>
        <td>-r FILE</td>
        <td>FILE exists and the read permission is granted.</td>
    </tr>
    <tr>
        <td>-s FILE</td>
        <td>FILE exists and its size is greater than zero (i.e., it is not empty).</td>
    </tr>
    <tr>
        <td>-w FILE</td>
        <td>FILE exists and the write permission is granted.</td>
    </tr>
    <tr>
        <td>-x FILE</td>
        <td>FILE exists and the execute permission is granted.</td>
    </tr>
</tbody></table>
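
A test is an ordinary command: it prints nothing and reports its result through its exit status, `0` for true and `1` for false. The sketch below tries a few of the operators from the table on a temporary directory and an empty file. `DEMO_DIR` and `DEMO_FILE` are hypothetical names, and `mktemp -d` is assumed to be available, as it is on common GNU/Linux systems.

```bash
#!/bin/sh

# DEMO_DIR and DEMO_FILE are hypothetical names used only for this sketch
DEMO_DIR=$(mktemp -d)
DEMO_FILE=$DEMO_DIR/empty.txt
touch "$DEMO_FILE"

# -d succeeds (exit status 0) because the path exists and is a directory
[ -d "$DEMO_DIR" ]
D_STATUS=$?

# -s fails (exit status 1) because the file exists but is empty
[ -s "$DEMO_FILE" ]
S_STATUS=$?

# ! negates a test, so this succeeds for the empty file
[ ! -s "$DEMO_FILE" ]
NOT_S_STATUS=$?

# integer comparison: 3 is not greater than 5, so this fails
[ 3 -gt 5 ]
GT_STATUS=$?

echo "-d: $D_STATUS, -s: $S_STATUS, ! -s: $NOT_S_STATUS, 3 -gt 5: $GT_STATUS"

# remove the temporary directory
rm -r "$DEMO_DIR"
```

The spaces just inside the square brackets are required, because `[` is itself a command and `]` is its last argument.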

In [None]:
rm -r Sep2019

cat > mkdir_train.sh << "EOF"
#!/bin/sh

TRAINDIR=Sep2019
PREFIX=$TRAINDIR/train

# A must be assigned first
A=1

# here is the condition to be tested for
while [ $A -lt 21 ]
do
  B=$(printf "%02d\n" $A)

  mkdir -p $PREFIX$B
  chmod 700 $PREFIX$B
  ((A++))
done

# add a shared folder
mkdir -p $TRAINDIR/shared
chmod 750 $TRAINDIR/shared
EOF

In [None]:
bash mkdir_train.sh

ls -la Sep2019

### Until loops

In [None]:
rm -r Sep2019

cat > mkdir_train.sh << "EOF"
#!/bin/sh

TRAINDIR=Sep2019
PREFIX=$TRAINDIR/train

# A must be assigned first
A=1

# here is the condition to be tested for
until [ $A -eq 21 ]
do
  B=$(printf "%02d\n" $A)

  mkdir -p $PREFIX$B
  chmod 700 $PREFIX$B
  ((A++))
done

# add a shared folder
mkdir -p $TRAINDIR/shared
chmod 750 $TRAINDIR/shared
EOF

In [None]:
bash mkdir_train.sh

ls -la Sep2019

### Looping over a set of files

In [None]:
# loop over each training folder, copy in a .bashrc and .bash_profile, and generate an SSH key pair

cat > mkdir_train.sh << "EOF"
#!/bin/sh

for i in Sep2019/train*; do
  cp ~/.bashrc $i/
  cp ~/.bash_profile $i/
  mkdir $i/.ssh
  ssh-keygen -t rsa -b 2048 -C "" -P "" -f $i/.ssh/train_key -q
done

EOF

In [None]:
# run script

bash mkdir_train.sh

In [None]:
cat Sep2019/train01/.ssh/train_key

## Arrays

Arrays can be declared by enclosing a set of elements with round brackets. Arrays are iterable.

Elements of an array have a numeric index beginning at zero (like Python), and can be referenced individually.

`${ARRAY[@]}` will list all the elements of an array

`${ARRAY[0]}` prints the first element of the array

In [None]:
# declaring an array and printing its contents

USERS=( s178337 s178722 hatawang rbateman s173217 zpang1 s183990 )

echo ${USERS[@]}   

In [None]:
# printing the 5th element of the array

echo ${USERS[4]}

In [None]:
# printing the first five elements

echo ${USERS[@]:0:5}

In [None]:
# iterating over an array

# Here we retrieve the full name of the user

for i in "${USERS[@]}"; do
    #echo $i
    NAME=$(getent passwd $i | awk 'BEGIN {FS=":"} {print $5}' | awk 'BEGIN {FS=","} {print $1}')
    echo $NAME
done
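
A few other array operations tend to come up quickly: counting the elements, appending, and looping over the indices. The sketch below uses a made-up array named `SAMPLES`; arrays are a `bash` feature, so it assumes `bash` rather than a plain Bourne shell.

```bash
#!/bin/bash

# SAMPLES is a hypothetical array used only for this sketch
SAMPLES=( alpha beta gamma )

# ${#ARRAY[@]} gives the number of elements
echo "number of elements: ${#SAMPLES[@]}"

# += appends one or more elements to the end
SAMPLES+=( delta )
echo "after appending: ${SAMPLES[@]}"

# the last element sits at index (length - 1)
LAST=${SAMPLES[${#SAMPLES[@]}-1]}
echo "last element: $LAST"

# ${!ARRAY[@]} lists the indices rather than the values
for i in "${!SAMPLES[@]}"; do
    echo "$i: ${SAMPLES[$i]}"
done
```

The index loop prints `0: alpha` through `3: delta`, one pair per line.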

## Conditional Statements

Conditional statements can be used to control the flow of a script. The condition is placed inside a pair of square brackets `[ ]`, which is shorthand for the `test` command.

The `then` clause is executed if the exit code is `0`.

```
if [ <some test> ]; then
    <commands>
fi
```

If the condition does not exit with `0`, then the construct can be ignored completely, or a separate clause of commands can be executed.

```
if [ <some test> ]; then
    <commands>
else
    <commands>
fi
```

If multiple conditions need to be evaluated, else-if (`elif`) checks can be added.

```
if [ <some test> ]; then
    <commands>
elif [ <some other test> ]; then
    <commands>
elif [ <some other test> ]; then
    <commands>
else
    <commands>
fi
```
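
As a concrete instance of the template above, the sketch below sorts a number into one of three bins. `COUNT` and `SIZE` are hypothetical names, and in a real script the value might come from `$1`.

```bash
#!/bin/sh

# COUNT is a hypothetical value chosen for this sketch
COUNT=7

# the branches are tried in order; only the first one that succeeds is run
if [ "$COUNT" -lt 5 ]; then
    SIZE=small
elif [ "$COUNT" -lt 10 ]; then
    SIZE=medium
else
    SIZE=large
fi

echo "$COUNT is $SIZE"
```

Quoting `"$COUNT"` keeps the test well formed even if the variable turns out to be empty.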

In [None]:
# Check we are running under slurm and have a job ID, otherwise exit

cat > slurm_check.sh << "EOF"
#!/bin/sh

#SBATCH -J SLURM_chk                  # Job name
#SBATCH -o SLURM_chk.out              # Name of stdout output file
#SBATCH -p 32GB                       # Queue name
#SBATCH -N 1                          # Total number of nodes requested
#SBATCH -t 00:01:00                   # Run time (hh:mm:ss)

# our conditional if-then statement
# -z is true when the variable is empty or unset
if [ -z "$SLURM_JOB_ID" ]; then
   echo "No SLURM job ID is set - this script must be run as a SLURM batch job."
   exit 1
fi

echo "Running as job $SLURM_JOB_ID"
echo "Running on node(s) $SLURM_JOB_NODELIST"
EOF

In [None]:
# env -i to ignore the current running environment

chmod u+x slurm_check.sh
env -i ./slurm_check.sh

In [None]:
sbatch slurm_check.sh

In [None]:
cat SLURM_chk.out

In [None]:
# let's illustrate conditional statements with a script that determines the port required for a BioHPC Jupyter session

cat > jupyter_port.sh << "EOF"
#!/bin/sh

# the name of the node
NODE_HOST=$(hostname -s)

# a small pipeline that determines the node number
NODE_NUMBER=`hostname -s | perl -ne 'print $1 if /(\d+)$/;' | sed 's/^0*//'`

# this is set
LOCAL_JUPYTER_DISPLAY=5

# double square brackets is a bash-specific test variation that allows for wildcard expansion
# this feature is not always portable to other shells

if [[ $NODE_HOST = NucleusA* ]]; then
    NODE_GROUP=2300
elif [[ $NODE_HOST = NucleusB* ]]; then
    NODE_GROUP=2600
elif [[ $NODE_HOST = NucleusC* ]]; then
    NODE_GROUP=3000
elif [[ $NODE_HOST = NucleusD* ]]; then
    NODE_GROUP=3300
elif [[ $NODE_HOST = NucleusE* ]]; then
    NODE_GROUP=3600
else
    NODE_GROUP=2000
fi

LOGINPORT=$(($NODE_GROUP+$NODE_NUMBER))$LOCAL_JUPYTER_DISPLAY

echo "You are on" $NODE_HOST"."
echo "This node's group number is" $NODE_GROUP"."
echo "The login port is" $LOGINPORT"."

EOF

In [None]:
bash jupyter_port.sh

### On the condition of an exit-code

Conditional statements rely on the exit code of the condition enclosed within the `test` operator `[]`. By extension, the exit code of a command can also be used in an if-then statement.

```
if <command>; then
    <commands>
fi
```

The command can also be enclosed in round brackets, `(<command>)`, to force it to run in a subshell. A typical reason for using a subshell like this is to limit the side effects of a command that assigns variables or otherwise changes the shell's environment: such changes do not persist after the subshell completes.

```
if (<command>); then
    <commands>
fi
```
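
Both forms can be tried with standard tools. In the sketch below, `grep -q` serves as the command whose exit code drives the `if`, and a second `if` shows that an assignment made inside a subshell does not persist. `DEMO_FILE`, `FOUND`, and `MARKER` are hypothetical names, and `mktemp` is assumed to be available.

```bash
#!/bin/sh

# DEMO_FILE is a temporary file created only for this sketch
DEMO_FILE=$(mktemp)
echo 'Boring Header' > "$DEMO_FILE"

# grep -q prints nothing; it exits with 0 if the pattern is found, 1 if not
if grep -q 'Header' "$DEMO_FILE"; then
    FOUND=yes
else
    FOUND=no
fi
echo "Header found: $FOUND"

# MARKER is assigned inside the subshell and is gone once the subshell ends
if (MARKER=set_in_subshell; [ -n "$MARKER" ]); then
    echo "subshell test succeeded; MARKER in parent is [${MARKER:-}]"
fi

# remove the temporary file
rm "$DEMO_FILE"
```

No square brackets are needed around `grep`, because `if` only looks at the exit code of whatever command follows it.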

In [None]:
cat > run_check.sh << "EOF"
#!/bin/sh

PROC=jupyter_port.sh

# let's run our jupyter_port script in a subshell and echo if it ran successfully, i.e., exited with code 0
if (bash $PROC); then
    echo "The process ran successfully."
fi
EOF

In [None]:
bash run_check.sh

## Functions

A function can return a value in four ways:

1. Change the state of a variable or variables
2. Use the `exit` command to end the shell script
3. Use the `return` command to end the function, and return the supplied value to the calling section of the shell script
4. `echo` output to `stdout` which will be caught by the caller

`exit` stops the whole script, whereas `return` hands control back to the caller. A shell function *cannot* change the positional parameters (`$1`, `$2`, ...) of its caller, but it can change global variables.

Functions can be declared in two ways:

```
function_name () {
    <commands>
}
```

or

```
function function_name {
    <commands>
}
```
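
Three of the four ways of handing back a value can be sketched in one short script (the fourth, `exit`, would end the script itself). The functions `add`, `is_even`, and `count_call` are hypothetical and written only for this illustration.

```bash
#!/bin/sh

# echo the result to stdout so the caller can capture it with $(...)
add() {
    echo $(($1 + $2))
}

# return an exit status: 0 for success (even), non-zero otherwise
is_even() {
    [ $(($1 % 2)) -eq 0 ]
    return $?
}

# change the state of a global variable
CALLS=0
count_call() {
    CALLS=$((CALLS + 1))
}

SUM=$(add 40 2)
echo "40 + 2 = $SUM"

# the return status of a function can drive an if statement directly
if is_even "$SUM"; then
    echo "$SUM is even"
fi

count_call
count_call
echo "count_call ran $CALLS times"
```

Inside a function, `$1` and `$2` refer to the arguments of the function call, not to the arguments of the script.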

In [None]:
# let's modify our slurm_check script so that the SLURM id check is called via a function

cat > slurm_check.sh << "EOF"
#!/bin/sh

#SBATCH -J SLURM_chk                  # Job name
#SBATCH -o SLURM_chk.out              # Name of stdout output file
#SBATCH -p 32GB                       # Queue name
#SBATCH -N 1                          # Total number of nodes requested
#SBATCH -t 00:01:00                   # Run time (hh:mm:ss)

# function that checks for a job id -- exits script with code 1 if no id is found
# -z is true when the variable is empty or unset
id_check() {
    if [ -z "$SLURM_JOB_ID" ]; then
       echo "No SLURM job ID is set - this script must be run as a SLURM batch job."
       exit 1
    fi
}

id_check

echo "Running as job $SLURM_JOB_ID"
echo "Running on node(s) $SLURM_JOB_NODELIST"
EOF

In [None]:
# env -i to ignore the current running environment
# the script should exit before the final two echo commands

chmod u+x slurm_check.sh
env -i ./slurm_check.sh

In [None]:
cat > jupyter_port.sh << "EOF"
#!/bin/sh

# the name of the node
NODE_HOST=$(hostname -s)

# a small pipeline that determines the node number
NODE_NUMBER=`hostname -s | perl -ne 'print $1 if /(\d+)$/;' | sed 's/^0*//'`

# this is set
LOCAL_JUPYTER_DISPLAY=5

# double square brackets is a bash-specific test variation that allows for wildcard expansion
# this feature is not always portable to other shells

# here return is used to return control back from the function to the script
node_grp() {
    if [[ $NODE_HOST = NucleusA* ]]; then
        NODE_GROUP=2300
    elif [[ $NODE_HOST = NucleusB* ]]; then
        NODE_GROUP=2600
    elif [[ $NODE_HOST = NucleusC* ]]; then
        NODE_GROUP=3000
    elif [[ $NODE_HOST = NucleusD* ]]; then
        NODE_GROUP=3300
    elif [[ $NODE_HOST = NucleusE* ]]; then
        NODE_GROUP=3600
    else
        NODE_GROUP=2000
    fi
    return
}

node_grp

LOGINPORT=$(($NODE_GROUP+$NODE_NUMBER))$LOCAL_JUPYTER_DISPLAY

echo "You are on" $NODE_HOST"."
echo "This node's group number is" $NODE_GROUP"."
echo "The login port is" $LOGINPORT"."

EOF

In [None]:
bash jupyter_port.sh