# Introduction to Bash Scripting

## Chapter 1: From Command-Line to Bash Script

### Introduction and refresher

#### Introduction to the course
This course will cover:
* Moving from command-line to a Bash script
* Variables and data types in Bash scripting
* Control statements
* Functions and script automation

#### Why Bash scripting? (Bash)
Firstly, let's consider why Bash?
* Bash stands for '**B**ourne **A**gain **S**hell' (a pun)
* Developed in the 80's, but still a very popular shell today. It is the default on many Unix systems and Macs
* Unix is the internet! (Running ML models, data pipelines)
    * AWS, Google, Microsoft all have CLI's to their products
    
#### Why Bash scripting? (scripting!)
So why Bash scripting?
* Ease of execution of shell commands (no need to copy-paste every time!)
* Powerful programming constructs

#### Shell commands refresher
Some important shell commands:
* `(e)grep` filters input based on regex pattern matching
* `cat` concatenates file contents line-by-line
* `tail` / `head` give only the last / first `-n` (a flag) lines
* `wc` does a word or line count (with flags `-w` `-l`)
* `sed` does pattern-matched string replacement
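
As a quick refresher, the commands above can be tried on a small throwaway file. This is only a sketch: the fruit names and the temporary file are hypothetical sample data, not part of the course material.

```shell
# Create a small throwaway file (hypothetical sample data).
tmpfile=$(mktemp)
printf 'apple\nbanana\ncherry\napricot\n' > "$tmpfile"

# grep keeps only the lines matching the pattern (lines starting with "a").
a_lines=$(grep '^a' "$tmpfile")

# head and tail keep the first or last -n lines.
first_line=$(head -n 1 "$tmpfile")
last_line=$(tail -n 1 "$tmpfile")

# wc -l counts lines; reading from STDIN avoids printing the file name.
line_count=$(wc -l < "$tmpfile")

# sed replaces the matched pattern; here we keep only the changed line.
replaced=$(sed 's/banana/mango/' "$tmpfile" | head -n 2 | tail -n 1)

echo "$a_lines"
echo "$first_line / $last_line"
echo "$line_count"
echo "$replaced"
rm -f "$tmpfile"
```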

### Your first Bash script

#### Bash script anatomy
A Bash script has a few key defining features:
* It usually begins with `#!/usr/bash` (on its own line)
    * So your interpreter knows it is a Bash script and to use Bash located in `/usr/bash`
    * This could be a different path if Bash is installed somewhere else, commonly `/bin/bash` or `/usr/bin/bash` (type `which bash` to check)
* Middle lines contain code
    * This may be line-by-line commands or programming constructs
    
To save and run:
* It has a file extension `.sh`
    * Technically not needed if the first line has the shebang and path to Bash (`#!/usr/bash`), but it is a convention
* Can be run in the terminal using `bash script_name.sh`
    * Or, if the first line is the shebang (`#!/usr/bash`) and the file is executable (`chmod +x script_name.sh`), you can simply run it using `./script_name.sh`
     
#### Bash script example
An example of a full script (called `eg.sh`) is:

In [None]:
#!/usr/bash
echo "Hello world"
echo "Goodbye world"

It could be run with the command `./eg.sh`.
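
The two ways of running a script can be sketched end to end as follows. This is a minimal illustration under some assumptions: it works in a temporary directory, and the shebang uses `env` to locate Bash rather than a fixed path.

```shell
# Work in a throwaway directory so nothing in the current folder is touched.
workdir=$(mktemp -d)

# Write the script. The shebang here uses env to locate Bash (an assumption;
# the notes above use a fixed path instead).
cat > "$workdir/eg.sh" <<'EOF'
#!/usr/bin/env bash
echo "Hello world"
echo "Goodbye world"
EOF

# Option 1: hand the file to Bash explicitly (no execute permission needed).
out_bash=$(bash "$workdir/eg.sh")

# Option 2: mark it executable, then run it directly via the shebang.
chmod +x "$workdir/eg.sh"
out_direct=$("$workdir/eg.sh")

echo "$out_bash"
echo "$out_direct"
rm -rf "$workdir"
```

Both calls should print the same two lines; the second form only works once the file has execute permission.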

#### Bash and shell commands
Each line of your Bash script can be a shell command.

Therefore, you can also include pipes in your Bash scripts.

Consider a text file (`animals.txt`)

Count the animals in each group.

In shell you could write a chained command in the terminal. Let's instead put that into a script (`group.sh`):

In [None]:
#!/usr/bash
cat animals.txt | cut -d " " -f 2 | sort | uniq -c

Now (after saving the script) run `bash group.sh`.
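
To see what each stage of the pipe contributes, here is a sketch that can be run as-is. The contents of `animals.txt` are hypothetical (one animal and its group per line, separated by a space), since the original file is not shown.

```shell
workdir=$(mktemp -d)

# Hypothetical contents: "<animal> <group>" on each line.
printf 'magpie bird\nemu bird\nkangaroo marsupial\nwallaby marsupial\nshark fish\n' \
    > "$workdir/animals.txt"

# cut picks the 2nd space-delimited field, sort puts identical groups next to
# each other, and uniq -c counts each run of identical lines.
counts=$(cut -d " " -f 2 "$workdir/animals.txt" | sort | uniq -c)

echo "$counts"
rm -rf "$workdir"
```

Note that `sort` is needed before `uniq -c`, because `uniq` only merges adjacent duplicate lines.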

### Standard Streams and Arguments

#### STDIN-STDOUT-STDERR
In Bash scripting, there are three 'streams' for your program:
* STDIN (standard input). A stream of data into the program
* STDOUT (standard output). A stream of data **out** of the program
* STDERR (standard error). Errors in your program

By default, these streams will come from and write out to the terminal.

You may, however, see `2> /dev/null` in script calls; this redirects STDERR to `/dev/null`, discarding it. (`1> /dev/null` would do the same for STDOUT.)

#### STDIN example
Consider a text file (`sports.txt`) with 3 lines of data.

The `cat sports.txt 1> new_sports.txt` command is an example of taking data from the file and writing STDOUT to a new file. See what happens if you run `cat new_sports.txt`.
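
The three streams can be separated in a short sketch. The `report` function is a hypothetical helper written only for this illustration; it writes one line to STDOUT and one to STDERR.

```shell
# A hypothetical function that writes one line to each output stream.
report() {
    echo "normal output"        # goes to STDOUT (stream 1)
    echo "error output" >&2     # goes to STDERR (stream 2)
}

# Discard STDERR and capture only STDOUT.
only_stdout=$(report 2> /dev/null)

# Capture only STDERR: 2>&1 first points STDERR at the capture,
# then 1> /dev/null throws STDOUT away.
only_stderr=$(report 2>&1 1> /dev/null)

# STDIN: wc reads the data piped into it rather than a named file.
stdin_lines=$(printf 'a\nb\nc\n' | wc -l)

echo "$only_stdout"
echo "$only_stderr"
echo "$stdin_lines"
```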

#### STDIN vs. ARGV
A key concept in Bash scripting is **arguments**

Bash scripts can take **arguments** to be used inside the script; add them after the script execution call, separated by spaces.
* ARGV is the array of all the arguments given to the program.
* Each argument can be accessed via the `$` notation. The first as `$1`, the second as `$2` etc.
* `$@` and `$*` give all the arguments in ARGV.
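
The difference between `$@` and `$*` only shows up when they are quoted. The sketch below uses `set --` to simulate script arguments inside the current shell (an assumption made so the example runs without a separate script file).

```shell
# Simulate script arguments; set -- replaces $1, $2, ... in the current shell.
set -- "one" "two words" "three"

echo "First: $1, second: $2, count: $#"

# "$@" expands to one word per argument, so spaces inside an argument survive.
count_at=0
for arg in "$@"; do
    count_at=$((count_at + 1))
done

# "$*" joins every argument into a single word.
count_star=0
for arg in "$*"; do
    count_star=$((count_star + 1))
done

echo "Items when looping over quoted @: $count_at"
echo "Items when looping over quoted *: $count_star"
```

For this reason `"$@"` is usually the safer choice when passing arguments along to another command.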

#### ARGV example
Consider an example script (`args.sh`)

In [None]:
#!/usr/bash
echo $1
echo $2
echo $@
echo "There are $# arguments"

Now running `bash args.sh one two three four five`, the output is:

In [None]:
# one
# two
# one two three four five
# There are 5 arguments

## Chapter 2: Variables in Bash Scripting

### Basic variables in Bash
Similar to other languages, you can assign variables with the equals notation.

In [None]:
var1="Moon"

Then reference with `$` notation.

In [None]:
echo $var1

#### Assigning string variables
Name your variable as you like (something sensible!):

In [None]:
firstname='Cynthia'
lastname='Liu'
echo "Hi there" $firstname $lastname

#### Missing the $ notation
If you miss the `$` notation - it isn't a variable!

In [1]:
! echo "Hi there" firstname lastname

Hi there firstname lastname


#### (Not) assigning variables
Bash is not very forgiving about spaces in variable creation. Beware of adding spaces!

In [2]:
! var1 = "Moon" | echo $var1


/bin/sh: var1: command not found


#### Single, double, backticks
In Bash, using different quotation marks can mean different things, both when creating variables and when printing.
* Single quotes (`'sometext'`) = Shell interprets what is between literally
* Double quotes (`"sometext"`) = Shell interprets literally **except** for `$` and backticks

The last way creates a 'shell-within-a-shell', outlined below. It is useful for calling command-line programs and is done with backticks.
* Backticks (\`sometext\`) = Shell runs the command and captures STDOUT back into a variable

#### Different variable creation
Let's see the effect of different types of variable creation

In [None]:
now_var='NOW'
now_var_singlequote='$now_var'
echo $now_var_singlequote
# ------------
# $now_var

In [None]:
now_var_doublequote="$now_var"
echo $now_var_doublequote
# ------------
# NOW

#### The date program
The `date` program will be useful for demonstrating backticks.

Normal output of this program:

In [None]:
date
# ------------
# Sun Apr  5 20:16:19 PDT 2020 

#### Shell within a shell
Let's use the shell-within-a-shell now:

In [None]:
rightnow_doublequote="The date is `date`."
echo $rightnow_doublequote
# ------------
# The date is Sun Apr  5 20:18:07 PDT 2020.

The date program was called, output captured and combined in-line with the `echo` call.

#### Parentheses vs backticks
There is an equivalent to the backtick notation:

In [None]:
rightnow_doublequote="The date is `date`."
rightnow_parentheses="The date is $(date)."
echo $rightnow_doublequote
echo $rightnow_parentheses
# --------------
# The date is Sun Apr  5 20:18:07 PDT 2020.
# The date is Sun Apr  5 20:18:07 PDT 2020.

Both work the same, though using backticks is the older style. The `$()` notation is used more in modern applications.
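
One practical reason to prefer `$()` is nesting. The following sketch shows the same nested call written both ways; the strings are arbitrary placeholders.

```shell
# Nesting with $( ) needs no escaping: the inner substitution runs first.
nested_parens=$(echo "outer $(echo "inner")")

# The same nesting with backticks requires escaping the inner backticks.
nested_backticks=`echo "outer \`echo inner\`"`

echo "$nested_parens"
echo "$nested_backticks"
```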

### Numeric variables in Bash

#### Numbers in other languages
Unlike most REPLs (consoles) such as R and Python, the shell does not handle numbers natively.

In Python or R you may do:

In [6]:
1 + 4

5

It will return what you want!

#### Numbers in the shell
Numbers are not natively supported (you will get an error running the same thing as above)

#### Introducing expr
`expr` is a useful utility program (just like `cat` or `grep`)

This will now work (in the terminal):

In [8]:
! expr 1 + 4

5


#### expr limitations
`expr` cannot natively handle decimal places:

In [10]:
! expr 1 + 2.5

expr: not a decimal number: '2.5'


#### Introducing bc
`bc` (basic calculator) is a useful command-line program.

#### Getting numbers to bc
Using `bc` without opening the calculator is possible by piping:

In [11]:
! echo "5 + 7.5" | bc

12.5


#### bc scale argument
`bc` also has a `scale` argument that sets how many decimal places to keep.

In [13]:
! echo "10 / 3" | bc

3


In [14]:
! echo "scale=3; 10/3" | bc

3.333


Note the use of `;` to separate 'lines' in the terminal.

#### Numbers in Bash scripts
We can assign numeric variables just like string variables:

In [None]:
dog_name='Roger'
dog_age=6
echo "My dog's name is $dog_name and he is $dog_age years old"

Beware that `dog_age="6"` will work, but makes it a string!

#### Double bracket notation
A variant on single bracket variable notation for numeric variables:

In [15]:
! expr 5 + 7

12


In [16]:
! echo $((5 + 7))

12


Beware this method does integer arithmetic only, like `expr` and unlike `bc` (no decimals!)
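
A short sketch of double-bracket arithmetic with variables makes the integer-only behaviour concrete. The variable names are illustrative.

```shell
a=7
b=2

# Variables can be used inside $(( )) with or without the leading $.
sum=$((a + b))
product=$((a * b))

# Division is integer division: the fractional part is dropped.
quotient=$((a / b))
remainder=$((a % b))

echo "sum=$sum product=$product quotient=$quotient remainder=$remainder"
```

Here `7 / 2` gives `3`, not `3.5`, which is why `bc` is still needed for decimals.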

#### Shell within a shell revisited
Remember how we called out to the shell in the previous lesson?

Very useful for numeric variables:

In [None]:
model1=87.65
model2=89.20
echo "The total score is $(echo "$model1 + $model2" | bc)"
echo "The average score is $(echo "scale=2; ($model1 + $model2) / 2" | bc)"

### Arrays in Bash

#### What is an array?
Two types of arrays in Bash:
* An array
    * 'Normal' numerical-indexed structure
    * Called a 'list' in Python or 'vector' in R
    
#### Creating an array in Bash
Creation of a numerical-indexed array can be done in two ways in Bash.

1. Declare without adding elements

In [None]:
declare -a my_first_array

2. Create and add elements at the same time

In [None]:
my_first_array=(1 2 3)

Remember - no spaces around the equals sign!

#### Be careful of commas!
Commas are not used to separate array elements in Bash.
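
To illustrate what goes wrong, the sketch below compares a correctly created array with two comma mistakes. The array names are hypothetical.

```shell
# Correct: elements separated by spaces.
good_array=(1 2 3)

# Mistake: the commas become part of the first two elements.
comma_array=(1, 2, 3)

# Mistake: with no spaces at all, the whole string is a single element.
single_array=(1,2,3)

echo "${good_array[0]}"      # 1
echo "${comma_array[0]}"     # 1,
echo "${#single_array[@]}"   # 1
```

Bash raises no error in either case, so the mistake tends to surface only later, when the elements are used.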

#### Important array properties
* All array elements can be returned using `array[@]`. Note that Bash requires curly brackets around the array name when you want to access these properties.

In [None]:
my_array=(1 3 5 2)
echo ${my_array[@]}

* The length of an array is accessed using `#array[@]`

In [None]:
echo ${#my_array[@]}

#### Manipulating array elements
Accessing array elements using square brackets.

In [None]:
my_first_array=(15 20 300 42)
echo ${my_first_array[2]}
# ------------
# 300

* Remember: Bash uses zero-indexing for arrays like Python (but unlike R!)

Set array elements using the index notation

In [None]:
my_first_array=(15 20 300 42 23 2 4 33 54 67 66)
my_first_array[0]=999
echo ${my_first_array[0]}
# -------------
# 999

* Remember: don't use the `$` when overwriting an index such as `$my_first_array[0]=999`, as this will not work.

Use the notation `array[@]:N:M` to 'slice' out a subset of the array.
* Here `N` is the starting index and `M` is how many elements to return.

In [None]:
my_first_array=(15 20 300 42 23 2 4 33 54 67 66)
echo ${my_first_array[@]:3:2}
# -------------
# 42 23

#### Appending to arrays
Append to an array using `array+=(elements)`.

For example:

In [None]:
my_array=(300 42 23 2 4 33 53 67 66)
my_array+=(10)
echo ${my_array[@]}
# ------------
# 300 42 23 2 4 33 53 67 66 10

#### (Not) appending to arrays
What happens if you do not add parentheses around what you want to append? Let's see.

For example:

In [None]:
my_array=(300 42 23 2 4 33 53 67 66)
my_array+=10
echo ${my_array[@]}
# ------------
# 30010 42 23 2 4 33 53 67 66

#### Associative arrays
* An **associative** array
    * Similar to a normal array, but with key-value pairs, not numerical indexes
    * Similar to Python's dictionary or R's list
    * Note: This is only available in Bash 4 onwards. Some modern Macs have old Bash! Check with `bash --version` in terminal
    
#### Creating an associative array
You can only create an associative array using the declare syntax (and uppercase `-A`).

You can either declare first, then add element or do it all on one line.
* Surround 'keys' in square brackets, then associate a value after the equals sign.
    * You may add multiple elements at once.
    
#### Associative array example
Let's make an associative array:

In [None]:
declare -A city_details # Declare first
city_details=([city_name]="New York" [population]=140000000) # Add elements
echo ${city_details[city_name]} # Index using key to return a value
# --------------
# New York

#### Creating an associative array
Alternatively, create an associative array and assign in one line
* Everything else is the same

In [None]:
declare -A city_details=([city_name]="New York" [population]=1400000)

Access the 'keys' of an associative array with an `!`

In [None]:
echo ${!city_details[@]} # Return all the keys
# ---------------
# city_name population (order may vary)
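
Keys and values are often used together in a loop. The following is a sketch only: it needs Bash 4 or later, and the extra `country` key is a hypothetical addition for illustration.

```shell
# Requires Bash 4 or later. The values here are illustrative only.
declare -A city_details=([city_name]="New York" [population]=1400000)

# ${!array[@]} yields the keys; each key then indexes its value.
# Key order is not guaranteed, so the keys are sorted for a stable listing.
for key in $(printf '%s\n' "${!city_details[@]}" | sort); do
    echo "$key -> ${city_details[$key]}"
done

# Adding a key after creation (a hypothetical extra field).
city_details[country]="USA"
echo "Number of keys: ${#city_details[@]}"
```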

## Chapter 3: Control Statements in Bash Scripting

### IF statements

#### A basic IF statement
A basic IF statement in Bash has the following structure:

In [None]:
if [ CONDITION ]; then
    # SOME CODE
else
    # SOME OTHER CODE
fi

Two Tips:
* Spaces between square brackets and conditional elements inside (first line)
* Semi-colon after close-bracket `];`

#### IF statement and strings
We could do a basic string comparison in an IF statement:

In [None]:
x="Queen"
if [ $x == "King" ]; then
    echo "$x is a King!"
else
    echo "$x is not a King!"
fi

#### Arithmetic IF statements (option 1)
Arithmetic IF statements can use the double-parenthesis structure:

In [None]:
x=10
if (($x > 5)); then
    echo "$x is more than 5!"
fi

#### Arithmetic IF statements (option 2)
Arithmetic IF statements can also use square brackets and an arithmetic flag rather than symbols (`>`, `<`, `=`, `!=` etc.):
* `-eq` for 'equal to'
* `-ne` for 'not equal to'
* `-lt` for 'less than'
* `-le` for 'less than or equal to'
* `-gt` for 'greater than'
* `-ge` for 'greater than or equal to'

#### Arithmetic IF statement example
Here we re-create the last example using square bracket notation:

In [None]:
x=10
if [ $x -gt 5 ]; then
    echo "$x is more than 5!"
fi

#### Other Bash conditional flags
Bash also comes with a variety of file-related flags such as:
* `-e` if the file exists
* `-s` if the file exists and has size greater than zero
* `-r` if the file exists and is readable
* `-w` if the file exists and is writable
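
The file flags can be tried on a couple of temporary files. In this sketch `describe_file` is a hypothetical helper and the file names are made up for the example.

```shell
workdir=$(mktemp -d)
empty_file="$workdir/empty.txt"
full_file="$workdir/full.txt"
touch "$empty_file"                 # exists, but has zero size
echo "some data" > "$full_file"     # exists and has content

# describe_file is a hypothetical helper that reports on one path.
describe_file() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -s "$1" ]; then
        echo "has content"
    else
        echo "empty"
    fi
}

result_empty=$(describe_file "$empty_file")
result_full=$(describe_file "$full_file")
result_missing=$(describe_file "$workdir/nothing.txt")

echo "$result_empty"
echo "$result_full"
echo "$result_missing"
rm -rf "$workdir"
```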

#### Using AND and OR in Bash
To combine conditions (AND) or use an OR statement in Bash you can use the following symbols:
* `&&` for AND
* `||` for OR

#### Multiple conditions
In Bash you can either chain conditionals as follows:

In [None]:
x=10
if [ $x -gt 5 ] && [ $x -lt 11 ]; then
    echo "$x is more than 5 and less than 11!"
fi

Or use double-square-bracket notation:

In [None]:
x=10
if [[ $x -gt 5 && $x -lt 11 ]]; then
    echo "$x is more than 5 and less than 11!"
fi

#### IF and command-line programs
You can also use many command-line programs directly in the conditional, removing the square brackets.

For example, if the file `words.txt` has 'Hello World!' inside:

In [None]:
if grep -q 'Hello' words.txt; then
    echo "Hello is inside!"
fi

#### IF with shell-within-a-shell
Or you can call a shell-within-a-shell as well for your conditional.

Let's rewrite the last example, which will produce the same result.

In [None]:
if $(grep -q 'Hello' words.txt); then
    echo "Hello is inside!"
fi

### FOR loops & WHILE statements
#### FOR Loop in Bash
The basic structure in Bash is similar to that in R or Python:

In [None]:
for x in 1 2 3
do
    echo $x
done

#### FOR loop number ranges
Bash has a neat way to create a numeric range called 'brace expansion':
* `{START..STOP..INCREMENT}`

In [None]:
for x in {1..5..2}
do
    echo $x
done

#### FOR loop three expression syntax
Another common way to write FOR loops is the 'three expression' syntax.
* Surround three expressions with double parentheses
* The first part is the start expression (`x=2`)
* The middle part is the terminating condition (`x<=4`)
* The end part is the increment (or decrement) expression (`x+=2`)
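
Putting the three expressions named above together gives the following loop, shown here as a minimal sketch:

```shell
# Start at x=2, continue while x<=4, and add 2 after each iteration.
for ((x=2; x<=4; x+=2))
do
    echo $x
done
```

This prints `2` and then `4`; the loop stops once `x` reaches 6 and the condition `x<=4` fails.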

#### Glob expansions
Bash also allows pattern-matching expansions into a for loop using the `*` symbol such as files in a directory.

For example, assume there are two text documents in the folder `books/`:

In [None]:
for book in books/*
do
    echo $book
done

#### Shell-within-a-shell revisited
Remember creating a shell-within-a-shell using `$()` notation?
You can call in-place for a for loop!
Let's assume a folder structure like so:

books/

|--- AirportBook.txt

|--- CattleBook.txt

|--- FairMarketBook.txt

|--- LOTR.txt

|--- file.csv

#### Shell-within-a-shell to FOR loop
We could loop through the result of a call to shell-within-a-shell:

In [None]:
for book in $(ls books/ | grep -i 'air')
do
    echo $book
done

#### WHILE statement syntax
Similar to a FOR loop. Except you set a condition which is tested at each iteration.

Iterations continue until this is no longer met!
* Use the word `while` instead of `for`
* Surround the condition in square brackets
    * Use the same flags for numerical comparison as in IF statements (such as `-le`)
* Multiple conditions can be chained or use double-brackets just like 'IF' statements along with `&&` (AND) or `||` (OR)
* Ensure there is a change inside the code that will trigger a stop (else you may have an infinite loop!)

#### WHILE statement example
Here is a simple example:

In [None]:
x=1
while [ $x -le 3 ];
do
    echo $x
    ((x+=1))
done

#### Beware the infinite loop
Beware the infinite WHILE loop, which occurs if the break condition is never met.

In [None]:
x=1
while [ $x -le 3 ];
do
    echo $x
    # don't increment x. It never exceeds 3!
    # ((x+=1))
done

### CASE statements

#### The need for CASE statements
Case statements can be clearer and more efficient than IF statements when you have multiple or complex conditionals.

Let's say you wanted to test the following conditions and actions:
* If a file contains `sydney` then move it to the `/sydney` directory
* If a file contains `melbourne` or `brisbane` then delete it
* If a file contains `canberra` then rename it to `IMPORTANT_filename` where `filename` was the original filename

#### A complex IF statement
You could construct multiple IF statements like so:
* This code calls `grep` on the first ARGV argument for the conditional.

In [None]:
if grep -q 'sydney' $1; then
    mv $1 sydney/
fi
if grep -q -E 'melbourne|brisbane' $1; then
    rm $1
fi
if grep -q 'canberra' $1; then
    mv $1 "IMPORTANT_$1"
fi

* Seems complex and repetitious

#### Build a CASE statement
* Begin by selecting which variable or string to match against
    * You could call shell-within-a-shell here!
* Add as many possible matches & actions as you like
    * You can use glob-style patterns (not full regex) for the `PATTERN`, such as `Air*` for 'starts with Air' or `*hat*` for 'contains hat'.
* Ensure to separate the pattern and code to run by a close-parenthesis and finish commands with double semi-colon
* `*) DEFAULT COMMAND;;`
    * It is common (but not required) to finish with a default command that runs if none of the other patterns match.
* `esac` Finally, the finishing word is 'esac' 
    * This is 'case' spelled backwards!

Basic CASE statement format:

In [None]:
case 'STRING' in
    PATTERN1)
    COMMAND1;;
    PATTERN2)
    COMMAND2;;
    *)
    DEFAULT COMMAND;;
esac

#### From IF to CASE

In [None]:
case $(cat $1) in
    *sydney*)
    mv $1 sydney/ ;;
    *melbourne*|*brisbane*)
    rm $1 ;;
    *canberra*)
    mv $1 "IMPORTANT_$1" ;;
    *)
    echo "No cities found" ;;
esac
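
The pattern matching can be tried safely without moving or deleting any files. In the sketch below, `classify_city` is a hypothetical helper that only prints which action would be taken for a given piece of text.

```shell
# classify_city is a hypothetical helper; it matches text against glob patterns.
classify_city() {
    case "$1" in
        *sydney*)
            echo "move" ;;
        *melbourne*|*brisbane*)
            echo "delete" ;;
        *canberra*)
            echo "rename" ;;
        *)
            echo "no city found" ;;
    esac
}

classify_city "flight to sydney"     # move
classify_city "brisbane weather"     # delete
classify_city "canberra report"      # rename
classify_city "perth"                # no city found
```

Only the first matching pattern runs, so the order of the patterns matters when more than one could match.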

## Chapter 4: Functions and Automation

### Basic functions in Bash

#### Why functions?
If you have used them in R or Python then you are familiar with these advantages:
1. Functions are reusable
2. Functions allow neat, compartmentalized (modular) code
3. Functions aid sharing code (you only need to know inputs and outputs to use!)

#### Bash function anatomy

Let's break down the function syntax:
* Start by naming the function. This is used to call it later.
    * Make sure it is sensible!
* Add open and close parenthesis after the function name
* Add the code inside the curly brackets. You can use anything you have learned so far (loops, IF, shell-within-a-shell etc)!
* Optionally return something (beware! This is not as it seems)

A Bash function has the following syntax:

In [None]:
function_name () {
    #function_code
    return #something
}

#### Alternate Bash function structure
You can also create a function like so:

In [None]:
function function_name {
    #function_code
    return #something
}

The main differences:
* Use the word `function` to denote starting a function build
* You can drop the parentheses on the opening line if you like, though many people keep them by convention

#### Calling a Bash function
Calling a Bash function is simply writing the name:

In [None]:
function print_hello () {
    echo "Hello world!"
}
print_hello # here we call the function

#### Fahrenheit to Celsius Bash function
Let's write a function to convert Fahrenheit to Celsius like you did in a previous lesson, using a static variable.

In [None]:
temp_f=30
function convert_temp () {
    temp_c=$(echo "scale=2; ($temp_f - 32) * 5 / 9" | bc)
    echo $temp_c
}
convert_temp # call the function

### Arguments, return values, and scope

#### Passing arguments into Bash functions
Passing arguments into functions is similar to how you pass arguments into a script, using the `$1` notation.

You also have access to the special `ARGV` properties we previously covered:
* Each argument can be accessed via the `$1`, `$2` notation.
* `$@` and `$*` give all the arguments in the `ARGV`
* `$#` gives the length (number of arguments)

#### Passing arguments example
Let's pass some file names as arguments into a function to demonstrate. We will loop through them and print them out.

In [None]:
function print_filename {
    echo "The first file was $1"
    for file in $@
    do 
        echo "This file has name $file"
    done
}
print_filename "LOTR.txt" "mod.txt" "A.py"

#### Scope in programming
'Scope' in programming refers to how accessible a variable is.
* 'Global' means something is accessible anywhere in the program, including inside FOR loops, IF statements, functions etc
* 'Local' means something is only accessible in a certain part of the program.

Why does this matter? If you try and access something that only has local scope - your program may fail with an error!

#### Scope in Bash functions
Unlike most programming languages (eg. Python and R), all variables in Bash are global by default.

In [None]:
function print_filename {
    first_filename=$1
}
print_filename "LOTR.txt" "model.txt"
echo $first_filename

Beware global scope may be dangerous as there is more risk of something unintended happening.

#### Restricting scope in Bash functions
You can use the `local` keyword to restrict variable scope.

In [None]:
function print_filename {
    local first_filename=$1
}
print_filename "LOTR.txt" "model.txt"
echo $first_filename

Q: Why wasn't there an error, just a blank line?

Answer: `first_filename` is local to the function, so it does not exist in the global scope where `echo` runs. Bash expands an unset variable to an empty string instead of raising an error, which is why you only see a blank line. So be careful!
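
The contrast between global and local variables can be seen side by side. This is a sketch with hypothetical names (`set_names`, `global_name`, `local_name`).

```shell
set_names() {
    global_name="visible everywhere"       # global by default
    local local_name="visible only here"   # restricted to this function
    echo "Inside: $local_name"
}

set_names
echo "Global after call: '$global_name'"
echo "Local after call: '$local_name'"   # empty: the name is unset out here
```

The last line prints empty quotes rather than failing, because Bash treats an unset variable as an empty string by default.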

#### Return values
We know how to get arguments in - how about getting them out?

The `return` option in Bash is only meant to determine if the function was a success (0) or failure (other values 1-255). It is captured in the global variable `$?`

Our options are:
1. Assign to a global variable
2. `echo` what we want back (**last line** in function) and capture using shell-within-a-shell

#### A return error
Let's see a return error:

In [None]:
function function_2 {
    echlo # An error of 'echo'
}
function_2 # Call the function
echo $? # Print the return value

What happened?
1. There was an error when we called the function
    * The script tried to find 'echlo' as a program but it didn't exist
2. The return value in `$?` was 127 (error)

#### Returning correctly
Let's correctly return a value to be used elsewhere in our script using `echo` and shell-within-a-shell capture:

In [None]:
function convert_temp {
    echo $(echo "scale=2; ($1 - 32) * 5 / 9" | bc)
}
converted=$(convert_temp 30)
echo "30F in Celsius is $converted C"

* See how we no longer create the intermediary variable?
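
The two roles, a status code via `return` and data via `echo`, can be contrasted in one sketch. The functions `is_even` and `double_it` are hypothetical helpers and use integer arithmetic so that no external program is needed.

```shell
# is_even is a hypothetical helper: it reports success or failure via return.
is_even() {
    if (( $1 % 2 == 0 )); then
        return 0    # success
    else
        return 1    # failure
    fi
}

# double_it is a hypothetical helper: it hands back data via echo.
double_it() {
    echo $(( $1 * 2 ))
}

is_even 4
status_even=$?      # 0
is_even 7
status_odd=$?       # 1
doubled=$(double_it 21)

echo "status for 4: $status_even, status for 7: $status_odd, doubled: $doubled"
```

Note that `$?` must be read immediately after the call, since every later command overwrites it.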

### Scheduling your scripts with Cron

#### Why schedule scripts?
There are many situations where scheduling scripts can be useful:
1. Regular tasks that need to be done. Perhaps daily, weekly, multiple times per day.
    * You could set yourself a calendar-reminder, but what if you forget?
2. Optimal use of resources (running scripts in early hours of morning)

Scheduling scripts with `cron` is essential to a working knowledge of modern data infrastructures.

#### What is cron?
Cron has been part of unix-like systems since the 70's.

The name comes from the Greek word for time, *chronos*.

It is driven by something called a `crontab`, which is a file that contains `cronjobs`, each of which tells `cron` what code to run and when.

#### Crontab - the driver of cronjobs
You can see what schedules (`cronjobs`) are currently programmed using the following command:

In [None]:
crontab -l

#### Crontab and cronjob structure
The well-known diagram from Wikipedia (not reproduced here) demonstrates how you construct a `cronjob` inside the `crontab` file. You can have many `cronjobs`, one per line.
* There are 5 stars to set, one for each time unit: minute, hour, day of month, month, and day of week
* The default, `*` means 'every'

#### Cronjob example
Let's walk through some cronjob examples:
`5 1 * * * bash myscript.sh`
* Minutes star is 5 (5 minutes past the hour). Hours star is 1 (after 1am). The last three are `*`, so every day and month.
    * Overall: **run everyday at 1:05am.**
    
`15 14 * * 7 bash myscript.sh`
* Minutes star is 15 (15 minutes past the hour). Hours star is 14 (after 2pm). Next two are `*` (Every day of the month, every month of year). Last star is day 7 (on Sundays).
    * Overall: **run at 2:15pm every Sunday.**
    
#### Advanced cronjob structure
If you wanted to run something multiple times per day or every 'X' time increments this is also possible:
* Use a comma for specific intervals. For example:
    * `15,30,45 * * * *` will run at the 15, 30, 45 minutes mark for whatever hours are specified by the second star. Here it is every hour, every day etc.
* Use a slash for 'every X increment'. For example:
    * `*/15 * * * *` runs every 15 minutes. Also for every hour, day etc.
    
#### Your first cronjob
Let's schedule a script called `extract_data.sh` to run every morning at 1:30am. Your steps are as follows:
1. In terminal type `crontab -e` to edit your list of cronjobs.
    * It may ask what editor you want to use. `nano` is an easy option with a less steep learning curve than vi (vim).
2. Create the cronjob:
    * `30 1 * * * extract_data.sh`
3. Exit the editor to save it

If this was using `nano` (on Mac) you would use `ctrl` + `o` then `enter` then `ctrl` + `x` to exit.

You will see a message `crontab: installing new crontab`

4. Check it is there by running `crontab -l`.