### Bash scripting

 - Bash stands for Bourne Again Shell (a pun on the earlier Bourne shell)
 - Developed in the late 1980s
 - Why Bash scripting?
     - Ease of execution of shell commands (save and run a script instead of copying commands one by one)
     - Powerful programming constructs
     - A popular shell and the default on many Linux systems (and on macOS until Catalina switched the default to zsh)
     - Unix-like systems run much of the internet (the command line is used in cloud services, data pipelines and ML applications)



In [None]:
# Searching a book with shell
cat two_cities.txt | egrep 'Sydney Carton|Charles Darnay' | wc -l

#### Bash script anatomy

Key defining features:
   - The first line is a shebang (also called hash-bang), #!, followed on the same line by the path to Bash
      - #!/usr/bash
      - This tells the system that the file is a Bash script and that it should be run with the Bash located at /usr/bash
      - The path may differ depending on where Bash is installed, for example /bin/bash -> use " which bash " to check
   - Middle of the script:
      - It contains lines of code, such as line-by-line commands or programming constructs

To save and run:
   - Use the file extension .sh
      - The extension is a convention only; the shebang line, not the extension, determines the interpreter
   - Run by using the command " bash script_name.sh "
      - If the first line is the shebang (#!/usr/bash) and the file is executable (chmod +x script_name.sh), you can run the script directly using " ./script_name.sh "
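The save-and-run steps above can be sketched as follows. This is a minimal illustration, not the course's own script: the file name `hello.sh` and the temporary directory are hypothetical, and the shebang uses `/usr/bin/env bash` (a common portable alternative) rather than a fixed path, since the location of Bash varies between systems.

```bash
# Work in a throwaway directory so nothing existing is overwritten
workdir=$(mktemp -d)

# Write a two-line script; the quoted 'EOF' stops the shell expanding anything inside
cat > "$workdir/hello.sh" <<'EOF'
#!/usr/bin/env bash
echo "Hello from a script"
EOF

# Option 1: hand the file to bash explicitly (no execute permission needed)
out_bash=$(bash "$workdir/hello.sh")

# Option 2: make the file executable, then run it directly via its shebang
chmod +x "$workdir/hello.sh"
out_direct=$("$workdir/hello.sh")

echo "$out_bash"
echo "$out_direct"
```

Both calls run the same script; only the second one depends on the shebang line and the execute permission.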

#### Example

 - Script:<br>
    **#!/usr/bash<br>
    cat animals.txt | cut -d " " -f 2 | sort | uniq -c**



 - Create a single-line pipe to cat the file, cut out the relevant field and aggregate (sort & uniq -c will help!) based on winning team.
 - Save your script and run from the console.

In [9]:
# The ! prefix runs a terminal command from a Jupyter notebook
# In a terminal, run the same command without the exclamation mark
# The script wraps the following shell command:
# cat assets/bash_scripting/soccer_scores.csv | cut -d "," -f 2 | tail -n +2 | sort | uniq -c

! bash assets/bash_scripting/soccer.sh

  13 Arda
   8 Beroe
   9 Botev
   8 Cherno
  17 Dunav
  15 Etar
   4 Levski
   1 Lokomotiv


 - Create a pipe using sed twice to change the team Cherno to Cherno City first, and then Arda to Arda United.
 - Pipe the output to a file called soccer_scores_edited.csv.
 - Save your script and run from the console. Try opening soccer_scores_edited.csv using shell commands to confirm it worked (the first line should be changed)!

In [11]:
! bash assets/bash_scripting/soccer_edited.sh
! head -n 10 assets/bash_scripting/soccer_scores_edited.csv 

﻿Year,Winner,Winner Goals
1932,Arda United,4
1933,Botev,1
1934,Cherno City,5
1935,Dunav,2
1936,Cherno City,4
1937,Dunav,4
1938,Beroe,5
1939,Botev,2
1940,Beroe,3


#### Standard streams and arguments

 - STDIN (standard input), a stream of data into the program
 - STDOUT (standard output), a stream of data out of the program
 - STDERR (standard error), a stream of error messages from the program
 - By default these streams come from and write to the terminal
 - If a command is followed by 2> /dev/null, its STDERR is redirected to /dev/null and discarded
 - If a command is followed by 1> /dev/null, its STDOUT is redirected to /dev/null and discarded
 
<img src="assets/bash_scripting/streams.png" style="width: 600px;"/>
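As a rough sketch of the three streams and the two redirections listed above, the snippet below uses a hypothetical function, `talk`, that writes one line to STDOUT and one to STDERR. The function and variable names are made up for illustration.

```bash
# A hypothetical function that writes one line to each output stream
talk() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2> /dev/null discards STDERR, so only the STDOUT line is captured
only_out=$(talk 2> /dev/null)

# 2>&1 first points STDERR at the capture, then 1> /dev/null discards STDOUT
only_err=$(talk 2>&1 1> /dev/null)

# A pipe connects the STDOUT of echo to the STDIN of tr
upper=$(echo "hello" | tr 'a-z' 'A-Z')

echo "Kept from STDOUT: $only_out"
echo "Kept from STDERR: $only_err"
echo "Read via STDIN:   $upper"
```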

#### STDIN vs ARGV
 - A key concept of Bash scripting is arguments
 - Bash scripts take arguments, separated by spaces, after the script name in the execution call
 - Each argument can be accessed inside the script via the dollar notation
 - `$1` is the first argument, `$2` the second, and so on
     - `$@` and `$*` give all the arguments in ARGV
     - `$#` gives the number of arguments

In [1]:
# Example:

! bash assets/bash_scripting/args.sh one two three four five six seven eight nine


one
two
one two three four five six seven eight nine
There are  9 arguments
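The contents of args.sh are not shown here, so the following is only a sketch of a script body that would print similar output. It is wrapped in a hypothetical function, `show_args`, because positional parameters behave the same way inside a function as they do in a script, which keeps the example self-contained.

```bash
# Hypothetical stand-in for a script body that inspects its arguments
show_args() {
    echo "$1"                       # first argument
    echo "$2"                       # second argument
    echo "$@"                       # all arguments
    echo "There are $# arguments"   # number of arguments
}

show_args one two three
```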


#### Variables

 - You can reference a variable using the dollar notation
 - Bash is strict about spaces: when you assign a value to a variable there must be no spaces around the equals sign. var1="value" works, but var1 = "value" does not
 - Single quotes, double quotes and backticks:
     - Single quotes ('sometext') = Shell interprets what is between them literally
     - Double quotes ("sometext") = Shell interprets the text literally, except for the dollar sign and backticks, which are still expanded
     - Backticks (`` `sometext` ``) = Shell runs the command and captures its STDOUT, which can then be stored in a variable (shell-within-a-shell); the equivalent modern notation is `$(sometext)`
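The three quoting styles can be compared side by side. This is a small sketch with made-up variable names; `date +%Y` simply prints the current year and stands in for any command.

```bash
now_var='NOW'

# Single quotes: nothing is expanded, so the text stays exactly as typed
single='$now_var'

# Double quotes: the variable is expanded to its value
double="$now_var"

# Backticks and $(...) both run a command and capture its STDOUT
year_backticks=`date +%Y`
year_parens=$(date +%Y)

echo "$single"
echo "$double"
echo "The year is $year_backticks."
echo "The year is $year_parens."
```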
     

In [9]:
! bash assets/bash_scripting/var_assignment.sh

Hi there Cynthia Liu
$now_var
NOW
The date is Mon Feb 7 20:32:38 EET 2022.
The date is Mon Feb 7 20:32:38 EET 2022.


#### Numbers

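 - Arithmetic cannot be typed directly at the shell prompt (entering 1 + 4 on its own is an error)
 - expr is a utility program, like cat or grep, that evaluates integer expressions
 - expr cannot handle decimals, so bc (basic calculator) can be used instead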

In [2]:
! expr 1 + 4

5


In [3]:
! expr 1 + 4.2

expr: not a decimal number: '4.2'


In [4]:
! echo "5 + 7.5" | bc

12.5


In [5]:
! echo "5 / 7.5" | bc

0


In [6]:
# using scale
! echo "scale=2; 5 / 7.5" | bc

.66


In [8]:
# double parentheses (integer arithmetic only)
! echo $((5+7))

12


In [15]:
# model1=87.65
# model2=89.20
# echo "The total score is $(echo "$model1 + $model2" | bc)"
# echo "The average score is $(echo "($model1 + $model2) / 2" | bc)"

! bash assets/bash_scripting/num.sh

The total score is 176.85
The average score is 88
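The average above comes out as 88 because bc performs division with zero decimal places unless `scale` is set. A minimal sketch of the difference, reusing the two model scores from the commented lines (the variable names for the results are hypothetical):

```bash
model1=87.65
model2=89.20

# Without scale, bc truncates the result of the division to an integer
average_truncated=$(echo "($model1 + $model2) / 2" | bc)

# With scale=2, the division keeps two decimal places
average_scaled=$(echo "scale=2; ($model1 + $model2) / 2" | bc)

echo "The average score is $average_truncated"
echo "The average score is $average_scaled"
```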


In [17]:
# Convert Fahrenheit to Celsius

! bash assets/bash_scripting/weather.sh 110

43.33


In [28]:
# Shell in a shell example

# # Create three variables from the temp data files' contents
# temp_a=$(cat assets/bash_scripting/temps/region_A)
# temp_b=$(cat assets/bash_scripting/temps/region_B)
# temp_c=$(cat assets/bash_scripting/temps/region_C)

# # Print out the three variables
# echo "The three temperatures were $temp_a, $temp_b, and $temp_c"

! bash assets/bash_scripting/temperature.sh

The three temperatures were 34, 42, and 99


#### Arrays
 - Two ways to create numerically indexed arrays (similar to lists in Python or R)
     - Declare without adding elements: declare -a array_name
     - Create and add elements: array_name=(1 2 3)
 - All elements of an array can be returned using array[@]
 - Bash requires curly brackets around the array name to access these properties, e.g. `${array_name[@]}`
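A short sketch of the array operations listed above. The array name `scores` and its values are hypothetical.

```bash
# Declare an empty indexed array, then fill it
declare -a scores
scores=(1 2 3)

# Append two more elements
scores+=(9 10)

# All elements, the number of elements, and a single element (indexing starts at 0)
echo "Array: ${scores[@]}"
echo "Length: ${#scores[@]}"
echo "3rd item in the list: ${scores[2]}"
```

Without the curly brackets, `$scores[2]` would expand only `$scores` (the first element) and then append the literal text `[2]`.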

In [35]:
! bash assets/bash_scripting/arrays.sh

Array: 1 2 3 9 10
Length: 5
3rd item in the list: 3
Capital cities array: Sydney Albany Paris
Length: 3
assets/bash_scripting/arrays.sh: line 32: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
assets/bash_scripting/arrays.sh: line 41: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
Associative array: 0.82
assets/bash_scripting/arrays.sh: line 48: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
Associative array keys: 0
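The `declare: -A: invalid option` errors in the output above are what Bash versions older than 4.0 print, because associative arrays were only added in Bash 4.0 (the Bash that ships with macOS is version 3.2). The sketch below guards against this by checking the version first; the array name and keys are hypothetical, not taken from arrays.sh.

```bash
# BASH_VERSINFO[0] holds the major version of the running Bash
if (( BASH_VERSINFO[0] >= 4 )); then
    # Associative arrays map string keys to values
    declare -A model_metrics
    model_metrics[model_accuracy]=98
    model_metrics[model_name]="knn"
    model_metrics[model_f1]=0.82

    echo "Associative array: ${model_metrics[model_f1]}"
    echo "Number of keys: ${#model_metrics[@]}"
    assoc_status="supported"
else
    echo "Bash $BASH_VERSION has no associative arrays; version 4.0 or newer is needed"
    assoc_status="unsupported"
fi
```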


In [36]:
! bash assets/bash_scripting/array_temperature.sh

42 99 70.50


#### Conditionals
 - https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html
 
<img src="assets/bash_scripting/simple_if.png" style="width: 600px;"/>

<img src="assets/bash_scripting/if_square_brackets.png" style="width: 600px;"/>

<img src="assets/bash_scripting/if_mult_cond.png" style="width: 600px;"/>

<img src="assets/bash_scripting/if_within_shell.png" style="width: 600px;"/>



In [None]:
# Basic if statement

x="Queen"
if [ "$x" == "King" ]; then
    echo "$x is a King!"
else
    echo "$x is not a King!"
fi


# Arithmetic statement 1

x=10
if (($x > 5)); then
    echo "$x is more than 5!"
fi

# Arithmetic statement 2

x=10
if [ "$x" -gt 5 ]; then
    echo "$x is more than 5!"
fi

 - Create a variable, accuracy by extracting the "Accuracy" line (and "Accuracy" value) in the first ARGV element (a file).
 - Create an IF statement to move the file into good_models/ folder if it is greater than or equal to 90 using a flag, not a mathematical sign.
 - Create an IF statement to move the file into bad_models/ folder if it is less than 90 using a flag, not a mathematical sign.

In [None]:
# Extract Accuracy from first ARGV element
accuracy=$(grep Accuracy "$1" | sed 's/.* //')

# Conditionally move into good_models folder
if [ "$accuracy" -ge 90 ]; then
    mv "$1" good_models/
fi

# Conditionally move into bad_models folder
if [ "$accuracy" -lt 90 ]; then
    mv "$1" bad_models/
fi

 - Create a variable sfile out of the first ARGV element.
 - Use an IF statement and grep to check if the sfile variable contains SRVM_ AND vpt inside.
 - Inside the IF statement, move matching files to the good_logs/ directory.

In [None]:
# Create variable from first ARGV element
sfile=$1

# Create an IF statement on sfile's contents
if grep -q 'SRVM_' "$sfile" && grep -q 'vpt' "$sfile" ; then
  # Move file if matched
  mv "$sfile" good_logs/
fi


#### Loops


<img src="assets/bash_scripting/basic_loop.png" style="width: 300px;"/>

<img src="assets/bash_scripting/loop_expr.png" style="width: 600px;"/>

<img src="assets/bash_scripting/shell_loop.png" style="width: 600px;"/>

<img src="assets/bash_scripting/while_loop.png" style="width: 400px;"/>

 - Glob expansions: `for item in folder/*; do echo $item; done`
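Besides glob expansion, a few other loop forms are common. The sketch below assumes these are the forms pictured above (brace expansion, the three-expression syntax and a WHILE loop); all variable names are hypothetical.

```bash
# Brace expansion: {1..3} expands to 1 2 3 before the loop runs
brace_items=""
for n in {1..3}; do
    brace_items="$brace_items$n "
done
echo "Brace expansion: $brace_items"

# Three-expression syntax: start; condition; step
stepped_items=""
for ((i = 2; i <= 6; i += 2)); do
    stepped_items="$stepped_items$i "
done
echo "Three-expression: $stepped_items"

# WHILE loop: repeats until the condition is no longer true
count=1
while [ "$count" -le 3 ]; do
    count=$((count + 1))
done
echo "WHILE loop stopped at: $count"
```

The WHILE loop needs the increment inside its body; without it the condition never changes and the loop runs forever.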

In [None]:
# Use a FOR loop on files in directory
for file in inherited_folder/*.R
do  
    # Echo out each file
    echo $file
done

 - Use a FOR statement to loop through (using glob expansion) files that end in .py in robs_files/.
 - Use an IF statement and grep (remember the 'quiet' flag?) to check if RandomForestClassifier is in the file. Don't use a shell-within-a-shell here.
 - Move the Python files that contain RandomForestClassifier into the to_keep/ directory.

In [None]:
# Create a FOR statement on files in directory
for file in robs_files/*.py
do  
    # Create IF statement using grep
    if grep -q 'RandomForestClassifier' $file ; then
        # Move wanted files to to_keep/ folder
        mv $file to_keep/
    fi
done

#### Case


<img src="assets/bash_scripting/complex_if.png" style="width: 700px;"/>


In [3]:
! bash assets/bash_scripting/case.sh Wednesday

It is a Weekday!


In [4]:
! bash assets/bash_scripting/case.sh Bla

Not a day!


In [5]:
! bash assets/bash_scripting/case.sh Saturday

It is a Weekend!
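The contents of case.sh are not shown, so this is only a sketch of a CASE statement that would produce the three outputs above. It is wrapped in a hypothetical function, `day_type`, so it can be tried without creating a file.

```bash
# Hypothetical stand-in for case.sh: classify the first argument
day_type() {
    case $1 in
      # The pipe separates alternative patterns for the same branch
      Monday|Tuesday|Wednesday|Thursday|Friday)
        echo "It is a Weekday!" ;;
      Saturday|Sunday)
        echo "It is a Weekend!" ;;
      # Default branch: matches anything not caught above
      *)
        echo "Not a day!" ;;
    esac
}

day_type Wednesday
day_type Bla
day_type Saturday
```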


- Use a CASE statement to move the tree-based models (Random Forest, GBM, and XGBoost) to the tree_models/ folder, and delete all other models (KNN and Logistic)
 
- Use a FOR statement to loop through (using glob expansion) files in model_out/.

- Use a CASE statement to match on the contents of the file (we will use cat and shell-within-a-shell to get the contents to match against). It must check if the text contains a tree-based model name and move to tree_models/, otherwise delete the file.

- Create a default match that prints out Unknown model in FILE where FILE is the filename then run your script.

In [None]:
# Use a FOR loop for each file in 'model_out'
for file in model_out/*
do
    # Create a CASE statement for each file's contents
    case $(cat $file) in
      # Match on tree and non-tree models
      *"Random Forest"*|*GBM*|*XGBoost*)
      mv $file tree_models/ ;;
      *KNN*|*Logistic*)
      rm $file ;;
      # Create a default
      *) 
      echo "Unknown model in $file" ;;
    esac
done

#### Functions

<img src="assets/bash_scripting/functions.png" style="width: 700px;"/>

 - Or function function_name () {}
 
<img src="assets/bash_scripting/function_example.png" style="width: 700px;"/>

 - Set up a function using the 'function-word' method called upload_to_cloud.
 - Use a FOR statement to loop through (using glob expansion) files whose names contain results in output_dir/ and echo that the filename is being uploaded to the cloud.
 - Call the function just below the function definition using its name.

In [None]:
# Create function
function upload_to_cloud () {
  # Loop through files with glob expansion
  for file in output_dir/*results*
  do
    # Echo that they are being uploaded
    echo "Uploading $file to cloud"
  done
}

# Call the function
upload_to_cloud

 - Set up a function called what_day_is_it without using the word function (unlike the function-word method used above).
 - Parse the output of date into a variable called current_day. The extraction component has been done for you.
 - Echo the result.
 - Call the function just below the function definition.

In [None]:
# Create function
what_day_is_it () {

  # Parse the results of date
  current_day=$(date | cut -d " " -f1)

  # Echo the result
  echo $current_day
}

# Call the function
what_day_is_it

#### Arguments
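 - Arguments passed to a function are accessed with the same dollar notation as script arguments
 - Variables are global by default, even when created inside a function; use local to restrict a variable to the function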

 - Create a function called return_percentage using the function-word method.
 - Create a variable inside the function called percent that divides the first argument fed into the function by the second argument.
 - Return the calculated value by echoing it back.
 - Call the function with the mentioned test values of 456 (the first argument) and 632 (the second argument) and echo the result.

In [7]:
! bash assets/bash_scripting/fun.sh

456 out of 632 as a percent is 72.15%
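fun.sh itself is not shown; one way the steps above might be written is sketched below. A Bash function cannot return a decimal value with `return`, so the result is echoed and captured with the shell-within-a-shell notation. The use of `scale=2` and the variable `return_test` are assumptions.

```bash
# Create the function using the function-word method
function return_percentage () {
    # Multiply first so that scale=2 applies to the final division
    percent=$(echo "scale=2; 100 * $1 / $2" | bc)
    # "Return" the value by echoing it
    echo "$percent"
}

# Capture the echoed value and print it
return_test=$(return_percentage 456 632)
echo "456 out of 632 as a percent is $return_test%"
```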


 - Create a function called get_number_wins using the function-word method.
 - Create a variable inside the function called win_stats that takes the argument fed into the function to filter the last step of the shell-pipeline presented.
 - Call the function using the city Etar.
 - Below the function call, try to access the win_stats variable created inside the function in the echo command presented.

In [9]:
! bash assets/bash_scripting/analytics.sh

The aggregated stats are:   15 Etar
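The win_stats variable is visible outside the function because Bash variables are global unless declared with `local`. A minimal sketch of the difference, using hypothetical function and variable names:

```bash
# Assigns a variable without local: it remains set after the function returns
function set_global () {
    leaked="visible outside"
}

# Assigns a variable with local: it exists only while the function runs
function set_local () {
    local hidden="only inside"
    echo "$hidden"
}

set_global
returned=$(set_local)

echo "leaked:   $leaked"
echo "returned: $returned"
echo "hidden:   ${hidden:-<unset>}"
```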


 - Create a function called sum_array and add a base variable (equal to 0) called sum with local scope. You will loop through the array and increment this variable.
 - Create a FOR loop through the ARGV array inside sum_array (hint: this is not `$1`, but another special ARGV property) and increment sum with each element of the array.
 - Rather than assign to a global variable, echo back the result of your FOR loop summation.
 - Call your function using the test array provided and echo the result. You can capture the results of the function call using the shell-within-a-shell notation.

In [10]:
! bash assets/bash_scripting/sum.sh

The total sum of the test array is 84.84
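sum.sh is not shown, so the following is a sketch of how the steps above might be implemented. The values in `test_array` are hypothetical, chosen so that they add up to the total printed above.

```bash
function sum_array () {
    # local keeps sum and number from leaking into the global scope
    local sum=0
    local number
    # "$@" expands to every argument passed to the function
    for number in "$@"; do
        sum=$(echo "$sum + $number" | bc)
    done
    # Echo the result back instead of assigning a global variable
    echo "$sum"
}

# Hypothetical test values
test_array=(14 12 23.5 16 19.34)

# Pass each element as a separate argument and capture the echoed total
total=$(sum_array "${test_array[@]}")
echo "The total sum of the test array is $total"
```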


#### Cron

Create a schedule for 30 minutes past 2am every day<br>
30 2 * * * bash script1.sh

Create a schedule for 15, 30 and 45 minutes past every hour<br>
15,30,45 * * * * bash script2.sh

Create a schedule for 11:30pm every Sunday<br>
30 23 * * 0 bash script3.sh
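
Each schedule line has five time fields followed by the command. As a sketch, the snippet below writes the three schedules to a temporary file with a header comment naming the fields, without installing anything; the temporary file and the variable names are hypothetical.

```bash
# Write the schedules to a scratch file rather than the real crontab
cron_file=$(mktemp)
cat > "$cron_file" <<'EOF'
# minute (0-59)  hour (0-23)  day-of-month (1-31)  month (1-12)  day-of-week (0-6, Sunday = 0)  command
30 2 * * * bash script1.sh
15,30,45 * * * * bash script2.sh
30 23 * * 0 bash script3.sh
EOF

# Show the file and count the schedule lines (everything that is not a comment)
cat "$cron_file"
entry_count=$(grep -c -v '^#' "$cron_file")
echo "Schedules defined: $entry_count"

# Not run here: "crontab -l" lists the current schedules, and
# "crontab $cron_file" would REPLACE the current user's crontab with this file
```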