# Bash programming and Linux command-line tools
(Last modified 03/11/22)

<center>
  <img src="figs/bash.png" style="width: 500px;"/>
</center>
    
In our course **IN3110 – Problemløsning med høynivå-språk** (Problem solving with high-level languages) we have so far only used Python (and, in places, Cython). In this lecture we add to the family of languages by discussing shell (Bash) scripting. The take-home message is that (simple) scripts combined with other command-line utilities can provide elegant solutions and powerful pre- and postprocessing pipelines for data.

### A bit of history - there were/are many shells

- 1979: Bourne shell (`sh`)
- 1978: C shell (`csh`), later extended as the TC shell (`tcsh`)
- 1989: Bourne Again shell (`bash`)
- Other shells in the Bourne family:
    - 1983: Korn shell (`ksh`)
    - 1990: Z shell (`zsh`)
    - 2002: Dash (`dash`)

### Why learn Bash? 

- Learning Bash means learning the roots of scripting
- Bash scripts are frequently encountered on Unix systems
- Bash is the default command interpreter on most Linux distributions and one of the most widely used shell scripting languages

Shell scripts evolve naturally from a workflow: 
  1. A sequence of commands you use often is placed in a file
  2. Command-line options are introduced so that different arguments can be passed to the commands
  3. Variables, `if` tests and loops are introduced to enable more complex program flow
  4. At some point the pre- and postprocessing becomes too advanced for Bash, at which point (parts of) the script should be ported to Python or other tools
  
In this lecture we imagine that we find ourselves working, e.g., on a Linux cluster where we cannot get the admin permissions needed to install the Python modules or text editors we have available on our own machines. We will try to get things done with utilities that are commonly installed by default.

### What Bash is *good* for

- File and directory management
- Systems management (build scripts)
- Combining other scripts and commands
- Rapid prototyping of more advanced scripts
- Very simple output processing, plotting etc.

### Some common tasks in Bash
- file writing and managing files and directories (creation, deletion, renaming)
- for-loops
- running an application
- combining applications (pipes)
- file globbing, testing file types

### What Bash is *not good* for

- Cross-platform portability
- Graphics, GUIs
- Interfacing with libraries or legacy code
- More advanced postprocessing and plotting
- Calculations, math etc.

## Installation

- All our examples can be run under Bash, and many in the Bourne shell
- Differences in operating systems:
    - **Mac OS X**: `/bin/sh` effectively runs Bash (`/bin/bash`).
    - **Ubuntu**: `/bin/sh` is a link to Dash, a minimal but much faster shell than Bash; Bash itself is available as `/bin/bash`.
    - **Windows**: Bash is available through Cygwin or the Windows Subsystem for Linux (WSL) in Windows 10 and later.
    
**Use within Jupyter notebooks**: We will use the `%%bash` cell magic to run a whole cell as a Bash script, and the `!` prefix to run a single shell command.

## Bash tutorial
You will see a number of Bash/Unix commands in this lecture. The new commands will be highlighted with a <font color='red'>⚠️ </font>.

In [None]:
%%bash
echo "Hello from bash"

A command is called by giving its name followed by its arguments. <font color='red'>⚠️ </font> `echo` prints its arguments to the screen.

We could write the above source code into a source file, here `./scripts/hello_world.sh`.

#### VIM intermezzo

To stick to our scenario of being stuck on a cluster without VS Code, Sublime Text and the like, let us use VIM for editing. VIM (Vi IMproved) is a powerful text editor that improves on its predecessor, the vi editor; here we will only scratch its surface. In some sense the philosophy behind VIM is that of a painter who first picks an instrument (mode selection) and places it on the canvas (navigation) before starting to draw (editing).

_Navigation_
`ESC` to leave the current mode. Then press
- `0` jump to line beginning
- `$` jump to line end
- `h`, `l`, `j`, `k` to move left, right, down or up
- `gg` to jump to the start of the file, or `G` to jump to the end
- `w` to jump forward a word or  `b` to move back a word

_Manipulation/Editing_
- Pressing `i` enters insert (edit) mode, where you can type as you want
- Pressing `x`, `dw` or `dd` deletes, respectively, a character, a word or the entire line
- `A` (shift+a) jumps to the end of the line and enters insert mode
- `s` (substitute) deletes the character under the cursor and enters insert mode
- `u` undoes the last change
- `v` enters visual mode (for selecting text)
- `:w` saves the buffer to file

_Search_
- `/` enters search mode. After typing the pattern and pressing `Enter`, `n` moves forward to the next match, while `N` searches backward

[_Exiting_](https://stackoverflow.blog/2017/05/23/stack-overflow-helping-one-million-developers-exit-vim/)
- `:q` quits, `:q!` quits and discards unsaved changes, and `:wq` saves and quits

<center>
  <img src="figs/vim.png" style="width: 800px;"/>
</center>
   

A great reference for learning more about VIM is the book [Practical Vim: Edit Text at the Speed of Thought](https://www.amazon.com/Practical-Vim-Edit-Speed-Thought/dp/1680501275).

#### Back to Bash

In [None]:
%%bash
cat scripts/hello_world.sh

Here the lines starting with a hash `#` are interpreted as comments.

Above we have used the <font color='red'>⚠️ </font> `cat` command to view the file content. Later we will see that it can be used for reading and writing files too.

Now we could try to run the script, only to find that we get an error:

In [None]:
!./scripts/hello_world.sh

The issue is that the file is not executable. We can see this with the <font color='red'>⚠️ </font> `ls` command (the `-l` flag gives a long listing, including the permissions):

In [None]:
!ls -l ./scripts/hello_world.sh

The permissions are r(ead), w(rite) and (e)x(ecute), listed in three groups: for the user (owner), for the group and for all others.

To fix this we use the <font color='red'>⚠️ </font> `chmod` command. In particular, below `u+x` adds execute permission for the user (owner) of the file; `g` and `o` would instead target the group and others.

In [None]:
%%bash
chmod u+x ./scripts/hello_world.sh
ls -l ./scripts/hello_world.sh
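Besides the symbolic form (`u+x`), `chmod` also accepts numeric (octal) modes, where each digit is the sum r=4, w=2, x=1 for user, group and others. The sketch below, on a throwaway file, shows both forms side by side; it assumes GNU coreutils `stat -c %A` (Linux) to print the permission string, which differs on macOS (`stat -f %Sp`).

```bash
# Sketch: symbolic vs. numeric (octal) chmod modes on a throwaway file.
# Assumes GNU coreutils `stat` (Linux); on macOS use `stat -f %Sp FILE` instead.
tmp=$(mktemp)              # create an empty temporary file
chmod 644 "$tmp"           # octal 6=rw-, 4=r--: owner reads/writes, group and others read
before=$(stat -c %A "$tmp")
chmod u+x "$tmp"           # symbolic: add execute for the user (owner) only
after=$(stat -c %A "$tmp")
chmod go+x "$tmp"          # add execute for group and others as well
all=$(stat -c %A "$tmp")
echo "$before -> $after -> $all"   # -rw-r--r-- -> -rwxr--r-- -> -rwxr-xr-x
rm "$tmp"
```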

Now we can finally execute the script:

In [None]:
!./scripts/hello_world.sh

Now that the code runs, we could ask who actually ran (interpreted) it. If no interpreter is specified, the invoking shell decides; Bash, for example, runs such a script in a new instance of itself. We can be explicit about the interpreter:

In [None]:
!cat scripts/hello_world_bang.sh

Observe that the first line, starting with the *shebang* `#!`, specifies the interpreter to use for the script. The second line, starting with a hash `#`, is a comment.

__Note__: We could have specified a different interpreter/shell, e.g. by giving `#!/usr/bin/sh` as the first line instead. Let's see what sort of shell that is:

In [None]:
!man sh | head -n 20

Using the <font color='red'>⚠️ </font> `man` (manual) command we see that on this machine `sh` points to the `dash` shell.
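Another way to find out what `sh` really is, sketched below, is to follow the symbolic links directly. This assumes `readlink -f` (GNU coreutils; recent macOS versions also support it); the printed path differs between systems, e.g. it ends in `dash` on Ubuntu.

```bash
# Sketch: resolve which shell `sh` actually is on this machine.
sh_path=$(command -v sh)            # the path the shell would run for `sh`
real_sh=$(readlink -f "$sh_path")   # follow every symbolic link to the final file
echo "$sh_path -> $real_sh"
```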

For convenience we will use the `%%bash` cell magic in the rest of the lecture to write our scripts.

### Variables

- Assign a variable by `var=value` (__NOTE__ no spaces around `=`!)
- Retrieve the value of the variable by `${var}` or `$var`

In [None]:
%%bash
#!/usr/bin/bash

cmd=echo    # Command names can be stored in variables
greet="Hi"

${cmd} ${greet} world!

# Undefined variables expand to an empty string

${cmd} ${greet} ${world}!
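The braces in `${var}` matter when the name is followed by characters that could belong to a variable name. The sketch below shows this, together with the `${var:-default}` expansion, which substitutes a fallback for an unset or empty variable (the variable names are just for illustration).

```bash
greet="Hi"
with_braces="${greet}s"     # braces mark where the name ends: "His"
without_braces="$greets"    # looks up the (undefined) variable greets: ""
fallback="${world:-world}"  # ${var:-default}: "world" because $world is unset
echo "[$with_braces] [$without_braces] [$fallback]"   # [His] [] [world]
```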

There are also special variables defined in the environment. By convention their names are all uppercase. As an example, recall that when running `hello_world.sh` 
above we had to specify the path to the script (`./scripts/hello_world.sh`). 
In particular, calling the script by its name alone gives an error:

In [None]:
!hello_world.sh

To fix this problem, 
recall the role of `PYTHONPATH` in how the Python interpreter looks up modules. In fact, `PYTHONPATH` is an environment variable:

In [19]:
!echo $PYTHONPATH

/home/mirok/Documents/Software/gmsh-4.9.5-Linux64-sdk/lib:


A similar role is played by the environment variable `PATH`, a colon-separated list of 
directories in which the shell looks for program executables.

In [18]:
!echo $PATH

/home/mirok/Documents/Software/AnaMorph/bin:/home/mirok/Documents/Software/ParaView-5.9.1-MPI-Linux-Python3.8-64bit/bin:/home/mirok/Documents/Software/visit3_2_1.linux-x86_64/bin:/home/mirok/Documents/Software/julia:/home/mirok/Documents/Software/glvis-4.1:/home/mirok/Documents/Software/gmsh-4.9.5-Linux64-sdk/bin:/home/mirok/Documents/Software/miniconda3/envs/in3110/bin:/home/mirok/Documents/Software/miniconda3/condabin:/home/mirok/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
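Since `PATH` is a single colon-separated string, it is easier to read one directory per line. A minimal sketch, assuming Bash (for `read -a` and the `<<<` here-string), is given below; `command -v` then reports which of these directories a given command is actually found in.

```bash
# Sketch: split PATH on ':' and list the directories in search order.
IFS=: read -r -a dirs <<< "$PATH"   # IFS=: applies to this read only; -a fills the array dirs
for d in "${dirs[@]}"; do
  echo "$d"
done
echo "${#dirs[@]} directories in PATH"
command -v ls                        # the first PATH directory containing ls wins, e.g. /usr/bin/ls
```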


What we would like, in order to run our script simply as `hello_world.sh`, is to modify the environment. Consider the following:

In [None]:
%%bash 
new_PATH="$(pwd)/scripts:$PATH"
echo $new_PATH

Here we have __computed__ the value assigned to `new_PATH` by running the <font color='red'>⚠️ </font> `pwd` (print working directory) command inside the command substitution `$( )` and building up the string. Note that we prepend our directory to the list so that it takes precedence, since the directories are searched from left to right. To update `PATH` we could continue as follows:

In [None]:
%%bash 
new_PATH="$(pwd)/scripts:$PATH"
export PATH=$new_PATH

echo $PATH

__NOTE__: There is a pitfall: each notebook cell is executed in its own [process](https://stackoverflow.com/questions/67850706/unable-to-export-path-in-jupyterlab). Exported variables are inherited only by the child processes of that process, so they are not visible in the next cell, which runs in a new process:

In [None]:
!echo $PATH
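The rule "exported variables flow to children, never back to the parent" can be checked in a single cell. In this sketch `DEMO_VAR` is a hypothetical name used only for the demonstration, and `bash -c` starts a new child process.

```bash
# Sketch: exported variables are inherited by child processes only.
unset DEMO_VAR                 # start clean (this also drops any export attribute)
DEMO_VAR="parent"              # plain shell variable, not exported yet
before=$(bash -c 'echo "[$DEMO_VAR]"')   # a new bash process does not see it: []
export DEMO_VAR
after=$(bash -c 'echo "[$DEMO_VAR]"')    # now the child inherits it: [parent]
( DEMO_VAR="changed" )         # changes made in a subshell never flow back
echo "before export: $before, after export: $after, parent: $DEMO_VAR"
```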

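So we will do this outside the notebook, in a terminal, i.e. within one running shell session, and start Jupyter from there. Keep in mind that a process that is already running (such as a notebook server) does not pick up later changes to its parent shell's environment, so it has to be restarted from the shell where `PATH` was exported. A process running in the terminal can be suspended with `ctrl+z`; we can then bring it back to the (f)ore(g)round with <font color='red'>⚠️ </font> `fg`, or resume it in the (b)ack(g)round with <font color='red'>⚠️ </font> `bg`.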

Some other examples of setting variables to the output of computations (commands):


In [21]:
%%bash
weekday=$(date +"%A %Y-%m-%d %H:%M:%S")    # date +FORMAT prints the current date; %A is the weekday name (here in a Norwegian locale)
echo "Today is $weekday."

Today is onsdag 2022-11-02 18:33:10.


In [None]:
%%bash
# Backticks are an older, equivalent syntax for command substitution
files=`ls ..`
echo $files

As mentioned before, the <font color='red'>⚠️ </font> `ls` command lists the contents of a directory.
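Both syntaxes capture a command's output, but `$( )` is generally preferred because it nests without any escaping. A small sketch (it uses `basename`, which strips the leading directories from a path):

```bash
a=$(echo hello)                 # preferred $( ) syntax
b=`echo hello`                  # legacy backticks, same result
here=$(basename "$(pwd)")       # $( ) nests cleanly: name of the current directory
echo "$a $b $here"
```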

#### Typed variables

By default variables are untyped and treated as character strings:

In [26]:
%%bash
x=5
x=$x+5
echo $x

5+5


We can be explicit about the type of a variable:

In [27]:
%%bash
declare -i b     # define an integer variable b
a=5
b=$a+5
echo $b

10
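Without declaring a type, arithmetic can also be done with the arithmetic expansion `$(( ))` and the arithmetic command `(( ))`. The sketch below also shows why Bash is not good for calculations: it only supports integer arithmetic.

```bash
x=5
y=$((x + 5))       # arithmetic expansion; no $ is needed inside: 10
z=$((7 / 2))       # integer arithmetic only, the fraction is dropped: 3
((x += 1))         # (( )) updates a variable in place: x is now 6
echo "$y $z $x"    # 10 3 6
```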


Or express that the variable is constant/read-only

In [28]:
%%bash
declare -r r=10            
echo $r
r=5

10


bash: line 3: r: readonly variable


CalledProcessError: Command 'b'declare -r r=10            \necho $r\nr=5\n'' returned non-zero exit status 1.

Bash also supports the `array` type:

In [29]:
%%bash
declare -a array=("foo" "bar") # array
echo ${array[0]}  # First array value
echo ${array[@]}  # All array values
echo ${#array}    # Careful: length of the first element "foo", not of the array!
echo ${#array[@]} # Number of elements in the array

foo
foo bar
3
2
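A few more array operations are sketched below: appending with `+=`, iterating with the quoted form `"${array[@]}"` (which keeps elements containing spaces intact) and slicing. The array contents are just for illustration.

```bash
declare -a fruits=("apple" "passion fruit")
fruits+=("kiwi")                # append an element
echo "${#fruits[@]} elements"   # 3 elements
for f in "${fruits[@]}"; do     # quoted: "passion fruit" stays one element
  echo "<$f>"
done
echo "${fruits[@]:1:2}"         # slice: 2 elements starting at index 1
```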


### Flow control and functions

For flow control we shall discuss `if` and `case` statements as well as `for` and `while` loops.

__`if`__ statement

In [None]:
%%bash
declare name="Joe"
# Here we are comparing 2 strings
if [ $name == "Joe" ]
then
  echo "Joseph"
else
  echo "Don't know"
fi
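The test above works because `$name` contains no spaces. The sketch below shows the common pitfall when it does: an unquoted variable is split into several words before `[` sees it, so the test breaks. Quoting the variable, or using Bash's extended test `[[ ]]`, avoids this.

```bash
name="Joe Smith"
# Unquoted, $name becomes two words, so [ gets four arguments and reports an error
if [ $name == "Joe Smith" ] 2>/dev/null; then unquoted="match"; else unquoted="error"; fi
# Quoted, the whole value is passed as a single argument
if [ "$name" == "Joe Smith" ]; then quoted="match"; else quoted="error"; fi
# [[ ]] does not word-split variables, so quoting is optional on the left-hand side
if [[ $name == "Joe Smith" ]]; then double="match"; else double="error"; fi
echo "unquoted: $unquoted, quoted: $quoted, [[ ]]: $double"
```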

In [33]:
%%bash
declare -i -r number=10
# Here we are comparing numbers
if [ $number -gt 10 ]
then
  echo "The variable is greater than 10."
else
  echo "The variable is at most 10"
fi

The variable is at most 10
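Inside `[ ]` numbers are compared with `-eq`, `-ne`, `-lt`, `-le`, `-gt` and `-ge`, while `(( ))` accepts C-style operators. A sketch of both, plus a pitfall: inside `[[ ]]` the `>` operator compares *strings*, so `"9"` sorts after `"10"`.

```bash
declare -i number=10
[ $number -ge 10 ] && ge="yes"        # -ge: greater than or equal
((number >= 10)) && arith="yes"       # the same test with a C-style operator
[[ 9 > 10 ]] && lexical="yes"         # string comparison: "9" > "10" lexicographically
echo "ge=$ge arith=$arith lexical=$lexical"
```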


We can do `if`-`elif` branching, and the tests can be combined with `&&` (AND) or `||` (OR). Below we also introduce [parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) `${ }` to grab substrings or get the length of strings, and arithmetic expansion `$(( ))` to perform some simple arithmetic:

In [84]:
%%bash
declare name="Joey"

if [ $name == "Joe" ]
then
  echo Name is Joe
fi

# AND
if [ ${name: 0:1} == "J" ] && [ ${name: -1:1} == "y" ]
then
  echo The first letter is J and last is y
# OR
elif [ ${name: 0:1} == "A" ] || [ ${#name} -eq $((2+2)) ]
then
  echo The first letter is A or name length is $((1+3))
else
  [ ${#name} -eq 5 ] && echo "Don't know for 5 char long name"
fi
# NOTE: we add this "success" expression so that IPython does not complain
# about a nonzero exit status
echo $name
# We could also use
exit 0

The first letter is J and last is y
Joey


<font color='red'>⚠️ </font> `exit` with a status number is used to indicate successful or failed execution: 0 means success, while any nonzero value (1-255) signals a failure. The special variable `$?` holds the exit status of the preceding command:

In [91]:
%%bash
name="alex"
[ ${#name} -eq 5 ] && echo "Exec only when name 5"

if [ "$?" == "0" ]
then
  echo There was no problem
else
  echo There was a problem
fi

There was a problem
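Since every command overwrites `$?`, it is good practice to save it right after the command of interest. A short sketch, using the built-in `false`, which always fails with status 1:

```bash
false                              # a command that always fails (exit status 1)
status=$?                          # save $? immediately: the next command overwrites it
echo "false exited with $status"
echo "this line sees \$? = $?"     # 0: the status of the previous echo
ls /no/such/dir 2>/dev/null || echo "ls failed with status $?"
```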


There are handy tests for the existence and type of files and directories. For example, we can check:

In [71]:
%%bash

dir='scripts'

if [ -d $dir ]
then
  echo There is $dir directory
  cp -r $dir $dir.bk
  ls .             # . is the current directory, .. is its parent
  echo
  if [ -x "$dir/hello_world.sh" ]
  then
    echo $dir contains executable
  fi
fi

There is scripts directory
cmdline_bash.ipynb
figs
lecture_old.ipynb
scripts
scripts.bk

scripts contains executable


Here we have used the copy command <font color='red'>⚠️ </font> `cp` with the `-r` (recursive) switch, which is needed to copy a directory and its contents.

Other test switches

- `-h` FILE - True if the FILE exists and is a symbolic link.
- `-r` FILE - True if the FILE exists and is readable.
- `-w` FILE - True if the FILE exists and is writable.
- `-x` FILE - True if the FILE exists and is executable.
- `-d` FILE - True if the FILE exists and is a directory.
- `-e` FILE - True if the FILE exists, regardless of its type (file, directory, link, ...)
- `-f` FILE - True if the FILE exists and is a regular file (not a directory or device)
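The sketch below applies several of these switches to a scratch directory (created with `mktemp -d`; `ln -s` creates a symbolic link). Note that `-h` is tested first, because `-f` and `-d` follow a symbolic link to its target.

```bash
workdir=$(mktemp -d)                 # scratch directory, removed at the end
touch "$workdir/data.txt"            # a regular file
mkdir "$workdir/sub"                 # a directory
ln -s data.txt "$workdir/link"       # a symbolic link to data.txt
summary=""
for path in "$workdir/data.txt" "$workdir/sub" "$workdir/link" "$workdir/missing"
do
  if [ -h "$path" ]; then kind="symbolic link"   # checked first: -f/-d follow links
  elif [ -d "$path" ]; then kind="directory"
  elif [ -f "$path" ]; then kind="regular file"
  elif [ -e "$path" ]; then kind="other"
  else kind="missing"
  fi
  echo "$(basename "$path"): $kind"
  summary+="$(basename "$path")=$kind;"
done
rm -r "$workdir"
```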

__`case`__ statement

To avoid long chains of nested `if` statements, especially when the branching is a case analysis (pattern matching), we use the `case` construct. This will be useful, e.g., for parsing command-line arguments (see later).

In [83]:
%%bash
place="Oslo"
case $place in
        Oslo)
            m=4;;  # ;; ends this branch
        Bergen)
            m=5 ;;
        *)
            m=-1
esac
echo $m

4
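The branch labels in `case` are glob patterns rather than plain strings, and `|` separates alternatives. A sketch that classifies some (made-up) file names by extension:

```bash
summary=""
for file in report.txt data.csv photo.JPG notes
do
  case "$file" in
    *.txt|*.csv)    kind="text data" ;;   # | separates alternative patterns
    *.[jJ][pP][gG]) kind="image" ;;       # [..] matches one character from the set
    *)              kind="unknown" ;;     # * is the catch-all default branch
  esac
  echo "$file -> $kind"
  summary+="$kind;"
done
```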


__`for`__ loop

Consider a setup where we loop over a number of parameters and perform a "simulation" for each one, whose result we want to store:

In [96]:
%%bash
experiments="first second third"

dir=results

if [ -d $dir ]
then
  echo $dir exists
else
  mkdir $dir
fi

declare -i counter
counter=0

for e in $experiments
do
  echo running $e
  sleep 0.2
  touch $e.txt        # Touch/create empty file with that name
  cp $e.txt $dir 
  ((counter=counter+1))       # Increase the counter
done
echo Performed $counter experiments

results exists
running first
running second
running third
Performed 3 experiments


Here we have used the make-directory command <font color='red'>⚠️ </font> `mkdir`; the simulation was mocked up by the <font color='red'>⚠️ </font> `sleep` command, which delays execution by the given number of seconds; and finally the result files were created by <font color='red'>⚠️ </font> `touch`.
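Looping over a space-separated string is only one way to write a `for` loop. The sketch below shows three other common forms: brace expansion, a C-style arithmetic loop, and a glob over file names (in a scratch directory created with `mktemp -d`).

```bash
for i in {1..3}; do echo "brace expansion: $i"; done     # {1..3} expands to 1 2 3
for ((i = 0; i < 3; i++)); do echo "C-style: $i"; done   # arithmetic loop
workdir=$(mktemp -d)
touch "$workdir"/{first,second,third}.txt                # brace expansion again
declare -i n=0
for f in "$workdir"/*.txt; do    # the glob expands to the matching file names
  echo "found $(basename "$f")"
  n+=1                           # with declare -i, += adds numerically
done
echo "$n .txt files"
rm -r "$workdir"
```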

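The previous example illustrates a common situation where the tasks in the loop are independent and could be executed in parallel rather than serially, as done above. 
Launching tasks in parallel can be done with `&`, which runs a command in the background. First, the serial version, which takes about three seconds: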

In [101]:
%%bash
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e
done

Launched first
Launched second
Launched third


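In contrast, the parallel version, in which each `sleep 1 && echo ...` chain is sent to the background with `&`, runs quicker, as expected (about one second in total instead of three):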

In [102]:
%%bash
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e &   # & runs the whole chain in the background
done
wait    # block until all background jobs have finished

Launched first
Launched second
Launched third
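A plain `wait` does not tell us whether the background tasks succeeded. One way to check, sketched below, is to record each job's process ID from the special variable `$!` and pass it to `wait`, which then returns that job's exit status. The failing "second" task is a hypothetical mock-up.

```bash
pids=()
for e in first second third
do
  ( sleep 0.2; [ "$e" != "second" ] ) &   # mock task in a background subshell; "second" fails
  pids+=($!)                               # $! is the PID of the most recent background job
done
declare -i failed=0
for pid in "${pids[@]}"
do
  wait "$pid" || failed+=1                 # wait PID returns that job's exit status
done
echo "$failed of ${#pids[@]} tasks failed"
```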


__`while`__ loop

Consider the task of counting lines in a file

In [98]:
%%bash
filename="./data/text.txt"
declare -i count; count=0

echo "Start counting..."
# loop over all lines of  file
while read p
do
    # increase line counter
    ((count++))
done < $filename
echo "done"

echo "Number of lines in $filename: $count"

Start counting...
done
Number of lines in ./data/text.txt: 13
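`read` can also split each line into fields. The sketch below sets `IFS=,` for the `read` command only to split comma-separated lines, and feeds the loop from an inline here-document (`<<EOF ... EOF`) instead of a file; the data are made up. Because the input is redirected rather than piped, the loop runs in the current shell, so `total` keeps its value after the loop.

```bash
total=0
while IFS=, read -r name age     # split each line on commas; -r keeps backslashes literal
do
  echo "$name is $age years old"
  ((total += age))
done <<EOF
Jim,30
Ana,25
EOF
echo "total age: $total"         # total age: 55
```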


#### Functions

Functions are declared with the `function` keyword and called by their name followed by arguments. Note that by default, variables assigned inside the function body are global:

In [106]:
%%bash
myresult="Nothing"

function greet
{
    echo "greet was called"
    myresult='some value'  # Global
    insideresult="What"    # Global
}

echo $myresult
greet  # Call
echo $myresult $insideresult

Nothing
greet was called
some value What
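To keep a variable inside a function, declare it with `local`. Functions cannot return arbitrary values; the common idiom, sketched below, is to print the result and capture it with `$( )`, while the function's exit status (that of its last command) serves as a true/false answer.

```bash
counter=0
function shadow
{
    local counter=100    # local: visible only inside this function
    ((counter++))        # modifies the local copy; the global stays 0
}
function square
{
    local n=$1
    echo $((n * n))      # "return" data by printing it ...
}
function is_even
{
    (( $1 % 2 == 0 ))    # the last command's exit status becomes the function's status
}
shadow
result=$(square 7)       # ... and capture it with command substitution
is_even 4 && even="yes"
echo "counter=$counter square=$result even=$even"   # counter=0 square=49 even=yes
```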


Arguments passed to a function are accessed through special variables: `$1`, `$2`, ... hold the positional arguments, `$#` their count and `$@` all of them:

In [110]:
%%bash 
function foo
{
    echo "foo called with $# arguments"  # $# is the arg count
    echo "The first one is $0" # NOTE: $0 is the name of the shell/script, not the first user argument!
                               # The user's arguments are $1, $2, etc.
    # Show all of them
    declare -i n; n=1
    for arg in "$@"; do        # quoted "$@" keeps arguments containing spaces intact
      echo "command-line argument no. $n is <$arg>"
      ((n++))
    done
}

foo This
echo
foo This That

foo called with 1 arguments
The first one is bash
command-line argument no. 1 is <This>

foo called with 2 arguments
The first one is bash
command-line argument no. 1 is <This>
command-line argument no. 2 is <That>


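Alternatively, we can consume the arguments one at a time with `shift`, which drops `$1` and moves the remaining arguments down by one position. Combined with `case`, this is a common pattern for parsing options: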

In [129]:
%%bash

function bar
{
    while [ $# -gt 0 ]
    do
        option=$1; # load arg into option
        shift;     # drop $1; the remaining arguments move down by one
        case "$option" in
            -n)
                name=$1
                shift
                ;;  
            -a)
                age=$1; shift; ;;  
            *)
                echo "$0: invalid option \"$option\""; exit 1;;  # exit ends the whole script; return 1 would leave only the function
        esac
    done
    echo $name is $age years old
}

bar -n "Jim"
#echo
bar -a 30 -n Ana
echo "Exit status "$?
echo
bar -a 30 -b Ana

Jim is years old
Ana is 30 years old
Exit status 0

bash: invalid option "-b"


CalledProcessError: Command 'b'\nfunction bar\n{\n    while [ $# -gt 0 ]\n    do\n        option=$1; # load arg into option\n        shift;     # move $1 pointer\n        case "$option" in\n            -n)\n                name=$1\n                shift\n                ;;  \n            -a)\n                age=$1; shift; ;;  \n            *)\n                echo "$0: invalid option \\"$option\\""; exit 1;;\n        esac\n    done\n    echo $name is $age years old\n}\n\nbar -n "Jim"\n#echo\nbar -a 30 -n Ana\necho "Exit status "$?\necho\nbar -a 30 -b Ana\n'' returned non-zero exit status 1.
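Bash also has a built-in option parser, `getopts`, which handles the `shift` bookkeeping for us. Below is a sketch of the same option handling; `bar_getopts` is a hypothetical name. The option string `"n:a:"` means that both `-n` and `-a` expect an argument, which `getopts` stores in `OPTARG`. Unlike the example above, it uses `return 1`, so an invalid option only ends the function, not the whole script.

```bash
function bar_getopts
{
    local OPTIND=1 opt name="" age=""   # local OPTIND=1 so repeated calls start from scratch
    while getopts "n:a:" opt            # "n:" means that -n expects an argument
    do
        case "$opt" in
            n) name=$OPTARG ;;          # the option's argument is stored in OPTARG
            a) age=$OPTARG ;;
            *) echo "usage: bar_getopts [-n name] [-a age]" >&2; return 1 ;;
        esac
    done
    echo "$name is $age years old"
}
bar_getopts -a 30 -n Ana
bar_getopts -a 30 -b Ana || echo "exit status $?"   # invalid option: the function returns 1
```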

### Combining bash commands

In [None]:
# Contains any executable

## File manipulation utilities

## Text manipulation utilities

## Plotting utilities