# Bash programming and Linux command-line tools

<center>
  <img src="figs/bash.png" style="width: 500px;"/>
</center>
    
In our course **IN3110 – Problemløsning med høynivå-språk** we have so far used mostly Python (and some Cython). In this lecture we add to the family of languages by discussing shell (Bash) scripting. The take-home message is that (simple) scripts combined with other command-line utilities can provide elegant solutions and powerful pre- and postprocessing pipelines for data. 

## A bit of history - there were/are many shells

- 1978: C and TC shell (`csh` and `tcsh`)
- 1979: Bourne shell (`sh`)
- 1989: Bourne Again shell (`bash`)
- Other shells in the Bourne family: 
    - 1983: Korn shell (`ksh`)
    - 1990: Z shell (`zsh`)
    - 2002: Dash (`dash`)

## Why learn Bash? 

- Learning Bash means learning the roots of scripting 
- Bash is frequently encountered on Unix systems
- Bash is one of the most widely used command interpreters and scripting languages

Shell scripts evolve naturally from a workflow: 
  1. A sequence of commands you use often is placed in a file
  2. Command-line options are introduced to enable different options to be passed to the commands
  3. Introducing variables, if tests, loops enables more complex program flow
  4. At some point pre- and postprocessing becomes too advanced for bash, at which point (parts of) the script should be ported to Python or other tools
  
In this lecture we imagine that we find ourselves working e.g. on some Linux cluster where we cannot get the admin permission to install Python modules or text editors we have available on our machines. We will try to get things done with utilities that are commonly installed by default.

## What Bash is *good* for

- File and directory management
- Systems management (build scripts)
- Combining other scripts and commands
- Rapid prototyping of more advanced scripts
- Very simple output processing, plotting etc.

## Some common tasks in Bash

- file writing and managing files and directories (creation, deletion, renaming)
- for-loops
- running an application
- combining applications (pipes)
- file globbing, testing file types

## What Bash is *not* good for

- Cross-platform portability
- Graphics, GUIs
- Interface with libraries or legacy code
- More advanced post processing and plotting
- Calculations, math etc.

## Installation

- All our examples can be run under Bash, and many in the Bourne shell
- Differences in operating systems:
    - macOS: `/bin/sh` is a POSIX-compliant shell that is implemented by re-executing as Bash, Dash or Zsh (see `man sh` below); Bash itself is `/bin/bash`.
    - Ubuntu: `/bin/sh` is a link to Dash, a minimal but much faster shell than Bash. Bash is available as `/bin/bash`.
    - Windows: Bash is available through Cygwin or the Windows Subsystem for Linux (WSL).
    
**Use within Jupyter notebooks**: We will use the line magic `!` or the cell magic `%%bash` to run shell commands in the notebook.

## Bash tutorial
You will see a number of Bash/Unix commands in this lecture. The new commands will be highlighted with a <font color='red'>⚠️ </font>.

In [1]:
echo "Hello from bash"

Hello from bash


A command is called by giving its name followed by its arguments. <font color='red'>⚠️ </font> `echo` prints text to the screen.

We could write the above source code into a source file, here `./scripts/hello_world.sh`

### VIM intermezzo

To stick to our scenario of being stuck on a cluster where there is no VS Code, Sublime Text and the like, let us use VIM for editing. VIM is a powerful text editor (Vi IMproved, the successor of the VI editor); here we will only scratch 
its surface (no macros, no advanced search). In some sense the philosophy behind VIM is that a painter first picks an instrument (mode selection), places it on the canvas (navigation) and only then starts to draw (editing).

_Navigation_
`ESC` to leave the current mode. Then press
- `0` jump to line beginning
- `$` jump to line end
- `h`, `l`, `j`, `k` to move left, right, down or up
- `gg` to jump to the start of the file, or `G` to jump to the end
- `w` to jump forward a word or  `b` to move back a word

_Manipulation/Editing_
- Pressing `i` enters insert mode (you can type as you want)
- Pressing `x`, `dw`, `dd` deletes respectively a character, a word or an entire line
- Pressing NUMBER before the command in general repeats it NUMBER times
- Pressing `.` repeats the previous action
- `A` (`shift+a`) jumps to the end of the line and enters insert mode
- `s` (substitute) deletes the character under the cursor and enters insert mode
- `u` undoes the last change
- `v` enters visual mode
- `:w` saves the buffer to file

_Search_
- `/` enters search mode. After specifying the pattern pressing `n` will move forward to the next match, while `N` searches backward

[_Exiting_](https://stackoverflow.blog/2017/05/23/stack-overflow-helping-one-million-developers-exit-vim/)
- `:q` or `:q!` 

<center>
          <img src="figs/vim.png" style="width: 800px;"/>
</center>   

A great reference to learn more about VIM is the book [Practical VIM: Edit Text at the Speed of Thought](https://www.amazon.com/Practical-Vim-Edit-Speed-Thought/dp/1680501275). 

### Back to Bash

In [2]:
cat ./scripts/hello_world.sh

#!/bin/bash
# This is a regular comment line
echo "hello world!"


Here the lines starting with a hash `#` are interpreted as comments. 

Above we have used the <font color='red'>⚠️ </font> `cat` command to view the file content. Later we will see that it can be used for reading and writing too. 

Now we could try to run the script, only to find that we get an error

In [3]:
./scripts/hello_world.sh

bash: ./scripts/hello_world.sh: Permission denied


: 126

The issue is that the file is not executable. We can see this with <font color='red'>⚠️ </font> `ls` command (where we specify the "-l" flag to get a long output)

In [4]:
ls -l ./scripts/hello_world.sh

-rw-r--r--  1 minrk  staff  65 Aug 28 13:36 ./scripts/hello_world.sh


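The permissions are r(ead), w(rite) and (e)x(ecute), listed in three groups: for the user (owner), the group and all others.

To fix this we use the <font color='red'>⚠️ </font> `chmod` command. In particular, below we add execute permission for the user (`u+x`); `g+x` would do the same for the group.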

In [5]:
chmod u+x ./scripts/hello_world.sh
ls -l ./scripts/hello_world.sh

-rwxr--r--  1 minrk  staff  65 Aug 28 13:36 ./scripts/hello_world.sh


Now we can finally execute

In [6]:
./scripts/hello_world.sh

hello world!
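
The permission round trip above (not executable, `chmod u+x`, run) can be reproduced without touching the lecture files. The following is a minimal sketch under stated assumptions: the scratch directory comes from `mktemp -d` and the file name `hello.sh` is made up for illustration.

```bash
#!/bin/bash
# Sketch: make a freshly created script executable (hello.sh is a hypothetical name).
tmpdir=$(mktemp -d)
script="$tmpdir/hello.sh"
printf '#!/bin/bash\necho "hello world!"\n' > "$script"

# Files created by redirection normally lack the execute bit
if [ -x "$script" ]; then before="executable"; else before="not executable"; fi

chmod u+x "$script"    # add e(x)ecute permission for the (u)ser who owns the file
if [ -x "$script" ]; then after="executable"; else after="not executable"; fi

output=$("$script")    # now the script can be run by its path
echo "before: $before, after: $after, output: $output"
rm -r "$tmpdir"
```

The `-x` test used here is one of the file tests discussed later in this lecture.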


Now that the code runs we could ask which program actually interpreted it. Unless specified otherwise, Bash uses itself as the default interpreter. We can be explicit about the interpreter:

In [7]:
cat scripts/hello_world_bang.sh

#!/bin/bash
# This is a regular comment line
echo "hello world!"


Observe that the first line, starting with the _shebang_ `#!`, specifies the interpreter to use for the script. The second line, starting with the hash `#`, is a comment. 

__Note__ We could have specified a different interpreter/shell by using `#!/bin/sh` as the first line instead. Let's see what sort of shell that is

In [8]:
man sh

SH(1)                       General Commands Manual                      SH(1)

NAME
     sh – POSIX-compliant command interpreter

SYNOPSIS
     sh [options]

DESCRIPTION
     sh is a POSIX-compliant command interpreter (shell).  It is implemented
     by re-execing as either bash(1), dash(1), or zsh(1) as determined by the
     symbolic link located at /private/var/select/sh.  If
     /private/var/select/sh does not exist or does not point to a valid shell,
     sh will use one of the supported shells.

FILES
     /private/var/select/sh

     $HOME/.profile

     /etc/profile

SEE ALSO
     bash(1), dash(1), ksh(1), tcsh(1), zsh(1)

macOS 13.5                     February 8, 2019                     macOS 13.5


Using the <font color='red'>⚠️ </font> `man` (manual) command we see that on this machine `sh` is a POSIX-compliant shell that re-executes as `bash`, `dash` or `zsh`, depending on a symbolic link.

For convenience we will use the `%%bash` cell magic in the rest of the lecture to write our scripts

### Variables

- Assign a variable by `var=value` (__NOTE__ no spaces around `=`!)
- Retrieve the value of the variable by `${var}` or `$var`

In [9]:
#!/usr/bin/bash

cmd=echo    # A command name can be stored in a variable
greet="Hi"

${cmd} ${greet} world!

# Undefined variables result in empty string

${cmd} ${greet} ${world}!

Hi world!
Hi !
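
The braces in `${var}` start to matter as soon as the variable name is followed by characters that could belong to a name. The sketch below uses made-up variable names; `${var:-default}` is the standard parameter expansion that substitutes a default when `var` is unset or empty.

```bash
file="report"
file_v2="something else"          # a different variable that happens to exist

without_braces="$file_v2"         # expands the variable file_v2
with_braces="${file}_v2"          # expands file, then appends the text _v2

# Fall back to a default when a variable is unset or empty
editor_name="${my_editor:-vim}"   # my_editor is never set in this sketch

echo "$without_braces | $with_braces | $editor_name"
```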


There are also special variables defined in the environment. By convention their names are all uppercase. As an example, recall that when running `hello_world.sh` 
above we specified the path to the script (`./scripts/hello_world.sh`). 
In particular, the following gives an error

In [10]:
./hello_world.sh

bash: ./hello_world.sh: No such file or directory


: 127

To fix this problem, 
recall the role of `PYTHONPATH` in looking up Python modules by the Python interpreter. In fact `PYTHONPATH` is an environment variable

In [11]:
echo $PYTHONPATH




A similar role is played by the environment variable `PATH`, which 
specifies the directories in which to look for program executables.

In [12]:
echo $PATH

/Users/minrk/dev/mine/git-stuff/bin:/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin:/Users/minrk/.local/bin:/Users/minrk/conda/bin:/usr/local/texlive/2022/bin/universal-darwin:/Users/minrk/Dropbox/bin:/opt/homebrew/bin:/usr/local/MacGPG2/bin:/usr/local/Homebrew/bin:/usr/local/bin:/usr/X11R6/bin:/usr/X11/bin:/usr/bin:/opt/pypy/bin:/Users/minrk/Dropbox/scripts:/opt/homebrew/opt/ruby/bin:/usr/local/opt/ruby/bin:/Users/minrk/dev/mine/git-stuff/bin:/Users/minrk/conda/condabin:/Users/minrk/Dropbox/bin:/System/Cryptexes/App/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Wireshark.app/Contents/MacOS:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin


What we would like to do, in order to run our script simply as `hello_world.sh`, is to modify the environment. Consider the following 

In [13]:
new_PATH="$PWD/scripts:$PATH"
echo $new_PATH

/Users/minrk/dev/simula/in3110/site/lectures/command-line/scripts:/Users/minrk/dev/mine/git-stuff/bin:/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin:/Users/minrk/.local/bin:/Users/minrk/conda/bin:/usr/local/texlive/2022/bin/universal-darwin:/Users/minrk/Dropbox/bin:/opt/homebrew/bin:/usr/local/MacGPG2/bin:/usr/local/Homebrew/bin:/usr/local/bin:/usr/X11R6/bin:/usr/X11/bin:/usr/bin:/opt/pypy/bin:/Users/minrk/Dropbox/scripts:/opt/homebrew/opt/ruby/bin:/usr/local/opt/ruby/bin:/Users/minrk/dev/mine/git-stuff/bin:/Users/minrk/conda/condabin:/Users/minrk/Dropbox/bin:/System/Cryptexes/App/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Wireshark.app/Contents/MacOS:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/

Here we have __computed__ the value assigned to `new_PATH` by building up a string from the `PWD` variable, which holds the current working directory (the same path is printed by the <font color='red'>⚠️ </font> `pwd` command). Note that we prepend to the list so that our directory takes precedence. To update the `PATH` we could continue as follows

In [14]:
new_PATH="$PWD/scripts:$PATH"
export PATH=$new_PATH  # PATH is set

echo $PATH

# Navigate somewhere else so that we don't get lucky
cd $HOME
# Call
echo
hello_world.sh

/Users/minrk/dev/simula/in3110/site/lectures/command-line/scripts:/Users/minrk/dev/mine/git-stuff/bin:/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin:/Users/minrk/.local/bin:/Users/minrk/conda/bin:/usr/local/texlive/2022/bin/universal-darwin:/Users/minrk/Dropbox/bin:/opt/homebrew/bin:/usr/local/MacGPG2/bin:/usr/local/Homebrew/bin:/usr/local/bin:/usr/X11R6/bin:/usr/X11/bin:/usr/bin:/opt/pypy/bin:/Users/minrk/Dropbox/scripts:/opt/homebrew/opt/ruby/bin:/usr/local/opt/ruby/bin:/Users/minrk/dev/mine/git-stuff/bin:/Users/minrk/conda/condabin:/Users/minrk/Dropbox/bin:/System/Cryptexes/App/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Wireshark.app/Contents/MacOS:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/

Here we have used the command <font color='red'>⚠️ </font> `cd` to change the directory to `HOME` which is an environment variable holding the user home directory, here

In [15]:
echo $HOME

/Users/minrk
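
The `PATH` lookup can be tried out safely in a scratch directory. The sketch below assumes nothing about the lecture files; the script name `hello_path_demo.sh` is made up. `command -v` is the shell builtin that reports where a command would be found.

```bash
tmpdir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from PATH"\n' > "$tmpdir/hello_path_demo.sh"
chmod u+x "$tmpdir/hello_path_demo.sh"

# Before: the directory is not on PATH, so the bare name is not found
if command -v hello_path_demo.sh > /dev/null; then before="found"; else before="not found"; fi

PATH="$tmpdir:$PATH"     # prepend, so that our directory is searched first
found_at=$(command -v hello_path_demo.sh)
output=$(hello_path_demo.sh)

echo "before: $before"
echo "after:  $found_at -> $output"
rm -r "$tmpdir"
```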


__NOTE__ There is a pitfall: each notebook cell execution can be its own [process](https://stackoverflow.com/questions/67850706/unable-to-export-path-in-jupyterlab). In that case the exported variables are not visible in the following cells, because those run in new processes that are not children of the one doing the export. 

In [16]:
echo $PATH   

/Users/minrk/dev/simula/in3110/site/lectures/command-line/scripts:/Users/minrk/dev/mine/git-stuff/bin:/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin:/Users/minrk/.local/bin:/Users/minrk/conda/bin:/usr/local/texlive/2022/bin/universal-darwin:/Users/minrk/Dropbox/bin:/opt/homebrew/bin:/usr/local/MacGPG2/bin:/usr/local/Homebrew/bin:/usr/local/bin:/usr/X11R6/bin:/usr/X11/bin:/usr/bin:/opt/pypy/bin:/Users/minrk/Dropbox/scripts:/opt/homebrew/opt/ruby/bin:/usr/local/opt/ruby/bin:/Users/minrk/dev/mine/git-stuff/bin:/Users/minrk/conda/condabin:/Users/minrk/Dropbox/bin:/System/Cryptexes/App/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Wireshark.app/Contents/MacOS:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/

So we will do this outside in the terminal/in one running shell session. We can put this process to sleep by `ctrl+z`. After the setting we can bring it back to (f)ore(g)round by <font color='red'>⚠️ </font> `fg`. Alternatively, we can resume the sleeping process in the (b)ack(g)round <font color='red'>⚠️ </font> `bg`.
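
Whether a child process sees a variable at all depends on `export`, which is at the core of the pitfall described above. A minimal sketch with made-up variable names, using `sh -c` to start a child shell:

```bash
plain_var="set but not exported"
export exported_var="set and exported"

# The child shell only inherits exported variables
child_sees_plain=$(sh -c 'echo "${plain_var:-<unset>}"')
child_sees_exported=$(sh -c 'echo "${exported_var:-<unset>}"')

echo "plain:    $child_sees_plain"
echo "exported: $child_sees_exported"
```

The reverse direction never works: a child process cannot change the environment of its parent.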

Some other examples of setting variables from computations


In [17]:
weekday=$(date +"%A %Y-%m-%d %H:%M:%S")    # date is a standard utility; +"%A" alone would display just the day of the week
echo "Today is $weekday."

Today is Monday 2023-08-28 14:23:32.


In [18]:
# Here we just use a different syntax to get it
files=`ls ..`
echo $files

Shared benjaminrk hugo kira minrk


As said before command <font color='red'>⚠️ </font> `ls` lists content of a directory.

### Typed variables

By default variables are untyped and treated as strings

In [19]:
x=5
x=$x+5
echo $x

5+5
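
Without declaring a type, arithmetic can still be requested explicitly with arithmetic expansion `$(( ))`. A short sketch (the variable names are illustrative); note that Bash arithmetic works on integers only.

```bash
x=5
as_text=$x+5              # plain assignment: the string "5+5"
as_number=$((x + 5))      # arithmetic expansion: 10
quotient=$((7 / 2))       # integer division: 3
remainder=$((7 % 2))      # modulo: 1

echo "$as_text $as_number $quotient $remainder"
```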


We can be explicit about the type of variable

In [20]:
declare -i b     # define an integer variable b
a=5
b=$a+5
echo $b

10


Or express that the variable is constant/read-only

In [21]:
declare -r r=10            
echo $r
r=5

10
bash: r: readonly variable


: 1

Bash also supports an `array` type

In [22]:
declare -a array=("foo" "bar") # array
echo ${array[0]}  # First array value
echo ${array[@]}  # All array values
echo ${#array}    # !!! Length of the first element, NOT the array size
echo ${#array[@]} # Array size

foo
foo bar
3
2
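
Arrays are typically appended to with `+=( )` and iterated over with `"${array[@]}"`. The sketch below uses a made-up array; the quotes around the expansion are what keeps an element containing a space in one piece.

```bash
declare -a fruits=("apple" "kiwi")
fruits+=("banana split")            # append an element that contains a space

count=${#fruits[@]}                 # number of elements
first_length=${#fruits[0]}          # length of the first element ("apple")

joined=""
for fruit in "${fruits[@]}"; do     # the quotes keep "banana split" as one item
  joined="$joined<$fruit>"
done

echo "$count elements, first has $first_length characters: $joined"
```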


### Flow control and functions

For flow we shall discuss `if`, `case` and `for` and `while` loops

__`if`__ statement

In [23]:
declare name="Joe"
# Here we are comparing 2 strings
if [ $name == "Joe" ]
then
  echo "Joseph"
else
  echo "Don't know"
fi

Joseph


__Note__ `[` is not a bracket for grouping but a command (a synonym for `test`), which is why the spaces after `[` and before `]` are required

In [24]:
declare -i -r number=10
# Here we are comparing numbers
if [ $number -gt 10 ]    # -eq -le
then
  echo "The variable is greater than 10."
else
  echo "The variable is at most 10"
fi

The variable is at most 10
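
Because `[` is an ordinary command, its arguments go through normal word splitting, so an empty unquoted variable simply disappears from the argument list. The sketch below (illustrative variable names) contrasts quoted and unquoted use and also shows Bash's `[[ ]]` keyword, which does not split words and treats the right-hand side of `==` as a glob pattern.

```bash
name=""

# Quoted: [ receives three arguments ("", =, Joe) and the test is simply false
if [ "$name" = "Joe" ]; then quoted="match"; else quoted="no match"; fi

# Unquoted: [ receives only (=, Joe) and reports a usage error
if [ $name = "Joe" ] 2> /dev/null; then unquoted_status=0; else unquoted_status=$?; fi

# [[ ]] is shell syntax rather than a command; J* is a glob pattern here
name="Joseph"
if [[ $name == J* ]]; then pattern="starts with J"; else pattern="other"; fi

echo "$quoted / exit status $unquoted_status / $pattern"
```

Quoting every variable inside `[ ]` is therefore a good habit, even when the value is expected to be non-empty.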


We can do `if`-`elif` branching and the tests can be combined with `&&` (AND) or `||` (OR). Below we also introduce [parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) `${ }` to grab substrings or get the length of strings and `$(( ))` to perform some simple arithmetic

In [25]:
declare name="Blpha"

if [ $name == "Joe" ]
then
  echo Name is Joe
fi

# AND
if [ ${name: 0:1} == "J" ] && [ ${name: -1:1} == "y" ]
then
  echo The first letter is J and last is y
# OR
elif [ ${name: 0:1} == "A" ] || [ ${#name} -eq $((2+2)) ]
then
  echo The first letter is A or name length is $((1+3))
else
  [ ${#name} -eq 5 ] && echo "Don't know for 5 char long name"
fi
# NOTE: we add this "success" expression so that ipython does not complain
# about notzero exit status
echo $name
# We could also use
exit 0

Don't know for 5 char long name
Blpha
exit
Restarting Bash


<font color='red'>⚠️ </font> `exit` with a status number is used to indicate successful or failed execution; 0 means success. There is a special variable, `$?`, which captures the exit code of the preceding command

In [26]:
name="alex"
[ ${#name} -eq 5 ] && echo "Exec only when name 5"

if [ "$?" == "0" ]
then
  echo There was no problem
else
  echo There was a problem
fi

There was a problem
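
Since `$?` is overwritten by every command, it should be saved immediately if it is needed later. A small sketch with made-up variable names; the `cmd || status=$?` form records a failure and also works in scripts that run with `set -e`.

```bash
true
ok_status=$?                       # 0: success

fail_status=0
echo "just hay" | grep -q "needle" || fail_status=$?   # grep exits with 1: no match

# Pitfall: after this line, $? describes the last command that ran (the echo), not grep
echo "just hay" | grep -q "needle" || echo "no needle" > /dev/null
after_echo=$?

echo "ok=$ok_status fail=$fail_status after_echo=$after_echo"
```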


There are handy tests for existence of files/directories. For example we can check

In [27]:
dir='scripts'

if [ -d $dir ]
then
  echo There is $dir directory
  cp -r $dir $dir.bk
  ls .             # . is the current directory, .. is the one above
  echo
  if [ -x "$dir/hello_world.sh" ]
  then
    echo $dir contains executable
  fi
fi

There is scripts directory
Bash - interactive lecture.ipynb       first.txt
Bash - interactive lecture.slides.html hello-world
Makefile                               hw.sh
Untitled.ipynb                         results
cmdline_bash.ipynb                     run_and_test.sh
data                                   scripts
figs                                   scripts.bk

scripts contains executable


Here we have used the copy command <font color='red'>⚠️ </font> `cp` with a `-r` recursive switch.

Other test switches

- `-h` FILE - True if the FILE exists and is a symbolic link.
- `-r` FILE - True if the FILE exists and is readable.
- `-w` FILE - True if the FILE exists and is writable.
- `-x` FILE - True if the FILE exists and is executable.
- `-d` FILE - True if the FILE exists and is a directory.
- `-e` FILE - True if the FILE exists and is a file, regardless of type 
- `-f` FILE - True if the FILE exists and is a regular file (not a directory or device)
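
The switches above can be combined into a small classifier. This is only a sketch: `classify` is a hypothetical helper and the paths live in a scratch directory created by `mktemp -d`.

```bash
# classify is a hypothetical helper built from the test switches above
classify() {
  if [ -d "$1" ]; then
    echo "directory"
  elif [ -f "$1" ]; then
    echo "regular file"
  elif [ -e "$1" ]; then
    echo "exists, but is neither a directory nor a regular file"
  else
    echo "missing"
  fi
}

tmpdir=$(mktemp -d)
touch "$tmpdir/data.txt"

dir_kind=$(classify "$tmpdir")
file_kind=$(classify "$tmpdir/data.txt")
missing_kind=$(classify "$tmpdir/no_such_file")

echo "$dir_kind, $file_kind, $missing_kind"
rm -r "$tmpdir"
```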

__`case`__ statement

To simplify writing nested `if` statements, especially if the branching is a case analysis/pattern matching, we use the `case` construct. This will be useful e.g. for parsing command-line arguments (see later)

In [28]:
place="Drammen"
case $place in
        Oslo)
            m=4;;  # ;; indicates end of case
        Bergen)
            m=5 ;;
        *)
            m=-1
esac
echo $m

-1
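
The labels in a `case` statement are glob patterns, not just literal strings, and several patterns can share a branch using `|`. A sketch with a hypothetical helper `kind_of` and made-up file names; the first matching branch wins.

```bash
# kind_of is a hypothetical helper; the file names are made up
kind_of() {
  case "$1" in
    *.txt|*.md)  echo "text" ;;      # | separates alternative patterns
    *.sh)        echo "script" ;;
    [0-9]*)      echo "starts with a digit" ;;
    *)           echo "unknown" ;;   # default branch
  esac
}

echo "$(kind_of notes.md), $(kind_of run.sh), $(kind_of 2023_results), $(kind_of Makefile)"
```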


__`for`__ loop

Consider this setup where we loop over a bunch of parameters to perform a "simulation" whose results we want to store

In [None]:
experiments="first second third"

dir=results

if [ -d $dir ]
then
  echo $dir exists
else
  mkdir $dir
fi

declare -i counter
counter=0

for e in $experiments
do
  echo running $e
  sleep 0.2
  touch $e.txt        # Touch/create empty file with that name
  cp $e.txt $dir      # Back it up
  rm -vf $e.txt           # Remove the original
  ((counter=counter+1))       # Increase the counter
done
echo Performed $counter experiments

Here we have used a make directory command <font color='red'>⚠️ </font> `mkdir`, the simulation was mocked up by <font color='red'>⚠️ </font> `sleep` command which delays the execution by arg seconds and the results were created by <font color='red'>⚠️ </font> `touch`. Finally we removed the original results by <font color='red'>⚠️ </font> `rm`.

The previous example illustrates a common situation where the tasks in the loop could execute in parallel as opposed to serially, 
as done previously. Launching the tasks in the background, so that they run in parallel, can be done with `&`. First, the serial version:

In [30]:
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e
done

Launched first
Launched second
Launched third


In contrast, the parallel execution runs quicker, as expected

In [31]:
experiments="first second third"

for e in $experiments
do
  sleep 1 && echo Launched $e &
done

[1] 20307
[2] 20308
[3] 20310
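
When tasks are launched with `&`, the script usually has to wait for them before using their results. The builtin `wait` does this. The sketch below is a mock-up under stated assumptions: the "tasks" are just `sleep 1`, and the scratch file from `mktemp` stands in for real result files.

```bash
results=$(mktemp)           # scratch file collecting one line per finished task
start=$(date +%s)

for e in first second third; do
  ( sleep 1; echo "finished $e" >> "$results" ) &   # each task runs in the background
done

wait                        # block until all background jobs have finished
elapsed=$(( $(date +%s) - start ))
finished=$(wc -l < "$results" | tr -d ' ')

echo "$finished tasks finished in about $elapsed s"
rm "$results"
```

Run serially, the three tasks would need about three seconds; here the elapsed time stays close to one second.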


__`while`__ loop

Consider the task of counting lines in a file

In [32]:
filename="./data/text.txt"
declare -i count; count=0

echo "Start counting..."
# loop over all lines of  file
while read p
do
    # increase line counter
    ((count++))
done < $filename
echo "done"

echo "Number of lines in $filename: $count"

Start counting...
done
Number of lines in ./data/text.txt: 13

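A slightly more defensive variant of the loop above is sketched below on a made-up scratch file. `IFS=` and `read -r` keep leading whitespace and backslashes intact, which matters as soon as the lines are processed rather than just counted.

```bash
tmpfile=$(mktemp)
printf '%s\n' 'first' '  indented' 'back\slash' > "$tmpfile"

count=0
longest=""
while IFS= read -r line; do
  count=$((count + 1))
  # remember the longest line seen so far
  if [ "${#line}" -gt "${#longest}" ]; then longest=$line; fi
done < "$tmpfile"

echo "$count lines, longest is <$longest>"
rm "$tmpfile"
```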

Color printing by setting terminal [properties](https://linuxcommand.org/lc3_adv_tput.php)

In [33]:
declare -i index; index=1

normal=$(tput setaf 9)

while [ $index -le 4 ]
do
    tput setaf $index          # Foreground
    tput setab $((index+1))          # Background
    echo Index is $index
    tput setaf 9   # Restore
    ((index++))
done

Launched first
Launched second
Launched third
[1]   Done                    sleep 1 && echo Launched $e
[2]-  Done                    sleep 1 && echo Launched $e
[3]+  Done                    sleep 1 && echo Launched $e
Index is 1
Index is 2
Index is 3
Index is 4


#### Functions

Functions are declared with the `function` keyword and called by their name followed by arguments. Note that by default variables assigned inside the function body are global

In [34]:
myresult="Nothing"

function greet
{
    echo "greet was called"
    myresult='some value'  # Global
    insideresult="What"    # Global
}

echo $myresult
greet  # Call
echo $myresult $insideresult

Nothing
greet was called
some value What
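
To keep a variable private to a function, Bash provides the `local` builtin. A minimal sketch with made-up function and variable names:

```bash
counter=0

function bump_global
{
    counter=$((counter + 1))     # modifies the global variable
}

function bump_local
{
    local counter=100            # shadows the global inside this function only
    counter=$((counter + 1))
    echo "$counter"
}

bump_global
bump_local > /dev/null           # runs in the current shell, yet the global is untouched
after_calls=$counter
inside=$(bump_local)

echo "inside bump_local: $inside, global afterwards: $after_calls"
```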


Arguments of the function can be parsed with special accessors

In [35]:
function foo
{
    echo "foo called with $# arguments"  # $# is the arg count
    echo "The first one is $0" # NOTE the zero argument is not the first one from the user!
                               # $1 $2 etc  
    # Show all of them
    declare -i n; n=1
    for arg in "$@"; do
      echo "command-line argument no. $n is <$arg>"
      ((n++))
    done
}

foo This
echo
foo This That

foo called with 1 arguments
The first one is /opt/homebrew/bin/bash
command-line argument no. 1 is <This>

foo called with 2 arguments
The first one is /opt/homebrew/bin/bash
command-line argument no. 1 is <This>
command-line argument no. 2 is <That>
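
Arguments that contain spaces survive only if `$@` is quoted. The sketch below uses hypothetical helper functions to show the difference between `$@` and `"$@"` when forwarding arguments.

```bash
count_args() { echo "$#"; }              # hypothetical helper: prints its argument count

forward_unquoted() { count_args $@; }    # arguments are split again on whitespace
forward_quoted()   { count_args "$@"; }  # each argument is passed on unchanged

n_unquoted=$(forward_unquoted "hello world" again)
n_quoted=$(forward_quoted "hello world" again)

echo "unquoted: $n_unquoted, quoted: $n_quoted"
```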


Or we can process them one at a time, consuming the argument list with `shift`

In [36]:
function bar
{
    while [ $# -gt 0 ]
    do
        option=$1; # load arg into option
        shift;     # move $1 pointer
        case "$option" in
            -n)
                name=$1
                shift
                ;;  
            -a)
                age=$1; shift; ;;  
            *)
                echo "$0: invalid option \"$option\""; exit 1;;
        esac
    done
    echo $name is $age years old
}

bar -n "Jim"
#echo
bar -a 30 -n Ana
echo "Exit status "$?
echo
# bar -a 30 -b Ana

Jim is years old
Ana is 30 years old
Exit status 0
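
For simple single-letter options the builtin `getopts` can replace the manual `shift`/`case` loop. The following is a sketch rather than a drop-in replacement for `bar`: the function name `describe` and its default values are made up, and it uses `return` instead of `exit` so that a bad option does not terminate the calling shell.

```bash
describe() {                 # hypothetical function name
  local name="unknown" age="?" opt
  local OPTIND=1             # reset the option index for every call
  while getopts "n:a:" opt; do
    case "$opt" in
      n) name=$OPTARG ;;
      a) age=$OPTARG ;;
      *) return 1 ;;         # getopts has already printed an error message
    esac
  done
  echo "$name is $age years old"
}

describe -n Jim
describe -a 30 -n Ana
```

In the option string `"n:a:"` a colon after a letter means that the option takes a value, which `getopts` stores in `OPTARG`.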



### Combining bash commands

Unix processes use the following three standard streams as preconnected input and output communication channels:

<center>
<!--<img src="figs/bash_process_codes.jpg" style="width: 500px;"/>-->
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Stdstreams-notitle.svg/646px-Stdstreams-notitle.svg.png" style="width: 500px;"/>
</center>

- user input is passed to the standard input `STDIN` stream
- normal information is passed to the standard output `STDOUT` stream
- error information is passed to the standard error `STDERR` stream.

The streams can be redirected

__`STDOUT` to file__

The Bash redirect `>` passes `STDOUT` to a file:

```bash
./myscript.sh > myfile.txt   
```
The same as above, but appending the output to an existing file:
```bash
./myscript.sh >> myfile.txt   
```

In [37]:
chmod u+x ./scripts/hello_world_bang.sh
./scripts/hello_world_bang.sh > ./data/foo.txt
cat ./data/foo.txt

echo

for i in {1..5}
do
    ./scripts/hello_world_bang.sh >> ./data/foo.txt
done
cat ./data/foo.txt

hello world!

hello world!
hello world!
hello world!
hello world!
hello world!
hello world!


__File to `STDIN`__
Use the `<` redirect to send a file to `STDIN`:

In [38]:
wc -w < ./data/text.txt # Count the number of words and print to STDOUT 
echo

wc -w < ./data/text.txt > ./data/word_stat.txt # Same as above, but save STDOUT output to file
wc -l < ./data/text.txt > ./data/line_stat.txt 
wc -m < ./data/text.txt > ./data/char_stat.txt 
echo

cat ./data/word_stat.txt ./data/line_stat.txt ./data/char_stat.txt

      35


      35
      13
     239


<font color='red'>⚠️ </font> `wc` prints the word(`-w`), line(`-l`) or character(`-m`) counts for a file 

You can specify which stream to redirect with `[STREAM]>`. Valid values for `STREAM` are `1` for stdout, `2` for stderr and `&` for both.

```bash
./compile_model.sh                 # stdout and stderr are displayed on the terminal
./compile_model.sh 1> out.txt      # Redirect stdout to file, same as >
./compile_model.sh 2> err.txt      # Redirect stderr to file
./compile_model.sh &> outerr.txt   # Redirect stdout and stderr to file
```
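
Streams can also be redirected into each other with `N>&M`, and unwanted output can be discarded by sending it to `/dev/null`. The sketch below uses a hypothetical function `noisy` in place of a real program such as `compile_model.sh`.

```bash
# noisy is a hypothetical command writing one line to each output stream
noisy() {
  echo "result"
  echo "warning" >&2
}

only_stdout=$(noisy 2> /dev/null)        # discard stderr
only_stderr=$(noisy 2>&1 > /dev/null)    # order matters: stderr to the capture, stdout discarded
both=$(noisy 2>&1)                       # merge stderr into stdout

echo "stdout: $only_stdout"
echo "stderr: $only_stderr"
```

Redirections are processed from left to right, which is why `2>&1 > /dev/null` and `> /dev/null 2>&1` behave differently: the latter discards both streams.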

__Combining bash commands: Pipes__

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Pipeline.svg/560px-Pipeline.svg.png" style="width: 500px;"/>
</center>

The bash pipe `|` connects `STDOUT` of one command to `STDIN` of another. Let's look at some pipeline examples

1. _Print the file content (here single column data) in a sorted way_

In [39]:
# Look first how many
wc -l < ./data/names.txt
cat ./data/names.txt | sort

      40
aaapath
autonomy
autonomy
biscuit
charismatic
cruel
daughter
decrease
decrease
demonstrator
demonstrator
drawer
excavate
facade
joke
joke
journal
laaaandscape
letter
liability
liability
lineage
lung
magnitude
mall
man
maniac
manipulation
maximum
maximum
missile
noble
paaalace
paaaot
rank
reign
relieve
straaaeam
suggest
suggest


Note that we possibly get a very long list. To only look at a selection we could extend the pipeline with calls to 
<font color='red'>⚠️ </font> `head`, <font color='red'>⚠️ </font> `tail` and <font color='red'>⚠️ </font> `more`, which "zoom" in on the beginning or the end, or page through the text in chunks. 

In [40]:
cat ./data/names.txt | sort | head -3

aaapath
autonomy
autonomy


In [41]:
cat ./data/names.txt | sort | tail -3

straaaeam
suggest
suggest


In [42]:
# NOTE: not notebook friendly as it expects some user interaction - run in terminal
# cat ./data/names.txt | sort | more -2

2. _Introduce T junction_

Building on the previous example we might want to only get the count of unique words. This can be accomplished by adding <font color='red'>⚠️ </font> `uniq` to the pipeline

In [43]:
cat ./data/names.txt | sort | uniq | wc -l

      33


However, wouldn't it be useful to have the list of unique words too? This is where <font color='red'>⚠️ </font> `tee` comes in, introducing a T junction in the pipeline redirecting the partial output to a file

In [44]:
cat ./data/names.txt | sort | uniq | tee ./data/unique_name.txt | wc -l
echo
head -2 ./data/unique_name.txt

      33

aaapath
autonomy
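
The `sort` step before `uniq` is essential, because `uniq` only collapses duplicates that are adjacent. The sketch below demonstrates this on a made-up word list and also uses `uniq -c`, which prefixes each line with its number of occurrences, to find the most frequent word (the final `awk` call, a tool covered later in this lecture, just picks the second column).

```bash
tmpfile=$(mktemp)
printf '%s\n' joke man joke lung man joke > "$tmpfile"

without_sort=$(uniq "$tmpfile" | wc -l | tr -d ' ')       # no adjacent duplicates, nothing removed
with_sort=$(sort "$tmpfile" | uniq | wc -l | tr -d ' ')   # distinct words only
most_common=$(sort "$tmpfile" | uniq -c | sort -rn | head -1 | awk '{print $2}')

echo "$without_sort lines without sort, $with_sort unique words, most common: $most_common"
rm "$tmpfile"
```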


3. _Combine with variables_

As an example we wish to build a news app. We begin by retrieving the data using <font color='red'>⚠️ </font> `curl` running in `-s` silent mode. Let's see what we work with

In [45]:
# myvar=`curl -s "https://www.nrk.no/"` 
# Fall back if net is down
cat ./data/nrk_data.txt | grep newsfeed__message-title

          <h3 class="kur-newsfeed__message-title">To pågrepet etter alvorlig voldshendelse</h3>
          <h3 class="kur-newsfeed__message-title">Fire døde etter påkjørsel ved bryllup i Madrid</h3>
          <h3 class="kur-newsfeed__message-title">To til sykehus etter eksplosjonsartet brann</h3>
          <h3 class="kur-newsfeed__message-title">Sperret av hus i Skien etter mulig voldshendelse</h3>
          <h3 class="kur-newsfeed__message-title">Fly styrtet i Victoriasjøen - 15 reddet</h3>
          <h3 class="kur-newsfeed__message-title">Drapsmistenkt nordmann fremstilles for fengsling</h3>
          <h3 class="kur-newsfeed__message-title">Norsk overvåkingstårn fikk strøm fra Russland</h3>
          <h3 class="kur-newsfeed__message-title">Mann falt i sjøen i Alver - fløyet til sykehus </h3>
          <h3 class="kur-newsfeed__message-title">Gresk avis: Tidligere statsminister og statsråder ble avlyttet</h3>
          <h3 class="kur-newsfeed__message-title">Enebolig nedbrent på Berger 

Our next step is to extract information from this text. Specifically, we are after the first headline. One possibility is to split the line (as with Python's `str.split`) on some delimiter and work with the "fields" / elements of the resulting array. This is the functionality of <font color='red'>⚠️ </font>`cut -d DELIMITER -fINDEX` 

In [46]:
cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2 

To pågrepet etter alvorlig voldshendelse</h3


Following the same logic we can get

In [47]:
title=`cat ./data/nrk_data.txt | grep newsfeed__message-title | head -1 | cut -d '>' -f2 | cut -d '<' -f1`
echo $title

To pågrepet etter alvorlig voldshendelse
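
The same extraction can be done inside Bash with the parameter expansions `${var#pattern}` and `${var%%pattern}`, which avoids starting extra processes. A sketch on a made-up sample line (not taken from the NRK data):

```bash
line='  <h3 class="title">Example headline</h3>'   # made-up sample line

with_cut=$(echo "$line" | cut -d '>' -f2 | cut -d '<' -f1)

rest=${line#*>}              # remove the shortest prefix ending in ">"
with_expansion=${rest%%<*}   # remove the longest suffix starting with "<"

echo "$with_cut"
echo "$with_expansion"
```

Both approaches rely on the simple structure of the line; for general HTML a proper parser, e.g. in Python, is the safer choice.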


At this point we know the basics and are in a position to "glue" different programs together. We have seen a few already, e.g. `cut`, `sort`. In the following we cover a few more which are useful in a scientific workflow.

## Text manipulation utilities - `grep`, `awk` and `sed`

### `grep` global regular expression print

Grep _searches_ the input files line by line and prints every line that matches the pattern. Recall our list of words 

In [48]:
head -10 ./data/names.txt

journal
lineage
excavate
charismatic
rank
missile
biscuit
reign
letter
paaalace


By using grep we can answer questions like:

1. _Are there lines containing "ma"?_

In [49]:
grep "ma" ./data/names.txt

charismatic
magnitude
man
mall
maniac
maximum
manipulation
maximum


2. _What are the lines and line numbers containing "ma"?_ (`-n`)

In [50]:
grep -n "ma" ./data/names.txt

4:charismatic
16:magnitude
21:man
23:mall
25:maniac
26:maximum
27:manipulation
34:maximum


Or which do not (`-v` flag for lines that do not match)

In [52]:
grep -v "ma" ./data/names.txt

journal
lineage
excavate
rank
missile
biscuit
reign
letter
paaalace
paaaot
straaaeam
aaapath
laaaandscape
drawer
lung
noble
relieve
facade
daughter
cruel
suggest
decrease
demonstrator
joke
autonomy
liability
suggest
decrease
demonstrator
joke
autonomy
liability


3. _How many lines match ?_ (`-c`)

In [53]:
grep -c "ma" ./data/names.txt

8


Of course we now know that the same could have been accomplished e.g. with pipes

In [54]:
grep "ma" ./data/names.txt | wc -l

       8
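
The flags can be combined, and `-i` makes the match case-insensitive. The sketch below runs on a small made-up word list so that the counts are easy to verify by eye.

```bash
tmpfile=$(mktemp)
printf '%s\n' charismatic rank Magnitude man lung > "$tmpfile"

count=$(grep -c "ma" "$tmpfile")                  # charismatic, man
count_nocase=$(grep -ci "ma" "$tmpfile")          # additionally Magnitude
first_hit=$(grep -n "ma" "$tmpfile" | head -1)    # line number and text of the first match
non_matching=$(grep -vc "ma" "$tmpfile")          # rank, Magnitude, lung

echo "$count / $count_nocase / $first_hit / $non_matching"
rm "$tmpfile"
```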


The search pattern can be a regular expression. By default `grep` uses basic regular expressions, which are limited. 

In [55]:
# Use -l to print the names of files containing a match for the regexp nu* ("n" followed by zero or more "u")
grep -l "nu*" ../*/*.ipynb

../12-production/environments.ipynb
../12-production/sphinx-docs.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/Untitled.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../mixed-programming/Untitled.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas.ipynb
.

With `egrep` (equivalent to `grep -E`) we get extended regular expressions, for example alternation with `|`

In [56]:
egrep -l "np|numpy|python|import" ../*/*.ipynb

../12-production/environments.ipynb
../12-production/sphinx-docs.ipynb
../13_scikit_learn/scikit-learn-1-presentation.ipynb
../13_scikit_learn/scikit-learn-1.ipynb
../13_scikit_learn/scikit-learn-2.ipynb
../14-julia-ml/julia_examples.ipynb
../14-julia-ml/python_examples.ipynb
../14-julia-ml/stokes_pinns.ipynb
../about/About the course.ipynb
../about/Introduction to git.ipynb
../about/Scripting vs regular programming.ipynb
../best_practices/Best practices.ipynb
../command-line/Bash - interactive lecture.ipynb
../command-line/Untitled.ipynb
../command-line/cmdline_bash.ipynb
../mixed-programming/Numba.ipynb
../mixed-programming/Profiling and Optimizing with IPython.ipynb
../mixed-programming/Untitled.ipynb
../mixed-programming/mixed_programming_cython.ipynb
../mixed-programming/mixed_programming_introduction.ipynb
../numerical-python/exercises.ipynb
../numerical-python/numerical_python.ipynb
../numerical-python/python_profiling.ipynb
../pandas/API-exercises.ipynb
../pandas/Pandas.ipynb
.
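
The difference between the two regular expression flavours can be sketched on a small sample. The temporary file and its words below are made up for illustration, and `grep -E` is used as the equivalent of `egrep`.

```bash
# Hypothetical sample data in a temporary file
tmp=$(mktemp)
printf '%s\n' man mall rank lung maximum > "$tmp"

# Basic regular expressions (the default): '|' is an ordinary character,
# so this looks for the literal text "ma|lu". The '|| true' keeps the
# non-zero exit status of a grep without matches from stopping a script.
bre_count=$(grep -c 'ma|lu' "$tmp" || true)

# Extended regular expressions: '|' means "or"
ere_count=$(grep -E -c 'ma|lu' "$tmp")

echo "basic: $bre_count, extended: $ere_count"
rm -f "$tmp"
```

With this sample the basic pattern matches no line, while the extended one matches the four words containing `ma` or `lu`.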

__`awk`__ is a text pattern scanning and processing language. It operates on the lines of the input, which it splits into fields marked by a separator (whitespace by default). This allows us to extract information and process it further.

Let's use `awk` to extract the file permission column 

In [57]:
# Unpack this
ls -l | awk '{print $1}'

total
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
drwxr-xr-x
drwxr-xr-x
-rw-r--r--
drwxr-xr-x
-rw-r--r--
drwxr-xr-x
-rwxr-xr-x
drwxr-xr-x
drwxr-xr-x


Combined with `grep` we can count the executables. Note that the pattern also matches directories, since they carry the `x` bit too

In [58]:
ls -l | awk '{print $1}' | egrep -c "x." 

7


and sum their sizes in bytes

In [59]:
# Unpack
ls -l | awk '{print $1, $5}' | egrep "x." | awk 'BEGIN {sum=0} {sum=sum+$2} END {print sum}'

3359
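
The filtering step can also be done inside `awk` itself, which saves the intermediate `egrep`. The following is only a sketch on a made-up listing (the file names, owner and sizes are hypothetical); it restricts the sum to regular files that have an execute bit set.

```bash
# Hypothetical output of 'ls -l' stored in a variable
listing='-rw-r--r-- 1 user staff 120 Aug 28 13:48 notes.txt
-rwxr-xr-x 1 user staff 300 Aug 28 13:48 run.sh
drwxr-xr-x 3 user staff 96 Aug 28 13:48 data'

# $1 ~ /^-.*x/ selects lines whose first field starts with '-'
# (a regular file) and contains an 'x'; $5 is the size column.
# 'sum + 0' prints 0 instead of an empty line when nothing matches.
exec_bytes=$(printf '%s\n' "$listing" | awk '$1 ~ /^-.*x/ {sum += $5} END {print sum + 0}')

echo "$exec_bytes"
```

Here only `run.sh` qualifies, so the directory no longer contributes to the total.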


Of course the delimiter can be specified. For example with a CSV file from the Pandas lecture we would work with a comma separator

In [60]:
awk -F "," '{print $1}' ./data/used_car_sales.csv | head -10

"ID"
"158856"
"165573"
"164846"
"165408"
"129794"
"165147"
"165327"
"33291"
"74172"
awk: write error on /dev/stdout
 input record number 3942, file ./data/used_car_sales.csv
 source line number 1
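
The `write error` at the end of the output above most likely appears because `head` closes the pipe after ten lines while `awk` is still writing. One way around it is to let `awk` stop by itself, using the built-in record counter `NR`. The sketch below uses a small made-up CSV file (the column names and values are hypothetical), and also shows how to skip the header line when summing a column.

```bash
# Hypothetical CSV file with a quoted header and three records
csv=$(mktemp)
printf '%s\n' '"ID","Price"' '"1","100"' '"2","250"' '"3","50"' > "$csv"

# NR is the current record (line) number: print the first column of
# the first three records only, then stop reading with 'exit'
first_ids=$(awk -F ',' 'NR <= 3 {print $1} NR == 3 {exit}' "$csv")
echo "$first_ids"

# Skip the header (NR > 1), strip the quotes and sum the second column
total=$(awk -F ',' 'NR > 1 {gsub(/"/, "", $2); sum += $2} END {print sum}' "$csv")
echo "total: $total"

rm -f "$csv"
```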


__`sed`__, the stream editor, lets us transform text on the input stream, e.g. filter lines or perform substitutions. Here we pass the editing script on the command line with `-e`.

The first use case we consider is `sed -e 's/pattern/substitute/' file`, where we run the `s` (substitute) command. `sed` consumes the stream line by line and performs the substitution on every line that contains a match.

In [61]:
sed -e 's/ma*/XXX/g' ./data/names_columns.txt

journal       XXX
lineage	      lineage	      
excavate      excavate      
charisXXXtic   charisXXXtic   
rank	      XXX
XXXissile	      XXXissile	      
biscuit	      biscuit	      
reign	      reign	      
letter	      letter	      
paaalace      paaalace      
paaaot	      paaaot	      
straaaeaXXX     straaaeaXXX     
aaapath	      aaapath	      
laaaandscape  laaaandscape  
drawer	      drawer	      
XXXgnitude     XXXgnitude     
lung	      lung	      
noble	      noble	      
relieve	      relieve	      
facade	      facade	      
XXXn	      XXXn	      
daughter      daughter      
XXXll	      XXXll	      
cruel	      cruel	      
XXXniac	      XXXniac	      
XXXxiXXXuXXX	      XXXxiXXXuXXX	      
XXXnipulation  XXXnipulation  
suggest	      suggest	      
decrease      decrease      
deXXXonstrator  deXXXonstrator  
joke	      joke	      
autonoXXXy      autonoXXXy      
liability     liability     
XXXxiXXXuXXX	      XXXxiXXXuXXX	      
suggest	      suggest	      
decrease  

Note that the `g` flag above stands for `global`: every match on a line is replaced, not just the first one.
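
The effect of the trailing `g` can be checked on a single word. This is a minimal sketch; the word `maximum` is taken from the sample data, and the pattern is simplified to a plain `m`.

```bash
# Without g only the first match on the line is replaced
without_g=$(echo 'maximum' | sed -e 's/m/X/')

# With g every match on the line is replaced
with_g=$(echo 'maximum' | sed -e 's/m/X/g')

echo "$without_g"   # Xaximum
echo "$with_g"      # XaxiXuX
```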

We can redirect the output to a new file with 
```bash
sed -e 's/ma*/XXX/g' ./data/names_columns.txt > ./data/names_modif.txt
```
or perform the substitution in place with `-i` (note that BSD/macOS `sed` takes the argument following `-i` as a backup suffix)
```bash
sed -i -e 's/ma*/XXX/g' ./data/names_columns.txt
```
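
Since in-place editing overwrites the input, it can be worth keeping a backup. A sketch, assuming a throwaway temporary file: attaching a suffix directly to `-i` (here `.bak`, an arbitrary choice) makes `sed` save the original under that suffix.

```bash
# Hypothetical input file
f=$(mktemp)
printf '%s\n' man mall rank > "$f"

# Edit in place; the untouched original is saved as "$f.bak"
sed -i.bak -e 's/ma/XXX/g' "$f"

edited=$(cat "$f")
backup=$(cat "$f.bak")
echo "$edited"
echo "$backup"

rm -f "$f" "$f.bak"
```

The form `-i.bak`, written without a space, is understood by both GNU and BSD `sed`.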

The patterns can be full-blown regular expressions. Let's use `sed` to hide the numbers in the phone book (where we pretend that all numbers have exactly 3 digits)

In [62]:
sed -e 's/[0-9][0-9][0-9]/xxx/g' ./data/contacts.txt

# This 
# is a 
# comment
xxx joe
xxx miro
ana
xxx peter
lucy xxx


Another use case is to perform an action on matching lines. The first action we will use is `p` for print; the `-n` flag suppresses the default printing of every line. Let's print all the directories using `sed`

In [63]:
ls -l | sed -n -e '/^d/ p'

drwxr-xr-x  15 minrk  staff     480 Aug 28 13:48 data
drwxr-xr-x   9 minrk  staff     288 Aug 28 13:36 figs
drwxr-xr-x  45 minrk  staff    1440 Aug 28 13:36 hello-world
drwxr-xr-x   3 minrk  staff      96 Aug 28 13:45 results
drwxr-xr-x  14 minrk  staff     448 Aug 28 13:36 scripts
drwxr-xr-x  15 minrk  staff     480 Aug 28 14:23 scripts.bk


Another action is `d` for delete. Suppose you would like to remove all the comments (from, say, your Python code)

In [64]:
sed -e '/^#/ d' ./data/contacts.txt
# We could redirect with > or -i for inplace

123 joe
333 miro
ana
233 peter
lucy 222


`sed` also understands line numbers, so we could for example delete lines 2 through 20 of the long CSV file

In [65]:
echo size before `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`
sed -i -e '2,20 d' ./data/used_car_sales.csv
echo size after `ls -lrvt ./data/used_car_sales.csv | awk '{print $5}'`

size before 13055021
size after 13053080
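
Before deleting a line range in place, the same address can be combined with `-n` and `p` to preview exactly which lines would go. A small sketch on five made-up lines:

```bash
# Preview: print only lines 2 to 4
preview=$(printf '%s\n' a b c d e | sed -n -e '2,4 p')

# Delete the same range: only lines 1 and 5 remain
remaining=$(printf '%s\n' a b c d e | sed -e '2,4 d')

echo "$preview"
echo "$remaining"
```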


For more information see the nice [summary](https://www-users.york.ac.uk/~mijp1/teaching/2nd_year_Comp_Lab/guides/grep_awk_sed.pdf) by Matt Probert. We forgot to emphasize that <font color='red'>⚠️ </font> `grep`, `awk` and `sed` are further additions to our family of programs/commands seen so far.

## File manipulation utilities - `find`, `tar` and `gzip`

Assume that we have run some analysis on a remote machine. When the computations are done we would like to gather the data and compress them for easier transfer.

<font color='red'>⚠️ </font> `find` visits all files in a directory tree and can execute one or more commands for every file
```bash
find source [specifiers]
```

We can specify the `name` and `type` (regular (f)ile, (d)irectory)

In [66]:
# Quote the pattern so that the shell does not expand it before find sees it
find ./scripts/ -name 'hello*' -type f 

In case the source tree is very deep it is a good idea to limit the depth of the tree traversal

In [67]:
find $HOME -maxdepth 1 -name '*square*' -type f 
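
The effect of `-name` and `-maxdepth` can be tried safely in a scratch directory. The directory layout and file names below are made up for illustration.

```bash
# Hypothetical directory tree
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/hello.sh" "$root/sub/hello_world.sh" "$root/other.txt"

# The quotes stop the shell from expanding hello* itself;
# find receives the pattern and applies it in every directory
all_hits=$(find "$root" -name 'hello*' -type f | wc -l | tr -d ' ')

# -maxdepth 1 restricts the search to the top-level directory
top_hits=$(find "$root" -maxdepth 1 -name 'hello*' -type f | wc -l | tr -d ' ')

echo "whole tree: $all_hits, top level only: $top_hits"
rm -rf "$root"
```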

The name specifier can combine several filters

In [68]:
# Or find all log and PDF files
find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f

/Users/minrk/.bzr.log


We can also run a command for each file:
```bash
find rootdir -name filenamespec -exec command {} \; -print
# {} is the current filename
```

Let's use this to print more detailed information about the file

In [69]:
find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f  -exec ls -lrvt {} \;

-rw-r--r--  1 minrk  staff  15868 Sep 24  2020 /Users/minrk/.bzr.log
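
Besides `\;`, which runs the command once per file, `-exec` can be terminated with `+` to pass many file names to a single invocation. The sketch below makes the difference visible by letting a small `sh -c` helper print how many arguments it received; the scratch directory and file names are hypothetical.

```bash
root=$(mktemp -d)
touch "$root/a.log" "$root/b.log" "$root/c.txt"

# With \; the command runs once per file: each call sees 1 argument
per_file=$(find "$root" -name '*.log' -type f -exec sh -c 'echo "$#"' sh {} \;)

# With + the file names are collected and passed together
batched=$(find "$root" -name '*.log' -type f -exec sh -c 'echo "$#"' sh {} +)

echo "$per_file"   # two lines, each containing 1
echo "$batched"    # a single line containing 2
rm -rf "$root"
```

For commands such as `ls` that accept many file names, the `+` form avoids starting a new process for every file.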


We can perform several actions. Below we copy the file with `cp` in addition to printing some more info.

In [70]:
find $HOME -maxdepth 1 \( -name '*.log' -o -name '*.pdf' \) -type f -size +30k  -exec ls -lrvt {} \; -exec cp  "{}" . \;
ls *.pdf

ls: *.pdf: No such file or directory


: 1

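Note that we have narrowed the search with a `-size` specifier: `+30k` selects files larger than 30 kilobytes. The only file found earlier (about 15 kB) is below that threshold, so nothing is listed or copied, and since there are no PDF files in the current directory `ls *.pdf` fails.

Now that we can find things, let's compress them.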

The <font color='red'>⚠️ </font>`tar` command can pack single files or  all files in a directory tree into one file, which can be unpacked later.

```bash
tar -cvf myfiles.tar mytree file1 file2
#           dest     sources
# options:
# c: pack, v: list name of files, f: pack into file

# unpack the mytree tree and the files file1 and file2:
tar -xvf myfiles.tar

# options:
# x: extract (unpack)
```

The tar file can be compressed with <font color='red'>⚠️ </font>`gzip` 
```bash
gzip myfiles.tar
# result: myfiles.tar.gz
```
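
Packing and compressing can also be combined in one step with the `z` option of `tar`. A minimal sketch in a scratch directory, where the tree `mytree` and the file `file1` are created just for the example:

```bash
work=$(mktemp -d)
mkdir -p "$work/mytree"
echo "hello" > "$work/mytree/file1"

# c: create, z: compress with gzip, f: archive name
# -C changes into the directory before packing, keeping the paths relative
tar -czf "$work/myfiles.tar.gz" -C "$work" mytree

# t: list the contents without unpacking
listing=$(tar -tzf "$work/myfiles.tar.gz")
echo "$listing"

rm -rf "$work"
```

Unpacking works analogously with `tar -xzf myfiles.tar.gz`.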

Let's deal with these PDFs that are lying around

In [71]:
[ -e allPDFs.tar ] && rm allPDFs.tar
[ -e allPDFs.tar.gz ] && rm allPDFs.tar.gz 

tar -cvf allPDFs.tar `find . -name '*.pdf' -print`
gzip -k allPDFs.tar
echo
ls -lrvth allPDFs.*

tar: no files or directories specified
gzip: can't stat: allPDFs.tar (allPDFs.tar): No such file or directory

ls: allPDFs.*: No such file or directory


: 1

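Here we have run `gzip` with the `-k` (keep) flag; otherwise the tar file would be removed. In this run no PDF files were found, so `tar` had nothing to pack and the subsequent commands failed.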
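
A script can guard against the "nothing found" case by checking the result of `find` before calling `tar`. The following is only a sketch under the assumption of a scratch directory with one made-up PDF file; it is not the cell used in the lecture.

```bash
work=$(mktemp -d)                          # hypothetical scratch directory
touch "$work/report.pdf" "$work/notes.txt"

(
    cd "$work" || exit 1
    # Only call tar when find actually reports something
    if [ -n "$(find . -name '*.pdf' -type f)" ]; then
        # {} + hands the file names to tar directly, so names with
        # spaces survive (assuming the list fits into one tar call)
        find . -name '*.pdf' -type f -exec tar -cf allPDFs.tar {} +
        gzip -k allPDFs.tar
    else
        echo "no PDF files found, nothing to archive"
    fi
)

archived=$(tar -tf "$work/allPDFs.tar")
echo "$archived"
rm -rf "$work"
```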

We started this section assuming the scenario that we find ourselves on some remote machine. How do we get there?

### Remote connection utilities

Here are some commands that come in handy when working with remote machines. They are all <font color='red'>⚠️ </font>

- `ping` is the machine connected?
- `ssh` to connect over SSH, `-X` or `-Y`  switch for window forwarding, i.e. graphics
```bash
ssh username@hostname
```
- `scp` secure copy, `-r` for directories
```bash
scp username@hostname:/path/to/source destination
```
- `hostname` what is the machine called?
- `whoami` what is my user name
- `who` who else is logged in
- `ps` what are the running processes
- `top` see how much resources are used
- `which` see which executable is used when calling (`which python` - do you have the right interpreter?)

We demo most of the above commands outside in the terminal. We make one exception below to see some of the concepts discussed today in action

In [72]:
 # evalApply is the name of my machine and I have an SSH server running on it 
ping google.com -c 1 &> /dev/null

if [ $? -gt 0 ]; then
    echo Internet not connected
else
    echo Internet connected
fi

Internet connected
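
Instead of inspecting `$?` afterwards, the command can be placed directly in the `if` condition, since `if` branches on the exit status. The helper `probe` below is a hypothetical name introduced for this sketch; `true` and `false` stand in for a reachable and an unreachable host so that the example runs without a network.

```bash
# probe is a hypothetical helper: it runs the command given as its
# arguments and reports success or failure based on the exit status
probe() {
    if "$@" > /dev/null 2>&1; then
        echo "ok"
    else
        echo "failed"
    fi
}

up=$(probe true)      # 'true' always exits with status 0
down=$(probe false)   # 'false' always exits with a non-zero status
echo "$up $down"

# Usage with the command from the cell above would look like:
#   probe ping -c 1 google.com
```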


<center>
          <img src="figs/one-ping-only.gif" style="width: 500px;"/>
</center>

## Plotting utilities

Now that we have data we may want to do some visual exploration. One option is [GNUPlot](http://www.gnuplot.info/docs_4.2/). Note that the program does not ship with Ubuntu by default and needs to be installed. GNUPlot offers interactive plotting (somewhat like building up the plot in IPython). It can also execute scripts. For example, below is a rather intuitive way of producing a plot from data

```bash
plot "data1_leg.txt" using 1:2 title 'L0' with linespoints lt 3 lc rgb 'red', \
     "data2_leg.txt" using 1:3 title 'L1' with linespoints
```

This can be entered on a prompt when gnuplot is running 
```bash
gnuplot
gnuplot> COMMANDS HERE
```

or, if we have stored the source in a file, say `foo.txt`, we can get the plot by `gnuplot -p foo.txt`. A nice feature of GNUPlot is the ability to generate plots for LaTeX. 
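
To try the `plot` command shown earlier, a data file with whitespace-separated columns is needed. A minimal sketch that generates such a file with `awk`, assuming a temporary file and a made-up data set (x and x squared):

```bash
data=$(mktemp)

# Write x and x^2 for x = 0..4, one "x y" pair per line
awk 'BEGIN {for (x = 0; x <= 4; x++) print x, x * x}' > "$data"

last_row=$(tail -n 1 "$data")
echo "$last_row"

# If gnuplot is installed, the file could then be plotted with e.g.
#   gnuplot -p -e "plot '$data' using 1:2 with linespoints"
rm -f "$data"
```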

Note that GNUPlot is not limited to line plots, cf. the gallery of [examples](https://gnuplot.sourceforge.net/demo/)

In [None]:
./scripts/tori.gplot

<center>
  <img src="figs/tori_gnuplot.svg" style="width: 500px;"/>
</center>