# Shell Scripting

## The Shell
* The shell is generally considered to be the interface between the user and the operating system
    * Graphical User Interface
    * Command Line Interface


## A Little History
* Shells in command line interfaces have been programmable in a limited form since at least the first UNIX shell
* The UNIX shell was completely rewritten in the late 1970s by Steve Bourne
    * A shell modeled after C was also written around this time
* UNIX isn't open source, so an open source implementation of the UNIX shell was developed, known as the Bourne again shell, or **bash**

## Shells Today
* **bash** is the default shell on most Linux operating systems; macOS also used it by default until version 10.15 (Catalina), which switched to **zsh**
    * Ubuntu and Debian use a shell known as **dash** for some startup scripts
    * Korn Shell (**ksh**) and Z Shell (**zsh**) are other common Bourne-like shells
* The C shell (**csh**) is another common shell
    * The default shell on GL at UMBC is **tcsh**, the TENEX C Shell

## Non-Scripting Features of Shells
* Tab Completion
* History 
    * Global (most shells)
    * Context-based (**fish**)
* Prompt Customization

## Bash
* For this class we will be using **bash**
* Even if a system does not use bash as the default shell, almost all systems have it
    * This makes scripts written in **bash** very portable
* **bash** has been managed since its creation by the GNU Project
    * Code is open source, and can be contributed to at https://git.savannah.gnu.org/cgit/bash.git
    

## Unix Utilities
* Bash scripts commonly rely on many simple programs to accomplish various tasks
* These programs are sometimes called Unix Utilities
    * Usually do only one thing
    * Most operate on STDIN and STDOUT by default
* macOS ships with many of these, but some are only available in the GNU Core Utilities (coreutils) package

## Utilities You Already Use
* ls
* rm
* mv
* cp
* mkdir
* pwd

## echo
* `echo` is the most commonly used command to print something to the screen
* By default, bash's `echo` prints backslash escapes such as `\n` literally instead of translating them into the corresponding character
    * Use the `-e` flag to enable this translation
    * To suppress the newline at the end of the output, use the `-n` flag
* `echo` can take multiple arguments, and always separates them with a single space
    * bash's `echo` has no flag to change this; to avoid the space, pass the text as a single argument or use `printf`

In [None]:
echo "This will print as expected"
echo This will too
echo "This\ndoesn't\nhave\nnewlines"
echo -e "This\ndoes\nhave\nnewlines"
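
Because `echo` always puts a space between its arguments, one common alternative for printing values back to back is `printf`. The following is a minimal sketch; the variable names `spaced` and `joined` are made up for illustration.

```bash
# echo joins its arguments with single spaces
spaced=$(echo one two three)
echo "$spaced"    # one two three

# printf reuses its format for each argument, so '%s' prints them back to back
joined=$(printf '%s' one two three)
echo "$joined"    # onetwothree

# -n suppresses the trailing newline, so the next output continues on the same line
echo -n "no newline here"
echo " ...continued"
```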

## cat
* cat is used to con**cat**enate files together
* It is also used by lazy programmers (me included) to display the contents of a file to a screen, but usually there are better utilities for that
    * less
    * more


In [None]:
cat anchored.pl

In [None]:
cat -n anchored.pl

In [None]:
cat anchored.pl unanchored.pl

## sort
* sort sorts the lines of a file! 
* By default this is done lexicographically 
    * By using flags you can sort by numbers, months, etc.
* The `-r` flag will sort in reverse order
* By using the `-u` flag, each unique line will be printed only once

In [None]:
sort to_sort1.txt

In [None]:
sort -n to_sort2.txt

In [None]:
sort -nu to_sort2.txt

In [None]:
sort -nur to_sort2.txt

In [None]:
sort -n to_sort3.txt

In [None]:
sort -h to_sort3.txt

## uniq
* `uniq` collapses adjacent duplicate lines, so on sorted input its default form accomplishes the same as `sort -u`
* Because only adjacent lines are compared, input to `uniq` is normally sorted first
* `uniq` is useful to:
    * Count the number of times each unique line occurs (`-c`)
    * Ignore case when comparing lines (`-i`)
    * Only compare the first N characters of a line (`-w N`, GNU `uniq`)

In [None]:
sort -n to_sort2.txt | uniq -c

In [None]:
sort to_sort4.txt | uniq -c

In [None]:
sort to_sort4.txt | uniq -c -w1
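
To see why sorting matters, the sketch below feeds `uniq` a few made-up lines through `printf` instead of the course's data files. The `-i` flag is available in GNU and BSD `uniq`; the variable names are illustrative only.

```bash
# uniq only removes adjacent duplicates, so both "b" lines survive here
unsorted=$(printf 'b\na\nb\n' | uniq)
echo "$unsorted"

# Sorting first puts duplicates next to each other
sorted=$(printf 'b\na\nb\n' | sort | uniq)
echo "$sorted"

# -i ignores case when comparing lines; only the first line of each group is kept
nocase=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)
echo "$nocase"
```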

## shuf
* `shuf` randomly permutes the lines of a file
* This is extremely useful in preparing datasets

In [None]:
shuf to_sort4.txt
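
As one possible illustration of dataset preparation, the sketch below shuffles a file once and splits it into two disjoint parts. The file names, the 8/2 split, and the use of `seq` to generate data are all assumptions made for this example, and `shuf` itself requires GNU coreutils.

```bash
workdir=$(mktemp -d)            # scratch directory for the sketch
seq 1 10 > "$workdir/data.txt"  # hypothetical dataset: the numbers 1 to 10

# Shuffle once, then cut the shuffled file into two disjoint parts
shuf "$workdir/data.txt" > "$workdir/shuffled.txt"
head -n 8 "$workdir/shuffled.txt" > "$workdir/train.txt"
tail -n 2 "$workdir/shuffled.txt" > "$workdir/test.txt"

wc -l "$workdir/train.txt" "$workdir/test.txt"
```

Shuffling into a file first matters: running `shuf` separately for each part would produce two different orderings, so the parts could overlap.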

## head & tail
* The `head` and `tail` commands display the first 10 or last 10 lines of a file by default
    * You can change the number of lines displayed using the `-n` option
    * The value passed to `-n` when using GNU `head` can be negative, which means print everything but the last n lines

In [None]:
cat to_sort3.txt

In [None]:
head -n1 to_sort3.txt

In [None]:
tail -n1 to_sort3.txt

In [None]:
head -n-1 to_sort3.txt
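
A related idiom is `tail -n +K`, which starts printing at line K and is handy for skipping a header row. The sketch below uses made-up input rather than the course files; the negative count for `head` assumes GNU `head`.

```bash
lines=$(printf 'header\nrow1\nrow2\nrow3\n')

# First two lines
first_two=$(echo "$lines" | head -n 2)

# tail -n +K starts printing at line K, so +2 skips a one-line header
no_header=$(echo "$lines" | tail -n +2)

# GNU head only: a negative count drops that many lines from the end
no_last=$(echo "$lines" | head -n -1)

echo "$no_header"
```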

## cut
* The cut command extracts columns from a file containing a dataset
* By default the delimiter used is a tab
    * Use the `-d` argument to change the delimiter
* To specify which columns to return, use the `-f` argument    

In [None]:
cut -f1 regex_starter_code/food_facts.tsv | head

In [None]:
cut -f1 -d, regex_starter_code/states.csv | head
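
The `-f` argument also accepts lists and ranges of columns. The following sketch uses a made-up CSV record rather than the course files.

```bash
row='alice,30,Baltimore,MD'    # hypothetical CSV record

# A comma-separated list selects several fields; the output keeps the input delimiter
name_city=$(echo "$row" | cut -d, -f1,3)
echo "$name_city"      # alice,Baltimore

# An open-ended range selects a field and everything after it
rest=$(echo "$row" | cut -d, -f2-)
echo "$rest"           # 30,Baltimore,MD
```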

## paste
* `paste` does the opposite of `cut`
* Each line of every file is concatenated together, separated by a tab by default
    * Use the `-d` flag to change the delimiter

In [None]:
paste to_sort1.txt to_sort2.txt

In [None]:
paste -d, to_sort1.txt to_sort2.txt

## find
* `find` is like an extremely powerful version of `ls`
* By default, `find` will list all the files under a directory passed as an argument
    * Numerous tests can be passed to find as arguments and used to filter the list that is returned 

In [None]:
find . | head

In [None]:
find . -type d | head

In [None]:
find . -maxdepth 1 -type d 

In [None]:
find . -name "*ipynb"

## wc
* In some cases, it is convenient to know basic statistics about a file
* The `wc` or word count command returns the number of lines, words, and bytes in a file
    * To only print one of these, use the `-l`, `-w`, or `-c` flags respectively; `-m` counts characters instead of bytes

In [None]:
wc to_sort1.txt

In [None]:
wc -l to_sort1.txt

## Other Helpful Utilities
* arch
* uname
* whoami
* yes

## Shell Script Setup
* A shell script in the simplest form is just a list of commands to execute in sequence
* Run it by passing the file to a shell: `sh script_file`, or `bash script_file` if you are not sure which shell you are in

In [None]:
bash hello_simple.sh
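
The contents of `hello_simple.sh` are not shown here, so the following is only a sketch of what such a file might look like. The file name `hello_sketch.sh` and its two commands are made up for illustration.

```bash
workdir=$(mktemp -d)

# Write a two-command script; quoting 'EOF' keeps the shell from expanding anything while writing
cat > "$workdir/hello_sketch.sh" <<'EOF'
echo "Hello from a script"
echo "Running in: $PWD"
EOF

# No shebang line or execute permission is needed when the file is passed to bash explicitly
output=$(bash "$workdir/hello_sketch.sh")
echo "$output"
```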

## Shebang Line
* On UNIX-like systems, if the first line of a file starts with `#!`, that line indicates which program to use to run the file
* Can be used with most any interpreted language
* Must be the full path of the command
```bash
#!/bin/bash
#!/bin/python
#!/bin/perl
```
* File must be executable

```chmod +x FILE```

In [None]:
./hello.sh
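
Putting the shebang line and `chmod +x` together might look like the sketch below. It assumes bash is installed at `/bin/bash`, and the file name `hello_exec.sh` is hypothetical (the actual `hello.sh` is not shown here).

```bash
workdir=$(mktemp -d)

# The shebang must be the very first line of the file
cat > "$workdir/hello_exec.sh" <<'EOF'
#!/bin/bash
echo "Hello from an executable script"
EOF

# Mark the file executable, then run it by path instead of passing it to bash
chmod +x "$workdir/hello_exec.sh"
output=$("$workdir/hello_exec.sh")
echo "$output"
```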

## Variables
* Variables in bash can hold either a scalar or an array
    * Arrays are constructed using parentheses ()
* To initialize a variable, use the equals sign **with no spaces**

## Declaring Variables Examples

In [None]:
a_scalar=UMBC
another_scalar="This needs quotes"
more_scalars=40
even_more=3.14
an_array=(letters "s p a c e s" 1.0)
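# Don't do this: with a space after =, bash sets bad to an empty string and tries to run the quoted text as a command
bad= "not what you want"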

## Accessing Variables
* To access a variable a dollar sign (**$**) must be prepended to its name
* To access an array element, the variable name and index must occur inside of curly braces (**{}**)
    * Scalar values can be accessed this way too, but it is optional

## Accessing Variables Examples

In [None]:
echo $a_scalar

In [None]:
echo ${a_scalar}

In [None]:
echo $more_scalars

In [None]:
echo $even_more

In [None]:
echo ${an_array[1]}

In [None]:
# Don't do this: without an index, only the first element is printed
echo $an_array

In [None]:
echo ${an_array[@]}

In [None]:
echo ${an_array[*]}
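
The difference between `[@]` and `[*]` only shows up inside double quotes. The sketch below redefines the same `an_array` so it runs on its own; `count`, `element`, and `joined` are illustrative names, and the code requires bash rather than a plain POSIX `sh`.

```bash
an_array=(letters "s p a c e s" 1.0)

# ${#name[@]} gives the number of elements
count=${#an_array[@]}
echo "Number of elements: $count"

# Quoted [@] expands to one word per element, so "s p a c e s" stays together
for element in "${an_array[@]}"; do
    echo "element: $element"
done

# Quoted [*] joins all elements into a single string, separated by spaces by default
joined="${an_array[*]}"
echo "$joined"
```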

## String Interpolation
* Variables will be interpolated into strings when double quotes are used
    * If the variable name is followed by a space, curly braces aren't needed, but using them is a good habit

In [None]:
echo 'This class is at ${a_scalar}'

In [None]:
echo "This class is at $a_scalar"

In [None]:
echo "The school's website is www.$a_scalar.edu"

In [None]:
echo "The athletics website is www.$a_scalarretrievers.com"

In [None]:
echo "The athletics website is www.${a_scalar}retrievers.com"

## String Operations
* Bash has numerous built-in string operators allowing for
    * Accessing the length (**\${#string}**)
    * Accessing a substring (**\${string:pos}**)
    * Performing a search and replace on a substring (**\${string/pattern/substitution}**)
    * Removing substrings (**\${string#pattern}** from the front, **\${string%pattern}** from the back)

## String Operation Examples

In [None]:
echo ${a_scalar} ${#a_scalar}

In [None]:
echo ${a_scalar} ${a_scalar:1}
echo ${a_scalar} ${a_scalar:2:2}
echo ${a_scalar} ${a_scalar::2}

In [None]:
echo ${a_scalar} ${a_scalar/U/u}
echo ${a_scalar} ${a_scalar/V/u}
echo ${another_scalar} ${another_scalar/e/x}
echo ${another_scalar} ${another_scalar//e/x}
echo ${another_scalar} ${another_scalar//[a-z]/x}

In [None]:
#From the front of the string
echo ${another_scalar} "->" ${another_scalar#T*s}
#Longest possible match
echo ${another_scalar} "->" ${another_scalar##T*s}

#From the back of the string
echo ${another_scalar} "->" ${another_scalar%e*s}
#Longest possible match
echo ${another_scalar} "->" ${another_scalar%%e*s}

## Default Values
* Bash also allows a default value to be supplied when a variable is **accessed** and is unset or empty
    * **\${var:-default}** uses the default just for that statement
    * **\${var:=default}** also assigns the default to the variable, so all future statements see it

## Default Value Examples

In [None]:
an_empty_var= 
echo "1." $an_empty_var
echo "2." ${an_empty_var:-Default}
echo "3." $an_empty_var
echo "4." ${an_empty_var:=Default}
echo "5." $an_empty_var
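
A common use of `:-` is giving an optional argument a fallback value. The sketch below wraps this in a small function; `greet` and its message are hypothetical and not part of the course files.

```bash
greet() {
    # Use the first argument, or "World" when it is missing or empty
    echo "Hello, ${1:-World}!"
}

greet          # Hello, World!
greet UMBC     # Hello, UMBC!
```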

## Environmental Variables
* Environmental variables are global variables in the widest sense
    * Each process passes a copy of them to the processes it starts, so they are visible to all programs a user runs
    * Often set in initialization scripts or during boot
* Shell scripts may modify them, but more often than not simply read them
* By convention, environmental variables are written in all uppercase letters

## Environmental Variable Examples

In [None]:
echo "Your home dir is: $HOME"
echo "You are logged into: $HOSTNAME"

echo "Your shell is: $SHELL"
echo "Your path is: $PATH"
echo "Your terminal is set to: $TERM"
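
The difference between an ordinary shell variable and an environmental variable can be seen by starting a child process. In this sketch, `course_note` and `COURSE_NAME` are made-up names, and `bash -c` stands in for any program launched from the shell.

```bash
# A plain shell variable stays in the current shell
course_note="only in this shell"
from_child=$(bash -c 'echo "${course_note:-not visible}"')
echo "Child sees plain variable as: $from_child"

# export places the variable in the environment, so child processes inherit it
export COURSE_NAME="Shell Scripting"
from_child_exported=$(bash -c 'echo "$COURSE_NAME"')
echo "Child sees exported variable as: $from_child_exported"
```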

## Command Line Arguments
* Command line arguments are placed in the special variables \$1 through \$9
    * You can have more arguments, but they need to be accessed like \${10}
* The name of the script being executed is stored in \$0
* The number of arguments is stored in \$#

## Command Line Argument Examples

In [None]:
cat cla_examples.sh

In [None]:
./cla_examples.sh --some-flag a_path additional_options
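
The contents of `cla_examples.sh` are not reproduced here, so the following is a separate, hypothetical script (`args_sketch.sh`) that exercises the same variables, including an argument past the ninth.

```bash
workdir=$(mktemp -d)

cat > "$workdir/args_sketch.sh" <<'EOF'
#!/bin/bash
echo "Script name: $0"
echo "Number of arguments: $#"
echo "First argument: $1"
# Braces are required past $9; $10 would be read as $1 followed by a literal 0
echo "Tenth argument: ${10}"
EOF

output=$(bash "$workdir/args_sketch.sh" a b c d e f g h i j)
echo "$output"
```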

## Special Variables
* bash provides many other special variables that hold useful values
    * \$\$ is the process id of the currently executing script
    * \$PPID is the process id of the process that the script was launched from
    * \$? is the status of the last command executed

In [None]:
echo "Process ID (PID) is: $$"
echo "Parent PID (PPID) is: $PPID"
whoami
echo "Status of last command: $?"
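
By convention, a status of 0 means success and any other value means failure. The sketch below captures the status of the standard `true` and `false` commands; the variable names are illustrative.

```bash
# true always succeeds
true
status_true=$?
echo "true exited with: $status_true"     # 0

# `cmd || ...` runs the right-hand side only when cmd fails;
# at that point $? still holds the status of cmd
status_false=0
false || status_false=$?
echo "false exited with: $status_false"   # 1
```

Because every command overwrites `$?`, the value has to be saved or tested immediately after the command of interest.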
