# Shell Scripting

## The Shell
* The shell is generally considered to be the interface between the user and the operating system
    * Graphical User Interface
    * Command Line Interface


## A Little History
* Shells in command line interfaces have been programmable in a limited form since at least the first UNIX shell
* The UNIX shell was completely rewritten in the late 1970s by Steve Bourne
    * A shell modeled after C was also written around this time
* UNIX isn't open source, so an open source implementation of the UNIX shell was developed, known as the Bourne again shell, or **bash**

## Shells Today
* **bash** is the default shell on most Linux operating systems; macOS also used it as the default until **zsh** replaced it in macOS Catalina (10.15)
    * Ubuntu and Debian use a shell known as **dash** for some startup scripts
    * Korn Shell (**ksh**) and Z Shell (**zsh**) are other common Bourne-like shells
* The C shell (**csh**) is another common shell
    * The default shell on GL at UMBC is **tcsh**, the TENEX C Shell

## Non-Scripting Features of Shells
* Tab Completion
* History 
    * Global (most shells)
    * Context-based (**fish**)
* Prompt Customization

## Bash
* For this class we will be using **bash**
* Even if a system does not use bash as the default shell, almost all systems have it
    * This makes scripts written in **bash** very portable
* **bash** has been managed since its creation by the GNU Project
    * Code is open source, and can be contributed to at https://git.savannah.gnu.org/cgit/bash.git
    

## Unix Utilities
* Bash scripts commonly rely on many simple programs to accomplish various tasks
* These programs are sometimes called Unix Utilities
    * Usually do only one thing
    * Most operate on STDIN and STDOUT by default
* macOS has many of these, but some are only available in the GNU Core Utilities (coreutils) package
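
Because most utilities read STDIN and write STDOUT, they can be chained together with pipes. The following is a minimal sketch using invented sample lines; `sort`, `uniq`, and `wc` are each described in their own sections.

```bash
# Each utility does one job; the pipe (|) feeds one program's STDOUT
# into the next program's STDIN.
# The fruit names are invented sample data.
distinct=$(printf 'pear\napple\npear\nfig\n' | sort | uniq | wc -l)
echo "Distinct lines: $distinct"
```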

## Utilities You Already Use
* ls
* rm
* mv
* cp
* mkdir
* pwd

## echo
* Echo is the most commonly used command to print something to the screen
* By default, newlines and other escapes are not "translated" into the proper character
    * Use the `-e` flag to accomplish this
    * To suppress the newline at the end of echo use the `-n` flag
* Echo can take multiple arguments, and will separate them by a space by default
    * bash's built-in `echo` has no flag to prevent this (`-s` is a **fish** feature); pass a single quoted string or use `printf` instead
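
A short sketch of the argument handling described above. `printf` is not part of the original notes; it is shown here only as one common way to print several arguments with no space between them.

```bash
# -n suppresses the trailing newline, so the next echo continues the line
echo -n "no newline here, "
echo "so this continues the line"

# echo joins its arguments with single spaces
joined=$(echo one two three)

# printf prints only what the format string says, so nothing is
# inserted between the arguments here
glued=$(printf '%s' one two three)

echo "$joined / $glued"
```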

In [12]:
echo "This will print as expected"
echo This will too
echo "This\ndoesn't\nhave\nnewlines"
echo -e "This\ndoesn't\nhave\nnewlines"

This will print as expected
This will too
This\ndoesn't\nhave\nnewlines
This
doesn't
have
newlines


## cat
* cat is used to con**cat**enate files together
* It is also used by lazy programmers (me included) to display the contents of a file to a screen, but usually there are better utilities for that
    * less
    * more


In [13]:
cat anchored.pl

$string = "This is a string";
for (my $i=0; $i <= 10000000; $i++) {
	if($string =~ /^not/){
		$found = 1;
	}
}
print $found;


In [14]:
cat -n anchored.pl

     1	$string = "This is a string";
     2	for (my $i=0; $i <= 10000000; $i++) {
     3		if($string =~ /^not/){
     4			$found = 1;
     5		}
     6	}
     7	print $found;


In [15]:
cat anchored.pl unanchored.pl

$string = "This is a string";
for (my $i=0; $i <= 10000000; $i++) {
	if($string =~ /^not/){
		$found = 1;
	}
}
print $found;
$string = "This is a string";
for (my $i=0; $i <= 10000000; $i++) {
	if($string =~ /not/){
		$found = 1;
	}
}
print $found;


## sort
* sort sorts the lines of a file! 
* By default this is done lexicographically 
    * By using flags you can sort by numbers, months, etc.
* The `-r` flag will sort in reverse order
* By using the `-u` flag, each unique line will be printed only once
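
The difference between lexicographic and numeric ordering is easiest to see on a single column. The sketch below uses invented `name,count` records and the `-t` (field separator) and `-k` (sort key) options, which are not covered in the notes above.

```bash
# Sample "name,count" records (invented data)
records='pear,3
apple,10
fig,2'

# Lexicographic sort on the second comma-separated field: "10" sorts before "2"
by_text=$(printf '%s\n' "$records" | sort -t, -k2,2)

# Adding the n modifier compares that field as a number instead
by_number=$(printf '%s\n' "$records" | sort -t, -k2,2n)

printf '%s\n' "$by_number"
```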

In [16]:
sort to_sort1.txt


A
D
F
G
H
J
K
L
S


In [19]:
sort -n to_sort2.txt


0
1
2
2
3
4
4
5
6
6
7


In [25]:
#sort -nu to_sort2.txt
cat -n to_sort2.txt | sort -nu --key=2

    10	0
     6	1
     5	2
     7	3
     1	4
     2	5
     3	6
     4	7


In [26]:
sort -nur to_sort2.txt

7
6
5
4
3
2
1
0


In [27]:
sort -n to_sort3.txt

1B
2G
4K
100MB


In [28]:
sort -h to_sort3.txt

1B
4K
100MB
2G


## uniq
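* `uniq` collapses adjacent duplicate lines, so on sorted input its default form accomplishes the same as `sort -u`
* Input to `uniq` is assumed to be sorted already; duplicates that are not adjacent are not merged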
* `uniq` is useful to:
    * Count the number of times each unique line occurs
    * Ignore case when comparing lines
    * Only compare the first N characters of a line
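
The sketch below illustrates why the input should be sorted first, and shows the case-insensitive comparison with `-i`. The input lines are invented for the example.

```bash
# uniq only merges adjacent duplicates, so unsorted input keeps its repeats
unsorted=$(printf 'b\na\nb\n' | uniq)

# Sorting first brings the duplicates together
sorted=$(printf 'b\na\nb\n' | sort | uniq)

# -i ignores case when comparing adjacent lines; the first one is kept
nocase=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)

echo "$nocase"
```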

In [29]:
sort -n to_sort2.txt | uniq -c

      1 
      1 0
      1 1
      2 2
      1 3
      2 4
      1 5
      2 6
      1 7


In [31]:
sort to_sort4.txt | uniq -c

      1 a 
      1 can
      2 count
      1 for 
      1 hack
      1 is 
      1 It 
      1 nice
      1 NLP 
      1 processing
      1 to 
      1 uniq
      1 using
      1 We 
      1 words
      1 words 


In [32]:
sort to_sort4.txt | uniq -c -w1

      1 a 
      3 can
      1 for 
      1 hack
      1 is 
      1 It 
      1 nice
      1 NLP 
      1 processing
      1 to 
      2 uniq
      1 We 
      2 words


## shuf
* `shuf` randomly permutes the lines of a file
* This is extremely useful in preparing datasets
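
One way `shuf` might be used for dataset preparation is a random train/held-out split. This is only a sketch: the ten-line dataset, the 80/20 ratio, and the variable names are assumptions made for illustration.

```bash
# Build a small stand-in dataset: the numbers 1 through 10, one per line
data=$(mktemp)
seq 1 10 > "$data"

# Shuffle once, then split the shuffled lines 80/20
shuffled=$(mktemp)
shuf "$data" > "$shuffled"
train=$(head -n 8 "$shuffled")
held_out=$(tail -n 2 "$shuffled")

echo "train:" $train
echo "held out:" $held_out
rm "$data" "$shuffled"
```

Shuffling into a file first matters: running `shuf` twice would give two different orders, and the two parts could overlap.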

In [37]:
shuf to_sort4.txt

count
a 
using
words 
nice
for 
uniq
words
processing
to 
is 
count
can
hack
We 
It 
NLP 


## head & tail
* The `head` and `tail` commands display the first 10 or last 10 lines of a file by default
    * You can change the number of lines displayed using the `-n` option
    * With GNU `head`, the value passed to `-n` can be negative: `-n -N` prints everything except the last N lines
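
`head` and `tail` can also be combined to pull lines out of the middle of a file. A small sketch, using `seq` to generate ten numbered lines as stand-in input; `tail -n +N` is an additional form not mentioned above.

```bash
# Lines 4 and 5 of a ten-line input: keep the first 5, then the last 2 of those
middle=$(seq 1 10 | head -n 5 | tail -n 2)

# tail -n +N starts printing at line N instead of counting from the end
from_ninth=$(seq 1 10 | tail -n +9)

echo "$middle"
echo "$from_ninth"
```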

In [38]:
cat to_sort3.txt

100MB
4K
2G
1B


In [39]:
head -n1 to_sort3.txt

100MB


In [40]:
tail -n1 to_sort3.txt

1B


In [41]:
head -n-1 to_sort3.txt

100MB
4K
2G


## cut
* The cut command extracts columns from a file containing a dataset
* By default the delimiter used is a tab
    * Use the `-d` argument to change the delimiter
* To specify which columns to return, use the `-f` argument    

In [1]:
#head regex_starter_code/food_facts.tsv
cut -f1,2 regex_starter_code/food_facts.tsv | head

Juicy Juice Apple	08/10/2015
Malt O Meal Cereal Frosted Flakes	04/14/2017
Orange Juice	08/12/2017
Salt and pepper pistachios	03/05/2017
Pasta Sauce, Four Cheese	04/05/2017
Sunny fruit 	09/19/2016
Rich & Creamy Lowfat Half and Half	03/09/2017
Chocolate Covered Peppermint Pattie	03/08/2016
Confit de dinde	12/08/2016
Light Ice Cream, Chocolate	03/09/2017


In [6]:
cut -f1-4,10 -d, regex_starter_code/states.csv | head
#man cut

Wyoming,07/10/1890,WY,Cheyenne,America/Denver
Wisconsin,05/29/1848,WI,Madison,America/Chicago
West Virginia,06/20/1863,WV,Charleston,America/New York
Virginia,06/25/1788,VA,Richmond,America/New York
Vermont,03/04/1791,VT,Montpelier,America/New York
Utah,01/04/1896,UT,Salt Lake City,America/Denver
Texas,12/29/1845,TX,Austin,America/Chicago
Tennessee,06/01/1796,TN,Nashville,America/New York
South Dakota,11/02/1889,SD,Pierre,America/Chicago
South Carolina,05/23/1788,SC,Columbia,America/New York


## paste
* `paste` does the opposite of `cut`
* Each line of every file is concatenated together, separated by a tab by default
    * Use the `-d` flag to change the delimiter
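
To see how `cut` and `paste` mirror each other, the sketch below splits a two-column file apart and glues the columns back together in swapped order. The file names and contents are invented for the example.

```bash
# A tiny comma-separated file in a scratch directory
tmp=$(mktemp -d)
printf 'WY,Cheyenne\nWI,Madison\n' > "$tmp/states.csv"

# cut pulls each column out into its own file
cut -d, -f1 "$tmp/states.csv" > "$tmp/abbr.txt"
cut -d, -f2 "$tmp/states.csv" > "$tmp/capital.txt"

# paste joins the files line by line, here with the columns swapped
swapped=$(paste -d, "$tmp/capital.txt" "$tmp/abbr.txt")

echo "$swapped"
rm -r "$tmp"
```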

In [7]:
paste to_sort1.txt to_sort2.txt

A	4
S	5
D	6
F	7
G	2
H	1
J	3
K	4
L	6
	0
	2
	


In [8]:
paste -d, to_sort1.txt to_sort2.txt

A,4
S,5
D,6
F,7
G,2
H,1
J,3
K,4
L,6
,0
,2
,


## find
* `find` is like an extremely powerful version of `ls`
* By default, `find` will list all the files under a directory passed as an argument
    * Numerous tests can be passed to find as arguments and used to filter the list that is returned 
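
Tests can be combined in a single `find` command. The following sketch builds a throwaway directory tree (all names are invented) so the result is predictable; the output is piped through `sort` because `find` does not promise any particular order.

```bash
# Build a throwaway directory tree
root=$(mktemp -d)
mkdir -p "$root/notes/old"
touch "$root/a.txt" "$root/notes/b.txt" "$root/notes/old/c.md"

# Tests combine: regular files (-type f) whose name ends in .txt
txt_files=$(cd "$root" && find . -type f -name '*.txt' | sort)

# -maxdepth limits how far down find descends
top_dirs=$(cd "$root" && find . -maxdepth 1 -type d | sort)

echo "$txt_files"
echo "$top_dirs"
rm -r "$root"
```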

In [9]:
find . | head

.
./fb_verify.png
./re_example.pl
./hello_simple.sh
./registers.png
./to_sort3.txt
./hello.sh
./airline_tweets.tsv
./Lecture01.html
./433Fall17
find: `standard output': Broken pipe
find: write error


In [10]:
find . -type d | head

.
./433Fall17
./433Fall17/.git
./433Fall17/.git/info
./433Fall17/.git/logs
./433Fall17/.git/logs/refs
./433Fall17/.git/logs/refs/remotes
./433Fall17/.git/logs/refs/remotes/origin
./433Fall17/.git/logs/refs/heads
./433Fall17/.git/objects


In [11]:
find . -maxdepth 1 -type d 

.
./433Fall17
./regex_starter_code
./.git
./.ipynb_checkpoints


In [12]:
find . -name "*ipynb"

./433Fall17/Lecture01.ipynb
./433Fall17/Lecture02.ipynb
./433Fall17/Lecture00.ipynb
./433Fall17/Lecture03.ipynb
./Lecture01.ipynb
./Lecture02.ipynb
./Untitled.ipynb
./Lecture04.ipynb
./Lecture00.ipynb
./.ipynb_checkpoints/Lecture04-checkpoint.ipynb
./.ipynb_checkpoints/Lecture02-checkpoint.ipynb
./.ipynb_checkpoints/Lecture00-checkpoint.ipynb
./.ipynb_checkpoints/Lecture03-checkpoint.ipynb
./.ipynb_checkpoints/Lecture01-checkpoint.ipynb
./.ipynb_checkpoints/Untitled-checkpoint.ipynb
./Lecture03.ipynb


## wc
* In some cases, it is convenient to know basic statistics about a file
* The `wc` (word count) command prints the number of lines, words, and bytes in a file
    * To print only one of these, use the `-l`, `-w`, or `-c` flag respectively; `-m` counts characters instead of bytes
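
When the count is needed inside a script, the file name that `wc` prints after the number gets in the way. One common approach, sketched here with an invented two-line file, is to feed the file through STDIN instead.

```bash
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

# With a file argument, wc prints the count followed by the file name
wc -l "$sample"

# Reading from STDIN leaves only the number, which is easier to reuse
lines=$(wc -l < "$sample")
words=$(wc -w < "$sample")

echo "lines=$lines words=$words"
rm "$sample"
```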

In [13]:
wc to_sort1.txt

10  9 19 to_sort1.txt


In [20]:
wc -l to_sort1.txt

10 to_sort1.txt


## Other Helpful Utilities
* arch
* uname
* whoami
* yes

## Shell Script Setup
* A shell script in the simplest form is just a list of commands to execute in sequence
* Run it by passing the script file to a shell: `sh script_file`, or `bash script_file` if you are not sure what shell you are in
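
As a rough illustration, the sketch below writes a two-command script to a temporary file and hands it to `bash`. The file is created with `mktemp`, so its name is arbitrary, and the second `echo` line is an invented addition.

```bash
# Write a two-command script to a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
echo "Hello World"
echo "Goodbye World"
EOF

# Hand the file to the shell; no shebang or execute permission is needed
output=$(bash "$script")

echo "$output"
rm "$script"
```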

In [21]:
bash hello_simple.sh

Hello World


## Shebang Line
* On UNIX-like systems, if the first line of a file starts with `#!`, that line indicates which program to use to run the file
* Can be used with almost any interpreted language
* The interpreter must be given by its full (absolute) path
```bash
#!/bin/bash
#!/bin/python
#!/bin/perl
```
* File must be executable

```chmod +x FILE```
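
Putting the two requirements together might look like the sketch below. It assumes bash is installed at `/bin/bash` and that the temporary directory allows files to be executed.

```bash
# Create a script whose first line is the shebang
script=$(mktemp)
printf '%s\n' '#!/bin/bash' 'echo "Hello World"' > "$script"

# Add the execute bit so the file can be run directly
chmod +x "$script"

# The shebang line selects /bin/bash as the interpreter
output=$("$script")

echo "$output"
rm "$script"
```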

In [22]:
./hello.sh

Hello World


## Variables
* Variables in bash can hold either a scalar or an array
    * Arrays are constructed using parentheses ()
* To initialize a variable, use the equals sign **with no spaces**

## Declaring Variables Examples

In [25]:
a_scalar=UMBC
another_scalar="This needs quotes"
more_scalars=40
even_more=3.14
an_array=(letters "s p a c e s" 1.0)
#Don't do this
bad= "not what you want"

not what you want: command not found


: 127

## Accessing Variables
* To access a variable a dollar sign (**$**) must be prepended to its name
* To access an array element, the variable name and index must occur inside of curly braces (**{}**)
    * Scalar values can be accessed this way too, but it is optional
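
A brief sketch of the curly-brace forms for arrays, reusing the `an_array` definition from the declaration examples. The `[@]` and `#` forms shown here go slightly beyond the bullets above.

```bash
an_array=(letters "s p a c e s" 1.0)

first=${an_array[0]}      # indexing starts at 0
count=${#an_array[@]}     # number of elements

# Quoting "${an_array[@]}" keeps "s p a c e s" as a single item
for item in "${an_array[@]}"; do
    echo "[$item]"
done

echo "first=$first count=$count"
```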

## Accessing Variables Examples

In [26]:
echo $a_scalar

UMBC


In [27]:
echo ${a_scalar}

UMBC


In [29]:
echo $more_scalars

40


In [30]:
echo $even_more

3.14


In [31]:
echo ${an_array[1]}

s p a c e s


In [32]:
#Don't Do This
echo $an_array

letters


In [None]:
echo ${an_array[*]}

## String Interpolation
* Variables will be interpolated into strings when double quotes are used
    * If the name is followed by a space or another character that cannot appear in a variable name, curly braces aren't needed, but using them is a good habit

In [40]:
echo 'This class is at ${a_scalar}'

This class is at ${a_scalar}


In [41]:
echo "This class is at $a_scalar"

This class is at UMBC


In [42]:
echo "The schools website is www.$a_scalar.edu"

The schools website is www.UMBC.edu


In [43]:
echo "The athletics website is www.$a_scalarretrievers.com"

The athletics website is www..com


In [44]:
echo "The athletics website is www.${a_scalar}retrievers.com"

The athletics website is www.UMBCretrievers.com


## String Operations
* Bash has numerous built-in string operators allowing for
    * Accessing the length (**\${#string}**)
    * Accessing a substring (**\${string:pos}** or **\${string:pos:len}**)
    * Performing a search and replace on a substring (**\${string/pattern/substitution}**)
    * Removing substrings from the front (**\${string#pattern}**) or the back (**\${string%pattern}**)
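
A common practical use of substring removal is taking a file path apart. The path below is hypothetical; the sketch only shows how the `#`, `##`, `%`, and `%%` forms differ.

```bash
path=/home/user/report.tar.gz   # hypothetical path

file=${path##*/}    # remove the longest match of "*/" from the front
dir=${path%/*}      # remove the shortest match of "/*" from the back
stem=${file%%.*}    # remove the longest match of ".*" from the back
ext=${file#*.}      # remove the shortest match of "*." from the front

echo "$dir | $file | $stem | $ext"
```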

## String Operation Examples

In [45]:
echo ${a_scalar} ${#a_scalar}

UMBC 4


In [46]:
echo ${a_scalar} ${a_scalar:1}
echo ${a_scalar} ${a_scalar:2:2}
echo ${a_scalar} ${a_scalar::2}

UMBC MBC
UMBC BC
UMBC UM


In [47]:
echo ${a_scalar} ${a_scalar/U/u}
echo ${a_scalar} ${a_scalar/V/u}
echo ${another_scalar} ${another_scalar/e/x}
echo ${another_scalar} ${another_scalar//e/x}
echo ${another_scalar} ${another_scalar//[a-z]/x}

UMBC uMBC
UMBC UMBC
This needs quotes This nxeds quotes
This needs quotes This nxxds quotxs
This needs quotes xxxx xxxxx xxxxxx


In [48]:
#From the front of the string
echo ${another_scalar} "->" ${another_scalar#T*s}
#Longest possible match
echo ${another_scalar} "->" ${another_scalar##T*s}

#From the back of the string
echo ${another_scalar} "->" ${another_scalar%e*s}
#Longest possible match
echo ${another_scalar} "->" ${another_scalar%%e*s}

This needs quotes -> needs quotes
This needs quotes ->
This needs quotes -> This needs quot
This needs quotes -> This n


## Default Values
* Bash also allows default values to be used when the variable is **accessed**
    * **\${var:-default}** uses the default just for that statement
    * **\${var:=default}** also assigns the default to the variable, so it applies to all future statements
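
Defaults are often applied to optional arguments and settings. In this sketch, `greet` and `editor_choice` are hypothetical names chosen for illustration.

```bash
# greet is a hypothetical function; $1 is its first argument
greet() {
    echo "Hello, ${1:-World}"
}
greet          # no argument, so the default is used
greet UMBC

# := also stores the default in the variable when it is unset or empty
unset editor_choice
: "${editor_choice:=nano}"
echo "$editor_choice"
```

The leading `:` is the shell's do-nothing command; it is there only so the expansion, and with it the assignment, takes place.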

## Default Value Examples

In [51]:
an_empty_var= 
echo "1." $an_empty_var
echo "2." ${an_empty_var:-Default}
echo "3." $an_empty_var
echo "4." ${an_empty_var:=Default}
echo "5." $an_empty_var

1.
2. Default
3.
4. Default
5. Default


## Environmental Variables
* Environmental Variables are global variables in the widest sense
    * Inherited by the processes a user starts, so they act as shared settings for that user's session
    * Often set in initialization scripts or during boot
* Shells may modify but more often than not simply access them
* By convention, environmental variables are written in all uppercase letters
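
The difference between an ordinary shell variable and an environmental variable shows up when a child process is started. In this sketch the variable names are hypothetical, and `export` is the command that places a variable in the environment.

```bash
local_only="shell variable"                 # not exported
export SHARED_NOTE="environment variable"   # hypothetical name

# A child process only sees what was exported
child_local=$(sh -c 'echo "${local_only:-unset}"')
child_shared=$(sh -c 'echo "$SHARED_NOTE"')

echo "child sees: $child_local / $child_shared"
```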

## Environmental Variable Examples

In [52]:
echo "Your home dir is: $HOME"
echo "You are logged into: $HOSTNAME"

echo "Your shell is: $SHELL"
echo "Your path is: $PATH"
echo "Your terminal is set to: $TERM"

Your home dir is: /home/bryan
You are logged into: janus
Your shell is: /usr/bin/fish
Your path is: /usr/local/bin:/home/bryan/perl5/bin:/home/bryan/p5-Devel-IPerl/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
Your terminal is set to: xterm-256color


## Command Line Arguments
* Command line arguments are placed in the special variables \$1 through \$9
    * You can have more arguments, but they need to be accessed like \${10}
* The name of the script being executed is stored in \$0
* The number of arguments is stored in \$#
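
Why the braces are needed for the tenth argument can be seen in the sketch below. Shell functions receive positional parameters the same way scripts do, so a hypothetical function `show_args` stands in for a script here.

```bash
# show_args is a hypothetical function used in place of a script
show_args() {
    # $10 is read as $1 followed by a literal 0, so the braces matter
    echo "count=$# tenth=${10} wrong=$10"
}

result=$(show_args a b c d e f g h i j k)
echo "$result"
```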

## Command Line Argument Examples

In [53]:
cat cla_examples.sh

#!/bin/bash
echo "The name of the file is $0"
echo "You passed $# arguments"

echo "The first argument is $1"
echo "The second argument is $2"

echo "All the arguments are $@"


In [57]:
./cla_examples.sh --some-flag a_path additional_options another_one

The name of the file is ./cla_examples.sh
You passed 4 arguments
The first argument is --some-flag
The second argument is a_path
All the arguments are --some-flag a_path additional_options another_one


## Special Variables
* bash provides many other special variables that hold useful values
    * \$\$ is the process id of the currently executing script
    * \$PPID is the process id of the process that the script was launched from
    * \$? is the status of the last command executed

In [58]:
echo "Process ID (PID) is: $$"
echo "Parent PID (PPID) is: $PPID"
whoami
echo "Status of last command: $?"


Process ID (PID) is: 19328
Parent PID (PPID) is: 19318
bryan
Status of last command: 0
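
The status in `$?` is 0 on success and non-zero on failure, which is what `if` checks. A small sketch, using `true`, `false`, and `grep -q` as stand-ins for real commands:

```bash
true
ok=$?

false
failed=$?

# grep -q prints nothing and reports through its exit status:
# 0 if the pattern was found, 1 if not
printf 'hay\n' | grep -q needle
grep_status=$?

# if tests the exit status directly, so $? is often not needed
if printf 'needle\n' | grep -q needle; then
    found=yes
else
    found=no
fi

echo "$ok $failed $grep_status $found"
```

Because every command overwrites `$?`, the value should be saved to a variable right away if it is needed later.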
