# Shell scripts

This is a tutorial on Shell Command Language (shell scripts) for the [KIPAC computing boot camp](http://kipac.github.io/BootCamp).

Authors: [Yao-Yuan Mao](http://yymao.github.io), [Chris Davis](mailto:chris.pa.davis@gmail.com)

Everything we type into the unix/linux command line interface is interpreted as "shell script", or "shell command language". There are [many different implementations](https://en.wikipedia.org/wiki/Comparison_of_command_shells#General_characteristics). Two of the most common ones are `bash` and `(t)csh`. 

Unfortunately, despite a few common features, different implementations have different syntax, so the examples below are labeled `#bash` or `#csh`.

## Define and set variables

Some common features:

- To _call_ a variable, always start with the dollar sign (`$`). For example, `$x` or `${x}`.
- To _set_ a variable, the dollar sign is not needed. 
- Variable names are case-sensitive.
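
A minimal bash sketch of the points above, using made-up variable names (`x`, `X`, `file`): names that differ only in case are separate variables, and the braces in `${file}` mark where the name ends when more text follows directly.

```shell
x=1
X=2
both="$x $X"              # x and X are two different variables

file=report
with_braces="${file}_v2"  # expands file, then appends _v2
no_braces="$file_v2"      # looks up a variable named file_v2, which is not set

echo "$both"
echo "$with_braces"
echo "[$no_braces]"
```

Without the braces the shell reads `file_v2` as the whole variable name, so the last line prints only the empty brackets.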

In [None]:
#bash

x=1
echo $x

In [None]:
#csh 

set x = 2
echo $x

You can also set a variable to a string. Note the different quotation marks used here: inside double quotes, variables are expanded; inside single quotes, the text is kept literally.

In [None]:
#bash

x='world'
y="hello $x"
z='hello $x'

echo $y
echo $z

In [None]:
#csh

set x = 'world'
set y = "hello $x"
set z = 'hello $x'

echo $y
echo $z

These variables are local to the current shell. That means they are not passed on to programs or scripts started from it.

To make them "environment variables", which child processes inherit, you have to "export" them.

In [None]:
#bash

export X="abc"

In [None]:
#csh

setenv X "abc"
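
To see the difference between a local and an exported variable, the following bash sketch starts a child shell with `sh -c` and asks it to print both. The variable names are made up for illustration.

```shell
local_var="only in this shell"
export EXPORTED_VAR="visible to children"

# Single quotes keep the current shell from expanding the variables;
# the child shell started by `sh -c` expands them itself.
child_sees_local=$(sh -c 'echo "${local_var:-unset}"')
child_sees_exported=$(sh -c 'echo "${EXPORTED_VAR:-unset}"')

echo "local_var in child:    $child_sees_local"
echo "EXPORTED_VAR in child: $child_sees_exported"
```

The child reports `unset` for the local variable and the real value for the exported one.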

## appending variables

Sometimes we want to append or prepend to a variable. The `${x}` expansion works the same way in both csh and bash (only the assignment command differs). Here is an example:

In [None]:
#bash

export x=world
echo $x
export x="hello ${x}"
echo $x

## PATH

Many programs use specific shell variables. For example, when you type a command, the shell looks through the PATH variable for directories where the command might be located. It will stop when it finds the command. Therefore, if you write some new executable in a new directory, and you want to call it elsewhere, you need to add that directory to the PATH variable.
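
The lookup can be tried out safely in a throwaway directory. This is only a sketch: `my_tool` is a hypothetical command created just for the demonstration, and `command -v` is used to ask the shell where it finds a command.

```shell
# Create a temporary directory holding a tiny executable script.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my_tool"\n' > "$demo_dir/my_tool"
chmod u+x "$demo_dir/my_tool"

# The directory is not on PATH yet, so the shell cannot find the command.
command -v my_tool || echo "my_tool is not on PATH yet"

# Append the directory to PATH; now the command can be called by name.
export PATH="$PATH:$demo_dir"
found_at=$(command -v my_tool)
tool_output=$(my_tool)

echo "$found_at"
echo "$tool_output"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

Appending puts the new directory last in the search order, so an existing command with the same name would still win; prepending (`"$demo_dir:$PATH"`) reverses that.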

### Exercise

Add the wget and tar executables from exercise 1 to your PATH. Instead of breaking our old wget and tar variables, rename them to `wget_bootcamp` and `tar_bootcamp` first.

In [None]:
#bash

export PATH=$PATH:/path/to/unix/directory

## startup scripts

One place where you have to use shell scripts is the startup script. Depending on your shell, the startup script is usually one of:

    .bashrc
    .bash_profile
    .cshrc
    .login
    .profile

and it sits in your "home directory". 

**Reminder**: To go to your home directory, type `cd ~`.

In most cases you want to put the commands that set environment variables in the startup script.

If you modify a startup script and want to see the results of that modification without opening a new window, you can `source` the file, e.g. `source .bashrc`. **Note**: Startup scripts are often sourced in a specific order. If you source out of order, you might get a different result from what you would get if you just created a new window!
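
Why `source` rather than simply running the file? A small sketch, using a temporary stand-in file instead of a real startup script (the file and the variable `DEMO_SETTING` are made up): running the file starts a child shell, so its settings vanish when it exits, while sourcing executes the lines in the current shell.

```shell
# A stand-in for a startup script, with a single setting in it.
settings_file=$(mktemp)
echo 'DEMO_SETTING="from file"' > "$settings_file"

# Running the file: the variable is set in a child shell only.
sh "$settings_file"
after_running=${DEMO_SETTING:-unset}

# Sourcing the file: `.` is the portable spelling of `source`.
. "$settings_file"
after_sourcing=${DEMO_SETTING:-unset}

echo "after running:  $after_running"
echo "after sourcing: $after_sourcing"

rm "$settings_file"
```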


### Exercise: Edit your startup script so that you can use wget_bootcamp in a new window


## alias

Another useful thing to add to the startup script is an alias, with which you can create a shortcut for a commonly used command.

Below, I have created an alias `ll` which will display a directory's contents in long form, including permissions, size, and last time modified.

In [None]:
#bash
alias ll='ls -l'

In [None]:
#csh
alias ll 'ls -l'

## functions

Functions are like more customizable aliases. You can run multiple commands at once, and can take in arguments using positional parameters like `${1}`. Here is an example function that will compile a latex file, run bibtex, and then run latex a couple more times for things like the table of contents. If you are in a directory and want to compile a file named `my_paper.tex`, you would write:

    CompileLatex my_paper

You can also specify default arguments using `:-`. Here I have said that, if no argument is provided, use `main`. (The similar `:=` form cannot assign to positional parameters such as `$1`.) Thus,

    CompileLatex
  
by itself will look for `main.tex` and compile that.

#### Sadly, csh does not support functions. Instead, you must write extended aliases.

In [None]:
#bash

function CompileLatex()
{
    latex -file-line-error -interaction=nonstopmode "${1:-main}.tex"
    bibtex "${1:-main}.aux"
    latex -file-line-error -interaction=nonstopmode "${1:-main}.tex"
    latex -file-line-error -interaction=nonstopmode "${1:-main}.tex"
}
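
The default-argument mechanism is easier to try out on a function that only prints. The sketch below uses a made-up function `greet` and the `${1:-default}` form, which substitutes a default without trying to assign to the positional parameter.

```shell
greet() {
    # ${1:-world} expands to the first argument, or to "world" if it is missing or empty
    echo "hello ${1:-world}"
}

with_argument=$(greet KIPAC)
without_argument=$(greet)

echo "$with_argument"
echo "$without_argument"
```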

You can also refer to several arguments at once using `@`: `${@:3}` expands to all arguments from the third one on. Below I give an example that will download files from a directory on the sherlock computing cluster specified by `${1}` to local directory `${2}` using flags (if any) specified from the third argument on:

In [None]:
#bash

function downsherlock() { rsync -rav "${@:3}" "cpd@sherlock.stanford.edu:${1}" "${2}" ;}
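
The `${@:3}` slice is a bash feature. A portable way to get "everything from the third argument on" is to drop the first two arguments with `shift 2` and then use what is left. The sketch below uses a made-up function, `describe_transfer`, that only prints what it received, so it can be run without touching any remote machine.

```shell
describe_transfer() {
    echo "from=$1 to=$2"
    # Discard the first two arguments; $# and $* now describe the rest.
    shift 2
    echo "extra flags ($#): $*"
}

report=$(describe_transfer remote/dir local/dir --dry-run --progress)
echo "$report"
```

When passing the remaining arguments on to another command, use `"$@"` (with the quotes) so that each argument stays a separate word.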

### Exercise

As Stanford affiliates, we have access to the Stanford computing clusters. Thus, we can access the rye01 computer via ssh with:

    ssh cpd@rye01.stanford.edu

Write a function `rye` which lets you specify which rye computer you want to log on to, but specifies 01 as the default.

### hint:

Sometimes Macs have trouble sshing onto rye. In that case, I have found that adding the following options helps:

    ssh -K -o GSSAPIKeyExchange=no cpd@rye01.stanford.edu
    
I _think_ this has been fixed.

In [None]:
#bash

function rye(){ ssh "cpd@rye${1:-01}.stanford.edu" ;}

## scripting in bash or csh

Sometimes you need to make a bunch of directories for output of files. You have learned how to make directories with `mkdir`, but maybe you want to automate the scheme. My advice is this:

### If you need to script in bash or csh, script in python instead!

There is a very simple reason for this: python is simply a much more flexible language. In particular, manipulation of strings in shell is a huge pain, but is very easy in python. Shell will have _beautiful_ one line answers to a lot of simple manipulations (see the previous Unix session), but more complicated tasks tend to be quite cumbersome to implement in shell.

I have found that the following packages are really useful:

- os
- sys
- glob
- subprocess

Any python script you write can always (after appropriate chmod) be executed from shell. Just put

    #!/usr/bin/env python

at the head of your script, and you should be able to just type the name of the file and execute it. If you get `permission denied`, then you just need to change the permissions for the file (remember how from Unix (1)? `chmod u+x your_file`).

Below I show a couple examples of things you can do in python to make shell scripting easier.

In [None]:
# example python function for making a set of directories:

from os import path, makedirs

def check_make(path_check):
    """
    Convenience routine to avoid that annoying 'can't make directory; already
    present!' error.
    """
    if not path.exists(path_check):
        makedirs(path_check)
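
For comparison, this is one of the cases where the shell does have a short answer: `mkdir -p` creates any missing parent directories and does not complain if the directory already exists. A minimal sketch, with a made-up directory layout inside a temporary directory:

```shell
base_dir=$(mktemp -d)

# Create one output directory (with a logs subdirectory) per run.
for run in 01 02 03; do
    mkdir -p "$base_dir/output/run_$run/logs"
done

# Creating an existing directory again is harmless with -p.
mkdir -p "$base_dir/output/run_01/logs"
second_status=$?

# Count the run directories (tr strips the padding some versions of wc add).
dir_count=$(ls "$base_dir/output" | wc -l | tr -d ' ')
echo "$dir_count directories, second mkdir exit status $second_status"

rm -r "$base_dir"
```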

In [None]:
# find all the names in directory that begin with uppercase A

def find_A(directory):
    from glob import glob
    return glob(directory + '/A*')

In [None]:
# call to the shell a command in python
from subprocess import call

# each argument of the command is one entry in the python list
# NOTE that all entries must be strings, and no extra shell quotes are needed!
call(['echo', 'hello world'])

### Exercise

In this folder are two log files: `Unix_2_bsub_00243636.log` and `Unix_2_bsub_00239721.log`. The 8 digits in the file name represent the exposure number. Write a python script that extracts the Max Memory and writes a file with two columns separated by a comma: the exposure number, and the maximum memory.

In [None]:
import glob
import os

# find the files
files = glob.glob('/Users/cpd/Desktop/BootCamp/Unix/Unix_2_bsub_*.log')

# extract their max memory
max_memories = []
exposure_numbers = []
for f in files:
    # extract the exposure_number
    file_name = f.split('/')[-1]
    exposure_number = int(file_name.split('.log')[0].split('_')[3])
    exposure_numbers.append(exposure_number)
    
    # now read the file and extract the value.
    # We note that the Max Memory column always looks like:
    #    Max Memory :                                 
    with open(f) as opened_file:
        # we know it's near the end of the logfile, so go through the file from the back
        # note that opened_file.readlines() returns a list of ALL the lines in the file.
        found_max_memory = False
        for file_line in reversed(opened_file.readlines()):
            if found_max_memory:
                break
            if 'Max Memory :' in file_line:
                # now we have to find the number. We note that the right side is always something like "4031 MB"
                max_memory = int(file_line.split(' ')[-2])
                max_memories.append(max_memory)
                found_max_memory = True
                
# write the file
output_file = '/Users/cpd/Desktop/BootCamp/Unix/Unix_2_bsub_max_memory.csv'
# remove it if it already exists (opening with 'w' below would also overwrite it)
if os.path.exists(output_file):
    os.remove(output_file)
with open(output_file, 'w') as opened_file:
    header = 'exposure,max_memory\n'
    opened_file.write(header)
    
    for exposure_number, max_memory in zip(exposure_numbers, max_memories):
        entry = '{0:08d},{1}\n'.format(exposure_number, max_memory)
        opened_file.write(entry)