# Unix Variables

## Safety - Disable default variables

By default, the shell expands an undeclared variable to the empty string. This is a problem because it makes misspelled variable names hard to detect. Running `set -u` makes the shell report an error instead.

In [1]:
echo $DOES_NOT_EXIST




In [2]:
set -u

In [3]:
echo $DOES_NOT_EXIST

bash: DOES_NOT_EXIST: unbound variable


: 1
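
With `set -u` in effect, a script can still treat a variable as optional by supplying a fallback with `${NAME:-default}`. A minimal sketch, where `OUTDIR` and the fallback value `results` are hypothetical names chosen only for illustration:

```bash
#!/bin/bash
set -u

# OUTDIR is a hypothetical variable; make sure it is unset for this demo.
unset OUTDIR

# ${OUTDIR:-results} expands to "results" when OUTDIR is unset or empty,
# so no "unbound variable" error is raised.
TARGET="${OUTDIR:-results}"
echo "$TARGET"   # prints: results
```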

## Assigning variables

In [4]:
FILENAME="temp.txt"

In [5]:
echo $FILENAME

temp.txt


In [6]:
echo "Some stuff" > $FILENAME

In [7]:
cat $FILENAME

Some stuff


## Common mistakes

You must not have spaces on either side of `=` in a variable assignment.

In [8]:
FILENAME = "temp.txt"

bash: FILENAME: command not found


: 127

Unix interprets this as: run a command called `FILENAME` with the arguments `=` and `temp.txt`.

In [9]:
FILENAME ="temp.txt"

bash: FILENAME: command not found


: 127

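Unix interprets this as: run a command called `FILENAME` with the single argument `=temp.txt`. The next form, with a space after the `=`, is interpreted as: assign the empty string to the variable `FILENAME`, then run a program called `temp.txt`.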

In [10]:
FILENAME= "temp.txt"

bash: temp.txt: command not found


: 127
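
The last case parses the way it does because the shell accepts `NAME=value command`, which sets `NAME` only in the environment of that one command. A small sketch of this behavior, using the hypothetical variable `GREETING` and assuming `bash` is on the `PATH`:

```bash
# GREETING is a hypothetical variable used only for illustration.
unset GREETING

# The assignment before the command applies only to that command's environment.
RESULT=$(GREETING="hello" bash -c 'echo "$GREETING"')
echo "$RESULT"   # prints: hello

# In the current shell GREETING is still unset, so the fallback text is used.
AFTER="${GREETING:-still unset}"
echo "$AFTER"    # prints: still unset
```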

## Using a variable

In [11]:
PREFIX="Gene"

In [12]:
echo $PREFIX

Gene


In [13]:
echo $PREFIX001

bash: PREFIX001: unbound variable


: 1

Here the shell looks for a variable named `PREFIX001`. If you surround the variable name with curly braces to mark where it ends, you can concatenate it with other text.

In [14]:
echo ${PREFIX}001

Gene001


## Assigning command outputs to variables

To capture the output of a command, use `$(command)`.

In [15]:
FILES=$(ls)

In [16]:
echo $FILES

a.txt bgp.fasta b.txt Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf c.txt data figs hello.md5 hello.txt lsd1.txt lsd2.txt lsd3.txt MD5_CHECKSUM MD5SUM notebooks.tar.gz R00_Review_Basics.ipynb R00_Review_Basics_Scratch.ipynb R01_Data_Manipulation.ipynb R01_Data_Manipulation_Scratch.ipynb R01_Data_Manipulation_Solutions.ipynb R01_Manipulating_Data_In_R.ipynb R02_Tidying_Data_In_R.ipynb R02_Tidying_Data.ipynb R02_Tidying_Data_Solutions.ipynb R03_FileIO.ipynb R04_Unsupervised_Learning.ipynb R04_Unsupervised_Learning_Scratch.ipynb R05_Unsupervised_Learning_More_Examples.ipynb R06_Graphics_Overview.ipynb R07_Graphics_Base.ipynb R08_Graphics_ggplot2.ipynb R09_Graphics_Exercise.ipynb R09_Graphics_Exercise_Solutions.ipynb seqs temp.txt Unix01_File_And_Directory.ipynb Unix01_File_And_Directory_Solutions.ipynb Unix02_FileIO.ipynb Unix02_FileIO_Solutions.ipynb Unix03_File_Storage.ipynb Unix03_File_Storage_Solutions.ipynb Unix04_Text_Manipulation.ipynb Unix04_Text_Manipulation_Solutinos.ipynb 

In [17]:
grep -in "unix" $FILES | head -5

grep: data: Is a directory
grep: figs: Is a directory
lsd1.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd2.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd3.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
MD5_CHECKSUM:19:5fce58fcf8e1fb4b07b68240d06679fb  Unix01_File_And_Directory.ipynb
MD5_CHECKSUM:20:eb62c94ab276c8d3414e436e3ee0505f  Unix01_File_And_Directory_Solutions.ipynb
grep: write error: Broken pipe


We can also use the anonymous capture form, without assigning the output to a variable first.

In [18]:
grep -in "unix" $(ls) | head -5

grep: data: Is a directory
grep: figs: Is a directory
lsd1.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd2.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd3.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
MD5_CHECKSUM:19:5fce58fcf8e1fb4b07b68240d06679fb  Unix01_File_And_Directory.ipynb
MD5_CHECKSUM:20:eb62c94ab276c8d3414e436e3ee0505f  Unix01_File_And_Directory_Solutions.ipynb
grep: write error: Broken pipe


You may sometimes see the older backtick form. It is equivalent, although modern usage favors the `$(command)` form.

In [19]:
grep -in "unix" `ls` | head -5

grep: data: Is a directory
grep: figs: Is a directory
lsd1.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd2.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
lsd3.txt:1:Two of the most famous products of Berkeley are LSD and Unix. 
MD5_CHECKSUM:19:5fce58fcf8e1fb4b07b68240d06679fb  Unix01_File_And_Directory.ipynb
MD5_CHECKSUM:20:eb62c94ab276c8d3414e436e3ee0505f  Unix01_File_And_Directory_Solutions.ipynb
grep: write error: Broken pipe
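
One practical reason to prefer `$(command)` is that it nests without extra escaping. A short sketch, assuming the standard `basename` and `dirname` utilities and an illustrative path that does not need to exist:

```bash
# The path is illustrative; dirname and basename only manipulate the string.
FILEPATH="/home/jovyan/work/temp.txt"

# The inner substitution gives the directory, the outer one its last component.
PARENT=$(basename "$(dirname "$FILEPATH")")
echo "$PARENT"   # prints: work
```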


## Assigning results of an arithmetic expression (integers only)

To do integer arithmetic, use `$(( expression ))`.

In [20]:
echo $((2 + 3))

5


This does not work for floating point numbers.

In [21]:
echo $((2.2 + 3.3))

bash: 2.2 + 3.3: syntax error: invalid arithmetic operator (error token is ".2 + 3.3")


: 1

You need to invoke the `bc` calculator program to deal with floating point numbers. It is not installed in the environment used here, hence the error below.

In [22]:
echo 2.2 + 3.3 | bc 

bash: bc: command not found


: 127
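
Where `bc` is unavailable, as in the session above, `awk` is one possible alternative for floating point arithmetic. This sketch assumes `awk` is installed, which the session above does not confirm:

```bash
# Assumes awk is available; the BEGIN block runs without reading any input.
SUM=$(awk 'BEGIN { print 2.2 + 3.3 }')
echo "$SUM"   # prints: 5.5
```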

## Using variables in loops

In [23]:
for FILE in $(ls *txt)
do
    wc -c $FILE
done

6 a.txt
6 b.txt
6 c.txt
45 hello.txt
107 lsd1.txt
107 lsd2.txt
107 lsd3.txt
11 temp.txt
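
Looping over `$(ls *txt)` splits the listing on whitespace, so a filename containing a space would be broken into several words. A hedged sketch of the glob-based alternative, run in a temporary directory with made-up filenames so that it does not touch the files listed above:

```bash
# Create a scratch directory with two hypothetical files.
WORKDIR=$(mktemp -d)
echo "hello" > "$WORKDIR/a.txt"
echo "hello" > "$WORKDIR/my notes.txt"

# The glob expands to one word per file, even when a name contains a space.
COUNT=0
for FILE in "$WORKDIR"/*.txt
do
    wc -c "$FILE"
    COUNT=$((COUNT + 1))
done
echo "$COUNT"   # prints: 2

# Clean up the scratch directory.
rm -r "$WORKDIR"
```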


In [24]:
for FIB in 1 1 2 3 5
do
    echo $FIB
done

1
1
2
3
5


In [25]:
for N in $(seq 1 10)
do
    if [[ $N -le 5 ]]
    then
        echo $N
    else
        echo $((3*N))
    fi
done

1
2
3
4
5
18
21
24
27
30


## Fibonacci series

Just for fun.

In [26]:
a=1
b=1
for i in $(seq 1 10)
do
    echo -n ${a}","
    tmp=$a
    a=$b
    b=$((tmp+b))
done

1,1,2,3,5,8,13,21,34,55,

## Single and double quotes

Variables are not evaluated within single quotes, but they are within double quotes.

In [27]:
FOO=42
echo '$FOO'

$FOO


In [28]:
FOO=42
echo "$FOO"

42
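
Double quotes also stop the shell from splitting the expanded value into separate words, which matters for values that contain spaces. A minimal sketch, where `count_args` is a hypothetical helper defined only for this example:

```bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() {
    echo $#
}

NAME="my file.txt"

UNQUOTED=$(count_args $NAME)     # split into "my" and "file.txt"
QUOTED=$(count_args "$NAME")     # kept as one argument

echo "$UNQUOTED $QUOTED"   # prints: 2 1
```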


## Testing and branching

Simple example - note the use of `-lt`, `-gt`, `&&` and the use of parentheses within the test `[[ condition ]]`.

In [29]:
if [[ (2 -gt 1) && (1 -lt 2) ]]
then
    echo '2 > 1'
else
    echo 'WTF?'
fi

2 > 1
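
The same kind of test works with variables. The sketch below uses a hypothetical variable `N` and also shows the arithmetic form `(( ))`, a bash feature that accepts the familiar `>` and `<` operators:

```bash
N=7   # hypothetical value

# Numeric comparison operators inside [[ ]].
if [[ ($N -gt 5) && ($N -lt 10) ]]
then
    RANGE="between 5 and 10"
else
    RANGE="outside"
fi
echo "$RANGE"   # prints: between 5 and 10

# Arithmetic evaluation: variables can be written without the $ sign here.
if (( N > 5 && N < 10 ))
then
    ARITH="between 5 and 10"
else
    ARITH="outside"
fi
echo "$ARITH"   # prints: between 5 and 10
```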


Check if a file or its uncompressed version exists before downloading.

In [30]:
URL='ftp://ftp.ensemblgenomes.org/pub/release-39/fungi/gtf/fungi_basidiomycota1_collection/cryptococcus_neoformans_var_grubii_h99/Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf.gz'

FILENAME=$(basename $URL)
echo ${FILENAME}
echo ${FILENAME%.*}

# Download if file does not exist
if [[ ! ((-f ${FILENAME}) || (-f ${FILENAME%.*}))  ]]
then
    wget $URL
    gunzip ${FILENAME}
else
    echo "File exists"
fi

Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf.gz
Cryptococcus_neoformans_var_grubii_h99.CNA3.39.gtf
File exists
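
The expansion `${FILENAME%.*}` used above removes the shortest suffix matching `.*`, which is how the `.gz` extension is dropped. A brief sketch of the related forms, using a hypothetical filename:

```bash
FILENAME="sample.gtf.gz"   # hypothetical filename

SHORT_SUFFIX="${FILENAME%.*}"    # remove shortest match of .* from the end
LONG_SUFFIX="${FILENAME%%.*}"    # remove longest match of .* from the end
EXTENSION="${FILENAME##*.}"      # remove longest match of *. from the start

echo "$SHORT_SUFFIX"   # prints: sample.gtf
echo "$LONG_SUFFIX"    # prints: sample
echo "$EXTENSION"      # prints: gz
```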


Using regular expression matching in a test.

In [31]:
for FILE in $(ls)
do
    if [[ $FILE =~ .*Bash.*ipynb$ ]]
    then
        echo $FILE
    fi
done

Unix06_Bash_Bioinformatics.ipynb
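
When a regular expression contains parenthesized groups, bash stores the matched pieces in the `BASH_REMATCH` array. A small sketch using the filename from the output above; the pattern itself is an illustration and not part of the original notebook:

```bash
FILE="Unix06_Bash_Bioinformatics.ipynb"

# Group 1 captures the digits, group 2 the text between "_" and ".ipynb".
if [[ $FILE =~ ^Unix([0-9]+)_(.*)\.ipynb$ ]]
then
    NUMBER=${BASH_REMATCH[1]}
    TITLE=${BASH_REMATCH[2]}
    echo "$NUMBER $TITLE"   # prints: 06 Bash_Bioinformatics
fi
```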


## Environment variables

You can see what variables are visible in the environment with `env`

In [32]:
env | head -5

LC_ALL=en_US.UTF-8
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=

In [33]:
echo $HOME

/home/jovyan


To make a variable visible in the general environment so that other programs can use it, you need to `export` it.

In [34]:
env | grep EXPORTED_VARIABLE

: 1

In [35]:
export EXPORTED_VARIABLE="Hello, Unix"

In [36]:
env | grep EXPORTED_VARIABLE

EXPORTED_VARIABLE=Hello, Unix


Now remove the environment variable.

In [37]:
unset EXPORTED_VARIABLE

In [38]:
env | grep EXPORTED_VARIABLE

: 1
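
The effect of `export` is easiest to see from a child process. In this sketch, `LOCAL_ONLY` and `SHARED` are hypothetical variable names, and `bash -c` starts a child shell that tries to read them:

```bash
LOCAL_ONLY="not exported"      # hypothetical shell variable
export SHARED="exported"       # hypothetical environment variable

# Single quotes delay expansion so that the child shell does the lookup.
CHILD_LOCAL=$(bash -c 'echo "${LOCAL_ONLY:-unset}"')
CHILD_SHARED=$(bash -c 'echo "${SHARED:-unset}"')

echo "$CHILD_LOCAL"    # prints: unset
echo "$CHILD_SHARED"   # prints: exported

# Remove the exported variable again.
unset SHARED
```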

## Brace expansion

Brace expansion creates lists of strings. It can also generate ranges.

In [39]:
echo file.{c,cpp,py,ipynb,csv,txt}

file.c file.cpp file.py file.ipynb file.csv file.txt


In [40]:
echo {a..c}{1..3}.txt

a1.txt a2.txt a3.txt b1.txt b2.txt b3.txt c1.txt c2.txt c3.txt


In [41]:
for NUM in {1..3}; do
    echo mkdir EXPT-${NUM}
done

mkdir EXPT-1
mkdir EXPT-2
mkdir EXPT-3
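
Brace expansion happens before variables are expanded, so a range cannot use a variable as an endpoint. A short sketch with a hypothetical variable `N`, falling back on `seq` as used in the loops earlier:

```bash
N=3   # hypothetical value

# The braces are not a valid range at expansion time, so they stay literal.
LITERAL=$(echo {1..$N})
echo "$LITERAL"     # prints: {1..3}

# seq runs after $N has been substituted; echo joins its lines with spaces.
FROM_SEQ=$(echo $(seq 1 $N))
echo "$FROM_SEQ"    # prints: 1 2 3
```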


## Shell scripts

A shell script is a file containing the kind of shell commands you are now familiar with, which can be executed from the command line. There are a few steps to make a shell script.

1. The first line usually names the shell that should run the script, e.g. `#!/bin/bash`
2. The other lines contain standard variable declarations, shell commands, loops etc.
3. Save the file with the extension `.sh`
4. Make the file executable with `chmod +x <FILENAME>.sh`

Now you can run the shell script as though it were a shell command.

### First shell script

Here we will show the mechanics of creating a shell script.

Use an editor to write `script01.sh`

```bash
#!/bin/bash

echo "Hello bash!"
```

In [42]:
cat > script01.sh << EOF
#!/bin/bash

echo "Hello bash!"
EOF

In [43]:
ls *sh

script01.sh


Change permission to make file executable.

In [44]:
chmod +x script01.sh

In [45]:
ls -l *sh

-rwxr-xr-x 1 jovyan users 32 Jun 26 09:18 script01.sh


In [46]:
./script01.sh

Hello bash!


### Second shell script

Here we see how to pass arguments to a script in `script02.sh`

```bash
#!/bin/bash

echo '$# gives number of arguments:' $#
echo '$@ gives arguments as array :' $@
echo '$* gives arguments as string:' $*

echo '$1, $2, $3 give first, second, third arguments etc'
echo '$1:' $1
echo '$2:' $2
echo '$3:' $3

echo 'Evaluating "$@"'
for ARG in "$@"
do
    echo ${ARG}
done
echo 'Evaluating $*'
for ARG in $*
do
    echo ${ARG}
done
```

In [48]:
chmod +x script02.sh

In [49]:
./script02.sh a b "c d"

$# gives number of arguments: 3
$@ gives arguments as array : a b c d
$* gives arguments as string: a b c d
$1, $2, $3 give first, second, third arguments etc
$1: a
$2: b
$3: c d
Evaluating "$@"
a
b
c d
Evaluating $*
a
b
c
d
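
The second loop in the script iterates over `$*` without quotes, which is why `c d` is split in two. With quotes, `"$*"` joins all arguments into a single word, while `"$@"` keeps each argument separate. A minimal sketch, where `count_words` is a hypothetical function that counts the loop iterations for each form:

```bash
# count_words is a hypothetical function used only for illustration.
count_words() {
    local AT=0 STAR=0
    # "$@" yields one word per original argument.
    for ARG in "$@"; do AT=$((AT + 1)); done
    # "$*" yields a single word containing all arguments.
    for ARG in "$*"; do STAR=$((STAR + 1)); done
    echo "$AT $STAR"
}

RESULT=$(count_words a b "c d")
echo "$RESULT"   # prints: 3 1
```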


## Exercises

1. Write a shell script that accepts an arbitrary number of filenames as arguments (possibly given by `ls`), and outputs the total number of words in those files.

```bash
#!/bin/bash

TOTAL=0
for FILE in "$@"
do
    N=$(wc -w < "$FILE")   # quote the name so files with spaces work
    TOTAL=$((TOTAL + N))
done
echo $TOTAL
```