# Laboratory 4
**Tutorial and best practices of bash scripting**


Authors:
    
- Prof. Marco A. Deriu (marco.deriu@polito.it)
- Lorenzo Pallante (lorenzo.pallante@polito.it)
- Eric A. Zizzi (eric.zizzi@polito.it)
- Marcello Miceli (marcello.miceli@polito.it)
- Marco Cannariato (marco.cannariato@polito.it)

Other credits:

- Some information is obtained from the excellent tutorials by Ryan Chadwick (https://ryanstutorials.net/), which you should definitely check out!

# Table of Contents

1. Bash recap
2. Scripts = how to automate boring tasks
3. Practical applications

**Learning outcomes:** 
- understand the technical aspects of scripting
- awareness of the dangers of offensive scripting
- be able to use scripting to solve real-world problems

# 1. Bash conditionals and cycles

## 1.1 The IF statement
Just as in other languages, such as Python or JavaScript, bash has many tools available, including conditionals and cycles, which you should know before starting your scripting journey. Let's briefly review them.

The general syntax for an IF conditional is:
<div class="alert alert-block alert-info">
<b>if</b> test <br>
<b>then</b> <br>
 do stuff <br>
<b>else</b> <br>
 do something else <br>
<b>fi</b>
</div>

<b>EXAMPLE</b>

```bash
n=10
# The spaces after [ and before ] are mandatory; -gt means "greater than"
if [ "$n" -gt 5 ]
then
    echo "your number is greater than 5"
else
    echo "your number is lower than or equal to 5"
fi
```

## 1.2 Test operators

The test operator in bash can be in one of the following formats (the spaces inside the brackets are mandatory):
```bash
$ if <test>
$ if [ <test> ]
$ if [[ <test> ]]
```

Some possible tests include the following:

| Syntax | Description |
| --- | --- |
| -n VAR | True if the length of VAR is greater than zero. |
| -z VAR | True if the VAR is empty. |
| STRING1 = STRING2 | True if STRING1 and STRING2 are equal. |
| STRING1 != STRING2 | True if STRING1 and STRING2 are not equal. |
| INTEGER1 -eq INTEGER2 | True if INTEGER1 is equal to INTEGER2. |
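| INTEGER1 -gt INTEGER2 | True if INTEGER1 is greater than INTEGER2. |
| INTEGER1 -lt INTEGER2 | True if INTEGER1 is less than INTEGER2. |
| INTEGER1 -ge INTEGER2 | True if INTEGER1 is greater than or equal to INTEGER2. |
| INTEGER1 -le INTEGER2 | True if INTEGER1 is less than or equal to INTEGER2. |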
| -h FILE | True if the FILE exists and is a symbolic link. |
| -d FILE | True if the FILE exists and is a directory. |
| -f FILE | True if the FILE exists and is a regular file (e.g. not a directory). |
| -e FILE | True if the FILE exists, regardless of its type (node, directory, socket, etc.). |
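
As a minimal sketch of how two of these tests look in practice (the variable names `name`, `folder` and the two `*_status` variables are illustrative assumptions, not part of the lab material):

```bash
# Illustrative values: an empty string and the current folder
name=""
folder="."

# -z is true when the string is empty
if [ -z "$name" ]; then
    name_status="empty"
else
    name_status="set"
fi

# -d is true when the path exists and is a directory
if [ -d "$folder" ]; then
    folder_status="directory"
else
    folder_status="not a directory"
fi

echo "name is $name_status, $folder is a $folder_status"
```

Quoting the variables inside the brackets keeps the test valid even when a variable is empty or contains spaces.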


## 1.3 The FOR statement

The general syntax for the for cycle is:
<div class="alert alert-block alert-info">
<b>for</b> variable <b>in</b> list <br>
<b>do</b> <br>
 do stuff <br>
<b>done</b> <br>
</div>

<b>EXAMPLE</b>

```bash
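# Loop over every entry in the current folder.
# A glob (*) is safer than $(ls), which breaks on names containing spaces.
for i in *
do
    echo "$i"
done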
```
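
The list does not have to come from the file system. As a small sketch (the variable names and the use of arithmetic expansion `$(( ))` are assumptions made for this illustration), a cycle over an explicit list of numbers could look like this:

```bash
# Sum the numbers in an explicit list
total=0
for i in 1 2 3 4 5
do
    # $(( ... )) evaluates an integer expression
    total=$((total + i))
    echo "added $i, running total is $total"
done
```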

# 2. Scripting

First, the basics. What the hell is a script anyway?<br>
We learned that bash is an interface to your computer: it is a way to interact with the operating system, just like the graphical interface, meaning you can move files, create files and folders, launch applications, etc.<br>
So, any command you enter in the Linux terminal (that is, any bash command) is an instruction for your operating system. So what about scripts?<br>
The word "script" is reminiscent of the text actors use in a theater play, telling them what to do or say, and when. You can think of bash scripts in the same way: they are documents telling your bash shell what to do, and in what order.<br>
Any command you execute by typing it into the shell can also be put in a script, without changing anything!

**RECAP ON COMPUTER PROGRAMS AND PROCESSES**<br>
There is one thing you should remember, which becomes essential as you start to write more and more complex scripts. Generally speaking, a <b><u>computer program</u></b> is a set of instructions the CPU should execute, and is stored on the hard disk. When you launch a program, these instructions are copied into the memory (RAM), and the running copy is called a <b><u>process</u></b>. The important point here is that you can have multiple processes of the same program (e.g. open two windows of the same program, or launch the same command in two terminal windows). When you open a terminal window, a bash **process** is started, giving you the interactive bash interpreter where you can type commands. If you then launch a script, a <u>different bash process will be started</u> just for the script, as a child of the bash process that was started with your terminal. This is a detail, but it has actual consequences on things such as sharing variables: variables set in your terminal are not visible inside the script unless they are exported, and variables set by the script are gone once it ends.
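
The effect on variables can be observed directly. The following is only a sketch: `bash -c` is used here to start a second bash process (much like launching a script would), and the variable names and the `${greeting:-nothing}` default-value syntax are assumptions made for the illustration.

```bash
# A plain variable lives only in the current bash process
greeting="hello"

# bash -c starts a new (child) bash process;
# ${greeting:-nothing} prints "nothing" if greeting is not set there
child_sees=$(bash -c 'echo "${greeting:-nothing}"')
echo "without export the child sees: $child_sees"

# export marks the variable so that child processes receive a copy
export greeting
child_sees_exported=$(bash -c 'echo "${greeting:-nothing}"')
echo "with export the child sees: $child_sees_exported"
```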


**IN A NUTSHELL**<br>
In summary, scripts are text files which contain a series of commands that will be executed in sequence automatically. This is very useful in many scenarios, e.g. when a series of commands has to be executed many times in the same way, or when a data-processing pipeline is particularly long and impractical to execute one command at a time.

## 2.1 Syntax basics

In order to be executed directly as a script, a text file has to contain a specific string in its first line.
In the case of bash, this is usually
```bash
#!/bin/bash
```
For Python scripts, the line is usually
```python
#!/usr/bin/env python3
```
These one-liners are called <b>shebangs</b> and tell the computer that the text file should be executed as a script using a specific interpreter.<br>
By convention, bash scripts use the <b>.sh</b> extension,<br>
while Python scripts use the <b>.py</b> extension.<br>
In principle, any text file can be executed as a script if you explicitly specify the interpreter to use before the name of the file:

<b>EXAMPLE</b><br>
```bash
$ bash myscript.sh
```
or<br>
```bash
$ bash anotherscript.txt
```

**REMINDER: Linux is an <u>extensionless system</u>, so the .sh extension is just a convention and Linux doesn't really care how the filename ends!**

However, specifying the <b>shebang</b> at the beginning of the file means that you don't need to specify the interpreter, and you can just launch the script using its filename:
```bash
$ ./myscript.sh
```
instead of:
```bash
$ bash myscript.sh
```

<div class="alert alert-block alert-warning">
<b>WARNING:</b> A file must be flagged as executable in Linux in order to be able to execute it without specifying the interpreter.<br>
This might require using the <b>chmod</b> command, for example:<br>
$ chmod +x myscript.sh<br>
The "+x" flag is telling Linux to add the "eXecutable" flag on the file, i.e. to treat it as something that can be executed directly, which is disabled by default for security reasons.

</div>
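
To see the effect of the executable flag, here is a small sketch that works in a throwaway folder. The folder created with `mktemp -d`, the script contents and the variable names are assumptions made for this illustration; `-x` is the test operator that checks whether a file is executable.

```bash
# Work in a throwaway folder so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a two-line script: the shebang plus one command
printf '%s\n' '#!/bin/bash' 'echo "hello from myscript"' > myscript.sh

# Newly created files are not executable
if [ -x myscript.sh ]; then before="executable"; else before="not executable"; fi

# Add the executable flag, then launch the script by its path
chmod +x myscript.sh
if [ -x myscript.sh ]; then after="executable"; else after="not executable"; fi
echo "before chmod: $before, after chmod: $after"

output=$(./myscript.sh)
echo "$output"
```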

### 2.1.1 Try to stay on the right $PATH
Wait, why was the script executed as:
```bash
$ ./myscript.sh
```
instead of simply typing
```bash
$ myscript.sh
```

This is a small but important detail. The question is: when you enter any command into the command line, how does Bash know where the program you are trying to invoke actually is? Remember: any program is a bunch of instructions written somewhere on the hard disk.

So if you enter:
```bash
$ pwd
```
how does the shell know where the program "pwd" is on the hard disk?<br>
It knows thanks to an <b><u>environment variable called $PATH</u></b>, which is a list of directories where executable programs are located. Since it is just a variable, you can look at the contents of the PATH variable by issuing the command:
```bash
echo $PATH
```
As you can see, PATH contains a list of folders. If you type a command without specifying its folder, Bash will look for that program in these directories. If the command you issued does not exist in any of these folders, you will get the famous
```bash
-bash: myscript.sh: command not found
```
error.<br>

Let's go back to our original question: why do I have to write
```bash
$ ./myscript.sh
```
instead of 
```bash
$ myscript.sh
```
The answer is that if you write "./myscript.sh", you are specifying the path of the executable (in this case, the current folder "./"). If you just type "myscript" or "myscript.sh", no path is specified, so Bash will look for that executable in the folders listed in $PATH and will likely fail to find it!
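
Conversely, a script can be launched by name alone once its folder is listed in PATH. The sketch below illustrates this under a few assumptions: the folder `mybin` is a temporary one created just for the demonstration, and `command -v` is used to ask the shell where it finds a command. The change to PATH only lasts for the current shell session.

```bash
# Hypothetical folder holding a personal script
mybin=$(mktemp -d)
printf '%s\n' '#!/bin/bash' 'echo "found via PATH"' > "$mybin/myscript.sh"
chmod +x "$mybin/myscript.sh"

# Append the folder to PATH (for this shell session only)
PATH="$PATH:$mybin"

# command -v prints the location a command name resolves to
resolved=$(command -v myscript.sh)
echo "myscript.sh resolves to: $resolved"

# The script can now be launched without ./ or a full path
path_output=$(myscript.sh)
echo "$path_output"
```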

## 2.2 Let's get started with scripts then

<b>EXAMPLE</b><br>
Let's see the overall structure of a script that prints the name of the current folder (working directory), named currentfolder.sh:
```bash
#!/bin/bash
# This is a comment and will not be executed
# same for this line
echo "the current folder is $(pwd)" # this will be executed
# Graceful exit:
exit 0
```

A couple of things to notice here:
* \# is the comment symbol, except in the shebang (1st line)
* To execute a command and use its output, e.g. within a string, place it inside $(...). This is called <b><u>COMMAND SUBSTITUTION</u></b>, and it is also how you store the output of a command in a variable to re-use it later in your script!
* Comments can be inline with commands
* Exit codes are useful for debugging! By convention, 0 means the execution was successful
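
As a short sketch of command substitution used to fill variables (the variable names are illustrative, and `basename` is used here on the assumption that it is available, as it is on standard Linux systems):

```bash
# Store the output of pwd in a variable
current_dir=$(pwd)

# Substitutions can feed other commands: basename strips the leading folders
folder_name=$(basename "$current_dir")

echo "the current folder is $folder_name (full path: $current_dir)"
```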

### 2.2.1 Recap on variables:

As you might remember, variables are a useful way to temporarily store information that you might need during the execution of the script (usually <u>strings</u>). Remember that when you need to access a variable (i.e. read its content), you must use the "\$" sign before the name of the variable. On the other hand, when you set the value of a variable (i.e., you write to it), you shouldn't use the "\$" sign.

In the context of scripting, there are a couple of <b>standard variables</b> that you should be aware of:
* The variable <b>\$0</b> is the name of the script being executed
* The arguments you pass after the script name on the command line can be accessed from within the script as <b>\$1, \$2, \$3, etc. </b>
* The variable <b>\$#</b> contains the number of arguments passed to the script
* The variable <b>\$@</b> is the actual list of those passed arguments
* <b>\$?</b> is a variable that stores the exit code of the last executed process. Remember: 0 means execution without errors, anything else is an error code.
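* <b>!!</b> is not a variable but a <u>history expansion</u>: on the interactive command line it is replaced by the last command that was issued. It is not available inside scripts by default.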
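
To see these variables in action, the sketch below writes a tiny script to a temporary folder and runs it with three arguments. The script name `showargs.sh`, the temporary folder and the here-document (`<< 'EOF'`) used to create the file are assumptions made for this illustration.

```bash
# Write a small demo script to a temporary folder
demo_dir=$(mktemp -d)
cat > "$demo_dir/showargs.sh" << 'EOF'
#!/bin/bash
echo "script name: $0"
echo "first argument: $1"
echo "number of arguments: $#"
for arg in "$@"
do
    echo "argument: $arg"
done
EOF

# Run it with three arguments; the second one contains a space
args_output=$(bash "$demo_dir/showargs.sh" alpha "beta gamma" delta)
echo "$args_output"
```

Writing `"$@"` inside double quotes keeps each argument intact, so "beta gamma" is still treated as a single argument inside the script.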

**EXAMPLE**<br>
Such variables can be very useful. For example, if you want to follow two different branches within a script based on the successful/unsuccessful execution of a given command, you can use something like:
```bash 
# ...
command_which_might_fail <argument1> <argument2>
# $? holds the exit code of the command on the line above
if [ $? -eq 0 ]; then
  echo "Last command was successful! Grab a beer 🍺"
else
  echo "The command above failed. You will never graduate 😈"
fi
# ...
```
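
Since `if` simply checks the exit code of whatever command follows it, the command can also be placed directly after `if`. The sketch below uses a hypothetical function `might_fail`, which always returns exit code 3, as a stand-in for a real command:

```bash
# Hypothetical stand-in for a command that fails with exit code 3
might_fail() {
    return 3
}

if might_fail; then
    outcome="success"
else
    # $? still holds the exit code of might_fail at this point
    outcome="failed with exit code $?"
fi
echo "$outcome"
```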

**CAUTION**<br>
Remember that while you can set a variable as:
```bash
$ var1=Ciao
```
this becomes problematic if you have e.g. spaces:
```bash
$ var1=Ciao Ciao
```
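which will result in the following error, because Bash reads `var1=Ciao` as a temporary assignment and then tries to run the second `Ciao` as a command:
```bash
-bash: Ciao: command not found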
```
You must use quotation marks to set variables containing spaces:
```bash
$ var1="Ciao Ciao"
```
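
Quotation marks matter when reading such a variable too. As a sketch (the helper function `count_args` is hypothetical and simply reports how many arguments it received):

```bash
var1="Ciao Ciao"

# Hypothetical helper: prints the number of arguments it was given
count_args() {
    echo "$#"
}

unquoted=$(count_args $var1)    # the value is split at the space
quoted=$(count_args "$var1")    # the value is kept as a single argument
echo "unquoted: $unquoted arguments, quoted: $quoted argument"
```

Without the quotation marks the value is split into two separate words, which is a common source of bugs with filenames containing spaces.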


### 2.2.2 Exercise
Use all your current and past knowledge to do the following exercise:<br>
Create a script named "proteinstats.sh" which does the following:<br>
* Reads all the files in a folder having a ".pdb" extension
* Creates a file named "stat.csv" containing a row for each read pdb file, formatted as: "\<filename\>,\<no. of residues\>"
* Creates and updates a single file called "maxres.stat" which contains a single row formatted as: "\<filename\> has the highest number of residues (\<number\>)." with the name of the pdb file containing the most residues and the corresponding number.

<div class="alert alert-block alert-warning">
<b>HINTS</b><br>
for, if, >, >>
</div>


**Example of expected result**<br>
```bash
$ bash proteinstats.sh
$ cat stat.csv
TAS1R1.pdb,305
TAS1R2.pdb,398
...
TAS2R3.pdb,320
$ cat maxres.stat
TAS1R2.pdb has the highest number of residues (398).
```