In [1]:
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

In [2]:
cd ~/shell-course

In [3]:
echo $SHELL

/bin/bash


# A brief introduction to the bash shell
Presented for QLSC 612, [Fundamentals of Neuro Data Science 2021](https://neurodatasci-course-2020.netlify.app) by [Sebastian Urchs](https://surchs.com).

Based (very extensively) on the [excellent lecture with the same name](https://github.com/neurodatascience/course-materials-2020/tree/master/lectures/11-may/03-intro-to-shell) by [Ross Markello](https://rossmarkello.com) from 2020 and the Software Carpentry ["Introduction to the Shell"](https://swcarpentry.github.io/shell-novice/) course.

## Before we get started...
    
We're going to be working with a dataset from https://swcarpentry.github.io/shell-novice/data/data-shell.zip.

Download that file and unzip it in your home directory:
- on a Mac: /Users/your-user-name
- on Linux or WSL: /home/your-user-name

## What is the Shell

- A shell is a program that **interprets user input** into something the computer can understand

- A command-line shell runs inside a terminal that lets you type text
    - we call this a **command-line interface** (CLI), because you type commands in a line of text
    - this is in contrast to a **graphical user interface** (GUI) that you typically use
- command-line shell programs expect you to write **commands in a scripting language**

![shell-explained](images/shell_explained.png)

### But what's this "bash shell"?

It's one of many available shells!

* `sh` - Bourne **SH**ell
* `ksh` - **K**orn **SH**ell
* `dash` - **D**ebian **A**lmquist **SH**ell
* `csh` - **C** **SH**ell
* `tcsh` - **T**ENEX **C** **SH**ell
* `zsh` - **Z** **SH**ell
* `bash` - **B**ourne **A**gain **SH**ell  <-- We'll focus on this one!

### WHY SO MANY?!

* They all have different strengths / weaknesses
* You will see many of them throughout much of neuroimaging software, too!
    * `sh` is most frequently used in FSL
    * `csh`/`tcsh` is very common in FreeSurfer and AFNI

### So we're going to focus on the bash shell?

Yes! It's perhaps **the most common** shell, available on almost every OS:

* It's **the default** shell on most Linux systems
* It's the default shell in the Windows Subsystem for Linux (WSL)
* It's the default shell on Mac <=10.14
    * `zsh` is the new default on Mac Catalina (for licensing reasons 🙄)
    * But `bash` is still available!!

### Alright, but why use the shell at all?

Isn't the GUI good enough?

- The GUI is great, but the shell is **very powerful**
- Some tasks take many "clicks" in a GUI; the shell is often extremely good at automating these
- You can write sequences of shell commands to connect the outputs of programs to other programs (pipelines)
- You can store the shell commands you used in a script file and execute them again later
    - this is a great way to document what you have done
    - it makes your work reproducible in a way that describing the "clicks" could not
- Also, you need to use the shell to access remote machines / high-performance computing environments

**NOTE:** We will not be able to cover all (or even most) aspects of the shell today. 

But, we'll get through some _basics_ that you can build on in the coming weeks.

## The (bash) shell

Now, let's open up your terminal!

* **Windows**: Open the Ubuntu application
* **Mac/Linux**: Open the Terminal application

![shell](images/bash_linux.png)


When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input:


```
$
```


The shell typically uses `$` as the prompt, but may use a different symbol.



**IMPORTANT:** When typing commands, either in this lesson or from other sources, **do not type the prompt**, only the commands that follow it!

### Am I using the bash shell?

Let's check! You can use the following command to determine what shell you're using:

In [6]:
echo $SHELL

/bin/bash


If that doesn't say something like `/bin/bash`, then simply type `bash`, press Enter, and try running the command again.

Voila! You're now in the bash shell.

**Note**: We just ran our first shell command!

The `echo` command does exactly what its name implies: it simply echoes whatever we provide it to the screen!

(It's like `print` in Python / R or `disp` in MATLAB or `printf` in C or ...)

### What's with the `$SHELL`?

* Things prefixed with `$` in bash are (mostly) **environment variables**
    * All programming languages have variables!
* We can assign variables in bash but when we want to reference them we need to add the `$` prefix
* We'll dig into this a bit more later, but by default our shell comes with some preset variables
    * `$SHELL` is one of them and it stores the path to the shell program that currently interprets our commands
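
As a minimal sketch of the assign-then-reference pattern described above: the variable name `course_name` is made up for illustration, while `HOME` is one of the preset variables most shells provide.

```shell
# Assign a variable: no spaces around "=" and no "$" on the left-hand side
# (course_name is a hypothetical name chosen for this example)
course_name="QLSC 612"

# Reference it: the "$" prefix tells bash to substitute the stored value
echo "Welcome to $course_name"

# Preset variables such as HOME are referenced in exactly the same way
echo "My home directory is $HOME"
```

Leaving out the `$` would make `echo` print the literal word `course_name` instead of the value stored in it.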

## Navigating Files and Directories

* The **file system** is the part of our operating system for managing files and directories
* There are a lot of shell commands to create/inspect/rename/delete files + directories
    * Indeed these are perhaps the most common commands you'll be using in the shell!


### So where are we right now?

* When we open our terminal we are placed *somewhere* in the file system!
    * At any time while using the shell we are in exactly one place
* Commands mostly read / write / operate on files wherever we are, so it's important to know that!
* We can find our **current working directory** with the following command:

In [7]:
pwd

/home/surchs/shell-course


* Many bash commands are acronyms or abbreviations (to try and help you remember them).
    * The above command, `pwd`, is an acronym for "**p**rint **w**orking **d**irectory".

### Let's look at the file system

Let's take a look at an example file-system (for a Macintosh):

<img src="http://swcarpentry.github.io/shell-novice/fig/filesystem.svg" width="400px" style="margin-bottom: 10px;">

* The top (`/`) is the **root directory**, which holds the ENTIRE FILE SYSTEM.
* Inside are several other directories:
    * `bin` contains some built-in programs
    * `data` is where we store miscellaneous data files
    * `Users` is where personal user directories are
    * `tmp` is for temporary storage of files

**Note**: The filesystem on a Linux machine will have slightly different directory names (e.g. `/Users` is typically `/home`) but the same principles apply.

#### The `/` character has two meanings:

1. At the beginning of the path, it refers to the root directory
2. Inside a path, it is used as a separator between directories

So let's remind ourselves how to see where we are and figure out what's in our directory:

In [8]:
pwd

/home/surchs/shell-course


We are inside the `home` directory (`Users` on a Mac) for the user `surchs` (me), in a sub-directory called `shell-course`.

Let's see what is in here.

The `ls` command will list the contents of the directory we are currently in (i.e. the **current working directory**):

In [9]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


`ls`, as we saw before, prints the contents of your **current working directory**. 

We can make it tell us a bit more about our directory by providing an **option** to the `ls` command.

### General syntax of a shell command

Consider this command (where we are looking inside the `interesting_files` directory) as a general example:

In [10]:
ls -F interesting_files

i_can_see_variables.sh*  i_make_the_dir_of_doom.sh*  run_me_too.sh*
i_make_many_files.sh*    run_me.sh*                  the_meaning_of_life.txt


We have:

1. A **command** (`ls`), 
2. An **option** (`-F`), also called a **flag** or a **switch**, and
3. An **argument** (`interesting_files`)

#### Options (a.k.a. flags, switches)

* Options change the behavior of a command
* They generally start with either a `-` or `--`
* They are case sensitive!

For example, `ls -s` will display the size of the contents of the provided directory:

In [11]:
ls -s interesting_files

total 24
4 i_can_see_variables.sh     4 run_me.sh
4 i_make_many_files.sh       4 run_me_too.sh
4 i_make_the_dir_of_doom.sh  4 the_meaning_of_life.txt


`-l` is another option; it displays the directory contents as a detailed (long) list with one entry per line. We can combine several options in one command:

In [12]:
ls -sl interesting_files

total 24
4 -rwxrwxr-x 1 surchs surchs 142 Jul 14 01:58 i_can_see_variables.sh
4 -rwxrwxr-x 1 surchs surchs 326 Jul 13 02:08 i_make_many_files.sh
4 -rwxrwxr-x 1 surchs surchs 626 Jul 13 22:41 i_make_the_dir_of_doom.sh
4 -rwxrwxr-x 1 surchs surchs  87 Jul 14 01:16 run_me.sh
4 -rwxrwxr-x 1 surchs surchs 124 Jul 14 01:29 run_me_too.sh
4 -rw-rw-r-- 1 surchs surchs   3 Jul 13 12:54 the_meaning_of_life.txt


#### Arguments (a.k.a. parameters)

* These tell the command what to operate on!
* They are only *sometimes* optional (as with `ls`)
    * In these cases, providing them will also change the behavior of the command!
    
Compare:

In [13]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


In [14]:
ls flying_circus

brian.txt  dangerous_rabbits.txt  my_lines  the_holy_grail.txt


If we do not give `ls` an argument, it will list the contents of the current working directory.

#### So many options, where can I find help?

Either `man ls` or `ls --help`!

This will vary depending on: (1) the command and (2) your operating system!

Generally try `man` first:

In [15]:
man ls

LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci‐
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              with  -l,  scale  sizes  by  SIZE  when  printing  them;   e.g.,
              '--block-size=M'; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with -lt: s

### You can ask the shell what a command does

Sometimes you don't want to read the entire `man` page, but really just want to remember what a command does. A really useful helper tool is `whatis`. 

In [16]:
whatis

whatis what?



`whatis` **does** expect an argument: the name of the command you want to know about. Let's see what `ls` does:

In [17]:
whatis ls

ls (1)               - list directory contents


Not every command has a description:

In [18]:
whatis cd

cd: nothing appropriate.
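
One likely reason `whatis` finds nothing here is that `whatis` searches the manual pages, while `cd` is built into the shell itself rather than being a separate program. As a small sketch, the standard `type` command reports how the shell would interpret a name (the variable `cd_kind` is only introduced for this example):

```shell
# "type" reports how the shell would interpret a command name
type cd    # cd is a shell builtin, so there is no separate program behind it
type ls    # ls is a separate program; type typically prints where it was found

# Store the description in a variable so we can look at it again
cd_kind=$(type cd)
echo "$cd_kind"

# In bash, builtins are documented by the "help" builtin instead of man:
# help cd
```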



### We can do more than list directories

So many interesting things to see, let's change to a different working directory so we can do things there. 

- The `cd` or "change directory" command will let us do that

**Note**: This is analogous to clicking and opening a directory in your graphical file explorer

In [19]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


In [20]:
cd flying_circus

Let's confirm that we have indeed changed directory by calling `pwd`:

In [21]:
pwd

/home/surchs/shell-course/flying_circus


What happens if we run `cd` without any arguments?

In [22]:
cd

In [23]:
pwd

/home/surchs


We are back in our home directory! This is *incredibly* useful if you've gotten lost.

- `cd` without arguments brings you to your home directory
- the `~` (tilde) character is a shorthand for your home directory. So `cd ~` also brings you there
- the `-` (dash) character is a shorthand for the previous directory you were in. So `cd -` brings you back to where you just were
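
The three shortcuts above can be tried out safely with a sketch like the following. It assumes `mktemp -d` is available to create a throwaway directory; the names `sandbox`, `back_in` and `at_home` are made up for this example.

```shell
# Work in a throwaway directory so nothing important is touched
sandbox=$(mktemp -d)
cd "$sandbox"

cd                  # no argument: go to the home directory
pwd                 # prints the home directory

cd - > /dev/null    # "-" returns to the previous directory
                    # (cd - also prints that path; we hide it here)
back_in=$(pwd)      # we are in the sandbox again

cd ~                # "~" is another way to name the home directory
at_home=$(pwd)

echo "previous directory was: $back_in"
echo "home directory is:      $at_home"

rmdir "$sandbox"    # remove the (empty) throwaway directory again
```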

Let's get back to the `flying_circus` directory again.

In [24]:
cd shell-course/flying_circus

In [25]:
pwd

/home/surchs/shell-course/flying_circus


We can string together directory names with the `/` separator instead of changing one directory at a time! Because the path we gave to `cd` did not start with the file system root directory (`/`), it was interpreted as a relative path, i.e. relative to the directory we called `cd` from (here, our home directory).

### Relative versus absolute paths

So far, we have been using **relative** paths to change directories and list their contents.

- A **relative** path is **relative to the current working directory**. It does **not** begin with the file system root (`/`).
- An **absolute** path includes the entire path beginning with the file system root directory (`/`).
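
To make the distinction concrete, here is a small sketch that reaches the same directory once through an absolute path and once through a relative path. The directory name `project` and the variable names are hypothetical, and `mktemp -d` is assumed to return an absolute path (it normally does).

```shell
sandbox=$(mktemp -d)        # throwaway scratch directory
mkdir "$sandbox/project"

# Absolute path: starts at the root (/), so it works from anywhere
cd "$sandbox/project"
via_absolute=$(pwd)

# Relative path: interpreted starting from the current working directory
cd "$sandbox"
cd project
via_relative=$(pwd)

echo "$via_absolute"
echo "$via_relative"        # the same directory, reached two different ways
```

The relative form `cd project` only works while we are inside the sandbox directory; the absolute form works no matter where we are.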

`pwd` prints the **absolute** path of the current working directory:

In [26]:
pwd

/home/surchs/shell-course/flying_circus


Let's take a look around in this directory

In [27]:
ls

brian.txt  dangerous_rabbits.txt  my_lines  the_holy_grail.txt


On second thought, let's not go here, 'tis a silly place.

How do we go back? There's a special notation to move one directory up:

In [28]:
cd ..

Here, `..` refers to "the directory containing this one". This is also called the **parent** of the current directory.

Let's check that we are where we think we are:

In [29]:
pwd

/home/surchs/shell-course


### Seeing the unseen

`ls` is supposed to list the contents of our directory, but we didn't see `..` anywhere in the listings from before, right?

`..` is a special directory that is normally hidden. We can provide an additional option to `ls` to make it appear:

In [30]:
ls -a

.   dir_of_doom    helloworld.txt  interesting_files
..  flying_circus  .i_am_hidden    notes


The `-a` option (show **a**ll contents) will list ALL the contents of our current directory, including special and hidden files/directories, like:

* `..`, which refers to the parent directory
* `.`, which refers to the current working directory

### Hidden files

The last command also revealed a `.i_am_hidden` file:

In [31]:
ls -aFl

total 28
drwxrwxr-x 6 surchs surchs 4096 Jul 14 10:02 ./
drwxrwxr-x 6 surchs surchs 4096 Jul 14 10:02 ../
drwxrwxr-x 4 surchs surchs 4096 Jul 14 10:02 dir_of_doom/
drwxrwxr-x 3 surchs surchs 4096 Jul 13 23:58 flying_circus/
-rw-rw-r-- 1 surchs surchs   11 Jul 13 01:58 helloworld.txt
-rw-rw-r-- 1 surchs surchs    0 Jul 13 12:29 .i_am_hidden
drwxrwxr-x 2 surchs surchs 4096 Jul 14 01:58 interesting_files/
drwxrwxr-x 2 surchs surchs 4096 Jul 14 03:07 notes/


The `.` prefix is usually reserved for configuration files, and prevents them from cluttering the terminal when you use `ls`.
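
A quick way to see this for yourself, sketched in a throwaway directory. The file names are invented, and `touch`, which is not covered in this lesson, is assumed here simply as a standard way to create empty files.

```shell
sandbox=$(mktemp -d)               # throwaway scratch directory
cd "$sandbox"
touch visible.txt .hidden_config   # touch creates two empty files

plain_listing=$(ls)                # the dot-file is left out
full_listing=$(ls -a)              # includes ., .. and .hidden_config

echo "ls shows:"
echo "$plain_listing"
echo "ls -a shows:"
echo "$full_listing"
```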

### Summary


* The file system is responsible for managing information on the disk
* Information is stored in files, which are stored in directories (folders)
* Directories can also store other (sub-)directories, which forms a directory tree
* `cd path` changes the current working directory
* `ls path` prints a listing of a specific file or directory; `ls` on its own lists the current working directory.
* `pwd` prints the user’s current working directory
* `/` on its own is the root directory of the whole file system
* A relative path specifies a location starting from the current location
* An absolute path specifies a location from the root of the file system
* `..` means "the directory above the current one"; `.` on its own means "the current directory"

## Modifying files and directories

So far we have mainly looked at contents of files and directories. But we can of course also make changes. Let's first see again where we are:

In [32]:
pwd

/home/surchs/shell-course


In [33]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


### Creating a directory
Let us create a subdirectory called `notes`. We can use a program called `mkdir` for this.

In [34]:
mkdir notes

mkdir: cannot create directory ‘notes’: File exists



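Since we provided a relative path, we can expect the directory to have been created in our current working directory. (The `File exists` error above only means that `notes` was already there when this notebook was run.)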

In [35]:
ls -F

dir_of_doom/  flying_circus/  helloworld.txt  interesting_files/  notes/


(You could have also opened up the file explorer and made a new directory that way, too!)
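
If you would rather not get a `File exists` error when a directory is already there, `mkdir` has a `-p` option. The sketch below runs in a throwaway directory, and the `results/2021/july` path is invented purely for illustration.

```shell
sandbox=$(mktemp -d)           # throwaway scratch directory
cd "$sandbox"

mkdir notes                    # creates ./notes
mkdir -p notes                 # -p: no error if the directory already exists
mkdir -p results/2021/july     # -p also creates any missing parent directories

ls -F                          # lists notes/ and results/
```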

### Good naming conventions

1. Don't use spaces
2. Don't begin the name with `-`
3. Stick with letters, numbers, `.`, `-`, and `_`
    - That is, avoid other special characters like `~!@#$%^&*()`
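
To illustrate why spaces in particular cause trouble, here is a small sketch in a throwaway directory (the directory names are made up). The shell splits a command on spaces, so an unquoted name containing a space is treated as two separate arguments.

```shell
sandbox=$(mktemp -d)      # throwaway scratch directory
cd "$sandbox"

# Unquoted, the space splits the name into two separate arguments,
# so this creates TWO directories: "my" and "notes"
mkdir my notes

# Quoted, the name is passed as ONE argument that contains a space
mkdir "my notes"

ls -F                     # three directories, one of which needs quotes every time
```

Names like `my_notes` or `my-notes` avoid the need for quoting altogether.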

### Creating a text file

Let's
- navigate into our (empty) `notes` directory (with `cd`)
- confirm that it is in fact empty (with `ls`)
- and create a new file. For this we can use `nano`

In [36]:
cd notes

In [37]:
# nano my_note.txt

`nano` is a useful command-line **text editor**. It only works with plain text (i.e., no graphs, figures, tables, or images!)

(You may be familiar with graphical editors like Gedit, Notepad, or TextEdit, or other command line editors like Emacs or Vim.)

![nano](images/nano_note.png)
`nano` uses the Control (CTRL) and ALT keys to make changes. The command help along the bottom of the editor window refers to these keys with abbreviations:

- `^` for CTRL: `^G`  means "press and hold CTRL together with the `G` key"
- `M` for ALT:  `M-U` means "press and hold ALT together with the `U` key"

Let's save our note with `^O`, i.e. `CTRL+O` (the letter o)


`nano` doesn't print anything to screen, so let's make sure our file exists:

In [38]:
ls -F

To check that we have indeed written to this file, let's display its contents. We can do this with `cat`

In [41]:
cat my_note.txt

cat: my_note.txt: No such file or directory



### Moving files and directories

Let's first go back up to our `shell-course` directory

In [42]:
cd ~/shell-course

In [43]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


Let's look into this `dir_of_doom`.

In [44]:
cd dir_of_doom

In [45]:
ls -F

big_file_with_no_purpose.txt  the_right_dir/  the_wrong_dir/


In [46]:
ls -F the_wrong_dir

my_file1.txt  my_file2.txt  my_file3.txt  my_file4.txt


All of these files are in the wrong directory. 

Let's move the files in `the_wrong_dir` to `the_right_dir`. We can use the `mv` command for this!

In [47]:
mv the_wrong_dir/my_file1.txt the_right_dir

The first argument of `mv` is the file we're moving, and the last argument is where we want it to go!

Let's make sure that worked:

In [48]:
ls the_right_dir

my_file1.txt


We can provide more than two arguments to `mv`, as long as the final argument is a directory! That would mean "move all these things into this directory".

In [49]:
mv the_wrong_dir/my_file2.txt the_wrong_dir/my_file3.txt the_right_dir

We can make our life easier by using wildcards! Wildcards are simple patterns that can stand in for characters in a file name:
- `*` (the asterisk) matches any character zero or more times, e.g. `*.txt` matches both `a.txt` and `any.txt` (any file ending in `.txt`)
- `?` (the question mark) matches any character exactly once, e.g. `?.txt` matches `a.txt` but not `any.txt`

We can use wildcards to move any file that fits our pattern so we don't have to type each individual file name.

In [50]:
ls the_right_dir/my_file?.txt

the_right_dir/my_file1.txt  the_right_dir/my_file3.txt
the_right_dir/my_file2.txt
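
The same patterns work with `mv`. The following is a minimal sketch in a throwaway directory. The `inbox` and `sorted` directories and all file names are invented for this example, and `touch` is assumed as a standard way to create empty files.

```shell
sandbox=$(mktemp -d)      # throwaway scratch directory
cd "$sandbox"
mkdir inbox sorted
touch inbox/my_file1.txt inbox/my_file2.txt inbox/my_file10.txt inbox/readme.md

# The shell expands the pattern into the matching file names BEFORE mv runs.
# "?" matches exactly one character, so my_file10.txt is not matched.
mv inbox/my_file?.txt sorted

ls sorted                 # my_file1.txt and my_file2.txt
ls inbox                  # my_file10.txt and readme.md are left behind

# "*" matches any number of characters, so this picks up my_file10.txt too
mv inbox/my_file*.txt sorted
```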


**Note**: `mv` is **quite dangerous**, because it will silently overwrite files if the destination already exists! Refer to the `-i` flag for "interactive" moving (with warnings!).

### Copying files and directories

The `cp` (**c**o**p**y) command is like `mv`, but copies instead of moving!
Let's use it to make a backup of the files in `the_right_dir`

In [51]:
mkdir backup

In [52]:
cp the_right_dir/my_file1.txt backup

Let's confirm we have copied the file into `backup` and it is also still in `the_right_dir`. We could run two `ls` commands, but we can also just use a wildcard to look inside all directories!

In [53]:
ls */my_file1.txt

backup/my_file1.txt  the_right_dir/my_file1.txt


Let's copy the complete `the_right_dir` to `backup`

In [54]:
cp the_right_dir backup

cp: -r not specified; omitting directory 'the_right_dir'



To copy a directory and all of its contents, we have to use the `-r` (**r**ecursive) flag:

In [55]:
cp -r the_right_dir backup

In [56]:
ls backup

my_file1.txt  the_right_dir


### Removing files

There is a large and useless file in our directory. Let's remove it. We can use `rm` to **r**e**m**ove it:

In [57]:
ls -lhS

total 24K
-rw-rw-r-- 1 surchs surchs  10K Jul 14 10:02 big_file_with_no_purpose.txt
drwxrwxr-x 3 surchs surchs 4.0K Jul 14 10:03 backup
drwxrwxr-x 2 surchs surchs 4.0K Jul 14 10:03 the_right_dir
drwxrwxr-x 2 surchs surchs 4.0K Jul 14 10:03 the_wrong_dir


In [58]:
rm big_file_with_no_purpose.txt

The `rm` command deletes files. Let's check that the file is gone:

In [59]:
ls big_file_with_no_purpose.txt

ls: cannot access 'big_file_with_no_purpose.txt': No such file or directory



### Deleting is **FOREVER** 💀💀

* The shell DOES NOT HAVE A TRASH BIN.
* You CANNOT recover files that have been deleted with `rm`
* But, you can use the `-i` flag to do things a bit more safely!
    * This will prompt you to type `Y` or `N` before every file that is going to be deleted.

### Removing directories

Let's try and remove `the_wrong_dir`:

In [60]:
rm the_wrong_dir

rm: cannot remove 'the_wrong_dir': Is a directory



`rm` only works on files, by default, but we can tell it to **r**ecursively delete a directory and all its contents with the `-r` flag:

In [61]:
rm -r the_wrong_dir

Because **deleting is forever 💀💀**, the `rm -r` command should be used with GREAT CAUTION.
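
One cautious habit, offered here as a suggestion rather than a rule from this lesson, is to preview what a wildcard pattern matches before handing it to `rm`. The sketch below uses invented file names in a throwaway directory.

```shell
sandbox=$(mktemp -d)      # throwaway scratch directory
cd "$sandbox"
touch draft1.txt draft2.txt final.txt

# Step 1: preview. echo only prints what the pattern expands to; nothing is deleted
echo draft*.txt

# Step 2: once the list looks right, reuse the exact same pattern with rm
rm draft*.txt

remaining=$(ls)
echo "$remaining"         # only final.txt is left
```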

### Summary

* `cp old new` copies a file
* `mkdir path` creates a new directory
* `mv old new` moves (renames) a file or directory
* `rm path` removes (deletes) a file
* `*` matches zero or more characters in a filename, so `*.txt` matches all files ending in `.txt`
* `?` matches any single character in a filename, so `?.txt` matches `a.txt` but not `any.txt`
* The shell does not have a trash bin: once something is deleted, it’s really gone

## Finding things with the shell

Oftentimes, our file system can be quite complex, with sub-directories inside sub-directories inside sub-directories.

What happens if we want to find one (or several) files, without having to type `ls` hundreds or thousands of times?

First, let's navigate back to the `shell-course` directory:

In [62]:
cd ~/shell-course

Let's get our bearings with `ls`:

In [63]:
ls

dir_of_doom  flying_circus  helloworld.txt  interesting_files  notes


Unfortunately, this doesn't list any of the files in the sub-directories. But we know from our previous exploration that there are files and sub-directories. We can display the full sub-directory tree with the `tree` command:

In [64]:
tree

.
├── dir_of_doom
│   ├── backup
│   │   ├── my_file1.txt
│   │   └── the_right_dir
│   │       ├── my_file1.txt
│   │       ├── my_file2.txt
│   │       └── my_file3.txt
│   └── the_right_dir
│       ├── my_file1.txt
│       ├── my_file2.txt
│       └── my_file3.txt
├── flying_circus
│   ├── brian.txt
│   ├── dangerous_rabbits.txt
│   ├── my_lines
│   └── the_holy_grail.txt
├── helloworld.txt
├── interesting_files
│   ├── i_can_see_variables.sh
│   ├── i_make_many_files.sh
│   ├── i_make_the_dir_of_doom.sh
│   ├── run_me.sh
│   ├── run_me_too.sh
│   └── the_meaning_of_life.txt
└── notes
    └── my_note.txt

8 directories, 18 files


`tree` has options to display additional information, only show a certain depth of the tree and even filter certain file names. But if we are searching for a certain file name pattern, there is a better tool for us: 

`find`

In [65]:
find . -name 'my_*'

./flying_circus/my_lines
./notes/my_note.txt
./dir_of_doom/backup/my_file1.txt
./dir_of_doom/backup/the_right_dir/my_file1.txt
./dir_of_doom/backup/the_right_dir/my_file3.txt
./dir_of_doom/backup/the_right_dir/my_file2.txt
./dir_of_doom/the_right_dir/my_file1.txt
./dir_of_doom/the_right_dir/my_file3.txt
./dir_of_doom/the_right_dir/my_file2.txt


Remember, `.` means "the current working directory". 

Here, `find` begins the search in the current working directory and then traverses the entire directory structure. With the `-name` option, we specify a pattern that includes a wildcard to specify the names we are looking for.

One of the results here is a directory. We can filter the results further by specifying that we only want to see **f**ile matches.

In [66]:
find . -name 'my_*' -type f

./notes/my_note.txt
./dir_of_doom/backup/my_file1.txt
./dir_of_doom/backup/the_right_dir/my_file1.txt
./dir_of_doom/backup/the_right_dir/my_file3.txt
./dir_of_doom/backup/the_right_dir/my_file2.txt
./dir_of_doom/the_right_dir/my_file1.txt
./dir_of_doom/the_right_dir/my_file3.txt
./dir_of_doom/the_right_dir/my_file2.txt
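
The `-type` filter works for directories as well (`-type d`). Here is a self-contained sketch using an invented directory layout in a throwaway directory. `mkdir -p` is assumed here to create nested directories in one step, and `touch` to create empty files.

```shell
sandbox=$(mktemp -d)      # throwaway scratch directory
cd "$sandbox"
mkdir -p data/my_scans    # -p creates data/ and data/my_scans/ in one go
touch data/my_scans/my_subject1.txt data/other.txt

# Quote the pattern so that find, not the shell, interprets the wildcard
only_files=$(find . -name 'my_*' -type f)   # regular files only
only_dirs=$(find . -name 'my_*' -type d)    # directories only

echo "$only_files"
echo "$only_dirs"
```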


### Finding things inside of files

Searching for files and directories based on their names and meta-data is helpful, but often it is interesting to search inside a file as well. 

For this, we can use `grep`. This is an abbreviation for "**g**lobally search for a **r**egular **e**xpression and **p**rint matching lines". If you can't remember this, just ask `whatis grep`.

In [67]:
whatis grep

grep (1)             - print lines that match patterns


Let's take a look in `helloworld.txt` and then use `grep` to search for what we find inside.

In [69]:
cat helloworld.txt

Hello Bash


In [70]:
grep "Bash" helloworld.txt

Hello Bash


The directory `flying_circus` contains the movie scripts for two Monty Python movies. Only one of them has a rabbit as an actor. Let's find out which one it is:

In [71]:
grep "rabbit" -i --count --no-messages flying_circus/*

flying_circus/brian.txt:0
flying_circus/dangerous_rabbits.txt:0
flying_circus/my_lines:0
flying_circus/the_holy_grail.txt:22



OK, only one of these files seems to have any mention of rabbits in it. We can use `man` to understand the options used here.

**Note** that the file `dangerous_rabbits.txt` was not a match, even though its file name contains "rabbit": `grep` searches the contents of files, not their names.
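
To see what the `-i` and count options do on a small scale, here is a sketch with two invented files in a throwaway directory. It uses `-c`, the short form of `--count`. `printf` and the `>` redirection, which this lesson does not cover, are assumed here only as a way to write a few lines of text into a file.

```shell
sandbox=$(mktemp -d)      # throwaway scratch directory
cd "$sandbox"

# printf writes the text; ">" sends it into a file instead of to the screen
printf 'A Rabbit appears\nNo animals here\nthe rabbit attacks\n' > scene.txt
printf 'nothing to see\n' > rabbit_free.txt

# -c counts matching lines instead of printing them
case_sensitive=$(grep -c "rabbit" scene.txt)        # only the lowercase "rabbit" line
case_insensitive=$(grep -c -i "rabbit" scene.txt)   # -i also matches "Rabbit"

# The file NAME contains "rabbit", but grep only looks at the contents.
# grep signals "no match" with a non-zero exit status, hence the "|| true".
other_file=$(grep -c -i "rabbit" rabbit_free.txt) || true

echo "$case_sensitive $case_insensitive $other_file"
```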

In [72]:
man grep

GREP(1)                          User Commands                         GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

SYNOPSIS
       grep [OPTION...] PATTERNS [FILE...]
       grep [OPTION...] -e PATTERNS ... [FILE...]
       grep [OPTION...] -f PATTERN_FILE ... [FILE...]

DESCRIPTION
       grep  searches  for  PATTERNS  in  each  FILE.  PATTERNS is one or more
       patterns separated by newline characters, and  grep  prints  each  line
       that  matches a pattern.  Typically PATTERNS should be quoted when grep
       is used in a shell command.

       A FILE of “-”  stands  for  standard  input.   If  no  FILE  is  given,
       recursive  searches  examine  the  working  directory, and nonrecursive
       searches read standard input.

       In addition, the variant programs egrep, fgrep and rgrep are  the  same
       as  grep -E,  grep -F,  and  grep -r, respectively.  These variants are
       deprecated, but are provided for backward co

              this also causes the line number and byte offset (if present) to
              be printed in a minimum size field width.

       -u, --unix-byte-offsets
              Report Unix-style byte offsets.   This  switch  causes  grep  to
              report  byte offsets as if the file were a Unix-style text file,
              i.e., with  CR  characters  stripped  off.   This  will  produce
              results  identical  to  running  grep  on  a Unix machine.  This
              option has no effect unless -b option is also used;  it  has  no
              effect on platforms other than MS-DOS and MS-Windows.

       -Z, --null
              Output  a  zero  byte  (the  ASCII NUL character) instead of the
              character that normally follows a file name.  For example,  grep
              -lZ  outputs  a  zero  byte  after each file name instead of the
              usual newline.  This option makes the output  unambiguous,  even
              in the presence of fi

              expressions  to  fail.   This  option has no effect on platforms
              other than MS-DOS and MS-Windows.

       -z, --null-data
              Treat  input  and  output  data  as  sequences  of  lines,  each
              terminated by a zero byte (the ASCII NUL character) instead of a
              newline.  Like the -Z or --null option, this option can be  used
              with commands like sort -z to process arbitrary file names.

REGULAR EXPRESSIONS
       A  regular  expression  is  a  pattern that describes a set of strings.
       Regular  expressions  are   constructed   analogously   to   arithmetic
       expressions, by using various operators to combine smaller expressions.

       grep understands three different versions of regular expression syntax:
       “basic” (BRE), “extended” (ERE) and “perl” (PCRE).  In GNU  grep  there
       is  no difference in available functionality between basic and extended
       syntaxes.  In other implementations

              selected line when the -v command-line option is omitted,  or  a
              context line when -v is specified).  The default is 01;31, which
              means a bold red  foreground  text  on  the  terminal's  default
              background.

       GREP_COLORS
              Specifies  the  colors  and  other  attributes used to highlight
              various parts of the output.  Its  value  is  a  colon-separated
              list       of       capabilities      that      defaults      to
              ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36  with  the  rv
              and  ne  boolean  capabilities omitted (i.e., false).  Supported
              capabilities are as follows.

              sl=    SGR substring for whole selected  lines  (i.e.,  matching
                     lines when the -v command-line option is omitted, or non-
                     matching lines when -v is  specified).   If  however  the
                     boolean  rv capabili

              this environment variable's value is 1, do not consider the  ith
              operand  of  grep to be an option, even if it appears to be one.
              A shell can put  this  variable  in  the  environment  for  each
              command  it  runs,  specifying which operands are the results of
              file name wildcard expansion and therefore should not be treated
              as  options.   This  behavior  is  available only with the GNU C
              library, and only when POSIXLY_CORRECT is not set.

NOTES
       This man page is maintained only fitfully; the  full  documentation  is
       often more up-to-date.

COPYRIGHT
       Copyright 1998-2000, 2002, 2005-2020 Free Software Foundation, Inc.

       This is free software; see the source for copying conditions.  There is
       NO warranty; not even for MERCHANTABILITY or FITNESS FOR  A  PARTICULAR
       PURPOSE.

BUGS
   Reporting Bugs
       Email  bug reports to the bug-reporting address ⟨bug-

### Context: passing information with pipes

<img src="https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/25247b92-6844-4fef-8ed8-5055cc35bf58/ddzqjp9-2c0f4355-53fa-4a92-bde6-f61c25ecaf25.png?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcLzI1MjQ3YjkyLTY4NDQtNGZlZi04ZWQ4LTUwNTVjYzM1YmY1OFwvZGR6cWpwOS0yYzBmNDM1NS01M2ZhLTRhOTItYmRlNi1mNjFjMjVlY2FmMjUucG5nIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.S3c23OnhHItSBWvFydwXz5hpPXjf8js39kawm_nX0os" width="200px" style="margin-bottom: 10px;">

A strength of the shell is that you can connect the output of one command to the input of another command. To do so, you can use the `|` (pipe) character. When we connect commands together with the pipe (`|`) operator, we call the entire statement a **pipeline**.

Pipelines take the general form of:

`command1 -flags arguments | command2 -flags arguments`.

Let's say we want to use `grep` to search for occurrences of a word that we think could be quite common, like "Ni". We could just print all of the matches, but maybe we only want to see the last 10 occurrences. A pipe allows us to take the output of `grep` and give it to another command, `tail`, that does just that.

In [73]:
whatis tail

tail (1)             - output the last part of files


In [74]:
grep "Ni" --no-messages flying_circus/the_holy_grail.txt -nH | tail -n 10

flying_circus/the_holy_grail.txt:3838:       ... Recently Knights of Ni!
flying_circus/the_holy_grail.txt:3841:       Ni!
flying_circus/the_holy_grail.txt:3848:       More shrubberies!  More shrubberies for the ex-Knights of Ni!
flying_circus/the_holy_grail.txt:3859:       A path!  A little path for the late Knights of Ni!
flying_circus/the_holy_grail.txt:3861:   Chorus of "Ni!  Ni!"
flying_circus/the_holy_grail.txt:3891:       the Knights of Ni! cannot hear!
flying_circus/the_holy_grail.txt:4019:|      How about "The knights of Nicky-Nicky"?
flying_circus/the_holy_grail.txt:4842:       What's he do?  Nibble your bum?
flying_circus/the_holy_grail.txt:4982:       Armaments Chapter Two Verses Nine to Twenty One.
flying_circus/the_holy_grail.txt:5268:|   "Aaaaarrrrrrggghhh!  4 miles" and an arrow, and "Ni!  82 miles" and


`grep` and `tail` are two commands that each do a very specific thing. This is generally the case for shell commands on Unix systems, i.e. they follow the "Unix philosophy" of doing a single thing well. Pipes are a great way to combine the functionality of several commands to do what you want.
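
As a small self-contained sketch of the same idea, the pipeline below does not rely on the course files: `printf` produces a few made-up lines, `grep` keeps the ones containing "Ni", and `tail` or `wc` then works on whatever `grep` passed along. The sample text and the variable names are invented for illustration.

```shell
# Made-up sample text, one entry per line
sample_lines='Ni one
spam
Ni two
Ni three
eggs'

# grep filters the lines, tail keeps only the last two matches
last_two=$(printf '%s\n' "$sample_lines" | grep "Ni" | tail -n 2)
echo "$last_two"

# The same filtered output can instead be counted with wc
match_count=$(printf '%s\n' "$sample_lines" | grep "Ni" | wc -l)
echo "number of matches: $match_count"
```

Swapping the last command changes what happens to the matches without touching the `grep` step.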

**Note**: `grep` also has options of its own for limiting its output. For example, `--max-count` stops after a given number of matching lines, although it keeps the first matches rather than the last ones.

When it searches several files, `grep` will by default show us the file name and the line in the text that contains the pattern match. Let's look for the word "swallow". We will limit our matches to 10 and also ask `grep` to print out the line following each match, so we have more context.

In [75]:
grep "swallow" -i -n --max-count 10 --after-context 1 flying_circus/*

grep: flying_circus/my_lines: Is a directory
flying_circus/the_holy_grail.txt:414:       The swallow may fly south with the sun, or the house martin
flying_circus/the_holy_grail.txt-415-       or the plover seek warmer hot lands in winter, yet these are
--
flying_circus/the_holy_grail.txt:425:       What? A swallow carrying a coconut?
flying_circus/the_holy_grail.txt-426-
--
flying_circus/the_holy_grail.txt:431:|       I'll tell you why not ... because a swallow is about eight
flying_circus/the_holy_grail.txt-432-|       inches long and weighs five ounces, and you'd be lucky

: 2

Very interesting.

### Summary

- we can print the structure of any given directory with `tree`
- `find` is a great tool to search for files and directories based on their name and other meta-data like size, age, and so on
- `grep` is a great tool to search within (text) files for occurrences of a given string or even complex regular expressions
- pipes (`|`) allow us to combine the output of one command with the input of another command

## Scripts and variables

One of the most powerful features of the shell is that you can write your commands into a text file called a shell script, and then ask the shell to execute each command in the script in sequence.

This is very helpful if you want to 
- run the same set of commands repeatedly (e.g. every time you log into your computer)
- keep a detailed record of what commands you used to create an output
- share a set of commands with someone, or run their commands 

This is all very useful. So what do we need to do to turn a text file into a shell script?

Let's take a look into `interesting_files` where we will find some shell scripts.

In [76]:
cd interesting_files

In [77]:
ls -F

i_can_see_variables.sh*  i_make_the_dir_of_doom.sh*  run_me_too.sh*
i_make_many_files.sh*    run_me.sh*                  the_meaning_of_life.txt


### Anatomy of a shell script
Let's display the contents of the file `run_me.sh`

In [78]:
cat run_me.sh -n

     1	#!/bin/bash
     2	
     3	# The following command will print a message
     4	echo "Thank you, very kind!"


A shell script typically contains two things:

- the `#!/bin/bash` statement in line 1 is called a hash-bang (shebang) and declares which shell program should be used to execute the script. Here we use the bash shell
- the `echo "Thank you, very kind!"` statement in line 4 is the shell command. This is what gets executed

In addition:
- the text starting with `#` in line 3 is a comment. The `#` (hash) prevents the remaining text in the line from being executed. This is a good way to explain in human-readable form what your script does
- our text file also uses the file ending `.sh` to show that it is a shell script

### Anatomy of a shell script (contd.)

However, in order to run (i.e. "execute") the script, the right content is not enough. Our script file must also have the right permission.

Remember that `ls -F` displays executable files by appending a `*` to the file name.

In [88]:
ls -F

i_can_see_variables.sh*  i_make_the_dir_of_doom.sh*  run_me_too.sh
i_make_many_files.sh*    run_me.sh*                  the_meaning_of_life.txt


This looks good. Let's execute our script:

In [89]:
./run_me.sh

Thank you, very kind!


What would happen if we try to run `run_me_too.sh`?

In [90]:
ls -lF

total 24
-rwxrwxr-x 1 surchs surchs 142 Jul 14 01:58 i_can_see_variables.sh*
-rwxrwxr-x 1 surchs surchs 326 Jul 13 02:08 i_make_many_files.sh*
-rwxrwxr-x 1 surchs surchs 626 Jul 13 22:41 i_make_the_dir_of_doom.sh*
-rwxrwxr-x 1 surchs surchs  87 Jul 14 01:16 run_me.sh*
-rw-rw-r-- 1 surchs surchs 124 Jul 14 01:29 run_me_too.sh
-rw-rw-r-- 1 surchs surchs   3 Jul 13 12:54 the_meaning_of_life.txt


In [91]:
./run_me_too.sh

bash: ./run_me_too.sh: Permission denied


: 126

We can change the permission of this script with the `chmod` command.

In [92]:
chmod +x run_me_too.sh

In [93]:
ls -lF

total 24
-rwxrwxr-x 1 surchs surchs 142 Jul 14 01:58 i_can_see_variables.sh*
-rwxrwxr-x 1 surchs surchs 326 Jul 13 02:08 i_make_many_files.sh*
-rwxrwxr-x 1 surchs surchs 626 Jul 13 22:41 i_make_the_dir_of_doom.sh*
-rwxrwxr-x 1 surchs surchs  87 Jul 14 01:16 run_me.sh*
-rwxrwxr-x 1 surchs surchs 124 Jul 14 01:29 run_me_too.sh*
-rw-rw-r-- 1 surchs surchs   3 Jul 13 12:54 the_meaning_of_life.txt


In [94]:
./run_me_too.sh

I didn't have execution permission and now I do. How nice.
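
The whole sequence can be reproduced from scratch with the sketch below. It writes a tiny script into a temporary directory, tries to run it before and after `chmod`, and records what happens. The file name `hello_demo.sh` and its message are hypothetical. `u+x` grants execute permission to the file's owner only, a narrower variant of the `+x` used above.

```shell
# Hypothetical demo script in a throwaway directory
workdir=$(mktemp -d)
script="$workdir/hello_demo.sh"

# Write a two-line script: the shebang plus one command
cat > "$script" << 'EOF'
#!/bin/bash
echo "Hello from a script"
EOF

# A freshly written file has no execute permission, so this call fails
status_before=0
"$script" 2>/dev/null || status_before=$?

# Grant execute permission to the owner and try again
chmod u+x "$script"
output_after=$("$script")

echo "exit status before chmod: $status_before"
echo "output after chmod: $output_after"

rm -r "$workdir"
```

The exit status captured before `chmod` should match the `126` that the notebook printed after the "Permission denied" message.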


## How does the shell know where to find commands

Note that when we execute our own shell scripts, we have to specify their path. Just typing the file name does not work:

In [95]:
run_me.sh

run_me.sh: command not found


: 127

In [96]:
./run_me.sh

Thank you, very kind!
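
The numbers printed after the failed calls (`: 127` here, `: 126` earlier) appear to be exit statuses, which the shell also stores in the special variable `$?`. A minimal sketch of the difference between the two calls, assuming the current directory is not listed in `$PATH` and no program called `demo_script.sh` (a made-up name) is installed:

```shell
# Create a small executable script in a throwaway directory
demo_dir=$(mktemp -d)
printf '#!/bin/bash\necho "ran via explicit path"\n' > "$demo_dir/demo_script.sh"
chmod u+x "$demo_dir/demo_script.sh"
cd "$demo_dir"

# Bare name: the shell only searches $PATH, so the script is not found
bare_status=0
demo_script.sh 2>/dev/null || bare_status=$?

# Explicit path: ./ tells the shell exactly which file to run
path_output=$(./demo_script.sh)

echo "bare name exit status: $bare_status"
echo "explicit path output: $path_output"

# Return to the previous directory and clean up
cd - > /dev/null
rm -r "$demo_dir"
```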


But we can just type `cd` or `ls` and the shell knows what command we mean. How does this work?

When you execute a command like `ls`, the shell checks whether it knows of any program with that name. (A few commands, such as `cd`, are built into the shell itself.) On Ubuntu, `bash` is even set up to give you suggestions for typos or programs you could install but haven't:

In [97]:
pwwd


Command 'pwwd' not found, did you mean:

  command 'pawd' from deb am-utils (6.2+rc20110530-3.2ubuntu2)
  command 'pwd' from deb coreutils (8.30-3ubuntu2)

Try: sudo apt install <deb name>



: 127

Where does the shell look for these programs?

### The `$PATH` variable

Your shell has a variable called `$PATH` that contains all of the places where it will look for programs to run. We can use `echo` to take a look inside:

In [98]:
echo $PATH

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin


These are the directories our current shell is looking inside. 

**Note** how several values (here directories) are separated by the character `:`. The shell searches them from left to right and runs the first match it finds.
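
Because the entries are separated by `:`, a pipe to `tr` can print them one per line, which is easier to read. The sketch below uses a shortened, made-up `sample_path` so that its output is predictable; replacing it with `$PATH` shows your real search directories.

```shell
# Shortened stand-in for $PATH (made up for this example)
sample_path="/usr/local/bin:/usr/bin:/bin"

# tr replaces every ':' with a newline, giving one directory per line
echo "$sample_path" | tr ':' '\n'

# Count the entries by counting the resulting lines
dir_count=$(echo "$sample_path" | tr ':' '\n' | wc -l)
echo "number of directories: $dir_count"
```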

So which of these directories is e.g. `ls` inside of? Lucky for us, there is a shell command to tell us. It is called `which`:

In [99]:
which ls

/usr/bin/ls


`which` is a great helper tool to see which command you are currently calling. This can be immensely helpful when you have multiple versions of a tool with the same name in different locations (e.g. different python versions).
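
A related check is the built-in `command -v`, sketched below. For a program found via `$PATH` it prints the full path, much like `which`. For a command that is part of the shell itself, such as `cd`, it prints only the name, because there is no separate program file to point to. The variable names are illustrative, and in an interactive shell where `ls` is an alias the first line would show the alias definition instead.

```shell
# Ask the shell how it would resolve each command name
ls_location=$(command -v ls)
cd_location=$(command -v cd)

echo "ls resolves to: $ls_location"
echo "cd resolves to: $cd_location"
```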

We can also call `ls` by using its absolute path. Normally the shell just resolves this for us.

In [100]:
/usr/bin/ls

i_can_see_variables.sh	i_make_the_dir_of_doom.sh  run_me_too.sh
i_make_many_files.sh	run_me.sh		   the_meaning_of_life.txt


### You can change the `$PATH` variable

When you start a new shell, the `$PATH` variable gets set by a number of startup files on your system. The system-wide startup files are protected and you should (in most cases) not try to change them, as this will affect the way your system behaves. There are also user-level startup files in your home directory where you can make changes to the `$PATH` variable (and other variables) that only affect your own shells.

For example, `/home/surchs/.bashrc` is a config file where I can make changes to my `$PATH` variable to have my shell search additional directories for programs.

To take a look, we can use the tool `cat`. Again, let's check what it does.

In [101]:
whatis cat

cat (1)              - concatenate files and print on the standard output


Let's now take a look inside the `.bashrc` file

In [107]:
cat /home/surchs/.bashrc -n | tail

   111	if ! shopt -oq posix; then
   112	  if [ -f /usr/share/bash-completion/bash_completion ]; then
   113	    . /usr/share/bash-completion/bash_completion
   114	  elif [ -f /etc/bash_completion ]; then
   115	    . /etc/bash_completion
   116	  fi
   117	fi
   118	
   119	# User bins
   120	export PATH="$PATH:$HOME/bin"


The statement `export PATH="$PATH:$HOME/bin"` in line 120 adds the directory `bin` in my home directory to the shell's `$PATH`. Notice again the `:` character that separates the new value from the old one.
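
The same kind of assignment also works directly in a running shell, where it lasts only until that shell is closed. A minimal sketch, using a temporary directory and a made-up program name `greet_demo` in place of `$HOME/bin`:

```shell
# Throwaway directory that plays the role of a personal bin folder
demo_bin=$(mktemp -d)

# Put a small executable script inside it (name is hypothetical)
cat > "$demo_bin/greet_demo" << 'EOF'
#!/bin/bash
echo "found via PATH"
EOF
chmod u+x "$demo_bin/greet_demo"

# Append the directory to the search path of the current shell
PATH="$PATH:$demo_bin"

# The bare name now works because the shell finds it in the new directory
greet_output=$(greet_demo)
echo "$greet_output"

rm -r "$demo_bin"
```

Once the directory is on `$PATH`, the bare name is enough and no `./` or full path is needed.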

### You can create your own variables

The `$PATH` variable is important, but it is just a normal variable. You can create your own variables.

In [108]:
MY_VAR=10

In [109]:
echo ${MY_VAR}

10


- variable names are case sensitive
- to access the value of a set variable we prepend the `$` character to the variable name
- we use `{` and `}` to mark where the variable name begins and ends

In [110]:
echo ${my_var}




In [111]:
echo $MY_VARiscool
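
This call prints nothing because the shell reads `MY_VARiscool` as the name of one (unset) variable. Braces mark where the name ends. A short sketch with illustrative variable names:

```shell
MY_VAR=10

# Without braces the shell looks up a variable called MY_VARiscool, which is unset
without_braces="$MY_VARiscool"

# With braces the name stops at MY_VAR and the rest is plain text
with_braces="${MY_VAR}iscool"

echo "without braces: '$without_braces'"
echo "with braces: '$with_braces'"
```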




### Two kinds of shell variables

There are two different kinds of variables in a shell:

- `shell variables` only exist inside your current shell instance. They are not shared with any programs you execute from this shell. By convention we use lowercase names for shell variables.
- `environment variables` by contrast are shared with programs you execute in the shell. By convention we use ALL CAPS for environment variables (like `$PATH`).

Any new variable you declare (or set) starts out as a `shell variable`. To "promote" it to an environment variable, you have to `export` it. You can also "demote" an environment variable with `export -n`. You can see all of the environment variables in your shell with `printenv`.

In [112]:
printenv | tail

QT_IM_MODULE=ibus
XDG_RUNTIME_DIR=/run/user/1000
PS1=[PEXP\[\]ECT_PROMPT>
JOURNAL_STREAM=8:124723
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/local/share/:/usr/share/:/var/lib/snapd/desktop
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
GDMSESSION=ubuntu
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
OLDPWD=/home/surchs/shell-course
_=/usr/bin/printenv


### Environment variables get passed to programs

The script `i_can_see_variables.sh` prints the values of two variables, `ENV_VAR` and `shell_var`. Let's create them and see how they behave differently inside the script.

In [113]:
ENV_VAR="important setting"
export ENV_VAR
shell_var="local value"

**Note** how we export the variable itself and not its value (which would require the `$`)

Let's call the script.

In [114]:
./i_can_see_variables.sh

I can see this environment variable: important setting. Very good
I cannot see this shell variable: . How strange


Because environment variables are passed to child processes (e.g. programs) they can change the behaviour of your system. Some tools and installation procedures will ask you to modify environment variables, e.g. by editing the `.bashrc` file in your home directory.
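
The same behaviour can be reproduced without the course script by starting a child shell with `bash -c`. In this sketch the variable names `DEMO_ENV_VAR` and `demo_shell_var` are made up, and the single quotes make sure that the child shell, not the parent, expands the variables.

```shell
# One exported (environment) variable and one plain shell variable
DEMO_ENV_VAR="exported value"
export DEMO_ENV_VAR
demo_shell_var="shell-only value"

# The child shell inherits only the exported variable
child_view=$(bash -c 'echo "env=[$DEMO_ENV_VAR] shell=[$demo_shell_var]"')

echo "parent sees: env=[$DEMO_ENV_VAR] shell=[$demo_shell_var]"
echo "child sees:  $child_view"
```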

## Summary

- the shell looks for the programs named in your command in the directories listed in the `$PATH` variable
- `$PATH` and other environment variables are set by startup files at the system and user level
- you can edit the startup files for your user in your home directory (e.g. `~/.bashrc`)
- more generally, you can edit any variable and also create new variables
- to retrieve the value of a variable, we need the `$` character (e.g. `$VAR` vs `VAR`)
- there are two types of variables: "shell variables" and "environment variables"
    - only environment variables get passed to programs you call from the shell
    - you can turn a "shell variable" into an "environment variable" with `export`
- shell scripts are text files that contain shell commands to be executed in sequence
    - the first line of your script typically declares what shell should run it
    - this statement (e.g. `#!/bin/bash`) is called the shebang
- shell scripts need to have execution permission to be run. You change file permission with `chmod`
- to run a shell script or any command not in the `$PATH`, we specify the path to the command

## Overall Summary

* The bash shell is very powerful!
* It offers a command-line interface to your computer and file system
* It makes it easy to operate on files quickly and efficiently (copying, renaming, etc.)
* Sequences of shell commands can be strung together to quickly and reproducibly make powerful pipelines

Also consider:
* bash and other shells are great for many tasks, particularly when they involve changes to your files and directories
* But bash is not the right tool to create complex pipelines and programs like the ones needed for research analyses
* For these tasks, modern programming languages like Python offer better error handling, control flow, debugging and other features

# References

There are lots of excellent resources online for learning more about bash:

* The GNU Manual is *the* reference for all bash commands: http://www.gnu.org/manual/manual.html
* "Learning the Bash Shell" book: http://shop.oreilly.com/product/9780596009656.do
* An interactive on-line bash shell course: https://www.learnshell.org/
* The reference page of the software carpentry course: https://swcarpentry.github.io/shell-novice/reference.html