# Bash Part 1 - Basics

## What is Bash?

Bash is a computer program known as a 'shell'.  A **shell** is a program that allows the user (that's you) to access the functionality of the operating system.

There are generally two types of shells:

- GUI (Graphical User Interface)
- CLI (Command-Line Interface), i.e. a text-based interface

You're probably already familiar with a GUI shell, since that's how most people first learn to use a computer now: clicking folders to open them, launching programs by clicking icons, dragging files to move them, and so on.

You can do all the same things (and much more) with a text-based interface. Bash is a text-based shell.

### Why use a text-based shell at all?

#### Lots of custom programs you use will require it
- It's a LOT easier to write a command-line tool than to give it a graphical interface

#### It allows for greater flexibility
- For example, I can write a short command that moves all files beginning with `2012_` to a directory called `backup`
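As a preview of the wildcards covered in the Wildcard Matching section, here's a minimal sketch of that kind of command. It runs inside a throwaway directory created with `mktemp -d`, and the file names are hypothetical:

```bash
# Work in a temporary directory so nothing real gets moved.
# The file names below are made up for illustration.
cd "$(mktemp -d)"
touch 2012_jan.csv 2012_feb.csv 2013_jan.csv

mkdir backup
# The * wildcard matches every file whose name starts with 2012_
mv 2012_* backup/

ls backup   # the two 2012_ files
ls          # backup/ and 2013_jan.csv are left
```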

#### Reproducible workflows
- You can log and save all the actions you take, for documentation and reproducibility

#### Cloud
- If you want to do work on cloud servers, you'll be configuring things mostly in the shell.

## Where does Bash come from? Where do you find it?

Brief history of Bash:

- **1971**:  Ken Thompson writes the first Unix shell (the Thompson shell) while working at Bell Labs
- **1977**:  Stephen Bourne writes the Bourne shell for Version 7 of Unix (also while working at Bell Labs)
- **1989**:  Brian Fox writes Bash (Bourne-Again SHell) as a free software replacement for the Bourne Shell while working for the Free Software Foundation 

Bash has been around since 1989! Other shells have gained some popularity since then (Zsh and Fish, for example, as well as the older Csh), but Bash still reigns supreme as the default shell on most Linux distributions. It was also the default on macOS until macOS Catalina (2019), which switched the default to Zsh; Bash is still available there.

# Data for this demonstration

Follow [this link](http://swcarpentry.github.io/shell-novice/data/shell-novice-data.zip) to download a zip archive with some demo data.  These files are provided by Software Carpentry for a similar lesson they teach.

Unzip the archive to create a `data-shell` directory.

## How to open the terminal

### Linux Distro

Varies depending on what graphical shell you're using, though if you're running Linux you probably already know how to get a terminal going.

### OSX

Go to Applications, then the Utilities directory.  Open the Terminal app in the Utilities directory.

### Windows

With Windows, things are a bit complicated. Windows ships with two shells, Command Prompt (descended from MS-DOS) and PowerShell, both of which use a different syntax than Bash. However, you can still get Bash on Windows, and even if you use Windows primarily, Bash is a much more valuable skill to learn than Command Prompt or PowerShell.

#### Windows Subsystem for Linux

Recent versions of Windows include a built-in way to enable the Bash shell, the Windows Subsystem for Linux. It is available on updated versions of Windows 10; [instructions are here](https://github.com/diyadas/bash-tutorial/raw/master/Installing%20Windows%20Subsystem%20for%20Linux.pdf).

#### Git Bash

Alternatively, if you have an older version of Windows, you can get Bash by installing Git for Windows ([instructions here](https://github.com/diyadas/bash-tutorial/raw/master/Git%20Bash%20instructions.pdf)).

#### Cygwin

Another alternative if you don't have an updated version of Windows 10 is [Cygwin](http://www.cygwin.com/).

# Moving around directories

One of the basic concepts is that your shell is always located somewhere in the directory structure (your *working directory*).
An analogy is having only a single file-browser window open (e.g. the Finder on a Mac, File Explorer on Windows, or a file manager on Linux): you always see, and act on, whichever directory that one window is showing.


### pwd - Print Working Directory
>*Where am I?* 

Prints the directory you are currently in. Any files you create (without giving a different path) will appear here. When you first open the terminal, you will be in your "home" directory.

```bash
$ pwd
/home/deto
```

### ls - Lists contents of a directory (LiSt)
>*Shows the files and directories inside the current directory*

```bash
$ ls
bash-basics/
data-shell/
some_file.txt
```

#### Command Options/Arguments
Most Unix commands, like **ls**, can actually be run in many different ways. You control this by specifying different **options** when running the command.

When specifying an option, you leave a `<space>` after the command, and then each option is entered using a dash (-) and a single letter (or some options will use two dashes (--) and then a word).


**ls** has many options. Here are some of the more useful ones to know:

```bash
$ ls -l
```
Lists the directory entries in long form, showing each entry's permissions, owner, size, and last-modified date and time.

```bash
$ ls -t
```
Sorts the files and directories by the time they were last modified.

```bash
$ ls -l -t
```
Combines both 'l' and 't' together. Shows long listing, and sorts by modification time.

```bash
$ ls -lt
```
A more compact way to combine 'l' and 't' together. This generally works for most single-character options.

```bash
$ ls -lh
```
lists the long form of the directory entries, but with the sizes in a human-readable format (i.e. MB and GB instead of the number of bytes)

```bash
$ ls -lr
```
Long listing, with the sort order reversed (on its own, **-r** reverses the default alphabetical order).

```bash
$ ls A_PATH
```
list contents in the directory specified by A_PATH, which can be either relative or absolute.

```bash
$ ls -ltr
```
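Combines -l, -t, and -r: a long listing sorted by modification time, oldest first, so the most recently modified files appear at the bottom.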
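Sorting by time is easiest to see with files whose modification times you control. Here's a sketch that uses `touch -t` (which sets a file's timestamp, written as `YYYYMMDDhhmm`) on three hypothetical files in a throwaway directory created with `mktemp -d`:

```bash
# Hypothetical files with explicit modification times, in a temp directory
cd "$(mktemp -d)"
touch -t 202001010000 oldest.txt
touch -t 202106150000 middle.txt
touch -t 202312310000 newest.txt

ls -t    # newest first:  newest.txt  middle.txt  oldest.txt
ls -tr   # reversed, oldest first:  oldest.txt  middle.txt  newest.txt
```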

### cd - Change Directory 
>*Move to a new directory*

Given a path, this command moves your "current location" to the specified directory.
```bash
$ cd data-shell

$ pwd
/home/deto/data-shell
```

To go up one level in the directory tree (to the parent directory), use `cd ..`:
```bash
$ cd ..
$ pwd
/home/deto
```

Thus far, these have been relative paths (i.e. relative to your current directory), but you can also use an absolute path (which will start with a /):

```bash
$ pwd
/home/deto
$ cd /usr/local/bin
```

A shortcut for your home directory is ~:

```bash
$ cd ~
$ pwd
/home/deto
```

And you can use these as part of a path as well:

```bash
$ cd ~/data-shell
```
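Here's a sketch that puts relative paths, absolute paths, `..`, and `~` together. It builds a small hypothetical directory tree (`projects/thesis`) inside a temporary directory from `mktemp -d`, so the exact absolute path will differ on your machine:

```bash
# Build a small hypothetical tree in a temporary directory
top="$(mktemp -d)"
mkdir -p "$top/projects/thesis"

cd "$top"           # absolute path (starts with /)
cd projects/thesis  # relative path: resolved from the current directory
pwd                 # prints <temp dir>/projects/thesis
cd ..               # up one level, to the parent directory
pwd                 # prints <temp dir>/projects
cd ~                # ~ is a shortcut for your home directory
pwd                 # same as $HOME
```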

**An aside on directories:** Directories in Unix work just like the folders on your regular computer. Just as you would open a window and click to open folders, here you use **cd** to move through directories. You are just typing a command instead of clicking.

## Peeking inside files

### less - view contents of a file 
**less** shows the contents of a file and lets you scroll through and search it. However, **less** is meant for plain text files, so you cannot reliably view the contents of, say, MS Word documents with it. Fortunately, most of the files we'll be dealing with are plain text files.

So let's see this in action!  Go to the data-shell directory and look at a file.

```bash
$ less notes.txt


- finish experiments
- write thesis
- get post-doc position (pref. with Dr. Horrible)
```

Some useful navigational tips for less:
- You can use the arrow keys to move up or down a line in the text.
- The spacebar will advance an entire page.
- You can search for a word by typing a slash (/) followed by the search word and pressing Enter; press n to jump to the next match.
- To quit, type q.
- To see the full help screen, type h.

### head - print first few lines of a file

By default, **head** prints the first 10 lines of a file. From the *data-shell* directory, move into the *data* subdirectory and try it on *sunspot.txt*:
```bash
$ cd data
$ head sunspot.txt
(* Sunspot data collected by Robin McQuinn from *)
(* http://sidc.oma.be/html/sunspot.html         *)

(* Month: 1749 01 *) 58
(* Month: 1749 02 *) 63
(* Month: 1749 03 *) 70
(* Month: 1749 04 *) 56
(* Month: 1749 05 *) 85
(* Month: 1749 06 *) 84
(* Month: 1749 07 *) 95
```

You can change the number of lines with the **-n** option. This is different from the options we saw with **ls**, because **-n** also needs a value: the number of lines to show.

For example, to show the first 5 lines:

```bash
$ head -n 5 sunspot.txt
```

There's also a similar command **tail** to look at the last few lines of a file.

```bash
$ tail sunspot.txt
```
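To see **-n** working with both commands on a file whose contents are easy to predict, here's a sketch that generates a hypothetical file with `seq` (which prints a sequence of numbers, one per line) in a throwaway directory and then looks at each end of it:

```bash
# A hypothetical 20-line file: the numbers 1 through 20
cd "$(mktemp -d)"
seq 1 20 > numbers.txt

head -n 3 numbers.txt   # prints 1, 2, 3
tail -n 2 numbers.txt   # prints 19, 20
tail numbers.txt        # no -n: the last 10 lines, 11 through 20
```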

### wc - get word/byte/line count for files

Just how big is the *sunspot.txt* file anyways?? To see this, we can run the **wc** command on it.  This is much more informative than just opening it with **less** and scrolling.

```bash
$ wc sunspot.txt
3080 18456 73861 sunspot.txt
```

The output is three numbers, showing the number of lines, words, and bytes (the same as characters for plain ASCII text), respectively. If you just want the number of lines, you can run **wc** with the **-l** option.
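Here's a small sketch on a hypothetical three-line file, small enough to count by hand. It uses `printf` to write the file (`\n` marks the end of each line) in a throwaway directory:

```bash
# A hypothetical three-line file in a temporary directory
cd "$(mktemp -d)"
printf 'coho\nsteelhead\ncoho\n' > fish.txt

wc fish.txt      # 3 lines, 3 words, 20 bytes, then the file name
wc -l fish.txt   # just the line count (3) and the file name
```

The 20 bytes are the letters plus one newline character per line: 5 + 10 + 5.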

### Aside:  What if I forget all these options??

Googling things is always useful!  But there is an even quicker way - use the **man** command to see the **man**ual for a command.

Try running this:

```bash
$ man wc
```
A text pager (that works the same as **less**) will open with information about the **wc** command: what it does, what options are available, and any other relevant information.

## Making your mark...

### mkdir - Create a given directory (MaKe DIRectory)
>Exactly what it says: lets you create new directories.

```bash
$ mkdir pdb_danger
$ ls
amino-acids.txt   animals.txt   morse.txt   pdb_danger/    salmon.txt
animal-counts/    elements/     pdb/        planets.txt   sunspot.txt
```

Here we can see the new directory we created `pdb_danger`

### cp - copy a file or directory (CoPy)
>Create a copy of the original file

Make a copy of tnt.pdb in pdb_danger

```bash
$ cp pdb/tnt.pdb pdb_danger/tnt.pdb
$ ls pdb_danger
tnt.pdb
```

We could also leave out the filename when copying into a directory; **cp** will then use the original filename.

```bash
$ cp pdb/cholesterol.pdb pdb_danger
$ ls pdb_danger
cholesterol.pdb   tnt.pdb
```

You can run **ls** on the original *pdb* directory to verify that the files were copied (and not moved).

Speaking of moving....


### mv - move files or directories (MoVe)
>Move or rename a file or directory. Renaming is the same as moving a file to a new name within the same directory.

Example:

```bash
$ mv morse.txt morse_code.txt
$ ls
```
Here we've renamed the file *morse.txt* to *morse_code.txt*.

If *morse_code.txt* already existed as a file, it would have been overwritten here.  
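The example above only renames. Here's a sketch of the other two behaviors: moving a file into a directory, and overwriting an existing name. It uses **echo**, **>**, and **cat** (covered in the File Output section below) to give the hypothetical files some contents, and runs in a throwaway directory:

```bash
# Hypothetical files in a temporary directory
cd "$(mktemp -d)"
echo "new notes" > notes.txt
echo "old notes" > notes_old.txt
mkdir archive

# Moving into a directory keeps the original file name
mv notes_old.txt archive/
ls archive          # notes_old.txt

# Renaming onto an existing name replaces that file (by default, without asking)
echo "draft" > draft.txt
mv draft.txt notes.txt
cat notes.txt       # draft  (the "new notes" text is gone)
```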


### rm - delete a file or directory (ReMove)
>Delete a file or directory. Be careful: there is no trash can or undo, so deleted files are gone for good.

First, make a backup copy of *animals.txt* so we can restore it afterwards:

```bash
$ cp animals.txt animals.txt.bak
$ ls
```

Delete animals.txt

```bash
$ rm animals.txt
$ ls
```

Restore the backup

```bash
$ mv animals.txt.bak animals.txt
$ ls
```

To delete an entire directory, use the **-r** option for **rm**. The "r" stands for "recursive". Similarly, to copy an entire directory, use **-r** with the **cp** command.

Copy the entire elements directory into a new directory called "elements_v2"
```bash
$ cp -r elements elements_v2
$ ls elements_v2
```

Delete the new directory you just created
```bash
$ rm -r elements_v2
$ ls
```
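The same recursive copy and delete can be tried on a nested directory without touching the lesson data. This sketch builds a hypothetical `results/run1` directory in a throwaway location (from `mktemp -d`):

```bash
# A hypothetical nested directory in a temporary location
cd "$(mktemp -d)"
mkdir -p results/run1
touch results/run1/output.csv

cp -r results results_copy   # copies the directory and everything inside it
ls results_copy/run1         # output.csv

rm -r results_copy           # deletes the directory and all of its contents
ls                           # only results/ is left
```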

### Editing Files

You might prefer to edit files using a GUI text editor if you're more used to that, but you can also edit files directly from the terminal environment.  This might be necessary if you're editing some configuration files on a remote server, for example.  We're not going to go into too much detail on this, but popular CLI editors include Vim, Emacs, and Nano - the last of which is included in most Unix distributions and is fairly straightforward to use.

For example, to edit the contents of *animals.txt*, we can run:

```bash
$ nano animals.txt
```

# File Output

So far the output of many of these commands has just been printed to the terminal.  

Unix tools like **ls** write their output to something called **Standard Output**, which is displayed in the terminal by default.

However, we can also **redirect** the results of a command into a file instead.  This uses the **>** character.

Here's an example redirecting a list of our element files to a new file:

```bash
$ ls elements > my_elements.txt
```

Now run **less** on *my_elements.txt* and see the contents.  The result of calling **ls** was saved to this file.

### echo - send its arguments to standard output

We can also demonstrate redirection using the **echo** command, which simply prints whatever words you give it.

```bash
$ echo Hello World
Hello World
```

Running the above command just prints Hello World to the terminal, but if we redirect it, we can save the words to a file instead.

```bash
$ echo Hello World > journal.txt
$ less journal.txt
```

"Hello World" is saved in *journal.txt*.  If we run another command and redirect to *journal.txt* with **>**, however, it will overwrite what's already in there.

```bash
$ echo I am learning Bash > journal.txt
$ less journal.txt
```

If we, instead, wanted to just **append** to the file, we could use two chevrons **>>** instead of one.

```bash
$ echo I am learning Bash > journal.txt
$ echo Bash 4eva! >> journal.txt
$ less journal.txt
```
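A quick way to confirm the difference between **>** and **>>** is to count lines with **wc -l** after each step. This is a sketch in a throwaway directory (created with `mktemp -d`); the extra lines are made up for illustration:

```bash
cd "$(mktemp -d)"
echo I am learning Bash > journal.txt   # > creates the file (or overwrites it)
echo 'Bash 4eva!' >> journal.txt        # >> adds a line to the end
echo Day 2 >> journal.txt
wc -l journal.txt                       # 3 lines

echo Starting over > journal.txt        # a single > replaces everything
wc -l journal.txt                       # 1 line
```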


### cat - join two files together (conCATenate) 
>If given just one file, cat will print the contents of the file to the screen. Given multiple files, it will print one after another.

The **cat** command is another popular unix tool.  If you run it with just one file as input, it will print the contents to the terminal.  This can be useful if you just want to see the whole thing and the file doesn't have too many lines.  If you give it multiple files as input, it will send their contents to standard output one by one (displayed in the terminal unless you redirect it). 

```bash
$ cat salmon.txt

coho
coho
steelhead
coho
steelhead
steelhead

$ cat animals.txt
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
2012-11-06,deer
2012-11-06,fox
2012-11-07,rabbit
2012-11-07,bear

$ cat animals.txt salmon.txt > animals_and_salmon.txt
$ less animals_and_salmon.txt
```

In the last example, we've con**cat**enated *animals.txt* and *salmon.txt* together and sent the result to a new file *animals_and_salmon.txt*.
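Because **cat** just writes one file after another, the combined file's line count is the sum of the inputs' line counts. Here's a sketch with two hypothetical files written by `printf` in a throwaway directory:

```bash
cd "$(mktemp -d)"
printf 'deer\nrabbit\n' > mammals.txt        # hypothetical 2-line file
printf 'coho\nsteelhead\ncoho\n' > fish.txt  # hypothetical 3-line file

cat mammals.txt fish.txt > all.txt
cat all.txt       # deer, rabbit, coho, steelhead, coho (in that order)
wc -l all.txt     # 5 lines: 2 + 3
```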

# Wildcard Matching

Another useful feature of Bash is wildcard pattern matching in file names - also known as **globbing**.

You can use the **\*** character to represent any number of characters in a file or directory name.

Here's an example: say we wanted to create a directory containing just the elements (from our elements directory) that begin with the letter P. We could copy them one by one, or we could do this:

```bash
$ mkdir P_elements
$ cp elements/P*.xml P_elements
$ ls P_elements
Pa.xml  Pb.xml  Pd.xml  Pm.xml  Po.xml  Pr.xml  Pt.xml  Pu.xml  P.xml
```

This has grabbed any file in the elements directory that starts with a "P" and ends with a ".xml".

Note that the **\*** character represents 0 or more arbitrary characters.  If you just wanted to match a single arbitrary character, use the **?** character in your globbing expression.

For example, to put all the single-letter elements into a directory:

```bash
$ mkdir single_elements
$ cp elements/?.xml single_elements
$ ls single_elements
B.xml  F.xml  I.xml  N.xml  P.xml  U.xml  W.xml
C.xml  H.xml  K.xml  O.xml  S.xml  V.xml  Y.xml
```
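One handy way to preview what a pattern will match is to hand it to **echo**, since the shell expands the wildcard before the command ever runs. This sketch uses a few hypothetical empty files named like the element files, in a throwaway directory (the order of the matches can vary with your system's sort settings):

```bash
cd "$(mktemp -d)"
# Hypothetical empty files named like the element files
touch P.xml Pb.xml Pt.xml Pu.xml He.xml

echo P*.xml   # * matches zero or more characters: P.xml, Pb.xml, Pt.xml, Pu.xml
echo P?.xml   # ? matches exactly one character: Pb.xml, Pt.xml, Pu.xml
echo ?.xml    # exactly one character before .xml: P.xml
```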

# Running a Python or R script from the command line

Most of the commands we've shown so far (like **ls**, **cp**, and **wc**) are actually small programs that we are running.

You can write your own programs to run from the command line in any language you know (Python/R/C/Java/Julia....even JavaScript!).

In this repository there are two programs that show how to do this in Python and in R.

```bash
$ python hello.py David
Hello David! I am a Python program!

$ Rscript hello.R David
Hello David! I am an R program!
```

In this case, I passed the string 'David' in as an input argument. This example is trivial, but you could imagine writing a script that processes a large data file and takes the file's name as an argument.
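The *hello.py* and *hello.R* files aren't reproduced here, so as a stand-in, here's a minimal sketch of the same idea written in Bash itself. The script name *hello.sh* is hypothetical; inside a Bash script, `$1` holds the first argument passed on the command line. The single quotes stop the shell from expanding `$1` while the script file is being written:

```bash
cd "$(mktemp -d)"
# Write a one-line hypothetical script; $1 is filled in when it runs
echo 'echo "Hello $1! I am a Bash program!"' > hello.sh

bash hello.sh David   # Hello David! I am a Bash program!
```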