# Welcome to the CCG's Science Coding Course Series

## Your instructor is: Joe Russack

# Topics

* Welcome to Science Coding Courses!
* Navigating the UNIX shell
* Manipulating file structure using UNIX commands
* Launching Jupyter Notebooks from the UNIX shell
* How the shell relates to coding languages

# Course Schedule

* TBA

# Introduction

## What are you going to learn? 

You are going to learn basic UNIX commands that allow you to manage your computer and run code. Although this course will not cover specific coding languages, we will begin to understand why using the UNIX shell is an essential first step in making use of all sorts of coding languages. We will also learn to launch and use Jupyter Notebooks, which are useful for learning code interactively.

# Why Learning Python is Important for Jupyter Notebooks

While our focus today is on UNIX commands and launching Jupyter Notebooks, it’s important to understand that effectively using Jupyter Notebooks requires a basic understanding of Python (or another programming language that Jupyter supports). Jupyter Notebooks are powerful tools for writing and running Python code interactively. They allow you to execute code in small chunks, visualize data, and document your analysis.

## Learning a little bit of Python will help you:

- Write and Execute Code: Jupyter Notebooks let you write Python code in cells, which can be executed independently. Understanding Python basics will enable you to write meaningful code in these cells should you choose to continue learning Python. 

- Work with Data: Many data science and analysis tasks involve using Python libraries like Pandas, NumPy, and Matplotlib. A basic understanding of Python will help you work with these libraries effectively.

- Create Visualizations: Python libraries used in Jupyter Notebooks allow you to create plots and visualizations of your data. Knowing Python helps you to use these tools to their full potential.

- Develop and Test Algorithms: Jupyter Notebooks are ideal for experimenting with and testing algorithms. Basic Python knowledge allows you to develop and refine your algorithms directly in the notebook.

## Final Installation Check

If you have any trouble, please reach out to the instructor, so we can troubleshoot.

We have written installation guides for [Mac](https://docs.google.com/document/d/1glyHUGv32laamRcBqru-HiC9NNPDUILulSJZJ-V1hGc/edit?usp=sharing) and [Windows](https://drive.google.com/file/d/1KdAVRtGcnHIlOoIXhGi76rgRFOklgubP/view?usp=sharing) computers. If you're viewing this on your computer, you're done!

Before we get started for the day, let's make sure everyone has everything they need for this course installed on their system.

Open a fresh terminal window on your computer.

Next we'll make sure Jupyter Notebook is installed. To check on Jupyter Notebook type this in the terminal:

```bash
$ jupyter notebook
```

You should see a bunch of lines of text, followed by your browser opening to the Jupyter Notebook home page. It might take 5-10 seconds to launch, but if it isn't coming up, raise your hand and someone will come by to help you out.

The notebook server will take over your current shell session. To quit, hit CONTROL + C, type 'y' at the exit prompt, and hit enter.

### An opening note about names

Computer folks throw around a lot of linguistic shorthand. A few that you've likely already seen are "linux", "unix", "binary", "path", etc. Without going into the etymology of the language, it's important to know that some of these are synonyms. Here are some notable examples:
* Linux ←→ UNIX. (Linux is effectively a dialect of UNIX)
* Binary ←→ program.
* program ←→ executable.
* shell ←→ command line.
* terminal ←→ shell. (Terminal is the program in which the shell runs)
* bash ←→ shell. (BASH is a kind of shell)
* unix ←→ \*nix. (\*nix means 'any dialect of unix'. macOS is a \*nix!)

## Using the UNIX shell
You will spend nearly all of your time in one of two places: the shell, or the interactive Jupyter notebooks. The shell allows you to move and copy files, run programs, and more. We will focus on the shell today. The UNIX shell environment, which is often called the terminal, is essential to managing and running code.

When you open your terminal, it will typically start a Bash shell, where Bash is one particular flavour of UNIX shell. Within this shell, you can navigate through your computer's folders, read and create files, and run programs. Many bioinformatics programs run exclusively in the shell. There are also other shells, such as zsh, which is the default shell on recent versions of macOS.

### Why use the shell?

Bash is a powerful language in its own right, and we will only scratch the surface of it today, but we will understand the basics of navigating the shell and managing your files with it. Today's introduction covers bash commands useful for both this course and future use of complex bioinformatics programs.

A large fraction of what you can do in the shell (also called the "command line") can also be done using the GUI-based operating system that you're used to (Windows, Mac, and some Linux). For the simplest of tasks, the command line may seem like a step backwards, but for anything even mildly more complicated (for example, "move every file with 2012 anywhere in its name to the folder Backup"), it can save a lot of time. And then there are the programs that can only be run from the command line; compared with GUI programs, these are much easier to write and more flexible in what they can do.
We always use a shell to run code - for example when launching a Jupyter notebook. Therefore, first on the agenda is a crash course on bash, which will also prove useful when running other largely UNIX-based bioinformatics software.
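As a rough sketch of the "move every file with 2012 anywhere in its name to the folder Backup" task mentioned above, the commands below build a throwaway practice folder and then do the whole move in one line. The file names, and the use of mktemp and touch to set up an empty practice area, are assumptions made purely for illustration. The commands are written without the $ prompt so they can be pasted as-is.

```bash
# Work in a throwaway folder so nothing real is touched (mktemp -d creates one).
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a few empty example files (hypothetical names) and the target folder.
touch report_2012.txt 2012_budget.csv notes_2013.txt
mkdir Backup

# The star matches any characters, so *2012* means "2012 anywhere in the name".
mv *2012* Backup/

# Only the two 2012 files should now be listed inside Backup.
ls Backup
```

Doing the same thing in a file browser would mean finding and dragging each file by hand, which gets tedious quickly once there are hundreds of files.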

Finally, we have the **Jupyter Notebook** software, which we are currently using to give this lecture. Notebooks are a great place to write code, allowing for executing and testing of small pieces of code at a time. If working with real data, it allows you to make in-line plots of your data as you go. I usually put my code for a single project in a Jupyter Notebook, allowing me to easily trace the raw data to the final figures, ensuring analysis is reproducible.

### Informative Interlude: Formatting of the lessons for this course
For this and all further examples, a $ represents your shell prompt, followed by the command to type at the prompt (do not type the $ itself). Lines shown without a $ are the output you should see when you run the command.

$ command argument1 argument2

## Moving around different folders

One of the basic concepts is that your shell is always based somewhere in the directory structure.
An analogy is having only a single file browser window open (e.g. the directory browser in Windows/Linux): at any moment it is showing exactly one folder.


### pwd - Print Working Directory
>*Where am I?* 

Prints the directory in which you are at the current moment. If you create any files, they will appear in this spot. When you first open the terminal shell, you will be in your "home" directory.

```bash
$ pwd

/home/jdoe
```

### cd - Change Directory 
>*Move to a new directory*

Given a path, this command moves your "current location" to the specified directory.
```bash
$ cd CodeLife

$ pwd
/home/jdoe/CodeLife
```

To go up one level, to the parent directory, use cd followed by two dots:
```bash
$ cd ..
$ pwd
/home/jdoe
```

Thus far, these have been relative paths (i.e. relative to your current directory), but you can also use an absolute path (which will start with a /):

```bash
$ pwd
/home/jdoe
$ cd /usr/local/bin
```

A shortcut for your home directory is ~:

```bash
$ cd ~
$ pwd
/home/jdoe
```

And you can use these as part of a path as well:

```bash
$ cd ~/CodeLife
```
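To see relative and absolute paths side by side, here is a small sketch that can be run safely in a throwaway folder. The folder created by mktemp stands in for a home directory, and the variable names (sandbox, parent, child, back) are hypothetical; each name=$(...) line simply saves a command's output so it can be printed at the end.

```bash
# Build a small practice tree in a throwaway location.
sandbox=$(mktemp -d)
mkdir "$sandbox/CodeLife"

# Absolute path: starts with / and works from anywhere.
cd "$sandbox"
parent=$(pwd)

# Relative path: interpreted starting from the current directory.
cd CodeLife
child=$(pwd)

# .. is a relative path too: it means "one level up".
cd ..
back=$(pwd)

# The first and third lines printed are the same directory.
echo "$parent"
echo "$child"
echo "$back"
```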

Another way to get to the home directory is to simply type "cd" on its own, with no path after it.
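A minimal sketch of that shortcut, assuming only that your home directory is set up in the usual way. It moves to the root directory first so that the jump back home is visible in the pwd output.

```bash
# Start somewhere other than home (the root directory, /).
cd /
pwd

# cd with no arguments jumps straight to your home directory.
cd
pwd
```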

### An aside on directories...
Directories in UNIX are organized the same way as the folders on your regular computer. Just as you would open up a window into your directories and click to open up folders, here you use cd to go through the directories. You are just typing the command instead of clicking.

### ls - Lists contents of a directory (LiSt)
>*Shows the files and directories inside the current directory*

```bash
$ cd ~/CodeLife
$ ls
1.1_Unix_and_Jupyter
```

Some bash commands take additional arguments. ls has many of these optional arguments. Here are some of the more useful ones to know:

```bash
$ ls -l
```
lists the long form of the directory entries: permissions, owner, size, and date last modified

```bash
$ ls -lh
```
lists the long form of the directory entries, but with the sizes in a human-readable format (i.e. MB and GB instead of the number of bytes)

```bash
$ ls -lt
```
shows long listing, and sorts by modification time

```bash
$ ls -lr
```
shows long listing, in reverse order

```bash
$ ls ..
```
list contents of the directory above

```bash
$ ls A_PATH
```
list contents in the directory specified by A_PATH, which can be either relative or absolute.

```bash
$ ls -ltr
```
combines the -l, -t, and -r options: a long listing sorted by modification time, with the most recently modified entries last
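The sorting options are easier to see with files whose ages are known. The sketch below is one way to try them out: it assumes a throwaway folder and three hypothetical file names, and it uses touch -t (not covered in this lesson) to give each file a fixed modification time.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"

# touch -t sets a file's modification time (format: YYYYMMDDhhmm).
touch -t 202001010900 oldest.txt
touch -t 202201010900 middle.txt
touch -t 202401010900 newest.txt

# -t sorts by modification time, newest first.
ls -lt

# Adding -r reverses that order, so the newest file ends up at the bottom.
ls -ltr

# Save just the names (no -l) to compare the two orders.
newest_first=$(ls -t)
oldest_first=$(ls -tr)
```

The -ltr combination is handy in a crowded directory, because the file you changed most recently is always the last line on the screen.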

### Echo

>*Prints its input back out to the shell*

```bash
$ echo 'Hello, World'
Hello, World
```

**echo**, on its surface, doesn't seem incredibly useful. However, it is a great way to check that your shell is set up correctly:

```bash
$ echo $PATH
/home/jdoe/anaconda2/bin:...
```

The PATH variable lists all the folders the shell searches for programs. For those of you running Anaconda, you'll see something like /home/jdoe/anaconda2/bin in your path. When you type jupyter notebook, bash looks through the folders in its PATH, in order, for a program named jupyter to run.
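To make the PATH lookup a little more concrete, the sketch below prints the folders one per line and then asks the shell where it would find a program. It relies on two things this lesson has not introduced, the tr utility and the command -v built-in, plus the pipe symbol covered later. It uses ls as the example program because it is always present; the variable name ls_location is hypothetical.

```bash
# PATH is one long line of folders separated by colons.
# tr swaps each colon for a newline so the folders print one per line.
echo "$PATH" | tr ':' '\n'

# command -v reports what the shell would run for a given name.
# ls is used here because it exists on every system; jupyter can be
# looked up the same way once it is installed.
ls_location=$(command -v ls)
echo "$ls_location"
```

If a program you installed is "not found", checking whether its folder appears in this list is usually a good first troubleshooting step.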


## Less - Peeking inside Files

>*displays the contents of a file*

**less** shows the contents of a file, and allows you to scroll and search the contents. A historical note and memory aid - it used to be that there was only **cat**, a command that printed a file to the screen (well, to standard output, but we'll get to that later). That worked well for short files, but for longer files it was nice to stop after a screen full of information, and to prompt for **more**. So, the utility was called, straightforwardly, **more**. A more sophisticated **more** which allowed for searching and filtering was released so, of course, it's called **less**.

**less** can only be used for simple text files, so you cannot reliably view contents of, say, MS Word documents with **less**. Fortunately, most of the files we will be dealing with will be plain text files. Want to know what a file is? Use the command **file**; the system will give you its best guess.

Note: Often, you can use the Tab key to auto-complete a filename. Try hitting Tab part way through the filename below.


```bash
$ less 1.1_Unix_and_Jupyter/Vast_World_of_Coding.txt
```


Some useful navigational tips for less:
- Use the "enter" key to progress one line at a time through the text.
- You can use the arrow keys to move up or down a line in the text.
- The spacebar will advance an entire page.
- You can search for a word by typing a slash (e.g. /) followed by the search word.
- To quit, type q.
- To see the full help screen, type h.

### mkdir - Create a given directory (MaKe DIRectory)
>Exactly what it says - lets you create new directories.

```bash
$ cd 1.1_Unix_and_Jupyter
$ mkdir Notes
$ cd Notes
$ echo 'Hello World' > my_notes.txt
$ ls
my_notes.txt
$ mkdir data
$ ls
data my_notes.txt
```

### cp - copy file or directory (CoPy)
>Create a copy of the original file

Make a copy of my_notes.txt called my_notes2.txt:

```bash
$ cp my_notes.txt my_notes2.txt
$ ls
data/
my_notes.txt
my_notes2.txt
```
Create a file called project_notes.txt and make a copy of it called backup.txt:


```bash
$ echo 'To Do' > project_notes.txt
$ ls
project_notes.txt
data/
my_notes.txt
my_notes2.txt
```

```bash
$ cp project_notes.txt backup.txt
$ ls
backup.txt
project_notes.txt
data/
my_notes.txt
my_notes2.txt
```


### mv - move files or directories (MoVe)
>Rename a file or directory. Renaming is the same as moving within the same directory.

Example:

```bash
$ mv backup.txt project_notes.txt
$ ls
project_notes.txt
data/
my_notes.txt
my_notes2.txt
```
Here we've renamed the file backup.txt to project_notes.txt.

Because project_notes.txt already existed, this overwrites the old file with whatever was in backup.txt.

Essentially, we've restored the backup.
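The overwrite is easier to believe once you have watched it happen. This sketch replays the backup-and-restore sequence in a throwaway folder; the 'Oops, wrong text' line is a made-up stand-in for an unwanted edit.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"

# Write the original notes and take a backup copy.
echo 'To Do' > project_notes.txt
cp project_notes.txt backup.txt

# Simulate an unwanted change to the working file.
echo 'Oops, wrong text' > project_notes.txt

# Moving the backup onto the existing name replaces the changed file.
mv backup.txt project_notes.txt

# The file holds the backed-up text again, and backup.txt no longer exists.
cat project_notes.txt
ls
```

Note that mv gave no warning before replacing the file, so it is worth double-checking the destination name before pressing enter.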

### rm - delete a file or directory (ReMove)
>Delete a file or directory.

Delete the file somefile.txt

```bash
$ rm somefile.txt
```

Delete somedirectory

```bash
$ rm -r somedirectory
```
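\*Note the "-r" (stands for "recursive") is needed to delete directories and all their contents. Be careful: rm deletes immediately, and files removed this way do not go to the Trash or Recycle Bin.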
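A safe way to practice rm is inside a folder you are happy to lose. The sketch below assumes a throwaway folder from mktemp and creates the somefile.txt and somedirectory names used above (plus a hypothetical inner.txt) before deleting them.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"

# Practice material: one loose file and one directory with a file inside.
echo 'scratch' > somefile.txt
mkdir somedirectory
echo 'inside' > somedirectory/inner.txt

# Remove a single file.
rm somefile.txt

# Remove a directory and everything inside it.
rm -r somedirectory

# Nothing is left, so ls prints no names.
ls
```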

## A note about streams

\*nix systems have a notion called "streams". These represent a flow of data from one 'thing' to another. A data source is typically a program - like 'ls' or 'cat'. A destination can be a file, the screen, or both. A "redirect" is just a chevron (">"), and that's used to send a stream to a file. You've already done that - you had a program called "echo" create a stream with the contents "Hello World". Then, you used a redirect to send the contents of that stream to a file called my_notes.txt. Note that the redirect will create a new file if one does not exist. If there's already a file with that name, said file will be overwritten. If you'd like to append a stream to the end of a file instead, use ">>", a.k.a. "append".

Try the "echo" command by itself. If you don't redirect the stream, it goes to the default destination (known as "standard output"), which is - guess what? - the screen itself.
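The difference between overwriting and appending can be checked with a short experiment. This is a sketch in a throwaway folder, and the file name stream_demo.txt is hypothetical.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"

# > creates the file, or replaces it if it already exists.
echo 'first' > stream_demo.txt
echo 'second' > stream_demo.txt

# >> adds to the end instead of replacing.
echo 'third' >> stream_demo.txt

# The file now holds two lines: second, then third.
cat stream_demo.txt
```

The word "first" is gone because the second redirect replaced the whole file, which is why ">>" is the safer choice when adding to a file you want to keep.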

## Special characters

### wildcard matching with the *

The star functions as a "wild-card" character that matches any number of characters.

```bash
$ cd ..
$ ls *txt
greetings.txt  Vast_World_of_Coding.txt  wishes.txt
```

The star can go anywhere in a list of arguments you're supplying, even in the middle of words! There are [other wildcards you can use](https://en.wikibooks.org/wiki/A_Quick_Introduction_to_Unix/Wildcards) but * is the most common.
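Here is a small sketch of the star in different positions, using a throwaway folder with made-up copies of the file names from the listing above (summary.md is an extra hypothetical file that should not match the txt patterns).

```bash
sandbox=$(mktemp -d)
cd "$sandbox"
touch greetings.txt wishes.txt Vast_World_of_Coding.txt summary.md

# Star at the start: every name ending in txt.
ls *txt

# Star at the end: every name starting with w.
ls w*

# Star in the middle: names starting with g and ending in .txt.
ls g*.txt

# echo shows what the shell turned the pattern into before running a command.
echo *.md
```

Trying a pattern with echo first is a cheap way to preview exactly which files a command such as rm or mv is about to receive.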


### pipe |
>(on most US keyboards, typed with Shift plus the backslash "\" key)

Piping with | connects UNIX commands, allowing the output - or stream - of one command to "flow through the pipe" to another. This lets you chain programs together, such that each one only needs to worry about one step of the process (either generating, filtering, or modifying data), without knowing or caring where it came from or where it's going to.

A common use of the pipe is to send output to less, which lets you scroll through long output one screen at a time instead of having it flood the terminal:

```bash
$ ls | less
```
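Because less is interactive, it is hard to see what a pipe did after the fact. The sketch below shows the same idea with two utilities that are not otherwise covered in this lesson: wc -l, which counts lines, and grep, which keeps only lines containing some text. The folder, file names, and variable names are all hypothetical.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"
touch greetings.txt wishes.txt summary.md

# ls generates the names; wc -l counts the lines it receives.
ls | wc -l

# Pipes can be chained: keep only names containing txt, then count them.
ls | grep txt | wc -l

# Save the results to reuse them (tr strips the padding some systems add).
total=$(ls | wc -l | tr -d ' ')
txt_only=$(ls | grep txt | wc -l | tr -d ' ')
echo "$total files, $txt_only of them text files"
```

Each program in the chain does one small job, and none of them needs to know that its input came from ls rather than from a file.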

