# 1. Getting Started

<hr>
<center>This is part 1 of 5 of an <a href="00-unix-intro.ipynb" target="_blank">introduction to Unix</a>.</center>
<hr>

> **Things covered here:**
> * Some general rules
> * Running commands and general syntax
> * File-system structure and how to navigate

---

## A few foundational rules
* **Spaces are special!** The command line uses spaces to know how to properly break things apart. This is why filenames that contain spaces are not ideal, and it's better to use dashes (**`-`**) or underscores (**`_`**) – e.g., "draft-v3.txt" is preferred over "draft v3.txt".  

* The general syntax when working at the command line goes like this: `command argument`.  

* Arguments (which can also be referred to as "flags" or "options" or "parameters") can be **optional** or **required** based on the command being used.  
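
To make the rule about spaces concrete, here is a minimal sketch. `count_args` is a hypothetical helper defined only for this illustration (it simply reports how many arguments it received), and wrapping a name in quotes is one way, not covered above, to keep a space inside a single argument:

```shell
# Hypothetical helper for illustration: prints how many arguments it was given
count_args() {
    echo "$#"
}

with_space=$(count_args draft v3.txt)     # the space splits this into two arguments
with_quotes=$(count_args "draft v3.txt")  # quotes keep the space inside one argument
with_dash=$(count_args draft-v3.txt)      # no space, so one argument

echo "with space: $with_space, quoted: $with_quotes, with dash: $with_dash"
```

The unquoted name with a space is counted as two separate arguments, which is why a command given "draft v3.txt" without quotes would go looking for two files, "draft" and "v3.txt".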


### Note on running cells in a Jupyter notebook

Code blocks like the one in the next section are in what the notebook calls "cells". To run a cell, we first need to select it, then we can either click the "play" icon at the top of the notebook, or we can use shortcut keys. Here are a few:

Run a cell and keep the same cell selected:
- Mac:     `CMD + RETURN`
- Windows: `CTRL + ENTER`  

Run a cell and move to the next cell:
- Mac:     `SHIFT + RETURN`
- Windows: `SHIFT + ENTER`

---

## Getting example files

Running this next code block will download example files and ensure we are all starting from the same place. This will be the <b>only</b> time I would like you to blindly run a code block without us necessarily understanding what it is doing 🙂

So select the following code block, and run it either with shortcut keys as noted above, or use the "play" icon at the top of the document to set us up. After doing so, some file-transfer information will populate below it.

In [None]:
cd ~
curl -L -o unix_intro.tar.gz https://ndownloader.figshare.com/files/15573746
tar -xzf unix_intro.tar.gz && rm unix_intro.tar.gz
cd unix_intro/

<br>
<b><center>Great! Now let's get started!</center></b>

---

## Running commands

`date` is a command that prints out the date and time. This particular command doesn't require any arguments:

In [None]:
date

When we run `date` with no arguments, it uses some default settings, like assuming we want to know the time in our computer's currently set time zone. But we can provide optional arguments to `date`. 

Optional arguments most often require putting a dash in front of them in order for the program to interpret them properly. 

Here, we are adding the `-u` argument to tell it to report UTC time instead of the local time – which will be the same if the computer we're using happens to be set to UTC time, of course 🙂: 

In [None]:
date -u

Note that if we try to run it without the dash, we get an error (ignore the message highlighted in red; we wouldn't normally see that outside of a notebook):

In [None]:
date u

Also note that if we try to enter this without the space separating `date` and the optional argument `-u`, the computer won't know how to break apart the command and we get a different error (again, ignoring the red output):

In [None]:
date-u

Notice that the first error above comes from the program `date`. So the program we wanted to use *is* actually responding to us, but it doesn't seem to know what to do with the letter `u` we gave it. And this is because it wasn't prefixed with a dash, like `-u`. 

Now looking at the second error above, that one comes from `bash`, the language we are working in. `bash` is telling us it can't find a command (or program) called "date-u". And it was looking for that because by missing the space in between the command `date` and the argument `-u`, we weren't telling it how to properly break things apart.
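
One way to compare the two errors side by side is to capture each message as text and print them together. This is only a sketch: the variable names are made up for the illustration, `2>&1` and `|| true` are pieces of syntax not covered above (they capture the error text and keep a failed command from stopping a script), and the exact wording of each message will vary between systems:

```shell
# Capture each error message as text instead of letting it print directly
first_error=$(date u 2>&1 || true)   # date runs, but rejects the argument "u"
second_error=$(date-u 2>&1 || true)  # no space, so the shell looks for a command named "date-u"

echo "First error:  $first_error"
echo "Second error: $second_error"
```

The first message should begin with `date:`, showing that the `date` program itself is complaining, while the second should name "date-u" as the thing that could not be found.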

<div class="alert alert-block alert-info">
    <b>Note on error messages</b>
    <br>
    Error messages can often seem obscure and just confusing (and sometimes they are), but in many cases they can also be helpful. Being able to interpret some of them is certainly an acquired skill, but it's always worth doing our best to try to pay attention to them if we're having trouble with something.
</div>

Unlike `date`, most commands require arguments and won't work without them. `head` is a command that prints the first lines of a file, so it **requires** us to provide the file we want it to act on. Here we are printing out the first lines of a file called "example.txt": 

In [None]:
head example.txt

Here "example.txt" is the **required** argument, and in this case it is also what's known as a **positional** argument (we'll see examples of what's *not* a "positional" argument in a second). 

Whether things need to be provided as positional arguments or not depends on how the command or program we are using was written. Sometimes we need to specify the input file by putting something in front of it (e.g. some commands will use the `-i` flag, but it's often other things as well).

There are also optional arguments for the `head` command. The default for `head` is to print the first 10 lines of a file. We can change that by specifying the `-n` flag followed by how many lines we want:

In [None]:
head -n 5 example.txt
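
To check the default and the `-n` behavior without relying on the contents of "example.txt", here is a small self-contained sketch. It assumes a throwaway 12-line file made with `mktemp` and `printf`, and it counts lines by passing the output of `head` to `wc -l` with a pipe (`|`); none of those have been introduced yet, so treat them as scaffolding for the illustration:

```shell
# Build a throwaway 12-line file so the example does not depend on "example.txt"
demo_file=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$demo_file"

default_count=$(head "$demo_file" | wc -l)    # no options: head stops after 10 lines
five_count=$(head -n 5 "$demo_file" | wc -l)  # -n 5: head stops after 5 lines

echo "default: $default_count lines, with -n 5: $five_count lines"

rm "$demo_file"  # clean up the throwaway file
```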

How would we know we needed the `-n` flag for that? There are a few ways to find out. Many standard Unix commands and other programs will have built-in help menus that we can access by providing `-h` or `--help` as the only argument. I usually try `-h` first:

In [None]:
head -h

That tells us "h" is an invalid option, and the version we are using kindly prints out how to access the help menu, so let's try with `--help`:

In [None]:
head --help

That spit out a lot of information (and `head` is a relatively simple command compared to many others), but somewhere in there we can see "-n, --lines..." (we could have used `--lines 5` instead of `-n 5` to get the same result). Even that can be confusing if we're not used to how this information is presented. 

I usually try a built-in help menu first, because it's usually immediately accessible and might help. But if it's not working out, I very quickly go to our good friend Google, which will often have a more easily understood answer for me somewhere. 

What options are available for a certain command, and how to specify them, are parts of this process that are not about memorization at all. We might remember a few flags or specific options if we happen to use them a lot, but searching for options and details when needed is definitely the norm!

<div class="alert alert-block alert-success">
    What we've done so far really is the framework for how almost all things work at the command line! Multiple commands can be strung together, and some commands can have many options, inputs, and outputs and can grow to be quite long, but this is the general framework that underlies it all.
    <br>
    <br>
    <center><b>Becoming familiar with these baseline rules is important, memorizing particular commands and options is not!</b></center>
</div>

---

## The Unix file-system structure

Computers store file locations in a hierarchical structure. We are typically already used to navigating through this structure by clicking on various folders (known as directories in the Unix world) in a Windows Explorer window or a Mac Finder window. Just like we need to select the appropriate files in the appropriate locations there (in a Graphical User-Interface, or GUI), we need to do the same when working at a command-line interface. What this means in practice is that each file and directory has its own "address", and that address is called its "**path**". 

Additionally, there are two special locations in all Unix-based systems, so there are two more terms we should become familiar with: the "**root**" location and the current user's "**home**" location. "Root" is where the address system of the computer starts; "home" is where the current user's location starts.

Here is an image of an example file-system structure. Let's take a peek at it, first imagining just "clicking" through folders (directories) in a GUI in order to reach the file we want, "processing_notes.txt". Then we'll talk about it in terms of the "path" we could use to get to the same file at the command line.

<center><a href="https://astrobiomike.github.io/images/file_system_structure.png"><img src="https://astrobiomike.github.io/images/file_system_structure.png" width="80%"></a></center>
<br>

We tell the command line where files and directories are located by providing their address, their "path". If we use the `pwd` command (for **p**rint **w**orking **d**irectory), we can find out what the path is for the directory (folder) we are sitting in:

In [None]:
pwd

Note that this is providing the path starting from the special **root** location, because it begins with that leading `/`, which is the special character that denotes the start of the address system.
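
As a quick check of that leading `/`, this sketch stores what `pwd` reports and then keeps only its first character. The variable names are made up for the illustration, and `printf`, the pipe (`|`), and `cut -c 1` are tools not covered above:

```shell
current_dir=$(pwd)                                   # the path that pwd reports
first_char=$(printf '%s' "$current_dir" | cut -c 1)  # keep only its first character

echo "pwd reports: $current_dir"
echo "first character: $first_char"
```

Wherever this is run from, the first character should be `/`, since `pwd` reports the full path starting from root.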

And we can use the `ls` command (for **l**i**s**t) to see what directories and files are in the current directory we are sitting in:

In [None]:
ls

### Absolute vs relative path

There are two ways to specify the path (address in the computer) of the file we want to find or do something to:

* An **absolute path** is an address that starts from one of those two special locations we mentioned above: either the "root" (specified with `/`) or the "home" location (specified with `~/`). 

* A **relative path** is an address that starts from wherever we are currently sitting.

These can sound a little more confusing at first than they are, so it's best to just look at some examples.

Let's start by looking again at the **`head`** command we ran above:

In [None]:
head example.txt

**What we are actually doing here is using a *relative path* to specify where the "example.txt" file is located.** The command line automatically looks in the current working directory if we don't specify anything else about a file's location. So this works specifically because there is a file called "example.txt" in the current directory we are sitting in where we are running the command.

We can also run the same command on the same file, but specifying the file's location using an **absolute path**:

In [None]:
head ~/unix_intro/example.txt

There we are using the special "home" location, specified by the `~/` at the start, then going into the directory that holds the file, then naming the file. 

The previous two commands both point to the same exact file. But the first way, `head example.txt`, will only work if we are entering it while "sitting" in the directory that holds that file, while the second way will work no matter "where" we happen to be in the computer.
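
The difference can be sketched in a self-contained way with a throwaway directory. Everything here is an assumption made for the illustration: the directory comes from `mktemp -d`, the file "demo.txt" is hypothetical, and the absolute path used starts from root (`/`) rather than from home (`~/`). The `cd` commands sit inside `$( ... )`, which runs them in a separate sub-shell, so running this does not change where we are sitting:

```shell
# Throwaway directory holding one small file (names are hypothetical)
demo_dir=$(mktemp -d)
echo "hello from demo.txt" > "$demo_dir/demo.txt"

# Relative path, run from inside the directory that holds the file: works
relative_inside=$(cd "$demo_dir" && head demo.txt)

# Absolute path, run from somewhere else (here, root): still works
absolute_elsewhere=$(cd / && head "$demo_dir/demo.txt")

# Relative path, run from somewhere else: head cannot find the file
relative_elsewhere=$(cd / && head demo.txt 2>&1 || true)

echo "relative, inside:   $relative_inside"
echo "absolute, elsewhere: $absolute_elsewhere"
echo "relative, elsewhere: $relative_elsewhere"

rm -r "$demo_dir"  # clean up the throwaway directory
```

The first two lines of output should show the file's contents, while the third should show an error message from `head` instead (assuming there is no file named "demo.txt" sitting in the root directory).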

<div class="alert alert-block alert-info">
    <b>Note</b>
    <br>
    The address of a file, its "path", also includes the name of the file. It doesn't stop at the directory that holds it.
</div>

It is important to always think about *where* we are in the computer when working at the command line. **One of the most common errors/easiest mistakes to make is trying to do something to a file that isn't where we think it is.** 

Let's run `head` on the "example.txt" file again, using a relative path by just providing the name of the file, and then let's try it on another file, "notes.txt":

In [None]:
head example.txt

In [None]:
head notes.txt

Here the `head` command works fine on "example.txt", but when we call it on "notes.txt" we get an error message telling us there is no such file or directory (ignore the red highlighted line; we wouldn't see that outside of the notebook). 

If we run the `ls` command to **l**i**s**t the contents of the current working directory, we can see the computer is absolutely right – spoiler alert: it usually is – and there is no file here named "notes.txt":

In [None]:
ls

The `ls` command by default operates on the current working directory if we don't specify any location, but we can tell it to list the contents of a different directory by providing it as a positional argument. Here we are telling it to list the contents inside the "experiment" directory:

In [None]:
ls experiment

We can see the file we were looking for is located in this sub-directory called "experiment". Here is how we can run `head` on "notes.txt" by specifying an accurate **relative path** to that file:

In [None]:
head experiment/notes.txt
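
The same "look first, then point to the right place" sequence can be sketched on a throwaway copy of that layout. This assumes a temporary directory from `mktemp -d` and uses `mkdir` to make a stand-in "experiment" directory holding a one-line "notes.txt"; those commands are not covered above and are only scaffolding here:

```shell
# Recreate the layout in a throwaway directory
demo_dir=$(mktemp -d)
mkdir "$demo_dir/experiment"
echo "some notes" > "$demo_dir/experiment/notes.txt"

start_dir=$(pwd)   # remember where we are so we can come back
cd "$demo_dir"

top_level=$(ls)                           # what is in the current directory
in_subdir=$(ls experiment)                # what is inside "experiment"
first_lines=$(head experiment/notes.txt)  # the relative path goes through "experiment"

echo "here: $top_level"
echo "inside experiment: $in_subdir"
echo "notes.txt starts with: $first_lines"

cd "$start_dir"    # go back to where we started
rm -r "$demo_dir"  # clean up the throwaway directory
```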

If we had been using **tab-completion**, we would not have made that mistake!

#### BONUS ROUND: Tab-completion is our friend!
Tab-completion is a huge time-saver, but even more importantly it is a perpetual sanity-check that helps prevent mistakes. 

If we are trying to specify a file that's in our current working directory, we can begin typing its name and then press the `tab` key to complete it. If there is only one possible way to finish what we've started typing, it will complete it entirely for us. If there is more than one possible way to finish what we've started typing, it will complete as far as it can, and then hitting `tab` twice quickly will show all the possible options. **If tab-complete does not do either of those things, then we are either confused about where we are, or we're confused about where the file is that we're trying to do something to** – this is invaluable.

<div class="alert alert-block alert-warning">
    <b>A note on tab-completion in a notebook</b>
    <br>
    Currently, tab-completion doesn't behave within the notebook as it does in the real Unix world. So we are going to hop out into a terminal to look at it, and don't worry if it's not as helpful in here. 
</div>

<center><b>Use tab-completion whenever you can!!</b></center>
<br>

---

### Moving around
We can also move into the directory containing the file we want to work with by using the `cd` command (**c**hange **d**irectory). This command takes a positional argument that is the path (address) of the directory we want to change into. This can be a relative path or an absolute path. 

Here we'll use the relative path of the subdirectory, "experiment", to change into it:

In [None]:
cd experiment

Now let's use `pwd` (**p**rint **w**orking **d**irectory) to see where we are:

In [None]:
pwd

And `ls` (for **l**i**s**t) to see what is in this directory:

In [None]:
ls

And here we can see the "notes.txt" file, so we should be able to run `head` on it just by providing its name as the positional argument, with no error this time:

In [None]:
head notes.txt

Great. But now how do we get back "up" to the directory above us? One way would be to provide an absolute path to the `cd` (**c**hange **d**irectory) command, like `cd ~/unix_intro`, but there is also a handy shortcut. `..` are special characters that act as a relative path specifying "up" one level – one directory/folder – from wherever we currently are. So we can provide that as the positional argument to `cd` to get back to where we started:

In [None]:
cd ..

And we can check out where we are and what is here again with `pwd` and `ls` (putting two commands on consecutive lines like this runs them one after the other, printing the output of each as it goes):

In [None]:
pwd
ls
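
The round trip above can also be sketched on a throwaway directory, recording what `pwd` reports before and after `cd ..`. As assumptions for the illustration, the directory comes from `mktemp -d`, the "experiment" subdirectory is made with `mkdir`, and the variable names are made up:

```shell
# Throwaway directory with one subdirectory
demo_dir=$(mktemp -d)
mkdir "$demo_dir/experiment"

start_dir=$(pwd)             # remember where we are so we can come back
cd "$demo_dir/experiment"    # move down into the subdirectory
inside=$(pwd)
cd ..                        # ".." moves up one level
one_up=$(pwd)

echo "inside: $inside"
echo "one up: $one_up"

cd "$start_dir"              # go back to where we started
rm -r "$demo_dir"            # clean up the throwaway directory
```

The two paths should differ only by the trailing "/experiment", which is exactly the one level that `..` moved us up.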

Moving around the computer like this may feel a bit cumbersome at first, but after spending a little time with it and getting used to tab-completion you'll soon find yourself slightly frustrated when you have to scroll through a bunch of files and find something by eye in a GUI 🙂

---

## Summary
While maybe not all that exciting, these things really are the foundation needed to start utilizing the command line – which then gives us the capability to use lots of tools that only work at a command line, manipulate large files rapidly, access and work with remote computers, and more! Next we're going to look at some of the ways to work with files and directories in [02-unix-intro.ipynb](02-unix-intro.ipynb).

**Terms introduced:**

| Term     | What it is          |
|:----------:|------------------|
| `path` | the address system the computer uses to keep track of files and directories |
| `root` | where the address system of the computer starts, `/` |
| `home` | where the current user's location starts, `~/`|
| `absolute path` | an address that starts from a specified location, i.e. root, or home |
| `relative path` | an address that starts from wherever we are |
| `tab-completion` | our best friend – though maybe not helpful in a notebook yet 😕|


<br>

**Commands introduced:**

|Command     |Function          |
|:----------:|------------------|
|`date`| prints out information about the current date and time |
|`head`| prints out the first lines of a file |
|`pwd` |prints out where we are in the computer (print working directory)|
|`ls`  |lists contents of a directory (list)|
|`cd`| change directories |

<br>

**Special characters introduced:**

|Characters     | Meaning          |
|:----------:|------------------|
| `/` | the computer's root location |
| `~/` | the user's home location |
| `../` |specifies a directory one level "above" the current working directory|

<br>

<br>

---
---

<a href="00-unix-intro.ipynb"><b>Previous:</b> Unix intro home</a>

<div align="right"><a href="02-unix-intro.ipynb"><b>Next:</b> 2. Working with files and directories</a></div>

---

<div class="alert alert-block alert-info" align="center">
<font size="-1">This is a notebook implementation of the <a href="https://astrobiomike.github.io/unix/unix-intro" target="_blank">Unix introduction</a> from <a href="https://astrobiomike.github.io" target="_blank">Happy Belly Bioinformatics.</a></font>
</div>
    
---