# Exercise 1 - Shell basics

Work through as much of the Software Carpentry [lesson on the Unix Shell](http://swcarpentry.github.io/shell-novice/) as you can.  Run through the Setup section just below, then open a shell, either in a terminal on your machine or as a terminal session through Jupyter, to work through the exercises.

After you have completed the first few sections of the tutorial, return to this notebook.

Execute all of the cells, answer all of the questions, and wherever you see "**Edit this cell**", do it!


## 0. Setup - getting required files 

To get started, you'll need to have the required files in your directory.  Use `wget` to get them:

In [None]:
!wget http://swcarpentry.github.io/shell-novice/data/data-shell.zip

The `-l` option to `unzip` lists what's in the archive without unpacking anything.  It's always a good habit to check what you're going to find before you fill your disk with new files.
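The same "look before you unpack" habit works from Python too.  Here's a minimal sketch using the standard `zipfile` module; it builds a tiny throwaway archive first so it can run on its own, and the file names in it are made up (they are not the contents of `data-shell.zip`).

```python
import tempfile
import zipfile
from pathlib import Path

# Build a small throwaway archive so this example is self-contained.
# These names are hypothetical, not what's inside data-shell.zip.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "sample.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("sample/notes.txt", "hello\n")
    zf.writestr("sample/data/values.csv", "a,b\n1,2\n")

# namelist() only reads the archive's index; nothing is unpacked.
with zipfile.ZipFile(archive) as zf:
    listing = zf.namelist()

print(listing)
```

Nothing lands on disk besides the archive itself, which is the point of checking first.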

In [None]:
!unzip -l data-shell.zip

Looks good - now we remove the `-l` so we can `unzip` it for real this time.

In [None]:
!unzip data-shell.zip

*Note*: you only need to do this once per session while using Jupyter.  You can open a terminal now and work through the steps, and return to this notebook a little later, and the files will be available either way.  That's because you're working in the same local directory.

If you're reusing one AWS EC2 instance, your files should remain after you stop and restart.  But if you're using datanotebook.org, you'll have to repeat the above, because it wipes your files at the end of every session.

Okay, let's get on with the exercise!

## 1. Markdown basics

In the following cells, practice marking up text using markdown:

Make this a top-level header (add "`# `" to the left of the text).

Make this a second-level header (add "`## `" to the left of the text).

Make the following three words three separate bullet points (add "`* `" to the left of each line):

  One
  Two
  Three

In [None]:
Make this cell a Markdown cell instead of code!

print("Make this cell a code cell instead of Markdown!")

## 2. Navigating Files and Directories

As you work through this section of the tutorial, complete the steps here as well, using the `!` shell escape command.  Execute each cell as you go.

These steps aren't exactly the same as what's in the tutorial, where the file layout is a little different and where they're not using a notebook like we are.  That's okay.  Just consider this practice.

In [None]:
!whoami

In [None]:
!pwd

In [None]:
!ls

In [None]:
!ls -F

In [None]:
!ls -F data-shell/

In [None]:
!ls -aF

In [None]:
!ls -aF .

What is the difference between the two previous cells, and what does the single dot mean?

**EDIT THIS CELL** WITH YOUR ANSWER HERE

In [None]:
!ls -F ..

What do the double dots mean?

**EDIT THIS CELL** WITH YOUR ANSWER HERE

In [None]:
!ls data-shell/north-pacific-gyre/2012-07-03/

## 3. Working with Files and Directories

The following cells come from the next section of the tutorial.

In [None]:
!ls -F

In [None]:
!mkdir thesis

In [None]:
import os
assert "thesis" in os.listdir()

In [None]:
!ls -F

You can't use the nano editor here in Jupyter, so we'll use the `touch` command to create an empty file instead.
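If you're curious what `touch` amounts to, here's a rough Python equivalent.  It's only a sketch: it works in a scratch directory created just for the example, so it won't interfere with the `thesis` directory used in the cells.

```python
import tempfile
from pathlib import Path

# Scratch directory for the example only (not the notebook's thesis/ folder).
workdir = Path(tempfile.mkdtemp())
draft = workdir / "draft.txt"

draft.touch()  # like `touch draft.txt`: create the file if it doesn't exist

exists = draft.exists()
size = draft.stat().st_size  # a freshly touched file has no content
print(exists, size)
```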

In [None]:
!touch thesis/draft.txt

In [None]:
assert "draft.txt" in os.listdir("thesis")

In [None]:
!ls -F thesis

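Removing files and directories.  Note that `rm` on its own won't remove a directory, so expect the `rm thesis` cell below to report an error; `rmdir` is the command that removes an empty directory.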

In [None]:
!rm thesis/draft.txt

In [None]:
assert "draft.txt" not in os.listdir("thesis")

In [None]:
!rm thesis

In [None]:
!rmdir thesis

In [None]:
assert "thesis" not in os.listdir()

In [None]:
!ls

Renaming and copying files.

In [None]:
!touch draft.txt

In [None]:
assert "draft.txt" in os.listdir()

In [None]:
!mv draft.txt quotes.txt

In [None]:
assert "quotes.txt" in os.listdir()
assert "draft.txt" not in os.listdir()

In [None]:
!ls

In [None]:
!cp quotes.txt quotations.txt

In [None]:
assert "quotes.txt" in os.listdir()
assert "quotations.txt" in os.listdir()

## 4. Working with output redirection

Create a new directory:

In [None]:
!mkdir part1

Rename `part1` to `partone` using `mv`.

In [None]:
!mv part1 partone
!ls

Create a file named `filelist.txt` using the output from `ls` and the output redirector `>`.

In [None]:
!ls > filelist.txt

In [None]:
!cat filelist.txt

Append to `filelist.txt` using the output appending redirector `>>`.  Note the difference between the single `>` and double `>>`.

In [None]:
!ls >> filelist.txt
!cat filelist.txt

In [None]:
!ls > filelist.txt
!cat filelist.txt

What's the difference between `>` and `>>`?


**EDIT THIS CELL** WITH YOUR ANSWER HERE

Now create a directory called "`mydirectory`":

In [None]:
# Edit this cell!

In [None]:
assert 'mydirectory' in os.listdir('.')

Using `ls` and output redirection, create a file called `myfiles.txt` in the directory `mydirectory` that contains the list of files in the current directory.

In [None]:
# Edit this cell!

In [None]:
assert 'myfiles.txt' in os.listdir('mydirectory')

Clean up the directory you just created by removing its contents (the file you created) using `rm`.

In [None]:
# Edit this cell!

In [None]:
assert 'myfiles.txt' not in os.listdir('mydirectory')

Now remove the directory itself using `rmdir`.

In [None]:
# Edit this cell!

In [None]:
assert 'mydirectory' not in os.listdir('.')

## 5. Filters and pipes

Let's look at something a little more interesting.  Download a copy of Charlotte Brontë's *Jane Eyre*, which originally comes from [Project Gutenberg](http://www.gutenberg.org/):

In [None]:
!wget https://s3.amazonaws.com/2018-dmfa/assignment-1/jane-eyre.txt

`head` and `tail` are very useful.  They let you take a quick peek at the start and end of files.
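By default both commands show 10 lines.  In Python terms that's just slicing a list of lines, as in this small sketch (the `lines` list is a made-up stand-in for a file's contents):

```python
# Stand-in for the lines of a 30-line file.
lines = [f"line {i}" for i in range(1, 31)]

head = lines[:10]    # first 10 lines, like `head`
tail = lines[-10:]   # last 10 lines, like `tail`

print(head[0], "...", head[-1])
print(tail[0], "...", tail[-1])
```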

In [None]:
!head jane-eyre.txt

In [None]:
!tail jane-eyre.txt

`grep` is one of the most useful filters.  It lets you search for and match lines that contain specific expressions.  For example, to find mentions of "copyright":

In [None]:
!grep copyright jane-eyre.txt

Notice anything that those lines have in common?

Let's add a little more information by including the `-n` flag to add matching line numbers.
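To make it concrete what `-n` adds, here's a simplified Python sketch of the idea.  It assumes plain substring matching (real `grep` matches regular expressions), and both the `grep_n` helper and the sample lines are invented for illustration.

```python
# Made-up sample text, one string per line.
sample = [
    "The quick brown fox",
    "jumps over the lazy dog",
    "and the fox runs away",
]

def grep_n(pattern, lines):
    """Return (line_number, line) pairs for lines containing pattern.

    Line numbers start at 1, as they do in grep -n output.
    """
    return [(n, line) for n, line in enumerate(lines, start=1) if pattern in line]

matches = grep_n("fox", sample)
for n, line in matches:
    print(f"{n}:{line}")
```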

In [None]:
!grep -n copyright jane-eyre.txt

Now let's look for any mention of "book".  This will match a lot of text, so we'll just take the first 10 matching lines by *piping* the output from `grep` into `head`.
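One way to picture a pipe is as one stage handing its output to the next, a line at a time.  The sketch below imitates that with a Python generator feeding `itertools.islice`; the `numbered_matches` helper and the generated lines are hypothetical, and it stops after 3 matches rather than 10 to keep the output short.

```python
from itertools import islice

def numbered_matches(pattern, lines):
    """Yield (line_number, line) for each line containing pattern."""
    for n, line in enumerate(lines, start=1):
        if pattern in line:
            yield n, line

# Made-up input: even-numbered lines mention "book", odd ones don't.
lines = (f"book {i}" if i % 2 == 0 else f"page {i}" for i in range(1, 101))

# islice plays the role of `head`: it takes the first few results and stops.
first_three = list(islice(numbered_matches("book", lines), 3))
print(first_three)
```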

In [None]:
!grep -n book jane-eyre.txt | head

How many lines contain "book"?  We can count by piping into the word count tool `wc`.
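With no options, `wc` prints three numbers: newlines, words, and bytes.  Here's a small sketch of the same three counts in Python, using a made-up two-line string in place of real `grep` output:

```python
# Made-up text standing in for whatever gets piped into wc.
text = "one two three\nfour five\n"

line_count = text.count("\n")           # wc counts newline characters
word_count = len(text.split())          # whitespace-separated words
byte_count = len(text.encode("utf-8"))  # bytes; equals characters for plain ASCII

print(line_count, word_count, byte_count)
```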

In [None]:
!grep book jane-eyre.txt | wc

That's 84 matching lines, containing 1055 words and 5889 characters (strictly speaking, bytes).  If you just want the number of lines, use `wc -l`:

In [None]:
!grep book jane-eyre.txt | wc -l

What if we want to match both upper- and lower-case text?  Use `grep -i`:

In [None]:
!grep time jane-eyre.txt | wc -l

In [None]:
!grep -i time jane-eyre.txt | wc -l

How many lines in *Jane Eyre* contain "other" (just lower-case)?  Start by using `grep` to extract the lines that match "other" in `jane-eyre.txt`, redirecting the output to a file called `other-lines.txt`.

In [None]:
!grep other jane-eyre.txt > other-lines.txt

In [None]:
%sc h_other = head -1 other-lines.txt
assert "other" in h_other

In [None]:
%sc t_other = tail -1 other-lines.txt
assert "other" in t_other

Now count up the lines in the file you created using `wc`.

In [None]:
!wc -l other-lines.txt

Your answer should be 426!

## 6. Counting words with `tr`, `sort`, and `uniq`

By piping commands together we can do a lot of powerful things right at the command line.  Let's count the most commonly occurring words in *Jane Eyre*.  We could write a Python or R script that just counts words, but with the shell tools we only need to put the right pipeline together, and tasks like this often fit on one line.

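First we need to split up the text lines into a word per line.  The `tr` command can do that: with `-c` it replaces every character that is *not* a letter with a newline, and with `-s` it squeezes runs of repeated newlines down to one.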
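For a feel of what that splitting step produces, here's a rough Python sketch.  It assumes that only the ASCII letters count as letters, and the sample sentence is made up.

```python
import re

# Made-up sample sentence.
text = "It's a truth, universally acknowledged!"

# Keep runs of letters; everything else acts as a separator.
words = re.findall(r"[A-Za-z]+", text)

print("\n".join(words))  # one word per line
```

Notice that the apostrophe splits "It's" into two tokens.  Splitting on non-letters is crude, but it's good enough for a quick count.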

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | head -10

Now we need to sort them and count the unique tokens.  `sort` solves the first problem.

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | sort | head -10

And `uniq -c` solves the second problem.

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | sort | uniq -c | head -25

But there's a catch... do you see it?

We need to convert all the words to lower case so that we count each unique word correctly.  A second `tr` stage handles that.
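Here's a tiny sketch of why case matters when counting, using Python's `collections.Counter` on a made-up word list:

```python
from collections import Counter

# Made-up tokens with mixed capitalization.
words = ["The", "the", "THE", "cat", "Cat"]

raw_counts = Counter(words)                        # case-sensitive
folded_counts = Counter(w.lower() for w in words)  # lower-cased first

print(raw_counts)
print(folded_counts)
```

Without lower-casing, every spelling is tallied separately; after it, the variants collapse into a single entry per word.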

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | head -25

...and if we want to know only the top 10 words in *Jane Eyre*, we need to sort the output.

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort | head -10

But that sorts by character, not number.  Fortunately, `sort -n` does what we want.
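The difference is easy to see on a few made-up "count word" strings.  This is only a sketch of the idea in Python, not a reimplementation of `sort`:

```python
# Made-up "count word" lines, like simplified uniq -c output.
counts = ["9 whale", "10 sea", "100 ship", "2 boat"]

# Character-by-character comparison puts "10" and "100" before "2".
by_character = sorted(counts)

# Comparing the leading number as an integer gives the order we want.
by_number = sorted(counts, key=lambda s: int(s.split()[0]))

print(by_character)
print(by_number)
```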

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | head -10

But that's the wrong end of the list!  Two ways to fix that:  (a) use `tail` instead of `head`; (b) use `sort -rn`, which will sort in reverse order.  Let's try the latter.

In [None]:
!cat jane-eyre.txt | tr -sc '[:alpha:]' '[\n*]' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -10

Let's try another text.

Download *Through the Looking Glass* from https://s3.amazonaws.com/2018-dmfa/assignment-1/looking-glass.txt

In [None]:
!wget https://s3.amazonaws.com/2018-dmfa/assignment-1/looking-glass.txt

In [None]:
assert 'looking-glass.txt' in os.listdir('.')

Take a look at the next cell.  Will it successfully find the 25 most common words in *Through the Looking Glass*?

In [None]:
!cat looking-glass.txt | tr -sc '[:alpha:]' '[\n*]' | sort | uniq -c | head -25

Describe what needs to be done to the previous cell to get it to work correctly.  **Describe it using words**, explaining the issues, rather than using shell commands!

**EDIT THIS CELL** WITH YOUR ANSWER HERE

Okay, now implement your solution using shell commands with a pipeline.

In [None]:
# Edit this cell!