# Exercises due by EOD 2017.09.14

## goal

In this set of exercises, we will review some of the detailed points of making `ssh` connections to and from servers, as well as some of the most basic and useful `linux` shell commands.

## method of delivery

as mentioned in our first lecture, the method of delivery may change from assignment to assignment. we will include this section in every assignment to provide an overview of how we expect homework results to be submitted, and to provide background notes or explanations for "new" delivery concepts or methods.

there are two things that will need to be delivered and two different ways (in **bold** below) to deliver them

+ exercises 1 and 2
    + the results of this will be three items (a user name, an ip address, and a public key)
    + these must be **emailed** to [rzl5@georgetown.edu](mailto:rzl5@georgetown.edu) and [carlos.blancarte@elderresearch.com](mailto:carlos.blancarte@elderresearch.com)
    + these items will be used to set up an ssh connection from your local computer / laptop to my `aws ec2` server
+ exercises 3 - 4
    + nothing to submit for these -- you're on the honor system
+ exercise 5A *or* 5B
    + you only need to complete one of these two
    + the result of several steps will be a single shell script file named `gu511_download.sh`
    + in the following exercise 6 you will use the `ssh` connection set up as a result of exercises 1 and 2 to copy the bash script to your home directory on my `ec2` instance
+ exercise 6
    + you will **secure copy (`scp`)** the script you wrote in exercise 5 to my `ec2` server 
    + the final result will be a file `~/gu511_download.sh` in *your* user directory on *my* `ec2` server
    + I will execute that shell script as your user to verify it works as expected

## exercise 1: generating a public and private key pair

create a public and private `ssh` key pair using the [RSA encryption algorithm](https://simple.wikipedia.org/wiki/RSA_%28algorithm%29). 

use the following programs (depending on your operating system)

| os           | software                                                                     |
|--------------|------------------------------------------------------------------------------|
| windows      | [`puttygen`](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) |
| mac or linux | `ssh-keygen` (part of the `openssh` package)                                 |

while creating those two files, get a version of the public key in `openssh` RSA format for use in an `openssh authorized_keys` file

+ this is the default for public keys created by `ssh-keygen`, so nothing more is needed here
+ this is in the top window of the `puttygen` program on windows: <img align="center" src="http://drive.google.com/uc?export=view&id=0ByQ4VmO-MwEEaERhMUpIekNObFk"></img>
    + Either capture it when creating the key pair, or *load* the created key pair with the "load" button and capture it then

a properly formatted `openssh` public key for an `authorized_keys` file will look like

```
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCcPZIwNUzBD4jufWUPBLVzQRkPLRoJqMYgHUTH+7fdCvYGMMx+WiamyncGzcsMZpcSWDbGlCEuo//NTSc2CSS0jdgsDhBHHZ14kHO5A5zThrmNw0v/D9AH/BaE1B8ls++iDE2SmLMEQIAVD4IfmdWfkCwZaQto6hIb4XUXED/Jz8dWzG4opOpfgNMDiYK31y5qhgZQidaSdUNNOxBoCPaemHURp5SwBm+sbTnTQH4oza/FPkd24G3Ruh9TGIoBB5FGu+Qcz1tuGbk+8Iy6oWmWFa+Z+XtTpUbs5XHjptcbI5xXVsPdg360vK+drCWkJEvdIBEzQXwHDif985oX37rT zlamberty@megaman
```
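as a rough sanity check before emailing anything, the sketch below tests whether a string starts the way an `openssh` RSA public key does. the function name `looks_like_rsa_public_key` and both sample strings are made up for this illustration. the check only looks at the prefix (it assumes every RSA public key blob begins with `AAAAB3NzaC1yc2E`, the encoded key-type header), so it will not catch a truncated or corrupted key.

```bash
# hypothetical helper: succeeds if the argument starts like an openssh RSA public key
looks_like_rsa_public_key() {
    case "$1" in
        "ssh-rsa AAAAB3NzaC1yc2E"*) return 0 ;;
        *) return 1 ;;
    esac
}

# a shortened, made-up public key line (not a usable key)
SAMPLE_PUBLIC="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC user@laptop"
# the first line of a private key file looks nothing like that
SAMPLE_PRIVATE="-----BEGIN RSA PRIVATE KEY-----"

looks_like_rsa_public_key "$SAMPLE_PUBLIC" && echo "public sample: ok to send"
looks_like_rsa_public_key "$SAMPLE_PRIVATE" || echo "private sample: do NOT send"
```

to check your own key you could run something like `looks_like_rsa_public_key "$(cat /path/to/your/public/key)"`.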

remember: ***NEVER GIVE ANYONE THAT PRIVATE KEY!!!***

## exercise 2: log in to *my* `ec2` instance

send an email to [rzl5@georgetown.edu](mailto:rzl5@georgetown.edu) and [carlos.blancarte@elderresearch.com](mailto:carlos.blancarte@elderresearch.com) with the following info:

1. a desired user name
2. the `ip` address from which you want to connect
3. the `openssh`-formatted **public** (not private) key from exercise 1 (see the formatting just above)

we will respond to this email with an email notifying you that you should have access to our `ec2` instance. we will provide you with

+ the server url
+ confirmation of your user name

## exercise 3: walk through the `ssh` demo notebook with a partner

Partner up with someone else in class and walk through the [`ssh_keys` notebook](ssh_keys.ipynb).

Note: there are two ways to run the `ssh_keys.ipynb` notebook. The first and easiest is to open the notebook in Preview mode (*i.e.* just open the link), and then select the "Clone" button on the dashboard at the top. this may prompt you to create a MS Azure account, which we will do later in this course anyway.

The other option is to *download the file locally* and to execute it locally using the command

```bash
cd /path/to/directory/this/notebook/is/in
jupyter notebook
```

this assumes you have `python` version 3 and the `jupyter` and `notebook` packages installed.


*there is nothing to submit here*

## exercise 4: complete the "learn `python` the hard way: command line crash course"

walk through [this short linux tutorial](https://learnpythonthehardway.org/book/appendixa.html) for a second crash course in linux fundamentals

*there is nothing to submit here*

## exercise 5: creating a useful bash script 

both of the following will be graded equivalently, so choose based on your familiarity with linux or desire for a challenge

### exercise 5.A: creating a "useful" bash script (linux beginners)

we're going to write a bash script that will download current weather information at DCA (Reagan National Airport). we'll do this in stages:

1. create a directory to hold our data
2. download the current weather and delay status for DCA (Reagan Washington National airport)
3. print a status message indicating whether or not we were successful to a log file

to create this script, we will move one step at a time; the final script will just be all of the commands put together into one script

#### create a directory

write a command to make a directory `~/data/weather/`, and not to throw an error if that directory already exists

#### download the current weather and delay status for DCA

the FAA (Federal Aviation Administration) has created [a RESTful `xml` and `json` formatted endpoint](http://services.faa.gov/docs/services/airport/) for basic information about airports -- thanks, FAA!

the endpoint of that API is http://services.faa.gov/docs/services/airport/airportCode, and it expects one of two values for the "format" parameter:

+ `application/xml`
+ `application/json`

let's shoot for DCA's `json` formatted output. head to http://services.faa.gov/docs/services/airport/DCA?format=application/json in your browser.

using a command line tool, download the json results of that API call to a file named 

`~/data/dca.weather.json`

#### print a status message to a log file

let's get the following for a status message:

1. the current time
2. the result of the previous command (the download command) -- just as an error code, nothing more complicated than that

the end result should be a line formatted like

```
YYYY-mm-dd HH:MM:SS    dca_weather.sh    command status code was: [status code here]
```

write a command to save the current time to a variable called `$NOW`.

once you can construct such a line, *append* that line to a log file at `~/data/eversource/download.log`
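one detail worth seeing in action: `$?` holds the exit status of the most recent command only, so it has to be captured right after the command you care about. in the sketch below, `sh -c 'exit 3'` is just a stand-in for a failed download, and the variable name `STATUS` is an arbitrary choice.

```bash
# stand-in for a command that fails with exit status 3
sh -c 'exit 3'
# capture the status immediately, before running anything else
STATUS=$?
echo "captured status: $STATUS"

# any later command overwrites $?
true
echo "status after a successful command: $?"
```

this is also why the status needs to be saved before the log line is written: once the logging command runs, `$?` describes the logging command rather than the download.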

#### combine all of the above into a bash script

create a file called `gu511_download.sh` with the following format:

```bash
#!/bin/bash

# the following line(s) creates the directory 
# ~/data/weather if needed
FILL THIS IN

# the following line(s) downloads the current weather 
# and delay status for DCA
FILL THIS IN

# the following line(s) write a log message to file 
# indicating status code of previous line 
FILL THIS IN

# exit with the most recent error code -- you can
# leave this line alone
exit $?
```

#### submit this file: see exercise 6 below

### exercise 5.B: create a *useful* bash script (advanced linux users)

we're going to write a bash script that will download an arbitrary number of urls from a text file in a highly parallel way. we'll write this script in stages:

1. create a directory to hold our downloaded data
2. write a function that downloads a *single* url to a file in that directory
3. use `xargs` to run that function on every url in our list, in parallel

to create this script, we will move one step at a time; the final script will just be all of the commands put together into one script

#### create a test list of urls

execute the following commands to create a list of test urls for downloading:

```bash
echo www.google.com >> /tmp/test.urls
echo www.georgetown.edu >> /tmp/test.urls
echo www.elderresearch.com >> /tmp/test.urls
echo www.twitter.com >> /tmp/test.urls
echo www.facebook.com >> /tmp/test.urls
```

#### create a directory

write a command to make a directory `~/data/downloads/`, and not to throw an error if that directory already exists

#### write a command to print the contents of `test.urls` to `stdout`

print the contents of `/tmp/test.urls` to the terminal (for piping to a later function)

#### use `xargs` to pipe the contents of `test.urls` to the `echo` function

soon we will write a function which will take a *single* url and download it. to pass many urls to this script and to create several forks (separate processes which will work in parallel) we will use the `xargs` command.

let's get some practice with the `xargs` command before trying to use it for our download function. in particular, let's look at the following flags:

1. `-P` or `--max-procs`: specify the maximum number of separate processes we should start (default is 1, 0 is interpreted as "maximum number possible")
2. `-n`: in conjunction with `-P`, the number of items passed to each process
3. `-I`: specify which sequence of characters in the command to follow should be replaced with the item passed in by `xargs`. a somewhat common option is `{}` because it is unlikely to be meaningful in any command that follows. that must be escaped, though -- see below

as an example, check out the results of the following:

```bash
cat /tmp/test.urls | xargs -P 100 -n 3 -I{} echo url is \{\}
```
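to see what `-n` and `-I` do without touching the network, here is a small sketch that feeds made-up items (`a` through `e`) to `echo`. it leaves `-P` at its default of 1 so the output order is predictable; with more processes the lines can arrive in any order. one caveat, as far as I know: GNU `xargs` treats `-n` and `-I` as mutually exclusive, so when both are given only one of them takes effect.

```bash
# -n 2: hand at most two items to each echo invocation
printf 'a\nb\nc\nd\ne\n' | xargs -n 2 echo got:
# got: a b
# got: c d
# got: e

# -I{}: run echo once per input line, substituting the line for {}
printf 'a\nb\n' | xargs -I{} echo "item {} is done"
# item a is done
# item b is done
```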

#### `curl` one of those urls

take one of those urls -- say, www.google.com -- and download it to a file. do the following:

1. run it in "silent" mode
2. cap the maximum time the whole download operation should take at 10 seconds
3. write the contents of that download to a file in `~/data/downloads` with the same name as the final portion of that url (its `basename`)

*hint*: suppose we have the url in a bash variable `$URL`. we could write

```bash
curl [silent flag and maximum download time flag] $URL > ~/data/downloads/$(basename $URL)
```

the `basename` piece is necessary for urls which are more complicated than just `www.xxxxxxxx.com`, such as `www.xxxxxxxx.com/a/longer/path/with?stuff=x&other_stuff=y`
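a quick sketch of what `basename` returns in each case. the url below is made up (it uses `www.example.com` rather than a real page), and it is wrapped in single quotes because `?` and `&` mean something to the shell when left unquoted.

```bash
# a made-up url with a path and a query string; quote it so ? and & stay literal
URL='www.example.com/a/longer/path/with?stuff=x&other_stuff=y'

basename "$URL"            # prints: with?stuff=x&other_stuff=y
basename www.google.com    # prints: www.google.com
```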

verify that the downloaded contents for one test url match the source on the corresponding webpage

#### export that `curl` statement as a function

you can create a bash function using the syntax

```bash
function my_function_name {
    # do bash stuff
}
```

arguments are passed to this function as bash variables `$1`, `$2`, and so on, such that if you write

```bash
my_function_name arg1 arg2 arg3 arg4
```

these will be "available" within the body of the function as

| variable name | value |
|---------------|-------|
| `$1`          | arg1  |
| `$2`          | arg2  |
| `$3`          | arg3  |
| `$4`          | arg4  |

for example, if we wanted to turn our echo command up above into a super l33t re-usable function, we could write

```bash
function l33t_url_echo {
    echo "the url is $1"
}

# test it out
l33t_url_echo www.google.com
```

we could also make this available in other bash shells started from this one by `export`-ing it:

```bash
export -f l33t_url_echo
```
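to see what exporting changes, the sketch below calls a function from a child `bash` process before and after the `export -f`. the function `say_hi` is made up for this demonstration, and the sketch assumes you are running it in `bash` (other shells may not support `export -f`).

```bash
# hypothetical function, only for this demonstration
function say_hi {
    echo "hi, $1"
}

# before exporting: a child bash process has never heard of say_hi
bash -c 'say_hi there' 2>/dev/null || echo "child shell cannot see say_hi"

# after exporting: the same child command works
export -f say_hi
bash -c 'say_hi there'     # prints: hi, there
```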

so, let's talk about **what you should actually do**:

1. convert your `curl` statement from before into a bash function that will take a url as a parameter
2. export it for use in other bash sessions

#### use that function with `xargs` on your test urls

for each of the urls passed along by `xargs` we want to run the newly-minted `bash` function with that url as the argument.

for example, if we wanted to use our `l33t_url_echo` function from above, we could write:

```bash
# ...it pays to read ahead...
cat /tmp/test.urls | xargs -P 100 -n 3 -I{} bash -c l33t_url_echo\ \{\}
```

in the above, the actual *command* we are executing with `xargs` is the `bash` command. a few things are going on:

1. `xargs` replaces the occurrence of `{}` with whatever url it was handed
2. `bash` starts a new shell
3. that new shell executes the *command* following the `-c` flag (that's what the `-c` flag *is*)
4. the command has to reach `bash -c` as a single argument, which is why the space inside it is escaped (the braces are escaped in the example as well)
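an alternative some people find easier to read (this is a common shell idiom, not something from the lecture): instead of splicing the item into the command string, hand it to `bash -c` as a positional argument. the function name `show_item` and the sample items are made up for this sketch, which assumes it is run in `bash`.

```bash
# hypothetical function, only for this demonstration
function show_item {
    echo "item: $1"
}
export -f show_item

# the single-quoted string is the command; _ fills $0 and {} becomes $1
printf 'one\ntwo\n' | xargs -I{} bash -c 'show_item "$1"' _ {}
# item: one
# item: two
```

because the item arrives as `$1` rather than as part of the command text, nothing inside the quoted command needs escaping.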

write your own version of the command above, replacing `l33t_url_echo` with the function you created previously.

delete all of the items in `~/data/downloads` to start from scratch, and run the whole `cat + xargs + your_function` line. verify it downloads each test url.

#### combine all of the above into a bash script

create a file called `gu511_download.sh` with the following format:

```bash
#!/bin/bash

# the following line(s) creates the directory 
# ~/data/downloads if needed
FILL THIS IN

# the following line(s) define our single-url curl
# download function
FILL THIS IN

# the following line(s) export that function for use
# in other bash sessions
FILL THIS IN

# the following line is the "cat + xargs + your_function"
# line from the previous step
FILL THIS IN

# exit with the most recent error code -- you can
# leave this line alone
exit $?
```

##### postscript

*if everything went according to plan, this script should be among the fastest download programs I've ever come across (no exaggeration there). it was useful enough that I put it and some variants on a github repo I own.*

*...it **really** pays to read ahead...*

#### submit this file: see exercise 6 below

## exercise 6: submitting your homework

### tangent about how your `ssh` access was set up

in exercises 1 and 2 you created a public key and sent it to me along with a user name and an ip address.

after receiving them, I will do the following:

```bash
# created a user for you with your suggested user name
# this user cannot log in with password -- only via ssh keys
sudo adduser --disabled-password [YOUR USER NAME HERE]

# created a ~/.ssh folder with the expected ownership values
# and permissions
sudo mkdir -p ~[YOUR USER NAME HERE]/.ssh
sudo chown [YOUR USER NAME HERE]:[YOUR USER NAME HERE] ~[YOUR USER NAME HERE]/.ssh
sudo chmod 700 ~[YOUR USER NAME HERE]/.ssh

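# create an ~/.ssh/authorized_keys file with the expected
# ownership values and permissions, and your public key
# inside. `sudo echo ... >> file` would not work here: the
# redirect is performed by my own shell, not by sudo, so
# the append goes through `sudo tee -a` instead
echo "[YOUR PUBLIC KEY]" | sudo tee -a ~[YOUR USER NAME HERE]/.ssh/authorized_keys > /dev/null
sudo chown [YOUR USER NAME HERE]:[YOUR USER NAME HERE] ~[YOUR USER NAME HERE]/.ssh/authorized_keys
sudo chmod 600 ~[YOUR USER NAME HERE]/.ssh/authorized_keys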
```

after all of the above, you should be able to log in to my `ec2` instance.

I will respond to you with your user name and my ip address. you should then be able to log in to my `ec2` server with the command

```bash
ssh -i /path/to/your/private/key [YOUR USER NAME HERE]@[MY EC2 IP ADDRESS HERE]
```

### actually doing exercise 6

the point of this exercise is to use `scp` (the secure copy command, which runs over `ssh`) or some secure copy application (e.g. WinSCP or FileZilla) to copy your bash script file to my `ec2` server.

you should copy it into your home directory (`~`, `/home/[YOUR USER NAME HERE]`) and keep the file name as `gu511_download.sh`.

if you are using `scp`, the general structure of the command is

```bash
# copying a *local* file to a *remote* machine
scp -i /path/to/your/private/key [local files to copy] [user name]@[host name or ip]:[path on remote machine]
```

to go in the other direction, just flip the order between the `[local files to copy]` element and the `[user name]@[host name or ip]:[path on remote machine]` element.

so for this particular copy operation:

```bash
scp -i /path/to/your/private/key /path/to/your/gu511_download.sh [your user name here]@[my aws ec2 ip]:~/gu511_download.sh
```

the final evaluation will be me running your script and verifying that the behavior is as expected.