# Answers for Exercises due by EOD 2017.10.12

# exercise 1: create a static webpage

let's make a webpage!


### get an `html` file

we're going to need some `html` (hypertext markup language) for our webpage. If you have a page you really want to show off to the world, feel free to use it -- otherwise, feel free to use [our example](https://s3.amazonaws.com/shared.rzl.gu511.com/index.html). edit it. go wild.


### create a static webpage bucket

as you may remember, when we were creating buckets in class we mentioned that a bucket can be configured to be used for "static website hosting". create a new bucket (name it whatever you want), and after creating it, configure it to host a static website.

upload the `html` file from the previous step, and point to it as the "Index document". We'll add an error document in the "add an error page" step below, but feel free to do that now if you already know what you want to do with it.

leave the rest of the configuration as-is, and grab the "endpoint" url from the configuration window.


### try it out

navigate to that endpoint you were given while configuring the bucket to be a static webpage. what do you see?


### read the gosh darn manual

without advance preparation (and kudos to you if you did it already!), the default behavior will be to return a `403 Forbidden` error. For my bucket, for example:

```
403 Forbidden

Code: AccessDenied
Message: Access Denied
RequestId: A7BA5343504C695B
HostId: 7KtvPPnjmQAk2Ry4CeYn58+I1IL1+W+tV633d2/SX5c6XmIFqvewLMTUGwKxrgaY33tzlOF0jek=
```


First, read [what a 403 error is](https://en.wikipedia.org/wiki/HTTP_403) (or [any `http` status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes), for that matter).

After this, read [the static website hosting documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html) for details on how to configure permissions to allow users to access this site.

*Note*: the documentation tells you how to open *an entire bucket*, so keep in mind this will make all the items in the configured bucket public. It is possible to make a single file public from the file summary page, fwiw.


### fix permissions

using the information from the documentation, make sure that the endpoint you tried above (and possibly received a 403 for) is now publicly accessible.
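the documentation's recipe is a *bucket policy* that grants everyone `s3:GetObject` on every object in the bucket. a minimal sketch that prints such a policy for your bucket name, ready to paste into the bucket policy editor on the bucket's **Permissions** tab (the helper name is ours; depending on when your account and bucket were created, you may also need to turn off the bucket's "Block public access" settings before `aws` accepts a public policy):

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a bucket policy (as JSON text) letting anyone GET any object in the bucket."""
    policy = {
        "Version": "2012-10-17",  # the policy language version, not today's date
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",  # everyone, including users who are not signed in
                "Action": ["s3:GetObject"],
                # the trailing /* applies the statement to every object in the bucket
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("testwp.rzl.gu511.com"))
```

remember that this opens *every* object in the bucket, which is exactly the trade-off the note above warns about.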


### add an error page

copy the `html` file you are using as your index page to a new file called `error.html`, and edit that new file to contain an error message. This might be as simple as replacing the header (the `<h1>` element) and first paragraph (the `<p>` element) so that they contain warnings that a "page is missing" or "this url was an error". If you have your own `error.html` file, feel free to use that instead.
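if you'd rather not edit by hand, here is a minimal sketch of that copy-and-edit in `python`. it assumes the first `<h1>` and first `<p>` tags in your page carry no attributes; a regular expression is fine for a simple page like this but is not a general `html` parser. the function names and replacement text are placeholders:

```python
import re

ERROR_HEADER = "<h1>page not found</h1>"
ERROR_PARAGRAPH = "<p>this url was an error. head back to the home page!</p>"


def make_error_page(index_html: str) -> str:
    """Copy an index page, swapping its first <h1> and first <p> for an error message."""
    # re.S lets .*? span newlines; count=1 touches only the first match of each tag
    page = re.sub(r"<h1>.*?</h1>", ERROR_HEADER, index_html, count=1, flags=re.S)
    return re.sub(r"<p>.*?</p>", ERROR_PARAGRAPH, page, count=1, flags=re.S)


def write_error_page(src: str = "index.html", dst: str = "error.html") -> None:
    """Read src, build the error page, and save it as dst."""
    with open(src, encoding="utf-8") as f:
        html = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(make_error_page(html))
```

calling `write_error_page()` from the directory holding your `index.html` leaves an `error.html` next to it, ready to upload.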

upload that new `error.html` file, and then go back to the static webpage configuration for your bucket (where we *didn't* enter an `error.html` file before), and add the newly uploaded file. 


### verify that missing pages redirect to `error.html`

verify that a url that doesn't exist takes you to that `error.html` page. Take the endpoint url from before (for example, mine is: http://wp.rzl.gu511.com.s3-website-us-east-1.amazonaws.com/) and add a meaningless path to the end of it. Again, for example: http://wp.rzl.gu511.com.s3-website-us-east-1.amazonaws.com/pagedontexistyo.php

verify that the page that is displayed is your error page.
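to run the same check from a script instead of a browser, here is a minimal standard-library sketch (the function names and the `pagedontexistyo-...` file name are ours). pass your endpoint with its trailing slash. expect a 4xx status with your error page as the body; S3 commonly answers 403 rather than 404 for a missing key when anonymous users are not allowed to list the bucket:

```python
import urllib.error
import urllib.request
import uuid
from typing import Tuple
from urllib.parse import urljoin


def missing_page_url(endpoint: str) -> str:
    """Append a random, almost certainly nonexistent file name to a website endpoint."""
    # a fresh uuid4 makes a collision with a real object in the bucket vanishingly unlikely
    return urljoin(endpoint, f"pagedontexistyo-{uuid.uuid4().hex}.php")


def fetch(url: str) -> Tuple[int, str]:
    """Return (http status, body) for url, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the exception still carries the response body
        return err.code, err.read().decode("utf-8", "replace")
```

for example, `status, body = fetch(missing_page_url("http://wp.rzl.gu511.com.s3-website-us-east-1.amazonaws.com/"))`, then check that `body` contains the header you wrote into `error.html`.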


### deliverables

send us the url of your static webpage. We will visit

+ the path itself, and
+ a path that doesn't exist

to verify that both the index and error pages are available.

## answer 1

first and foremost, just go here: https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html

but in a pinch, the following sequence should be sufficient:

1. get a webpage
    1. if you don't want to make your own, run `wget https://s3.amazonaws.com/shared.rzl.gu511.com/index.html`
2. go to the `aws s3` web console
3. create bucket
    1. name and region
        1. name is whatever (e.g. `testwp.rzl.gu511.com`)
    2. leave everything else as is
4. navigate to the new bucket (e.g. `testwp.rzl.gu511.com`)
5. upload the `index.html` file from step 1
    1. click the "upload" button
    2. change the "Manage public permissions" dropdown to "Grant public read access to this object"
    3. submit everything else as is
6. in the bucket main page, click the "Properties" tab
7. click the "Static website hosting" card
8. click the "Use this bucket to host a website" button
    1. copy the endpoint url at the top of the pop-up card (e.g. http://testwp.rzl.gu511.com.s3-website-us-east-1.amazonaws.com/)
    2. type in "index.html" for the "Index document" field
    3. type in "error.html" for the "Error document" field
    4. click "Save"
9. edit `index.html` to indicate the page loaded was in error, and save it as `error.html`
10. upload `error.html` to the bucket under that name, granting public read access as in step 5
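the console steps above can also be scripted. a minimal sketch with `boto3` (`put_bucket_website` and `upload_file` are real `boto3` client methods; the helper names and the choice to set `ContentType` explicitly are ours). it assumes the bucket already exists, your `aws` credentials are configured, and public read access comes from a bucket policy as described in the documentation:

```python
def website_configuration(index: str = "index.html", error: str = "error.html") -> dict:
    """Build the WebsiteConfiguration structure that boto3's put_bucket_website expects."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


def enable_website(bucket: str) -> None:
    """Upload both pages and turn on static website hosting (needs AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without boto3 installed

    s3 = boto3.client("s3")
    for name in ("index.html", "error.html"):
        # without a ContentType, S3 may store the file as binary and browsers may download it
        s3.upload_file(name, bucket, name, ExtraArgs={"ContentType": "text/html"})
    s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=website_configuration())
```

usage would be `enable_website("testwp.rzl.gu511.com")`, run from the directory holding both `html` files.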

# exercise 2: a really sad alarm clock


### getting familiar with the script

download [this `python` file](https://s3.amazonaws.com/shared.rzl.gu511.com/alarm_clock.py) and review it to figure out what it does.

the elements below the `command line` comment block implement a command line interface (`cli`) for this python script. check out the `cli` options by trying out

```bash
python alarm_clock.py --help
```

*note*: `boto3` is a prerequisite, so you have to execute the above command in an environment where `boto3` is installed.
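`argparse` is the usual standard-library way to build this kind of interface. a minimal, hypothetical sketch of a parser accepting the `-m` and `-b` flags used in answer 2 below (the long option names, help text, and `required=True` are our guesses, not necessarily what `alarm_clock.py` does); `argparse` adds `--help` automatically:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A hypothetical cli with -m/-b flags like the ones used in answer 2."""
    parser = argparse.ArgumentParser(description="post a wake-up message to an s3 bucket")
    parser.add_argument("-m", "--message", required=True, help="text of the message to post")
    parser.add_argument("-b", "--bucket", required=True, help="name of the target s3 bucket")
    return parser


# passing an explicit list instead of reading sys.argv makes the example reproducible
args = build_parser().parse_args(["-m", "WAKE UP YA GOOF", "-b", "MY.BUCKET.NAME"])
print(args.message, args.bucket)
```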


### create an alarm clock bucket

create a *new* `s3` bucket (i.e. don't use your homework submission `s3` bucket) and make it visible to the public. we (Carlos and I) will be looking at the contents of this bucket for other assignments, so it's important that anyone can access it.

going forward, I will refer to this as "the alarm clock bucket". email us the `url` for this bucket!


### post a message

use the `alarm_clock.py` file to post a message to that new bucket you created in the previous step.


### send us some proof!

write the exact `bash` command you called to run the `alarm_clock.py` script to a file called `what_i_ran.sh` and upload it to the public homework submission `s3` bucket you shared with us in the last homework assignment.

also, get the link to the text file this command created on `s3`, and include that url in your submission email to us. note: verify that the `url` works for other users, or for you when not signed in.
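the link follows the same pattern as the `s3.amazonaws.com` links used throughout this assignment. a minimal sketch for building it (the function name is ours; this is the path-style form, and keys containing spaces or other special characters must be url-escaped):

```python
from urllib.parse import quote


def public_object_url(bucket: str, key: str) -> str:
    """Path-style https url for an s3 object, like https://s3.amazonaws.com/BUCKET/KEY."""
    # quote() escapes spaces and similar characters but keeps "/" so key "folders" survive
    return f"https://s3.amazonaws.com/{bucket}/{quote(key)}"


print(public_object_url("shared.rzl.gu511.com", "alarm_clock.py"))
```

to check that it works for other users, open the url in a private or incognito browser window, where you are not signed in to `aws`.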

## answer 2

the contents of `what_i_ran.sh` should look basically like the following:

```bash
python alarm_clock.py -m "WAKE UP YA GOOF" -b MY.BUCKET.NAME
```

# exercise 3: turn a `python` script into a `cron` job on the `ec2` box

the above `bash` command is something that could be put into a [`crontab`](http://www.adminschoice.com/crontab-quick-reference) entry on a linux machine, and run with some frequency by [`cron`](https://en.wikipedia.org/wiki/Cron). let's do that!


### breaking it down

as we said in lectures, `cron` is a linux command scheduling service. it is constantly running, and every minute it checks to see if there are commands which must be executed at that time. users can request that commands be executed with some fixed frequency (up to minute resolution).

the way that users add elements to "the schedule" is by editing a file called a `crontab`. the name `crontab` is short for `cron` table, because the items are expected to be entered in a sort of tabular format (more on that below).

users can use the linux command `crontab` to interact with these files -- try `man crontab` and `crontab -h` on your `ec2` server to see the available options, or just jump right in with `crontab -e`, which opens your `crontab` in an editor.

a `cron` entry consists of an execution frequency statement followed by a command to execute.


#### execution frequency statement

an entry in a `crontab` file will start with a 5-element sequence that dictates how frequently a command should be executed. you can see several examples on the [`cron` wikipedia page](https://en.wikipedia.org/wiki/Cron), and check out [this cool webapp](https://crontab.guru/) to see what options are available and what a given statement means in "human" terms.

as an example, we could choose to run a job every day at midnight using the frequency statement:

```
0 0 * * *
```

this is at

1. the 0 minute
2. of the 0 hour
3. on every calendar day
4. on every month
5. and on every day of the week
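to make the five fields concrete, here is a minimal sketch of the check `cron` performs each minute. it only understands `*` and single numbers (real `cron` also accepts ranges, lists, steps like `*/15`, and `7` for Sunday), and the function names are ours:

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; this sketch supports only '*' and a single integer."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Would cron run a job with this 5-field schedule during the minute `when`?"""
    minute, hour, dom, month, dow = expr.split()
    # cron numbers weekdays from 0 = Sunday; isoweekday() gives Monday=1 ... Sunday=7
    cron_dow = when.isoweekday() % 7
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(dom, when.day)
    dow_ok = _field_matches(dow, cron_dow)
    # quirk: when BOTH day fields are restricted, cron runs if EITHER one matches
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


print(cron_matches("0 0 * * *", datetime(2017, 10, 12, 0, 0)))  # True
print(cron_matches("0 0 * * *", datetime(2017, 10, 12, 0, 1)))  # False
```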

one thing to be aware of: our `ubuntu` computers are using `UTC` for time -- type the `date` command to see what time your `ec2` server thinks it is.

**develop a `crontab` frequency expression to execute our command: every day at 6:00 AM EST**

don't worry about daylight saving time or any of that right now -- just make sure that the timestamp on the file the next time it runs (i.e. any day within the next month) is 6:00 AM EST.
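the conversion is just arithmetic on the `UTC` offset. a minimal sketch, assuming a fixed offset (EST is UTC-5, EDT is UTC-4) rather than a real time zone database; the function name is ours:

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(hour: int, minute: int, utc_offset_hours: int) -> str:
    """Cron schedule (for a server on UTC) firing daily at hour:minute in a fixed-offset zone."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # any date works: with a fixed offset, the conversion is the same every day
    local = datetime(2017, 10, 12, hour, minute, tzinfo=local_zone)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(daily_cron_in_utc(6, 0, -5))  # 0 11 * * *  (EST)
print(daily_cron_in_utc(6, 0, -4))  # 0 10 * * *  (EDT)
```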


#### command statement

`cron` is effectively just executing shell statements for you, so anything you can type in a `bash` shell is fair game -- including the `python alarm_clock.py` statement from the previous exercise.


##### `cron` and your `PATH` variable

there is an important difference between how you normally execute `bash` commands and how `cron` will do that: whenever you *manually* start up a shell session via `ssh`, your `bash` shell will `source` the contents of your `.bashrc` file.

`cron` **will not do this**.

this is important for a lot of reasons, but the most important right now is that `source`-ing your `.bashrc` will populate your `PATH` variable. We've discussed this several times in class, but it bears repeating: `conda` and `conda` environments depend critically on *changing* this `PATH` variable so that when you type `python` the shell session *resolves* that command to be the particular `python` executable you want to use.

when you executed the `python alarm_clock.py` command in the previous exercise, `python` resolved to something depending on your `PATH`. after you've successfully run that command, 

1. execute `which python` to see which `python` executable you used
2. execute `echo $PATH` to see what your `PATH` variable is 

depending on actions you took (e.g. `source activate myenv`), your `PATH` variable may have changed. on your `ec2` server,

+ if no changes have been made to your `PATH` at all, `python` will resolve to nothing and `python3` will resolve to `/usr/bin/python3`
+ if your `PATH` was updated by `conda` in your `.bashrc` to be `PATH=/home/ubuntu/miniconda3/bin:[a bunch of other paths]` (this is how we configured it in class), `python` will resolve to `/home/ubuntu/miniconda3/bin/python`
+ if you had run `source activate myenv`, your `PATH` would have been updated to `PATH=/home/ubuntu/miniconda3/envs/myenv/bin:[a bunch of other paths]`, and `python` will resolve to `/home/ubuntu/miniconda3/envs/myenv/bin/python`

to summarize:

| previous action taken          | changes to `PATH` variable                          | `python` resolves to...                                                       |
|--------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------|
| default                        | no change                                           | nothing (`python3 --> /usr/bin/python3`)                              |
| `conda` statement in `.bashrc` | `PATH=/home/ubuntu/miniconda3/bin:$PATH`            | `/home/ubuntu/miniconda3/bin/python`            |
| `source activate myenv`        | `PATH=/home/ubuntu/miniconda3/envs/myenv/bin:$PATH` | `/home/ubuntu/miniconda3/envs/myenv/bin/python` |
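you can watch this resolution happen from `python`: `shutil.which` searches a `PATH`-style string the same way the shell does and returns the first match. a minimal sketch comparing `cron`'s default `PATH` on `ubuntu` with the same path plus the conda directory from the table (the paths are the examples above; on your machine the results will differ):

```python
import os
import shutil
import sys


def resolve(command: str, path: str):
    """Return the first executable named `command` found on a PATH-style string, or None."""
    return shutil.which(command, path=path)


# cron's default PATH on ubuntu, and the same with conda's directory prepended
CRON_DEFAULT = "/usr/bin:/bin"
WITH_CONDA = "/home/ubuntu/miniconda3/bin" + os.pathsep + CRON_DEFAULT

print("this script is running under:", sys.executable)
for label, path in [("cron default", CRON_DEFAULT), ("with conda", WITH_CONDA)]:
    # directories are searched left to right, so the first one containing a match wins
    print(f"{label}: python -> {resolve('python', path)}, python3 -> {resolve('python3', path)}")
```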


###### why does this matter?

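`cron` will not do anything special with your `PATH`: it runs commands with a minimal default `PATH` (on `ubuntu`, just `/usr/bin:/bin`). if you want a special `PATH` for a special `python` (hint hint hint: you do), you can add a line anywhere in the `crontab` file before your commands. one catch: `cron` does **not** expand variables in these lines, so a value like `/the/path/i/want/to/add:$PATH` would contain the literal text `$PATH`. write the whole thing out instead:

```bash
PATH=/the/path/i/want/to/add:/usr/local/bin:/usr/bin:/bin
```

so to use the base `conda` installation, you would have to add:

```bash
PATH=/home/ubuntu/miniconda3/bin:/usr/local/bin:/usr/bin:/bin
```

and to use a `conda` environment `myenv`, you would have to add:

```bash
PATH=/home/ubuntu/miniconda3/envs/myenv/bin:/usr/local/bin:/usr/bin:/bin
```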

(the actual, exact path you want to use here is the directory your `python` executable was in when you ran `which python` after a successful run, and which was in the `PATH` you got from the `echo $PATH` command you ran at that time as well)


### developing and testing

at first, it is helpful to have the command run every minute, and to write all output (including error messages) to a file:

```bash
PATH=/my/special/path:/usr/local/bin:/usr/bin:/bin

* * * * *  bash /home/ubuntu/mycommand.sh >> /home/ubuntu/mycommand.log 2>&1
```
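if the job runs but `python` misbehaves, it helps to log exactly what `cron` saw. a minimal sketch of a hypothetical `cron_debug.py` that prints a timestamped line with the running interpreter and its `PATH`; schedule it under the same `PATH` line as your real job (e.g. `* * * * * python /home/ubuntu/cron_debug.py >> /home/ubuntu/mycommand.log 2>&1`) and compare its output with running it by hand over `ssh`:

```python
import datetime
import os
import sys


def environment_report() -> str:
    """One log line showing which python is running and the PATH it was started with."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} python={sys.executable} PATH={os.environ.get('PATH', '')}"


print(environment_report())
```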


### deliverables

1. upload a plain text file named `alarm_clock.crontab` to your `s3` homework submission bucket with two lines in it:
    1. the `PATH` statement you wrote
    2. the one-line `crontab` entry to run `alarm_clock.py` every day at 6:00 AM EST
2. we will look for at least one file successfully created in the public "alarm clock" bucket (from the previous exercise) at 6:00 AM EST
    1. because this bucket is public, we will be able to see it ourselves -- no need to send us anything

## answer 3

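assuming that the file `alarm_clock.py` is in my home directory `/home/ubuntu`, the two lines in `alarm_clock.crontab` can be:

```bash
PATH=/home/ubuntu/miniconda3/bin:/usr/local/bin:/usr/bin:/bin
0 10 * * * python /home/ubuntu/alarm_clock.py -m "WAKE UP YA GOOF" -b MY.BUCKET.NAME
```

the minute field must be `0`: a `*` there would run the command every minute from 10:00 to 10:59. the hour is `10` because the server runs on `UTC` and, in October 2017, US Eastern time was on daylight saving time (UTC-4), so 10:00 UTC was 6:00 AM local time; under standard time (UTC-5) the same local time is hour `11`.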

if you explicitly want to use the bash script from exercise 2 and it's saved as `/home/ubuntu/what_i_ran.sh`, that's also possible:

```bash
PATH=/home/ubuntu/miniconda3/bin:/usr/local/bin:/usr/bin:/bin
0 10 * * * bash /home/ubuntu/what_i_ran.sh
```

personally, I prefer the former option, as it is more explicit.

# exercise 4: `xpath` and `css` selectors in a controlled environment

take the following `html` document (also [available via `s3`](https://s3.amazonaws.com/shared.rzl.gu511.com/example.html) if you want to use chrome or firefox Inspect mode):

```html
<html>
    <head></head>
    <body>
        <div id="tablediv">
            <table id="important_table" class="very_pretty">
                <thead>
                    <tr>
                        <th>column a</th>
                        <th>column b</th>
                        <th>column c</th>
                    </tr>
                </thead>
                <tbody>
                    <tr class="oddrow">
                        <td>1</td>
                        <td>4</td>
                        <td>5</td>
                    </tr>
                    <tr class="evenrow">
                        <td>0</td>
                        <td>2</td>
                        <td>4</td>
                    </tr>
                </tbody>
            </table>
            <ul>
                <li>just to be tricky</li>
            </ul>
        </div>
        <div>
            <ul class="very_pretty">
                <li>hello</li>
                <li class="active">world</li>
            </ul>
            <ol class="kinda_ugly">
                <li>howya</li>
                <li class="inactive">doin</li>
            </ol>
        </div>
    </body>
</html>
```

in the following, there are no trick questions: there will always be at least one element selected in 4.1 and 4.2, and at least one valid expression in 4.3 and 4.4. also, remember that you can enter these `xpath` and `css selector` expressions directly in the developer tools (click inside the panel showing the html elements and press `Ctrl + F` or `Command + F`) to see the number of matches and to cycle through them.


### 4.1

for each of the below `xpath` expressions, identify the number of elements matched by that expression:

1. `/html/body/div/ul`
2. `/html/body/div/ul/li`
3. `/html/body/div/*/li`
4. `/html/body/div/*/li[@class]`
5. `/html/body/div/*/li[@class="active"]`


### 4.2

for each of the below `css` selectors, identify the number of elements matched by that expression:

1. `tr`  (*note: text search in developer tools might be tricky here, since "tr" is a common string...*)
2. `tr.evenrow`
3. `#important_table`
4. `.very_pretty`
5. `div > ul`
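you can also check the counts from 4.1 and 4.2 outside the browser. `python`'s built-in `xml.etree.ElementTree` understands a small subset of `xpath` (paths are relative to the root element, so there is no leading `/html`, and `//` is written `.//`), and the example page happens to be well-formed `xml`. there is no `css` engine in the standard library, so the 4.2 selectors are hand-translated into equivalent paths; note that `[@class='very_pretty']` only matches an exact `class` value, while `css`'s `.very_pretty` matches any element whose class list contains it. a minimal sketch using a condensed copy of the page:

```python
import xml.etree.ElementTree as ET

# condensed copy of the example document (whitespace differs, elements are identical)
PAGE = """<html><head></head><body>
<div id="tablediv">
  <table id="important_table" class="very_pretty">
    <thead><tr><th>column a</th><th>column b</th><th>column c</th></tr></thead>
    <tbody>
      <tr class="oddrow"><td>1</td><td>4</td><td>5</td></tr>
      <tr class="evenrow"><td>0</td><td>2</td><td>4</td></tr>
    </tbody>
  </table>
  <ul><li>just to be tricky</li></ul>
</div>
<div>
  <ul class="very_pretty"><li>hello</li><li class="active">world</li></ul>
  <ol class="kinda_ugly"><li>howya</li><li class="inactive">doin</li></ol>
</div>
</body></html>"""

root = ET.fromstring(PAGE)  # the <html> element, so paths below start from its children

# 4.1: "/html/body/..." becomes "body/..." relative to root
XPATHS = {
    "/html/body/div/ul": "body/div/ul",
    "/html/body/div/ul/li": "body/div/ul/li",
    "/html/body/div/*/li": "body/div/*/li",
    "/html/body/div/*/li[@class]": "body/div/*/li[@class]",
    '/html/body/div/*/li[@class="active"]': "body/div/*/li[@class='active']",
}

# 4.2: hand-written ElementTree equivalents of each css selector
CSS = {
    "tr": ".//tr",
    "tr.evenrow": ".//tr[@class='evenrow']",
    "#important_table": ".//*[@id='important_table']",
    ".very_pretty": ".//*[@class='very_pretty']",
    "div > ul": ".//div/ul",
}

for original, path in list(XPATHS.items()) + list(CSS.items()):
    print(f"{original:40} {len(root.findall(path))}")
```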


### 4.3

for each of the below, develop the appropriate `xpath` expression

1. use an *absolute* path to select *only* the element `<li class="active">world</li>`
2. use a non-absolute path to select *only* that same element `<li class="active">world</li>` which uses the `class` attribute
3. select all `td` elements
4. select all `td` elements in a row with `class="evenrow"`
5. select the `<table id="important_table" class="very_pretty">` element using its `class` attribute
6. select the `<table id="important_table" class="very_pretty">` element using its `id` attribute


### 4.4

now for each of the below, develop the appropriate `css` selector

1. use a *direct descendant* selector to select *only* the element `<ul class="very_pretty">`
2. use an *any descendant* selector to *only* select the four `<li>` elements in the *second* `div` block
3. select all `td` elements
4. select all `td` elements in a row with `class="evenrow"`
5. select the `<table id="important_table" class="very_pretty">` element using its `class` attribute
6. select the `<table id="important_table" class="very_pretty">` element using its `id` attribute


### deliverable

fill in the following table, save it as a `csv` with name `xpath_and_css.csv`, and upload that `csv` to your `s3` homework submission bucket

| exercise | answer |
|----------|--------|
| 4.1.1    |        |
| 4.1.2    |        |
| 4.1.3    |        |
| 4.1.4    |        |
| 4.1.5    |        |
| 4.2.1    |        |
| 4.2.2    |        |
| 4.2.3    |        |
| 4.2.4    |        |
| 4.2.5    |        |
| 4.3.1    |        |
| 4.3.2    |        |
| 4.3.3    |        |
| 4.3.4    |        |
| 4.3.5    |        |
| 4.3.6    |        |
| 4.4.1    |        |
| 4.4.2    |        |
| 4.4.3    |        |
| 4.4.4    |        |
| 4.4.5    |        |
| 4.4.6    |        |

## answer 4

| exercise | answer                                  |
|----------|-----------------------------------------|
| 4.1.1    | 2                                       |
| 4.1.2    | 3                                       |
| 4.1.3    | 5                                       |
| 4.1.4    | 2                                       |
| 4.1.5    | 1                                       |
| 4.2.1    | 3                                       |
| 4.2.2    | 1                                       |
| 4.2.3    | 1                                       |
| 4.2.4    | 2                                       |
| 4.2.5    | 2                                       |
| 4.3.1    | `/html/body/div/ul/li[@class="active"]` |
| 4.3.2    | `//li[@class="active"]`                 |
| 4.3.3    | `//td`                                  |
| 4.3.4    | `//tr[@class="evenrow"]/td`             |
| 4.3.5    | `//table[@class="very_pretty"]`         |
| 4.3.6    | `//table[@id="important_table"]`        |
| 4.4.1    | `div > ul.very_pretty`                  |
| 4.4.2    | `div:nth-of-type(2) li`                 |
| 4.4.3    | this was a typo. free points!           |
| 4.4.4    | `tr.evenrow > td`                       |
| 4.4.5    | `table.very_pretty`                     |
| 4.4.6    | `table#important_table`                 |