# Exercises due by EOD 2017.10.19

## goal

in this homework assignment we will focus on working with web requests, as well as a brief diversion into encryption (useful for password management and secure transfer of sensitive materials over the internet) and the `oauth` authentication framework.

## method of delivery

as mentioned in our first lecture, the method of delivery may change from assignment to assignment. we will include this section in every assignment to provide an overview of how we expect homework results to be submitted, and to provide background notes or explanations for "new" delivery concepts or methods.

this week you will be submitting the results of your homework via upload to the `s3` bucket you created in a previous week's assignment.

summary:

| exercise | deliverable                               | method of delivery     |
|----------|-------------------------------------------|------------------------|
| 1        | a form submission                         | fill in the form       |
| 2        | a python script (`clientside.py`)         | upload to s3 hw bucket |
| 3        | a csv file (`xpath_css_in_the_wild.csv`)  | upload to s3 hw bucket |
| 4        | a script (`POST_the_gist.{extension}`)    | upload to s3 hw bucket |
| 5        | a python script (`goodtweet.py`)          | upload to s3 hw bucket |

# exercise 1: fill in midterm date availability poll

go here, log in with your georgetown.edu google account, and fill in the poll:

https://goo.gl/forms/rv3pnVoskIDwmUW33

# exercise 2: setting up `kms` encryption

in the `s3` lecture, I mentioned that `s3` supports *server-side* encryption as a simple check box. that is, whenever a file is received by `s3`, it will encrypt it (jumble the contents so they are unreadable to a human) with a secret key, and it will decrypt that file (reverse that jumbling) whenever someone who is approved (e.g. *you*) requests that file. I also mentioned that *client-side* encryption -- where you as a user jumble the contents before you even send them to `s3`, and decrypt the jumbled contents when they're sent back to you -- is another option, but it requires extra effort, and I didn't elaborate on that extra effort.

then, in the web scraping lectures, I mentioned that it might be possible to store an *encrypted* (jumbled) version of a password in plain text on your local machine, and that if you knew how to *decrypt* (un-jumble) that encrypted version, this would be more secure than saving the regular password in plain text. but I didn't discuss how you might do that at all.

let's walk through how *client-side encryption* can be done relatively easily using the `aws kms` service and the `python boto3` library.

the end result here will be a pair of `python` functions which can encrypt and upload a secret message, and then download and decrypt that same message.


## create a kms key

go to the `aws iam` web console page. the left-hand menu has, as one of its options, ["Encryption Keys"](https://console.aws.amazon.com/iam/home#/encryptionKeys/us-east-1). Navigate to that place, and create a new key.

+ for the administrator, select your personal `iam` user.
+ for the usage permissions, make sure to add the role you are using for your `aws cli` use on your `ec2` server.
    + we added this in the lecture on `aws cli`. you can find it in the "description" window for your `ec2` server under the name "IAM role", or you can right click the `ec2` instance and select "Instance Settings > Attach/Replace IAM Role"


## encrypt a message

let's assume that in the previous step you named your key `mykey` (replace all occurrences of `mykey` below with whatever you actually used for your key alias name).

encrypting a message is simple with the `kms` client's `encrypt` method:

```python
import boto3

session = boto3.session.Session(region_name='us-east-1')

message = b'evs'
keyalias = 'mykey'

# note: the kms service does not have a *resource* object yet, so we use a client
kms = session.client('kms')

response = kms.encrypt(
    KeyId='alias/{}'.format(keyalias),
    Plaintext=message,
)

encryptedmessage = response['CiphertextBlob']
print(encryptedmessage)
```

if at any point you run into permissions issues, please resolve those issues using the `iam` service.


## write encrypted message to an `s3` file

using the process we demoed in a mini-exercise in the `s3` lecture where we upload a *string* (not a local text file) into a file on `s3`, create a file on `s3` with the encrypted message as its contents.


## download that file from `s3`

using the process we demoed in a mini-exercise in the `s3` lecture, download the file you just posted to `s3` into a string object (*i.e.* don't download to file).
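the upload and download steps above can be sketched roughly as follows. this is only one way to do it, and it may not match what we demoed in lecture: it assumes you use the `s3` *client* calls `put_object` and `get_object`. the helper names, the bucket name, and the `InMemoryS3` class are all made up for illustration; the class is a stand-in so the sketch can be tried without `aws` credentials.

```python
import io


def upload_bytes(s3client, bucket, key, body):
    """write in-memory bytes to s3://bucket/key without touching local disk"""
    s3client.put_object(Bucket=bucket, Key=key, Body=body)


def download_bytes(s3client, bucket, key):
    """read s3://bucket/key straight into a bytes object"""
    response = s3client.get_object(Bucket=bucket, Key=key)
    # response['Body'] is a file-like stream; read() pulls it into memory
    return response['Body'].read()


class InMemoryS3:
    """hypothetical stand-in for an s3 client, for dry runs only: it supports
    just the two calls used above and keeps objects in a dict

    """
    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        return {'Body': io.BytesIO(self.objects[(Bucket, Key)])}


# dry run against the stand-in, using made-up bytes in place of a real
# encrypted message
client = InMemoryS3()
upload_bytes(client, 'my-hw-bucket', 'secret.bin', b'\x01\x02jumbled')
roundtrip = download_bytes(client, 'my-hw-bucket', 'secret.bin')
```

with real credentials you would pass `session.client('s3')` in place of `InMemoryS3()`, and the `CiphertextBlob` from the encrypt step in place of the made-up bytes.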


## decrypt the message inside the downloaded file

the `kms` client object you created above has a `decrypt` method that takes an encrypted message and returns the original contents. you don't have to tell it which key to use: for keys like the one created above, the encrypted message carries a reference to the key that produced it.

apply that `kms.decrypt` method to the encrypted message you just downloaded:

```python
import boto3

session = boto3.session.Session(region_name='us-east-1')

kms = session.client('kms')

# `encryptedmessage` is the bytes object you downloaded from s3 in the last step
response = kms.decrypt(CiphertextBlob=encryptedmessage)
decryptedmessage = response['Plaintext']
print(decryptedmessage)
```

## put it all together

use the code you generated above to fill in the details of the `python` script `clientside.py`, available on my shared `s3` bucket here:

https://s3.amazonaws.com/shared.rzl.gu511.com/clientside.py

fill in the regions marked by comment boxes:

```python
# ---------------- #
# FILL THIS IN !!! #
# ---------------- #
```

save your filled-in version as a file named `clientside.py` on your public homework submission `s3` bucket.


## epilogue: application to passwords

*you don't have to do anything in the following: this is just an explanation of how you can use the above to work with passwords*

in the "encrypt a message" section above, suppose the "message" you wanted to encrypt was a plain-text password you entered manually as

```python
import getpass

message = getpass.getpass(prompt="Your Password: ")
```

you could now easily take that plain-text password and encrypt it. you could then quite easily write that encrypted password to a file anywhere on your computer -- say, `~/.secrets/mycredentials.json`.

then, when you want to *use* that password, you could do the following with a `kms` client created the same way as above:

```python
encryptedPw = read_pw_from_file("/home/ubuntu/.secrets/mycredentials.json")
plaintextPw = kms.decrypt(CiphertextBlob=encryptedPw)['Plaintext']
```

not too much effort for a little extra security.
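the snippet above leans on a `read_pw_from_file` helper that isn't defined anywhere. here is one possible sketch of it, along with a matching writer. the `json` layout (a single `encrypted_pw` field) and the `write_pw_to_file` name are assumptions made for illustration. the encrypted password is raw bytes, and `json` can only hold text, so this sketch stores it base64-encoded.

```python
import base64
import json
import os
import tempfile


def write_pw_to_file(encrypted_pw, path):
    """store an encrypted password (bytes) in a json file as base64 text"""
    payload = {'encrypted_pw': base64.b64encode(encrypted_pw).decode('ascii')}
    with open(path, 'w') as f:
        json.dump(payload, f)
    # make the file readable by the current user only (limited effect on
    # windows)
    os.chmod(path, 0o600)


def read_pw_from_file(path):
    """load the encrypted password back into the bytes kms.decrypt expects"""
    with open(path) as f:
        payload = json.load(f)
    return base64.b64decode(payload['encrypted_pw'])


# dry run: made-up bytes stand in for response['CiphertextBlob'], and a
# temporary directory stands in for ~/.secrets
fake_ciphertext = b'\x00\xffnot-real-ciphertext'
demo_path = os.path.join(tempfile.mkdtemp(), 'mycredentials.json')
write_pw_to_file(fake_ciphertext, demo_path)
restored = read_pw_from_file(demo_path)
```

the bytes that come back from `read_pw_from_file` are what you would hand to `kms.decrypt` as the `CiphertextBlob`.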

# exercise 3: `xpath` and `css` selectors in the wild

let's construct several `xpath` expressions and `css` selectors to isolate elements on [the hacker news homepage](https://news.ycombinator.com/). 

fill in the `xpath` and `css` selector columns of the table below. [this diagram](https://drive.google.com/file/d/0ByQ4VmO-MwEEd3hZN0xNSHV2WE0/view?usp=sharing) numbers and color-codes the four elements of each news article entry that we want to select.

| number | color  | description      | example                   | `xpath` | `css` selector |
|--------|--------|------------------|---------------------------|---------|----------------|
| 1      | red    | article title    | "If macOS High Sierra..." |         |                |
| 2      | blue   | article source   | "apple.com"               |         |                |
| 3      | orange | number of points | "167 points"              |         |                |
| 4      | green  | age of the post  | "55 minutes ago"          |         |                |

put the contents of the above into a `csv` file called `xpath_css_in_the_wild.csv` and upload it to your public `s3` homework submission bucket.
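one thing to watch for when building that `csv`: `xpath` expressions often contain commas and double quotes, which will break a hand-typed `csv` row. a minimal sketch of letting the `csv` module handle the quoting is below. the selectors in it are made-up placeholders, *not* answers to the exercise, and the `write_table` helper is a hypothetical name.

```python
import csv
import io

header = ['number', 'color', 'description', 'example', 'xpath', 'css selector']

# made-up placeholders, NOT answers: the xpath is here only because it
# contains a comma and double quotes
placeholder_xpath = '//div[contains(@class, "example")]/a'
placeholder_css = 'div.example > a'

rows = [
    [1, 'red', 'article title', 'If macOS High Sierra...',
     placeholder_xpath, placeholder_css],
]


def write_table(fileobj, header, rows):
    """write the header and rows as csv, quoting fields only where needed"""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# write to an in-memory buffer for this demo; for the real deliverable you
# would use open('xpath_css_in_the_wild.csv', 'w', newline='') instead
buffer = io.StringIO()
write_table(buffer, header, rows)
csvtext = buffer.getvalue()
```

reading `csvtext` back with `csv.reader` returns the selectors unchanged, which is a quick way to check your own file before you upload it.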

# exercise 4: `POST` a `github gist`

using your *personal* `github` account, create a `gist` named `GET_the_gist.txt` with whatever contents you want inside of it. make sure this `gist` is associated with your user account, not an anonymous user.

place whatever commands you used to create that gist (a `curl` statement, `python requests` statements, `R httr` statements) into a file called `POST_the_gist.{sh,py,r,etc}`, where the extension is chosen based on the language of the commands you used.

upload that `POST_the_gist.{extension}` file to your homework submission `s3` bucket from last week's assignment.
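as a starting point, here is a rough sketch of the `json` body that `github`'s "create a gist" endpoint (`POST https://api.github.com/gists`) expects. the `build_gist_payload` helper and the example contents are made up for illustration. the sketch deliberately stops short of sending anything: authenticating the request as *you* is the part you still need to work out from the `github api` docs.

```python
import json


def build_gist_payload(filename, content, description='', public=True):
    """build the request body for creating a gist that holds a single file"""
    return {
        'description': description,
        'public': public,
        # `files` maps each file name to an object holding that file's content
        'files': {filename: {'content': content}},
    }


payload = build_gist_payload(
    filename='GET_the_gist.txt',
    content='hello from the api',
    description='created for the homework',
)

# this is the string that would travel in the body of the POST request
body = json.dumps(payload)
```

with `python requests`, a dict like `payload` can be passed via the `json=` argument of `requests.post`, which serializes it and sets the content type for you.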

# exercise 5: `twitter` ~~bot~~ self-esteem engine

## 5.0: setting up

I've decided to put some of my thoughts out into the twitter ether. [this is a particularly good one, actually](https://twitter.com/rzl_gu511/status/916025777128427520).

we all recognize it's good, so let's all go ahead and give it a big ol' like, maybe a retweet.


### creating an account

any which way we do this, we'll need a twitter account. **get a twitter account**. create a new one if you'd like, or re-use an existing one -- both are fine.

in order to sign up for accounts below, you *have to associate your account with [a valid email address](https://twitter.com/settings/account) and [mobile phone](https://twitter.com/settings/add_phone)*. come back onto the grid for a little bit.


### reading the ToS

read [twitter's automation rules](https://support.twitter.com/articles/76915). we like to goof, but there are rules and you should obey them!


### installing needed `python` packages

we will do some funky authentication stuff below, so before you dive in too deep, install the `requests_oauthlib` library. **note**: this is not available via `conda`, so we *must* `pip install` it

```bash
pip install requests_oauthlib
```

## 5.1: exploring the endpoint

twitter has [a very robust `api`](https://developer.twitter.com/en/docs) and you should take a look at it! via the `api`, you can do pretty much everything you could do as a real person.

in particular, to "like" a tweet (or "favorite" a tweet, to use the phrase that described "making an outline of a grey heart into a pinkish-red hued heart and thereby stimulating endorphin release in all parties" back when the `api` was originally developed), there is [a simple `POST` endpoint](https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/post-favorites-create). that endpoint is:

```
https://api.twitter.com/1.1/favorites/create.json
```

and the required parameter is `id`, a globally unique identifier for the tweet we want to like (look at the link of the tweet and see if you can guess what the id is!)

that's right:

```
id = 916025777128427520
```

if that were all we needed to know, it'd be simple:

```python
import requests

resp = requests.post(
    url='https://api.twitter.com/1.1/favorites/create.json',
    params={'id': 916025777128427520}
)
```

it's not exactly that simple, though -- twitter has *some* standards. this `api` requires *authentication* not by *username and password*, but via a process called `oauth`. let's walk through that now

## 5.2: `oauth` and authentication

### high level: what is `oauth`

[`OAuth`](https://stormpath.com/blog/what-the-heck-is-oauth) is a standard (a set of guidelines and some implementation suggestions or requirements) for "delegating access". basically, this is a way for you as a real human user to allow other things (applications, web apps, `api`s, etc) to act on your behalf.

this is the standard that is being used when you, for example, go to "log in" to a site and it temporarily redirects you to Facebook or Google to either say "yeah, that's me" or "sure, they can have my life's history".

one reason to do something like this is to generate a new, unique password for *every* request. this way we don't have permanent credentials sitting around in files (e.g. the `aws cli` access keys).
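to make "a new, unique password for every request" a bit more concrete, here is a simplified sketch of how an `OAuth 1.0a` request gets *signed* with `HMAC-SHA1`. you will not need to write this yourself (the library we use below does it for you), and the sketch skips details a real implementation handles. the function names and every credential in it are made up.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def pct(value):
    """percent-encode a value, keeping only letters, digits, and -._~"""
    return quote(str(value), safe='~')


def oauth_params(consumer_key, access_token):
    """the oauth fields sent with a request; nonce and timestamp change
    every time this is called

    """
    return {
        'oauth_consumer_key': consumer_key,
        'oauth_token': access_token,
        'oauth_signature_method': 'HMAC-SHA1',
        'oauth_version': '1.0',
        'oauth_nonce': secrets.token_hex(16),
        'oauth_timestamp': str(int(time.time())),
    }


def sign_request(method, url, params, consumer_secret, token_secret):
    """compute the signature for one request"""
    # 1. encode the parameters, sort them, and join them as k=v pairs
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    paramstring = '&'.join('{}={}'.format(k, v) for k, v in pairs)
    # 2. the "signature base string" is METHOD & url & parameters
    base = '&'.join([method.upper(), pct(url), pct(paramstring)])
    # 3. the signing key combines the two secrets, which are never sent
    key = '{}&{}'.format(pct(consumer_secret), pct(token_secret))
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# sign the same "like" request twice with made-up credentials
url = 'https://api.twitter.com/1.1/favorites/create.json'
signatures = []
for _ in range(2):
    params = oauth_params('made-up-consumer-key', 'made-up-access-token')
    params['id'] = '916025777128427520'
    signatures.append(sign_request(
        'POST', url, params, 'made-up-consumer-secret', 'made-up-token-secret'
    ))
```

the two entries in `signatures` differ even though the request is the same, because the nonce changes on every call. the secrets themselves are only used as the signing key and are never part of what gets sent.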

twitter asks that we use it, so let's give that a go.


### configuring your twitter ~~bot~~ friendliness generator to use `oauth`

the following is a basic summary of instructions found [here](https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens)

1. head to [the twitter application page](https://apps.twitter.com/)
2. click "create new app"
3. fill in the form
    1. for website, use whatever you want (*e.g.* http://www.noaddress.edu/)
    2. for callback url, leave this blank
4. on the new application's home page (something like https://apps.twitter.com/app/********), click on the "Keys and Access Tokens" tab
    1. at the bottom is a section called "Your Access Token"
    2. click on the "Create my access token" button
5. secrets are secrets for a reason! don't share them!!
6. nothing more to do now, but from this page you will be using:
    1. Consumer Key
    2. Consumer Secret
    3. Access Token
    4. Access Token Secret

## 5.3: "liking" tweets using `requests_oauthlib` 

OAuth authentication is common enough that someone has created an extension of the `requests` library to specifically handle OAuth authentication for regular `requests` library requests.

it follows a common paradigm: rather than use the `requests.post` method, we *create an OAuth-authenticated `session` object* and make our `GET` and `POST` requests with the `oauthsession.get` and `oauthsession.post` methods of that session (this is an identical workflow to the `boto3 session` objects, for example).

```python
import getpass
import requests_oauthlib

def get_oauth_credentials():
    """prompt a user to type in the four values on their twitter application 
    page (https://apps.twitter.com/app/********/keys). this is more secure than 
    typing them in directly, as those commands are saved to your python command 
    history file. it is *not* automation-friendly, however -- for that you would
    want to use a separate file as discussed in the notes
    
    """
    consumerKey = getpass.getpass(prompt="your consumer key: ")
    consumerSecret = getpass.getpass(prompt="your consumer secret: ")
    accessToken = getpass.getpass(prompt="your access token: ")
    accessTokenSecret = getpass.getpass(prompt="your access token secret: ")

    return consumerKey, consumerSecret, accessToken, accessTokenSecret
    
    
def get_tweet_history(screenname='rzl_gu511'):
    """build an oauth-authenticated connection and request the recent statuses
    (tweets) from the provided screen name
    
    """
    consumerKey, consumerSecret, accessToken, accessTokenSecret = get_oauth_credentials()
    
    # this code will create an authenticated session object
    oauthsession = requests_oauthlib.OAuth1Session(
        client_key=consumerKey,
        client_secret=consumerSecret,
        resource_owner_key=accessToken,
        resource_owner_secret=accessTokenSecret
    )
    
    # an oauth-authenticated GET request for that user's timeline
    resp = oauthsession.get(
        url='https://api.twitter.com/1.1/statuses/user_timeline.json',
        params={'screen_name': screenname}
    )
    
    return resp.json()
```


### putting it all together

using `requests_oauthlib`, fill in the following `python` code. make sure that the function `oh_geeze_thats_a_good_tweet` works (i.e. it "likes" the tweet, aka creates a favorite record for that tweet id). when you can verify that it is working correctly, save the contents to a file called `goodtweet.py`

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Module: goodtweet.py
"""

import getpass
import requests_oauthlib


def get_oauth_credentials():
    """prompt a user to type in the four values on their twitter application 
    page (https://apps.twitter.com/app/********/keys). this is more secure than 
    typing them in directly, as those commands are saved to your python command 
    history file. it is *not* automation-friendly, however -- for that you would
    want to use a separate file as discussed in the notes
    
    """
    consumerKey = getpass.getpass(prompt="your consumer key: ")
    consumerSecret = getpass.getpass(prompt="your consumer secret: ")
    accessToken = getpass.getpass(prompt="your access token: ")
    accessTokenSecret = getpass.getpass(prompt="your access token secret: ")

    return consumerKey, consumerSecret, accessToken, accessTokenSecret
    
    
def oh_geeze_thats_a_good_tweet(tweetid=916025777128427520):
    """build an oauth-authenticated connection and "favorite" (aka "like")
    a tweet with a given ID
    
    """
    # load credentials with the get_oauth_credentials function
    #---------------#
    # FILL ME IN!!! #
    #---------------#

    # create an oauth session object using those credentials
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # use the oauth session you just created to make a request to the
    # `favorites/create` api endpoint to "favorite" the tweet with id `tweetid`
    # api docs:
    # https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/post-favorites-create
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # return the json payload of the previous request
    return resp.json()
```

## 5.4: *advanced and optional*: retweeting with `selenium`

*this assignment is optional, for those of you who want experience using `selenium`.*

using `selenium`, fill in the following `python` code. make sure that the function `youve_gotta_see_this_tweet` works (it fully retweets the tweet), and if it does, add it to the file `goodtweet.py` from above

```python
# add this to the import block of your `goodtweet.py` file
import selenium
import selenium.webdriver


# append this to the rest of the `goodtweet.py` file
def youve_gotta_see_this_tweet(username, password, tweeturl, driverpath=None):
    """use selenium to sign in to twitter (requires `username` and `password`),
    open the tweet at `tweeturl`, click the "retweet" icon, and click the 
    "retweet" button in the popup
    
    """
    # build a driver (firefox driver, chrome, etc)
    # *NOTE*: you have to have previously downloaded the driver executable 
    # and added the directory in which that driver resides to your PATH variable
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # navigate to the twitter login page
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # use xpath or css to get the html elements of the user name and password
    # fields, and use the `send_keys` methods of those elements to type the
    # username and password in. then select the "submit" button and use the
    # `click` method to click it
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # navigate to the awesome tweet
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # find the html element of the "retweet" icon, and `click` it
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # in the popup, find the "retweet" button and `click` it
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # `close` the driver
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    return
```

## 5.5: deliverable

upload the contents of `goodtweet.py` to your public `s3` homework submission bucket.