# Exercises due by EOD 2017.11.09

## goal

in this homework assignment we will focus on `aws lambda` functions and available triggers (specifically: cloudwatch scheduled events and `api` gateway calls).

## method of delivery

as mentioned in our first lecture, the method of delivery may change from assignment to assignment. we will include this section in every assignment to provide an overview of how we expect homework results to be submitted, and to provide background notes or explanations for "new" delivery concepts or methods.

this week you will be submitting the results of your homework via upload to two different `s3` buckets. the first is one I created for you and will be used for question 1, and the second is the same homework bucket you have been using now for several weeks.

summary:

| exercise | deliverable                                 | method of delivery                                  |
|----------|---------------------------------------------|-----------------------------------------------------|
| 1        | daily alarm clock messages posted to `s3`   | upload to our `s3` bucket: `2017.fall.gu511.{GUID}` |
| 2        | a url endpoint of a `lambda` function `api` | upload to your personal `s3` hw bucket              |

# exercise 1: the world's worst alarm clock in `lambda`

in previous lectures and homework, we used the linux `cron` program to schedule a recurring job to print "wake up" messages to our `s3` buckets. let's do that again, but this time utilizing serverless `lambda` functions and `CloudWatch` events.


## 1.1: create the `lambda` function

create a `lambda` function with the following properties:

1. the trigger is a `CloudWatch` event executing the function every weekday at 5 AM EST (note that `CloudWatch` schedule expressions are written in UTC)
2. the runtime is `python 3.6` or above
3. the code is simply copy-pasted from [`alarm_clock.py`](https://s3.amazonaws.com/shared.rzl.gu511.com/alarm_clock.py)
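
because `CloudWatch` schedule expressions are evaluated in UTC, the 5 AM EST trigger has to be shifted by five hours. the snippet below is a minimal sketch of that arithmetic; `weekday_schedule` is a hypothetical helper name (not part of any `aws` library), and it assumes the offset never pushes the schedule across midnight.

```python
def weekday_schedule(local_hour, utc_offset_hours, minute=0):
    """Build a CloudWatch `cron(...)` expression for Monday-Friday.

    CloudWatch schedule expressions are evaluated in UTC, so the local
    hour is shifted by the UTC offset (EST is UTC-5).
    """
    utc_hour = local_hour - utc_offset_hours
    if not 0 <= utc_hour < 24:
        # crossing midnight would also shift the days of the week;
        # this sketch does not handle that case
        raise ValueError("schedule crosses midnight in UTC")
    # fields: minutes hours day-of-month month day-of-week year
    return "cron({} {} ? * MON-FRI *)".format(minute, utc_hour)


print(weekday_schedule(local_hour=5, utc_offset_hours=-5))
# prints: cron(0 10 ? * MON-FRI *)
```

the printed string is what you would paste into the "Schedule expression" box when creating the trigger.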


## 1.2: test the `lambda` function

test the function you just created with the following `json` event:

```json
{
    "message": "wake up!!!",
    "bucket": "YOUR_BUCKET_NAME_HERE"
}
```

in the above, please replace `YOUR_BUCKET_NAME_HERE` with the name of the "alarm clock bucket" you used for the previous `alarm_clock.py` exercises (we set up a `cron` job to post to this bucket, you submitted a `what_i_ran.sh` file with this bucket hard-coded in it, and you sent us the name of that hw bucket along with the rest of that assignment).

use this test to make sure that the function works and that it creates a file in the "alarm clock" bucket.
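
to make it clearer what the test event is for, here is a rough sketch of how a handler *could* consume the `message` and `bucket` elements. this is **not** the contents of `alarm_clock.py`; the key format and the optional `s3_client` argument are assumptions made only so the sketch can be run without touching `aws`.

```python
import datetime


def handler(event, context, s3_client=None):
    """Write event["message"] to a timestamped key in event["bucket"]."""
    if s3_client is None:
        # boto3 ships with the lambda python runtimes
        import boto3
        s3_client = boto3.client("s3")

    # hypothetical key format; the real script may name files differently
    now = datetime.datetime.now(datetime.timezone.utc)
    key = "alarm_clock/{:%Y%m%d_%H%M%S}.txt".format(now)

    s3_client.put_object(
        Bucket=event["bucket"],
        Key=key,
        Body=event["message"].encode("utf-8"),
    )
    return {"bucket": event["bucket"], "key": key}
```

if either element is missing from the event, the dictionary lookup raises a `KeyError`, which is why the shape of the incoming event matters.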

*hint: it probably won't work until you...*


## 1.3: fix permissions

when you created your `lambda` function, you probably also created a new isolated `iam role` for this `lambda` function. when the event is received and the `lambda` function launches, the code is executed from within that `role`. this means that `iam role` needs permission to add items to that particular `s3` bucket.

using the `s3` web console, navigate to the "Permissions" tab of the "alarm clock" bucket. on the permissions page, click the "Bucket Policy" button and add the following `json` statement to the policy editor. if the bucket does not have a policy yet, the statement needs to sit inside the `Statement` list of a full policy document (one with `Version` and `Statement` elements). obviously, replace the `ALL_CAPS_WORDS` below with the appropriate value for your `lambda` function `iam role` and `s3` bucket:

```json
{
    "Sid": "LambdaPutObject",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Principal": {
        "AWS": [
            "arn:aws:iam::YOUR_AWS_ACCOUNT_NUM:role/service-role/YOUR_LAMBDA_FUNCTION_ROLE_NAME"
        ]
    },
    "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
}
```

for example, mine was

```json
{
    "Sid": "LambdaPutObject",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Principal": {
        "AWS": [
            "arn:aws:iam::336188965589:role/service-role/gu511_lambda_basic_execution"
        ]
    },
    "Resource": "arn:aws:s3:::gu511.lamberty.io/*"
}
```
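
if you would rather generate the policy than hand-edit it, the sketch below wraps the statement above in a complete policy document (`Version` plus a `Statement` list) and prints it as `json`. `build_bucket_policy` is a hypothetical helper name, and the values passed in are just the ones from my example.

```python
import json


def build_bucket_policy(account_num, role_name, bucket_name):
    """Wrap the LambdaPutObject statement in a full policy document."""
    role_arn = "arn:aws:iam::{}:role/service-role/{}".format(
        account_num, role_name
    )
    statement = {
        "Sid": "LambdaPutObject",
        "Action": "s3:PutObject",
        "Effect": "Allow",
        "Principal": {"AWS": [role_arn]},
        "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
    }
    return {"Version": "2012-10-17", "Statement": [statement]}


policy = build_bucket_policy(
    account_num="336188965589",
    role_name="gu511_lambda_basic_execution",
    bucket_name="gu511.lamberty.io",
)
print(json.dumps(policy, indent=4))
```

building the document with `json.dumps` also guarantees the quoting and commas are valid, which is an easy thing to get wrong in the web editor.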


## 1.4: test again

back on the `lambda` function's summary page, try testing the custom `json` event from above again. this time it should work -- if not, head to stack overflow and start debugging!


## 1.5: fixing the cloudwatch event

at this point, you should have

1. a `lambda` function which expects an `event` object with a `message` and `bucket` element
2. a `CloudWatch` trigger on that `lambda` function
3. a `CloudWatch` rule which will yield an `event` every morning at the scheduled time

there is one problem here -- the `event` that the `CloudWatch` rule you created will yield does not have `message` or `bucket` elements (our test version did, but that was hard-coded). let's fix that:

1. navigate to the `aws CloudWatch` service via the web console
2. open the `CloudWatch event rule` you created for your `lambda` function
    1. the *target* (your `lambda` function from above) should appear on the left
3. click on your *target*'s "Configure Input" dropdown
4. change the input type to "Constant (JSON text)"
5. make the `json` text be `{"message": "wake up!!!", "bucket": "YOUR_BUCKET_NAME_HERE"}`
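
the constant input has to be valid `json`, which means the bucket name must be a quoted string. the sketch below builds the same constant input programmatically; the target id and the `lambda` arn are placeholders I made up for illustration, and the commented-out `boto3` call at the end is shown only as one way the target could be registered outside the web console.

```python
import json


def build_target(lambda_arn, bucket, message="wake up!!!"):
    """Build one CloudWatch rule target with a constant json input."""
    return {
        "Id": "alarm-clock-target",  # hypothetical target id
        "Arn": lambda_arn,
        # `Input` is a string holding valid json, so json.dumps takes
        # care of quoting the bucket name
        "Input": json.dumps({"message": message, "bucket": bucket}),
    }


target = build_target(
    # placeholder arn; use the arn of your own lambda function
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:alarm_clock",
    bucket="YOUR_BUCKET_NAME_HERE",
)
print(target["Input"])

# with boto3 the target could then be attached to an existing rule:
# boto3.client("events").put_targets(Rule="YOUR_RULE_NAME", Targets=[target])
```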


## 1.6: wake me up before you go go

at this point, you have a working alarm clock. good for you. but what about me?

my alarm hasn't been working, so I need some help. I've created an `s3` bucket for each of you on *my* `aws` account, and your bucket name is

```
2017.fall.gu511.{GUID}
```

where `{GUID}` is replaced by your Georgetown ID. for example, my GU ID is `rzl5`, and my bucket name would be `2017.fall.gu511.rzl5`.

alter the `CloudWatch` event rule above to post wake up messages to this new bucket.


##### there is no direct submission for this assignment -- we will verify work by seeing daily files added to the `2017.fall.gu511.{GUID}` buckets at 5AM EST

# exercise 2: deploying a model as a `REST api`

in this assignment we will take a trained and packaged random forest model and deploy it as a lambda function behind a RESTful `api`.


## 2.1: reading background info

### the model

in a previous homework assignment we filled in the details on an exploratory notebook `ds_pipeline_template.ipynb`, and the end result of this process was a trained model for classifying someone's income based on a number of features. the solutions to that notebook are available in the neighboring `ds_pipeline_template_answers.ipynb` file.

the end result of that notebook and modeling process is a recursive feature selector object (where the base is a random forest) and a subsequent trained random forest model. it is possible to convert all of that work (pre-processing, feature selection, and modelling) into specific `scikit-learn` transformers, and to have one unified process for both model *training* and future record *scoring*.

I have made some changes to that overall process to do just that: streamline everything under the `scikit-learn` transformer `api` (all the preprocessing steps and the modelling steps). those changes are encapsulated in the neighboring `salarymodel/ds_pipeline_template_sklearn_only.ipynb`, which creates two objects:

1. `preprocessor`: a pipeline which can be used to take records which match the format of [the csv](http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data) for the [adult salary dataset](http://archive.ics.uci.edu/ml/datasets/Adult) and convert them into fully processed `numpy` arrays
2. `modeller`: a feature selection and modelling pipeline that will take pre-processed `numpy` records (like those created by the above) and predict the probability that a person earns more or less than 50K USD for their salary.


### the deployment package

while amazon provides users with basic `python` runtimes, non-base modules have to be included with the package. I have packaged up these extra libraries (`numpy`, `scipy`, `scikit-learn`) and pickled `scikit-learn pipeline`s along with the code required to create a `lambda` function.

the `zip` archive which is `lambda`-function-ready is publicly available [on my `s3` bucket](https://s3.amazonaws.com/shared.rzl.gu511.com/salarymodel/salarymodel.zip). getting this `zip` file down to an acceptable size without breaking the `scipy` binary executable and linkable `.so` files took some work.


## 2.2: creating the `lambda` function

let's put that `zip` archive to work. create a new `lambda` function from scratch with the following properties (assume the default is acceptable unless otherwise noted):

1. name it `salary_model`
2. create a new role `salary_model_role`
3. the runtime should be `python 3.6`
4. the code entry type should be upload from `s3`
5. the `s3` code url is https://s3.amazonaws.com/shared.rzl.gu511.com/salarymodel/salarymodel.zip
6. the handler should be `salarymodel.handler`
7. the memory value (see "Basic Settings") should be 256 MB
8. the timeout should be set to 10 seconds


## 2.3: testing this `lambda` function

create a `lambda` test event `salarytest` with the following `json` event body:

```json
{
    "httpMethod": "GET",
    "queryStringParameters": {
        "age": 52,
        "capital_gain": 15024,
        "capital_loss": 0,
        "education": "HS-grad",
        "hours_per_week": 40,
        "marital_status": "Married-civ-spouse",
        "native_country": "United-States",
        "occupation": "Exec-managerial",
        "race": "White",
        "relationship": "Wife",
        "sex": "Female",
        "workclass": "Self-emp-inc"
    }
}
```

this test should *succeed* and return the following:

```json
{
  "statusCode": "200",
  "body": "{\"score\": {\"<=50k\": 0.03, \">50k\": 0.97}}",
  "headers": {
    "Content-Type": "application/json"
  }
}
```

verify that your test succeeds before moving on
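
to illustrate the request and response shapes involved, here is a minimal sketch of a handler that accepts the test event above and returns a response in the same format. it is **not** the real `salarymodel.handler`: `fake_score` is a stand-in that always returns the same numbers, and the `"400"` status code used for errors is my assumption.

```python
import json


def fake_score(record):
    """Stand-in for the real model; always returns the same scores."""
    return {"<=50k": 0.03, ">50k": 0.97}


def respond(status, payload):
    """Build a proxy-style response; `body` is a json *string*."""
    return {
        "statusCode": status,
        "body": json.dumps(payload),
        "headers": {"Content-Type": "application/json"},
    }


def handler(event, context):
    method = event.get("httpMethod")
    if method != "GET":
        # assumed error status; the real handler may use another code
        return respond("400", 'Unsupported method "{}"'.format(method))

    record = event.get("queryStringParameters")
    if not record:
        return respond("400", "you must provide input record parameters")

    return respond("200", {"score": fake_score(record)})
```

note that `body` is serialized with `json.dumps`, so a client has to parse it again to get back to the `score` dictionary.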


## 2.3: setting up an `api` gateway trigger

big picture, we want users to be able to hit a single `api` endpoint and have this pre-trained model classify that record and respond to them. to do this, we will give our `lambda` function an `api` gateway trigger.

on the `lambda` function's "triggers" tab, add a trigger with type "API Gateway" with the following properties:

+ the name should be `salary_model`
+ the security settings should be `Open`


## 2.4: testing the `api` gateway

in creating the trigger above, you will have created an `api` gateway object. we already tested our `lambda` function with a direct event; now let's go test our triggering gateway endpoint directly

+ navigate to your new `api_gateway` element
+ click on the "Test" box to try the following three tests (the query strings are multi-line for formatting only, put the contents on one line):

| request type | query string | expected behavior |
|-|-|-|
| `GET` | `age=52&capital_gain=15024&capital_loss=0&education=HS-grad`<br>`&hours_per_week=40&marital_status=Married-civ-spouse`<br>`&native_country=United-States&occupation=Exec-managerial`<br>`&race=White&relationship=Wife&sex=Female&workclass=Self-emp-inc` | `{"score": {"<=50k": 0.03, ">50k": 0.97}}` |
| `GET`  | leave empty | `"you must provide input record parameters"` |
| `POST` | leave empty | `"Unsupported method \"POST\""` |

make sure you pass all three tests before moving on.
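
if you don't want to assemble the long query string for the first test by hand, the standard library can build it from the same record we used in the `lambda` test event. this is just a convenience sketch; the variable names are mine.

```python
from urllib.parse import urlencode

record = {
    "age": 52,
    "capital_gain": 15024,
    "capital_loss": 0,
    "education": "HS-grad",
    "hours_per_week": 40,
    "marital_status": "Married-civ-spouse",
    "native_country": "United-States",
    "occupation": "Exec-managerial",
    "race": "White",
    "relationship": "Wife",
    "sex": "Female",
    "workclass": "Self-emp-inc",
}

# key=value pairs joined by "&", all on one line
query_string = urlencode(record)
print(query_string)
```

the printed value can be pasted directly into the "Query Strings" box of the gateway test page.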


## 2.5: deploying the `api`

at this point you have a successfully tested `api` which you can interact with via the web console (as we just did). to make it available to the outside world, you have to *deploy* it. let's do that now.

+ navigate back to the main `salary_model api` gateway page
+ *deploy* the `salary_model api` by clicking the "Actions" dropdown menu and selecting "Deploy API" and selecting the "prod" deployment stage.
+ on the "prod" stage, navigate to your `/salary_model GET` endpoint
+ find the url of this endpoint
    + it should look something like https://dztalctwd6.execute-api.us-east-1.amazonaws.com/prod/salary_model (the first part of the hostname will be different for your `api`)

going forward, we will refer to the url above as simply `url`.

make sure that you have an `api` which looks like the example above (ending in `/prod/salary_model`) before moving on.


## 2.6: performing an `http GET` test

if you replace `FILL_IN_YOUR_URL_HERE` with the url from 2.5, the following code *should* work -- verify that it does!

```python
import requests

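# the url must be a string, e.g. "https://.../prod/salary_model"
url = "FILL_IN_YOUR_URL_HERE"

# `params` is url-encoded into the query string by `requests`
resp = requests.get(
    url=url,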
    params={
        "age": 52,
        "capital_gain": 15024,
        "capital_loss": 0,
        "education": "HS-grad",
        "hours_per_week": 40,
        "marital_status": "Married-civ-spouse",
        "native_country": "United-States",
        "occupation": "Exec-managerial",
        "race": "White",
        "relationship": "Wife",
        "sex": "Female",
        "workclass": "Self-emp-inc"
    }
)

j = resp.json()
print(j)

assert j['score']['<=50k'] == 0.03
assert j['score']['>50k'] == 0.97
```


## 2.7: sharing your url

put the full url (`https://***********.execute-api.us-east-1.amazonaws.com/prod/salary_model`) into a raw text file called `salary_model_url.txt`.

##### upload `salary_model_url.txt` to your `s3` homework bucket (the one you've used for the previous few homework assignments).
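
if you want to script this last step, the sketch below writes the file after a basic sanity check on the url. `write_url_file` is a hypothetical helper, the url in the usage comment is a placeholder, and the commented-out `boto3` upload assumes your credentials are already configured.

```python
def write_url_file(url, path="salary_model_url.txt"):
    """Write the endpoint url to a raw text file and return the path."""
    if not (url.startswith("https://") and url.endswith("/prod/salary_model")):
        raise ValueError("url does not look like a deployed endpoint")
    with open(path, "w") as f:
        f.write(url + "\n")
    return path


# usage (placeholder url and bucket name):
# path = write_url_file(
#     "https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod/salary_model"
# )
# import boto3
# boto3.client("s3").upload_file(path, "YOUR_HW_BUCKET_NAME", "salary_model_url.txt")
```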