<a href="https://colab.research.google.com/github/rzl-ds/gu511/blob/master/011_lambda.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>

# `aws lambda`

think of the steps we needed to go through before we were able to execute our sad `alarm_clock.py` script in the previous lectures and homeworks:

1. provision an `ec2` instance
2. make an `ssh` connection to that server
3. copy our `python` script to that server
4. create an environment capable of executing that `python` script
5. establish an `iam role` for the `aws cli` and `boto3` processes running on that server
6. add a `crontab` entry to run that script with some frequency
7. leave our `ec2` server up and running while the script runs once a day

that's a fair amount of work and infrastructure for doing something as simple as running a block of code. 

it's also relatively expensive: we didn't *need* our `ec2` server to be up and running constantly -- only for a few seconds each day. and we didn't use all of the resources available to our machine while we were executing that code, so even that time was somewhat wasted.

`aws lambda` is a serverless code execution service which tries to provide a more streamlined and simple process for executing code in a way that is highly integrated with other `aws` services.

it also generalizes from *scheduling* to *triggered* events -- meaning we can execute code with some temporal frequency, *or* any time some particular thing happens (*e.g.* a file is posted to a particular `s3` bucket). this makes it a sort of "if this then that" for `aws`

## how `lambda` functions are defined

a `lambda` function is comprised of three things:

1. a runtime and an environment: a particular language (and a particular version of that language), plus any other environment setup
2. a trigger: some event that causes a function to run
3. a function: some code executed in that run-time after that trigger

### run-time and execution environment

this is basically the "where" of your code's execution

the available runtimes are updated with some frequency, so it's usually best to look them up on [this page](https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html). of course, the `lambda` function creation dialog will also make you choose one of them.

at the time of writing, though, the environment and runtimes were as follows:

+ os: all executed on the Amazon Linux AMI, 64 bit
+ runtimes:
    + `node.js` v4.3.2 and 6.10.3
    + `java` 8
    + **`python` 3.6 and 2.7**
    + `C# .NET Core` 1.0.1
+ sdks
    + `javascript`
    + `boto3` for `python 2.7, python 3.6`

### triggers

this is the "when" of your code's execution.

there is a *huge* list of events and items that could trigger the execution of your `lambda` function. for example,

1. any event in several `aws` services
2. an open `api`
3. an alexa
4. nothing at all (manual)

moral of the story: you can *probably* trigger it the way you want (especially via the `api` endpoint). It may not be obvious, but I bet you can do it.

### functions

finally, the "what" of your code's execution is the code itself.

at its heart, every `aws lambda` function is a zip archive of code that can be executed in the given, fixed runtime from above, plus a single functional endpoint called a "handler".

in the `python` world, that endpoint is specified as a python module (basically, a file name) and single function in that file which looks like

```python
# file: file_name.py
def function_name(event, context):
    # does something using the event json and the context
```

the handler here is `file_name.function_name`
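to make the `file_name.function_name` convention concrete, here is a minimal sketch of how such a string can be split into a module and a function. this is only an illustration of how to read the handler string, not a description of what `lambda` does internally; `resolve_handler` is a made-up helper, and `json.dumps` stands in for a real handler so the example runs anywhere.

```python
import importlib


def resolve_handler(handler):
    """Split 'module.function' and return the function object it names."""
    module_name, function_name = handler.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# stand-in example: a standard-library function instead of file_name.py
func = resolve_handler('json.dumps')
print(func({'key1': 'val1'}))
```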

#### `event`

the `event` item is a `json` object that is usually completely defined by the triggering event you chose above.

given that we are usually writing `python` scripts here, you should know that the `json` library has already parsed this `event` into a `dict`

```python
event = {
    'key1': 'val1',
    'key2': ['list2elem1', 'list2elem2', 'list2elem3'],
    'key3': {
        'key3_1': 'val3_1',
        'key3_2': 'val3_2',
    }
}
```

you can either

1. *use* that triggering event to change the way your code executes
    1. *e.g.* `eventval1 = event['key1']`
    2. this means that you need to know the structure of that event `json` object
2. ignore the content of the event and simply respond to the fact that *some* event happened

either way, though, it is expected that your `lambda` function code will take the event in as its first parameter
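a small sketch of the first option, assuming the toy event structure shown above (the fallback string is made up for illustration). because the handler is just a `python` function, we can also try it out locally by calling it with a hand-built `dict`.

```python
def lambda_handler(event, context):
    # option 1: use the content of the event (requires knowing its structure)
    eventval1 = event.get('key1', 'no key1 provided')
    # option 2 would skip the lookup and do the same thing on every event
    return 'key1 was: {}'.format(eventval1)


# local dry run; context is unused in this handler, so None is fine here
print(lambda_handler({'key1': 'val1'}, None))
print(lambda_handler({}, None))
```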

`aws` has collected an example of every `event` type on the following page: http://docs.aws.amazon.com/lambda/latest/dg/eventsources.html

#### `context`

in each of the runtimes, `aws` creates an object called a `context` that allows the user to interact with the runtime and environment. the [`python` context object is discussed here](http://docs.aws.amazon.com/lambda/latest/dg/python-context-object.html).

it will tell you (among many other things):

+ how long till your function times out
+ the memory limit of this `lambda` function
+ the function name or version

use of the `context` is pretty... context dependent. use of the `event` is much more common
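for the curious, here is a rough sketch of reading the three items listed above. the attribute and method names are the ones given in the linked context object docs; `FakeContext` and its values are hypothetical stand-ins so that the handler can be run on a laptop.

```python
class FakeContext:
    """Hypothetical local stand-in for the object lambda passes in."""
    function_name = 'helloworld_local'
    function_version = '$LATEST'
    memory_limit_in_mb = 128

    def get_remaining_time_in_millis(self):
        return 3000


def lambda_handler(event, context):
    return {
        'name': context.function_name,
        'version': context.function_version,
        'memory_mb': context.memory_limit_in_mb,
        'ms_left': context.get_remaining_time_in_millis(),
    }


print(lambda_handler({}, FakeContext()))
```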

let's create our first `lambda` function using the `hello-world-python3` blueprint

<br><div align="center">**walkthrough: create a hello world `lambda` function**</div>

this is following [the walkthrough in the `lambda` documentation](https://docs.aws.amazon.com/lambda/latest/dg/getting-started-create-function.html)

1. go to [the `lambda` service dashboard](https://console.aws.amazon.com/lambda/home?region=us-east-1#/home)
2. create a function
    1. blueprint
        1. search for hello-world-python3 and select it
    2. triggers
        1. do nothing and click "next"
    3. function
        1. give it whatever name you want
        2. the rest of the configurations are okay
    4. review
        1. just do it!
3. testing it
    1. be on the function page
        1. you were dropped there after creating it
        2. but if you left, you can get there by clicking "functions" in the left panel and selecting the name of your function
    2. click the "test" button
    3. use the default `json`
        1. if there is no default, `{"key3": "value3", "key2": "value2", "key1": "value1"}`
4. monitoring
    1. click the monitoring tab
        1. check out the invocation and time used graph
    2. click on the cloudwatch logs
        1. look at the log stream 

## `trigger`s worth knowing about

there are a lot of options for triggers, and depending on what you want to do with `aws` you will explore any number of them. that being said, there are a few that have immediate impact on services we've used already or will be likely to use while doing data science.

generally you already know what the event you're responding to "is" -- you intentionally dropped an important file in an `s3` bucket, e.g.

the main question to ask is:

*what does the `json` object passed from the triggered event look like, and how can / should we use it in our function?*

### scheduled events

the `lambda` world has its own version of `cron` in `CloudWatch Events`. the same way that we were able to create shell scripts that could be executed on some frequency, we can create `lambda` functions and execute them with some frequency.

when we set the frequency of the scheduled event, we can use a modified `cron` syntax, or rate expressions -- docs are [here](https://docs.aws.amazon.com/lambda/latest/dg/tutorial-scheduled-events-schedule-expressions.html)

```bash
# rate example: daily
rate(1 day)

# cron example: work weekdays at 6 AM UTC
cron(0 6 ? * MON-FRI *)
```
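the modified `cron` syntax has six fields rather than the five we used in `crontab`: minutes, hours, day of month, month, day of week, and year. a minimal sketch for labeling the fields of an expression follows; `describe_cron` is a made-up helper for reading expressions, and it does not validate the values themselves.

```python
FIELDS = ['minutes', 'hours', 'day_of_month', 'month', 'day_of_week', 'year']


def describe_cron(expression):
    """Label each field of a cron(...) schedule expression."""
    if not (expression.startswith('cron(') and expression.endswith(')')):
        raise ValueError('expected something like cron(0 6 ? * MON-FRI *)')
    values = expression[len('cron('):-1].split()
    if len(values) != len(FIELDS):
        raise ValueError(
            'expected {} fields, got {}'.format(len(FIELDS), len(values))
        )
    return dict(zip(FIELDS, values))


print(describe_cron('cron(0 6 ? * MON-FRI *)'))
```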

<div align="center">**walkthrough: creating a `cloudwatch` scheduled event `lambda` function**</div>

the walkthrough goes through the following steps

+ create a new function
+ on the trigger menu, select "Cloudwatch Event"
+ enter a cron expression
+ use the code below as the function definition
+ submit

```python
def lambda_handler(event, context):
    print("event = {}".format(event))
    return 'hello world'
```

whatever particular configuration we chose, an actual `cloudwatch` event `rule` was created. 

we also could have directly created the cloudwatch rule ahead of time via the `cloudwatch` service, or we could let `lambda` build one for us.

let's check out the rule we created in our walkthrough by heading to [the `cloudwatch` service](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1)

<div align="center">**walkthrough: looking at `cloudwatch` scheduled event options**</div>

the walkthrough goes through the following pages

+ navigate to the `cloudwatch` service
+ select the "Events > Rule" page from the left menu
+ select any rule to go to the rule's summary page
+ click edit
    + source
        + event pattern (trigger on *any* event across `aws` services)
        + schedule (rate or cron expression)
        + dropdown of sample event indicates exactly what the event is
    + target: `lambda` function
        + this happens to point at a `lambda` function (because it was built for us via `lambda`)
        + the `input` discussed here is the `input` `event` object for the `lambda` function
        + we have options
            + the exact object
            + a single sub-element from the object
            + a *constant* version
            + a value constructed from the components of the original event object (based on key-value replacement)

so, the basic `event` object is

```json
// cloudwatch scheduled event example
{
  "account": "123456789012",
  "region": "us-east-1",
  "detail": {},
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "time": "1970-01-01T00:00:00Z",
  "id": "cdc73f9d-aea9-11e3-9d5a-835b769c0d9c",
  "resources": [
    "arn:aws:events:us-east-1:123456789012:rule/my-schedule"
  ]
}
```

we could take our `hello world` example and make a small modification to incorporate elements of this event:

```python
def lambda_handler(event, context):
    print("event = {}".format(event))
    return 'hello world! it is {}'.format(event['time'])
```
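the `time` value is just a string, so if we want to do anything date-like with it we have to parse it first. a small sketch, assuming the timestamp always has the `YYYY-MM-DDTHH:MM:SSZ` shape shown in the sample event:

```python
from datetime import datetime


def lambda_handler(event, context):
    # the scheduled event's "time" looks like 1970-01-01T00:00:00Z (UTC)
    when = datetime.strptime(event['time'], '%Y-%m-%dT%H:%M:%SZ')
    return 'hello world! it is a {}'.format(when.strftime('%A'))


# local dry run with the timestamp from the sample event
print(lambda_handler({'time': '1970-01-01T00:00:00Z'}, None))
```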

<div align="center">**mini exercise: create a scheduled hello world `lambda` function**</div>

1. create a new `lambda` function with whatever name you'd like
2. use the definition below for the function
3. trigger the event on a `cloudwatch` scheduled event
4. schedule the event to trigger every M-F at 6:00 UTC
5. test your lambda function

```python
def handler(event, context):
    print("event = {}".format(event))
    return 'hello world! it is {}'.format(event['time'])
```

### `s3`

suppose you created a process which checks every day for an update to some important externally-managed datasource, and downloads any new items posted to that external source to your `s3` bucket -- good thinking!

also suppose that this new data affects a model you already trained, but with the previous version of that data set. wouldn't it be awesome if you could have your training process kick off automatically whenever a new dataset arrives?

it will surely come as a *great shock* that you can do this with `lambda` functions. can you believe it? I posed that arbitrary hypothetical in this lecture on `lambda` functions and here we are, possibly addressing that hypothetical.

wild!

<div align="center">**walkthrough: creating a useless `s3` event triggered `lambda` function**</div>

the walkthrough does the following

1. navigate to the `s3` service page
    1. create an `s3` event-producing bucket
    2. create a subfolder in that bucket
2. navigate back to the `lambda` service page
3. create a new `lambda` function
    1. no blueprint
    2. trigger type is `s3`
    3. configure trigger
        1. pick the bucket you created in step 1.1
        2. event type is "Object Created (All)"
        3. prefix is the subfolder you chose in step 1.2 (with a trailing "/" character, e.g. "myfolder/")
        4. no suffix needed
        5. enable trigger
    4. configure function
        1. use the `python 3.6` code below (does nothing but print our event)
    5. save
4. test new `lambda` function
    1. click the "actions" dropdown to select the "configure test event" option
    2. select an `s3 put` event template
    3. "save and test"
    
```python
def lambda_handler(event, context):
    print("event = {}".format(event))
    return 'hello world!'
```

toward the end of that walkthrough, we created a test `s3 PUT` event that looked like the following:

```json
{
    "Records": [
        {
            "eventVersion": "2.0",
            "eventTime": "1970-01-01T00:00:00.000Z",
            "requestParameters": {"sourceIPAddress": "127.0.0.1"},
            "s3": {
                "configurationId": "testConfigRule",
                "object": {
                    "eTag": "0123456789abcdef0123456789abcdef",
                    "sequencer": "0A1B2C3D4E5F678901",
                    "key": "HappyFace.jpg",
                    "size": 1024
                },
                "bucket": {
                    "arn": "arn:aws:s3:::mybucket",
                    "name": "sourcebucket",
                    "ownerIdentity": {"principalId": "EXAMPLE"}
                },
                "s3SchemaVersion": "1.0"
            },
            "responseElements": {...},
            "awsRegion": "us-east-1",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {"principalId": "EXAMPLE"},
            "eventSource": "aws:s3"
        }
    ]
}
```

let's update our hello world script to (pretend to) run our data science pipeline process

```python
def lambda_handler(event, context):
    print("event = {}".format(event))
    s3info = event['Records'][0]['s3']
    # return a str version of the command we could execute
    cmd = 'dspipeline.update_file(bucket={}, key={})'.format(
        s3info['bucket']['name'], 
        s3info['object']['key']
    )
    print(cmd)
    return cmd
```
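two details the version above glosses over: `Records` is a list, so an event can carry more than one record, and object keys arrive url-encoded (a space in a file name shows up as `+`). here is a hedged sketch that handles both; the test event is trimmed down to the fields we actually read, and the key name is made up.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    updated = []
    for record in event.get('Records', []):
        s3info = record['s3']
        bucket = s3info['bucket']['name']
        # keys arrive url-encoded, e.g. "my+file.csv" for "my file.csv"
        key = unquote_plus(s3info['object']['key'])
        updated.append((bucket, key))
    return updated


# trimmed-down, hand-built event for a local dry run
test_event = {'Records': [
    {'s3': {
        'bucket': {'name': 'sourcebucket'},
        'object': {'key': 'myfolder/my+file.csv'},
    }},
]}
print(lambda_handler(test_event, None))
```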

<div align="center">**mini exercise: create an `s3`-triggered hello world `lambda` function**</div>

1. create a new `s3` bucket and subfolder
2. create a new `lambda` function triggered on any new file in that subfolder
3. use the (`python 3.6` runtime) code below as the function definition
4. test it out by creating a new file in `s3` and looking at the `cloudwatch` logs
    
```python
def lambda_handler(event, context):
    print("event = {}".format(event))
    s3info = event['Records'][0]['s3']
    # return a str version of the command we could execute
    cmd = 'dspipeline.update_file(bucket={}, key={})'.format(
        s3info['bucket']['name'], 
        s3info['object']['key']
    )
    print(cmd)
    return cmd
```

### api gateway

this service allows you to create your own RESTful api endpoint -- a url to which you may send `GET, POST, PUT,` or `DELETE` requests to perform some action. that action can very easily be a `lambda` function!

why might we want to do this?

as we will see later, it is possible (even easy) to invoke a serverless `lambda` function from code which is written using one of the `aws sdk`s (e.g. `boto3`), and in the `aws` environment. but what if we don't have access to these tools or environments for parts of our project?

RESTful apis are a language agnostic and highly standardized way of providing an integration point between the `lambda` function we wrote and external code. you can expose this `lambda` function as an api endpoint and anyone with knowledge and permissions can invoke it in any way they want

<div align="center">**walkthrough: creating an api gateway triggered `lambda` function**</div>

the walkthrough does the following

1. create a new `lambda` function
    1. no blueprint
    2. trigger type is `api gateway`
    3. configure trigger
        1. we can pick any name (e.g. `HelloWorldTest`)
        2. we leave all other defaults the same
    4. configure function
        1. use the `python 3.6` code below (does nothing but print our event)
    5. save
2. test new `lambda` function
    1. click the "actions" dropdown to select the "configure test event" option
    2. enter the `json` test message below
    3. "save and test"
    
```python
import json

def respond(err, res=None):
    return {
        'statusCode': '400' if err else '200',
        # python 3 exceptions have no .message attribute, so use str(err)
        'body': str(err) if err else json.dumps(res),
        'headers': {'Content-Type': 'application/json'},
    }


def lambda_handler(event, context):
    print("event = {}".format(event))

    reqtype = event['httpMethod']
    if reqtype == 'GET':
        return respond(
            err=None,
            res='value 1 is {}'.format(
                event['queryStringParameters'].get('key1')
            )
        )
    else:
        return respond(ValueError('Unsupported method "{}"'.format(reqtype)))
```

```json
// simple json test GET request
// change the method to "POST" for a different test case
{
  "queryStringParameters": {
    "foo": "bar",
    "key1": "val1"
  },
  "httpMethod": "GET"
}
```

one thing to note here: the `event` we get is pretty much a direct pass-through of the `http` request sent to the api gateway -- in this sense, we can define whatever structure we want, and impose it on the downstream consumer (so long as it fits within the [`http` message structure](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages)).

let's look at this from the point of view of the api gateway we just created.

#### `api gateway` object

<div align="center">**walkthrough: looking at `api gateway` event options**</div>

the walkthrough goes through the following pages

+ navigate to the `api gateway` service
+ select the "APIs > [YOUR API NAME] > Resources" page from the left menu
+ select the api endpoint you just created
+ click on the ANY method (it could have been GET, POST, etc)
    + test
        + allows us to perform a http request (a la `curl`)
        + can set arbitrary parameters and see what our response is
        + try `GET` with nothing
            + it fails with a cryptic message... maybe we should fix that... two steps later...
        + try `GET` with parameters `key1=val1&key2=val2`. SUCCESS!
        + try `POST`. FAILURE, but more descriptive failure. that's progress.
    + method request
        + defines the interface of the api endpoint
        + url query string parameters section
            + let's update it to *require* parameter key1, and suggest key2 and key3
            + let's update the docs for each of those (click the book on the same line)
            + required error message says we should create a validator -- let's do it!
        + request validator
            + click the edit pencil
            + select query string and header option
    + back to test
        + try `GET` with nothing now. cool!
    + integration request
        + this piece specifically configures the handoff to `lambda`
    + lambda function is what we know from before
    + integration response: NA
    + method response
        + allows us to provide specific documentation and structure for non-200 responses
+ deploy!
    + actions button dropdown > deploy api
    + basic configs
+ stage: prod
    + check out export tab > export as swagger
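the walkthrough fixed the empty-`GET` failure with a request validator. the same failure can also be guarded against inside the function itself. a minimal sketch, assuming that `queryStringParameters` may be missing or `None` when a request carries no query string (the error message wording is made up):

```python
import json


def lambda_handler(event, context):
    # assumption: queryStringParameters can be absent or None when the
    # request has no query string, so fall back to an empty dict
    params = event.get('queryStringParameters') or {}
    if 'key1' not in params:
        status, body = '400', {'error': 'missing required parameter "key1"'}
    else:
        status, body = '200', 'value 1 is {}'.format(params['key1'])
    return {
        'statusCode': status,
        'body': json.dumps(body),
        'headers': {'Content-Type': 'application/json'},
    }


# local dry run of the "GET with nothing" case
print(lambda_handler({'httpMethod': 'GET', 'queryStringParameters': None}, None))
```

the validator is still the nicer option when it is available, since it rejects bad requests before the function is ever invoked.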

let's put that api to work!

```python
import requests

url = 'https://qxmbtpq6lf.execute-api.us-east-1.amazonaws.com/prod/hello_world_api'

resp = requests.get(url=url, params={'key1': 'val1', 'key2': 'val2'})
resp.json()
```

## `cli` integration

### invoking a function

the `aws lambda` command has an `invoke` subcommand. let's read up on it via the `aws lambda invoke help` command

let's go back to our first "no trigger" hello world lambda function. I called mine `helloworld_blueprint`, so I should be able to invoke it as follows:

```bash
aws lambda invoke  \
    --function-name helloworld_blueprint  \
    --payload '{"key1": "val1", "key2": "val2", "key3": "val3"}'  \
    /tmp/out.txt
```

*note*: if your function raises an error, the command will still return a status code of 200, along with a `FunctionError` field. check `/tmp/out.txt` for the error details

`boto` also has tie-ins:

```python
import boto3
import json

session = boto3.session.Session()
l = session.client('lambda')
l.list_functions()

payload = {"key1": "val1", "key2": "val2", "key3": "val3"}
j = l.invoke(
    FunctionName='helloworld_blueprint',
    Payload=json.dumps(payload)
)
j['Payload'].read()
```

two important notes

1. `lambda` is a reserved keyword in `python`, so if you get `SyntaxError`s popping up in your script make sure you're not playing fast and loose with the names
2. there is not yet a `resource` for `lambda` (our preferred access approach), so we default to the lower-level `client`
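the status code note from the `cli` section applies to `boto3` as well: a function that raised an error still comes back with a `StatusCode` of 200, and the response `dict` carries a `FunctionError` key. a small sketch for unpacking the response follows; `read_invoke_response` is a made-up helper, and the response here is a hand-built stand-in so the example runs without `aws` access.

```python
import io
import json


def read_invoke_response(resp):
    """Return (ok, payload) from the dict returned by the client's invoke()."""
    payload = json.loads(resp['Payload'].read())
    # a function-level error still has StatusCode 200; the presence of
    # 'FunctionError' is what signals the failure
    ok = resp.get('StatusCode') == 200 and 'FunctionError' not in resp
    return ok, payload


# hand-built stand-in for a real response (Payload is normally a stream)
fake = {
    'StatusCode': 200,
    'FunctionError': 'Unhandled',
    'Payload': io.BytesIO(b'{"errorMessage": "boom"}'),
}
print(read_invoke_response(fake))
```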

### creating a function from local code files

it is also possible to create `lambda` functions using the `cli` and not the web interface. this may be more convenient in the moment (*e.g.* the function code is collected and working on your `ec2` server, or you want to create 100s of functions with slight parameterization differences).

in these instances it may be nice to be able to use the `aws lambda create-function` command. let's demo that now!

this is a little more involved, but gives us a chance to stretch our command-line legs.

<div align="center">**mini-exercise: preparing a `lambda` function deployment package**</div>

1. get the `alarm_clock.py` file
    1. if you don't already have it: `wget https://s3.amazonaws.com/shared.rzl.gu511.com/alarm_clock.py`
2. create a zip archive with that `alarm_clock.py` file in it
    1. install `zip`: `sudo apt install zip`
    2. `zip -r9 /tmp/alarm_clock.zip alarm_clock.py` (must be in the directory containing `alarm_clock.py`)
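if `zip` is not available (or you would rather stay in `python`), the standard library's `zipfile` module can build the same kind of archive. a minimal sketch follows; `build_package` is a made-up helper, and the demo writes a throwaway stand-in for `alarm_clock.py` into a temp directory so that it runs anywhere.

```python
import os
import tempfile
import zipfile


def build_package(source_path, zip_path):
    """Write source_path into a zip archive at the archive's top level."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the directories so the module sits at the zip root
        zf.write(source_path, arcname=os.path.basename(source_path))
    return zip_path


# demo with a throwaway stand-in for alarm_clock.py
workdir = tempfile.mkdtemp()
script = os.path.join(workdir, 'alarm_clock.py')
with open(script, 'w') as f:
    f.write('def lambda_handler(event, context):\n    return "ring"\n')

package = build_package(script, os.path.join(workdir, 'alarm_clock.zip'))
with zipfile.ZipFile(package) as zf:
    print(zf.namelist())
```

keeping the file at the root of the archive matters because the handler string (`alarm_clock.lambda_handler`) names the module directly.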

<div align="center">**mini-exercise: deploying a `lambda` function with the `cli`**</div>

1. look up the `iam role arn` you used for the first `helloworld` `lambda` function
    1. https://console.aws.amazon.com/iam/home?region=us-east-1#/roles
    2. *e.g.* `arn = arn:aws:iam::134461086921:role/service-role/helloworld_blueprint_role` 
2. execute the `aws cli` command below
3. check it out at [the `lambda` console](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions)

```bash
# note: copy-paste may break dashes or quotes :(
aws lambda create-function \
    --function-name helloworld_from_cli_test \
    --zip-file fileb:///tmp/alarm_clock.zip \
    --role "arn:aws:iam::134461086921:role/service-role/helloworld_blueprint_role" \
    --handler alarm_clock.lambda_handler \
    --runtime python3.6
```
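the same deployment can be done from `boto3` via the client's `create_function` method. the sketch below only builds the keyword arguments that mirror the `cli` flags above, so it runs without credentials; `create_function_kwargs` is a made-up helper, and the role `arn` and zip bytes are placeholders.

```python
def create_function_kwargs(zip_bytes, role_arn):
    """Build keyword arguments mirroring the cli flags above."""
    return {
        'FunctionName': 'helloworld_from_cli_test',
        'Runtime': 'python3.6',
        'Role': role_arn,
        'Handler': 'alarm_clock.lambda_handler',
        'Code': {'ZipFile': zip_bytes},
    }


# placeholder values, for illustration only
kwargs = create_function_kwargs(
    b'placeholder zip bytes',
    'arn:aws:iam::123456789012:role/example',
)
print(sorted(kwargs))

# with credentials configured, the real call would look something like:
# import boto3
# with open('/tmp/alarm_clock.zip', 'rb') as f:
#     zip_bytes = f.read()
# boto3.client('lambda').create_function(
#     **create_function_kwargs(zip_bytes, role_arn)
# )
```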

## free tier limits

we've done a lot today, so now is perhaps not the *best* time to figure out whether or not we've blown through a stack of cash. as you might guess, what we've done so far is totally free (or I really messed up!)

how much can we get away with for free? let's check out the pricing info:

https://aws.amazon.com/lambda/pricing/
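`lambda` bills on two things: the number of requests and the compute used, measured in GB-seconds (memory allocated times run time). here is a back-of-the-envelope sketch for our alarm clock. the free tier figures are assumptions taken from the pricing page at the time of writing, and the run time and memory size are made up, so check the link above before relying on any of this.

```python
# assumptions: monthly free tier figures from the pricing page at the time
# of writing; verify them against the pricing page before relying on this
FREE_REQUESTS = 1000000
FREE_GB_SECONDS = 400000


def monthly_usage(invocations, seconds_each, memory_mb):
    """Return (requests, gb_seconds) consumed in a month."""
    gb_seconds = invocations * seconds_each * (memory_mb / 1024)
    return invocations, gb_seconds


# hypothetical alarm clock: once a day, 5 seconds per run, 128 MB of memory
requests_used, gb_seconds_used = monthly_usage(30, 5, 128)
print(requests_used, gb_seconds_used)
print(requests_used <= FREE_REQUESTS and gb_seconds_used <= FREE_GB_SECONDS)
```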

***<div align="center">not too baaaaaaaad</div>***
<div align="center"><img src="https://regmedia.co.uk/2014/11/13/lamb.jpg?x=1200&y=794" width="500px"></div>

# END OF LECTURE

next lecture: [AWS `databases`](012_dbs_1.ipynb)

things created in this lecture that you may want to clean up afterward:

1. `lambda` functions
    1. all are named `helloworld_{event type}_test`, and you can delete them all
2. `iam role` items
    1. all are named `helloworld_{event type}_test_role` and you can delete them all
3. events
    1. `s3` bucket, subdirectory, and attached event
    2. `Cloudwatch` scheduled event rule
    3. `api gateway` endpoint url