# SI 330: Homework 4: APIs on AWS


## Due: Friday, February 9, 2018,  11:59:00pm

### Submission instructions
After completing this homework, you will turn in two files via Canvas ->  Assignments -> HW 4:
Your Notebook, named si330-hw4-YOUR_UNIQUE_NAME.ipynb and
the HTML file, named si330-hw4-YOUR_UNIQUE_NAME.html.

### Name:  Catie Olson
### Uniqname: catieo
### People you worked with: I worked by myself. 

## Top-Level Goal
To create a microservice that returns the counts of all bigrams in a text passage.



## Learning Objectives
After completing this Lab, you should know how to:
* create an AWS Lambda function that takes a string and returns the counts of all bigrams in that text
* write an AWS API Gateway integration which allows both GET and POST requests to access an AWS Lambda
* write documentation for the microservice that you've created

### Note: See end of notebook for notes about going "Above and Beyond"

### Outline of Steps For Analysis
Here's an overview of the steps that you'll need to do to complete this lab.
1. Upload data to an S3 bucket
2. Create an AWS Lambda function that normalizes, tokenizes, and creates and counts bigrams from text, both via a POST request with the text and via a GET request to a URL that returns the text (e.g. an S3 bucket)
3. Create a python code block in this notebook to demonstrate the functionality of your microservice

Each of these steps is detailed below.

## Step 1: Upload data to an S3 bucket
To get ready to test the GET functionality of the code you generate in the next step, you should upload a text file that is **500 or fewer lines** to an S3 bucket.  See the description of CORS for an explanation of why we want to put the data in the same domain (amazonaws.com) as the Lambda.

Follow the same approach that we used in the lab to upload a small text file to your S3 bucket, ensuring that the permissions are set to allow public access.

### <font color="magenta">Q1: Enter the URL of your text file

https://s3.amazonaws.com/si330-hw4-catieo/dickens-totc.txt

## Step 2: Create an AWS Lambda function that normalizes, tokenizes, and creates and counts bigrams from text

Similar to what we did in the lab, you're going to create a microservice that consists of two parts: an AWS Lambda and an API Gateway.  You can use exactly the same technique that we did in the lab to get started.

You will need to modify the code in the Lambda to handle two types of requests:
1. A GET request with a queryStringParameter of url=http://some.url.goes.here/text.txt, which specifies the location of the text to be processed and
2. A POST request with the text to be processed included as the "text" value in the body payload.
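As a rough illustration of the two request types listed above, the sketch below pulls the input out of a fake event. It assumes the event carries the `httpMethod`, `queryStringParameters`, and `body` keys that the starter code in this section reads; the helper name `extract_source` and both sample events are hypothetical and exist only for this example.

```python
import json

def extract_source(event):
    """Return ('url', value) for a GET, ('text', value) for a POST, else (None, '')."""
    method = event.get('httpMethod')
    if method == 'GET':
        # Assumption: the query-string entry may be None when no parameters are sent
        params = event.get('queryStringParameters') or {}
        if 'url' in params:
            return 'url', params['url']
    elif method == 'POST':
        # The POST body arrives as a JSON string, so it has to be parsed first
        body = json.loads(event.get('body') or '{}')
        if 'text' in body:
            return 'text', body['text']
    return None, ''

# Hypothetical events shaped like the ones the starter code expects
get_event = {
    'httpMethod': 'GET',
    'queryStringParameters': {'url': 'http://some.url.goes.here/text.txt'},
    'body': None,
}
post_event = {
    'httpMethod': 'POST',
    'queryStringParameters': None,
    'body': json.dumps({'text': 'It was the best of times'}),
}

print(extract_source(get_event))
print(extract_source(post_event))
```

Calling a helper like this with hand-built events is one way to check the branching logic on your own machine before pasting code into the Lambda editor.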

### The following code block is a reasonable starting point for creating your Lambda.  Note that this code should not be run in this notebook but rather serve as the starting point for your work in the Lambda editor.

**NOTE** Please see https://stackoverflow.com/questions/21844546/forming-bigrams-of-words-in-list-of-sentences-with-python for hints about how to create bigrams without NLTK.

In [None]:
"""
PUT SOME DOCUMENTATION HERE
"""
import json
import re
from botocore.vendored import requests  # This line has been added.
# You'll need to figure out how to use this vendored copy of requests,
# but it works the same way as the standalone requests module (normally loaded with `import requests`).

def lambda_handler(event, context):
    method = event['httpMethod']
    text = ""
    d = {"text": ""}
    # Handle GET method
    if method == 'GET':
        params = event['queryStringParameters']
        if params:
            url = ... # retrieve the text from the URL
    if method == 'POST':
        body = json.loads(event['body'])
        if 'text' in body:
            pass  # do something with body['text'] here
    # normalize
    # tokenize
    # find bigrams
    # NOTE: see https://stackoverflow.com/questions/21844546/forming-bigrams-of-words-in-list-of-sentences-with-python
    #       for hints about how to create bigrams
    # count bigrams
    
    # Note the strict format of the return dictionary
    # It must contain these three elements, and the body
    # must be a stringified JSON object (i.e. you have to call 
    # json.dumps on the JSON structure you're returning)
    return { 
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(d),
    }
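The bigram hint in the starter code can be sketched without NLTK roughly as follows. This is only one possible approach, not the required solution: the function name `sentence_bigrams` is hypothetical, and the choice to strip punctuation with `re.findall(r"\w+", ...)` is an assumption made for this example.

```python
import re

def sentence_bigrams(text):
    """Lowercase the text, split it into sentences, and pair adjacent words per sentence."""
    # Treat periods and newlines as sentence boundaries (a simplifying assumption)
    sentences = re.split(r'[.\n]+', text.lower())
    bigrams = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)  # keeps word characters, drops punctuation
        # zip(words, words[1:]) pairs each word with its successor
        bigrams.extend(zip(words, words[1:]))
    return bigrams

sample = "It was the best of times,\nit was the worst of times."
for pair in sentence_bigrams(sample):
    print(pair)
```

Pairing words inside each sentence, rather than across the whole text, avoids bigrams that straddle a sentence boundary such as `('times', 'it')`.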

### <font color="magenta">Q2a: Enter the URL of your Lambda

https://6orkit6m27.execute-api.us-east-2.amazonaws.com/hw4v1/bigramCounter

### <font color="magenta">Q2b: Copy your final Lambda code into the following code block (but do not run it here)

In [1]:
"""
A function that normalizes, tokenizes, and creates and counts bigrams from text
"""
import json
import re
from botocore.vendored import requests

def lambda_handler(event, context):
    method = event['httpMethod']
    text = ""
    func = "text"
    d = {}
    # Handle GET method
    if method == 'GET':
        params = event['queryStringParameters']
        if params:
            s = requests.Session()
            url = params['url']
            if 'func' in params:
                func = params['func']
            response = s.get(url)
            text = response.text
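    if method == 'POST':
        body = json.loads(event['body'])
        if 'text' in body:
            text = body['text']
        # For POST, func arrives in the JSON body; `params` is only defined in the GET branch
        if 'func' in body:
            func = body['func']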
    if func == "text":
        d["text"] = text
    
    # normalize: convert the text to lowercase
    text = text.lower()
    if func == "normalize":
        d["normalize"] = text
    
    # tokenize: split the text into sentences, then split each sentence into words
    # NOTE: it's probably best to use re.split()
    sentence_list = re.split(r'\.\s*|\n\s*', text)  # raw string avoids invalid-escape warnings
    if func == "tokenize_sentences":
        d['tokenize_sentences'] = sentence_list
    
    word_list = re.split(r'\W+', text)
    if func == "tokenize_words":
        d['tokenize_words'] = word_list
    
    # find bigrams
    # NOTE: it's very difficult to set up NLTK on Lambda, so you'll need to find bigrams "manually"
    # NOTE: see https://stackoverflow.com/questions/21844546/forming-bigrams-of-words-in-list-of-sentences-with-python
    #       for hints about how to create bigrams
    bigrams = [b for l in sentence_list for b in zip(l.split(" ")[:-1], l.split(" ")[1:])]
    if func == "bigrams":
        d['bigrams'] = bigrams
    
    # count bigrams
    bigram_counts = {}
    for b in bigrams:
        key = str(b)
        if key not in bigram_counts:
            bigram_counts[key] = 1
        else:
            bigram_counts[key] += 1
    if func == "bigram_counts":
        d['bigram_counts'] = bigram_counts
    
    # Note the strict format of the return dictionary
    # It must contain these three elements, and the body
    # must be a stringified JSON object (i.e. you have to call 
    # json.dumps on the JSON structure you're returning)
    return { 
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(d),
    }
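The manual counting loop in the Lambda could also be written with `collections.Counter`. The sketch below is an alternative, not the code that was deployed: the name `count_bigrams` and the choice to join each pair into a `'word1 word2'` key are assumptions for illustration. String keys matter because `json.dumps` raises a `TypeError` on tuple keys.

```python
import json
from collections import Counter

def count_bigrams(bigrams):
    """Count bigrams, using 'word1 word2' string keys so the result survives json.dumps."""
    counts = Counter(' '.join(pair) for pair in bigrams)
    return dict(counts)

pairs = [('it', 'was'), ('was', 'the'), ('the', 'best'), ('it', 'was'), ('was', 'the')]
counts = count_bigrams(pairs)

# The dictionary can go straight into the "body" of the Lambda response
print(json.dumps({'bigram_counts': counts}))
```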

## Step 3: Demonstrate the GET and POST functionality of your Lambda

### <font color="magenta">Q3: Create a code block that uses `requests` to demonstrate the functionality of your Lambda.  You can modify the template below or create your own.

In [1]:
import requests
import json

lambdaURL = 'https://6orkit6m27.execute-api.us-east-2.amazonaws.com/hw4v1/bigramCounter' 
textURL = 'https://s3.amazonaws.com/si330-hw4-catieo/dickens-totc.txt' 

# Demonstrate the GET functionality
funcParam = "bigrams"
response = requests.get(lambdaURL + '?url=' + textURL + '&func=' + funcParam)
bigrams = json.loads(response.text)
print(bigrams) # you should make this print something nicer

{'bigrams': [['it', 'was'], ['was', 'the'], ['the', 'best'], ['best', 'of'], ['of', 'times,'], ['it', 'was'], ['was', 'the'], ['the', 'worst'], ['worst', 'of'], ['of', 'times,'], ['it', 'was'], ['was', 'the'], ['the', 'age'], ['age', 'of'], ['of', 'wisdom,'], ['it', 'was'], ['was', 'the'], ['the', 'age'], ['age', 'of'], ['of', 'foolishness,'], ['it', 'was'], ['was', 'the'], ['the', 'epoch'], ['epoch', 'of'], ['of', 'belief,'], ['it', 'was'], ['was', 'the'], ['the', 'epoch'], ['epoch', 'of'], ['of', 'incredulity,'], ['it', 'was'], ['was', 'the'], ['the', 'season'], ['season', 'of'], ['of', 'light,'], ['it', 'was'], ['was', 'the'], ['the', 'season'], ['season', 'of'], ['of', 'darkness,'], ['it', 'was'], ['was', 'the'], ['the', 'spring'], ['spring', 'of'], ['of', 'hope,'], ['it', 'was'], ['was', 'the'], ['the', 'winter'], ['winter', 'of'], ['of', 'despair,'], ['we', 'had'], ['had', 'everything'], ['everything', 'before'], ['before', 'us,'], ['we', 'had'], ['had', 'nothing'], ['nothing', '

In [3]:
s3text = requests.get(textURL) # get the text from the bucket
d = {"text": s3text.text, "func" : "bigram_counts"}
response = requests.post(lambdaURL, json = d)
bigrams = json.loads(response.text)
print(bigrams) # you should make this print something nicer

{'bigram_counts': {"('it', 'was')": 10, "('was', 'the')": 10, "('the', 'best')": 1, "('best', 'of')": 1, "('of', 'times,')": 2, "('the', 'worst')": 1, "('worst', 'of')": 1, "('the', 'age')": 2, "('age', 'of')": 2, "('of', 'wisdom,')": 1, "('of', 'foolishness,')": 1, "('the', 'epoch')": 2, "('epoch', 'of')": 2, "('of', 'belief,')": 1, "('of', 'incredulity,')": 1, "('the', 'season')": 2, "('season', 'of')": 2, "('of', 'light,')": 1, "('of', 'darkness,')": 1, "('the', 'spring')": 1, "('spring', 'of')": 1, "('of', 'hope,')": 1, "('the', 'winter')": 1, "('winter', 'of')": 1, "('of', 'despair,')": 1, "('we', 'had')": 2, "('had', 'everything')": 1, "('everything', 'before')": 1, "('before', 'us,')": 2, "('had', 'nothing')": 1, "('nothing', 'before')": 1, "('we', 'were')": 2, "('were', 'all')": 2, "('all', 'going')": 2, "('going', 'direct')": 2, "('direct', 'to')": 1, "('to', 'heaven,')": 1, "('direct', 'the')": 1, "('the', 'other')": 1, "('other', 'way--')": 1, "('in', 'short,')": 1, "('short
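One way to print the counts more readably is to sort them and show only the most frequent bigrams. The sketch below assumes the response keys have the stringified-tuple form shown in the output above (for example `"('it', 'was')"`); the helper name `top_bigrams` is hypothetical, and the sample dictionary copies a few entries from that output.

```python
import ast

def top_bigrams(bigram_counts, n=5):
    """Return the n most frequent bigrams as ((word1, word2), count) pairs."""
    # ast.literal_eval safely turns the key "('it', 'was')" back into a tuple
    parsed = [(ast.literal_eval(key), count) for key, count in bigram_counts.items()]
    # sorted() is stable, so ties keep their original order
    return sorted(parsed, key=lambda item: item[1], reverse=True)[:n]

sample = {
    "('it', 'was')": 10,
    "('was', 'the')": 10,
    "('the', 'best')": 1,
    "('of', 'times,')": 2,
}

for (first, second), count in top_bigrams(sample, n=3):
    print(f"{count:>3}  {first} {second}")
```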

## Save your notebook, download it as HTML and submit both the .ipynb and .html files to Canvas

## Notes about going "Above and Beyond"

There are ample opportunities for extending this homework assignment.  You might, for example, decide to break the microservice into three separate ones (normalizing, tokenizing, and creating bigrams).  Alternatively, you might invest time into getting NLTK data into Lambda so you can use its functionality (see https://stackoverflow.com/questions/42394335/paths-in-aws-lambda-with-python-nltk).  Another interesting investigation might be to use the addition of a data file to an S3 bucket as a trigger to run the bigram analysis, perhaps writing the results to another (public) bucket.
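For the S3-trigger idea, a first step would be reading the bucket name and object key out of the notification event. The sketch below assumes the usual S3 event layout (`Records` → `s3` → `bucket`/`object`) and that object keys arrive URL-encoded; the function name and the fake event are hypothetical, and the actual download and bigram analysis are left out.

```python
from urllib.parse import unquote_plus

def s3_objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 notification event."""
    for record in event.get('Records', []):
        s3 = record['s3']
        bucket = s3['bucket']['name']
        # Assumption: keys are URL-encoded in the event, with spaces sent as '+'
        key = unquote_plus(s3['object']['key'])
        yield bucket, key

# Hypothetical event with made-up bucket and key names
fake_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'example-input-bucket'},
                'object': {'key': 'texts/dickens+totc.txt'}}}
    ]
}

print(list(s3_objects_from_event(fake_event)))
```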

**IF YOU CHOOSE TO GO ABOVE AND BEYOND, YOU _MUST_ CHANGE THE FOLLOWING MARKDOWN BLOCK**

## Above and Beyond

Indicate here why you believe that your work should be considered "above and beyond".

My Lambda function accepts a second parameter, "func", which takes one of the following values: "text", "normalize", "tokenize_sentences", "tokenize_words", "bigrams", or "bigram_counts". The function returns the text processed up to the stage named by func. For a GET request, func is passed in the query string and arrives in event['queryStringParameters']; for a POST request, it is passed in the JSON body alongside "text", as in the POST example above. Two examples are shown above, with bigrams and bigram_counts. If no func parameter is specified, the text is returned unmodified. The remaining func values are demonstrated below with GET requests, but they also work with POST requests.
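A minimal sketch of building the GET URL for each func value follows. It uses the standard library's `urlencode` instead of string concatenation so that the nested text URL is percent-encoded; the helper name `build_get_url` and the `FUNC_VALUES` list are hypothetical, while the two URLs are the ones given in Q1 and Q2a.

```python
from urllib.parse import urlencode

LAMBDA_URL = 'https://6orkit6m27.execute-api.us-east-2.amazonaws.com/hw4v1/bigramCounter'
TEXT_URL = 'https://s3.amazonaws.com/si330-hw4-catieo/dickens-totc.txt'
FUNC_VALUES = ['text', 'normalize', 'tokenize_sentences',
               'tokenize_words', 'bigrams', 'bigram_counts']

def build_get_url(func):
    """Return the Lambda URL with percent-encoded url and func query parameters."""
    if func not in FUNC_VALUES:
        raise ValueError('unknown func value: ' + func)
    return LAMBDA_URL + '?' + urlencode({'url': TEXT_URL, 'func': func})

for func in FUNC_VALUES:
    print(build_get_url(func))
```

Each printed URL could then be passed to `requests.get` in the same way as the cells in this notebook.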

In [4]:
funcParam = "text"
response = requests.get(lambdaURL + '?url=' + textURL + '&func=' + funcParam)
bigrams = json.loads(response.text)
print(bigrams)

{'text': 'It was the best of times,\nit was the worst of times,\nit was the age of wisdom,\nit was the age of foolishness,\nit was the epoch of belief,\nit was the epoch of incredulity,\nit was the season of Light,\nit was the season of Darkness,\nit was the spring of hope,\nit was the winter of despair,\nwe had everything before us,\nwe had nothing before us,\nwe were all going direct to Heaven,\nwe were all going direct the other way--\nin short, the period was so far like the present period, that some of\nits noisiest authorities insisted on its being received, for good or for\nevil, in the superlative degree of comparison only.'}


In [5]:
funcParam = "normalize"
response = requests.get(lambdaURL + '?url=' + textURL + '&func=' + funcParam)
bigrams = json.loads(response.text)
print(bigrams)

{'normalize': 'it was the best of times,\nit was the worst of times,\nit was the age of wisdom,\nit was the age of foolishness,\nit was the epoch of belief,\nit was the epoch of incredulity,\nit was the season of light,\nit was the season of darkness,\nit was the spring of hope,\nit was the winter of despair,\nwe had everything before us,\nwe had nothing before us,\nwe were all going direct to heaven,\nwe were all going direct the other way--\nin short, the period was so far like the present period, that some of\nits noisiest authorities insisted on its being received, for good or for\nevil, in the superlative degree of comparison only.'}


In [6]:
funcParam = "tokenize_sentences"
response = requests.get(lambdaURL + '?url=' + textURL + '&func=' + funcParam)
bigrams = json.loads(response.text)
print(bigrams)

{'tokenize_sentences': ['it was the best of times,', 'it was the worst of times,', 'it was the age of wisdom,', 'it was the age of foolishness,', 'it was the epoch of belief,', 'it was the epoch of incredulity,', 'it was the season of light,', 'it was the season of darkness,', 'it was the spring of hope,', 'it was the winter of despair,', 'we had everything before us,', 'we had nothing before us,', 'we were all going direct to heaven,', 'we were all going direct the other way--', 'in short, the period was so far like the present period, that some of', 'its noisiest authorities insisted on its being received, for good or for', 'evil, in the superlative degree of comparison only', '']}


In [7]:
funcParam = "tokenize_words"
response = requests.get(lambdaURL + '?url=' + textURL + '&func=' + funcParam)
bigrams = json.loads(response.text)
print(bigrams)

{'tokenize_words': ['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of', 'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness', 'it', 'was', 'the', 'epoch', 'of', 'belief', 'it', 'was', 'the', 'epoch', 'of', 'incredulity', 'it', 'was', 'the', 'season', 'of', 'light', 'it', 'was', 'the', 'season', 'of', 'darkness', 'it', 'was', 'the', 'spring', 'of', 'hope', 'it', 'was', 'the', 'winter', 'of', 'despair', 'we', 'had', 'everything', 'before', 'us', 'we', 'had', 'nothing', 'before', 'us', 'we', 'were', 'all', 'going', 'direct', 'to', 'heaven', 'we', 'were', 'all', 'going', 'direct', 'the', 'other', 'way', 'in', 'short', 'the', 'period', 'was', 'so', 'far', 'like', 'the', 'present', 'period', 'that', 'some', 'of', 'its', 'noisiest', 'authorities', 'insisted', 'on', 'its', 'being', 'received', 'for', 'good', 'or', 'for', 'evil', 'in', 'the', 'superlative', 'degree', 'of', 'comparison', 'only', '']}
