# Getting Started with Natural Language Processing

This tutorial is aimed at machine learning beginners who want to get started quickly with Amazon's Natural Language Processing APIs. We will explore how to send data to and receive data from the Amazon Comprehend API, both through the UI and programmatically in Python.

### What is Machine Learning

Arthur Samuel in 1959 defined machine learning as the "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning models are trained on data sets, which allows an algorithm to predict a result when it encounters data it has not seen before.

### What is Natural Language Processing

Natural Language Processing is the application of machine learning to human text and speech. This allows the computer to recognize the appropriate words, phrases, responses and other details from text.

#### Amazon Comprehend

Amazon Comprehend is a natural language processing service that applies NLP analysis to textual data quickly and easily. Using the Comprehend APIs, you can uncover relationships and insights in your data without having to train your own NLP models.

# **Lab:** Getting Started Using the Amazon Comprehend UI

Getting started with Amazon Comprehend is easy! This lab walks through a first analysis using the console UI that Amazon provides.

1) Navigate to [Amazon Comprehend Console](https://us-west-2.console.aws.amazon.com/comprehend/v2/home?region=us-west-2#welcome). 

2) Select `Launch Amazon Comprehend` to launch the real-time analysis tool.

3) Select `Analyze` to see the results for the sample text.

![Real time analysis](./assets/realtime-analysis.png)

4) Selecting the `Analyze` button runs the sample text through the Amazon Comprehend Natural Language Processing API. AWS does the heavy lifting for us by classifying the text against a machine learning model. The `Entities` tab of the results window shows the output of the Detect Entities operation for the input text. You can see how different words and phrases are classified, along with the model's confidence that each classification is correct. For a list of the classifications used by the Detect Entities API, [see the Amazon documentation](https://docs.aws.amazon.com/comprehend/latest/dg/how-entities.html).

- Select the `Key Phrases` tab to view the noun phrases detected in the input text.
- Select the `Languages` tab to view the detected language.
- Select the `Sentiment` tab to view whether the text was determined to have a positive, negative, neutral, or mixed tone.
- Select the `Syntax` tab to view the part of speech assigned to each word, such as verb, noun, adjective, adposition, adverb, auxiliary, or coordinating conjunction.

![Entity result](./assets/entity-result.png)


# **Lab:** Integrating with Amazon Comprehend using Python
Amazon makes it easy to use its services from Python by publishing an SDK called `boto3`. It is imported like any other Python library, and with it we can construct what is called a *client*. If you've never used a service SDK like this before, the concept of a "client" will come up a lot. In short, a client is an object that we configure with some basic information, such as the AWS Region and the service we want. The object then exposes easy-to-use methods for each operation we'd like to perform.

Below is an example of detecting entities on the string *It is raining today in Seattle*. A breakdown of the code can be found below the output.

```
import boto3
import json

comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
text = "It is raining today in Seattle"

entities = comprehend.detect_entities(Text=text, LanguageCode='en')
outputStr = json.dumps(entities, sort_keys=True, indent=4)
print(outputStr)
```

```
{
    "Entities": [
        {
            "BeginOffset": 14,
            "EndOffset": 19,
            "Score": 0.9999421834945679,
            "Text": "today",
            "Type": "DATE"
        },
        {
            "BeginOffset": 23,
            "EndOffset": 30,
            "Score": 0.999826967716217,
            "Text": "Seattle",
            "Type": "LOCATION"
        }
    ],
    "ResponseMetadata": {
        "HTTPHeaders": {
            "content-length": "199",
            "content-type": "application/x-amz-json-1.1",
            "date": "Thu, 21 May 2020 01:36:14 GMT",
            "x-amzn-requestid": "86200961-7820-41d9-bbbc-db6c13117f3a"
        },
        "HTTPStatusCode": 200,
        "RequestId": "86200961-7820-41d9-bbbc-db6c13117f3a",
        "RetryAttempts": 0
    }
}
```


We can see in the `Entities` section of the output that the API has marked Seattle as a LOCATION and "today" as a DATE - not bad! Let's take a look at how our code snippet works.

```
import boto3
import json
```
The first two lines of the code just import the libraries we need - `boto3` is Amazon's client library, and `json` is a utility we use to format the response we receive from the service.

```
comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
```  
The call `boto3.client` is used to build a client for many of Amazon's services - this is why we must say we'd like a Comprehend client by passing a `service_name` argument. The `region_name` argument is set to the AWS Region your project is in - you can see this string by clicking the region dropdown at the top of the AWS console.

![Region String](./assets/region_string.png)
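
The region string does not have to be hard-coded. As a small sketch, the helper below (the name `resolve_region` is hypothetical, not part of `boto3`) reads the standard `AWS_DEFAULT_REGION` environment variable and falls back to the region used in this lab when the variable is unset.

```python
import os


def resolve_region(default="us-west-2"):
    """Return the region from AWS_DEFAULT_REGION, or `default` if it is unset or empty."""
    return os.environ.get("AWS_DEFAULT_REGION") or default


region = resolve_region()
print(region)

# The result can then be passed to the client constructor, for example:
# comprehend = boto3.client(service_name='comprehend', region_name=region)
```

`boto3` also consults `AWS_DEFAULT_REGION` on its own when `region_name` is omitted, so a helper like this is mainly useful when you want an explicit fallback.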

```
text = "It is raining today in Seattle"
```  
Here we are setting a variable equal to the string that we'd like Comprehend to analyze.

```
entities = comprehend.detect_entities(Text=text, LanguageCode='en')
```  
The star of the show - `comprehend.detect_entities`. This method takes a `Text` argument, which we set to the `text` variable defined on the previous line. The call also requires a language code, so we pass "en" to indicate that we're processing English text.
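
If the language of the input is not known ahead of time, one option is to ask Comprehend first. The sketch below assumes a Comprehend client like the one built earlier and uses its `detect_dominant_language` operation. The helper name `pick_language_code` and its `fallback` parameter are illustrative and not part of `boto3`.

```python
def pick_language_code(client, text, fallback="en"):
    """Return the language code that Comprehend scores highest for `text`.

    `client` is assumed to be a Comprehend client, such as the `comprehend`
    object created earlier. `fallback` is a hypothetical default that is
    returned when the response lists no languages.
    """
    response = client.detect_dominant_language(Text=text)
    languages = response.get("Languages", [])
    if not languages:
        return fallback
    # Each entry carries a LanguageCode and a confidence Score.
    best = max(languages, key=lambda language: language["Score"])
    return best["LanguageCode"]


# Possible usage with the client from this lab:
# language = pick_language_code(comprehend, text)
# entities = comprehend.detect_entities(Text=text, LanguageCode=language)
```

Not every detected language is necessarily supported by every Comprehend operation, so check the documentation before passing the result straight through.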

```
outputStr = json.dumps(entities, sort_keys=True, indent=4)
print(outputStr)
```  
The final two lines format the response so that we can print it.
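
Because the response is an ordinary Python dictionary, the interesting part can also be pulled out directly instead of printing everything. The following is a minimal sketch that works on the entity portion of the response shown above. The helper name `summarize_entities` and the `min_score` threshold are illustrative choices, not part of `boto3`.

```python
def summarize_entities(response, min_score=0.0):
    """Return (text, type, score) tuples for entities scoring at least `min_score`."""
    return [
        (entity["Text"], entity["Type"], entity["Score"])
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_score
    ]


# The "Entities" portion of the response printed earlier in this lab.
sample_response = {
    "Entities": [
        {"BeginOffset": 14, "EndOffset": 19, "Score": 0.9999421834945679,
         "Text": "today", "Type": "DATE"},
        {"BeginOffset": 23, "EndOffset": 30, "Score": 0.999826967716217,
         "Text": "Seattle", "Type": "LOCATION"},
    ]
}
text = "It is raining today in Seattle"

for entity_text, entity_type, score in summarize_entities(sample_response):
    print(f"{entity_text}: {entity_type} ({score:.4f})")

# For this string, BeginOffset and EndOffset slice the entity out of the input.
first = sample_response["Entities"][0]
print(text[first["BeginOffset"]:first["EndOffset"]])  # prints "today"
```

Raising `min_score` is one simple way to keep only the entities the model is most confident about.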

### Your Turn!
Now it's your turn to write some code for Amazon's Comprehend API. You saw in the web-based exploration tool that Comprehend can analyze the "sentiment" of text. Your task is to analyze the sentiment of the `text` string and store the result in a variable called `sentimentData`. [API Documentation for Comprehend can be found here](https://docs.aws.amazon.com/comprehend/latest/dg/get-started-api.html) - make sure you study the Python sections!

```
import boto3
import json

# Set up the client library
comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')

# Specify the string we're going to send to the NLP process
text = "It's a shame that it's raining in Seattle today."

## Your code goes here - make sure the result of your
## sentiment analysis call gets stored in a variable
## called sentimentData

# Print the results
outputStr = json.dumps(sentimentData, sort_keys=True, indent=4)
print(outputStr)
```

Until you add your own code, running this snippet as-is fails with `NameError: name 'sentimentData' is not defined`.
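
Once `sentimentData` holds a response, you may want to read the overall label and its confidence rather than print the whole dictionary. The sketch below assumes the response contains a `Sentiment` label and a `SentimentScore` dictionary keyed by `Positive`, `Negative`, `Neutral`, and `Mixed`. The helper name `describe_sentiment` is hypothetical, and the numbers in `example` are made up for illustration and are not real Comprehend output.

```python
def describe_sentiment(sentiment_data):
    """Return the overall sentiment label and the score for that label."""
    label = sentiment_data["Sentiment"]        # for example "NEGATIVE"
    scores = sentiment_data["SentimentScore"]  # keys: Positive, Negative, Neutral, Mixed
    # Score keys are capitalized ("Negative"), while the label is upper case.
    return label, scores[label.capitalize()]


# Hypothetical values for illustration only, not actual service output.
example = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.01,
        "Negative": 0.90,
        "Neutral": 0.08,
        "Mixed": 0.01,
    },
}

label, confidence = describe_sentiment(example)
print(f"{label} ({confidence:.2f})")
```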