# AWS ML APIs
In this notebook, we walk you through setting up and using an AWS machine learning API via SageMaker. The material in this notebook largely follows the [AWS Developer Guide](https://docs.aws.amazon.com/machine-learning/?id=docs_gateway) for each service while providing details that are specific to using them in an AWS Educate account.

# What is machine learning?
In QTM 220, you learned how to use linear models and their variants. The focus in that class was on the statistics of the models and their estimation. Another aspect of these models is their use for making predictions (also called inferences) on new data. This is typically the focus of machine learning in practice, and linear models are among the simplest and oldest machine learning models in use. In recent years, models that are more complex than linear models have come to the forefront of machine learning practice, as they can provide useful inferences on problems that linear models handle poorly.

The text in this subsection is drawn from the [Amazon Machine Learning Developer Guide](https://docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-problems-in-amazon-machine-learning.html). Caution: some of the services referenced in that guide are no longer available (e.g. Amazon ML). Hence, this section presents a summary that is tailored to the background of QTM 350 students.

## Examples of contemporary business problems that ML is commonly used for

Examples of binary classification problems:

* Is this email spam or not spam? (e.g. Gmail spam filter)

* Is this tweet written by a person or a robot? (e.g. Twitter [bot or not](https://blog.twitter.com/en_us/topics/company/2020/bot-or-not.html))

Examples of multiclass classification problems:

* Will this user want to watch a romantic comedy, documentary, or thriller? (e.g. Netflix [recommendation algorithms](https://research.netflix.com/research-area/recommendations))

* Which category of products is most interesting to this customer? (e.g. every online shopping site)

Examples of regression problems:

* How many days before this customer stops using the application? 

* What price will this house sell for? (e.g. Zillow [Zestimate](https://www.zillow.com/how-much-is-my-home-worth/))

## When to use machine learning?
It is important to remember that ML is not a solution for every type of problem. For example, you don’t need ML if you can determine a target value by using simple rules, computations, or predetermined steps that can be programmed without needing any data-driven learning.

### Use machine learning for the following situations:

#### You cannot code the rules
Many human tasks (such as recognizing whether an email is spam or not spam) cannot be adequately solved using a simple (deterministic), rule-based solution. A large number of factors could influence the answer. When rules depend on too many factors and many of these rules overlap or need to be tuned very finely, it soon becomes difficult for a human to accurately code the rules. You can use ML to effectively solve this problem. 
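
To make this concrete, here is a minimal sketch of a hand-written, rule-based spam check. The keyword list and the `is_spam_by_rules` function are invented for illustration and do not come from any real spam filter. The point is only to show how quickly simple rules misfire.

```python
# Hypothetical keyword rules, invented for this illustration.
SPAM_KEYWORDS = {"winner", "free", "prize"}


def is_spam_by_rules(text):
    """Flag an email as spam if it contains any hard-coded keyword."""
    words = {word.strip(".,!?:;").lower() for word in text.split()}
    return bool(words & SPAM_KEYWORDS)


print(is_spam_by_rules("You are a WINNER! Claim your prize now"))  # True
print(is_spam_by_rules("Feel free to stop by my office hours"))    # True (a false positive)
print(is_spam_by_rules("Cl4im y0ur pr1ze t0day"))                  # False (a missed spam)
```

Patching each failure means adding more rules and exceptions, which is exactly the situation where learning the rules from labeled examples becomes attractive.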

#### You cannot scale
You might be able to manually recognize a few hundred emails and decide whether they are spam or not. However, this task becomes tedious for millions of emails. ML solutions are effective at handling large-scale problems.

### AI vs. ML
Simple rules or algorithms are often called AI, but they are not ML. See the section titled "Process automation" in this [Harvard Business Review article](https://hbr.org/2018/01/artificial-intelligence-for-the-real-world). There, one of the tasks listed as being solved by AI is "transferring data from e-mail and call center systems into systems of record—for example, updating customer files with address changes or service additions". This requires programmatic control of data (for example, by architecting a solution in the cloud to solve this task), not machine learning.

In the same section we find “reading” legal and contractual documents to extract provisions using natural language processing (NLP). This is ML. NLP tasks like this one generally cannot be solved well with simple rule-based programs and instead rely on machine learning models.

Talk to Jinho Choi at Emory if you are interested in NLP, as he is the faculty expert on campus and his team is the recent winner of the ultimate prize in this area, the [Alexa prize](https://developer.amazon.com/alexaprize)!  

## Amazon Rekognition
We first look at a machine learning tool for image and video analysis. From the [documentation](https://docs.aws.amazon.com/rekognition/latest/dg/what-is.html) you will find that:
> Amazon Rekognition is based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily. It requires no machine learning expertise to use.

Great! Let's get started.

### Set up an IAM Role
In order to use this API within SageMaker, we will need to update the role we have been using to control SageMaker permissions. Recall that when you created your SageMaker instance, one of the steps was creating a new IAM role. If you used the suggested default, the name would be similar to AmazonSageMaker-ExecutionRole-0238127377.

#### Where can I find that role?
Go to your SageMaker dashboard, then "Notebook instances", then click the notebook instance name to open the "Notebook instance settings" page. You should then see the page below.

![Notebook instance settings](./screenshot-instance-settings.png)


There, under the heading "Permissions and encryption" click the link to the IAM role ARN. You should then see a view similar to this one below.


![IAM role summary page](./sagemaker-role.png)

However, your view will show fewer policies, because I have already completed the step that you are about to complete, namely, adding policies to this SageMaker role.

#### Adding policies
As we work with new AWS services within our notebooks, it will be necessary to add policies that give SageMaker access to them. To use the examples we present for working with Amazon Rekognition, you will need to add `AmazonRekognitionFullAccess` permissions. Also, `AmazonS3ReadOnlyAccess` is required for examples that access images or videos stored in an Amazon S3 bucket. Finally, the Amazon Rekognition Video stored-video code examples also require `AmazonSQSFullAccess` permissions.

To add them, in the IAM role Summary page (pictured in the last screenshot), click the blue "Attach policies" button. In the search bar, type the names of these services that were just listed, select them by ticking the empty white box next to the name when it appears, and then click the blue "Attach policy" button. 
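
For reference, the same three policies could in principle be attached from Python instead of the console. The sketch below assumes a boto3 IAM client and uses the real `attach_role_policy` call, but the helper names (`policy_arn`, `attach_policies`) and the role name placeholder are hypothetical. It also assumes the credentials running it are allowed to modify IAM, which a restricted account such as AWS Educate may not permit, so the console steps above remain the route this notebook follows.

```python
# AWS managed policies share this ARN prefix.
MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/"

# The three managed policies named in the text above.
POLICY_NAMES = [
    "AmazonRekognitionFullAccess",
    "AmazonS3ReadOnlyAccess",
    "AmazonSQSFullAccess",
]


def policy_arn(name):
    """Build the ARN of an AWS managed policy from its name."""
    return MANAGED_POLICY_PREFIX + name


def attach_policies(iam_client, role_name, policy_names=POLICY_NAMES):
    """Attach each managed policy to the role and return the ARNs used."""
    arns = [policy_arn(name) for name in policy_names]
    for arn in arns:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return arns


# Usage (not run here; "YOUR-ROLE-NAME" is a placeholder for your own role):
#   import boto3
#   attach_policies(boto3.client("iam"), "YOUR-ROLE-NAME")
```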


### Getting started using the console
Before using the Rekognition service programmatically, it will be helpful to understand what it does by walking through examples in the AWS console. To do that, complete Exercises 1 through 4 listed [here](https://docs.aws.amazon.com/rekognition/latest/dg/getting-started-console.html), then return to this notebook.

### Working with the Rekognition API programmatically
In this section, you use the Amazon Rekognition Image API operations to analyze images stored in an Amazon S3 bucket.

#### Step 1
Create a new S3 bucket. This can be done graphically via the AWS console, or programmatically using bash or the Python SDK.
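
As a rough sketch of the Python SDK route, the helper below wraps the boto3 S3 client's `create_bucket` call. The function names are hypothetical, and the region handling reflects one known quirk: `us-east-1` is the default location and is not passed as a `LocationConstraint`, while other regions must be named explicitly. The bucket name and region in the usage comment are assumptions that you would replace with your own.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call.

    us-east-1 is the default location and is not passed as a
    LocationConstraint; any other region is named explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(s3_client, bucket_name, region):
    """Create the bucket using the given boto3 S3 client."""
    return s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))


# Usage (not run here; needs AWS credentials with S3 write access):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   create_bucket(s3, "image-api-example", "us-east-1")
```

The rest of this section uses the AWS CLI for the same task.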

### Set up the AWS CLI
Any new SageMaker instance comes with the AWS CLI (Command Line Interface) preinstalled. To check that this is the case, run the command below.

In [9]:
!aws s3 help

S3()                                                                      S3()



[1mNAME[0m
       s3 -

[1mDESCRIPTION[0m
       This  section  explains  prominent concepts and notations in the set of
       high-level S3 commands provided.

   [1mPath Argument Type[0m
       Whenever using a command, at least one path argument must be specified.
       There are two types of path arguments: [1mLocalPath [22mand [1mS3Uri[22m.

       [1mLocalPath[22m: represents the path of a local file or directory.  It can be
       written as an absolute path or relative path.

       [1mS3Uri[22m: represents the location of a S3 object, prefix, or bucket.  This
       must  be  written in the form [1ms3://mybucket/mykey [22mwhere [1mmybucket [22mis the
       specified S3 bucket, [1mmykey [22mis the specified S3 key.  The path  argument
       must  begin with [1ms3:// [22min order to denote that the path argument refers
       to a S3 object. Note that prefixes are separate

After running the cell above, you should see the help text for the high-level S3 commands that are available from the command line.

Now, run the command `mb` below to make a new bucket. As bucket names must be globally unique, you may need to modify the name slightly, for example by adding your name to the end.

In [12]:
!aws s3 mb s3://image-api-example

make_bucket: image-api-example


Now, run the `aws s3 ls` command to list buckets in your account.

In [14]:
!aws s3 ls

2020-10-15 11:14:16 image-api-example


Next, we need to add an image to this bucket. There are many ways to do that. We will continue using the AWS CLI to accomplish this.

First, using the JupyterLab file viewer on the left, click the up arrow symbol to upload a file to your SageMaker instance. Upload either a .jpg or a .png. For example, I uploaded my profile picture named `jeremyjacobson.png`. Now move this file to your bucket using the `aws s3 mv` command. It works just like the `mv` command in Linux.

In [15]:
!aws s3 mv jeremyjacobson.png s3://image-api-example

move: ./jeremyjacobson.png to s3://image-api-example/jeremyjacobson.png


Having run the cell above, the file should no longer appear in the JupyterLab file viewer. Let's check what is in the bucket we made now.

In [17]:
!aws s3 ls image-api-example

2020-10-15 11:19:22     127954 jeremyjacobson.png


Good, we see that the image is in the bucket. Let's move on to the next step.
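
Before moving on, note that the upload and listing could also be done from Python. The sketch below assumes a boto3 S3 client and uses its `upload_file` and `list_objects_v2` calls. The helper names are hypothetical. Unlike `aws s3 mv`, this version copies the file and leaves the local copy in place.

```python
import os


def upload_image(s3_client, local_path, bucket):
    """Upload a local file to the bucket, using its file name as the key."""
    key = os.path.basename(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return key


def list_keys(s3_client, bucket):
    """Return the object keys in the bucket (first page of results only)."""
    response = s3_client.list_objects_v2(Bucket=bucket)
    # "Contents" is absent from the response when the bucket is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage (not run here; needs AWS credentials and an existing bucket):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_image(s3, "jeremyjacobson.png", "image-api-example")
#   print(list_keys(s3, "image-api-example"))
```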

##### Detect labels in an image
This example displays the JSON output from the `detect-labels` call to the Rekognition API. You will need to modify it by replacing the value of `Bucket` with your bucket's name and the value of `Name` with your file's name.

In [18]:
!aws rekognition detect-labels --image '{"S3Object":{"Bucket":"image-api-example", "Name":"jeremyjacobson.png"}}'

{
    "Labels": [
        {
            "Name": "Human",
            "Confidence": 98.61043548583984,
            "Instances": [],
            "Parents": []
        },
        {
            "Name": "Person",
            "Confidence": 98.61043548583984,
            "Instances": [
                {
                    "BoundingBox": {
                        "Width": 0.8968233466148376,
                        "Height": 0.8738352060317993,
                        "Left": 0.07586783170700073,
                        "Top": 0.11023861169815063
                    },
                    "Confidence": 98.61043548583984
                }
            ],
            "Parents": []
        },
        {
            "Name": "Face",
            "Confidence": 97.62527465820312,
            "Instances": [],
            "Parents": [
                {
                    "Name": "Person"
                }
            ]
        },
        {
            "Name": "Smile",
            "Confidence": 85.758720

The next step could be to use a command line tool like `jq` to extract from this JSON the information we are most interested in and use that to create a new dataset as we did in earlier notebooks.
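
As an alternative to `jq`, the same extraction can be sketched in Python with the standard `json` module. The sample text below is an abbreviated copy of three labels from the output above, and the `label_names` helper and the threshold of 90 are chosen only for illustration.

```python
import json

# Abbreviated copy of three labels from the detect-labels output above.
SAMPLE_OUTPUT = """
{"Labels": [
  {"Name": "Human", "Confidence": 98.61043548583984, "Instances": [], "Parents": []},
  {"Name": "Face", "Confidence": 97.62527465820312, "Instances": [],
   "Parents": [{"Name": "Person"}]},
  {"Name": "Smile", "Confidence": 85.75872039794922, "Instances": [],
   "Parents": [{"Name": "Face"}, {"Name": "Person"}]}
]}
"""


def label_names(json_text, min_confidence=0.0):
    """Return the names of labels whose confidence meets the threshold."""
    data = json.loads(json_text)
    return [
        label["Name"]
        for label in data["Labels"]
        if label["Confidence"] >= min_confidence
    ]


print(label_names(SAMPLE_OUTPUT, min_confidence=90))  # ['Human', 'Face']
```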

### Using the Python SDK
We start by importing the package that contains the code for the Python SDK, `boto3`.

In [19]:
#Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#PDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import boto3

Next, we create an instance, `client`, of the client object in the `boto3` package for `rekognition`. It will allow us to communicate with and make requests to the Rekognition service using Python.

In [20]:
client=boto3.client('rekognition')

Now we use dot notation to call one of the client's methods, `detect_labels`. This example retrieves the labels detected in the input image, as we did previously using the CLI. Again, replace the `Bucket` and `Name` values with the names of the Amazon S3 bucket and image that you used. The `MaxLabels=10` argument limits the response to at most ten labels.

In [23]:
response = client.detect_labels(Image={'S3Object':{'Bucket':"image-api-example",'Name':"jeremyjacobson.png"}}, MaxLabels=10)

Let's investigate what we have obtained as a response.

In [24]:
type(response)

dict

We see that it is a Python dictionary. Let's see what its keys are.

In [27]:
response.keys()

dict_keys(['Labels', 'LabelModelVersion', 'ResponseMetadata'])

Let's have a look at the values for the `Labels` key.

In [28]:
response['Labels']

[{'Name': 'Person',
  'Confidence': 98.61043548583984,
  'Instances': [{'BoundingBox': {'Width': 0.8968233466148376,
     'Height': 0.8738352060317993,
     'Left': 0.07586783170700073,
     'Top': 0.11023861169815063},
    'Confidence': 98.61043548583984}],
  'Parents': []},
 {'Name': 'Face',
  'Confidence': 97.62527465820312,
  'Instances': [],
  'Parents': [{'Name': 'Person'}]},
 {'Name': 'Smile',
  'Confidence': 85.75872039794922,
  'Instances': [],
  'Parents': [{'Name': 'Face'}, {'Name': 'Person'}]},
 {'Name': 'Man',
  'Confidence': 83.50458526611328,
  'Instances': [],
  'Parents': [{'Name': 'Person'}]},
 {'Name': 'Clothing',
  'Confidence': 73.89022827148438,
  'Instances': [],
  'Parents': []},
 {'Name': 'Shirt',
  'Confidence': 68.68953704833984,
  'Instances': [],
  'Parents': [{'Name': 'Clothing'}]},
 {'Name': 'Dating',
  'Confidence': 60.82618713378906,
  'Instances': [],
  'Parents': [{'Name': 'Person'}]},
 {'Name': 'Photography',
  'Confidence': 57.578369140625,
  'Insta

We see that it is essentially the same data as we obtained before with the CLI. Using Python and Pandas, we could now extract the data we want from this response and use it to create a new dataset.
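
One possible way to do that extraction is sketched below. It flattens each label into a plain dictionary, which is a shape that `pandas.DataFrame` accepts directly. The `labels_to_rows` helper and its column names are hypothetical, and the sample response is an abbreviated copy of the first three labels shown above so that the sketch runs without calling AWS.

```python
def labels_to_rows(response):
    """Flatten a detect_labels response into one flat dict per label."""
    rows = []
    for label in response["Labels"]:
        rows.append({
            "name": label["Name"],
            "confidence": round(label["Confidence"], 2),
            # Join parent label names into a single readable string.
            "parents": ", ".join(parent["Name"] for parent in label["Parents"]),
            # Number of located instances (those with bounding boxes).
            "n_instances": len(label["Instances"]),
        })
    return rows


# Abbreviated copy of the response shown above.
sample_response = {
    "Labels": [
        {"Name": "Person", "Confidence": 98.61043548583984,
         "Instances": [{"BoundingBox": {"Width": 0.8968233466148376,
                                        "Height": 0.8738352060317993,
                                        "Left": 0.07586783170700073,
                                        "Top": 0.11023861169815063},
                        "Confidence": 98.61043548583984}],
         "Parents": []},
        {"Name": "Face", "Confidence": 97.62527465820312,
         "Instances": [], "Parents": [{"Name": "Person"}]},
        {"Name": "Smile", "Confidence": 85.75872039794922,
         "Instances": [], "Parents": [{"Name": "Face"}, {"Name": "Person"}]},
    ]
}

rows = labels_to_rows(sample_response)
for row in rows:
    print(row)

# With pandas installed, the rows become a table in one step:
#   import pandas as pd
#   df = pd.DataFrame(rows)
```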

### Detecting faces in an image
Here is another example. See the [documentation](https://docs.aws.amazon.com/rekognition/latest/dg/faces-detect-images.html) for details on this example.

In [29]:
!aws rekognition detect-faces \
--image '{"S3Object":{"Bucket":"image-api-example","Name":"jeremyjacobson.png"}}' \
--attributes "ALL" 


{
    "FaceDetails": [
        {
            "BoundingBox": {
                "Width": 0.35594138503074646,
                "Height": 0.47563669085502625,
                "Left": 0.28262796998023987,
                "Top": 0.22201739251613617
            },
            "AgeRange": {
                "Low": 33,
                "High": 49
            },
            "Smile": {
                "Value": true,
                "Confidence": 98.935546875
            },
            "Eyeglasses": {
                "Value": true,
                "Confidence": 99.30152893066406
            },
            "Sunglasses": {
                "Value": false,
                "Confidence": 91.17986297607422
            },
            "Gender": {
                "Value": "Male",
                "Confidence": 93.72130584716797
            },
            "Beard": {
                "Value": false,
                "Confidence": 88.77381896972656
            },
            "Mustache": {
                "Value": f
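
The same request can be made from Python with the client's `detect_faces` method. The sketch below wraps that call and adds two hypothetical helpers: one that collects the attribute values of a face, and one that converts the bounding box to pixels. The bounding box values are ratios of the image width and height. The sample face is an abbreviated copy of the output above, and the 1000 by 1000 pixel image size is an assumption made only for the arithmetic.

```python
def detect_faces(rekognition_client, bucket, name):
    """Request all facial attributes for an image stored in S3."""
    return rekognition_client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": name}},
        Attributes=["ALL"],
    )


def summarize_face(face):
    """Collect the age range and every attribute that has a Value field."""
    summary = {"age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"])}
    for key, value in face.items():
        if isinstance(value, dict) and "Value" in value and "Confidence" in value:
            summary[key] = value["Value"]
    return summary


def box_to_pixels(box, image_width, image_height):
    """Scale a ratio-based BoundingBox to whole pixels."""
    return {
        "left": round(box["Left"] * image_width),
        "top": round(box["Top"] * image_height),
        "width": round(box["Width"] * image_width),
        "height": round(box["Height"] * image_height),
    }


# Abbreviated copy of the first face in the output above.
sample_face = {
    "BoundingBox": {"Width": 0.35594138503074646, "Height": 0.47563669085502625,
                    "Left": 0.28262796998023987, "Top": 0.22201739251613617},
    "AgeRange": {"Low": 33, "High": 49},
    "Smile": {"Value": True, "Confidence": 98.935546875},
    "Eyeglasses": {"Value": True, "Confidence": 99.30152893066406},
    "Sunglasses": {"Value": False, "Confidence": 91.17986297607422},
    "Gender": {"Value": "Male", "Confidence": 93.72130584716797},
    "Beard": {"Value": False, "Confidence": 88.77381896972656},
}

print(summarize_face(sample_face))
# Assumed image size of 1000 x 1000 pixels, for illustration only.
print(box_to_pixels(sample_face["BoundingBox"], 1000, 1000))

# Usage against the live service (not run here):
#   import boto3
#   response = detect_faces(boto3.client("rekognition"),
#                           "image-api-example", "jeremyjacobson.png")
#   summaries = [summarize_face(f) for f in response["FaceDetails"]]
```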

There is much more that can be done, such as detecting text and reading it. See the [linked documentation](https://docs.aws.amazon.com/rekognition/latest/dg/text-detecting-text-procedure.html) for details.
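
As a brief sketch of text detection from Python, the client offers a `detect_text` method whose response contains a `TextDetections` list. Each entry has a `DetectedText` string, a `Type` of either `LINE` or `WORD`, and a `Confidence`. The helper names, the confidence threshold, and the sample response below are all made up for illustration and do not come from a real image.

```python
def extract_lines(response, min_confidence=80.0):
    """Return the detected lines of text that meet the confidence threshold."""
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


def detect_text_lines(rekognition_client, bucket, name, min_confidence=80.0):
    """Call detect_text on an S3 image and keep only confident lines."""
    response = rekognition_client.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": name}}
    )
    return extract_lines(response, min_confidence)


# Made-up response in the documented shape, for illustration only.
sample_response = {
    "TextDetections": [
        {"DetectedText": "QTM 350", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "QTM", "Type": "WORD", "Confidence": 99.3},
        {"DetectedText": "350", "Type": "WORD", "Confidence": 98.9},
        {"DetectedText": "smudge", "Type": "LINE", "Confidence": 41.0},
    ]
}

print(extract_lines(sample_response))  # ['QTM 350']
```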

## Other ML Services
There are many newer models in use today that AWS does not offer as a core ML service. To use them in SageMaker, we can use any of the open-source ML frameworks for Python, such as PyTorch, TensorFlow, or Gluon. The last of these is an AWS-contributed library, which we will focus on in the final example.