# Amazon Comprehend Demo

Welcome to the Amazon Comprehend Demo!

You are about to discover Amazon Comprehend features and take a deep dive into building your own custom classifier.

<a id="TOC"></a>
*Table of Contents*
1. [Introduction](#Introduction)
1. [Tips](#Tips)
1. [Setup](#Setup)
1. [Amazon Comprehend features](#ComprehendFeatures)
1. [Building a custom classifier](#BuildingACustomClassifier)
    1. [Open Dataset](#OpenDataset)
    1. [Cleanup](#Cleanup)
1. [What's next](#WhatsNext)
    1. [Document Understanding Solution](#DocumentUnderstandingSolution)
    
*Note: this notebook is based on the [Building a custom classifier using Amazon Comprehend](https://aws.amazon.com/blogs/machine-learning/building-a-custom-classifier-using-amazon-comprehend/) blog post.*

<a id="Introduction"></a>
## Introduction

*([back to Table of Contents](#TOC))*

[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; and determines how positive or negative the text is. For more information about everything Amazon Comprehend can do, see [Amazon Comprehend Features](https://aws.amazon.com/comprehend/features/).

You may need NLP capabilities tailored to your business without having to run a research phase first. For example, you may want to recognize entity types and classify documents in ways that are unique to your business, such as recognizing industry-specific terms or triaging customer feedback into different categories.

Amazon Comprehend is a perfect match for these use cases. In November 2018, Amazon Comprehend added the ability for you to train it to recognize custom entities and perform custom classification. For more information on building custom entities, please see [Build Your Own Natural Language Models on AWS (no ML experience required)](https://aws.amazon.com/blogs/machine-learning/build-your-own-natural-language-models-on-aws-no-ml-experience-required/).

<a id="Tips"></a>
### Tips

The main tip for today if you are new to Python notebooks: `SHIFT` + `ENTER` executes the current code cell.

<a id="Setup"></a>
## Setup

*([back to Table of Contents](#TOC))*

Before moving on to the next step, a few initial steps are required to set up the notebook.

This notebook requires additional Python packages:

- `boto3` is required to download the open source data.
- `pandas` is required to transform the data into a format compatible with the Amazon Comprehend custom classifier.
- `tqdm` is required to monitor progress during data preparation.

In [1]:
import sys
!{sys.executable} -m pip install boto3==1.15.14
!{sys.executable} -m pip install pandas==1.1.3
!{sys.executable} -m pip install tqdm==2.2.3
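Once the install cell has run, it can help to confirm that the pinned versions are the ones actually active in the kernel. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the `check_versions` helper and the `EXPECTED` mapping are illustrative names, not part of the original notebook.

```python
from importlib import metadata  # standard library, Python 3.8+

# Pinned versions taken from the install cell above.
EXPECTED = {"boto3": "1.15.14", "pandas": "1.1.3", "tqdm": "2.2.3"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            installed = None
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

If a package is reported as a mismatch, restarting the kernel after the install usually makes the newly installed version visible.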



<a id="ComprehendFeatures"></a>
## Amazon Comprehend features

*([back to Table of Contents](#TOC))*

In [7]:
import boto3

# Create an Amazon Comprehend client using the default AWS credentials and region.
client = boto3.client('comprehend')

# French sample text: "This is a beautiful day to discover high-level AWS AI services!"
text = "Ceci est une belle journée pour découvrir les services d'AI AWS de haut niveau ! 😊"

# Detect the dominant language of each text in the list (here, a single text).
lang = client.batch_detect_dominant_language(
    TextList=[
        text,
    ]
)
print("Dominant Language Raw object: ", lang)

# Take the first language reported for the first (and only) text.
lang_code = lang['ResultList'][0]['Languages'][0]['LanguageCode']
print("Dominant Language Code: ", lang_code, "\n")

# Run sentiment analysis using the detected language code.
sentiment = client.detect_sentiment(
    Text=text,
    LanguageCode=lang_code
)
print("Sentiment Raw object: ", sentiment)
print("Sentiment: ", sentiment['Sentiment'])


Dominant Language Raw object:  {'ResultList': [{'Index': 0, 'Languages': [{'LanguageCode': 'fr', 'Score': 0.9968114495277405}]}], 'ErrorList': [], 'ResponseMetadata': {'RequestId': 'bf5288e0-3f05-4da1-86b6-a9325f062218', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bf5288e0-3f05-4da1-86b6-a9325f062218', 'content-type': 'application/x-amz-json-1.1', 'content-length': '106', 'date': 'Wed, 07 Oct 2020 21:27:57 GMT'}, 'RetryAttempts': 0}}
Dominant Language Code:  fr 

Sentiment Raw object:  {'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.9989319443702698, 'Negative': 1.3031689377385192e-05, 'Neutral': 0.001054349122568965, 'Mixed': 6.974259463277122e-07}, 'ResponseMetadata': {'RequestId': '35da29f3-cf7f-46e7-8e0a-6894dc9e178f', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '35da29f3-cf7f-46e7-8e0a-6894dc9e178f', 'content-type': 'application/x-amz-json-1.1', 'content-length': '166', 'date': 'Wed, 07 Oct 2020 21:27:57 GMT'}, 'RetryAttempts': 0}}
Sentiment:  POSITIVE
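The introduction also mentions key phrase and entity extraction. As a minimal sketch, assuming the same Comprehend client and detected language code as in the cell above, both features can be called together and filtered by confidence. The `summarize_text` helper and its `min_score` threshold are illustrative names and values, not part of the original notebook.

```python
def summarize_text(client, text, language_code, min_score=0.5):
    """Return key phrases and (type, text) entities scoring at least min_score.

    `client` is expected to behave like boto3.client('comprehend').
    """
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    return {
        # Keep only the phrase text for sufficiently confident key phrases.
        "key_phrases": [
            p["Text"] for p in phrases["KeyPhrases"] if p["Score"] >= min_score
        ],
        # Keep the entity type (for example PERSON or ORGANIZATION) with its text.
        "entities": [
            (e["Type"], e["Text"])
            for e in entities["Entities"]
            if e["Score"] >= min_score
        ],
    }


# Usage in the notebook, reusing the objects from the cell above:
# print(summarize_text(client, text, lang_code))
```

Passing the client as an argument keeps the helper independent of AWS credentials, so it can be exercised with a stub object before calling the real service.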

<a id="BuildingACustomClassifier"></a>
## Building a custom classifier

*([back to Table of Contents](#TOC))*

<a id="OpenDataset"></a>
### Open Dataset
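
In multi-class mode, Amazon Comprehend custom classification is trained from a headerless CSV file with the label in the first column and the document text in the second. The following is a minimal sketch of that transformation, assuming one document per line. The rows and the `to_training_csv` helper are hypothetical placeholders rather than the dataset used by this notebook, and the standard `csv` module stands in for pandas to keep the sketch dependency-free.

```python
import csv
import io

# Hypothetical labelled examples: (label, document text).
rows = [
    ("SPORTS", "Who won the\nlast world cup?"),
    ("FINANCE", 'What does "APR" mean, exactly?'),
]


def to_training_csv(examples):
    """Serialize (label, text) pairs as a headerless two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for label, text in examples:
        # Keep each document on a single line by collapsing whitespace and newlines.
        writer.writerow([label, " ".join(text.split())])
    return buffer.getvalue()


print(to_training_csv(rows))
```

With pandas, a comparable result would come from a two-column DataFrame written with `to_csv(header=False, index=False)`.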

<a id="Cleanup"></a>
### Cleanup

<a id="WhatsNext"></a>
## What's next?

*([back to Table of Contents](#TOC))*

<a id="DocumentUnderstandingSolution"></a>
### Document Understanding Solution

![DUS_Arch.png](attachment:DUS_Arch.png)