<h1><center>Natural Language Processing with Amazon Comprehend</center></h1>

<p>In this notebook I will use Natural Language Processing <b>(NLP)</b> techniques to analyse textual data. I will be using Amazon Comprehend, which is provided by Amazon Web Services <b>(AWS)</b>. Amazon Comprehend is an NLP service that uses machine learning to uncover insights in text data.</p>

<h2>Sentiment Analysis</h2>

<p>Sentiment analysis is an NLP technique used to determine whether a piece of text is positive, negative or neutral. Sentiment analysis, also known as opinion mining, is often used by businesses on textual data to monitor brand and product sentiment in customer feedback and, ultimately, to understand their customers' needs.</p>

<h2>Table of Contents</h2>
<ul>
    <li><a href="#match-report">1 Match Report</a></li>
    <li><a href="#setup">2 Setting up the Environment</a></li>
    <li><a href="#sentiment">3 Sentiment Analysis with Amazon Comprehend</a>
        <ul>
            <li><a href="#line-of-text">3.1 On a line of text</a></li>
            <li><a href="#line-in-report">3.2 On a line in the match report</a></li>
            <li><a href="#full-report">3.3 On full match report</a></li>
            <li><a href="#quotes">3.4 On Individual quotes</a></li>
        </ul>
    </li>
    <li><a href="#conclusion">4 Conclusion</a></li>
</ul>
    

<h3 id="match-report">Match Report</h3>

The analysis uses football match reports involving Manchester United, together with quotes from the two opposing managers in a Premier League match. The match in question is Manchester United vs Aston Villa, which ended 3-1 to Aston Villa. The quotes are from Unai Emery (Aston Villa manager) and Erik ten Hag (Manchester United manager), and the aim is to see what sentiment each quote carries. <a href="https://www.espn.com/soccer/blog-the-match/story/4793789/man-united-loss-at-aston-villa-another-step-back-for-erik-ten-hag-and-donny-van-de-beek">Match report by ESPN</a>

<img src="https://ichef.bbci.co.uk/onesport/cps/976/cpsprodpb/6CA5/production/_127531872_astonvilla.jpg"/>

<h3 id="setup">Setting up the Environment</h3>

In your AWS Account:

<ul>
    <li>Create a <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console">new IAM user</a></li>
    <li>Once the IAM user has been set up, make a note of YOUR_ACCESS_KEY</li>
    <li>Make a note of YOUR_SECRET_KEY</li>

</ul>


In your Terminal:

<ul>
    <li>Ensure the <a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">AWS CLI is installed</a>; otherwise install it with <i>pip install awscli</i></li>
    <li>Use the <i>aws configure</i> command to set up your credentials</li>
    <li>aws_access_key_id = <b>YOUR_ACCESS_KEY</b></li>
    <li>aws_secret_access_key = <b>YOUR_SECRET_KEY</b></li>
    <li>Default region name = <b>eu-west-2</b></li>
    <li>Default output format = <b>json</b></li>
    
</ul>

<h3>Import Libraries</h3>

In [36]:
import numpy as np
import pandas as pd
import boto3
import json

<h3 id="sentiment">Sentiment Analysis with Amazon Comprehend</h3>
<h4 id="line-of-text">On a line of Text</h4>

In [37]:
#Boto3 is the AWS SDK for Python and makes it easy to call AWS services
#from a Python script.
#We create a client for the comprehend service and state which region we are in
comprehend = boto3.client(service_name="comprehend", region_name="eu-west-2")

In [38]:
#Headline of Match Report is stored in a variable
MUN_vs_AVL_hl = "Man United's resounding loss at Aston Villa another step back for Erik ten Hag and Donny van de Beek"

In [39]:
#We use the detect_sentiment function to inspect our headline text. It returns
#a dict containing the HTTP response metadata and the sentiment, which is what
#we are really interested in
print(json.dumps(comprehend.detect_sentiment(Text=MUN_vs_AVL_hl, LanguageCode='en'), sort_keys=True, indent=5))

{
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "163",
               "content-type": "application/x-amz-json-1.1",
               "date": "Wed, 09 Nov 2022 13:51:40 GMT",
               "x-amzn-requestid": "f7d7e539-907a-4419-a182-4361b8239405"
          },
          "HTTPStatusCode": 200,
          "RequestId": "f7d7e539-907a-4419-a182-4361b8239405",
          "RetryAttempts": 0
     },
     "Sentiment": "NEUTRAL",
     "SentimentScore": {
          "Mixed": 5.4537984397029504e-05,
          "Negative": 0.4732981026172638,
          "Neutral": 0.5132686495780945,
          "Positive": 0.013378795236349106
     }
}


From the title of the match report, Comprehend has deemed this to have a mostly <b>Neutral sentiment (51%)</b>, although a very high proportion of <b>Negative sentiment (47%)</b> is also present. This fits the context: the match was something of an upset, as Manchester United were favourites and Aston Villa had recently sacked their manager, and the headline refers to Manchester United taking a step back after a good run of results compared with their start to the season.
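The percentages quoted above are simply the raw scores multiplied by 100. As a small sketch, the helper below (<i>summarise_sentiment</i> is a name I have made up for illustration, not part of boto3) pulls the label and the rounded percentages out of a response. The scores are copied from the headline response above so the cell runs without calling AWS.

```python
def summarise_sentiment(response):
    """Return the dominant label and each score as a percentage (1 d.p.)."""
    scores = response["SentimentScore"]
    percentages = {label: round(score * 100, 1) for label, score in scores.items()}
    return response["Sentiment"], percentages


# Sentiment fields copied from the headline response shown above
headline_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Mixed": 5.4537984397029504e-05,
        "Negative": 0.4732981026172638,
        "Neutral": 0.5132686495780945,
        "Positive": 0.013378795236349106,
    },
}

label, percentages = summarise_sentiment(headline_response)
print(label, percentages)
```

The same helper can be given the dict returned by <i>comprehend.detect_sentiment</i> directly.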

In [40]:
# We use the detect_entities function to determine entities in the headline 
print(json.dumps(comprehend.detect_entities(Text=MUN_vs_AVL_hl, LanguageCode='en'), sort_keys=True, indent=5))

{
     "Entities": [
          {
               "BeginOffset": 0,
               "EndOffset": 10,
               "Score": 0.995070219039917,
               "Text": "Man United",
               "Type": "ORGANIZATION"
          },
          {
               "BeginOffset": 32,
               "EndOffset": 43,
               "Score": 0.9842805862426758,
               "Text": "Aston Villa",
               "Type": "ORGANIZATION"
          },
          {
               "BeginOffset": 66,
               "EndOffset": 78,
               "Score": 0.9663777351379395,
               "Text": "Erik ten Hag",
               "Type": "PERSON"
          },
          {
               "BeginOffset": 83,
               "EndOffset": 100,
               "Score": 0.9987688660621643,
               "Text": "Donny van de Beek",
               "Type": "PERSON"
          }
     ],
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "423",
               "content-type": "applicati

In the headline, the entities present are 'Erik ten Hag' and 'Donny van de Beek', which are <b>PERSON</b> entities, and 'Man United' and 'Aston Villa', which are <b>ORGANIZATION</b> entities.
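To make the grouping above easier to read off, here is a minimal sketch that groups the entities by their type and shows that <i>BeginOffset</i> and <i>EndOffset</i> index into the original headline string. The function name <i>group_entities</i> is my own; the entity values are copied from the output above (scores omitted).

```python
from collections import defaultdict


def group_entities(entities):
    """Group entity texts by their Comprehend entity type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


headline = "Man United's resounding loss at Aston Villa another step back for Erik ten Hag and Donny van de Beek"

# Entities copied from the detect_entities output above
entities = [
    {"BeginOffset": 0, "EndOffset": 10, "Text": "Man United", "Type": "ORGANIZATION"},
    {"BeginOffset": 32, "EndOffset": 43, "Text": "Aston Villa", "Type": "ORGANIZATION"},
    {"BeginOffset": 66, "EndOffset": 78, "Text": "Erik ten Hag", "Type": "PERSON"},
    {"BeginOffset": 83, "EndOffset": 100, "Text": "Donny van de Beek", "Type": "PERSON"},
]

grouped = group_entities(entities)
# Slicing the headline with the offsets recovers each entity's text
spans = [headline[e["BeginOffset"]:e["EndOffset"]] for e in entities]
print(grouped)
print(spans)
```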

<h4 id="line-in-report">On a line in the match report</h4>

In [41]:
#Read the Man Utd vs Real Sociedad match report into a list of lines
path = "MUN_vs_SOC_MatchReport"
#The with block closes the file once it has been read
with open(path, "r") as report:
    output = report.readlines()

In [42]:
#Display first line in the match report
output[0]

'A first goal for 18-year-old Alejandro Garnacho was enough to defeat Real Sociedad but was not enough to progress as winners of the group, \n'

In [43]:
print(json.dumps(comprehend.detect_sentiment(Text=output[0], LanguageCode='en'), sort_keys=True, indent=5))

{
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "162",
               "content-type": "application/x-amz-json-1.1",
               "date": "Wed, 09 Nov 2022 13:51:40 GMT",
               "x-amzn-requestid": "e5bc6117-c4a8-47a3-ad25-6f4663703e8d"
          },
          "HTTPStatusCode": 200,
          "RequestId": "e5bc6117-c4a8-47a3-ad25-6f4663703e8d",
          "RetryAttempts": 0
     },
     "Sentiment": "NEUTRAL",
     "SentimentScore": {
          "Mixed": 0.004636258818209171,
          "Negative": 0.12079549580812454,
          "Neutral": 0.8259332776069641,
          "Positive": 0.04863499104976654
     }
}


The first line of the match report for the Manchester United vs Real Sociedad match, which ended 1-0 to Manchester United, had a largely <b>Neutral sentiment (83%)</b>.

<h4 id="full-report">On full Match report</h4>

In [44]:
#Join the whole Match report together
whole_report = ', '.join(map(str, output))

In [45]:
print(json.dumps(comprehend.detect_sentiment(Text=whole_report, LanguageCode='en'), sort_keys=True, indent=5))

{
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "163",
               "content-type": "application/x-amz-json-1.1",
               "date": "Wed, 09 Nov 2022 13:51:40 GMT",
               "x-amzn-requestid": "f0fe9833-486e-48d2-a95b-3513654f2bbf"
          },
          "HTTPStatusCode": 200,
          "RequestId": "f0fe9833-486e-48d2-a95b-3513654f2bbf",
          "RetryAttempts": 0
     },
     "Sentiment": "NEUTRAL",
     "SentimentScore": {
          "Mixed": 0.00012948377116117626,
          "Negative": 0.09578736126422882,
          "Neutral": 0.8686434626579285,
          "Positive": 0.03543980419635773
     }
}


When the whole match report is taken into account, the sentiment is again largely <b>Neutral (87%)</b>, slightly more Neutral than the first line of the match report <b>(83%)</b>.
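One thing to keep in mind when sending a whole report is that the synchronous <i>detect_sentiment</i> call limits the size of the text it accepts. The API reference lists 5 KB at the time of writing, which I assume here to mean 5,000 UTF-8 bytes. This report fitted within that, but a longer one would need to be split. The sketch below (<i>chunk_lines</i> and <i>max_bytes</i> are hypothetical names of my own) groups lines into chunks that stay under the limit; the sample lines are synthetic filler, not the match report.

```python
def chunk_lines(lines, max_bytes=5000):
    """Group lines into chunks whose UTF-8 size stays within max_bytes."""
    chunks, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single line exceeds max_bytes")
        # Start a new chunk when adding this line would go over the limit
        if current and current_size + size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks


# Synthetic lines standing in for a long report
sample_lines = ["a" * 3000 + "\n", "b" * 3000 + "\n", "c" * 1000 + "\n"]
chunks = chunk_lines(sample_lines)
print([len(chunk.encode("utf-8")) for chunk in chunks])
```

Each chunk could then be passed to <i>detect_sentiment</i> on its own, although the per-chunk results would still need to be combined in some way.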

<h4 id="quotes">On Individual quotes</h4>

In [46]:
#Quote from Unai Emery after winning 3-1 against Manchester United
MUN_vs_AVL_qte1 = "The supporters were amazing supporting us and the players were great with the plan. First step is take confidence, start with energy and the supporters supported with their response. We have the players with the good skills. The way we played the 90 minutes we can be optimistic but only first step and we have to work a lot to keep improving"

In [47]:
print(json.dumps(comprehend.detect_sentiment(Text=MUN_vs_AVL_qte1, LanguageCode='en'), sort_keys=True, indent=5))

{
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "166",
               "content-type": "application/x-amz-json-1.1",
               "date": "Wed, 09 Nov 2022 13:51:40 GMT",
               "x-amzn-requestid": "06d7af0a-21a9-4150-b430-f7f3b9c210af"
          },
          "HTTPStatusCode": 200,
          "RequestId": "06d7af0a-21a9-4150-b430-f7f3b9c210af",
          "RetryAttempts": 0
     },
     "Sentiment": "POSITIVE",
     "SentimentScore": {
          "Mixed": 0.012709115631878376,
          "Negative": 0.00025073104188777506,
          "Neutral": 0.008580588735640049,
          "Positive": 0.9784594774246216
     }
}


This is the quote from Unai Emery after the game. As you can see, it has a strongly <b>Positive sentiment (98%)</b>, which is to be expected as Aston Villa won the game 3-1 against Manchester United, who were favourites to win due to their league position and run of form.

In [48]:
#Quote from Erik ten Hag after losing 3-1 to Aston Villa
MUN_vs_AVL_qte2 = "I think it was stupid to do that. Because we delivered too quickly crosses in from too far and too much forcing and we don't help him. We have to bring in the crosses in the right moment. We bring the crosses in too quickly. The right moment was Christian Eriksen in the first half in the pocket to deliver the ball to Cristiano at the far post, that was the right moment."

In [49]:
print(json.dumps(comprehend.detect_sentiment(Text=MUN_vs_AVL_qte2, LanguageCode='en'), sort_keys=True, indent=5))

{
     "ResponseMetadata": {
          "HTTPHeaders": {
               "content-length": "163",
               "content-type": "application/x-amz-json-1.1",
               "date": "Wed, 09 Nov 2022 13:51:40 GMT",
               "x-amzn-requestid": "549e136d-f84b-411b-b2ba-6433bd8a7141"
          },
          "HTTPStatusCode": 200,
          "RequestId": "549e136d-f84b-411b-b2ba-6433bd8a7141",
          "RetryAttempts": 0
     },
     "Sentiment": "NEGATIVE",
     "SentimentScore": {
          "Mixed": 0.12997767329216003,
          "Negative": 0.7995894551277161,
          "Neutral": 0.06316803395748138,
          "Positive": 0.007264777086675167
     }
}


This is the quote from Erik ten Hag after the match. It has a <b>Negative sentiment (80%)</b>, which is likely due to Manchester United losing the match to Aston Villa, who had recently changed their manager and had been on a poor run of form.
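Both quotes could also be scored in a single request with boto3's <i>batch_detect_sentiment</i>, which takes a <i>TextList</i> (up to 25 texts per call, as far as I am aware) and returns a <i>ResultList</i> whose items carry the <i>Index</i> of the text they belong to. The sketch below is only an illustration: <i>batch_sentiment</i> is my own helper name, and <i>StubComprehend</i> is an offline stand-in that mimics the shape of the response using the labels from the two results above, so the cell runs without AWS credentials.

```python
def batch_sentiment(client, texts, language_code="en"):
    """Return (text, sentiment label) pairs for a list of texts in one call."""
    response = client.batch_detect_sentiment(TextList=texts, LanguageCode=language_code)
    # Each result carries the Index of the text it belongs to
    return [(texts[result["Index"]], result["Sentiment"]) for result in response["ResultList"]]


class StubComprehend:
    """Offline stand-in (hypothetical) that mimics the batch response shape."""

    def batch_detect_sentiment(self, TextList, LanguageCode):
        labels = ["POSITIVE", "NEGATIVE"]  # labels copied from the two results above
        return {
            "ResultList": [
                {"Index": i, "Sentiment": labels[i]} for i in range(len(TextList))
            ],
            "ErrorList": [],
        }


# Placeholders standing in for MUN_vs_AVL_qte1 and MUN_vs_AVL_qte2
quotes = ["Unai Emery quote", "Erik ten Hag quote"]
results = batch_sentiment(StubComprehend(), quotes)
print(results)
```

In the notebook itself, the <i>comprehend</i> client created earlier would be passed in place of the stub, with the two quote variables as the list of texts. Any texts that fail are reported in the response's <i>ErrorList</i>, which this sketch does not handle.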

<h3 id="conclusion">Conclusion</h3>

In conclusion, this notebook demonstrated ways in which Amazon Comprehend can be used. We analysed a full match report, individual lines from a match report and quotes from the managers after a match, derived the sentiment of each and reported a percentage score for that sentiment. Comprehend is a powerful NLP service with many use cases to explore; we only scratched the surface in this notebook, and there are still many functions to make use of.