# Analyze Tweets
## Overview
This notebook prototypes an AWS-based NLP analysis of the output of the `extract-tweets` notebook. The goal is to generate an output that can be fed into TigerGraph to build our model of a user's confirmation bias.

## Set up dependencies
Run the cell below to install the dependencies.

In [2]:
import sys
!{sys.executable} -m pip install -r requirements.txt

DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621

## Prepare our data

We assume that `user-tweets.csv` is in the working directory. If it is not present, run the `extract-tweets` notebook first to generate it.

In [82]:
import pandas as pd
import re

# Load all user tweets
user_tweets = pd.read_csv('./user-tweets.csv')

# Drop index column attached to user tweets csv
user_tweets = user_tweets.drop(columns=['Unnamed: 0'])

# Truncate each tweet at its first https link (links carry no signal for this analysis)
user_tweets['text'] = user_tweets['text'].apply(lambda x: re.split(r'https://.*', str(x))[0])

# Cast every column to str and drop non-ASCII characters (removes emojis, but also accented letters)
user_tweets = user_tweets.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))

# Remove blank tweets
user_tweets = user_tweets[user_tweets.text.str.strip().str.len() != 0]

# Print out results
user_tweets.head()

Unnamed: 0,tweet_id,username,text
1,1511297746661253120,thesheetztweetz,Breaking - Amazon $AMZN signed the biggest roc...
2,1511153708654120963,thesheetztweetz,The U.S. Air Force's 388th Fighter Wing tested...
3,1511137391263715331,thesheetztweetz,U.S. Space Force Brig. Gen. Stephen Purdy rece...
4,1511087590832758789,thesheetztweetz,"Due the vent valve issue, the launch director ..."
5,1510994152175149062,thesheetztweetz,The countdown clock has now resumed at T-6:40 ...
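
To make the effect of each cleaning step easier to see, here is a minimal sketch that applies the same three steps to a few hand-written strings using only the standard library. The helper name `clean_tweet` and the sample strings are hypothetical and not part of the notebook; the logic is intended to mirror the pandas cell above, not replace it.

```python
import re

# Same pattern as the notebook cell: everything from the first https link onward.
LINK_PATTERN = re.compile(r'https://.*')

def clean_tweet(text):
    """Hypothetical helper: truncate at the first https link, then drop non-ASCII characters."""
    without_link = LINK_PATTERN.split(str(text))[0]
    return without_link.encode('ascii', 'ignore').decode('ascii')

# Made-up sample tweets for illustration only.
samples = [
    "Launch is go \U0001F680 https://t.co/abc123",
    "https://t.co/only-a-link",
    "Caf\u00e9 opens at T-6:40",
]

cleaned = [clean_tweet(s) for s in samples]

# Same blank filter as the notebook: keep rows whose stripped text is non-empty.
non_blank = [c for c in cleaned if len(c.strip()) != 0]
print(non_blank)
```

Note that the third sample loses its accented letter as well as any emoji, because the ASCII round trip discards every non-ASCII character. If accented text matters for the analysis, a narrower emoji filter may be preferable.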


## Analyze tweets using AWS

For our backend infrastructure, we use Amazon Comprehend as the machine learning component. It provides pre-trained models for key phrase extraction, sentiment analysis, and topic modeling, so no custom model is required at this stage.
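
Each Comprehend call returns a plain Python dictionary. As a rough sketch, assuming the response shape that `detect_sentiment` (a `Sentiment` label plus a `SentimentScore` map) and `detect_key_phrases` (a `KeyPhrases` list with `Score` and `Text`) return, the fields of interest could be reduced like this. The helper names and the `min_score` threshold are hypothetical, and the sample dictionaries are hand-written stand-ins so the block runs without AWS credentials.

```python
def summarize_sentiment(response):
    """Hypothetical helper: return (label, confidence) from a detect_sentiment response."""
    label = response['Sentiment']  # e.g. 'POSITIVE'
    # SentimentScore keys are capitalized ('Positive'), while the label is upper-case.
    score = response['SentimentScore'][label.capitalize()]
    return label, score

def confident_key_phrases(response, min_score=0.9):
    """Hypothetical helper: keep key phrase texts scoring at least min_score (an assumed cutoff)."""
    return [p['Text'] for p in response['KeyPhrases'] if p['Score'] >= min_score]

# Hand-written stand-ins shaped like Comprehend responses (values are illustrative).
sample_sentiment = {
    'Sentiment': 'POSITIVE',
    'SentimentScore': {'Positive': 0.9946, 'Negative': 0.0003, 'Neutral': 0.0049, 'Mixed': 0.0001},
}
sample_key_phrases = {
    'KeyPhrases': [
        {'Score': 0.9979, 'Text': 'The sounds', 'BeginOffset': 0, 'EndOffset': 10},
        {'Score': 0.9858, 'Text': '@inspiration4x DragonResilience', 'BeginOffset': 14, 'EndOffset': 45},
    ]
}

print(summarize_sentiment(sample_sentiment))
print(confident_key_phrases(sample_key_phrases, min_score=0.99))
```

Filtering on `Score` is only one possible design choice. Whether a threshold is useful, and what value suits it, depends on how the downstream graph model weighs low-confidence phrases.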

In [112]:
import boto3
import json

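# AWS region of the Comprehend endpoint, and the language of the tweets
region = 'ap-southeast-1'
language_code = 'en'

# No credentials are passed here, so boto3 resolves them from its default credential chain
comprehend = boto3.client('comprehend', region_name=region)

def detect_key_phrases(text, language_code):
    """Return the raw Comprehend key phrase response for `text`."""
    return comprehend.detect_key_phrases(Text=text, LanguageCode=language_code)

def detect_entities(text, language_code):
    """Return the raw Comprehend entity response for `text`."""
    return comprehend.detect_entities(Text=text, LanguageCode=language_code)

def detect_sentiment(text, language_code):
    """Return the raw Comprehend sentiment response for `text`."""
    return comprehend.detect_sentiment(Text=text, LanguageCode=language_code)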

text = user_tweets.iloc[20].text

sentiment = detect_sentiment(text, language_code)
key_phrases = detect_key_phrases(text, language_code)
entities = detect_entities(text, language_code)

print(text)
print(sentiment)
print(key_phrases)
print(entities)

The sounds of @inspiration4x DragonResilience during a phasing burn.I described in moment as an orchestra but its a more percussion-like rhythm &amp; very pleasant. So thankful for @SpaceX's talented team &amp; all the giants @NASA whose shoulders we stand on. @PolarisProgram up soon 
{'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.9946348667144775, 'Negative': 0.000329022848745808, 'Neutral': 0.004913660231977701, 'Mixed': 0.00012249790597707033}, 'ResponseMetadata': {'RequestId': '73509e74-b88f-446f-9ce8-941765245803', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '73509e74-b88f-446f-9ce8-941765245803', 'content-type': 'application/x-amz-json-1.1', 'content-length': '165', 'date': 'Wed, 06 Apr 2022 10:49:52 GMT'}, 'RetryAttempts': 0}}
{'KeyPhrases': [{'Score': 0.9979216456413269, 'Text': 'The sounds', 'BeginOffset': 0, 'EndOffset': 10}, {'Score': 0.9858503937721252, 'Text': '@inspiration4x DragonResilience', 'BeginOffset': 14, 'EndOffset': 45}, {'Score': 0.990