# Vibe Check
### Using Twitter’s API and Sentiment Analysis to Understand What’s the What on the Internet

Today's session will cover:
1.   Setting up Access to the Twitter API and Getting API Access Keys
2.   Getting Data from Twitter using the Twitter API
3.   Basic Data Operations and Data Cleaning
4.   Sentiment Analysis With Python using NLTK

<!--- TODO: Slides with: Intros - who we are, what does FN do overview, session goals/ what is an API + QR code with link to public google colab + session end slides - career guidance? (One of our learning objectives was "What jobs or internships can you search for to use the skills covered in this workshop?" -->


# Data Retrieval

## APIs
### What are APIs?

An API (Application Programming Interface) is one of the most common ways to access data programmatically. An API's documentation tells clients what data is available and how to “ask” the API for it.

If you've ever seen tweets embedded on a webpage, those were pulled in via an API!

First, we're going to set up some libraries and our API authentication information.

We have our `bearer_token` stored in a file. It works like a secret password that belongs only to us, so Twitter knows exactly who is asking it for data. We're going to read it in:


Next, we'll set up what we need to make the actual request to the Twitter API:
1. The information telling Twitter exactly who we are:
    - **`bearer_token`**: the secret password that authenticates us.
    - **`User-Agent`**: a name for the project we're working on.
    - Twitter tracks this information so it can see who is using the API and make sure that nobody is abusing it. Pretty much every API will require you to identify yourself in some way before you can get data back.


2. The URL we're going to request. In this case: `https://api.twitter.com/2/tweets/search/recent`
    - **`api.twitter.com`**: tells Twitter we're trying to hit the API, as opposed to the main feed/user interface.
    - **`2`**: shows that we're hitting version 2 of the API. If we put `1.1` instead, we would hit the older version, which requires slightly different request syntax and returns data formatted differently.
    - **`tweets`**: indicates which data type we want to request. We could also input `users`, `spaces`, or `lists` to get different datatypes back.
    - **`search`**: says we want to search over tweets. We could also put `counts` to get the number of tweets, or we could look up tweets directly by their IDs. `search` allows us to give Twitter a query - a set of terms we want to include or exclude - and we'll get back tweets that match our query terms.
    - **`recent`**: Twitter allows you to search either only Tweets from the last week (`recent`) or `all` Tweets, depending on your level of access. We'll stick to `recent`, because we're interested in what's happening on Twitter right now.
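
To make the pieces concrete, here is a minimal sketch that assembles the same URL from its parts. The variable names are illustrative only and are not part of Twitter's API.

```python
# Illustrative names; only the resulting URL comes from the list above.
host = "api.twitter.com"   # the API host, as opposed to the main website
version = "2"              # which version of the API we are hitting
resource = "tweets"        # the data type we want back
action = "search"          # search over tweets using a query
scope = "recent"           # only Tweets from the last week

search_url = "https://" + "/".join([host, version, resource, action, scope])
print(search_url)
```

Swapping one part, for example `action = "counts"`, gives the URL of a different endpoint on the same API.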


In [None]:
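# Libraries for making HTTP requests, handling JSON, and URL-encoding text
import requests
import json
import urllib.parse
# Libraries for interactive notebook widgets
import ipywidgets as widgets
from IPython.display import display

# Read the secret token from a file; strip() drops any trailing newline
with open("../utils/bearer_token.txt", "r") as token_file:
    bearer_token = token_file.read().strip()

# Headers sent with every request so Twitter knows who is asking
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent": "stem-for-her-demo"
}

# Endpoint for searching Tweets from the last week
search_url = "https://api.twitter.com/2/tweets/search/recent"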

## Building a Query

See: https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

<!--- Audience Participation here - ask for hashtags/ keyword search ideas - maybe pull up twitter trends on a screen? Live edit notebook to change search keywords-->

-- Harry Styles, Elon Musk, other things that are trending, Taylor Swift tour?

### Optional Fields
`tweet.fields` lets us request specific fields for each tweet. Here we add `created_at`.

### Query String

`-is:retweet` *excludes* any retweets.



In [None]:
query_string = '#twitter ' # tweets with the #twitter hashtag
query_string += '"elon musk" ' # tweets that have the exact phrase "elon musk" somewhere in their text
query_string += '-is:retweet ' # eliminate retweets
print(query_string)

See all the different operator types you can add to your search here: https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#operators
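
Operators can also be combined. The sketch below uses a hypothetical helper, `build_query`, to join terms with `OR` and to add a language filter. It assumes the `OR` and `lang:` operators from the operator list linked above. Note that a plain space between terms (as in the cell above) means "and", so this produces a broader search than the one we built by hand.

```python
def build_query(include_any, exclude_retweets=True, language=None):
    """Combine search terms into one query string (hypothetical helper)."""
    # Quote multi-word terms so they match as exact phrases
    terms = [f'"{t}"' if " " in t else t for t in include_any]
    # Parentheses group the OR so the operators after it apply to every term
    parts = ["(" + " OR ".join(terms) + ")"]
    if exclude_retweets:
        parts.append("-is:retweet")
    if language:
        parts.append(f"lang:{language}")
    return " ".join(parts)

print(build_query(["#twitter", "elon musk"], language="en"))
```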

## GET vs POST requests

When using APIs, there are multiple ways you can engage with them. The API Documentation will tell you what you're able to do, but one important thing to know about is what _type_ of requests you can make.

`GET` requests are exactly what they sound like - you usually use them to _GET_ data back from the API. 
`POST` requests are a little more complicated, but generally they are used to _create_ data via the API. Any Twitter Bot you see is going to be using POST requests to create Tweets. See: https://twitter.com/MagicRealismBot
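
Here is a rough sketch of the difference, built with Python's standard library so that nothing is actually sent. The `POST` endpoint and the shape of its body are assumptions for illustration; check the API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# GET: everything we ask for travels in the URL's query string
get_url = ("https://api.twitter.com/2/tweets/search/recent?"
           + urllib.parse.urlencode({"query": "#twitter -is:retweet"}))
get_request = urllib.request.Request(get_url, method="GET")

# POST: the data we want to create travels in the request body.
# The endpoint and body shape below are assumptions for illustration.
post_body = json.dumps({"text": "Hello from a bot!"}).encode("utf-8")
post_request = urllib.request.Request(
    "https://api.twitter.com/2/tweets",
    data=post_body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Nothing is sent here; we only inspect the request objects
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```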


In [None]:
query_widget = widgets.Textarea(
    value= query_string,
    placeholder='Enter a search string',
    description='String:',
    disabled=False
)
display(query_widget)

# HTTP Status Codes
>> Karnika TODO

In [None]:
query_params = {'query': query_widget.value,
                'tweet.fields': 'created_at,id,lang,source,text', # what data we want to return
                'expansions': 'author_id'     # will include the profile ID of the author
               }

response = requests.get(url=search_url,params=query_params,
                       headers=headers)
print(response.status_code)
print(json.dumps(response.json(), indent=2))
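
The number printed by `response.status_code` is an HTTP status code, which tells us whether the request worked. As a minimal sketch, the hypothetical helper `describe_status` below uses Python's built-in `http.HTTPStatus` to turn a code into a readable summary.

```python
from http import HTTPStatus

def describe_status(code):
    """Turn a numeric HTTP status code into a short human-readable summary."""
    phrase = HTTPStatus(code).phrase  # the standard name for this code
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error (something is wrong with our request)"
    elif 500 <= code < 600:
        kind = "server error (something went wrong on the API's side)"
    else:
        kind = "other"
    return f"{code} {phrase}: {kind}"

for code in (200, 401, 429, 503):
    print(describe_status(code))
```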


# What is JSON
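
JSON (JavaScript Object Notation) is the text format of the response we printed above. It is built from nested objects, which become Python dictionaries, and arrays, which become Python lists. The sketch below parses a made-up payload that is only loosely shaped like a search response; the IDs, text, and counts are invented for illustration.

```python
import json

# A made-up payload loosely shaped like a search response (values are invented)
raw_text = '''
{
  "data": [
    {"id": "1", "text": "first example tweet", "lang": "en"},
    {"id": "2", "text": "second example tweet", "lang": "en"}
  ],
  "meta": {"result_count": 2}
}
'''

parsed = json.loads(raw_text)  # JSON text -> Python dicts and lists
texts = [tweet["text"] for tweet in parsed["data"]]
print(texts)
print(json.dumps(parsed["meta"], indent=2))  # Python -> indented JSON text
```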

In [None]:
count_request = "https://api.twitter.com/2/tweets/counts/recent?query=" + urllib.parse.quote(query_widget.value) + "&granularity=day"
tweet_counts = requests.get(count_request, headers=headers)
print(tweet_counts.status_code)
print(json.dumps(tweet_counts.json(), indent=2))
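
The request URL above is built by string concatenation. One alternative, sketched here with an example query, is to let `urllib.parse.urlencode` assemble and escape all of the parameters at once. The `params` dictionary name is just for illustration.

```python
import urllib.parse

base_url = "https://api.twitter.com/2/tweets/counts/recent"
params = {"query": '#twitter "elon musk" -is:retweet', "granularity": "day"}

# quote_via=quote encodes spaces as %20, the same way urllib.parse.quote does
count_request = base_url + "?" + urllib.parse.urlencode(
    params, quote_via=urllib.parse.quote)
print(count_request)
```

Characters such as `#`, `"`, and spaces have special meanings in URLs, which is why the query has to be escaped before it is sent.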

# Visualizing things

In [None]:
#%pip install plotly
#%pip install pandas
# create a requirements.txt file
import plotly.express as px
import plotly.graph_objs as go
import pandas as pd

tweet_count_df = pd.DataFrame(tweet_counts.json()["data"])
tweet_count_df.head(5)


# When did dates get so complicated
#alwaysbegoogling

In [None]:
from datetime import datetime as dt

# Parse each ISO 8601 "start" timestamp and keep only the calendar date
tweet_count_df['day'] = tweet_count_df['start'].apply(
    lambda s: dt.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ").date())
    
tweet_count_df.head(5)
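
The format string passed to `strptime` describes the timestamp piece by piece. A minimal sketch on a single made-up timestamp in the same format:

```python
from datetime import datetime, date

timestamp = "2022-10-31T00:00:00.000Z"  # made-up example in the same format

# %Y-%m-%d = date, T = literal separator, %H:%M:%S = time,
# .%f = fractional seconds, Z = literal "Z" (marks UTC)
parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
print(parsed.date())
```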

In [None]:
fig = go.Figure(data=go.Scatter(x=tweet_count_df['day'].astype(dtype=str), 
                        y=tweet_count_df['tweet_count'],
                        marker_color='indianred', text=tweet_count_df['tweet_count']))
fig.update_layout({"title": 'Recent tweets',
                   "xaxis": {"title":"Days"},
                   "yaxis": {"title":"Total tweets"},
                   "showlegend": False})
#fig.write_image("by-day.png",format="png", width=1000, height=600, scale=3)
fig.show()
# Add query to chart so we know what the search was?

In [None]:
#Create widget again

# Getting Started with Sentiment Analysis

<!--- TODO: Add more here - what is NLP? Short explanation of word tokenization
Ref: https://realpython.com/python-nltk-sentiment-analysis/#using-nltks-pre-trained-sentiment-analyzer
-->

In [None]:
#%pip install nltk
import nltk

#nltk.download(["names", "stopwords", "averaged_perceptron_tagger", "vader_lexicon","punkt"])


In [None]:
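# Collect every token from every tweet's text
words = []
json_response = response.json()
for item in json_response.get('data', []):
    words.extend(nltk.word_tokenize(item.get('text')))
# Lowercase words to drop: English stopwords plus common first names
unwanted = set(nltk.corpus.stopwords.words("english"))
unwanted.update(w.lower() for w in nltk.corpus.names.words())
# Keep alphabetic tokens whose lowercase form is not in the unwanted set
words_clean = [w for w in words if w.isalpha() and w.lower() not in unwanted]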

In [None]:
fd = nltk.FreqDist(words_clean)
print(fd.most_common(10))
fd.tabulate(5)  # tabulate() prints the table itself and returns None
# Remove the https? is that in stopwords?
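
The cleaning and counting steps above can be sketched without any downloads. This version swaps in a simple regular expression for `nltk.word_tokenize`, a tiny hand-picked stopword list for NLTK's, and `collections.Counter` for `nltk.FreqDist` (which is itself a `Counter` subclass), so its results will differ from NLTK's on real tweets. The example tweets are made up.

```python
import re
from collections import Counter

# Made-up example tweets
sample_tweets = [
    "The new update is great, I love the new design!",
    "The update broke my feed and I am not happy https://t.co/example",
]

# Tiny hand-picked stopword list; "https", "t" and "co" catch leftover link pieces
stopwords = {"the", "is", "i", "my", "and", "am", "not", "https", "t", "co"}

words = []
for text in sample_tweets:
    # Simplified tokenizer: runs of letters only, lowercased
    words.extend(re.findall(r"[a-z]+", text.lower()))

words_clean = [w for w in words if w not in stopwords]
counts = Counter(words_clean)
print(counts.most_common(2))
```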

# What is NLP, Machine Learning etc


In [None]:
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

tweets = [t['text'].replace("://", "//") for t in json_response.get('data')]

def is_positive(tweet: str) -> tuple[bool, float]:
    """Return (is_positive, compound_score) for a tweet's compound sentiment."""
    compound = sia.polarity_scores(tweet)["compound"]
    return compound > 0, compound

compound_sentiment = {'tweet': [], 'pos': [], 'comp_score': []}

for t in tweets:
    compound_sentiment['tweet'].append(t)
    is_pos, score = is_positive(t)
    compound_sentiment['pos'].append(is_pos)
    compound_sentiment['comp_score'].append(score)

compound_sentiment_df = pd.DataFrame(compound_sentiment)
compound_sentiment_df.head(5)
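
The cell above treats any compound score above 0 as positive. A common alternative, sketched here, is to leave a small neutral band around zero. The ±0.05 cut-offs follow the thresholds suggested in VADER's documentation, and `label_sentiment` is a hypothetical helper name.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to 'positive', 'neutral', or 'negative'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up scores for illustration
for score in (0.62, 0.01, -0.4):
    print(score, label_sentiment(score))
```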



In [None]:
def create_gauge():
    fig_gc = go.Figure(go.Indicator(
    mode = "gauge+number",
    value = tw_select.value,
    gauge = {'axis': {'range': [-1, 1]}, 
             'bar': {'color':'darkslategray'},
             'steps': [{'range': [-1, 0], 'color': 'lightcoral'}, {'range': [0, 1], 'color': 'lightgreen'}]},
    domain = {'x': [0,1], 'y': [0,1]},
    title = "Average sentiment"))
    # make space for explanation / annotation
    fig_gc.update_layout(margin=dict(l=20, r=20, t=20, b=60),paper_bgcolor="white")

    # add annotation
    fig_gc.add_annotation(dict(font=dict(color='darkslategray',size=15),
                                        x=0,
                                        y=-0.12,
                                        showarrow=False,
                                        text=tw_select.label,
                                        textangle=0,
                                        xanchor='left',
                                        xref="paper",
                                        yref="paper"))
    fig_gc.show()

In [None]:
mean_sentiment = compound_sentiment_df.mean(numeric_only=True)['comp_score']

tw_options = [(compound_sentiment_df.loc[i, 'tweet'], compound_sentiment_df.loc[i, 'comp_score']) for i, r in compound_sentiment_df.iterrows()]
tw_options.append(('Average Score', mean_sentiment))
tw_select = widgets.Dropdown(options=tw_options,
                             value = mean_sentiment,
                             description='Tweet:')
caption = widgets.Label(value='initial pos')

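def handle_change(change):
    # Redraw the gauge for the newly selected tweet
    create_gauge()


# Only react when the selected value changes, not on every widget trait
tw_select.observe(handle_change, names='value')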

display(tw_select)

create_gauge()
# Add garbage collection for gauge
