# Vibe Check
### Using Twitter’s API and Sentiment Analysis to Understand What’s the What on the Internet

Today's session will cover:
1.   Setting up Access to the Twitter API and Getting API Access Keys
2.   Getting Data from Twitter using the Twitter API
3.   Basic Data Operations and Data Cleaning
4.   Sentiment Analysis With Python using NLTK

<!--- TODO: Slides with: Intros - who we are, what does FN do overview, session goals/ what is an API + QR code with link to public google colab + session end slides - career guidance? (One of our learning objectives was "What jobs or internships can you search for to use the skills covered in this workshop?" -->


# Data Retrieval

## APIs
### What are APIs?

An API (Application Programming Interface) is one of the most common ways to access data programmatically. A service's API documentation tells clients what data is available and how to “ask” the API for it.

If you've ever seen tweets embedded on a webpage, those were pulled in via an API!

First, we'll import the libraries we need and set up our API authentication information:

In [None]:
import requests
import json

# Replace <insert-token-here> with your own Bearer Token before running
headers = {
    "Authorization": "Bearer <insert-token-here>",
    "User-Agent": "stem-for-her-demo"
}

search_url = "https://api.twitter.com/2/tweets/search/recent"

## Building a Query

<!--- Audience Participation here - ask for hashtags/ keyword search ideas - maybe pull up twitter trends on a screen? Live edit notebook to change search keywords-->

Ideas: Harry Styles, Elon Musk, the Taylor Swift tour, or anything else that is trending.

### Optional Fields
`tweet.fields` lets us request additional fields for each tweet. Here we add `created_at`, the time the tweet was posted.

### Query String

`-is:retweet` *excludes* any retweets
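
To see how a query string like this ends up in a URL, here is a minimal sketch using only Python's standard library. The `example_params` name is made up for illustration, and `requests` does a similar encoding step for us when we pass `params=`, so this is only for building intuition:

```python
from urllib.parse import urlencode

# Illustrative parameters, mirroring the kind of query used in this notebook
example_params = {
    "query": "#harrystyles -is:retweet",
    "tweet.fields": "created_at",
}

# urlencode percent-encodes characters that are not URL-safe:
# '#' becomes %23, ':' becomes %3A, and the space becomes '+'
encoded = urlencode(example_params)
print("https://api.twitter.com/2/tweets/search/recent?" + encoded)
```

Changing the hashtag or adding another operator only changes the `query` value; the rest of the request stays the same.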



In [None]:
query_params = {'query': '#harrystyles -is:retweet',
                'tweet.fields': 'created_at',
                'expansions': 'author_id'}

# TODO: Use stream tweets/ archive search to get more results than just 'recent'

response = requests.get(search_url, headers=headers, params=query_params)
print(response.status_code)

if response.status_code != 200:
    raise Exception(response.status_code, response.text)

json_response = response.json()
print(json.dumps(json_response, indent=4, sort_keys=True))

# TODO: Maybe add tweet counts and visualize a trend line?
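
The printed JSON can be hard to read. As a rough sketch, the cell below pairs each tweet with its author's username. It assumes the response has a `data` list of tweets and, because we asked for `expansions=author_id`, an `includes.users` list. The `sample_response` payload and the `tweets_with_authors` helper are made up for illustration; with a real response you would pass `json_response` instead.

```python
# Made-up sample shaped like a recent-search response; IDs and text are placeholders
sample_response = {
    "data": [
        {"id": "1", "author_id": "10",
         "created_at": "2022-10-01T12:00:00.000Z", "text": "Love this song!"},
        {"id": "2", "author_id": "11",
         "created_at": "2022-10-01T12:05:00.000Z", "text": "Not my favorite album."},
    ],
    "includes": {
        "users": [
            {"id": "10", "name": "Fan One", "username": "fan_one"},
            {"id": "11", "name": "Fan Two", "username": "fan_two"},
        ]
    },
    "meta": {"result_count": 2},
}


def tweets_with_authors(payload):
    """Return one small dict per tweet, with the author's username attached."""
    # Build a lookup table from user ID to username
    usernames = {
        user["id"]: user["username"]
        for user in payload.get("includes", {}).get("users", [])
    }
    rows = []
    for tweet in payload.get("data", []):
        rows.append({
            "created_at": tweet.get("created_at"),
            # Fall back to None if the author was not included in the response
            "username": usernames.get(tweet.get("author_id")),
            "text": tweet.get("text", ""),
        })
    return rows


rows = tweets_with_authors(sample_response)
for row in rows:
    print(row["created_at"], "@" + str(row["username"]), row["text"])
```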

# Getting Started with NLTK

<!--- TODO: Add more here - what is NLP? Short explanation of word tokenization
Ref: https://realpython.com/python-nltk-sentiment-analysis/#using-nltks-pre-trained-sentiment-analyzer
-->

In [None]:
%pip install nltk
import nltk

nltk.download(["names", "stopwords", "averaged_perceptron_tagger", "vader_lexicon","punkt"])


In [None]:
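words = []
# Default to an empty list so a search with no results doesn't raise an error
for item in json_response.get('data', []):
    words.extend(nltk.word_tokenize(item.get('text', '')))
# Use a set for fast lookups; the stopwords and names are all lowercase here
unwanted = set(nltk.corpus.stopwords.words("english"))
unwanted.update(w.lower() for w in nltk.corpus.names.words())
# Lowercase each word before comparing so "The" and "Harry" are filtered out too
words_clean = [w for w in words if w.isalpha() and w.lower() not in unwanted]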

In [None]:
fd = nltk.FreqDist(words_clean)
print(fd.most_common(5))
fd.tabulate(5)  # tabulate() prints the table itself and returns None

In [None]:
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

# Default to an empty list so a search with no results doesn't raise a TypeError
tweets = [t['text'].replace("://", "//") for t in json_response.get('data', [])]

def is_positive(tweet: str) -> bool:
    """True if tweet has positive compound sentiment, False otherwise."""
    return sia.polarity_scores(tweet)["compound"] > 0

for t in tweets[:20]:
    print(">", is_positive(t), t)
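
`is_positive` treats every score above 0 as positive, which lumps barely-positive tweets in with enthusiastic ones. The sketch below shows one possible refinement: sorting compound scores into positive, neutral, and negative buckets and counting each. The ±0.05 cutoff is a commonly used convention for VADER scores and is an assumption here, not something this notebook prescribes. The `label_sentiment` helper and the `example_scores` values are made up for illustration; in practice the scores would come from `sia.polarity_scores(tweet)["compound"]`.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Bucket a compound score into 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores standing in for real analyzer output
example_scores = [0.6369, 0.0, -0.4215, 0.02, -0.7]

# Count how many scores fall into each bucket
counts = Counter(label_sentiment(score) for score in example_scores)
for label in ("positive", "neutral", "negative"):
    print(label, counts[label])
```

A summary like this gives a quick "vibe check" for a whole search, rather than a true/false answer for each tweet.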