In [None]:
from IPython.display import display, HTML
display(HTML('<p style="text-align:center;"><img src="https://www.shareicon.net/data/512x512/2017/05/22/886198_twitter_512x512.png" alt="Twitter Logo"></p>')) 

# <div align='center'>Sentiment Analysis of Twitter Data</div>
## <div align='center'>The New and Improved(?) Twitter API 2.0</div>

Documentation:  
[Complete Walk-thru on my Blog](https://www.girlexmachina.com)  
[YouTube Playlist for all Parts]()  
[Twitter API](https://developer.twitter.com/en/docs/developer-portal/overview)  

## Background
As an application, Twitter has opened up access to opinions and ideas like no other form of social media.  You can follow, tag, and mention anyone with an account to allow others to hear your views and opinions.  As of late, Twitter has been used by many activists to express their concerns to members of the media, government, and corporate America.  The real question is: do companies utilize Twitter data for marketing feedback?  It's virtually free and can be a good thermometer for the feelings of the online crowd.

<mark>I think it should go without saying that this is in no way a beginner tutorial.</mark>  Maneuvering the backwaters and bayous of the official Twitter documentation can be a difficult task, even for those who are already initiated.  It's possible that this code will run if you simply have a bearer token.  If you don't know what that is, then I strongly suggest you follow the walk-thru posted to my blog, and definitely the video series.  Otherwise you could be more confused than when you started.

## Overview
In this tutorial, we will analyze tweets before and after a movie trailer release to see whether sentiment towards the movie was positive or negative.  We will then compare our findings with box office numbers to see if our algorithm is accurate!  Scraping the tweets is the most difficult task in this tutorial.  Twitter severely limits the rate at which you can pull tweets (per 15 minutes) and the total number of tweets, and also (to prevent the overwhelming of their API) forces you to code for pagination at 10 tweets per request by default (the code below requests 100 via `max_results`).  As I'm using one of my many Twitter dev accounts, I'm going to try to pull as many extended tweets as possible.  Let's get started!
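
Because the rate limit resets on a 15-minute window, a scraper usually needs to pause rather than crash when it hits the cap. Below is a minimal sketch, assuming the API reports the reset time as epoch seconds in an `x-rate-limit-reset` response header. The helper name `seconds_until_reset` is my own invention and not part of any library.

```python
import time

def seconds_until_reset(headers, now=None, default_wait=900):
    """Return how many seconds to sleep before retrying a rate-limited request.

    Assumes `headers` may carry 'x-rate-limit-reset' as epoch seconds;
    falls back to a full 15-minute window (900 s) when it is missing.
    """
    now = time.time() if now is None else now
    reset = headers.get('x-rate-limit-reset')
    if reset is None:
        return default_wait
    # Never return a negative wait if the window has already reset
    return max(0, int(float(reset) - now))
```

With `requests`, one option is to call `time.sleep(seconds_until_reset(response.headers))` whenever `response.status_code == 429`, and then retry the same request.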

In [1]:
# Standard Libraries
import os, requests, json
import pandas as pd
import tweepy as tp # just for the authentication test.  Will eventually rewrite

In [2]:
# import our passwords from our config.py file
# Note: YOU WILL ONLY NEED THE BEARER_TOKEN FOR THIS EXERCISE, BUT THE OTHER KEYS ARE USED TO TEST YOUR ACCESS
from config import twit_access_secret, twit_access_token, twit_api_key, twit_api_secret_key, twit_bearer_token

In [3]:
# create directory for saving tweets.  You never want to lose your data when working with an API, especially if you must pay for it!
os.makedirs("scraped_tweets/",exist_ok=True)

## Check Authentication

In [4]:
# This code block is from the official documentation (with a minor, personal touch!)
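# Pass the imported variables, not string literals of their names
auth = tp.OAuthHandler(twit_api_key, twit_api_secret_key)
auth.set_access_token(twit_access_token, twit_access_secret)
try:
    twitter = tp.API(auth)
    # Some tweepy versions return False on bad credentials instead of raising
    if not twitter.verify_credentials():
        raise ValueError('Credentials rejected')
    print('Giddy-up!')
except Exception:
    print('Check your settings!')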

Giddy-up!


[Convert Twitter IDs](https://tweeterid.com/)  

In [5]:
#Convert Twitter handle to user_id using site.  Will update this code in the future to do it through the API.
#@marvel => 15687962
i = 0
USER_ID = 15687962
bearer_token = twit_bearer_token
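
The manual lookup above could be replaced by a call to the v2 user-lookup endpoint. This is only a sketch and has not been run against a live account: it assumes `GET /2/users/by/username/:username` returns a JSON body shaped like `{"data": {"id": "..."}}`, and the function name `get_user_id` and its `http_get` parameter are hypothetical. Passing the HTTP function in (for example `requests.get`) keeps the helper easy to test offline.

```python
def get_user_id(username, bearer_token, http_get):
    """Look up a numeric user id from a handle; a leading '@' is stripped.

    `http_get` is any callable that accepts (url, headers=...) the way
    requests.get does and returns a response object.
    """
    url = "https://api.twitter.com/2/users/by/username/{}".format(username.lstrip('@'))
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    response = http_get(url, headers=headers)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(response.status_code, response.text)
        )
    # Assumed response shape: ids arrive as strings, so convert to int
    return int(response.json()['data']['id'])
```

In this notebook that would look like `USER_ID = get_user_id('marvel', twit_bearer_token, requests.get)`.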

## Harvesting Tweets

Prerequisites:  
- Twitter Developer account (requires approval)
- Set up a development app within your Twitter account
- App linked to Twitter Developer Labs (not available to everyone)

I am coding modularly so that the code is easily updated in the event Twitter makes further changes to the API.  
I will also save this off to a .py file so that I can simply automate the scraping of tweets!
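
One way to keep the request logic modular is to build the URL, headers, and query parameters in a single function, so an API change only has to be fixed in one place. The sketch below is an assumption about how that could look: `build_mentions_request` is a hypothetical name, and the field lists are a trimmed subset of the ones this notebook uses.

```python
MENTIONS_URL = "https://api.twitter.com/2/users/{}/mentions"

def build_mentions_request(user_id, bearer_token, pagination_token=None, max_results=100):
    """Return (url, headers, params) for one page of a user's mentions."""
    url = MENTIONS_URL.format(user_id)
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    params = {
        'max_results': max_results,
        'tweet.fields': 'created_at,author_id,public_metrics',
        'user.fields': 'location,profile_image_url,verified',
    }
    # Only send a pagination token when asking for a follow-up page
    if pagination_token is not None:
        params['pagination_token'] = pagination_token
    return url, headers, params
```

A call such as `url, headers, params = build_mentions_request(USER_ID, twit_bearer_token)` can then feed straight into `requests.get(url, headers=headers, params=params)`.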

In [6]:
user_id = USER_ID
url = "https://api.twitter.com/2/users/{}/mentions".format(user_id)

EXPANSIONS = 'author_id,referenced_tweets.id,referenced_tweets.id.author_id,in_reply_to_user_id,attachments.media_keys,attachments.poll_ids,geo.place_id,entities.mentions.username'
MEDIA_FIELDS = 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics'
TWEET_FIELDS = 'created_at,author_id,public_metrics'
USER_FIELDS = 'location,profile_image_url,verified'
params =  {'max_results':100,'expansions': EXPANSIONS,'tweet.fields': TWEET_FIELDS,'user.fields': USER_FIELDS,'media.fields': MEDIA_FIELDS}

headers = {"Authorization": "Bearer {}".format(bearer_token)}

In [6]:
response = requests.request("GET", url, headers=headers, params=params)
print(response.status_code)
if response.status_code != 200:
    raise Exception(
        "Request returned an error: {} {}".format(
            response.status_code, response.text
        )
    )
json_response = response.json()
df_tweets = pd.DataFrame(json_response['data'])
df_meta = json_response['meta']
df_tweets.to_csv(f'./scraped_tweets/marvel{i:02d}.csv')

200
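
Each tweet's `public_metrics` arrives as a nested dictionary, so writing the raw frame to CSV stores it as a string. A possible alternative, sketched here with made-up sample records rather than real API output, is `pd.json_normalize`, which expands nested keys into dotted column names.

```python
import pandas as pd

# Illustrative records only; real responses carry more fields
sample_data = [
    {'id': '1', 'text': 'Loved the trailer!',
     'public_metrics': {'retweet_count': 2, 'like_count': 5}},
    {'id': '2', 'text': 'Not convinced.',
     'public_metrics': {'retweet_count': 0, 'like_count': 1}},
]

# Nested dicts become columns such as 'public_metrics.like_count'
df_flat = pd.json_normalize(sample_data)
print(sorted(df_flat.columns))
```

Swapping `pd.DataFrame(json_response['data'])` for `pd.json_normalize(json_response['data'])` would be the corresponding change in this notebook, if flat columns suit your later analysis.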


In [7]:
params

{'max_results': 100,
 'expansions': 'author_id,referenced_tweets.id,referenced_tweets.id.author_id,in_reply_to_user_id,attachments.media_keys,attachments.poll_ids,geo.place_id,entities.mentions.username',
 'tweet.fields': 'created_at,author_id,public_metrics',
 'user.fields': 'location,profile_image_url,verified',
 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics'}

In [8]:
url = "https://api.twitter.com/2/users/{}/mentions".format(user_id)
# Start at 1 so the first page (marvel00.csv, saved above) is not overwritten
for i in range(1, 21):
    # 'next_token' is absent from 'meta' once the last page is reached
    token = json_response['meta'].get('next_token')
    if token is None:
        print('No more pages!')
        break
    params['pagination_token'] = token
    response = requests.request("GET", url, headers=headers, params=params)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    json_response = response.json()
    # 'data' is missing when a page contains no tweets
    df_tweets = pd.DataFrame(json_response.get('data', []))
    df_meta = json_response['meta']
    df_tweets.to_csv(f'./scraped_tweets/marvel{i:02d}.csv')
    print(f'Round {i} saved!')
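
The loop above hard-codes the number of rounds. A more reusable pattern, sketched here under the assumption that each response carries `meta.next_token` until the last page, is a generator that keeps requesting pages until the token disappears. `iter_pages` and its `fetch_page` argument are hypothetical names; `fetch_page` stands in for whatever function performs the actual HTTP request.

```python
def iter_pages(fetch_page, max_pages=20):
    """Yield successive JSON pages until 'next_token' runs out or max_pages is hit.

    `fetch_page(token)` must return the decoded JSON of one request;
    it receives None for the first page and the pagination token afterwards.
    """
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield page
        # Stop as soon as the API no longer offers another page
        token = page.get('meta', {}).get('next_token')
        if token is None:
            break
```

Each yielded page can then be turned into a DataFrame and saved to `scraped_tweets/` in the same way as in the loop above.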