# Pulling Tweets from Twitter

This is the notebook I use to collect Twitter data. It first reads in the S&P 500 ticker symbols and company names, then builds one search query per company from them. Each query is sent to the Twitter search API, requesting up to 10,000 tweets per company for each day of the past week (the search below asks for `result_type="popular"`, so the results are popular tweets rather than strictly the most recent ones). API keys are required but are not supplied here. Once the pull is complete, the results are saved to a .csv file.

#### Import necessary packages

In [2]:
import os
import tweepy as tw
import pandas as pd
from datetime import date
from datetime import timedelta

#### Read in Twitter API keys and authorize

In [2]:
# insert API keys here (consumer_key, consumer_secret, access_token, access_token_secret)
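
One way to keep the keys out of the notebook is to read them from environment variables. The following is a minimal sketch under that assumption; the environment variable names and the `load_credentials` helper are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; use whatever names you export in your shell.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary can then be unpacked into the four names used by the authorization cell, for example `consumer_key = load_credentials()["consumer_key"]`.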


In [3]:
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# in Tweepy 3.x the rate-limit options are set on the API object
api = tw.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

#### Tweet Sources - get S&P 500 ticker symbols and names

In [8]:
# ticker symbols and company names
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
sAndp = pd.read_html(url)[0][['Symbol', 'Security']]

#### Get the dates for the last week

In [9]:
# dates, newest first (the standard Twitter search only gives access to roughly the last week of tweets)
dates = [(date.today() - timedelta(days=i)) for i in range(0, 10)]
search_words = sAndp['Symbol'] + " " + "OR '" + sAndp['Security'] + "' -filter:retweets"
num_tweets = 10000
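
To make the search inputs concrete, here is a small standalone sketch of how one query string and the one-day date windows come together. The helper names `build_query` and `daily_windows`, the example date, and the sample company are illustrative assumptions; the query format mirrors the expression above. Since the `until` parameter of the standard search returns tweets created before the given date, each `(since, until)` pair should cover a single day.

```python
from datetime import date, timedelta


def build_query(symbol, security):
    """Mirror the notebook's query string for a single company."""
    return symbol + " " + "OR '" + security + "' -filter:retweets"


def daily_windows(today, n_dates=10):
    """Pair each date with the one before it, newest first, as (since, until)."""
    dates = [today - timedelta(days=i) for i in range(n_dates)]
    return [(dates[j + 1], dates[j]) for j in range(n_dates - 1)]


# Illustrative inputs only: a fixed date and one sample company.
print(build_query("MMM", "3M"))
for since, until in daily_windows(date(2021, 3, 10), n_dates=4):
    print(since, until)
```

With `n_dates=10`, as in the cell above, this yields nine windows, which matches the nine iterations of the inner loop in the next section.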

#### Create a data frame from pulled tweets

In [10]:
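tweet_agg = pd.DataFrame()

for query in search_words:
    # dates runs from newest to oldest, so neighbouring dates form a one-day window
    for j in range(len(dates) - 1):

        until_date = dates[j]      # upper bound of the window (newer date)
        since_date = dates[j + 1]  # lower bound of the window (older date)

        # rate-limit handling is configured on the API object, not per search call
        tweets = tw.Cursor(api.search,
                           q=query,
                           lang="en",
                           until=until_date,
                           since=since_date,
                           result_type="popular").items(num_tweets)

        # columns 0-5: text, screen name, user location, created_at, retweet count, query
        tweet_info = pd.DataFrame([[tweet.text,
                                    tweet.user.screen_name,
                                    tweet.user.location,
                                    tweet.created_at,
                                    tweet.retweet_count,
                                    query] for tweet in tweets])

        tweet_agg = pd.concat([tweet_agg, tweet_info])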
        

Rate limit reached. Sleeping for: 859
Rate limit reached. Sleeping for: 862
Rate limit reached. Sleeping for: 859
...
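
The sleep times in this output (roughly 860 to 875 seconds) are consistent with a 15-minute rate-limit window. As a rough back-of-the-envelope sketch, the minimum waiting time for a full pull can be estimated. This assumes the standard search limit of 180 requests per 15 minutes under user authentication and at least one request per query and date window; the function name and the figure of 500 companies are illustrative assumptions, not values taken from the run above.

```python
import math


def estimate_rate_limit_waits(n_queries, n_windows,
                              requests_per_window=180, window_minutes=15):
    """Lower-bound estimate of requests, sleeps, and minutes spent sleeping."""
    # Every (query, date window) pair costs at least one search request.
    total_requests = n_queries * n_windows
    # Requests are served in batches of `requests_per_window`.
    batches = math.ceil(total_requests / requests_per_window)
    # A sleep is needed between batches, but not after the last one.
    waits = max(batches - 1, 0)
    return total_requests, waits, waits * window_minutes


requests, waits, minutes = estimate_rate_limit_waits(n_queries=500, n_windows=9)
print(requests, waits, minutes)
```

Under these assumptions the pull spends several hours sleeping, so it is worth running it unattended and saving the results as soon as it finishes.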

In [11]:
# column 3 holds created_at; convert the timestamps to strings before saving
tweet_agg[3] = tweet_agg[3].astype(str)

#### Save the data frame to a .csv file.

In [12]:
# TODO: check whether the follower count of each tweet's author could be pulled as well
tweet_agg.to_csv('tweetData3.csv', index=False)
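
On the follower-count question in the cell above: the user object attached to each tweet carries a `followers_count` attribute, so it could be collected alongside the other fields. Below is a minimal sketch, assuming a hypothetical `tweet_to_row` helper and hypothetical column names; a `SimpleNamespace` stands in for a real tweet so the example runs without API access.

```python
from types import SimpleNamespace


def tweet_to_row(tweet, query):
    """Collect the notebook's fields, plus the author's follower count, by name."""
    return {
        "text": tweet.text,
        "screen_name": tweet.user.screen_name,
        "location": tweet.user.location,
        "created_at": str(tweet.created_at),
        "retweet_count": tweet.retweet_count,
        "followers_count": tweet.user.followers_count,
        "query": query,
    }


# Stand-in object with made-up values, shaped like the attributes used above.
fake_tweet = SimpleNamespace(
    text="example tweet",
    created_at="2021-03-09 12:00:00",
    retweet_count=3,
    user=SimpleNamespace(screen_name="someone", location="NYC", followers_count=42),
)

row = tweet_to_row(fake_tweet, "MMM OR '3M' -filter:retweets")
print(row["followers_count"])
```

A list of such dictionaries can be passed directly to `pd.DataFrame`, which would also give the saved .csv file named columns instead of the positional ones used above.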