## Using the official Twitter API

**Notes**   
Great resource
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/ipynb/Chapter%201%20-%20Mining%20Twitter.ipynb

Go to the Twitter developer site to get an API key.
https://dev.twitter.com/apps

You must add a phone number to your Twitter account.

In [14]:
import twitter

# Keep credentials in a separate module and never commit it to GitHub.
# Do not name the folder "twitter": it would shadow the `twitter` library on import.
from twitter_helpers.API_KEYS_TWITTER import (
    CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
)

# OAuth takes the access token pair first, then the consumer pair.
auth = twitter.oauth.OAuth(ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)
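
One way to keep credentials out of the repository entirely is to read them from environment variables. The sketch below is only an illustration: the variable names and the `load_twitter_credentials` helper are hypothetical, not part of the `twitter` library.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
REQUIRED_VARS = (
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials in the order used in the OAuth call above."""
    env = os.environ if env is None else env
    # Fail early with a readable message if anything is unset or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

With the variables set, `twitter.oauth.OAuth(*load_twitter_credentials())` would stand in for the import from `API_KEYS_TWITTER`.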

Obtaining a search result is a bit more complicated than simply requesting n=1000 tweets. Some issues:

1. We need to parse the information returned, since it doesn't arrive as a neat DataFrame ready for analysis.

2. We need to make multiple requests to collect enough tweets for analysis. The Twitter API allows 100 tweets per request and 180 requests per 15 minutes, which gives us at most 18,000 tweets per 15 minutes.

3. We need to make sure we're not getting duplicate tweets. Since the set of matching tweets keeps changing between requests, we track tweet IDs. For the guide I used, see: https://dev.twitter.com/rest/public/timelines
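
To make issues 2 and 3 concrete, here is a minimal sketch of the request arithmetic and of ID-based de-duplication. It is not tied to the real API: `fetch_page` is a hypothetical callable that takes a `max_id` (or `None`) and returns a list of status dicts, so the logic can be tried offline.

```python
import math

TWEETS_PER_REQUEST = 100   # per-request cap quoted above
REQUESTS_PER_WINDOW = 180  # requests per 15-minute window, as quoted above


def requests_needed(n_tweets, per_request=TWEETS_PER_REQUEST):
    """Number of requests required to collect n_tweets."""
    return math.ceil(n_tweets / per_request)


def collect_unique(fetch_page, n_tweets, per_request=TWEETS_PER_REQUEST):
    """Collect up to n_tweets statuses, skipping IDs already seen.

    fetch_page(max_id) is a hypothetical callable returning status dicts
    with IDs <= max_id (or the newest tweets when max_id is None).
    """
    seen_ids = set()
    collected = []
    max_id = None
    for _ in range(requests_needed(n_tweets, per_request)):
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        for status in page:
            if status["id"] not in seen_ids:
                seen_ids.add(status["id"])
                collected.append(status)
        # Next time, ask only for tweets strictly older than the oldest seen.
        max_id = min(status["id"] for status in page) - 1
    return collected[:n_tweets]
```

In practice `fetch_page` could wrap `twitter_api.search.tweets(q=SEARCH, count=100, max_id=...)` and return its `statuses` list; that wiring is an assumption and is left out here.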

### Let's start by grabbing the first result.

In [15]:
# https://dev.twitter.com/rest/reference/get/search/tweets
# general queries: https://dev.twitter.com/rest/public/search

SEARCH = "Food"  # note that no tweets older than 1 week will be found
search_results = twitter_api.search.tweets(q=SEARCH, count=100)

Now let's investigate the structure of this object.

In [16]:
type(search_results)

twitter.api.TwitterDictResponse

In [17]:
search_results.keys()

dict_keys(['statuses', 'search_metadata'])

Since we want to look at the actual text of the tweets, let's grab the values instead.

In [18]:
results = search_results.values()
results = list(results)  # convert the dict view object to an indexable list

Check out the results (too long to print here).

In [19]:
# results

Looks like we have the results from 100 tweets. Now we can extract the text. Some experimentation gives us the structure of the `results` list.

`results[0][tweet #][information type]` holds the tweets (the `statuses`), while `results[1]` holds the search metadata.
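
Positional access works, but it depends on the order of the keys in the response. A sketch of the same lookup by key, using a small made-up stand-in for the response (only the two top-level key names are taken from the output above; the values are invented):

```python
# A tiny stand-in for the real response, with the same two top-level keys.
mock_response = {
    "statuses": [
        {"id": 1, "text": "first tweet"},
        {"id": 2, "text": "second tweet"},
    ],
    "search_metadata": {"count": 100, "query": "Food"},
}

statuses = mock_response["statuses"]         # same data as results[0]
metadata = mock_response["search_metadata"]  # same data as results[1]

# Text of the first tweet, equivalent to results[0][0]['text']
first_text = statuses[0]["text"]
```

Against the real object, `search_results['statuses']` should give the same list as `results[0]`.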

**Let's extract the text and turn it into a DataFrame.**

In [20]:
# Figure out what keys are available in the Python dictionary
results[0][0].keys()

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])

In [21]:
results[0][0]['text']

"There's a Twist to Rey's Survival in 'The Force Awakens'\n\nThe food rations Rey ate to survive were the direct r… https://t.co/VZxc30s70J"

In [22]:
results[0]

[{'contributors': None,
  'coordinates': None,
  'created_at': 'Mon Mar 06 06:12:06 +0000 2017',
  'entities': {'hashtags': [],
   'symbols': [],
   'urls': [{'display_url': 'twitter.com/i/web/status/8…',
     'expanded_url': 'https://twitter.com/i/web/status/838633268455096320',
     'indices': [113, 136],
     'url': 'https://t.co/VZxc30s70J'}],
   'user_mentions': []},
  'favorite_count': 0,
  'favorited': False,
  'geo': None,
  'id': 838633268455096320,
  'id_str': '838633268455096320',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'is_quote_status': False,
  'lang': 'en',
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'place': None,
  'possibly_sensitive': False,
  'retweet_count': 0,
  'retweeted': False,
  'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  'text': "There's a Twist to Rey's Survival i
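
The `created_at` field in the output above is a plain string. A small sketch of turning it into a timezone-aware `datetime` with the standard library follows; the format string is inferred from the single sample value shown, and it assumes an English locale for the day and month abbreviations.

```python
from datetime import datetime

# Format inferred from 'Mon Mar 06 06:12:06 +0000 2017'
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at):
    """Parse a Twitter created_at string into a timezone-aware datetime."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT)


parsed = parse_created_at("Mon Mar 06 06:12:06 +0000 2017")
```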

In [23]:
# Iterate over whatever came back instead of assuming exactly 100 results
tweets = [status['text'] for status in results[0]]
tweets[0:2]  # take a look at the first 2 tweets

["There's a Twist to Rey's Survival in 'The Force Awakens'\n\nThe food rations Rey ate to survive were the direct r… https://t.co/VZxc30s70J",
 'RT @maryjanep_: always blowing my money on food🙂']

In [24]:
import pandas as pd
dat = pd.DataFrame({'tweet_text': tweets})
dat.head()

| | tweet_text |
|---|---|
| 0 | There's a Twist to Rey's Survival in 'The Forc... |
| 1 | RT @maryjanep_: always blowing my money on food🙂 |
| 2 | Bedford Stuyvesant #Restoration Corporation Pr... |
| 3 | Awesome Indian Food |
| 4 | Surround yourself with people who are willing ... |
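
If more than the text is wanted, each status can be flattened into a small record first. The sketch below uses made-up sample statuses and a hypothetical `flatten_status` helper; the field names come from the key listing earlier, and the `RT @` prefix check is only a rough heuristic for spotting retweets.

```python
def flatten_status(status):
    """Pick a few top-level fields from one status dict."""
    return {
        "id": status["id"],
        "created_at": status.get("created_at"),
        "text": status["text"],
        "lang": status.get("lang"),
        "retweet_count": status.get("retweet_count", 0),
        # Heuristic only: retweets in the sample output start with "RT @"
        "is_retweet": status["text"].startswith("RT @"),
    }


# Made-up statuses for illustration; real ones carry many more fields.
sample_statuses = [
    {"id": 101, "text": "Awesome Indian Food", "lang": "en", "retweet_count": 0},
    {"id": 102, "text": "RT @someone: food again", "lang": "en", "retweet_count": 3},
]

rows = [flatten_status(status) for status in sample_statuses]
```

Passing `rows` to `pd.DataFrame(rows)` would give one column per field, and keeping `id` makes it easy to drop duplicates later.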


Awesome. We fetched and processed our data. Now let's grab a lot more tweets. 1,000 tweets (10 requests of 100 each) should be enough to play with.