# Why Analyze Twitter Data?

There are many reasons you may want to analyze Twitter data. Which of these is NOT a data science task that analyzing Twitter data is suited for?

### Possible Answers


    Analyzing the mentions of each political party in an election.
    
    
    Detecting the reactions to the introduction of a new product.
    
    
    Understanding the geographical scope of discussion of a news story.
    
    
    Uncovering the motives of Twitter users following a hashtag. {Answer}

# Uses of Twitter analysis

You've been asked to identify the success (or failure) of a particular product. Which Twitter analysis strategy would best accomplish this?

### Possible Answers


    Collect mentions of the product and identify if people are talking about it positively.
    
    
    Examine the size of the retweet network mentioning the product.
    
    
    Analyze the geographical penetration of users mentioning the product.
    
    
    All of the above. {Answer}

# Twitter APIs

True or False: I could collect data from last year based on keyword searches with the Streaming API.

### Possible Answers


    True: The Streaming API allows historical data collection on keywords, user IDs, and locations.
    
    
    False: The Streaming API only allows real-time data collection on ads.
    
    
    False: The Streaming API only allows real-time data collection on keywords, user IDs, and locations. {Answer}
    
    
    False: The Streaming API only allows access from the past week.

In [13]:
access_token, access_token_secret, consumer_key, consumer_secret = '', '', '', ''
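
Leaving the four credentials as empty strings keeps secrets out of the notebook, but the authentication step needs real values. Below is a minimal sketch, assuming the keys are exported as environment variables. The variable names (`TWITTER_ACCESS_TOKEN` and so on) and the helpers `load_credentials` and `missing_credentials` are hypothetical, not something tweepy or Twitter defines.

```python
import os

# Hypothetical environment variable names; use whatever your own setup defines.
ENV_NAMES = {
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, defaulting to '' when unset."""
    return {name: environ.get(env_name, "") for name, env_name in ENV_NAMES.items()}


def missing_credentials(creds):
    """List the credential names that are still empty, in alphabetical order."""
    return sorted(name for name, value in creds.items() if not value)


creds = load_credentials()
missing = missing_credentials(creds)
if missing:
    print("Still unset:", ", ".join(missing))
```

Checking for empty values up front gives a clearer message than a failed authentication request later on.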

In [14]:
# exercise 01

"""
Setting up tweepy authentication

In the video, we saw how tweepy can be used to collect Twitter data with the Streaming API. tweepy requires a Twitter API key to authenticate with Twitter.

In this exercise, you will load several objects from tweepy and set up the authentication for the package.

The API keys access_token, access_token_secret, consumer_key, and consumer_secret have already been defined for you.
"""

# Instructions

"""

    Import OAuthHandler and API from the tweepy module.

    Pass consumer_key and consumer_secret to OAuthHandler.

    Set the access tokens with access_token and access_token_secret.

    Pass the auth object to the API.

"""

# solution

from tweepy import OAuthHandler
from tweepy import API

# Consumer key authentication
auth = OAuthHandler(consumer_key, consumer_secret)

# Access key authentication
auth.set_access_token(access_token, access_token_secret)

# Set up the API with the authentication handler
api = API(auth)

#----------------------------------#

# Conclusion

"""
Great! You are now authenticated.
"""

ModuleNotFoundError: No module named 'tweepy'

In [None]:
# exercise 02

"""
Collecting data on keywords

Now that we've set up the authentication, we can begin to collect Twitter data. Recall that with the Streaming API, we collect real-time Twitter data either as a sample or filtered by keyword.

In our example, we will collect data on any tweet mentioning #rstats or #python in the tweet text, username, or user description with the filter endpoint.

The SListener module has already been defined and imported for you.
"""

# Instructions

"""

    Import Stream from tweepy.

    Set keywords_to_track to a list containing #rstats and #python.

    Pass the auth and listen objects to Stream.

    Set the keyword argument track equal to keywords_to_track.

"""

# solution

from tweepy import Stream

# Set up words to track
keywords_to_track = ["#rstats", "#python"]

# Instantiate the SListener object 
listen = SListener(api)

# Instantiate the Stream object with the auth handler and the listener
# (auth already carries the access token set earlier)
stream = Stream(auth, listen)

# Begin collecting data
stream.filter(track = keywords_to_track)

#----------------------------------#

# Conclusion

"""
Good job! You are now collecting tweets.
"""

'\n\n'
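
The exercise treats `SListener` as given. To make its role concrete, here is a tweepy-free sketch of the kind of work a stream listener does: receive each raw JSON string, keep the ones that look like tweets, and signal when to stop. The class `TweetCollector`, its `on_data` method, and the `limit` parameter are illustrative assumptions, not the course's `SListener` implementation, and the payloads are made up.

```python
import json


class TweetCollector:
    """Hypothetical stand-in for a stream listener (not the course's SListener)."""

    def __init__(self, limit=100):
        self.limit = limit
        self.tweets = []

    def on_data(self, raw):
        """Parse one raw JSON payload; return False once the limit is reached."""
        payload = json.loads(raw)
        # Assumption: anything without a 'text' key is a non-tweet message.
        if "text" in payload:
            self.tweets.append(payload)
        return len(self.tweets) < self.limit


collector = TweetCollector(limit=2)

# Made-up payloads standing in for what a stream would deliver.
fake_stream = [
    '{"id": 1, "text": "Learning #python"}',
    '{"limit": {"track": 5}}',
    '{"id": 2, "text": "Plotting in #rstats"}',
    '{"id": 3, "text": "Never reached"}',
]

for raw in fake_stream:
    if not collector.on_data(raw):
        break

collected_ids = [t["id"] for t in collector.tweets]
print(collected_ids)
```

Returning a boolean from the handler is one simple way to let the collection loop stop itself after a fixed number of tweets.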

In [15]:
tweet_json = '''{
  "created_at": "Thu Apr 19 14:25:04 +0000 2018",
  "id": 986973961295720449,
  "id_str": "986973961295720449",
  "text": "Writing out the script of my @DataCamp class and I can't help but mentally read it back to myself in @hugobowne's voice.",
  "truncated": false,
  "entities": {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
      {
        "screen_name": "DataCamp",
        "name": "DataCamp",
        "id": 1568606814,
        "id_str": "1568606814",
        "indices": [29, 38]
      },
      {
        "screen_name": "hugobowne",
        "name": "Hugo Bowne-Anderson",
        "id": 1092509048,
        "id_str": "1092509048",
        "indices": [101, 111]
      }
    ],
    "urls": []
  },
  "metadata": {
    "iso_language_code": "en",
    "result_type": "recent"
  },
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
    "id": 661613,
    "id_str": "661613",
    "name": "Alex Hanna, Data Witch",
    "screen_name": "alexhanna",
    "location": "Toronto, ON",
    "description": "Assistant professor @UofT. Protest, media, computation. Trans. Roller derby athlete @TOROLLERDERBY (Kate Silver #538). She/her.",
    "url": "https://t.co/WGddk8Cc6v",
    "entities": {
      "url": {
        "urls": [
          {
            "url": "https://t.co/WGddk8Cc6v",
            "expanded_url": "http://alex-hanna.com",
            "display_url": "alex-hanna.com",
            "indices": [0, 23]
          }
        ]
      },
      "description": {
        "urls": []
      }
    },
    "protected": false,
    "followers_count": 4267,
    "friends_count": 2801,
    "listed_count": 246,
    "created_at": "Thu Jan 18 20:37:52 +0000 2007",
    "favourites_count": 23387,
    "utc_offset": -14400,
    "time_zone": "Eastern Time (US & Canada)",
    "geo_enabled": true,
    "verified": false,
    "statuses_count": 71840,
    "lang": "en",
    "contributors_enabled": false,
    "is_translator": false,
    "is_translation_enabled": false,
    "profile_background_color": "000000",
    "profile_background_image_url": "http://abs.twimg.com/images/themes/theme16/bg.gif",
    "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme16/bg.gif",
    "profile_background_tile": false,
    "profile_image_url": "http://pbs.twimg.com/profile_images/980799823900180483/J9CDOX_X_normal.jpg",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/980799823900180483/J9CDOX_X_normal.jpg",
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/661613/1514976085",
    "profile_link_color": "0671B8",
    "profile_sidebar_border_color": "666666",
    "profile_sidebar_fill_color": "CCCCCC",
    "profile_text_color": "333333",
    "profile_use_background_image": false,
    "has_extended_profile": false,
    "default_profile": false,
    "default_profile_image": false,
    "following": false,
    "follow_request_sent": false,
    "notifications": false,
    "translator_type": "regular"
  },
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "is_quote_status": false,
  "retweet_count": 0,
  "favorite_count": 1,
  "favorited": false,
  "retweeted": false,
  "lang": "en"
}'''
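
Each entry in `entities.user_mentions` carries an `indices` pair giving the start and end offsets of the mention within `text`. The sketch below slices the text with those offsets, using a trimmed copy of the tweet above. The helper name `mention_spans` is hypothetical.

```python
# Trimmed copy of the tweet above: only the fields this example needs.
tweet_excerpt = {
    "text": "Writing out the script of my @DataCamp class and I can't help "
            "but mentally read it back to myself in @hugobowne's voice.",
    "entities": {
        "user_mentions": [
            {"screen_name": "DataCamp", "indices": [29, 38]},
            {"screen_name": "hugobowne", "indices": [101, 111]},
        ]
    },
}


def mention_spans(tweet):
    """Return the substring of the tweet text covered by each user mention."""
    text = tweet["text"]
    spans = []
    for mention in tweet["entities"]["user_mentions"]:
        start, end = mention["indices"]
        spans.append(text[start:end])
    return spans


print(mention_spans(tweet_excerpt))
```

The slices include the leading `@`, which is why each span is one character longer than the corresponding `screen_name`.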

In [16]:
# exercise 03

"""
Loading and accessing tweets

In the video, we loaded a tweet we collected using tweepy into Python. Tweets arrive from the Streaming API in JSON format and need to be converted into a Python data structure.

In this exercise, we'll load a single tweet into Python and print out some fields.

The tweet JSON has been loaded for you and is stored in tweet_json.
"""

# Instructions

"""

    Import the json module.

    Convert the tweet JSON stored in tweet_json into a Python object using json's .loads() method.

    Print the tweet text and id using the appropriate keys.

"""

# solution

# Load JSON
import json

# Convert from JSON to Python object
tweet = json.loads(tweet_json)

# Print tweet text
print(tweet['text'])

# Print tweet id
print(tweet['id'])

#----------------------------------#

# Conclusion

"""

"""

Writing out the script of my @DataCamp class and I can't help but mentally read it back to myself in @hugobowne's voice.
986973961295720449


'\n\n'
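
After `json.loads`, the `created_at` field is still a plain string, while JSON `null` and `false` become Python `None` and `False`. Here is a minimal sketch of converting the timestamp with the standard library. The format string is an assumption read off the sample value in the tweet above, and the day and month abbreviations require an English locale.

```python
import json
from datetime import datetime

# A small JSON document reusing three fields from the tweet above.
sample = '{"created_at": "Thu Apr 19 14:25:04 +0000 2018", "place": null, "truncated": false}'
record = json.loads(sample)

# Assumed format, inferred from the sample timestamp.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
created = datetime.strptime(record["created_at"], CREATED_AT_FORMAT)

print(created.isoformat())
print(record["place"] is None, record["truncated"] is False)
```

Parsing into a timezone-aware `datetime` makes it possible to sort or bucket tweets by time rather than comparing strings.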

In [18]:
# exercise 04

"""
Accessing user data

Much of the information we want about a tweet is stored in child JSON objects. We will access several parts of the user's information through the user child JSON object.

The tweet from the previous exercise has been loaded for you.
"""

# Instructions

"""

    Print the user handle with key screen_name.
    Print the user follower count with key followers_count.
    Print the user self-defined location with key location.
    Print the user self-defined description with key description.

"""

# solution

# Print user handle
print(tweet['user']['screen_name'])

# Print user follower count
print(tweet['user']['followers_count'])

# Print user location
print(tweet['user']['location'])

# Print user description
print(tweet['user']['description'])

#----------------------------------#

# Conclusion

"""
Excellent!
"""

alexhanna
4267
Toronto, ON
Assistant professor @UofT. Protest, media, computation. Trans. Roller derby athlete @TOROLLERDERBY (Kate Silver #538). She/her.


'\nExcellent!\n'
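
Chained indexing such as `tweet['user']['location']` works when every level exists, but fields like `place` are `null` in this tweet, so indexing into them raises a `TypeError`. Below is a small sketch of a more tolerant lookup. The helper `dig` is hypothetical, the dictionary is a trimmed copy of the tweet above, and the `full_name` key is used only as an example of a field that cannot be reached here.

```python
def dig(obj, *keys, default=None):
    """Follow keys through nested dicts; return default if a level is missing or None."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Trimmed copy of the tweet above.
tweet_excerpt = {
    "user": {"screen_name": "alexhanna", "followers_count": 4267, "location": "Toronto, ON"},
    "place": None,
}

print(dig(tweet_excerpt, "user", "location"))
print(dig(tweet_excerpt, "place", "full_name", default="unknown"))
```

This trades a loud error for a default value, which suits bulk processing where many tweets lack optional fields.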

In [25]:
rt = {'created_at': 'Thu Apr 19 12:45:59 +0000 2018',
 'id': 986949027123154944,
 'id_str': '986949027123154944',
 'text': "RT @hannawallach: ICYMI: NIPS/ICML/ICLR are looking for a full-time programmer to run the conferences' submission/review processes. More in…",
 'truncated': False,
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [{'screen_name': 'hannawallach',
    'name': 'Hanna Wallach',
    'id': 823957466,
    'id_str': '823957466',
    'indices': [3, 16]}],
  'urls': []},
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 661613,
  'id_str': '661613',
  'name': 'Alex Hanna, Data Witch',
  'screen_name': 'alexhanna',
  'location': 'Toronto, ON',
  'description': 'Assistant professor @UofT. Protest, media, computation. Trans. Roller derby athlete @TOROLLERDERBY (Kate Silver #538). She/her.',
  'url': 'https://t.co/WGddk8Cc6v',
  'entities': {'url': {'urls': [{'url': 'https://t.co/WGddk8Cc6v',
      'expanded_url': 'http://alex-hanna.com',
      'display_url': 'alex-hanna.com',
      'indices': [0, 23]}]},
   'description': {'urls': []}},
  'protected': False,
  'followers_count': 4267,
  'friends_count': 2801,
  'listed_count': 246,
  'created_at': 'Thu Jan 18 20:37:52 +0000 2007',
  'favourites_count': 23387,
  'utc_offset': -14400,
  'time_zone': 'Eastern Time (US & Canada)',
  'geo_enabled': True,
  'verified': False,
  'statuses_count': 71840,
  'lang': 'en',
  'contributors_enabled': False,
  'is_translator': False,
  'is_translation_enabled': False,
  'profile_background_color': '000000',
  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme16/bg.gif',
  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme16/bg.gif',
  'profile_background_tile': False,
  'profile_image_url': 'http://pbs.twimg.com/profile_images/980799823900180483/J9CDOX_X_normal.jpg',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/980799823900180483/J9CDOX_X_normal.jpg',
  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/661613/1514976085',
  'profile_link_color': '0671B8',
  'profile_sidebar_border_color': '666666',
  'profile_sidebar_fill_color': 'CCCCCC',
  'profile_text_color': '333333',
  'profile_use_background_image': False,
  'has_extended_profile': False,
  'default_profile': False,
  'default_profile_image': False,
  'following': False,
  'follow_request_sent': False,
  'notifications': False,
  'translator_type': 'regular'},
 'geo': None,
 'coordinates': None,
 'place': None,
 'contributors': None,
 'retweeted_status': {'created_at': 'Tue Mar 06 23:50:35 +0000 2018',
  'id': 971171213216239616,
  'id_str': '971171213216239616',
  'text': "ICYMI: NIPS/ICML/ICLR are looking for a full-time programmer to run the conferences' submission/review processes. M… https://t.co/aB9Y5tTyHT",
  'truncated': True,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [],
   'urls': [{'url': 'https://t.co/aB9Y5tTyHT',
     'expanded_url': 'https://twitter.com/i/web/status/971171213216239616',
     'display_url': 'twitter.com/i/web/status/9…',
     'indices': [117, 140]}]},
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 823957466,
   'id_str': '823957466',
   'name': 'Hanna Wallach',
   'screen_name': 'hannawallach',
   'location': 'Brooklyn, NY',
   'description': 'MSR NYC. Machine learning, computational social science, fairness/accountability/transparency in ML. NIPS 2018 program chair, WiML co-founder, sloth enthusiast.',
   'url': 'https://t.co/hrcIziHrkf',
   'entities': {'url': {'urls': [{'url': 'https://t.co/hrcIziHrkf',
       'expanded_url': 'http://dirichlet.net/',
       'display_url': 'dirichlet.net',
       'indices': [0, 23]}]},
    'description': {'urls': []}},
   'protected': False,
   'followers_count': 10614,
   'friends_count': 865,
   'listed_count': 499,
   'created_at': 'Fri Sep 14 20:38:24 +0000 2012',
   'favourites_count': 3507,
   'utc_offset': -14400,
   'time_zone': 'Eastern Time (US & Canada)',
   'geo_enabled': False,
   'verified': False,
   'statuses_count': 1505,
   'lang': 'en',
   'contributors_enabled': False,
   'is_translator': False,
   'is_translation_enabled': False,
   'profile_background_color': 'CCCCCC',
   'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/521040468528754688/_Ayh3ZCE.jpeg',
   'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/521040468528754688/_Ayh3ZCE.jpeg',
   'profile_background_tile': False,
   'profile_image_url': 'http://pbs.twimg.com/profile_images/2623320981/kinlr53ma1flkp9jerk4_normal.jpeg',
   'profile_image_url_https': 'https://pbs.twimg.com/profile_images/2623320981/kinlr53ma1flkp9jerk4_normal.jpeg',
   'profile_banner_url': 'https://pbs.twimg.com/profile_banners/823957466/1347986011',
   'profile_link_color': '999999',
   'profile_sidebar_border_color': 'FFFFFF',
   'profile_sidebar_fill_color': 'DDEEF6',
   'profile_text_color': '333333',
   'profile_use_background_image': False,
   'has_extended_profile': False,
   'default_profile': False,
   'default_profile_image': False,
   'following': True,
   'follow_request_sent': False,
   'notifications': False,
   'translator_type': 'none'},
  'geo': None,
  'coordinates': None,
  'place': None,
  'contributors': None,
  'is_quote_status': False,
  'retweet_count': 37,
  'favorite_count': 52,
  'favorited': False,
  'retweeted': False,
  'possibly_sensitive': False,
  'lang': 'en'},
 'is_quote_status': False,
 'retweet_count': 37,
 'favorite_count': 0,
 'favorited': False,
 'retweeted': False,
 'lang': 'en'}


In [26]:
# exercise 05

"""
Accessing retweet data

Now we're going to work with a tweet JSON that contains a retweet. A retweet has the same structure as a regular tweet, except that it has another tweet stored in retweeted_status.

The new tweet has been loaded as rt.
"""

# Instructions

"""

    Print the text of the tweet.
    Print the text of the tweet which has been retweeted, which is contained in retweeted_status.
    Print the user handle of the tweet.
    Print the user handle of the tweet which has been retweeted, which is contained in retweeted_status.

"""

# solution

# Print the text of the tweet
print(rt['text'])

# Print the text of tweet which has been retweeted
print(rt['retweeted_status']['text'])

# Print the user handle of the tweet
print(rt['user']['screen_name'])

# Print the user handle of the tweet which has been retweeted
print(rt['retweeted_status']['user']['screen_name'])

#----------------------------------#

# Conclusion

"""
Good job!
"""

RT @hannawallach: ICYMI: NIPS/ICML/ICLR are looking for a full-time programmer to run the conferences' submission/review processes. More in…
ICYMI: NIPS/ICML/ICLR are looking for a full-time programmer to run the conferences' submission/review processes. M… https://t.co/aB9Y5tTyHT
alexhanna
hannawallach


'\nGood job!\n'
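
Because a retweet nests the original tweet under `retweeted_status`, the presence of that key is a convenient way to tell retweets from original tweets. Here is a minimal sketch using trimmed dictionaries modeled on the two tweets in this notebook. The helper names `is_retweet` and `original_author` are hypothetical.

```python
def is_retweet(tweet):
    """A tweet is treated as a retweet when it carries a retweeted_status object."""
    return "retweeted_status" in tweet


def original_author(tweet):
    """Return the handle of whoever wrote the underlying tweet."""
    source = tweet.get("retweeted_status", tweet)
    return source["user"]["screen_name"]


# Trimmed stand-ins for the regular tweet and the retweet used above.
plain = {
    "text": "Writing out the script of my @DataCamp class",
    "user": {"screen_name": "alexhanna"},
}
retweet = {
    "text": "RT @hannawallach: ICYMI",
    "user": {"screen_name": "alexhanna"},
    "retweeted_status": {
        "text": "ICYMI",
        "user": {"screen_name": "hannawallach"},
    },
}

for tweet in (plain, retweet):
    print(is_retweet(tweet), original_author(tweet))
```

Falling back to the tweet itself when `retweeted_status` is absent lets the same function handle both cases without branching at the call site.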