# Geotagged Tweet Mapping
#### Welcome to the Geotagged Tweet Mapping project for *Teaching Privacy*.

**This project is due on 00/00/00.**

This project relies on several Python libraries, some of which you may not have used before, and requires that you have a Twitter account. It is therefore highly recommended that you work in pairs or groups.

If students have not worked with for loops or dictionaries before, they may require extra assistance during those sections.

## Part 1a: Installing Tweepy


*pip install tweepy*

Check the readme file on https://github.com/tweepy/tweepy for the most up-to-date installation instructions.

Run the cell below to import the module.

In [2]:
import tweepy
from tweepy import TweepError
import json
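
The import above matches the Tweepy 3.x API that this project's documentation links point to. Tweepy 4.0 renamed `TweepError` to `TweepyException`, so a newer installation may fail on that line. Below is a minimal sketch of a fallback import, assuming you want the rest of the notebook to keep using the name `TweepError`. The final fallback to `Exception` is an assumption added only so the snippet runs where Tweepy is not installed.

```python
# Import the Tweepy error class under one name regardless of version.
try:
    from tweepy import TweepError  # Tweepy 3.x
except ImportError:
    try:
        # Tweepy 4.x renamed the class; alias it to the old name.
        from tweepy import TweepyException as TweepError
    except ImportError:
        # Assumption: Tweepy is not installed; fall back to a generic error.
        TweepError = Exception

print("Using error class:", TweepError.__name__)
```

Other parts of the Tweepy interface also changed between major versions, so this sketch may not be the only adjustment a newer installation needs.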

## Part 1: Create Twitter App


1. Go to https://apps.twitter.com and click 'Sign In'. If you do not have a Twitter account or do not want to use your current Twitter account, you will have to create one.
2. Click on 'Create New App'.
3. Give your app a Name, Description and a Website. For the website you are allowed to put a placeholder such as https://www.google.com.

## Part 2: Obtain Twitter Tokens 

When using APIs that require tokens and keys for authentication, it is common practice to keep your keys in a separate JSON file to protect yourself and the application's users. This file should not be posted to public repositories, and you should **never** share your keys.


Create a new text file named **twitter_keys.json** with the following format:

{ <br>
   "api_key":"", <br>
   "api_secret":  "", <br>
   "access_token": "", <br>
   "access_token_secret": "" <br>
}
<br>
1. Go to the app you created in the previous step and open the 'Keys and Access Tokens' tab.
2. Copy and paste the tokens and keys for the corresponding variables in your JSON file. <br>
    a. You will have to click 'Create my access token' the first time you create your app. <br>
    b. Make sure you copy and paste the tokens inside the quotation marks.
3. Run the cell below to assign your keys to the keys variable.

In [3]:
keys_file = 'twitter_keys.json'
with open(keys_file) as file:
    keys = json.load(file)
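
A blank or misspelled entry in the keys file is an easy mistake to make, and it only shows up later as an authentication error. The sketch below is one possible check, not part of the original project: the helper name `missing_keys` is hypothetical, and the demonstration writes placeholder values (not real credentials) to a temporary file so it can run anywhere.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")


def missing_keys(keys):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in REQUIRED_KEYS if not keys.get(name)]


# Placeholder values for illustration only; 'access_token' is left blank on purpose.
sample = {
    "api_key": "abc",
    "api_secret": "def",
    "access_token": "",
    "access_token_secret": "ghi",
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "twitter_keys.json")
    with open(path, "w") as file:
        json.dump(sample, file)
    with open(path) as file:
        loaded = json.load(file)

print(missing_keys(loaded))  # ['access_token']
```

In the notebook, you could call `missing_keys(keys)` right after loading your own file; an empty list means all four entries are filled in.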

## Part 3: Using the Twitter API with Tweepy

Run the cell below to check if you have correctly set up the keys.

In [4]:
try:
    auth = tweepy.OAuthHandler(keys["api_key"], keys["api_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth)
    print("You have correctly set up your API keys. Your username is:", api.auth.get_username())
except TweepError as e:
    print("Tweepy found an error. Revisit your twitter_keys.json file and make sure you have the correct keys.")
    # Show the underlying error message to help with troubleshooting.
    print("Details:", e)

You have correctly set up your API keys. Your username is: ImKarloss


Now that you have been authenticated, it is time to get acquainted with the Twitter API.

Using the <a href="http://tweepy.readthedocs.io/en/v3.5.0/">documentation</a>, retrieve the 200 most recent tweets from Twitter's @jack in the cell below.

**Hint: Look for a method to return the user timeline under 'API Reference'. http://docs.tweepy.org/en/v3.5.0/api.html#timeline-methods**

In [5]:
tweets = api.user_timeline(screen_name="jack", count=200)

In the cell below, check the data type of the result returned in the previous cell.

In [6]:
type(tweets)

tweepy.models.ResultSet

The cell above should say we have a tweepy.models.ResultSet, which is a list of Status objects, or tweets. Confirm this in the cell below by indexing the first tweet and checking its type.

In [35]:
first_tweet = tweets[0]
type(first_tweet)

tweepy.models.Status

RESTful APIs typically send data in JSON format, the same format as our keys file. Using the '._json' attribute, convert the first tweet into a dictionary in the cell below. 

**Hint: Read this Stack Overflow post for more information about the json attribute: https://stackoverflow.com/questions/27900451/convert-tweepy-status-object-into-json** <br>
**Hint 2: If you have not used dictionaries before, you can see this video on the subject. https://stackoverflow.com/questions/27900451/convert-tweepy-status-object-into-json**

In [36]:
first_tweet_dict = first_tweet._json
first_tweet_dict

{'contributors': None,
 'coordinates': None,
 'created_at': 'Thu Aug 16 00:02:50 +0000 2018',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [],
  'user_mentions': [{'id': 7445912,
    'id_str': '7445912',
    'indices': [0, 7],
    'name': 'Pablo Defendini',
    'screen_name': 'pablod'}]},
 'favorite_count': 154,
 'favorited': False,
 'geo': None,
 'id': 1029881129266900992,
 'id_str': '1029881129266900992',
 'in_reply_to_screen_name': 'pablod',
 'in_reply_to_status_id': 1029880867894775810,
 'in_reply_to_status_id_str': '1029880867894775810',
 'in_reply_to_user_id': 7445912,
 'in_reply_to_user_id_str': '7445912',
 'is_quote_status': False,
 'lang': 'en',
 'place': None,
 'retweet_count': 16,
 'retweeted': False,
 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 'text': '@pablod We take responsibility, and enforce accordingly',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Tue Mar 21 20:50:14 +0

Looking at the cell above, you should see that we get back a nested dictionary. It mirrors the structure of the JSON data the API sent, but it is a Python dictionary, not a JSON file.
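
To make the difference between a dictionary and JSON concrete, here is a small sketch using the standard `json` module. The `tweet_dict` below is a hypothetical, heavily shortened stand-in for a real tweet dictionary.

```python
import json

# Hypothetical, heavily shortened stand-in for a tweet dictionary.
tweet_dict = {"id": 1, "place": None, "user": {"screen_name": "example"}}

json_text = json.dumps(tweet_dict)   # dictionary -> JSON text (a str)
round_trip = json.loads(json_text)   # JSON text -> dictionary

print(type(tweet_dict).__name__)  # dict
print(type(json_text).__name__)   # str
print(json_text)                  # Python's None is written as null in JSON
print(round_trip == tweet_dict)   # True
```

You can index a dictionary by key, such as `tweet_dict["user"]`, whereas the JSON text is just a string until it is parsed.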

Explore the result to find which keys hold the tweet's location. Use the cell below to print the first tweet's location.

**Note: Not all tweets have locations embedded. Tweets without a location have None as the value of 'place'.** <br>
**Hint: Find the first tweet's 'place' key.**

In [37]:
first_tweet_location = first_tweet_dict['place']
print('This tweet was tweeted from:', first_tweet_location)

This tweet was tweeted from: None
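
Because 'place' can be None, indexing further into it (for example to reach 'full_name') raises an error on untagged tweets. The sketch below shows one defensive way to read the place name. The helper `place_name` and both sample dictionaries are hypothetical and much shorter than real tweet data.

```python
def place_name(tweet_dict):
    """Return the tweet's place name, or None when no place is attached."""
    place = tweet_dict.get("place")
    if place is None:
        return None
    return place.get("full_name")


# Hypothetical, shortened tweet dictionaries for illustration.
untagged = {"text": "no location here", "place": None}
tagged = {"text": "hello", "place": {"full_name": "San Francisco, CA"}}

print(place_name(untagged))  # None
print(place_name(tagged))    # San Francisco, CA
```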


## Part 4: Tweet Locations

In the cell below, find the locations for all tweets we obtained. 

Hint: Not all tweets are geotagged, so figure out how to append only the locations of tweets that have one and skip the rest.

In [13]:
locations = []
tweets_with_location = []  # holds the 'place' dictionary of each geotagged tweet
for tweet in tweets:
    place = tweet._json['place']
    # Skip tweets that have no place attached.
    if place is not None:
        tweets_with_location.append(place)
        locations.append(place['full_name'])
locations

['Missouri, USA',
 'Illinois, USA',
 'Ohio, USA',
 'West Virginia, USA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'S

## Part 5a: Installing geoplotlib

We will be using the geoplotlib library to visualize tweet locations. Since geoplotlib requires two other libraries, numpy and pyglet, install all three by running the following separate commands in your terminal:

*pip install numpy <br>
pip install pyglet <br>
pip install geoplotlib <br>*

Once done, run the cell below to import geoplotlib.

In [14]:
import geoplotlib

## Part 5: Tweet Location Visualization

Now that we have stored the location of the user's tweets, it is time to create a visualization.

For each tweet with a location, Twitter stores a bounding box: four [longitude, latitude] pairs, one for each corner. For each tweet, store the first pair from its bounding box in a list named 'coords'.


In [15]:
coords = []
for tweet in tweets_with_location:
    coords.append(tweet['bounding_box']['coordinates'][0][0])
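
Taking the first pair places each dot on one corner of the box, which can be far from the middle when the place is as large as a state. As an alternative, the sketch below averages the four corners to approximate the center. The helper name `bounding_box_center` and the sample box (with made-up round numbers) are hypothetical, and the sketch assumes the box does not cross the 180° meridian.

```python
def bounding_box_center(bounding_box):
    """Average the corner points of a bounding box; returns (lon, lat)."""
    corners = bounding_box["coordinates"][0]
    lon = sum(point[0] for point in corners) / len(corners)
    lat = sum(point[1] for point in corners) / len(corners)
    return lon, lat


# Hypothetical box with made-up round numbers, in [longitude, latitude] order.
sample_box = {
    "type": "Polygon",
    "coordinates": [[[-123.0, 37.0], [-122.0, 37.0], [-122.0, 38.0], [-123.0, 38.0]]],
}

print(bounding_box_center(sample_box))  # (-122.5, 37.5)
```

Note that the function returns longitude first, matching Twitter's ordering, so the values still need to be separated into latitude and longitude lists before mapping.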

geoplotlib has a utils.DataAccessObject that takes in a dictionary or pandas dataframe to create a DataAccessObject. This is the data type that the library uses to create its maps.

Create a dictionary with three keys: 'lat' (latitude), 'lon' (longitude), and 'name' (the name of the place). The value for each key should be a list of the corresponding values; you should already have the necessary values in previously assigned lists.

Once done, pass the dictionary to utils.DataAccessObject to create the DataAccessObject, and create a dot density map with the .dot method.

**Hint: After using the .dot method to create a dot density map, you must call geoplotlib.show() to open up a window with the map.**

In [None]:
# Twitter lists each point as [longitude, latitude].
lat = [coordinate[1] for coordinate in coords]
lon = [coordinate[0] for coordinate in coords]
# geoplotlib reads the 'lat' and 'lon' fields of the data object.
loc = {'lat': lat, 'lon': lon, 'name': locations}
geo_loc = geoplotlib.utils.DataAccessObject(loc)
geoplotlib.dot(geo_loc)
geoplotlib.show()
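
If the map opens but the dots appear in unexpected places, a common cause is swapped latitude and longitude. The sketch below is an optional sanity check on the dictionary before it is handed to geoplotlib. The helper `check_map_data` and the sample values are hypothetical, and the snippet does not need geoplotlib to run.

```python
def check_map_data(data):
    """Return a list of problems found in a dict of parallel lat/lon/name lists."""
    problems = []
    lengths = {key: len(values) for key, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"lists differ in length: {lengths}")
    if any(not -90 <= value <= 90 for value in data.get("lat", [])):
        problems.append("latitude outside [-90, 90]; lat and lon may be swapped")
    if any(not -180 <= value <= 180 for value in data.get("lon", [])):
        problems.append("longitude outside [-180, 180]")
    return problems


# Hypothetical sample values for illustration.
good = {"lat": [37.7], "lon": [-122.5], "name": ["San Francisco, CA"]}
swapped = {"lat": [-122.5], "lon": [37.7], "name": ["San Francisco, CA"]}

print(check_map_data(good))     # []
print(check_map_data(swapped))  # one message about latitude
```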

## Part 6: Conclusion

This assignment will have different results depending on the Twitter user you inspect. Some users will have no tweets with embedded locations, and others may only tweet from a single city.

Examining locations in tweets can give an estimate of where a user lives or a user's up-to-date whereabouts.
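
As a rough illustration of that estimate, the sketch below counts how often each place name appears and reports the most frequent one. The `sample_locations` list is hypothetical and stands in for the `locations` list built in Part 4; treating the most frequent place as a likely home base is an assumption, not a guarantee.

```python
from collections import Counter

# Hypothetical list standing in for the `locations` list built in Part 4.
sample_locations = [
    "Missouri, USA",
    "San Francisco, CA",
    "San Francisco, CA",
    "Ohio, USA",
    "San Francisco, CA",
]

counts = Counter(sample_locations)
most_common_place, tweet_count = counts.most_common(1)[0]
share = tweet_count / len(sample_locations)

print(most_common_place)  # San Francisco, CA
print(f"{share:.0%}")     # 60%
```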