# Geotagged Tweet Mapping
#### Welcome to the Geotagged Tweet Mapping project for *Teaching Privacy*.

**This project is due on 00/00/00.**

This project relies on several Python libraries, some of which you may not have used before. We therefore highly recommend working in pairs or groups.

## Part 0a: Installing Tweepy

The easiest way to install is to run the following in your terminal:

*pip install tweepy*

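If this does not work, check the readme file at https://github.com/tweepy/tweepy for the most up-to-date installation instructions. Note that this notebook was written against Tweepy 3.x (the documentation linked in Part 3 is for v3.5.0). Later major versions renamed some of the names used here, such as TweepError, so you may need to install a 3.x release if the import cell fails.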

## Part 0b: Installing geoplotlib

We will use the geoplotlib library to visualize tweet locations. Since geoplotlib requires two other libraries, numpy and pyglet, we have to install those too. We will also install pandas, which we use later to build a table. Run the following four commands, one at a time, in your terminal:

*pip install numpy <br>
pip install pyglet <br>
pip install geoplotlib <br>
pip install pandas*

Once done, run the cell below to import tweepy and all other necessary Python modules.

In [1]:
import tweepy
from tweepy import TweepError
import json
import numpy
import pyglet
import geoplotlib
import pandas

## Part 1: Create Twitter App


1. Go to https://apps.twitter.com and click 'Sign In'. If you do not have a Twitter account or do not want to use your current Twitter account, you will have to create one.
2. Click on 'Create New App'.
3. Give your app a Name, Description and a Website. For the website you are allowed to put a placeholder such as https://www.google.com.

## Part 2: Obtain Twitter Tokens 

When using APIs that require tokens and keys for authentication, it is common practice to keep your keys in a separate JSON file to protect yourself and the application's users. This file should not be posted to public repositories, and you should **never** share your keys.


Create a new text file named **twitter_keys.json** with the following format:

{ <br>
   "api_key":"", <br>
   "api_secret":  "", <br>
   "access_token": "", <br>
   "access_token_secret": "" <br>
}
<br>
1. Go to the app you created in the previous step and open the 'Keys and Access Tokens' tab.
2. Copy and paste the tokens and keys for the corresponding variables in your JSON file. <br>
    a. You will have to click 'Create my access token' the first time you create your app. <br>
    b. Make sure you copy and paste the tokens inside the quotation marks.
3. Run the cell below to assign your keys to the keys variable.

In [2]:
keys_file = 'twitter_keys.json'
with open(keys_file) as file:
    keys = json.load(file)
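
Before moving on, it can help to confirm that the loaded dictionary contains every field the later cells expect. The sketch below is one possible check and is not part of the assignment: `find_missing_keys` is a hypothetical helper, and `sample_keys` is a made-up stand-in for the contents of your twitter_keys.json.

```python
# Field names taken from the twitter_keys.json template above.
REQUIRED_KEYS = ["api_key", "api_secret", "access_token", "access_token_secret"]

def find_missing_keys(keys):
    """Return the required fields that are absent or left as empty strings."""
    return [name for name in REQUIRED_KEYS if not keys.get(name)]

# Hypothetical example: a keys dictionary where one value was never filled in.
sample_keys = {
    "api_key": "abc",
    "api_secret": "def",
    "access_token": "",
    "access_token_secret": "ghi",
}
missing = find_missing_keys(sample_keys)
print("Missing or empty fields:", missing)
```

With your real file, you would pass the `keys` variable from the cell above instead of `sample_keys`. An empty list means all four fields are filled in.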

## Part 3: Using the Twitter API with Tweepy

Run the cell below to check if you have correctly set up the keys.

In [3]:
try:
    auth = tweepy.OAuthHandler(keys["api_key"], keys["api_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth)
    print("You have correctly set up your API keys. Your username is:", api.auth.get_username())
except TweepError as e:
    print("Tweepy found an error. Revisit your twitter_keys.json file and make sure you have the correct keys.")

You have correctly set up your API keys. Your username is: ImKarloss


Now that you have been authenticated, it is time to get acquainted with the Twitter API.

Using the <a href="http://tweepy.readthedocs.io/en/v3.5.0/">documentation</a>, retrieve the 200 most recent tweets from Twitter user @jack in the cell below.

**Hint: Look for a method to return the user timeline under 'API Reference'.**

In [4]:
tweets = api.user_timeline(screen_name="jack", count=200)

In the cell below, check the data type of the result returned by the previous cell.

In [5]:
type(tweets)

tweepy.models.ResultSet

The cell above should say we have a tweepy.models.ResultSet, which is a list of Status objects, or tweets. Confirm this in the cell below by indexing the first tweet and checking its type.

In [6]:
first_tweet = tweets[0]
type(first_tweet)

tweepy.models.Status

RESTful APIs often send data in JSON format, the same format as our keys file. Using the '_json' attribute, convert the first tweet into a dictionary in the cell below.

In [7]:
first_tweet_dict = first_tweet._json
first_tweet_dict

{'contributors': None,
 'coordinates': None,
 'created_at': 'Sat Jun 30 21:42:30 +0000 2018',
 'entities': {'hashtags': [],
  'media': [{'display_url': 'pic.twitter.com/zgFsMvyIRr',
    'expanded_url': 'https://twitter.com/BoringEnormous/status/1013091058698407936/video/1',
    'id': 1013090797695258625,
    'id_str': '1013090797695258625',
    'indices': [43, 66],
    'media_url': 'http://pbs.twimg.com/ext_tw_video_thumb/1013090797695258625/pu/img/gzkvb1OVqRLf3lGV.jpg',
    'media_url_https': 'https://pbs.twimg.com/ext_tw_video_thumb/1013090797695258625/pu/img/gzkvb1OVqRLf3lGV.jpg',
    'sizes': {'large': {'h': 368, 'resize': 'fit', 'w': 640},
     'medium': {'h': 368, 'resize': 'fit', 'w': 640},
     'small': {'h': 368, 'resize': 'fit', 'w': 640},
     'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
    'source_status_id': 1013091058698407936,
    'source_status_id_str': '1013091058698407936',
    'source_user_id': 20538843,
    'source_user_id_str': '20538843',
    'type': 'photo'

Looking at the cell above, you should see that the result is a nested dictionary. Its structure mirrors the JSON format; however, it is a Python dictionary in memory, not a JSON file.

Explore the result and find which keys hold the tweet's location. Use the cell below to print the first tweet's location.

**Hint: Not all tweets have locations embedded. Find the first tweet's 'place' tag.**

In [8]:
first_tweet_location = first_tweet_dict['place']
print('This tweet was tweeted from:', first_tweet_location)

This tweet was tweeted from: None
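
Because 'place' can be None, indexing further into it (for example asking for its 'full_name') would raise a TypeError for tweets like this one. A minimal sketch of a guard follows. `place_name` is a hypothetical helper, and both sample dictionaries are trimmed-down stand-ins for real tweet dictionaries that keep only the 'place' key.

```python
def place_name(tweet_dict):
    """Return the place's full name, or None when the tweet has no place."""
    place = tweet_dict.get("place")
    if place is None:
        return None
    return place.get("full_name")

# Illustrative stand-ins for tweet dictionaries (real ones have many more keys).
tweet_without_place = {"place": None}
tweet_with_place = {"place": {"full_name": "San Francisco, CA"}}

print(place_name(tweet_without_place))
print(place_name(tweet_with_place))
```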


## Part 4: Tweet Locations

In the cell below, find the locations for all tweets we obtained. 

**Hint: Not all tweets are geotagged, so figure out how to append a location to the list only when the tweet actually has one.**

In [9]:
locations = []
tweets_with_location = []
for tweet in tweets:
    place = tweet._json['place']  # None when the tweet has no place attached
    if place is not None:
        tweets_with_location.append(place)
        locations.append(place['full_name'])
locations

['San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'South San Francisco, CA',
 'Cuba',
 'Cuba',
 'Haiti',
 'Cuba',
 'Cuba']
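
A long list like this is easier to read as counts. One way to summarize it is with `collections.Counter` from the standard library, sketched here. `sample_locations` is a shortened, illustrative stand-in for the `locations` list above.

```python
from collections import Counter

# Shortened stand-in for the `locations` list built in the previous cell.
sample_locations = [
    "San Francisco, CA",
    "San Francisco, CA",
    "South San Francisco, CA",
    "Cuba",
    "Haiti",
    "Cuba",
]

# Count how many times each place name appears.
location_counts = Counter(sample_locations)

# Print the places from most to least frequent.
for name, count in location_counts.most_common():
    print(name, count)
```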

## Part 5: Tweet Location Visualization

Now that we have stored the location of the user's tweets, it is time to create a visualization.

In the cell below, build a list with one coordinate pair for each location. Each place comes with a bounding box made up of several [longitude, latitude] pairs; use the 4th pair from each bounding box.

In [10]:
coords = []
for tweet in tweets_with_location:
    coords.append(tweet['bounding_box']['coordinates'][0][3])
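
To make the indexing in the cell above concrete: in the place dictionaries used here, the bounding box's 'coordinates' entry is a list holding one list of corner points, and each corner is a [longitude, latitude] pair. The sketch below uses made-up coordinates, not real API output, to show what `[0][3]` selects. As an alternative that is not part of the assignment, it also averages the corners to approximate the center of the box.

```python
# Made-up place with a bounding box of four [longitude, latitude] corners.
sample_place = {
    "full_name": "Example City",
    "bounding_box": {
        "coordinates": [[[-10.0, 20.0], [-8.0, 20.0], [-8.0, 22.0], [-10.0, 22.0]]]
    },
}

corners = sample_place["bounding_box"]["coordinates"][0]
fourth_corner = corners[3]  # the pair the assignment asks for

# Alternative: average the corners to approximate the center of the box.
center_lon = sum(lon for lon, lat in corners) / len(corners)
center_lat = sum(lat for lon, lat in corners) / len(corners)

print("4th corner:", fourth_corner)
print("Center:", [center_lon, center_lat])
```

A single corner is enough for a city-level map. The center may be a better choice when a place covers a large area, such as a whole country.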

Now that you have both a list of locations and coordinate pairs, create a table with pandas. 

You can create a DataFrame from an array of arrays with the following line:
pandas.DataFrame(array)

After doing so, make sure to rename the columns to 'lon' and 'lat' as appropriate, and add another column with the locations.

In [11]:
locs = pandas.DataFrame(coords)
locs['name'] = locations
locs.rename(columns={0:'lon', 1:'lat'}, inplace=True)
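
The same table can also be built by naming the columns when the DataFrame is created, which avoids the separate rename step. This is a sketch of an equivalent approach, using made-up coordinates and place names rather than real tweet data.

```python
import pandas

# Made-up [longitude, latitude] pairs and matching place names.
sample_coords = [[-10.0, 22.0], [-75.5, 19.9]]
sample_names = ["Example City", "Example Town"]

# Name the columns at construction time instead of renaming 0 and 1 afterwards.
sample_locs = pandas.DataFrame(sample_coords, columns=["lon", "lat"])
sample_locs["name"] = sample_names
print(sample_locs)
```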

Now that you have all the information contained in a DataFrame, it is time to convert it into the correct data type. To map out the tweet locations, we will want to create a dot density map. 

geoplotlib has a .dot method to create a dot density map. The only required argument is a geoplotlib DataAccessObject. You can use the utils.DataAccessObject in geoplotlib to convert the DataFrame object.

**Hint: After using the .dot method to create a dot density map, you must call geoplotlib.show() to open up a window with the map.**

In [12]:
geo_locs = geoplotlib.utils.DataAccessObject(locs)
geoplotlib.dot(geo_locs)
# Uncomment the next line to open a window with the map.
#geoplotlib.show()

## Part 6: Conclusion

This assignment will have different results depending on the Twitter user you inspect. Some users will have no tweets with embedded locations, and others may only tweet from a single city.

Examining locations in tweets can give an estimate of where a user lives or a user's up-to-date whereabouts.