# Working with Geotags

As we saw in `01_querying_twitter`, we can filter tweets that are geotagged within some radius of a particular location.
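
The radius filter from that notebook runs as part of the search. To check locally whether a given coordinate pair lies inside a circle, the haversine great-circle distance is a reasonable approximation. This is a minimal sketch that assumes a spherical Earth with a mean radius of about 6,371 km. `haversine_km` and `within_radius` are hypothetical helpers, not part of the Twitter API or of any library used in this notebook.

```python
import math

# Mean Earth radius in km (spherical approximation; an assumption of this sketch)
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat, lon, center_lat, center_lon, radius_km):
    """True if (lat, lon) lies inside the circle of radius_km around the center."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km


# Example: is a point in Santa Monica within 100 km of downtown Los Angeles?
print(within_radius(34.0195, -118.4912, 34.0522, -118.2437, 100))
```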

**REMEMBER**: 

The Twitter ToS strictly prohibits:
- separating geo information from the tweet it is attached to
- using aggregated geo-information to track individuals or other movement

The ToS does allow "Heat maps and related tools that show aggregated geo activity (e.g., the number of people in a city using a hashtag)."
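
The quoted allowance is about aggregate counts. One way to keep output at that level is to snap coordinates onto a coarse grid and keep only the per-cell totals, discarding which tweet fell where. This is a minimal sketch. It assumes 1-degree cells, which is an arbitrary choice. `grid_counts` is a hypothetical helper.

```python
import math
from collections import Counter


def grid_counts(coords, cell_deg=1.0):
    """Count points per cell of a regular lat/long grid.

    coords: iterable of (lat, long) pairs in degrees.
    Returns a Counter keyed by the (lat, long) of each cell's south-west corner,
    so only per-cell totals survive, not individual tweets.
    """
    counts = Counter()
    for lat, lon in coords:
        cell = (math.floor(lat / cell_deg) * cell_deg,
                math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts


# Three made-up points; the first two share a 1-degree cell.
print(grid_counts([(34.02, -118.49), (34.05, -118.24), (40.71, -74.01)]))
```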

## "Beach" Geodata
`saved_searches/beach_tweets_extensive.csv` contains 407 tweets that include the word "beach" and were geotagged within a circle roughly centered on the continental U.S.

```python
import ast

import geopandas
import matplotlib.pyplot as plt
import pandas as pd

# load in tweet df; the converter parses each geo.coordinates cell from the string "[lat, long]" into a Python list
# (ast.literal_eval only accepts literals, so it is safer than eval for data read from disk)
beach_tweets = pd.read_csv('saved_searches/beach_tweets_extensive.csv', converters={'geo.coordinates': ast.literal_eval})

# split the [lat, long] lists into separate lat and long columns, aligned on the df's index
beach_tweets[['geo.lat', 'geo.long']] = pd.DataFrame(beach_tweets['geo.coordinates'].to_list(), index=beach_tweets.index)
```

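Why the converter is needed: `read_csv` cannot know that a cell holds a Python list, so `geo.coordinates` comes back as the string `"[lat, long]"`. This minimal sketch uses two made-up rows, whose values are invented for illustration, to show the difference. It relies on `ast.literal_eval`, which parses literals such as lists and numbers but refuses arbitrary expressions.

```python
import ast
import io

import pandas as pd

# A tiny stand-in for the saved CSV: the list column is stored as text.
csv_text = 'id,geo.coordinates\n1,"[34.0195, -118.4912]"\n2,"[40.7128, -74.006]"\n'

raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, 'geo.coordinates']))  # a plain string

parsed = pd.read_csv(io.StringIO(csv_text),
                     converters={'geo.coordinates': ast.literal_eval})
print(type(parsed.loc[0, 'geo.coordinates']))  # now a list

# split the [lat, long] lists into two numeric columns, aligned on the index
parsed[['geo.lat', 'geo.long']] = pd.DataFrame(
    parsed['geo.coordinates'].to_list(), index=parsed.index)
print(parsed)
```
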

## geo.coordinates information

Let's remind ourselves of the structure of the "extensive" tweet CSV files, which store all of the JSON information returned by our searches.

The column of interest here is `geo.coordinates`, which holds a `[latitude, longitude]` list for each geotagged tweet.
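
Order matters here. In Twitter's v1.1 tweet JSON, the (deprecated) `geo` field stores `[latitude, longitude]`, whereas the GeoJSON-style `coordinates` field stores `[longitude, latitude]`. Plotting helpers such as `geopandas.points_from_xy` expect x (longitude) first. This is a minimal sketch of a conversion with a range check. `latlong_to_xy` is a hypothetical helper. Its check catches a swapped pair only when the longitude's magnitude exceeds 90, which is true for much of the western U.S. but not the east coast.

```python
def latlong_to_xy(coord):
    """Convert a [lat, long] pair to an (x, y) = (long, lat) tuple.

    Raises ValueError for out-of-range values, which can reveal a pair
    that was already stored in (long, lat) order.
    """
    lat, lon = coord
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return (lon, lat)


print(latlong_to_xy([34.0195, -118.4912]))
```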

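```python
print(beach_tweets.shape)
print('\n\n\n')
# lists are unhashable, so convert each [lat, long] list to a tuple before describe() counts unique values
print(beach_tweets['geo.coordinates'].map(tuple).describe())
print('\n\n\n')
print(beach_tweets.head())
```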


## .shp files

Shapefiles (a `.shp` file plus several companion files that share its name) are a widely used format for storing digital maps.

These files store geometries, such as the polygons that make up state borders, and often attribute data (names, codes, areas) for each feature.

There are **many** free and open-source shapefiles for nations, cities, and specialized purposes. You can even create your own shapefiles if you learn GIS software.
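
Because a "shapefile" is really a set of files, reading one can fail or come back incomplete if a companion file is missing. For example, without the `.prj` file the coordinate reference system is unknown. This is a minimal sketch of a pre-flight check. It assumes the usual parts (`.shp`, `.shx`, `.dbf`, plus `.prj`). `missing_shapefile_parts` is a hypothetical helper.

```python
from pathlib import Path

# .shp, .shx and .dbf are the required parts of a shapefile; .prj holds the
# coordinate reference system and is optional but strongly recommended.
REQUIRED = ('.shp', '.shx', '.dbf')
RECOMMENDED = ('.prj',)


def missing_shapefile_parts(shp_path):
    """Return the names of companion files that are absent for shp_path."""
    shp = Path(shp_path)
    # with_suffix on the original path keeps dotted base names intact
    return [shp.with_suffix(ext).name for ext in REQUIRED + RECOMMENDED
            if not shp.with_suffix(ext).exists()]


print(missing_shapefile_parts('us_state_data/cb_2018_us_state_500k.shp'))
```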


We're going to use the Python library `geopandas` to read shapefiles from the US Census Bureau and layer our tweet data on top of the resulting map.
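
Because a `GeoDataFrame` subclasses pandas' `DataFrame`, ordinary pandas operations also work on the shapes. One example is dropping the states and territories outside the lower 48 before plotting. This is a minimal sketch. It assumes the Census cartographic boundary file stores postal codes in a `STUSPS` column, so check `states.columns` on your copy. `contiguous_only` is a hypothetical helper, demonstrated on a plain `DataFrame` standing in for the attribute table.

```python
import pandas as pd

# Postal codes outside the contiguous U.S. (assumed to appear in STUSPS)
NON_CONTIGUOUS = {'AK', 'HI', 'PR', 'AS', 'GU', 'MP', 'VI'}


def contiguous_only(states_df, code_col='STUSPS'):
    """Keep only rows whose postal code is in the contiguous U.S. (plus DC)."""
    return states_df[~states_df[code_col].isin(NON_CONTIGUOUS)]


# A plain DataFrame stands in for the shapefile's attribute table;
# the same call works on a GeoDataFrame, e.g. contiguous_only(states).
demo = pd.DataFrame({'STUSPS': ['CA', 'AK', 'TX', 'HI', 'PR', 'DC'],
                     'NAME': ['California', 'Alaska', 'Texas', 'Hawaii',
                              'Puerto Rico', 'District of Columbia']})
print(contiguous_only(demo)['STUSPS'].tolist())
```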

```python
# shapefiles from the US Census: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
states = geopandas.read_file('us_state_data/cb_2018_us_state_500k.shp')

# a wide canvas suits the wide longitude/latitude window below
fig, ax = plt.subplots(figsize=(20, 10))
ax.set_aspect('equal')
# limit the view to the continental US (latitude on y, longitude on x)
ax.set_ylim([25, 50])
ax.set_xlim([-125, -67])

states.plot(ax=ax, color='white', edgecolor='black')
plt.show()
```
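
Once the axes' aspect ratio is fixed, the visible map has a set width-to-height ratio determined by the axis limits. A canvas with different proportions therefore ends up partly blank. This is a minimal sketch of deriving the figure size from the limits. `figsize_for_extent` is a hypothetical helper, and its result ignores the extra room that tick labels and titles take up.

```python
def figsize_for_extent(xlim, ylim, width=20.0, aspect=1.0):
    """Return a (width, height) figsize in inches matching the visible map's shape.

    aspect is the axes' y/x scaling (1.0 for ax.set_aspect('equal')):
    one unit of y is drawn `aspect` times as long as one unit of x.
    """
    dx = abs(xlim[1] - xlim[0])
    dy = abs(ylim[1] - ylim[0])
    return (width, width * aspect * dy / dx)


print(figsize_for_extent([-125, -67], [25, 50]))
```

The result can be passed straight to matplotlib, e.g. `plt.subplots(figsize=figsize_for_extent([-125, -67], [25, 50]))`.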

## Flexible Approach

Both `geopandas` and `matplotlib` offer far more styling and layout options than we can cover here.

I've linked to some resources in the README which demonstrate how powerful these tools are together.

## Plotting Geo-Tagged Tweets

For now, let's simply add our geotagged tweets as a layer of points on top of the US shapefile.

```python
# convert long/lat from beach_tweets into a GeoDataFrame of points (x = longitude, y = latitude)
gdf = geopandas.GeoDataFrame(beach_tweets, geometry=geopandas.points_from_xy(beach_tweets['geo.long'], beach_tweets['geo.lat']))

fig, ax = plt.subplots(figsize=(20, 10))
ax.set_aspect('equal')
ax.set_ylim([25, 50])
ax.set_xlim([-125, -67])

# reuse the Census `states` layer loaded above as the base map, then draw the tweets on top
states.plot(ax=ax, color='white', edgecolor='black')
gdf.plot(ax=ax, color='red')
plt.show()
```
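
Any tweet whose coordinates fall outside the `xlim`/`ylim` window is simply not drawn, and matplotlib gives no warning. This minimal sketch counts how many points the window hides, using made-up coordinates. `inside_window` is a hypothetical helper. Note that a rectangular window is not a border: a point in Toronto lands inside it.

```python
import pandas as pd


def inside_window(df, xlim, ylim, lat_col='geo.lat', long_col='geo.long'):
    """Boolean mask of rows whose point falls inside the plotting window."""
    return df[long_col].between(*xlim) & df[lat_col].between(*ylim)


# Made-up points: Kansas, Honolulu, Toronto.
tweets = pd.DataFrame({'geo.lat': [38.5, 21.3, 43.65],
                       'geo.long': [-98.0, -157.86, -79.38]})
mask = inside_window(tweets, xlim=(-125, -67), ylim=(25, 50))
print(f"{(~mask).sum()} of {len(tweets)} points fall outside the window")
```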

## Visualizing Tweet Density

Dots of a single size don't give a true picture of the data, especially since many tweets may be sent from the same location, producing points that overlap exactly.

Let's scale each point's marker size by the number of tweets sent from that location!
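
One detail is worth knowing before choosing a scale. For point layers, `geopandas` passes `markersize` to matplotlib's `scatter`, where the size is a marker *area* in points squared. Scaling the count by a constant therefore makes the area, not the diameter, proportional to the number of tweets. This is a minimal sketch of that mapping with an optional cap for outliers. `marker_areas` and its parameters are hypothetical.

```python
def marker_areas(counts, base=15.0, max_area=None):
    """Map tweet counts to marker areas (points**2) for matplotlib's `s`.

    Area grows linearly with the count, so a location with 4 tweets gets a
    marker twice as wide as a location with 1. max_area optionally caps
    outliers so one busy spot does not cover the map.
    """
    areas = [base * c for c in counts]
    if max_area is not None:
        areas = [min(a, max_area) for a in areas]
    return areas


print(marker_areas([1, 4, 50], base=15, max_area=300))
```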

```python
# count how many tweets share each exact (lat, long) location; our count column is named 'geo.count'
# grouping on the numeric lat/long columns avoids hashing the list-valued geo.coordinates
geo_count_df = beach_tweets.groupby(['geo.lat', 'geo.long']).size().reset_index(name='geo.count')
print(geo_count_df.head())

# attach each tweet's location count as a new column; transform returns a result aligned
# with beach_tweets' own index, so every count stays on the right row
beach_tweets['geo.count'] = beach_tweets.groupby(['geo.lat', 'geo.long'])['geo.lat'].transform('size')
```

```python
# convert long/lat from beach_tweets into a GeoDataFrame (x = longitude, y = latitude)
gdf = geopandas.GeoDataFrame(beach_tweets, geometry=geopandas.points_from_xy(beach_tweets['geo.long'], beach_tweets['geo.lat']))

# set tweet marker size to the per-location count we just created
gdf['marker_size'] = gdf['geo.count']

fig, ax = plt.subplots(figsize=(20, 10))
ax.set_aspect('equal')
ax.set_ylim([25, 50])
ax.set_xlim([-125, -67])

# reuse the Census `states` layer loaded above as the base map
states.plot(ax=ax, color='white', edgecolor='black')
# translucent markers let overlapping points show through
gdf.plot(ax=ax, color='teal', markersize=gdf['marker_size'] * 15, alpha=0.35)
plt.show()
```