# Module 6 R Exercise

You will create map visualizations using ggmap and ggplot2. Let's start with a simple data set: the number of visitors to the United States from 2006 to 2015.

In [3]:
# First, read the data and make sure the first column is character and the rest is numeric
vis <- read.csv("../../../datasets/spatial/US_visitors.csv",colClasses=c("character",rep("numeric",10)))
head(vis)

Region,X2006,X2007,X2008,X2009,X2010,X2011,X2012,X2013,X2014,X2015
Africa,394163,426922,474160,452693,485110,508489,573184,645919,757181,792026
Asia,8371244,8781480,8795236,7820986,9404375,10027386,11062760,12230911,13307053,14025173
Europe,12792122,13993051,15931641,14559083,14692093,15481558,15710015,16167460,17376449,17340542
North America,8491307,9963858,9832557,8963282,16449861,20940354,19996738,24561055,35589531,36312759
Oceania,1039872,1067258,1127444,1065909,1290993,1513963,1618337,1770569,1822066,1859507
South America,2432010,2763355,3039883,3075013,3587883,4126385,4651162,5511558,6052610,6256760


We will create a **flow map** to visualize the number of visitors from different continents. For that, we need coordinates. The *ggmap* library has functions to look up the *geocodes* of locations; these locations can be addresses, city names, or even continent names.
Let's look up the continents' coordinates with the *mutate_geocode* function, which augments our data frame with *lon* and *lat* columns.

In [4]:
library(ggplot2)
library(ggmap)

# geo coordinates lookup for the place names in the Region attribute 
vis <- mutate_geocode(vis, Region)
head(vis)
# get the map 
world <- map_data("world")

Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Africa&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Asia&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Europe&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=North%20America&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Oceania&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=South%20America&sensor=false


Region,X2006,X2007,X2008,X2009,X2010,X2011,X2012,X2013,X2014,X2015,lon,lat
Africa,394163,426922,474160,452693,485110,508489,573184,645919,757181,792026,34.50852,-8.783195
Asia,8371244,8781480,8795236,7820986,9404375,10027386,11062760,12230911,13307053,14025173,100.61966,34.047863
Europe,12792122,13993051,15931641,14559083,14692093,15481558,15710015,16167460,17376449,17340542,15.25512,54.525961
North America,8491307,9963858,9832557,8963282,16449861,20940354,19996738,24561055,35589531,36312759,-105.25512,54.525961
Oceania,1039872,1067258,1127444,1065909,1290993,1513963,1618337,1770569,1822066,1859507,140.01877,-22.73591
South America,2432010,2763355,3039883,3075013,3587883,4126385,4651162,5511558,6052610,6256760,-55.49148,-8.783195
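
Recent ggmap releases generally expect a Google API key, registered with `register_google()`, before `mutate_geocode()` returns results. If no key is available, one possible workaround is to attach the coordinates by hand. The sketch below does this with the values printed above; `continent_coords` and `vis_demo` are hypothetical names, and `vis_demo` carries only the 2015 column to keep the example short.

```r
# Coordinates copied from the geocoding output shown above.
continent_coords <- data.frame(
  Region = c("Africa", "Asia", "Europe", "North America", "Oceania", "South America"),
  lon = c(34.50852, 100.61966, 15.25512, -105.25512, 140.01877, -55.49148),
  lat = c(-8.783195, 34.047863, 54.525961, 54.525961, -22.73591, -8.783195),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for `vis`, holding only the 2015 visitor counts.
vis_demo <- data.frame(
  Region = continent_coords$Region,
  X2015 = c(792026, 14025173, 17340542, 36312759, 1859507, 6256760),
  stringsAsFactors = FALSE
)

# Join on the region name so the row order of the two frames does not matter.
vis_demo <- merge(vis_demo, continent_coords, by = "Region")
print(vis_demo)
```

Joining by name rather than binding columns side by side avoids silently pairing a region with the wrong coordinates if the rows are ever reordered.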


In [5]:
# get coordinates for US 
us <- geocode("United States")

Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=United%20States&sensor=false


**Exercise 1:** Create a flow map that shows the number of visitors to the US from different continents. Make sure color encodes region and size encodes the number of visitors in 2015.

In [None]:
# Now first plot the world map 
ggplot() + 
geom_map(data=,map=, aes(<- YOUR CODE HERE ->), <- YOUR CODE HERE ->) +

# then plot the flow curves from continents to the same point in US. 
geom_curve(<- YOUR CODE HERE ->), 
           curvature=0.1, arrow=arrow(length=unit(0.05, "npc"))) +
# add coords themes etc. 
coord_equal() +
theme_void() + theme(legend.position="none")

**Exercise 2:** All flow curves end at the same point in the US and overlap each other. Instead of using a single endpoint, let's use a separate endpoint in the US for each continent so the flows are easier to tell apart. Go to [latlong.net](http://www.latlong.net/), choose a good endpoint for each continent, add the coordinates to your data frame as ENDLON and ENDLAT attributes, and redraw.

In [None]:
# <- YOUR CODE HERE ->
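
As a rough illustration of how `geom_curve` uses per-row start and end points, here is a minimal sketch on toy data. The data frame `flows_demo` and every coordinate in it are hypothetical placeholders, not recommended endpoints, and the world map layer is left out. It assumes ggplot2 3.4 or later, where line thickness is mapped with `linewidth`; older versions use `size` instead.

```r
library(ggplot2)

# Hypothetical toy data: two origins, each with its own endpoint.
flows_demo <- data.frame(
  Region = c("Europe", "Asia"),
  X2015  = c(17340542, 14025173),
  lon    = c(15, 100),   lat    = c(54, 34),   # start of each curve
  ENDLON = c(-75, -120), ENDLAT = c(40, 38),   # placeholder endpoints
  stringsAsFactors = FALSE
)

p <- ggplot(flows_demo) +
  # one curve per row, from (lon, lat) to (ENDLON, ENDLAT)
  geom_curve(aes(x = lon, y = lat, xend = ENDLON, yend = ENDLAT,
                 color = Region, linewidth = X2015),
             curvature = 0.1, arrow = arrow(length = unit(0.05, "npc"))) +
  coord_equal() +
  theme_void() + theme(legend.position = "none")

print(p)
```

Because each row carries its own `xend` and `yend`, the arrows fan out to different places instead of piling up on one point.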

Next, we will work with the flight data from L6_Projections_Practice and visualize flight routes. Let's load the data first.

In [None]:
library(dplyr)
library(sp)
library(geosphere)

# airport codes and coordinates 
airports <- read.csv("../../../datasets/spatial/airports.csv", as.is=TRUE, header=TRUE)
# flight destinations and counts 
flights <- read.csv("../../../datasets/spatial/flights.csv", as.is=TRUE, header=TRUE)
airports$lat <- as.numeric(airports$lat)
airports$long <- as.numeric(airports$long)
# get airport locations
airport_locs <- airports[, c("iata","long", "lat")]

# Link airport lat long to origin and destination
OD <- left_join(flights, airport_locs, by=c("airport1"="iata"))
OD <- left_join(OD, airport_locs, by=c("airport2"="iata"))
head(OD)

**Exercise 3:** Add another attribute to the OD data frame that holds the distance between the two airports of each route, and visualize only those routes that are longer than 500 miles.

In [None]:
# Compute the geodesic distance (in meters) between each origin and destination airport
# using distGeo from the geosphere library; each matrix has one (long, lat) row per route
dd <- data.frame(Distance=distGeo(matrix(c(OD$long.x, OD$lat.x), ncol = 2), matrix(c(OD$long.y, OD$lat.y), ncol = 2)))
# convert meters to miles 
dd <- dd*0.000621371 
# Now add this to the OD data frame as another attribute and visualize only those routes that are longer than 500 miles. 
head(dd)


# <- YOUR CODE HERE ->
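
To make the distance-and-filter step concrete, the following sketch applies it to a toy data frame rather than the real `OD` data. The name `od_demo` and the airport coordinates are approximate, illustrative assumptions; only `distGeo` and the meters-to-miles factor come from the cell above.

```r
library(geosphere)

# Hypothetical toy routes with approximate airport coordinates.
od_demo <- data.frame(
  airport1 = c("STL", "STL"),
  airport2 = c("ORD", "LAX"),
  long.x = c(-90.37, -90.37),  lat.x = c(38.75, 38.75),
  long.y = c(-87.90, -118.41), lat.y = c(41.98, 33.94),
  stringsAsFactors = FALSE
)

# distGeo expects (longitude, latitude) pairs and returns meters.
meters <- distGeo(cbind(od_demo$long.x, od_demo$lat.x),
                  cbind(od_demo$long.y, od_demo$lat.y))

# Store the distance in miles as a new attribute.
od_demo$Distance <- meters * 0.000621371

# Keep only the routes longer than 500 miles.
long_routes <- od_demo[od_demo$Distance > 500, ]
print(long_routes)
```

Adding the distance as a column of the same data frame keeps it aligned with the route it belongs to, so the filtered rows can be passed straight to a plotting layer.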



**Exercise 4:** Let's read the Missouri County population data set and create a choropleth map that shows the population in year 2000 county by county. 

In [None]:
moco <- read.csv("../../../datasets/spatial/MO_2009_County.csv")

head(moco)

In [None]:
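# The following creates a common "region" id shared by the map and the data.
# Get the Missouri counties map; its columns are long, lat, group, order, region, subregion
mo_map <- map_data("county","missouri")
# drop column 5 (region, the state name) so that subregion (the county name) becomes column 5
mo_map <- mo_map[ ,-5]
names(mo_map)[5] <- 'region'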

# make the county names lowercase
moco <- mutate(moco, region = tolower(COUNTYNAME))
head(moco)
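
A choropleth only fills counties whose id matches exactly on both sides, so it can help to check the keys before plotting. The sketch below shows one way to do that with `setdiff`; the two name vectors are hypothetical examples, and the actual spellings in the map data or the CSV may differ.

```r
# Hypothetical county names as they might appear on the map side.
map_regions <- c("adair", "st louis", "de kalb")

# Hypothetical county names from the data side, lowercased as in the cell above.
data_regions <- tolower(c("Adair", "St. Louis", "DeKalb"))

# Map regions with no matching row in the data; these would be drawn unfilled.
unmatched <- setdiff(map_regions, data_regions)
print(unmatched)
```

With the real objects, the same check would compare `unique(mo_map$region)` against `moco$region`. Lowercasing alone does not reconcile differences in punctuation or spacing, which is why the toy names above still fail to match.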

In [None]:
# Now define the filling attribute with your data frame 
ggplot(<- YOUR CODE HERE ->) +

# and define your map with the common id 
geom_map(<- YOUR CODE HERE ->)+

# mapping stuff
expand_limits(x = mo_map$long, y = mo_map$lat) +
coord_map() + 
theme_void()

**Exercise 5:** Let's read the Walmart store opening data and create a density map from it. Get a Texas map from Google Maps, then add a density heat map and its contour lines as layers on the map. Finally, add the store locations as another layer, drawn as blue dots.

In [None]:
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/1962_2006_walmart_store_openings.csv')
head(df)

In [None]:
# Get the Google Map tile at zoom level 7 for Fort Worth, Texas
TX <- suppressMessages(get_map(<- YOUR CODE HERE ->))

# Create a map from it, slightly whiten it. 
TXmap <- ggmap(TX, extent = "device", darken = c(.2,"white"))

# Draw the map first 
TXmap +

# add levels 
geom_density2d(<- YOUR CODE HERE ->) + 

# add density heat map
stat_density2d(<- YOUR CODE HERE ->, size = 0.01, bins = 16, <- YOUR CODE HERE ->) + 

# color scale 
scale_fill_gradient(<- YOUR CODE HERE ->) + 

scale_alpha(range = c(0.05, 0.2), guide = "none") +

# add store locations here so that they are on top 
geom_point(<- YOUR CODE HERE ->) +

# remove the legend 
<- YOUR CODE HERE -> + 

# add title
<- YOUR CODE HERE ->
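
For reference, here is a hedged sketch of how the same kinds of layers stack on a plain ggplot canvas, with no Google base map. The points are random toy data in a hypothetical data frame `pts`, the colors and title are arbitrary, and `after_stat()` assumes ggplot2 3.3 or later.

```r
library(ggplot2)

# Hypothetical toy points scattered around an arbitrary center.
set.seed(1)
pts <- data.frame(lon = rnorm(200, mean = -97, sd = 1),
                  lat = rnorm(200, mean = 32, sd = 1))

p <- ggplot(pts, aes(x = lon, y = lat)) +
  # contour lines of the estimated 2D density
  geom_density_2d(color = "grey40") +
  # filled density bands; fill and transparency follow the density level
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", bins = 16) +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_alpha(range = c(0.05, 0.2), guide = "none") +
  # points last, so they sit on top of the density layers
  geom_point(color = "blue", size = 0.5) +
  theme(legend.position = "none") +
  ggtitle("Toy density map")

print(p)
```

When the base is a ggmap object instead of `ggplot(pts, ...)`, each layer typically needs its own `data =` argument, because the base plot does not carry the point data.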


**Exercise 6:** Do the same exercise for the motor vehicle theft data, but without the point layer. Plot the density of all data points, for all years, on a single map.

In [None]:
df <- read.csv("../../../datasets/motor_vehicle_thefts/mvt.csv")
head(df)

# Get the Google Map tile at zoom level 11 for Chicago

<- YOUR CODE HERE ->


**Exercise 7:** Repeat Exercise 6 using small multiples, with one panel per year.

In [None]:
<- YOUR CODE HERE ->
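
Small multiples are usually produced in ggplot2 with `facet_wrap`. The sketch below shows the idea on toy data; the data frame `thefts_demo` and its `Year`, `lon`, and `lat` columns are assumptions for illustration, since the column names in the theft data are not shown here.

```r
library(ggplot2)

# Hypothetical toy data: random points tagged with one of two years.
set.seed(2)
thefts_demo <- data.frame(
  lon  = rnorm(300, mean = -87.6, sd = 0.1),
  lat  = rnorm(300, mean = 41.8, sd = 0.1),
  Year = rep(c(2010, 2011), each = 150)
)

p <- ggplot(thefts_demo, aes(x = lon, y = lat)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", bins = 16) +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_alpha(range = c(0.05, 0.2), guide = "none") +
  # one panel per year; the density is estimated separately within each panel
  facet_wrap(~ Year) +
  theme(legend.position = "none")

print(p)
```

If the real data stores a full date rather than a year, a year column would have to be derived first before faceting on it.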
