# R Exercise

You will create map visualizations using ggmap and ggplot2. 
Let's start with a simple data set: the number of visitors to the United States from 2006 to 2015. 

Replace each `<- YOUR CODE HERE ->` placeholder to complete the exercises.

In [None]:
# First, read the data and make sure the first column is character and the rest is numeric
vis <- read.csv("/dsa/data/all_datasets/spatial/US_visitors.csv",colClasses=c("character",rep("numeric",10)))

head(vis)

We will create a **flow map** to visualize the number of visitors from different continents. 
For that, we'll need coordinates. The *ggmap* library has functions to look up the *geocodes* of locations; 
these locations can be addresses, city names, or even continent names. 

Let's look up the continents' coordinates with the *mutate_geocode* function, which augments our data frame with the results.
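
Recent versions of *ggmap* send geocoding requests to the Google Geocoding API, so `mutate_geocode` may fail unless a key has been registered with `register_google()`. If no key is available, one possible fallback is to attach coordinates by hand. The sketch below assumes a toy data frame and rough, hand-picked coordinates; the region names, the `Visitors` column and all numbers are hypothetical placeholders, not values from the course data or from a geocoder. It only shows the shape of the result, which has `lon` and `lat` columns like the ones `mutate_geocode` adds.

```r
# Hypothetical stand-in for the visitor data (names and numbers are made up)
toy_vis <- data.frame(Region = c("Europe", "Asia", "Africa"),
                      Visitors = c(100, 200, 50),
                      stringsAsFactors = FALSE)

# Rough, hand-picked coordinates for illustration only (not geocoder output)
lookup <- data.frame(Region = c("Europe", "Asia", "Africa"),
                     lon = c(15, 100, 20),
                     lat = c(50, 45, 5),
                     stringsAsFactors = FALSE)

# Attach lon/lat to each row by matching on the Region column
toy_vis <- merge(toy_vis, lookup, by = "Region", all.x = TRUE)
print(toy_vis)
```

With real data, use coordinates looked up from a trusted source instead of these placeholders.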

In [None]:
library(ggplot2)
library(ggmap)

# geocode the place names in the Region column; this adds lon and lat columns to vis
vis <- mutate_geocode(vis, Region)
head(vis)
# get the map 
world <- map_data("world")

In [None]:
# get coordinates for US 
us <- geocode("United States")

**Exercise 1:** Create a flow map that shows the number of visitors to the US from different continents. Make sure color encodes the region and size encodes the number of visitors in 2015.

In [None]:
# Now first plot the world map 
ggplot() + 
geom_map(data=<- YOUR CODE HERE ->,map=<- YOUR CODE HERE ->, aes(<- YOUR CODE HERE ->), <- YOUR CODE HERE ->) +

# then plot the flow curves from continents to the same point in US. 
geom_curve(<- YOUR CODE HERE ->), 
           curvature=0.1, arrow=arrow(length=unit(0.05, "npc"))) +
# add coords themes etc. 
coord_equal() +
theme_void() + theme(legend.position="none")

**Exercise 2:** All flow curves end at the same point in the US and overlap. Instead of a single endpoint, let's use a separate endpoint in the US for each continent to make the map easier to read. Go to [latlong.net](http://www.latlong.net/), choose a good endpoint for each continent, add the coordinates to your data frame as ENDLON and ENDLAT attributes, and redraw the map. 

In [None]:
# <- YOUR CODE HERE ->

Next, we will work with the flight data from R_Projections and visualize flight routes. 
Let's get the data first. 

In [None]:
library(dplyr)
library(sp)
library(geosphere)

# airport codes and coordinates 
airports <- read.csv("/dsa/data/all_datasets/spatial/airports.csv", as.is=TRUE, header=TRUE)
# flight destinations and counts 
flights <- read.csv("/dsa/data/all_datasets/spatial/flights.csv", as.is=TRUE, header=TRUE)
airports$lat <- as.numeric(airports$lat)
airports$long <- as.numeric(airports$long)
# get airport locations
airport_locs <- airports[, c("iata","long", "lat")]

# Link airport lat long to origin and destination
OD <- left_join(flights, airport_locs, by=c("airport1"="iata"))
OD <- left_join(OD, airport_locs, by=c("airport2"="iata"))
head(OD)
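
Because both joins pull in columns named `long` and `lat`, dplyr disambiguates them with the suffixes `.x` (first join, origin) and `.y` (second join, destination). A minimal sketch with toy data, assuming a hypothetical `cnt` column and made-up airport codes and coordinates, shows the resulting column names:

```r
library(dplyr)

# Toy stand-ins for the real tables (codes, counts and coordinates are made up)
toy_flights <- data.frame(airport1 = c("AAA", "BBB"),
                          airport2 = c("BBB", "CCC"),
                          cnt = c(10, 20),
                          stringsAsFactors = FALSE)
toy_locs <- data.frame(iata = c("AAA", "BBB", "CCC"),
                       long = c(-90, -100, -120),
                       lat = c(38, 40, 34),
                       stringsAsFactors = FALSE)

# First join adds long/lat for the origin; the second join adds them for the
# destination, so the clashing names become long.x/lat.x and long.y/lat.y
toy_od <- left_join(toy_flights, toy_locs, by = c("airport1" = "iata"))
toy_od <- left_join(toy_od, toy_locs, by = c("airport2" = "iata"))
names(toy_od)
```

So `long.x`/`lat.x` describe `airport1` and `long.y`/`lat.y` describe `airport2`.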

**Exercise 3:** Add another attribute to the OD data frame that shows the distance between two airports and visualize only those routes that are longer than 1500 miles. 
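
As a quick check of the units involved, `distGeo` from the geosphere package takes points as (longitude, latitude) and returns meters. The sketch below uses approximate coordinates for St. Louis and Los Angeles, chosen only for illustration, and applies the meters-to-miles factor used in this notebook:

```r
library(geosphere)

# Approximate (longitude, latitude) pairs, for illustration only
st_louis <- c(-90.37, 38.75)
los_angeles <- c(-118.41, 33.94)

# distGeo returns the geodesic distance in meters
meters <- distGeo(st_louis, los_angeles)

# convert meters to miles
miles <- meters * 0.000621371

# TRUE if this route would pass a 1500-mile threshold
miles > 1500
```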

In [None]:
# Compute the geodesic distance (in meters) between each pair of airports using the geosphere library.
# distGeo expects two-column matrices of (longitude, latitude).
dd <- data.frame(Distance=distGeo(matrix(c(OD$long.x, OD$lat.x), ncol = 2), matrix(c(OD$long.y, OD$lat.y), ncol = 2)))
# convert meters to miles 
dd <- dd*0.000621371 
# Now add this to the OD data frame as another attribute and visualize only those routes that are longer than 1500 miles. 
head(dd)


# <- YOUR CODE HERE ->



**Exercise 4:** Let's read the Missouri county population data set and create a choropleth map that shows the population of each county in the year 2000. 

In [None]:
moco <- read.csv("/dsa/data/all_datasets/spatial/MO_2009_County.csv")

head(moco)

In [None]:
# The following creates a common "region" id shared by the map and the data.
# Get the Missouri counties map, drop the state column, and rename the county column to "region"
mo_map <- map_data("county","missouri")
mo_map <- mo_map[ ,-5]
names(mo_map)[5] <- 'region'

# make the county names lowercase
moco <- mutate(moco, region = tolower(COUNTYNAME))
head(moco)
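
The common id only works if the names match exactly after lowercasing. One way to check this, sketched here with hypothetical name vectors rather than the real files, is to compare the two sets with `setdiff`. Whether the actual data differs in punctuation like this is an assumption to verify, not a confirmed fact.

```r
# Hypothetical county names from a data file and from a map (not the real data)
data_names <- tolower(c("Boone", "St. Louis", "Cole"))
map_names <- c("boone", "cole", "st louis")

# Names present in the data but missing from the map
unmatched <- setdiff(data_names, map_names)
print(unmatched)

# One possible fix for this toy case: drop periods before matching
cleaned <- gsub(".", "", data_names, fixed = TRUE)
print(setdiff(cleaned, map_names))
```

Any name left in the first result would show up as an unfilled county on the map.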

In [None]:
# Now define the filling attribute with your data frame 
ggplot(<- YOUR CODE HERE ->) +

# and define your map with the common id 
geom_map(<- YOUR CODE HERE ->)+

# mapping stuff
expand_limits(x = mo_map$long, y = mo_map$lat) +
coord_map() + 
theme_void()

# Save your notebook, then File > Close and Halt