## Module 4 Practice - Heat Maps



This practice notebook contains exercises for plotting heat maps using R and the ggplot2 library.



A heat map is a 2D graphical representation of data where the individual values contained in a matrix are represented as colors. 
There are different types of heat maps used in different disciplines, each referred to by the term “heat map”, even though they use different visualization techniques. 

Heat maps:
  * directly map values from a matrix to 2D tiles, or 
  * visualize densities as shades of colors, usually using sequential color palettes, or 
  * represent data points in scatter plots where too many superimposed data points obscure the underlying distribution. 


Let's start with the diamonds data set and plot a heat map to mitigate the crowded data points in the plot.

In [None]:
library(ggplot2)
data(diamonds)
head(diamonds)
c <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha=0.5, color="lightblue")
c

As you can see, the chart is cluttered in the bottom left region.
Traditionally, we might use the `alpha` value to apply transparency so that denser regions build up a darker color.
However, sometimes the data is too dense or granular even for that approach.

Recall that in a previous lab, when using discrete values for _Hour-of-Day_ and _Day-of-Week_, we used `geom_tile()`.
Below, we will use `geom_bin2d()`, which is well suited to highly continuous data.
This gives us a more uniform grid pattern, as opposed to the rectangular pattern of the 24x7 layout.

In [None]:
# heat map
c <- ggplot(diamonds, aes(carat, price)) + 
    geom_bin2d(bins=30)   # the 2-dimensional binning of frequency counts
c
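
To make the idea of 2-dimensional binning concrete, the sketch below counts the diamonds per cell by hand with base R's `cut()` and `table()`. It only approximates what `geom_bin2d(bins=30)` does: the exact bin boundaries ggplot2 chooses may differ, and the names `n_bins`, `carat_bin`, `price_bin`, and `counts` are made up for this illustration.

```r
library(ggplot2)   # used here only for the diamonds data set
data(diamonds)

n_bins <- 30       # same number of bins per axis as the plot above

# Split each variable into 30 equal-width intervals
carat_bin <- cut(diamonds$carat, breaks = n_bins)
price_bin <- cut(diamonds$price, breaks = n_bins)

# Cross-tabulate: one cell per (carat interval, price interval) pair
counts <- table(carat_bin, price_bin)

dim(counts)                      # number of carat bins x number of price bins
sum(counts) == nrow(diamonds)    # every diamond falls into exactly one cell

# Locate the fullest cell, i.e. the region where the points pile up
which(counts == max(counts), arr.ind = TRUE)
```

Each entry of `counts` plays the role of one tile in the heat map, with the fill color standing in for the count.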

Alternatively, we can render the density as a surface using `stat_density_2d`.
This is a statistical function similar to the probability density function drawn over a histogram in a prior lab.

In [None]:
# Or we can add a 2D density estimate to the plot
c <- ggplot(diamonds, aes(carat, price)) +  
    stat_density_2d(aes(fill = ..level..), geom="polygon")  # the stat_ functions are approximations
                                                            # representing the data
c
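
As a rough illustration of what a 2D density estimate is, the sketch below calls `MASS::kde2d()` directly, which is the function the ggplot2 documentation names as the estimator behind `stat_density_2d`. The data are synthetic (two made-up clusters), and `dens`, `peak`, `peak_x`, and `peak_y` are hypothetical names used only here.

```r
library(MASS)   # provides kde2d(), a 2D kernel density estimator

set.seed(42)
# Synthetic data: a large cluster near (0, 0) and a small one near (3, 3)
x <- c(rnorm(500, mean = 0, sd = 0.5), rnorm(100, mean = 3, sd = 0.5))
y <- c(rnorm(500, mean = 0, sd = 0.5), rnorm(100, mean = 3, sd = 0.5))

# Estimate the density on a 50 x 50 grid spanning the range of the data
dens <- kde2d(x, y, n = 50)
dim(dens$z)     # one density value per grid point

# Grid coordinates where the estimated density is highest
peak   <- which(dens$z == max(dens$z), arr.ind = TRUE)
peak_x <- dens$x[peak[1, "row"]]
peak_y <- dens$y[peak[1, "col"]]
c(peak_x, peak_y)   # expected to land close to the large cluster
```

The contour levels that ggplot2 draws are lines of equal height on a surface like `dens$z`, which is why the result is a smooth approximation and not a count.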

## <span style="background:yellow">Your Turn</span>

Notice that the shape and extent of the statistical plot do not visually match what the previous heat map with 2-D boxes would lead you to expect.

In the cell below, correct the perspective of the above plot.
Add a code comment explaining what you are correcting and why.


In [None]:
# Add your code below this comment
# --------------------------------









---

#### New data

Let's plot a matrix heat map using NBA basketball statistics data. 

In [None]:
library(RColorBrewer)
nba <- read.csv("/dsa/data/all_datasets/ppg2008.csv", sep=",")
head(nba)

In [None]:
# sort it 
nba <- nba[order(nba$PTS),]
# add names 
row.names(nba) <- nba$Name
nba <- nba[,2:20]
# heatmap() requires a numeric matrix, so convert the data frame
nba_matrix <- data.matrix(nba)
# plot it using the built/default heatmap (http://www.r-graph-gallery.com/215-the-heatmap-function/)

nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, col = brewer.pal(9, "Blues"), scale="column", margins=c(5,10))
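
The `scale="column"` argument matters because the statistics are in very different units. As a small sketch of the idea, base R's `scale()` centers and scales each column in the same spirit; the toy matrix `m` and its column name `RATE` are invented for this example and are not part of the NBA data.

```r
# Toy matrix: two columns measured on very different scales
m <- matrix(c(10, 20, 30,
              0.1, 0.2, 0.3),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("PTS", "RATE")))

# Center each column on its mean and divide by its standard deviation
z <- scale(m)
z

# After scaling, both columns span the same range and can share one color scale
range(z[, "PTS"])
range(z[, "RATE"])
```

Without column scaling, the column with the largest raw values would dominate the color range and the others would look nearly uniform.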

## Spatial Heatmap
Let's load the familiar King County data for house prices. We will plot a heat map to show the distribution of houses across the number of floors and the year in which the house was built. The blue tiles represent a larger number of houses built and the white tiles represent a smaller number of houses built.

In [None]:
kc_house_data = read.csv("/dsa/data/all_datasets/house_sales_in_king_county/kc_house_data.csv")
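
The floors-by-year heat map described above could be sketched as follows. This is only an illustration under assumptions: the toy data frame `houses` stands in for `kc_house_data`, and it assumes the real data has columns named `floors` and `yr_built`, which is worth confirming with `names(kc_house_data)` before adapting the code.

```r
library(ggplot2)

set.seed(1)
# Toy stand-in for the house data (values are randomly generated)
houses <- data.frame(
  yr_built = sample(1990:1999, 200, replace = TRUE),
  floors   = sample(c(1, 1.5, 2, 3), 200, replace = TRUE,
                    prob = c(0.5, 0.1, 0.3, 0.1))
)

# Count the houses for every (year built, floors) combination
tile_counts <- as.data.frame(table(yr_built = houses$yr_built,
                                   floors   = houses$floors))

# One tile per combination: white for few houses, blue for many
p <- ggplot(tile_counts, aes(x = yr_built, y = floors, fill = Freq)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue")
p
```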

Heat maps are frequently used with **`geospatial maps`**. 
Let's generate the map for **`Seattle`** and overlay a density heat map on it. 

`get_map()` in **`ggmap`** fetches the geospatial map of the supplied location. 
It queries Google Maps, OpenStreetMap, Stamen Maps or Naver Map servers for a map. 

The location can be an address, longitude/latitude pair (in that order), or left/bottom/right/top bounding box. 

**External Reference:**
  * [ggmap](https://www.rdocumentation.org/packages/ggmap/versions/2.6.1/topics/ggmap)
  * [get_map()](https://www.rdocumentation.org/packages/ggmap/versions/2.6.1/topics/get_map)

In [None]:
table(is.na(kc_house_data$long))

In [None]:
library(ggmap)
kc_map_outline <- get_map(location='Seattle', zoom=11)
kc_map <- ggmap(kc_map_outline)
kc_map <- kc_map + geom_point(data=kc_house_data, aes(x=long, y=lat), 
                              inherit.aes=FALSE,color='red', alpha=0.2)
kc_map

**`geom_density2d`** in the code below draws 2D contours based on the density of the data points. The more contour lines surround a region, the higher the density of points there. **`stat_density2d`** shows the heat based on the number of data points. The map is shaded red where a region contains many data points and yellow where it contains few. 

**Reference:**

- [geom_density2d](https://www.rdocumentation.org/packages/ggplot2/versions/1.0.1/topics/geom_density2d)
- [stat_density2d](https://www.rdocumentation.org/packages/ggplot2/versions/1.0.1/topics/stat_density2d)

In [None]:
ggmap(kc_map_outline) + 
    geom_density2d(data = kc_house_data, aes(x = long, y = lat), size = 0.3) + 
    stat_density2d(data = kc_house_data, 
                   aes(x = long, y = lat, fill = ..level.., alpha = ..level..), 
                   size = 0.01, bins = 16, geom = "polygon") + 
    scale_fill_gradient(low = "yellow", high = "red") + 
    scale_alpha(range = c(0, 0.3), guide = "none")   # "none" hides the alpha legend

The plot below is a scatter plot where the size of each data point indicates the price of the house. 
By using transparency, we can efficiently visualize superimposed data points and also approximate a heat map.

In [None]:
# Scale factor that converts a price in dollars into a point size
circle_scale_amt <- 0.000001
ggmap(kc_map_outline) + 
    geom_point(data=kc_house_data, 
               aes(x=long, y=lat), col="orange",
               alpha=0.3, 
               # size is set directly (outside aes), so no size scale is needed
               size=kc_house_data$price*circle_scale_amt)
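
The cell above sets the point size from a hand-scaled vector outside `aes()`. A possible alternative, sketched here on made-up data, is to map `price` to `size` inside `aes()` and let `scale_size_continuous()` choose the point sizes. The data frame `toy`, its values, and the size range `c(0.5, 6)` are assumptions for illustration only.

```r
library(ggplot2)

set.seed(7)
# Made-up points scattered around an arbitrary longitude/latitude
toy <- data.frame(
  long  = rnorm(300, mean = -122.3, sd = 0.05),
  lat   = rnorm(300, mean = 47.6, sd = 0.05),
  price = round(runif(300, min = 1e5, max = 2e6))
)

# Mapping price to size lets the scale rescale prices into a size range
p <- ggplot(toy, aes(x = long, y = lat, size = price)) +
  geom_point(colour = "orange", alpha = 0.3) +   # transparency stacks where points overlap
  scale_size_continuous(range = c(0.5, 6))       # smallest and largest point sizes
p
```

With this approach the `range` argument is expressed in point sizes, not in dollars, and ggplot2 adds a size legend automatically.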

## <span style="background:yellow">YOUR TURN</span>

Center and zoom the map on Mercer Island by adjusting the parameters of the `get_map()` function.  
Here is the API documentation:

```
get_map(
    location = c(lon = -95.3632715, lat = 29.7632836), 
    zoom = "auto", 
    scale = "auto", 
    maptype = c("terrain", "terrain-background", "satellite", "roadmap", "hybrid", "toner", "watercolor", "terrain-labels", "terrain-lines", "toner-2010", "toner-2011", "toner-background", "toner-hybrid", "toner-labels", "toner-lines", "toner-lite"), 
    source = c("google", "osm", "stamen", "cloudmade"), 
    force = ifelse(source == "google", TRUE, TRUE), 
    messaging = FALSE, 
    urlonly = FALSE, 
    filename = "ggmapTemp", 
    crop = TRUE, 
    color = c("color", "bw"), 
    language = "en-EN", 
    api_key
 )
```

Render the last version of the spatial heat map, with the size of each data point indicating the price of the house and transparency used to approximate a heat map.
Make the points shades of red.

In [None]:
# A) EDIT the code below this comment by replacing the 
# "<FIX_ME>" placeholders with the correct code
# ---------------------------------------------------
library(ggmap)


kc_map_outline <- get_map(location=<FIX_ME>, 
                              zoom=<FIX_ME>)
kc_map <- ggmap(kc_map_outline)


In [None]:
# B) Add code below this comment to draw onto the map
# the points and render
# ---------------------------------------------------








# SAVE YOUR NOTEBOOK, and then File > "Close and Halt"