# Bubble Charts

This notebook has exercises for plotting bubble charts using R. 

A bubble chart is a special type of scatter plot in which each point mark is also encoded with, typically, two more visual variables (e.g., color and size) to represent two more attributes of a data point.
In essence, bubble charts plot 4 dimensions (or features/fields) on a two-dimensional plot.
Therefore, the four visual variables are:
  1. X position
  1. Y position
  1. size
  1. color
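As a minimal sketch of the four encodings listed above, the cell below maps four columns of R's built-in `mtcars` data set to X position, Y position, size, and color. The choice of `mtcars` and of these particular columns is ours, for illustration only; the lab itself uses a crime data set. `layer_data()` returns the data ggplot2 actually draws, with one column per aesthetic, so it is a convenient way to confirm that all four encodings are present.

```r
library(ggplot2)

# Illustrative only: four numeric columns of the built-in mtcars data set
#   x      <- wt   (weight, 1000 lbs)
#   y      <- mpg  (miles per gallon)
#   size   <- hp   (gross horsepower)
#   colour <- qsec (1/4 mile time)
bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp, colour = qsec)) +
  geom_point()

# The built layer has one column per aesthetic actually drawn
drawn <- layer_data(bubble)
print(head(drawn[, c("x", "y", "size", "colour")]))
bubble
```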

A famous example of an interactive, animated bubble chart is Gapminder, which you can explore [here (external link)](http://gapminder.org/tools).


In the lab below, we use a new crime data set that provides aggregate statistics by state.

In [None]:
library(ggplot2)
crime <- read.csv("/dsa/data/all_datasets/crime.csv")
head(crime)

In [None]:
summary(crime)

## Building from a scatter plot

Our first two visual variables are:
  1. X = Murder measurement
  1. Y = Burglary measurement

In [None]:
# First, create a scatter plot: position each point at (murder, burglary)
# and render it with geom_point()
(p <- ggplot(crime, aes(murder, burglary)) + geom_point())


## Adding Size

We can then add a third visual variable: state population, encoded as the **size** of the points.

In [None]:
# Add size visual variable to encode state population
(p <- ggplot(crime, aes(murder,burglary,size=population)) + geom_point())
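By default, ggplot2's `scale_size()` maps values to point *area* rather than radius, which matches how people perceive bubble size; `scale_radius()` maps to radius instead. A related option, `scale_size_area()`, also pins a value of 0 to size 0, so that drawn area is proportional to the value itself. The sketch below uses a made-up three-row data frame (not the crime data) and checks that values 1, 4, and 9 produce sizes whose squares, and hence areas, are in the ratio 1 : 4 : 9.

```r
library(ggplot2)

# Made-up values for illustration only (not from crime.csv)
toy <- data.frame(x = 1:3, y = 1:3, pop = c(1, 4, 9))

# scale_size_area() maps 0 to size 0 and makes the drawn area
# (proportional to size^2) proportional to pop
p_area <- ggplot(toy, aes(x, y, size = pop)) +
  geom_point() +
  scale_size_area(max_size = 9)

sizes <- layer_data(p_area)$size
print(sizes)
print((sizes / sizes[1])^2)   # relative areas of the three bubbles
```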


## Adding a few more visual enhancements

  1. Changing color
  1. Writing text over the bubble

In [None]:
# Add the state names as labels (a label annotates a data point)
# and rewrite the axis labels
p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
p <- p + geom_point(colour="red", alpha=1/4) + geom_text(size=2)
p + xlab("Murders per 1,000 population") + ylab("Burglaries per 1,000")
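Labels drawn directly on top of their bubbles, as in the cell above, can be hard to read. One option, sketched here on the built-in `mtcars` data rather than the crime data, is `geom_text()`'s `nudge_y` argument, which shifts every label vertically by a fixed amount in data units; `check_overlap = TRUE` additionally skips labels that would overlap text already drawn in the same layer. The nudge of 1 mpg is an arbitrary illustrative value.

```r
library(ggplot2)

# Illustrative data: mtcars, with car names moved into a column for labelling
cars <- data.frame(car = rownames(mtcars), wt = mtcars$wt,
                   mpg = mtcars$mpg, hp = mtcars$hp)

p_lab <- ggplot(cars, aes(wt, mpg, size = hp, label = car)) +
  geom_point(colour = "red", alpha = 1/4) +
  # raise each label 1 mpg above its bubble; skip labels that would overlap
  geom_text(size = 2, nudge_y = 1, check_overlap = TRUE)

# Layer 2 is the text layer; its y values are the nudged positions
text_y <- layer_data(p_lab, 2)$y
p_lab
```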

## Color as a visual variable

Recall from day 2 that color is one of the most basic visual variables.
We will use the color of the points to encode another measurement, `motor_vehicle_theft`.


In [None]:
# Now we will add another visual variable: color will encode motor_vehicle_theft
p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
p <- p + geom_point(aes(colour=motor_vehicle_theft)) + geom_text(size=2)
p + xlab("Murders per 1,000 population") + ylab("Burglaries per 1,000")

Additionally, we can use color palettes to provide more distinctive colors for our plot.

In [None]:
# Fix the color palette: a yellow-to-purple gradient
p <- ggplot(crime, aes(murder,burglary,size=population, label=state))
p <- p + geom_point(aes(colour=motor_vehicle_theft)) + geom_text(size=2)
p <- p + xlab("Murders per 1,000 population") + ylab("Burglaries per 1,000") 
p + scale_color_continuous(low="yellow", high="purple")
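The yellow-to-purple gradient above is a hand-picked choice. ggplot2 also ships the viridis palettes (`scale_colour_viridis_c()` for continuous data), which are designed to be perceptually uniform and readable by people with common forms of color blindness. The sketch below applies it to `mtcars` as an alternative, not as what the lab prescribes.

```r
library(ggplot2)

p_vir <- ggplot(mtcars, aes(wt, mpg, size = hp, colour = qsec)) +
  geom_point() +
  # continuous viridis scale; option = "D" (viridis) is the default
  scale_colour_viridis_c()

# The colours ggplot2 assigned to each point, as hex strings
cols <- layer_data(p_vir)$colour
p_vir
```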


## <span style="background:yellow">Your Turn</span>

Take note of the visual variables in the plot above.
A key aspect of the expressive power of bubble charts is that they integrate multiple measurements in one view, and humans can perceive relationships among the data points across those measurements.

Below, describe an insight that the plot above gives you about crime rates.

## <span style="background:yellow">Your Turn</span>

Modify the crime bubble chart to discover something you find interesting, expected, or unexpected.

  1. Change around the visual variables, including possibly the measure for X, Y, size, and color.
  1. Below, describe the thing you found in step 1.

In [None]:
# 1) Write your code below this comment
# -------------------------------------







## Let's do a similar plot with air quality data

The `airquality` data set is built into R (in the `datasets` package), as are various other data sets (e.g., `iris`).
The `data()` function loads one of these well-known learning data sets into your workspace.

All the data sets are described here:
  *  https://r.org/R-manual/R-devel/library/datasets/html/00Index.html
     * [Local Mirror](/static/mirror_sites/r.org/R-manual/R-devel/library/datasets/html/00Index.html)
  
**Air Quality**:
```
Description

Daily air quality measurements in New York, May to September 1973.
Usage

airquality

Format

A data frame with 153 observations on 6 variables.
[,1]  Ozone    numeric  Ozone (ppb)
[,2]  Solar.R  numeric  Solar R (lang)
[,3]  Wind     numeric  Wind (mph)
[,4]  Temp     numeric  Temperature (degrees F)
[,5]  Month    numeric  Month (1--12)
[,6]  Day      numeric  Day of month (1--31)
```
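The help-page summary above can be checked directly. This base-R sketch confirms the dimensions (May 1 through September 30 is 153 days) and counts missing values per column. The missing `Ozone` readings matter later: ggplot2 drops rows with a missing x or y and prints a warning about removed rows.

```r
library(datasets)
data(airquality)

# Rows and columns: one row per day from May 1 to September 30, 6 variables
print(dim(airquality))

# Number of missing values in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)
```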

In the code below, we load the data set and do a little data carpentry: we keep only July, August, and September and turn `Month` into a labeled factor.

In [None]:
library(datasets)
data(airquality)
head(airquality)
# Keep only the last three months (July, August, September) and create a month label variable
aq_trim <- airquality[airquality$Month %in% 7:9, ]
aq_trim$Month <- factor(aq_trim$Month, labels = c("July", "August", "September"))
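`factor(..., labels = ...)` assigns the labels to the sorted unique values, so 7, 8, and 9 become July, August, and September in that order. The base-R sketch below rebuilds the same subset (using `%in%` as a compact equivalent of chained `==` tests, so it runs on its own) and checks the number of days per month.

```r
library(datasets)
data(airquality)

# Same subset as above: months 7, 8 and 9 only
aq_check <- airquality[airquality$Month %in% 7:9, ]
aq_check$Month <- factor(aq_check$Month,
                         labels = c("July", "August", "September"))

# Days per month: July and August have 31 days, September has 30
month_counts <- table(aq_check$Month)
print(month_counts)
```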

In [None]:
# Let's start with a basic scatter plot
pa <- ggplot(aq_trim, aes(x = Day, y = Ozone)) + geom_point()
pa
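Because some `Ozone` values are missing, the plot above draws fewer points than there are rows in `aq_trim`, and ggplot2 warns about the removed rows. Two common ways to handle this are sketched below on a freshly built subset: filter the missing rows out before plotting, or pass `na.rm = TRUE` to `geom_point()` so that they are dropped without the warning.

```r
library(ggplot2)
library(datasets)
data(airquality)

aq <- airquality[airquality$Month %in% 7:9, ]

# Option 1: drop rows without an Ozone reading before plotting
aq_ok <- aq[!is.na(aq$Ozone), ]
pa_ok <- ggplot(aq_ok, aes(x = Day, y = Ozone)) + geom_point()

# Option 2: keep every row and let geom_point() drop missing values
# quietly instead of printing a "Removed ... rows" warning
pa_quiet <- ggplot(aq, aes(x = Day, y = Ozone)) + geom_point(na.rm = TRUE)

print(c(rows = nrow(aq), with_ozone = nrow(aq_ok)))
pa_quiet
```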

In [None]:
# Encode wind speed with the size channel
pa <- ggplot(aq_trim, aes(x = Day, y = Ozone, size = Wind)) + geom_point()
pa

In [None]:
# Fix the x-axis: put a break every 5 days (1, 6, ..., 31)
pa + scale_x_continuous(breaks = seq(1, 31, 5))
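`seq(1, 31, 5)` starts at 1 and steps by 5, so the x-axis gets a tick every five days and, because 31 - 1 = 30 is a multiple of 5, the last break lands exactly on day 31. A base-R check, with a weekly alternative (not used in the lab) for comparison:

```r
# Breaks used above: every 5 days starting at day 1
five_day <- seq(1, 31, by = 5)
print(five_day)

# A possible alternative, not used in the lab: weekly ticks
weekly <- seq(1, 31, by = 7)
print(weekly)
```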

In [None]:
# Add a title and axis labels, and switch to shape 21: a circle with separate border (colour) and interior (fill)
pa <- ggplot(aq_trim, aes(x = Day, y = Ozone, size = Wind)) +
        geom_point(shape = 21, colour = "mediumvioletred",
                   fill = "springgreen") +
        ggtitle("Air Quality in New York by Day") +
        labs(x = "Day of the month", y = "Ozone (ppb)") +
        scale_x_continuous(breaks = seq(1, 31, 5))
pa
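Shape 21 is one of R's filled point shapes (21 to 25), which take two colors: `colour` for the border and `fill` for the interior. The default point shape (19) is a solid circle that uses only `colour`. The sketch below reuses the lab's colors on a freshly built subset, so that it runs on its own, and uses `layer_data()` to show the constant shape and colors ggplot2 records for the layer.

```r
library(ggplot2)
library(datasets)
data(airquality)

aq <- airquality[airquality$Month %in% 7:9, ]

pa_21 <- ggplot(aq, aes(x = Day, y = Ozone, size = Wind)) +
  # shape 21: border drawn with 'colour', interior filled with 'fill'
  geom_point(shape = 21, colour = "mediumvioletred", fill = "springgreen",
             na.rm = TRUE)

drawn <- layer_data(pa_21)
print(unique(drawn[, c("shape", "colour", "fill")]))
```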

In [None]:
# Encode temperature with the fill color
pa <- ggplot(aq_trim, aes(x = Day, y = Ozone, size = Wind, fill = Temp)) +
        geom_point(shape = 21) +
        ggtitle("Air Quality in New York by Day") +
        labs(x = "Day of the month", y = "Ozone (ppb)") +
        scale_x_continuous(breaks = seq(1, 31, 5))
pa

In [None]:
# and change the color scheme
pa + scale_fill_continuous(low = "plum1", high = "purple4")

In [None]:
# We can also color by Month, which is categorical (a factor)
pa <- ggplot(aq_trim, aes(x = Day, y = Ozone, size = Wind, fill = Month)) +
        geom_point(shape = 21) +
        ggtitle("Air Quality in New York by Day") +
        labs(x = "Day of the month", y = "Ozone (ppb)") +
        scale_x_continuous(breaks = seq(1, 31, 5))
pa
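The last few cells differ in the kind of scale ggplot2 picks for `fill`: a numeric column such as `Temp` gets a continuous gradient with a colorbar legend, while a factor such as `Month` gets a discrete palette with one color per level. The self-contained sketch below counts the distinct fill colors each version assigns; three months should give exactly three.

```r
library(ggplot2)
library(datasets)
data(airquality)

aq <- airquality[airquality$Month %in% 7:9, ]
aq$Month <- factor(aq$Month, labels = c("July", "August", "September"))

# Continuous variable -> continuous gradient
p_temp <- ggplot(aq, aes(Day, Ozone, size = Wind, fill = Temp)) +
  geom_point(shape = 21, na.rm = TRUE)
# Factor -> discrete palette, one colour per level
p_month <- ggplot(aq, aes(Day, Ozone, size = Wind, fill = Month)) +
  geom_point(shape = 21, na.rm = TRUE)

n_fill_temp  <- length(unique(layer_data(p_temp)$fill))
n_fill_month <- length(unique(layer_data(p_month)$fill))
print(c(temp = n_fill_temp, month = n_fill_month))
```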

## <span style="background:yellow">Your Turn</span>

Modify the air quality bubble chart to use Ozone as the size or color.
Change around the visual variables, including possibly the measure for X, Y, size, and color.

  1. Produce two different visualizations, A) and B)
  1. Compare / Contrast the information conveyed in those two visualizations.

In [None]:
# 1.A) Write your code below this comment
# -------------------------------------







In [None]:
# 1.B) Write your code below this comment
# -------------------------------------







# SAVE YOUR NOTEBOOK, then File > "Close and Halt"