# Lab 3 - Bubble Charts

### Data Visualization and Perception

The efficiency of a visualization should be judged by how easily, accurately, and meaningfully we can perceive the story in the data. 
To achieve this, we exploit the perceptual strengths of design elements. 
As we have seen in the data visualization course, **human visual perception principles** guide us toward efficient designs. 

Data visualization should strike a **balance between perception and cognition** to take fuller advantage of the brain's abilities. 
Seeing (i.e., visual perception), which is handled by the visual cortex located in the rear of the brain, is extremely fast and efficient. 
Thinking (i.e., cognition), which is handled primarily by the cerebral cortex in the front of the brain, 
is much slower and less efficient. 
Traditional data presentation methods require conscious thinking for almost all of the work. 
**Data visualization shifts the balance toward greater use of visual perception, taking advantage of our powerful visual perception whenever possible.**



<img src="../images/brain-balance-perception-cognition.jpg">

In this lab notebook, we will create and study **interactive bubble charts**, whose visual encodings must be designed carefully so that the data can be perceived effectively. 

### Bubble charts in ggplot2

We have seen how to create bubble charts with ggplot2 in the data visualization course. 
We will use the same concept to create Shiny apps that display interactive bubble charts. 
We will use the **gapminder** data set.

In [None]:
library(ggplot2)
library(dplyr)
gapminder <- readRDS(file = "/dsa/data/all_datasets/gapminder_data.rds")

gm <- gapminder 

# convert population from integer to numeric -- needed for accurate computation of sum
gm <- transform(gm, pop = as.numeric(pop))
head(gm)
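
The conversion matters because R integers are 32-bit: a sum that exceeds `.Machine$integer.max` (2147483647) comes back as `NA` with a warning instead of a number. A minimal illustration, using made-up values rather than gapminder rows:

```R
# Two made-up populations, each valid as an integer on its own
big <- c(2000000000L, 2000000000L)

# The integer sum overflows: R returns NA (and warns)
int_sum <- suppressWarnings(sum(big))

# Converting to numeric (double) first gives the expected total
num_sum <- sum(as.numeric(big))

is.na(int_sum)   # TRUE
num_sum          # 4e+09
```

Continent-level population totals can exceed this limit, which is why `pop` is converted before summing.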

In [None]:
# group by continent and year and compute total pops and WEIGHTED life expectancy, gdp per cap. 

cgm <- gm %>%
  group_by(continent, year) %>%
  summarise(
    totpop       = sum(pop),
    avglifeExp   = sum(pop * lifeExp) / totpop,
    avggdpPercap = sum(pop * gdpPercap) / totpop,
    numCountries = n()
  )

head(cgm)
# list of years we'll use for the UI element
levels(factor(cgm$year))
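
To see why the averages above are weighted by population, here is a small sketch with two made-up countries (the numbers are for illustration only and are not taken from gapminder). It also checks the hand-written formula against base R's `weighted.mean()`:

```R
# Toy numbers, made up for illustration
toy <- data.frame(
  country = c("A", "B"),
  pop     = c(1e6, 9e6),
  lifeExp = c(80, 60)
)

# Unweighted mean treats both countries equally
simple_avg <- mean(toy$lifeExp)

# Population-weighted mean, same formula as in the summarise() call
weighted_avg <- sum(toy$pop * toy$lifeExp) / sum(toy$pop)

# Base R's weighted.mean() should agree with the manual formula
same <- isTRUE(all.equal(weighted_avg, weighted.mean(toy$lifeExp, toy$pop)))

simple_avg     # 70
weighted_avg   # 62
```

The weighted value sits closer to the larger country, which is the behavior we want when a continent-level figure should describe its people rather than its countries.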

**First, we will create a simple plot with no user interaction.**

In [None]:
#DEPLOY TO SHINY SERVER
dir <- getwd() #This gets the current Working Directory
course <- "DATA-SCI-8654" #This is to specify the course path for the shiny server
folder <- "module3-bubble1" #This specifies the folder name to copy

system(sprintf("/usr/local/bin/shiny_deploy %s %s %s", course, dir,folder), 
       intern = TRUE,
       ignore.stdout = FALSE, 
       ignore.stderr = FALSE,
       wait = TRUE, 
       input = NULL)

This creates a simple **scatter plot** of gdpPercap versus lifeExp for a single year. 
**Next, we use two visual channels, size and color, to encode population and continent.**

```R
# REPLACE the ggplot line with this:
p <- ggplot(data=subset(gm,year==1952),
  aes_string(x="gdpPercap",y="lifeExp",size="pop",color="continent"))+geom_point()
```

Let's explore different visuals by creating a scatter plot using **continent data** to show _GDP per cap_ vs. _life expectancy_.
We will explore the options in the Jupyter notebook first. 

**Visual channels** we will use: 

 - *position* = life exp and GDP per cap
 - *size* = population
 - *color* = continent

This plot contains all years; 
**trends** are visible by **perceptual grouping**, 
although this may not work for all kinds of data if trends are not "obvious". 

**Remember the Gestalt principles from the data visualization course; they are at work here:** similarity (color), proximity and continuity (position).

Run the following cells in the notebook:

In [None]:
p <- ggplot(data=cgm, aes(x=avggdpPercap, y=avglifeExp, size=totpop, color=continent)) + geom_point()
p

In [None]:
# make the plot nicer for less distraction

p <- p + scale_size(range = c(0, 10)) + guides(size = "none")  # "none" replaces the deprecated FALSE
p
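
One thing to watch when tuning bubble sizes: `scale_size(range = c(0, 10))` maps the smallest value in the data to size 0, so the smallest bubble can be hard to see. A possible alternative is `scale_size_area()`, which maps a value of zero to zero area and keeps areas proportional to the data. The sketch below uses a made-up data frame (`toy`, not real gapminder values) so that it runs on its own:

```R
library(ggplot2)

# Made-up stand-in for cgm, for illustration only
toy <- data.frame(
  continent    = c("A", "B", "C"),
  avggdpPercap = c(1000, 5000, 20000),
  avglifeExp   = c(50, 65, 78),
  totpop       = c(1e7, 4e7, 1.6e8)
)

base <- ggplot(toy, aes(x = avggdpPercap, y = avglifeExp,
                        size = totpop, color = continent)) +
  geom_point()

# Smallest totpop is mapped to the lower end of the range, i.e. size 0
p_range <- base + scale_size(range = c(0, 10))

# Area proportional to totpop; only a totpop of 0 would get size 0
p_area <- base + scale_size_area(max_size = 10)
```

Printing `p_range` and `p_area` side by side shows the difference for the smallest bubble.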

In [None]:
# CONNECTION is the strongest cue for perceptual grouping; let's try that. 
pline <- p + geom_line(size=0.5)
pline

In [None]:
# We can also connect the bubbles for a single continent only 

p + geom_line(data=subset(cgm, continent=="Africa"), size=0.5)
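
A caveat when connecting bubbles over time: `geom_line()` joins points in order of the x variable, while `geom_path()` joins them in the order they appear in the data. If a continent's average GDP per capita ever decreases between years, the two will draw different lines. The sketch below uses a made-up trajectory (`toy` is hypothetical) to show the difference:

```R
library(ggplot2)

# Made-up trajectory in which x (gdp) dips in the middle year
toy <- data.frame(
  year = c(1990, 2000, 2010),
  gdp  = c(3000, 2000, 4000),
  life = c(50, 55, 60)
)

# geom_line(): joins points in order of x, i.e. 2000 -> 1990 -> 2010
p_line <- ggplot(toy, aes(x = gdp, y = life)) + geom_point() + geom_line()

# geom_path(): joins points in row order, so sort by year first
toy <- toy[order(toy$year), ]
p_path <- ggplot(toy, aes(x = gdp, y = life)) + geom_point() + geom_path()
```

When the line is meant to show a path through time, sorting by year and using `geom_path()` is the safer choice.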

In [None]:
# Let's create a bubble chart for a given YEAR only 

p2 <- ggplot(data=subset(cgm, year==1977), aes(x=avggdpPercap, y=avglifeExp, size=totpop, color=continent)) + geom_point()
p2

In [None]:
# Fewer distractions 

p2 <- p2 + scale_size(range = c(0, 10)) + ylim(0,90) + xlim(0, 40000) + theme(legend.position="none") 
p2

In [None]:
# Show the full time series for one continent only 

p2 <- p2 + geom_line(data=subset(cgm, continent=="Africa"), size=0.5) + geom_point(data=subset(cgm, continent=="Africa"), size=1)
p2

In [None]:
# Add annotations for easy comparison: labels keep the eyes on the plot, so no legend is needed. 

p2 <- p2 + geom_text(aes(label=continent), size=3, check_overlap = TRUE, vjust = 0, nudge_y = 1, nudge_x = -2000)
p2

Now we will turn this into an interactive plot in Shiny.
First, we will let the user choose continents with checkboxes that turn the time series for each continent _on_ or _off_. 
For this, we need the **Checkbox Group Input Control**. 
The following code creates five checkboxes:

```R
checkboxGroupInput("check", "Choose continents:",
  choices = list("Africa", "Americas", "Asia", "Europe", "Oceania")
)
```

Let's put this in the UI, and use the ```check``` input to choose which continents to highlight. 
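
As a rough sketch of how the `check` input might drive the highlighting (the deployed app may be organized differently), the server can subset the data with `input$check` and pass the result to `geom_line()`. The data frame `toy_cgm`, the helper `highlight_rows()`, and the output id `bubble` are hypothetical names introduced for this example, and the numbers are made up:

```R
library(shiny)
library(ggplot2)

# Made-up stand-in for cgm so the sketch runs on its own
toy_cgm <- data.frame(
  continent    = rep(c("Africa", "Asia"), each = 3),
  year         = rep(c(1952, 1977, 2007), times = 2),
  avggdpPercap = c(1000, 2000, 3000, 1500, 3500, 6000),
  avglifeExp   = c(40, 50, 55, 45, 60, 70),
  totpop       = c(2e8, 4e8, 9e8, 1e9, 2e9, 4e9)
)

# Hypothetical helper: rows for the continents ticked in the checkbox group.
# input$check is NULL when nothing is ticked, which yields zero rows here.
highlight_rows <- function(data, selected) {
  data[data$continent %in% selected, , drop = FALSE]
}

ui <- fluidPage(
  checkboxGroupInput("check", "Choose continents:",
    choices = list("Africa", "Asia")),
  plotOutput("bubble")
)

server <- function(input, output) {
  output$bubble <- renderPlot({
    ggplot(subset(toy_cgm, year == 1977),
           aes(x = avggdpPercap, y = avglifeExp,
               size = totpop, color = continent)) +
      geom_point() +
      # time series only for the ticked continents
      geom_line(data = highlight_rows(toy_cgm, input$check), size = 0.5)
  })
}

app <- shinyApp(ui, server)  # run interactively with runApp(app)
```

Because `%in%` returns all `FALSE` for a `NULL` selection, no extra check is needed for the case where every box is unticked.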

In [None]:
#DEPLOY TO SHINY SERVER
dir <- getwd() #This gets the current Working Directory
course <- "DATA-SCI-8654" #This is to specify the course path for the shiny server
folder <- "module3-bubble2" #This specifies the folder name to copy

system(sprintf("/usr/local/bin/shiny_deploy %s %s %s", course, dir,folder), 
       intern = TRUE,
       ignore.stdout = FALSE, 
       ignore.stderr = FALSE,
       wait = TRUE, 
       input = NULL)

You should see an interface similar to this: 

<img src="../images/bubble2.png">


Next, we will add another interface element for the **year**. 
We will allow the user to choose only one year, so we can't use ```checkboxGroupInput```. Instead, we can use ```selectInput``` or ```radioButtons``` like this:

```R
yearlist <- levels(factor(cgm$year))  # put this outside the UI, will run only once.
# the following will be in the UI:
radioButtons("chooseyear", "Choose Year:", choices = yearlist)
```
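
A detail to keep in mind on the server side: `levels(factor(...))` returns character strings, and Shiny delivers the selected choice as a string as well, so converting it back to a number before filtering keeps the comparison explicit. The sketch below uses made-up data and shows `selectInput` as the drop-down alternative; `toy` and `rows_for_year()` are hypothetical names introduced for this example:

```R
library(shiny)

# Made-up stand-in for cgm, for illustration only
toy <- data.frame(
  year      = c(1952, 1952, 1957, 1957),
  continent = c("Africa", "Asia", "Africa", "Asia")
)

yearlist <- levels(factor(toy$year))   # character vector: "1952" "1957"

# Drop-down alternative to radioButtons(); same input id, same choices
year_input <- selectInput("chooseyear", "Choose Year:", choices = yearlist)

# Hypothetical helper for the server: input$chooseyear arrives as a string
rows_for_year <- function(data, chosen) {
  data[data$year == as.numeric(chosen), , drop = FALSE]
}

rows_for_year(toy, "1957")   # the two rows for 1957
```

In a server function, the call would look like `rows_for_year(cgm, input$chooseyear)`.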
Let's see the whole code: 

In [None]:
#DEPLOY TO SHINY SERVER
dir <- getwd() #This gets the current Working Directory
course <- "DATA-SCI-8654" #This is to specify the course path for the shiny server
folder <- "module3-bubble3" #This specifies the folder name to copy

system(sprintf("/usr/local/bin/shiny_deploy %s %s %s", course, dir,folder), 
       intern = TRUE,
       ignore.stdout = FALSE, 
       ignore.stderr = FALSE,
       wait = TRUE, 
       input = NULL)

You should see an interface similar to this: 

<img src="../images/bubble3.png">

Play with the sizes of the bubbles, the scales, and the fonts to make the plot easier to perceive. In the practice exercise, you will change the radio buttons to a slider, which is better suited to time input. 