# Module 2 Practice

In this notebook, we will look at different ways of choosing color schemes for our visualizations. We will use the ggplot2 and RColorBrewer libraries.

[This cheat sheet can also be handy.](http://www.guianaplants.stir.ac.uk/seminar/materials/colorPaletteCheatsheet.pdf)

**Some of the following code cells require you to fill in your code in < YOUR CODE > lines or question marks (???); for others, run the code cell and study the outputs to understand what it does.**

In [None]:
library(ggplot2)
# Color palettes from Color Brewer
library(RColorBrewer)

First, show all palettes with their names, grouped by type: sequential, qualitative, diverging. (Remember how to do that from the lab notebook?)


In [None]:
display.brewer.all()

Display five colors for a qualitative data type, using the 'Dark2' palette:


In [None]:
display.brewer.pal(n=5, name='Dark2')

Display color maps with **seven colors for a diverging data type with colorblind safe choices:**

(look up the parameters of the function and color scheme names)


In [None]:

display.brewer.all(n=7, type ='div', colorblindFriendly=TRUE)


In [None]:
brewer.pal.info
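
As a sketch of how the `brewer.pal.info` table above can be queried programmatically, the following filters it down to the diverging palettes that are flagged as colorblind safe. The variable names `info` and `safe_div` are illustrative only.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette and the
# columns maxcolors, category ("div", "qual", "seq") and colorblind.
info <- brewer.pal.info

# Keep the names of diverging palettes that are flagged colorblind safe
safe_div <- rownames(info)[info$category == "div" & info$colorblind]
safe_div

# Maximum number of colors each of those palettes can supply
info[safe_div, "maxcolors"]
```

Any name in `safe_div` can then be passed as the `name` argument of `brewer.pal()` or `display.brewer.pal()`.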

The following library also contains **colorblind safe** color maps.

**Notice how palettes can be accessed through the `colorschemes` variable.**

In [None]:
library(dichromat)
colorschemes

In [None]:
colorschemes$BrowntoBlue.10

If we want to get **MORE colors than available** in the library, we can **interpolate** colors like this: 

In [None]:
p <- colorRampPalette(brewer.pal(9,'Blues'))(100)
p
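
`colorRampPalette()` returns a *function*; calling that function with `n` yields `n` colors interpolated between the ones supplied. A minimal sketch with two hand-picked endpoint colors (chosen here for illustration, not part of the exercise) shows that the endpoints are kept and the steps in between are filled in:

```r
# colorRampPalette() ships with base R (package grDevices)
ramp <- colorRampPalette(c("#FFFFFF", "#08306B"))  # white to dark blue

five  <- ramp(5)    # 5 interpolated colors
fifty <- ramp(50)   # same ramp, finer steps

five
length(fifty)
```

The first and last entries of `five` and `fifty` are the two input colors; only the number of intermediate steps differs.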

**Now your TURN:** create a color palette with 50 colors using a sequential, colorblind friendly **brewer** palette (choose a palette from above and use its name).

In [None]:
p2 <- colorRampPalette(brewer.pal(9,'YlOrRd'))(50)
p2

---

Let's use the cars data to visualize some aspects of the data set.


In [None]:
head(mtcars)
# Pick some variables
data=mtcars[ , c(1,3:6)]
 
#Make a plot to show if there's any visible correlation, use rgb() to choose a color and alpha transparency
plot(data , pch=20 , cex=1.5 , col=rgb(0.5, 0.8, 0.9, 0.7))
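
`rgb()` takes red, green, blue and (optionally) alpha values, by default on a 0 to 1 scale, and returns a hexadecimal color string; when alpha is given, the string has eight hex digits. The sketch below decodes the color used in the plot above with `col2rgb()`; `point_col` and `channels` are illustrative names.

```r
# Same color as in the scatter plot matrix: light blue, 70% opaque
point_col <- rgb(0.5, 0.8, 0.9, 0.7)
point_col

# Decode it back into 0-255 channel values, including alpha
channels <- col2rgb(point_col, alpha = TRUE)
channels
```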


In [None]:
#Let's compute all the correlations and look at them 
data=cor(mtcars)
data

### Not very useful to look at numbers, let's use a visualization with the ellipse library.

In [None]:
library(ellipse)

The following represents correlations as ellipses: the slope shows the sign, and the thickness shows the strength of the correlation (the thinner the ellipse, the stronger the correlation).


In [None]:
plotcorr(data)

### Again not very clear.

Let's use an **adequate** color scheme to distinguish between strong and weak correlations as well as negative and positive ones.

**So we are talking about a diverging color scheme, right?**  

In [None]:
# Build a palette of 100 colors with RColorBrewer

my_colors <- brewer.pal(5, "Spectral")
my_colors = colorRampPalette(my_colors)(100)
 

In [None]:
# To SORT the correlation matrix, first look at row 1 (the correlations with mpg)
data[1,]

In [None]:
ord <- order(data[1, ])
ord

In [None]:
data_ord = data[ord, ord]
data_ord
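
`order()` returns the permutation of indices that sorts a vector. Applying the same permutation to both rows and columns reorders the variables while keeping the matrix symmetric with ones on the diagonal. A toy sketch with a made-up 3x3 matrix (the values and the names `m`, `perm`, `m_sorted` are illustrative only):

```r
# Small symmetric "correlation" matrix with made-up values
m <- matrix(c( 1.0,  0.8, -0.5,
               0.8,  1.0, -0.2,
              -0.5, -0.2,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Permutation that sorts the first row in increasing order
perm <- order(m[1, ])
perm

# Reorder rows and columns with the same permutation
m_sorted <- m[perm, perm]
m_sorted
```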

Now, plot and pick a color from the palette based on the value of each correlation. You are mapping correlation values in \[-1, 1\] to positions in your 100-color palette: the expression in the code yields values in \[0, 100\], which R truncates to whole-number indices.

**STUDY the following code to figure out how it does the mapping!**

In [None]:

plotcorr(data_ord, col=my_colors[data_ord*50+50], mar=c(1,1,1,1))

### This is better. 

It's a **diverging** color scheme that represents both positive and negative correlations, and the strongest correlations stand out through their darker colors, thanks to the preattentive processing of color by the human visual system.

Ordering also helps for easy grouping.
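
The index arithmetic in the `plotcorr()` call above can be made explicit. The helper below, `cor_to_index()`, is a hypothetical name introduced here for illustration; it rescales \[-1, 1\] to whole numbers from 1 to `n`. This avoids index 0: R silently drops zero indices, so a correlation of exactly -1 would receive no color under the `*50+50` formula (this does not happen to occur in `mtcars`).

```r
# Hypothetical helper: map correlations in [-1, 1] to indices 1..n
cor_to_index <- function(r, n = 100) {
  idx <- round((r + 1) / 2 * (n - 1)) + 1
  pmin(pmax(idx, 1), n)  # guard against values outside 1..n
}

idx <- cor_to_index(c(-1, -0.5, 0, 0.5, 1))
idx

# Zero indices are dropped in R, so this returns one color, not two
demo_colors <- c("red", "white", "blue")
demo_colors[c(0, 3)]
```

With the notebook's objects, the helper could be used as `plotcorr(data_ord, col=my_colors[cor_to_index(data_ord)], mar=c(1,1,1,1))`.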

---

**Let's look at different ways of manipulating color in ggplot2.**
Start with a small sample from the diamonds data set:


In [None]:
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
head(dsamp)
str(dsamp)

Plot carat vs price and encode the 'cut' variable with the **color visual channel.** 


In [None]:

# default color palette: not a good choice 
(gp <- ggplot(dsamp, aes(x=carat, y=price, color=cut)) + geom_point())



**'cut' is CATEGORICAL, but it does have an inherent ordering. Let's use a sequential color scheme:**

In [None]:
gp + scale_colour_brewer(palette = 'Oranges')


In [None]:
# This might be better if we want to emphasize the ideal cut 

gp + scale_colour_brewer(type="seq", palette=3)

In [None]:
# This one is a BAD choice 

gp + scale_colour_brewer(palette="Set1")


**We can also assign colors manually using their hexadecimal codes:** make sure to supply as many colors as there are categories in the variable.

In [None]:

# not a very good color scheme
gp + scale_color_manual(values=c("#0000FF", "#009F00", "#56B4E9", "#009E73", "#FFFFFF"))
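
One way to make the level-to-color assignment explicit is a *named* vector, which `scale_color_manual()` matches to the factor levels by name. A sketch, assuming the five `cut` levels of `diamonds`; the hex codes are arbitrary picks running from light to dark blue, and `cut_colors` and `demo_plot` are illustrative names:

```r
library(ggplot2)

# One color per level of diamonds$cut, named by level
cut_colors <- c(Fair        = "#C6DBEF",
                Good        = "#9ECAE1",
                `Very Good` = "#6BAED6",
                Premium     = "#3182BD",
                Ideal       = "#08519C")

# Use the first 500 rows only to keep the plot light
demo_plot <- ggplot(diamonds[1:500, ], aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_color_manual(values = cut_colors)
demo_plot
```

Naming the colors means the assignment does not depend on the order in which they are listed.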

**Let's see how we can deal with a CONTINUOUS variable:**

In [None]:
(gp <- ggplot(dsamp, aes(x=carat, y=price, color=depth )) + geom_point())  # depth is a floating point number 

In [None]:

# use our own gradient instead of the default that ggplot chooses

gp + scale_colour_gradient(low="blue", high="red")

In [None]:
# we can also choose a discrete palette and create a continuous palette out of it: 

gp + scale_colour_gradientn(colors=colorschemes$BluetoGray.8)

This is **NOT** a good color palette for this variable; the variable is not diverging (going in positive and negative directions), but the palette is. 

For a true diverging variable, we would use a slightly different function: `scale_colour_gradient2(low='blue', mid='white', high='red')`
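
As a sketch of `scale_colour_gradient2()` on a variable that really does diverge, the example below colors `mtcars` points by how far each car's `mpg` lies above or below the mean. The derived column `mpg_dev` and the names `cars_dev` and `div_plot` are assumptions made for this illustration.

```r
library(ggplot2)

# Deviation from the mean is negative for some cars and positive for others
cars_dev <- mtcars
cars_dev$mpg_dev <- cars_dev$mpg - mean(cars_dev$mpg)

# midpoint = 0 pins the neutral color (white) to "exactly average"
div_plot <- ggplot(cars_dev, aes(x = wt, y = hp, color = mpg_dev)) +
  geom_point(size = 3) +
  scale_colour_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
div_plot
```

White points can be hard to see on a light background; a pale gray for `mid` is a common alternative.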

**Let's create a histogram of the carat variable.** 


In [None]:

# after_stat(count) replaces the deprecated ..count.. notation (needs ggplot2 >= 3.3.0)
(gp2 <- ggplot(data=dsamp, aes(x=carat)) + geom_histogram(binwidth=0.5, aes(fill = after_stat(count))))

We can add a color palette of our choice to **fill** the bars. 

Pay attention to the difference between the `scale_fill_` and `scale_colour_` functions: color is used for points and lines, while fill is used for the interior of bars, densities, etc. 

In [None]:
gp2 + scale_fill_gradient(low="blue", high="red")

---



`viridis` is another good color library we can use in our plots. Take a look at the following samples: 

In [None]:
library(viridis)

In [None]:
ggplot(dsamp, aes(x=carat, y=price, color=cut )) + geom_point() + scale_color_viridis(discrete=TRUE)


Pay attention to the option above; because 'cut' is a factor (discrete), we have to use the option `discrete=TRUE`, otherwise the function would supply a continuous palette. 
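
With `discrete=TRUE`, the scale draws one color per factor level from the same color map. The `viridis()` function exposes those colors directly as hex strings, which can help when checking what a scale will use. A short sketch (the variable names are illustrative):

```r
library(viridis)

# viridis(n) returns n evenly spaced colors from the viridis map
five_levels <- viridis(5)
five_levels

# Other maps are selected with `option`, as in the scale functions
plasma_levels <- viridis(5, option = "plasma")
plasma_levels
```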

The following shows how to choose between palettes in viridis: 

In [None]:
ggplot(dsamp, aes(x=carat, y=price, color=cut )) + geom_point() + scale_color_viridis(option='plasma', discrete=TRUE)


In [None]:
# this is for a continuous variable: 

ggplot(dsamp, aes(x=carat, y=price, color=depth )) + geom_point() + scale_color_viridis(option='plasma')

Other options we can use are `magma`, `inferno`, `cividis`, `mako`, `rocket`, and `turbo`; the default is `viridis`. 

[This is a good overview of the viridis library](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html)

---

**Now, it's your TURN:**

Create a three-color palette from the `Accent` brewer palette, and use it in a scatter plot of the iris data set: 

In [None]:
pal <-brewer.pal(3,'Accent')
pal

In [None]:
head(iris)

In [None]:
gi <-ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color = Species)) + geom_point() +  scale_color_manual(values=pal)
gi

Add the following to your code to display a linear regression line for each species: 

In [None]:
gi + geom_smooth(aes(fill = Species), method = "lm")

In [None]:
gi + geom_smooth(aes(fill = Species), method = "lm") + scale_fill_manual(values=pal)