# R for visualization

In [None]:
mydata_merge # note that variables are not shared across notebooks

Load the example dataset.

In [None]:
data("WorldPhones") 
head(WorldPhones)

In [None]:
#Get only the data for 1951
phones_51 <- WorldPhones["1951",]

In [None]:
phones_51
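The row extraction above works because `WorldPhones` is a matrix with years as row names. A minimal sketch that checks this, assuming only the built-in `datasets` package; `row_1951` is a hypothetical name used to avoid touching `phones_51`:

```r
# WorldPhones ships with R's datasets package as a numeric matrix
data("WorldPhones")
is.matrix(WorldPhones)

# Rows are indexed by year (character row names), columns by region
rownames(WorldPhones)
colnames(WorldPhones)

# Selecting one row by name drops the matrix structure and
# returns a named numeric vector, which barplot() can draw directly
row_1951 <- WorldPhones["1951", ]
names(row_1951)
```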

# Bar Plot

In [None]:
#Create basic barplot
barplot(phones_51)

Add or change a few options in the barplot:
 - the option `col =` lets you specify the color
 - the option `main =` lets you change the title of the plot
 - the option `ylim = c()` lets you specify the first and last values of the Y axis
 
For more, see the help (cmd + i). The pop-up help is not always accurate about a function's arguments, so click `barplot` and look through the options listed there.

In [None]:
barplot(phones_51, col = "yellow", main = "#Phones in 1951", ylim = c(0, 50000))
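The available options can also be listed from the console. A small sketch, assuming a standard R installation where `barplot()` dispatches to its default S3 method; `barplot_default` and `arg_names` are hypothetical names:

```r
# Retrieve the default method behind barplot() and list its argument names
barplot_default <- getS3method("barplot", "default")
arg_names <- names(formals(barplot_default))
arg_names

# Check that the options used in the cell above are among them
c("col", "main", "ylim") %in% arg_names
```

Running `?barplot` opens the full documentation page for the same arguments.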

# Histogram

In [None]:
#Load iris data
iris <- read.csv("iris.csv")
head(iris)

Create a histogram for the distribution of Sepal Width.

In the `hist()` command, first specify which column you want to use.

 - the option `xlim = c()` lets you specify the first and last values of the X axis
 - the option `xlab =` lets you specify a label for the X axis

In [None]:
hist(iris$sepal_width, main = "Distribution of Sepal Width", col = "red", 
     ylim = c(0,40), xlim = c(2, 4.5), xlab = "Sepal Width")
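`hist()` also returns the binning it used, which can help when choosing axis limits. A minimal sketch, assuming the built-in `datasets::iris` (whose column is named `Sepal.Width`) rather than the CSV loaded above; `widths` and `h` are hypothetical names:

```r
# Built-in copy of the data; the column is named Sepal.Width here
widths <- datasets::iris$Sepal.Width

# plot = FALSE returns the histogram object without drawing it
h <- hist(widths, plot = FALSE)

h$breaks       # bin edges chosen by hist()
h$counts       # number of observations in each bin
max(h$counts)  # tallest bar, useful when picking ylim

# Every observation falls into exactly one bin
sum(h$counts) == length(widths)
```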

# Boxplot

In [None]:
#Create boxplot for Sepal Width, by class (or species)
#range = 0 extends the whiskers to the minimum and maximum, so no points are drawn as outliers
boxplot(sepal_width ~ class, data = iris, ylim = c(1, 5), range = 0,
        xlab = "Class", ylab = "Sepal Width", main = "Boxplot of Sepal Width by Class")


To see the statistics behind the boxplot, first save the boxplot as `bp` (just a name, nothing special).

The statistics are then accessible through `bp$stats`.

In [None]:
bp <- boxplot(sepal_width ~ class, data = iris, ylim = c(1, 5), range = 0,
              xlab = "Class", ylab = "Sepal Width", main = "Boxplot of Sepal Width by Class")
#then, run the command below. It prints the values stored in bp$stats next to each box
text(x = col(bp$stats) - .5, y = bp$stats, labels = bp$stats)
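The layout of `bp$stats` is easier to read with labels attached. A minimal sketch, assuming the built-in `datasets::iris` (columns `Sepal.Width` and `Species`) instead of the CSV; `flowers`, `bp_demo`, `stats`, and `group_min` are hypothetical names:

```r
flowers <- datasets::iris   # built-in copy of the data

# plot = FALSE returns the boxplot object without drawing anything
bp_demo <- boxplot(Sepal.Width ~ Species, data = flowers, range = 0, plot = FALSE)

# One column per group; the five rows are lower whisker, lower hinge,
# median, upper hinge, and upper whisker
stats <- bp_demo$stats
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")
colnames(stats) <- bp_demo$names
stats

# With range = 0 the whiskers reach each group's minimum and maximum
group_min <- tapply(flowers$Sepal.Width, flowers$Species, min)
all.equal(unname(stats["lower whisker", ]), as.numeric(group_min))
```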

# Scatter Plot

In [None]:
plot(iris$sepal_length, iris$sepal_width, xlab = "Sepal Length", 
     ylab = "Sepal Width", ylim = c(0, 5), col = "dark green", main = "Relationship between Sepal Length and Sepal Width")

# `ggplot2`

`ggplot2` is more advanced and more customizable than the base plotting functions, and it is one of the most widely used visualization packages in R.

In [None]:
# install.packages("ggplot2")

library(ggplot2)

In [None]:
# shorter names to type less later; they are not columns of iris,
# so ggplot2 picks them up from the global environment

Sepal.Length <- iris$sepal_length
Sepal.Width <- iris$sepal_width

## Q(quick)plot

An easy way to create "quick plots". `qplot` will try to guess what type of plot you want based on the input. Note that `qplot` is deprecated as of ggplot2 3.4.0, so newer versions print a warning.

First let's try one variable.

In [None]:
qplot(data = iris, x = Sepal.Length, main = "Distribution for Sepal Length", xlab="Sepal Length", ylab = "Count")


But if you specify two variables, the graph type changes to a scatter plot.

In [None]:
qplot(data = iris, x = Sepal.Length, y=Sepal.Width, ylim = c(0,5), main = "Relationship between Sepal Length and Width")

The option `col =` can also be used to assign a different color to each group in your data.

In [None]:
qplot(data = iris, x = Sepal.Length, y=Sepal.Width, col=class, ylim = c(0,5), main = "Relationship between Sepal Length and Width")


## Histogram, revisited

Structure of a `ggplot` call. First, the skeleton:

`aes()` -> aesthetics: it maps variables to visual properties such as `x` and `fill`

In [None]:
ggplot(data=iris, aes(x = Sepal.Length, fill = class)) #why empty? 

In [None]:
ggplot(data=iris, aes(x = Sepal.Length, fill = class)) + 
  geom_histogram(alpha=0.5) # specify graph type, y-axis is automatically determined
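One way to see why the skeleton alone draws an empty panel is to count the layers on the plot object. A minimal sketch, assuming the built-in `datasets::iris` (columns `Sepal.Length` and `Species`) and assuming the plot object exposes its layers as `$layers`, as current ggplot2 releases do; `flowers`, `skeleton`, and `with_hist` are hypothetical names:

```r
library(ggplot2)

flowers <- datasets::iris   # built-in copy of the data

# Data and aesthetic mappings only: nothing tells ggplot2 what to draw
skeleton <- ggplot(data = flowers, aes(x = Sepal.Length, fill = Species))
length(skeleton$layers)

# Adding a geom adds a layer, which is what actually gets drawn
with_hist <- skeleton + geom_histogram(alpha = 0.5, bins = 30)
length(with_hist$layers)
```

The `bins = 30` argument is written out only to silence the message about the default bin count.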

In [None]:
ggplot(data= iris, aes(x = Sepal.Length, fill = class)) + 
  geom_histogram(alpha=0.5) +
  ggtitle("Distribution of Sepal Length By Class") + # add title
  labs(x = "Sepal Length", y = "Count") # add label

## Boxplot

In [None]:
ggplot(data = iris, aes(x=class, y=sepal_width, color=class)) +
  geom_boxplot(notch = TRUE, outlier.colour="black", outlier.shape=8, outlier.size=3) +
  ggtitle("Boxplot of Sepal Width, by Classes") +
  labs(x = "Class", y = "Sepal Width")

## Scatterplot

In [None]:
ggplot(data = iris, aes(x=Sepal.Length, y=Sepal.Width, color=class, shape = class)) +
  geom_point(size=2) +
  ggtitle("Relationship between Sepal Length and Sepal Width, separated by Class") +
  labs(x = "Sepal Length", y = "Sepal Width")


In [None]:
ggplot(data = iris, aes(x=Sepal.Length, y=Sepal.Width, color=class, shape = class)) +
  geom_point(size=2) +
  ggtitle("Relationship between Sepal Length and Sepal Width, separated by Class") +
  labs(x = "Sepal Length", y = "Sepal Width") +
  geom_smooth(method=lm) # add linear regression
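Because `color` is mapped to the class, `geom_smooth(method=lm)` fits a separate line for each group. The same fits can be sketched by hand with `lm()`, assuming the built-in `datasets::iris` (columns `Sepal.Length`, `Sepal.Width`, `Species`); `flowers` and `fits` are hypothetical names:

```r
flowers <- datasets::iris   # built-in copy of the data

# Fit Sepal.Width ~ Sepal.Length separately within each species,
# mirroring the y ~ x line that geom_smooth draws per group
fits <- lapply(split(flowers, flowers$Species), function(d) {
  coef(lm(Sepal.Width ~ Sepal.Length, data = d))
})

# One row per species: intercept and slope of that group's line
do.call(rbind, fits)
```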

## Overlaying, or stacking multiple plots

In [None]:
ggplot(data = iris, aes(x = Sepal.Length)) +
    geom_histogram(aes(y=..density..), colour="black", fill="white") + # histogram on the density scale
    geom_density(alpha=.2, fill="light green") + # overlaid with density plot
    labs(x="Sepal Length", y = "Density")+
    ggtitle("Distribution of Sepal Length with Density Plot")
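The histogram has to be on the density scale for the overlay to line up, because a density curve encloses an area of 1. A minimal sketch of that rescaling using base `hist()`, assuming the built-in `datasets::iris`; `lengths_cm`, `h`, `bin_widths`, and `manual_density` are hypothetical names:

```r
lengths_cm <- datasets::iris$Sepal.Length

# plot = FALSE returns the histogram object without drawing it
h <- hist(lengths_cm, plot = FALSE)
bin_widths <- diff(h$breaks)

# On the density scale, bar height = count / (n * bin width)
manual_density <- h$counts / (length(lengths_cm) * bin_widths)
all.equal(manual_density, h$density)

# The bar areas then sum to 1, the same scale as a density curve
sum(h$density * bin_widths)
```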

In [None]:
ggplot(data= iris, aes(x=Sepal.Length, color=class, fill=class))+ # color coding in aes option
  geom_histogram(aes(y=..density..), position="identity", alpha=0.5)+
  geom_density(alpha=0.6, linetype = "dashed")+
  scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+ #additionally override some color options
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  labs(x="Sepal Length", y = "Density")+
  ggtitle("Sepal Length histogram plot")