In [None]:
library(tidyverse)  # loads readr (read_csv), dplyr (count, %>%) and ggplot2
surveys_complete <- read_csv("data/surveys_complete.csv")

# Plotting with ggplot2

ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame. It provides a programmatic interface for specifying which variables to plot, how they are displayed, and general visual properties. As a result, only minimal changes are needed if the underlying data change or if we decide to switch from a bar plot to a scatterplot. This helps in creating publication-quality plots with minimal adjustment and tweaking.

ggplot2 plots work best with data in the ‘long’ format, i.e., a column for every dimension and a row for every observation. Well-structured data will save you a lot of time when making figures with ggplot2.
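A minimal sketch of the ‘long’ layout, assuming the tidyr package (part of the tidyverse) is installed. The table, its genus labels `A`/`B` and its counts are hypothetical, invented only to show the reshaping with `tidyr::pivot_longer()`.

```r
library(tidyr)

# Hypothetical 'wide' table: one column per year (counts are invented)
wide <- data.frame(genus = c("A", "B"),
                   y1990 = c(10, 4),
                   y1991 = c(12, 6))

# 'Long' table: one row per genus-year observation, one column per variable
long <- pivot_longer(wide, cols = c(y1990, y1991),
                     names_to = "year", names_prefix = "y",
                     names_transform = list(year = as.integer),
                     values_to = "n")
long
```

In the long form, `year` and `n` are ordinary columns, so they can be mapped directly to `x` and `y` in `aes()`.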

ggplot graphics are built layer by layer by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
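To make the layer idea concrete, here is a small sketch using R's built-in `mtcars` data (not part of the survey data). A ggplot object stores its layers in a list, and each `+` returns a new object with one more layer.

```r
library(ggplot2)

# Data + aesthetic mappings only: a valid ggplot object with no layers yet
base <- ggplot(mtcars, aes(x = wt, y = mpg))
length(base$layers)  # 0

# Each `+` returns a new plot object with one more layer; `base` is unchanged
layered <- base +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
length(layered$layers)  # 2
layered
```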

To build a ggplot, we will use the following basic template that can be used for different types of plots:

In [None]:
# General template: replace each <PLACEHOLDER> before running
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +  <GEOM_FUNCTION>()

In [None]:
# Binds the data and the x/y mappings only; with no geom, this draws an empty panel
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length))

# Adding geoms

Next, add ‘geoms’: graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms; we will use some common ones today, including:

- `geom_point()` for scatter plots, dot plots, etc.
- `geom_boxplot()` for, well, boxplots!
- `geom_line()` for trend lines, time series, etc.
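As a quick preview of these three geoms, the sketch below uses datasets that ship with R and ggplot2 (`mtcars` and `economics`), not the survey data, so it runs on its own.

```r
library(ggplot2)

# geom_point(): one point per row (two continuous variables)
p_point <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# geom_boxplot(): one box per level of a discrete x variable
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# geom_line(): observations connected in order of x (a time series here)
p_line <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()

p_point
p_box
p_line
```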

To add a geom to the plot, use the `+` operator. Because we have two continuous variables, let’s use `geom_point()` first:

In [None]:
ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) +
  geom_point()

# Another way: assign the plot to a variable

In [None]:
# Assign plot to a variable
surveys_plot <- ggplot(data = surveys_complete, 
                       mapping = aes(x = weight, y = hindfoot_length))

# Draw the plot
surveys_plot + 
    geom_point()

In [None]:
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
    geom_point(alpha = 0.1, color = "blue")


In [None]:
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
    geom_point(alpha = 0.1, aes(color = species_id))
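The two plots above differ in where `color` is given: outside `aes()` it is *set* to one constant, inside `aes()` it is *mapped* to a variable. A small sketch on the built-in `mtcars` data (an assumption, chosen so the example runs without the survey file) shows the difference, including the common mistake of putting a constant inside `aes()`.

```r
library(ggplot2)

# Setting: a constant colour outside aes() applies to every point, no legend
p_set <- ggplot(mtcars, aes(wt, mpg)) + geom_point(color = "blue")

# Mapping: a variable inside aes() gets a colour scale and a legend
p_map <- ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(color = factor(cyl)))

# Mistake: a constant inside aes() is treated as data, so "blue" becomes a
# one-level category and is drawn with the default palette, not blue
p_oops <- ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(color = "blue"))

unique(ggplot_build(p_set)$data[[1]]$colour)   # "blue"
length(unique(ggplot_build(p_map)$data[[1]]$colour))  # 3
```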

# Boxplot
We can use boxplots to visualize the distribution of weight within each species:

In [None]:
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
    geom_boxplot()
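Each box summarizes one group: the line in the middle is the median and the box edges are the first and third quartiles. The sketch below, on the built-in `mtcars` data rather than the survey data, pulls the computed statistics out with `ggplot_build()` and checks them against `median()` and `quantile()`.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# One row per box; lower/middle/upper are the quartiles drawn as the box
box_stats <- ggplot_build(p)$data[[1]]
box_stats[, c("x", "lower", "middle", "upper")]

# The first box (cyl == 4) matches base R's summary of the same values
mpg4 <- mtcars$mpg[mtcars$cyl == 4]
median(mpg4)
quantile(mpg4, c(0.25, 0.75))
```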

In [None]:
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
    geom_boxplot(alpha = 0) +
    geom_jitter(alpha = 0.3, color = "tomato")

# Plotting time series data
Let’s calculate the number of records per year for each genus. First we need to group the data and count records within each group:

In [None]:
yearly_counts <- surveys_complete %>%
  count(year, genus)
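`count(year, genus)` does the grouping and counting in one step. The sketch below, on the built-in `mtcars` data so it runs on its own, checks that it gives the same result as an explicit `group_by()` followed by `summarise(n = n())`.

```r
library(dplyr)

# count() in one step...
a <- mtcars %>% count(cyl, gear)

# ...is shorthand for grouping and then counting rows per group
b <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

a
```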

In [None]:
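# Map genus to colour so each genus gets its own line
# (without it, all genera are joined into one zig-zag line)
ggplot(data = yearly_counts, aes(x = year, y = n, color = genus)) +
     geom_line()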
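`geom_line()` connects all rows that belong to the same group, and groups come from discrete mappings such as `color` (or an explicit `group =`). A sketch on ggplot2's built-in `economics_long` data, which like `yearly_counts` has several series sharing one x column, shows how many separate lines are drawn with and without such a mapping.

```r
library(ggplot2)

# economics_long: one row per date and economic variable (5 variables)
p_one <- ggplot(economics_long, aes(x = date, y = value01)) + geom_line()
p_grouped <- ggplot(economics_long, aes(x = date, y = value01, color = variable)) +
  geom_line()

# Without a discrete mapping, all rows form a single group (one zig-zag line)
length(unique(ggplot_build(p_one)$data[[1]]$group))      # 1
length(unique(ggplot_build(p_grouped)$data[[1]]$group))  # 5
```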

In [None]:
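# Count records per year, genus and sex
yearly_sex_counts <- surveys_complete %>%
  count(year, genus, sex)

ggplot(data = yearly_sex_counts, 
       mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(rows = vars(sex), cols = vars(genus))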

In [None]:
ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) +
    geom_line() +
    facet_wrap(vars(genus)) +
    labs(title = "Observed genera through time",
         x = "Year of observation",
         y = "Number of individuals")
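The two faceting functions lay out panels differently: `facet_grid()` builds a full rows × columns grid (including empty combinations), while `facet_wrap()` makes one panel per level and wraps them into a grid. A sketch on ggplot2's built-in `mpg` data (an assumption, used so it runs standalone) counts the panels each one creates.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One panel per vehicle class (7 classes)
p_wrap <- base + facet_wrap(vars(class))

# Every drv x cyl combination: 3 drive types x 4 cylinder counts
p_grid <- base + facet_grid(rows = vars(drv), cols = vars(cyl))

nrow(ggplot_build(p_wrap)$layout$layout)  # 7
nrow(ggplot_build(p_grid)$layout$layout)  # 12
```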

# Histograms

In [None]:
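# The histogram and density examples below use the survey data as `df`
df <- surveys_complete

# Basic histogram (30 bins by default)
ggplot(df, aes(x = weight)) + geom_histogram()

# Change the width of bins (weight is in grams)
ggplot(df, aes(x = weight)) +
  geom_histogram(binwidth = 1)

# Change colors
p <- ggplot(df, aes(x = weight)) +
  geom_histogram(color = "black", fill = "white")
p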
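`binwidth` sets the width of each bin in data units, while `bins` sets how many bins to use; every observation lands in exactly one bin either way. The sketch below uses R's built-in `faithful` data (eruption durations in minutes) instead of the survey data and inspects the computed bins with `ggplot_build()`.

```r
library(ggplot2)

# binwidth is in data units (minutes here); bins fixes the number of bins
p_width <- ggplot(faithful, aes(x = eruptions)) + geom_histogram(binwidth = 0.25)
p_bins  <- ggplot(faithful, aes(x = eruptions)) + geom_histogram(bins = 10)

hist_width <- ggplot_build(p_width)$data[[1]]
hist_bins  <- ggplot_build(p_bins)$data[[1]]

nrow(hist_bins)                                        # 10 bins
unique(round(hist_width$xmax - hist_width$xmin, 10))   # 0.25
sum(hist_width$count) == nrow(faithful)                # TRUE
```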

In [None]:
# Change line color and fill color
ggplot(df, aes(x=weight))+
  geom_histogram(color="darkblue", fill="lightblue")

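# Add mean line and density plot on the histogram

Here the histogram shows density instead of count on the y-axis, so that a density curve can be overlaid on the same scale.

The density plot is drawn on top with a semi-transparent fill; the value of `alpha` controls the level of transparency (0 is fully transparent, 1 is opaque).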

In [None]:
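# Add mean line (xintercept is set outside aes() so a single line is drawn;
# `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0)
p + geom_vline(xintercept = mean(df$weight),
               color = "blue", linetype = "dashed", linewidth = 1)
# Histogram with density plot; after_stat(density) rescales the bars to unit area
ggplot(df, aes(x = weight)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#FF6666")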
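On the density scale, each bar's height is its count divided by (bin width × total count), so the bar areas add up to 1, the same total area as a density curve. A quick check with R's built-in `faithful` data (used here as a stand-in for the survey data):

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25)
h <- ggplot_build(p)$data[[1]]

# Sum of bar areas (height x width) is 1
sum(h$density * (h$xmax - h$xmin))
```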

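# Another histogram: counts of a categorical variable

Here the bars show the count of each category rather than the shape of a continuous distribution.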

In [None]:
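# `mydata` is an external data frame (not loaded in this notebook) with
# categorical columns OverTime and Attrition; geom_bar() counts rows per category
ggplot(mydata, aes(OverTime, fill = Attrition)) +
  geom_bar()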
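For a categorical variable, `geom_bar()` computes the same counts as base R's `table()`. A runnable sketch on ggplot2's built-in `mpg` data (a stand-in, since `mydata` is not available here):

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) + geom_bar()
bars <- ggplot_build(p)$data[[1]]

# Bar heights, in alphabetical order of class, equal the frequency table
bars$count
as.vector(table(mpg$class))
```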

# Density plot

In [None]:
library(ggplot2)
# Basic density
p <- ggplot(df, aes(x = weight)) +
  geom_density()
p
# Add mean line (a single line, computed outside aes())
p + geom_vline(xintercept = mean(df$weight),
               color = "blue", linetype = "dashed", linewidth = 1)

# Change density plot line types and colors

In [None]:
# Change line color and fill color
ggplot(df, aes(x=weight))+
  geom_density(color="darkblue", fill="lightblue")
# Change line type
ggplot(df, aes(x=weight))+
  geom_density(linetype="dashed")
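The same styling arguments combine with a `fill` mapping to overlay one semi-transparent density curve per group. A sketch on ggplot2's built-in `mpg` data (an assumption, so it runs without the survey file):

```r
library(ggplot2)

# Highway mileage, one density curve per drive type (4, f, r)
p <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.3, linetype = "dashed")
p

dens <- ggplot_build(p)$data[[1]]
length(unique(dens$group))  # 3 curves
```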

# Create barplots

In [None]:
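library(ggplot2)
# Bar plot data: mean tooth length per dose from the built-in ToothGrowth data
# (this replaces the survey `df` used above)
df <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
df$dose <- factor(df$dose)  # treat dose as a category on the x axis
# Basic barplot
p <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity")
p

# Horizontal bar plot
p + coord_flip()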
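`geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`, and in recent ggplot2 versions horizontal bars can also be drawn by mapping the discrete variable to `y` instead of using `coord_flip()`. A sketch, again built from the ToothGrowth means so it runs on its own:

```r
library(ggplot2)

# Mean tooth length per dose from the built-in ToothGrowth data
bar_data <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
bar_data$dose <- factor(bar_data$dose)

p_bar <- ggplot(bar_data, aes(x = dose, y = len)) + geom_bar(stat = "identity")
p_col <- ggplot(bar_data, aes(x = dose, y = len)) + geom_col()

# Horizontal bars: discrete variable on y, bar length on x
p_horiz <- ggplot(bar_data, aes(x = len, y = dose)) + geom_col()
p_horiz
```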

# Change the width and the color of bars

In [None]:
# Change the width of bars
ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", width=0.5)
# Change colors
ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", fill="steelblue")+
  theme_minimal()
p

In [None]:
# Change barplot fill colors by groups
p<-ggplot(df, aes(x=dose, y=len, fill=dose)) +
  geom_bar(stat="identity")+theme_minimal()
p

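In the plot above, the fill color of each bar is controlled by the levels of `dose`. When `fill` maps to a second grouping variable (here `supp`, the supplement type), the bars are stacked by default; use `position = position_dodge()` to place them side by side: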

In [None]:
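# Data: mean tooth length per supplement and dose (from ToothGrowth)
df2 <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
df2$dose <- factor(df2$dose)
# Stacked barplot with multiple groups
ggplot(data = df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity")
# Use position = position_dodge() to place the bars side by side
ggplot(data = df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity", position = position_dodge())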
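The difference between the two positions shows up in the computed bar coordinates. In the stacked version the top of each column is the sum of the group values at that dose; in the dodged version every bar starts at zero and sits at its own x offset. A sketch that checks this with `ggplot_build()` on the ToothGrowth means:

```r
library(ggplot2)

df2 <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
df2$dose <- factor(df2$dose)

base <- ggplot(df2, aes(x = dose, y = len, fill = supp))
stacked <- ggplot_build(base + geom_col())$data[[1]]
dodged  <- ggplot_build(base + geom_col(position = position_dodge()))$data[[1]]

# Stacked: column height per dose equals the sum of the supp means
tapply(stacked$ymax, as.numeric(stacked$x), max)
tapply(df2$len, df2$dose, sum)

# Dodged: all bars start at 0, each at a distinct horizontal position
all(dodged$ymin == 0)
length(unique(dodged$xmin))  # 6 = 3 doses x 2 supplements
```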

# Titles and axis labels

In [None]:
# Label functions (signatures only; add them to a plot with +):
# ggtitle(label)  for the main title
# xlab(label)     for the x axis label
# ylab(label)     for the y axis label
# labs(...)       for the main title, axis labels and legend titles
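`ggtitle()`, `xlab()` and `ylab()` are convenience wrappers; a single `labs()` call sets the same labels. A sketch on the built-in ToothGrowth data comparing the stored labels (the title text is illustrative):

```r
library(ggplot2)

base <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot()

# Separate helper functions...
p1 <- base + ggtitle("Tooth length by dose") + xlab("Dose (mg)") + ylab("Tooth length")

# ...or one labs() call, which can also name legends (e.g. fill = "Supplement")
p2 <- base + labs(title = "Tooth length by dose", x = "Dose (mg)", y = "Tooth length")

p1$labels$title
p2$labels$title
```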

In [None]:
library(ggplot2)
# factor(dose): one box per dose level (a numeric x would give a single box)
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot()
p

In [None]:
p + ggtitle("Plot of length \n by dose") +
  xlab("Dose (mg)") + ylab("Teeth length")