
Scatter plots

Bioinformatics and Data Centre, Gothenburg edited this page Sep 28, 2023 · 16 revisions

Hands on Exercises scatter plots

Data preprocessing

Start by downloading our toy dataset from here (right-click and choose "Save as") and save it in a folder of your choice. Then set your working directory to the folder containing your data by calling setwd() in the R console.

Note: On Windows, R reports an error if a directory path is written with single backslashes, because the backslash is R's escape character. If you saved your data to c:\myprojects\project1, you have two options:

  • Use two backslashes instead of a single one:
setwd("c:\\myprojects\\project1")
  • Use a forward slash:
setwd("c:/myprojects/project1")
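
As a quick check that the two spellings name the same folder, the sketch below compares them as R strings. It uses the example path from the note; file.path() is a base R helper that the note does not mention and is shown only as a third option.

```r
# The two spellings from the note above, as R string literals
path_escaped <- "c:\\myprojects\\project1"  # "\\" in source code is one backslash in the string
path_forward <- "c:/myprojects/project1"

# Replacing each backslash with a forward slash gives the second spelling
path_converted <- gsub("\\", "/", path_escaped, fixed = TRUE)
identical(path_converted, path_forward)

# file.path() joins path components with "/" on every platform
path_built <- file.path("c:", "myprojects", "project1")
identical(path_built, path_forward)
```

Either string can be passed to setwd(); getwd() shows which directory is currently active.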

Now read in the data

indata <- read.table("scatterdata.txt", header = TRUE, sep="\t", stringsAsFactors=TRUE)
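
To see what the arguments do without needing the file, here is a minimal sketch that reads a few tab-separated rows from a string. The rows and the gender labels are invented for illustration and are not taken from scatterdata.txt.

```r
# Hypothetical stand-in for scatterdata.txt: tab-separated text with a header row.
# The values and the gender labels are invented for illustration.
toy_text <- paste(
  "id\tgender\theight\tweight",
  "1\tfemale\t104\t16.8",
  "2\tmale\t116\t21.2",
  "3\tfemale\t121\t23.1",
  sep = "\n"
)

# text = reads from a string instead of a file; the other arguments match the call above
toy <- read.table(text = toy_text, header = TRUE, sep = "\t", stringsAsFactors = TRUE)

str(toy)            # header = TRUE: the first row became the column names
levels(toy$gender)  # stringsAsFactors = TRUE: the text column became a factor
```

Having gender as a factor matters later on, because ggplot2 treats factors as discrete groups when they are mapped to color or shape.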

Scatter plots with ggplot2

Start by loading the ggplot2 package

# install.packages("ggplot2")
library("ggplot2")

If ggplot2 is not installed, you can install it by uncommenting the first line, that is, by removing the hash sign (#).

We are going to use the ggplot function so let's take a look at the documentation

?ggplot2
?ggplot

Below are some useful links from the documentation

# Useful links:   
# http://ggplot2.tidyverse.org 
# https://github.com/tidyverse/ggplot2

E1. Get to know the data by using the following code

head(indata)
summary(indata)

You should see that the data consists of id, gender, height and weight. The task is to create scatter plots for height and weight.

Let's start with the basics! Note that there are two alternative ways of writing your code. Both have their advantages.

ggplot(data = indata, aes(x = height, y = weight)) + 
  geom_point()
  
ggplot(data = indata) + 
  geom_point(aes(x = height, y = weight))
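
The practical difference between the two styles shows up once a second layer is added. The sketch below uses a small invented data frame named toy in place of indata (same column names, made-up values) and adds geom_smooth() as an example of a second layer.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

# Mapping in ggplot(): inherited by every layer added afterwards
p_global <- ggplot(data = toy, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)  # reuses x and y without repeating them

# Mapping in geom_point(): applies to that layer only,
# so any further layer would need its own aes()
p_local <- ggplot(data = toy) +
  geom_point(aes(x = height, y = weight))

# The point layers of both plots contain the same coordinates
all.equal(layer_data(p_global, 1)[, c("x", "y")],
          layer_data(p_local, 1)[, c("x", "y")])
```

The first style saves typing when all layers share the same variables; the second is convenient when layers use different variables or data.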
  

There are some basic options for a scatter plot that are controlled via geom_point(): size, shape and color.

E2. Let's start customising the plot above by using the geom_point() options size=2, shape=23 and color="blue"

Solution E2
ggplot(data = indata, aes(x = height, y = weight)) + 
  geom_point(size=2, shape=23, color="blue")
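
Shape 23 is one of the shapes (21 to 25) that have both an outline and an interior. For these, color sets the outline and fill sets the interior. The sketch below adds fill="lightblue", which is not part of the exercise, and uses an invented data frame toy in place of indata.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

# color = outline of the diamond, fill = its interior (assumed colour choice)
p_filled <- ggplot(data = toy, aes(x = height, y = weight)) +
  geom_point(size = 2, shape = 23, color = "blue", fill = "lightblue")

p_filled
```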

E3. Let's customise the plot some more, now using aes() in combination with the factor variable gender from the indata. Set color=gender within the aes() in the ggplot function

Solution E3
ggplot(data = indata, aes(x = height, y = weight, color=gender)) + 
  geom_point(size=2)
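
E2 and E3 illustrate two different things: setting an aesthetic to a fixed value (outside aes()) and mapping it to a variable (inside aes()). A common slip is to put a fixed value inside aes(). The sketch below compares the three cases on an invented data frame toy that stands in for indata.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

# Mapped: one colour per level of gender, chosen by the colour scale
p_mapped <- ggplot(toy, aes(x = height, y = weight, color = gender)) +
  geom_point(size = 2)

# Set: every point gets the literal colour "blue"
p_fixed <- ggplot(toy, aes(x = height, y = weight)) +
  geom_point(size = 2, color = "blue")

# Pitfall: "blue" inside aes() is treated as a one-level variable,
# so the points get the scale's default colour, not blue
p_pitfall <- ggplot(toy, aes(x = height, y = weight, color = "blue")) +
  geom_point(size = 2)

# Colours actually used by each plot
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_pitfall)$colour)
```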

E4. There are many built-in themes; two of the most popular are theme_classic() and theme_bw(). Reuse the code above and add theme_classic() as a layer

Solution E4
ggplot(data = indata, aes(x = height, y = weight, color=gender)) + 
  geom_point(size=2) +
  theme_classic()

E5. Next step is to add custom colors using the function scale_color_manual(values=c("black", "#E69F00")). Tip: You can easily explore colors by using the tool ColorBrewer

Solution E5
ggplot(data = indata, aes(x = height, y = weight, color=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00"))
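
With an unnamed vector, the colors are assigned to the factor levels in level order. Naming the vector ties each color to a specific level regardless of order. The sketch below assumes the levels are called "female" and "male"; check levels(indata$gender) for the actual names in your data. The data frame toy is an invented stand-in for indata.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

# Named values: matched to levels by name, so the order in the vector does not matter
p_named <- ggplot(data = toy, aes(x = height, y = weight, color = gender)) +
  geom_point(size = 2) +
  theme_classic() +
  scale_color_manual(values = c(male = "#E69F00", female = "black"))

p_named
```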

E6. Next step is to add custom shapes. Set shape=gender within the aes() in the ggplot function and add the function scale_shape_manual(values=c(16, 17)). Tip: You can see all the available shapes using the command: plot(1:25, pch=1:25)

Solution E6
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17))
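
The shape overview from the tip can also be drawn with ggplot2 itself. This is only a sketch: the grid layout, the helper column names (code, grid_x, grid_y) and the fill color are arbitrary choices, and scale_shape_identity() is used so that the numbers in the data are taken as shape codes directly.

```r
library(ggplot2)

# One row per shape code (0 to 25), laid out on a 7-column grid
shapes <- data.frame(
  code   = 0:25,
  grid_x = 0:25 %% 7,
  grid_y = -(0:25 %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = grid_x, y = grid_y)) +
  geom_point(aes(shape = code), size = 4, fill = "#E69F00") +  # fill is visible for 21 to 25 only
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) +    # print the code under each shape
  scale_shape_identity() +                                     # use the codes as they are
  theme_void()

p_shapes
```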

E7. Add x- and y-labels by adding the layers xlab("Height (cm)") and ylab("Weight (kg)")

Solution E7
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)")
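
An alternative to separate xlab() and ylab() layers is a single labs() call, which can also set the legend title. In this sketch the legend title "Gender" is an assumed label, and toy is an invented stand-in for indata. Giving color and shape the same title keeps them combined in one legend.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

p_labs <- ggplot(data = toy, aes(x = height, y = weight, color = gender, shape = gender)) +
  geom_point(size = 2) +
  theme_classic() +
  scale_color_manual(values = c("black", "#E69F00")) +
  scale_shape_manual(values = c(16, 17)) +
  labs(x = "Height (cm)", y = "Weight (kg)",
       color = "Gender", shape = "Gender")  # same title for both aesthetics

p_labs
```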

E8. You can specify both x- and y-limits. Here we change both by adding the layer xlim(c(70, NA)) to get a tick mark at height 70 cm and the layer ylim(c(NA, 40)) to get a tick mark at weight 40 kg. NA leaves that end of the axis to be determined from the data.

Solution E8
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40))
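
One thing to keep in mind with xlim() and ylim(): observations outside the limits are turned into missing values and dropped from the plot, whereas coord_cartesian() only zooms the view. The sketch below uses invented data in which two weights exceed 40 kg, and explicit upper and lower limits in coord_cartesian() chosen for this toy data.

```r
library(ggplot2)

# Hypothetical data (invented values); the last two weights are above 40 kg
toy <- data.frame(
  height = c(72, 88, 101, 115, 128, 140, 149, 155),
  weight = c(9.1, 12.6, 16.0, 20.5, 26.3, 33.8, 41.5, 44.9)
)

p_base <- ggplot(data = toy, aes(x = height, y = weight)) +
  geom_point(size = 2)

# Scale limits: values outside the range become NA before anything is drawn
p_limits <- p_base + xlim(c(70, NA)) + ylim(c(NA, 40))

# Coordinate limits: all data are kept, only the visible window changes
p_zoom <- p_base + coord_cartesian(xlim = c(70, 160), ylim = c(0, 40))

# Number of points lost in each version
sum(is.na(layer_data(p_limits)$y))
sum(is.na(layer_data(p_zoom)$y))
```

This matters for layers that compute something from the data, such as a regression line, because dropped points no longer take part in the calculation.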

E9. Try moving the legend to the top by adding the layer theme(legend.position="top") (five possible position options: top, right, bottom, left and none). Keep it if you like it.

Solution E9
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40)) + 
  theme(legend.position="top")

E10. To fine-tune the figure you can customize both the axis title and axis text size by adding the arguments

axis.title.x = element_text(size = 13), 
axis.title.y = element_text(size = 13), 
axis.text.x = element_text(size = 12), 
axis.text.y = element_text(size = 12)

to the theme() layer. Experiment with the size setting.

Solution E10
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40)) + 
  theme(axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12))
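
When both axes should get the same size, the parent elements axis.title and axis.text can be set once and the .x and .y elements inherit from them. The sketch below compares the two forms using calc_element(), which resolves inheritance for a given theme; it is an optional shorthand, not a replacement for the solution above.

```r
library(ggplot2)

# Long form, as in the exercise
th_long <- theme_classic() +
  theme(axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12))

# Short form: set the parent elements, the x and y versions inherit the size
th_short <- theme_classic() +
  theme(axis.title = element_text(size = 13),
        axis.text = element_text(size = 12))

# Resolved sizes are the same in both themes
calc_element("axis.title.y", th_long)$size
calc_element("axis.title.y", th_short)$size
calc_element("axis.text.x", th_long)$size
calc_element("axis.text.x", th_short)$size
```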

E11. To visualize the linear trend, add the layer geom_smooth(method="lm", col="blue", show.legend=FALSE) to the existing plot. Note that this gives one regression line per gender, because the layer inherits the grouping from the aes() in the ggplot function. If you want a single line for all observations, add the layer geom_smooth(method="lm", col="blue", show.legend=FALSE, aes(shape=NULL, color=NULL)) instead. This instructs ggplot to ignore the groups created by shape and color.

Solution E11
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40)) + 
  theme(axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)) +
  geom_smooth(method="lm", col="blue", show.legend=FALSE)
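
The solution shows the per-group version only. The sketch below puts the two variants from the exercise text side by side on an invented data frame toy (a stand-in for indata) and counts the number of fitted lines in each. The argument formula = y ~ x is added here only to make the default explicit.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

p_points <- ggplot(data = toy, aes(x = height, y = weight, color = gender, shape = gender)) +
  geom_point(size = 2) +
  theme_classic()

# Inherits the grouping from the plot: one line per gender
p_per_group <- p_points +
  geom_smooth(method = "lm", formula = y ~ x, col = "blue", show.legend = FALSE)

# Grouping aesthetics removed for this layer: one line for all observations
p_overall <- p_points +
  geom_smooth(method = "lm", formula = y ~ x, col = "blue", show.legend = FALSE,
              aes(shape = NULL, color = NULL))

# Number of groups (fitted lines) in the smooth layer, which is layer 2
length(unique(layer_data(p_per_group, 2)$group))
length(unique(layer_data(p_overall, 2)$group))
```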

E12. Sometimes it is better to create individual plots for each group level. This can be done by adding the layer facet_wrap(vars(gender)), which creates a panel of plots divided by gender.

Solution E12
ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40)) + 
  theme(axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)) +
  geom_smooth(method="lm", col="blue", show.legend=FALSE) +
  facet_wrap(vars(gender))
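
facet_wrap() has arguments that control the panel layout. As a small sketch, ncol = 1 stacks the panels vertically and scales = "free_y" lets each panel choose its own y-axis range; both settings are optional variations and not part of the exercise. The data frame toy is an invented stand-in for indata.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

p_facets <- ggplot(data = toy, aes(x = height, y = weight, color = gender, shape = gender)) +
  geom_point(size = 2) +
  theme_classic() +
  facet_wrap(vars(gender), ncol = 1, scales = "free_y")  # one panel per level, stacked

p_facets
```

With the default scales = "fixed", all panels share the same axes, which makes the groups easier to compare directly.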

E13. This is a general recipe for generating publication-ready plots. Play around with the settings and, when you are satisfied, save the final plot to an object named final_plot. Save the final plot to a TIFF file using either ggsave() or the tiff() function.

Solution E13
final_plot <- ggplot(data = indata, aes(x = height, y = weight, color=gender, shape=gender)) + 
  geom_point(size=2) +
  theme_classic() +
  scale_color_manual(values=c("black", "#E69F00")) + 
  scale_shape_manual(values=c(16, 17)) + 
  xlab("Height (cm)") +
  ylab("Weight (kg)") +
  xlim(c(70, NA)) +
  ylim(c(NA, 40)) + 
  theme(axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12)) +
  geom_smooth(method="lm", col="blue", show.legend=FALSE) +
  facet_wrap(vars(gender))

# Save the figure using ggsave()
ggsave("scatter_ggsave.tiff", final_plot)

# Save the figure using the tiff() device
tiff(filename = "scatter.tiff", width = 6.66 * 1.314, height = 6.66, units = "in", res = 600, compression = "lzw")
print(final_plot)  # print() also works inside scripts and functions
dev.off()
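
Without a width and height, ggsave() uses the size of the current graphics device, so the result can differ between sessions. The sketch below passes the same size, resolution and compression as the tiff() call to ggsave(). It writes to a temporary folder and uses an invented data frame toy in place of indata; it also assumes that your R installation can write TIFF files.

```r
library(ggplot2)

# Hypothetical stand-in for indata: same column names, invented values
toy <- data.frame(
  id     = 1:12,
  gender = factor(rep(c("female", "male"), each = 6)),
  height = c(82, 95, 104, 113, 121, 130, 85, 97, 108, 116, 125, 134),
  weight = c(11.5, 14.2, 16.8, 19.9, 23.1, 27.4, 12.3, 15.0, 18.1, 21.2, 25.0, 29.8)
)

final_plot <- ggplot(data = toy, aes(x = height, y = weight, color = gender, shape = gender)) +
  geom_point(size = 2) +
  theme_classic()

# Temporary location so the example does not write into the working directory
out_file <- file.path(tempdir(), "scatter_ggsave.tiff")

# Explicit size and resolution; compression is passed on to the TIFF device
ggsave(out_file, final_plot,
       width = 6.66 * 1.314, height = 6.66, units = "in",
       dpi = 600, compression = "lzw")

file.exists(out_file)
```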

Developed by Björn Andersson and Jari Martikainen, 2023