Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
126 lines (96 sloc) 3.83 KB
---
title: Quick post - detect and fix this ggplot2 antipattern
author: Roel M. Hogervorst
date: '2019-03-07'
slug: quick-post-detect-this-ggplot2-antipattern
categories:
- R
tags:
- ggplot2
- magrittr
- tidyverse
- dplyr
- tidyr
- mtcars
- antipattern
- quickthoughts
subtitle: ''
share_img: https://media.giphy.com/media/7Jpnmq5OGeOnb7nP3b/giphy.gif
---
Recently one of my coworkers showed me a ggplot and although it is not wrong, it is also not ideal. Here is the TL:DR :
> Whenever you find yourself adding multiple geom_* to show different groups, reshape your data
In software engineering there are things called [antipatterns](https://en.wikipedia.org/wiki/Anti-pattern#Software_engineering "Support wikipedia to save us all"), ways of programming
that lead you into potential trouble. This is one of them.
I'm not saying it is incorrect, but it might lead you into trouble.
example: we have some data, some different calculations and we want to plot that.
<details> <summary>
**I load tidyverse and create a modified mtcars set in this hidden part### this adds headers to the file
```{r}
library(tidyverse) # I started loading magrittr, ggplot2 and tidyr, and realised
# I needed dplyr too, at some point loading tidyverse is simply easiest.
very_serious_data <-
mtcars %>%
as_tibble(rownames = "carname") %>%
group_by(cyl) %>%
mutate(
mpg_hp = mpg/hp,
first_letter = str_extract(carname, "^[A-z]"),
mpg_hp_c = mpg_hp/mean(mpg_hp),# grouped mean
mpg_hp_am = mpg_hp+ am
)
```
</details>
Now the data (mtcars) and calculations don't really make sense but they are here to show you the
antipattern. I created 3 variants of dividing mpg (miles per gallon) by hp (horse power)
### The antipattern
We have a dataset with multiple variables (columns) and want to plot
one against the other, so far so good.
What is the effect of mpg_hp for every first letter of the cars?
```{r antipattern not yet}
very_serious_data %>%
ggplot(aes(first_letter, mpg_hp))+
geom_point()+
labs(caption = "So far so good")
```
But you might wonder what the other transformations of that variable do?
You can just add a new geom_point, but maybe with a different color?
And to see the dots that overlap you might make them a little opaque.
```{r antipattern adding_equavalent information}
very_serious_data %>%
ggplot(aes(first_letter, mpg_hp))+
geom_point(alpha = 2/3)+
geom_point(aes(y = mpg_hp_c), color = "red", alpha = 2/3)+
labs(caption = "adding equivalent information")
```
And maybe the third one too?
```{r antipattern starts to become clear}
very_serious_data %>%
ggplot(aes(first_letter, mpg_hp))+
geom_point(alpha = 2/3)+
geom_point(aes(y = mpg_hp_c), color = "red", alpha = 2/3)+
geom_point(aes(y = mpg_hp_am), color = "blue", alpha = 2/3)+
labs(caption = "soo much duplication in every geom_point call!")
```
This results in lots of code duplication for specifying what is essentially
the same for every `geom_point()` call. It's also really hard to add a legend
now.
### What is the alternative?
> Whenever you find yourself adding multiple geom_* to show different groups, reshape your data
Gather the columns that are essentially representing the group and reshape
the data into a format more suitable for plotting. Bonus: automatic correct labeling.
```{r}
very_serious_data %>%
gather(key = "ratio", value = "score", mpg_hp, mpg_hp_c, mpg_hp_am ) %>%
ggplot(aes(first_letter, score, color = ratio))+
geom_point(alpha = 2/3)+
labs(caption = "fixing the antipattern")
```
And that's it.
![Mari also tells you it will work ](https://media.giphy.com/media/7Jpnmq5OGeOnb7nP3b/giphy.gif)
### State of the machine
<details>
<summary> At the moment of creation (when I knitted this document ) this was the state of my machine: **click here to expand** </summary>
```{r}
sessioninfo::session_info()
```
</details>
You can’t perform that action at this time.