# Format tables {#format-tables}
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(palmerpenguins)
library(tidyverse)
library(knitr)
library(kableExtra)
library(gapminder)
library(janitor)
```
We're spending a lot of time in this course making graphic visualizations of data. In this one lesson we will take a detour and learn to make nice-looking tables with text and numbers in them. There are several packages for doing this (see Assignment X). In this lesson we'll use the `kable` function from the `knitr` package, with extra styling from the `kableExtra` package.
Let's use a simple table to demonstrate the main features. I want some text, some numbers, and not too many rows or columns. I'll summarize the `penguins` data from the `palmerpenguins` package into a small table.
```{r echo = FALSE}
t1 <- penguins %>% group_by(species, island) %>%
  summarize(count = sum(!is.na(body_mass_g)),
            body_mass_g = mean(body_mass_g, na.rm = TRUE),
            .groups = "drop")
t1
```
The function `kable` turns this into a formatted table.
```{r}
kable(t1)
```
## Header rows
Often you will want to rename or reformat the column headings. There are two ways to do this: using `rename` to change the actual column names before formatting the table, or using the `col.names` argument of `kable`, which only affects the table's appearance. You can rename just one column using `rename`, but with `col.names` you need to provide all the column names in the correct order.
```{r}
t1 %>% rename(`Body mass (g)` = body_mass_g) %>% kable()
```
```{r}
t1 %>% kable(col.names = c("Species", "Island", "Count", "Body mass (g)"))
```
## Column alignment
Commonly numbers are right justified and text is left justified, and that's what `kable` does automatically. You can specify each column as left, centre, or right justified with the `align` argument, using the letters l, c, or r for each column. Here we'll centre the numbers to demonstrate.
```{r}
t1 %>% kable(align = "llcc")
```
## Formatting numbers
Any number that comes from a calculation (such as a mean) will have a lot of decimal places displayed unless you change this. You can control the number of decimal places shown with the `digits` argument, which rounds the numbers. (Use a negative number of digits to round to the left of the decimal point, for example `digits = -1` to round to the tens place.) Give either one number to use for all columns, or a vector with one value per column to control each column separately.
You can add a comma (or a space for SI style, or a period for European style) to separate the thousands or millions using `format.args = list(big.mark = ",")`. See the help for `format` for more options.
```{r}
t1 %>% kable(digits = 1, format.args = list(big.mark = ","))
```
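The example above uses one `digits` value for every column. As a minimal sketch of the per-column form, the chunk below rebuilds the penguin summary under a new name (`mass_summary`, chosen here just for illustration) and supplies one `digits` entry per column. The entries for the two text columns are placeholders, since rounding only applies to numeric columns.

```{r}
library(dplyr)
library(knitr)
library(palmerpenguins)

# Rebuild the summary table from this lesson under an illustrative name
mass_summary <- penguins %>%
  group_by(species, island) %>%
  summarize(count = sum(!is.na(body_mass_g)),
            body_mass_g = mean(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# One digits value per column: species, island, count, body_mass_g.
# The -1 rounds the mean body mass to the nearest 10 g.
rounded_table <- kable(mass_summary, digits = c(0, 0, 0, -1))
rounded_table
```

Rounding to the nearest 10 g is only a demonstration of negative digits; pick the precision that matches how well your measurements are actually known.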
## Color, highlights, and other styles
There are a lot of options for changing the appearance of text in the `kableExtra` package. If you are interested, look at the [vignette](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html) linked in the further reading.
The two styles I use frequently are alternating shading to help you read across rows and making the columns only wide enough to display your data.
```{r}
t1 %>% kable() %>% kable_styling(full_width = FALSE, bootstrap_options = "striped")
```
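As one hedged illustration of the other styling options, the sketch below uses the `kableExtra` functions `row_spec` and `column_spec` to make the header bold, shade one row, and italicize the first column. The names `mass_summary`, `heaviest_row`, and `styled_table`, the choice of row to highlight, and the colour are all arbitrary choices made for this example, and I'm assuming you are knitting to HTML.

```{r}
library(dplyr)
library(knitr)
library(kableExtra)
library(palmerpenguins)

# Rebuild the summary table from this lesson under an illustrative name
mass_summary <- penguins %>%
  group_by(species, island) %>%
  summarize(count = sum(!is.na(body_mass_g)),
            body_mass_g = mean(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# Find the row with the largest mean body mass so the highlight follows the data
heaviest_row <- which.max(mass_summary$body_mass_g)

styled_table <- mass_summary %>%
  kable(digits = 1) %>%
  kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
  row_spec(0, bold = TRUE) %>%                        # row 0 is the header row
  row_spec(heaviest_row, background = "#FFF3B0") %>%  # shade the heaviest group
  column_spec(1, italic = TRUE)                       # species names in italics
styled_table
```

Computing the row number from the data, rather than typing it in, keeps the highlight on the right row if the table is later sorted or filtered.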
## Captions
You can add a caption to a table with the `caption` argument to `kable`.
```{r}
t1 %>% kable(caption = "The number and average mass of penguins by species and island.")
```
## Putting it all together
Usually you will want to combine these features to get the look you want. Your goal should always be to make a table that clearly displays your data.
```{r}
t1 %>% kable(digits = 1, format.args = list(big.mark = ","),
             col.names = c("Species", "Island", "n", "Body mass (g)"),
             caption = "The number and average mass of penguins by species and island.") %>%
  kable_styling(full_width = FALSE, bootstrap_options = "striped")
```
## Adding row and column totals
We frequently make tables of counts of categorical variables. In these tables it can be helpful to add column or row totals. Sometimes we want to report those totals as percentages of a grand total. The `janitor` package makes it easy to add totals and percentages to rows and columns.
We'll start with a table of counts showing the number of countries with life expectancy greater than 75 years for each year and continent.
```{r}
t1 <- gapminder %>% filter(lifeExp > 75) %>%
  group_by(year, continent) %>%
  # drop the grouping so the result is a plain (ungrouped) tibble
  dplyr::summarize(n = n(), .groups = "drop") %>%
  pivot_wider(names_from = "continent", values_from = "n", values_fill = 0)
t1
```
Now we will add a row of totals, which is what `adorn_totals` does by default (the same as `where = "row"`). No sum is computed for the first column since it is assumed to be a label.
```{r}
t1 %>% adorn_totals()
```
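If you think about this, you'll realize this row doesn't make much sense: each country is counted again in every year it appears, so the sums are not counts of countries. So let's add a column of totals instead, giving the total across continents for each year. Again the first column is ignored, assuming it is a label.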
```{r}
t1 %>% adorn_totals(where = "col")
```
These tables are formatted differently compared to `data.frames` and `tibbles`, but they can still be reformatted as you would expect using `%>% kable() %>% kable_styling()`.
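The percentages mentioned at the start of this section can be added in the same style. Here is a minimal sketch using the `janitor` functions `adorn_percentages` and `adorn_pct_formatting`; the object names `continent_counts` and `continent_pct` are made up for this example. I use `denominator = "row"` so each year's row adds to 100%; `denominator = "all"` would give percentages of the grand total instead.

```{r}
library(dplyr)
library(gapminder)
library(janitor)
library(knitr)

# Count countries with life expectancy above 75 by year and continent
continent_counts <- gapminder %>%
  filter(lifeExp > 75) %>%
  tabyl(year, continent)

# Add a Total column, convert each row to proportions of its own total,
# then format the proportions as percentage text
continent_pct <- continent_counts %>%
  adorn_totals(where = "col") %>%
  adorn_percentages(denominator = "row") %>%
  adorn_pct_formatting(digits = 0)

continent_pct %>% kable()
```

After `adorn_pct_formatting` the columns are text rather than numbers, so do any arithmetic before this step.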
Incidentally, the `janitor` package has a powerful table generating function `tabyl` which does the counting we started this section with. We still need the `filter` to retain only rows with life expectancy more than 75 years. The columns are reported alphabetically instead of according to the order they appear in the original dataset.
```{r}
gapminder %>% filter(lifeExp > 75) %>%
tabyl(year, continent) # %>% adorn_totals(where="col")
```
## Adding grouping for rows and columns
Sometimes it is desirable to add a grouping label over a series of columns. For example, in the table showing totals above, we can add a header "Continent" over the appropriate columns. We do this to a `kable` formatted table using the `kableExtra` function `add_header_above`. This function takes a named vector: each name is a label and each value is the number of columns that label should span. The spans must add up to the total number of columns. We will add a blank label for the first and last columns (spanning 1 column each), and a label "Continent" spanning the 5 continents.
```{r}
t1 %>% adorn_totals(where = "col") %>%
  kable() %>%
  add_header_above(c(" " = 1,
                     "Continent" = 5,
                     " " = 1)) %>%
  kable_styling(full_width = FALSE)
```
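Because the spans have to add up to the number of columns, it can be safer to compute them than to type them in. The following is a sketch under that assumption; `continent_counts`, `n_continents`, and `header_spans` are names invented for this example, and the table is rebuilt with `tabyl` so the chunk stands on its own.

```{r}
library(dplyr)
library(gapminder)
library(janitor)
library(knitr)
library(kableExtra)

# Counts by year and continent, with a Total column on the right
continent_counts <- gapminder %>%
  filter(lifeExp > 75) %>%
  tabyl(year, continent) %>%
  adorn_totals(where = "col")

# Every column between the first (year) and the last (Total) is a continent
n_continents <- ncol(continent_counts) - 2
header_spans <- c(" " = 1, "Continent" = n_continents, " " = 1)

continent_counts %>%
  kable() %>%
  add_header_above(header_spans) %>%
  kable_styling(full_width = FALSE)
```

If a continent column is later added or removed, the header still spans the right columns without editing the numbers by hand.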
It can also be helpful to add grouping to rows. For example, we could label the first 5 rows as being part of the 20th century and the last 2 rows as being part of the 21st century. We add row labels one at a time, giving the label and the numbers of the first and last rows it should span. We will do this with the `kableExtra` function `group_rows` (also available under the name `pack_rows`, which avoids a clash with a `dplyr` function of the same name).
```{r}
t1 %>% adorn_totals(where = "col") %>%
  kable() %>%
  group_rows("20th century", 1, 5) %>%
  group_rows("21st century", 6, 7) %>%
  kable_styling(full_width = FALSE)
```
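The row numbers can also be worked out from the data instead of being counted by eye. This sketch assumes the rows are sorted by year, uses `pack_rows` (the other name for `group_rows`), and invents the names `continent_counts`, `n_20th`, and `n_rows` for illustration.

```{r}
library(dplyr)
library(gapminder)
library(janitor)
library(knitr)
library(kableExtra)

# Counts by year and continent, with a Total column on the right
continent_counts <- gapminder %>%
  filter(lifeExp > 75) %>%
  tabyl(year, continent) %>%
  adorn_totals(where = "col")

# Number of rows from the 20th century, and the total number of rows
n_20th <- sum(continent_counts$year <= 2000)
n_rows <- nrow(continent_counts)

continent_counts %>%
  kable() %>%
  pack_rows("20th century", 1, n_20th) %>%
  pack_rows("21st century", n_20th + 1, n_rows) %>%
  kable_styling(full_width = FALSE)
```

This way the group labels stay in the right place if you change the life expectancy threshold and the number of rows changes.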
## Further reading
* Using [kable](https://bookdown.org/yihui/rmarkdown-cookbook/kable.html) to format tables
* Vignette on using [kable and kableExtra](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html)
* https://people.ok.ubc.ca/jpither/modules/Tables_markdown.html