/
576_communication_table.Rmd
280 lines (172 loc) · 8.81 KB
/
576_communication_table.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
<!--
This file by Martin Monkman
is licensed under a Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/
-->
# Communication—table formatting {#communicationtable}
## Setup
This chunk of R code loads the table formatting packages that we will be using.
```{r setup_576}
library(kableExtra)
library(gt)
library(flextable)
```
## Introduction {#communication-table-intro}
The look of the default table in R markdown is not all that attractive. Here's a summary table of the Oakland Athletics' seasons from 1999-2008.
```{r, include=FALSE, message=FALSE}
mlb_pay_wl <- read_csv(here::here("data", "mlb_pay_wl.csv"))
```
```{r moneyball_oakland}
oakland_99 <- mlb_pay_wl |>
filter(tm == "OAK" & year_num < 2009)
oakland_99
```
There are a few R packages that allow us to format a table like this, ready for publication. Here we will take a quick look at two of them. Both have many, many more options for formatting; see the reference materials at the top of this chapter for those details.
## The kable package {#communication-table-kable}
First, {kableExtra} is an extension to the {kable} tool that is part of "knitting" our R Markdown files.
To create a kable table, use the `kable()` function:
```{r}
# print table with {kableExtra} formatting
oakland_99 |>
# create table
kable()
```
The package allow for additional formatting to be applied. In the code below, the bootstrap theme "striped" is applied, along with specific row formatting. You'll notice that the header row is number "0", and that the rows with data are numbered starting at "1".
```{r}
# print table with {kableExtra} formatting
oakland_99 |>
# create table, add title
kable(caption = "Oakland Athletics, 1999-2008") |>
kable_styling(bootstrap_options = "striped", font_size = 10) |>
# make variable names bold and large
row_spec(0, bold = T, font_size = 14) |>
# make 2002 season (row 4) bold
row_spec(4, bold = T)
```
Other {kableExtra} formatting resources are here:
* [KableExtra Tutorial](https://rstudio-pubs-static.s3.amazonaws.com/444395_76727eaf9c774fa8bf932bed16500a00.html)
* Yihui Xie, Christophe Dervieux, Emily Riederer, ["The kableExtra package", _R Markdown Cookbook_](https://bookdown.org/yihui/rmarkdown-cookbook/kableextra.html)
* Hao Zhu, [Create Awesome HTML Table with knitr::kable and kableExtra](https://haozhu233.github.io/kableExtra/awesome_table_in_html.html)
* Hao Zhu, [Using kableExtra in Bookdown](https://haozhu233.github.io/kableExtra/bookdown/index.html)
## The {gt} package {#communication-table-gt}
A recent package that provides a significant amount of formatting possibilities is {gt}.
We create a basic table object with the `gt()` function:
```{r}
oakland_99 |>
gt()
```
Additional formatting can then be applied to the parts of the table.
* `tab_header()` is how we apply the title and subtitle. Notice that the first and last years in the table are assigned as objects, which can then be added to the text using the `glue::glue()` function.
* There are a variety of number formats available. In this case, the columns are specified but it is also possible to apply a global formatting to a variable (for example, two or more columns of dates can be formatted consistently without having to name them).
- the `use_seps = TRUE` will place separators for thousands, millions, etc
- the `suffixing = TRUE` automatically determines the length of the number and assigns a letter suffix. Here, the salary figures are in millions, but a similar thing would happen with thousands and billions as well.
- `decimals = ` allows us to control the number of decimal places shown.
- Formatting column data references (with links to each of the different number formats): https://gt.rstudio.com/reference/index.html#formatting-column-data
```{r}
# define year range to add as variable to title
year_min <- min(oakland_99$year_num)
year_max <- max(oakland_99$year_num)
oakland_table <- oakland_99 |>
gt() |>
tab_header(
title = "Oakland Athletics",
subtitle = glue::glue("{year_min} to {year_max}")
) |>
# apply number formatting
fmt_integer(columns = attend_g, use_seps = TRUE) |>
fmt_number(columns = est_payroll, suffixing = TRUE) |>
fmt_number(columns = pay_index, decimals = 2)
oakland_table
```
In this next code chunk, we change the column headings to better describe the variables. `cols_label()` gives us control over how the column labels are displayed, rather than changing the variable names.
* `cols_label()` reference page: https://gt.rstudio.com/reference/cols_label.html
```{r}
oakland_table |>
# add column heading labels
cols_label(
year_num = "Year",
tm = "Team",
attend_g = "Average home game attendance",
est_payroll = "Estimated payroll",
pay_index = "Pay index (league average for season = 100)",
w = "Wins",
l = "Loses",
w_l_percent = "Win-Loss Percent"
)
```
To have more control over where the column label line breaks occur, we use the HTML tag `<br>`, and add `.fn = md` (which calls the `md()` for markdown function.)
```{r}
oakland_table <- oakland_table |>
# add column heading labels
cols_label(
year_num = "Year",
tm = "Team",
attend_g = "Average home<br>game attendance",
est_payroll = "Estimated payroll",
pay_index = "Pay index<br>(league average for<br>season = 100)",
w = "Wins",
l = "Loses",
w_l_percent = "Win-Loss Percent",
.fn = md
)
oakland_table
```
It is also possible to add summary rows that have calculations. In this instance, we will add a row with the average attendance.
* "Add grand summary rows using aggregation functions" reference page: https://gt.rstudio.com/reference/grand_summary_rows.html
```{r}
oakland_table |>
grand_summary_rows(
columns = attend_g,
fns = list(fn ="mean", id = "attend_g", label = "average"),
fmt = ~ fmt_integer(.)
)
```
More resources are here:
* [{gt} package site](https://gt.rstudio.com/index.html)
* [Introduction to Creating gt Tables](https://gt.rstudio.com/articles/intro-creating-gt-tables.html) (from the package site)
## The flextable package {#communication-table-flextable}
Another table formatting package is {flextable}.
In this first code chunk, we create use `flextable()` to turn our dataframe into a flextable object, and add a "Source" annotation as a footer to our table.
```{r}
library(flextable)
oakland_99 |>
flextable() |>
add_footer_row(values = "Source: baseball-reference.com", colwidths = 8)
```
You'll notice that the year has a comma separating the thousands. This is because the "year" variable is numeric. In the chunk below, this is dealt with by using the `as.character()`, _before_ the `flextable()` function is applied.
The code then adds a variety of formatting.
Note that the functions are format-first. For example, font size is the function `fontsize()` and which row is an argument. This is opposite to what we see with {kableExtra}, where the row or column is specified with `row_spec()`, and then the formatting applied.
It's also possible to make formatting conditional on values in the table. In this code, the background colour is set using the `bg()` function, but only for those cases where wins (the "w" variable) is 100 or higher.
```{r}
## print table with {flextable} formatting
oakland_99 |>
mutate(year_num = as.character(year_num)) |>
## create table
flextable() |>
## add row striping
theme_zebra() |>
## set font size to 10
fontsize(size = 10, part = "all") |>
## but set variable names larger
fontsize(size = 14, part = "header") |>
## make variable names bold
bold(part = "header") |>
## make 2002 season (row 4) bold
# bold(i = 4) |>
## or with conditional specification: where year_num is 2002
bold(~ `year_num` == 2002) |>
## conditional background color as gold where wins are greater than 100
bg(~ w >= 100, bg = "gold") |>
## set nicer column labels
set_header_labels(year_num = "Year", tm = "Team", attend_g = "Attendance",
est_payroll = "Payroll ($)", pay_index = "Pay Index",
w = "Wins", l = "Losses", w_l_percent = "Win-Loss Percentage") |>
## add footer
add_footer_row(values = "Source: baseball-reference.com", colwidths = 8)
```
Additional {flextable} resources can be found here:
* [{flextable} R package -- reference site](https://davidgohel.github.io/flextable/)
* David Gohel, [Using the flextable R package](https://ardata-fr.github.io/flextable-book/index.html)
## Another example
Earlier in this course, in the workflow example of the Statistics Canada New Housing Price Index (NHPI), we saw some other examples of data tables that were made publication-ready through {kableExtra} and {flextable} formatting. You are encouraged to review that code to see some of the other options that these packages make available.
-30-