-
Notifications
You must be signed in to change notification settings - Fork 5
/
search-updates.Rmd
141 lines (98 loc) · 5.02 KB
/
search-updates.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: "Updating a Systematic Search"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Updating a Systematic Search}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## **Setup**
First, install and load the ASySD package.
```{r setup}
# devtools::install_github("camaradesuk/ASySD")library(ASySD)
library(ASySD)
```
## **Loading citation data**
Load citations from an **existing search** file using the `load_search()` function. In this example, we use a csv format. You can change the *method* argument to upload alternative file types.
```{r}
existing_search <- load_search("old_sr_search.csv", method="csv")
```
Load citations from a **new systematic search.**
```{r}
new_search <- load_search("new_sr_search.csv", method="csv")
```
### Combine old and new citation data
Before deduplication, we must bind the citations into one dataframe. First, give each search a different source so that we can specify which citations to retain.
```{r}
existing_search$source <- "old"
new_search$source <- "new"
all_citations <- plyr::rbind.fill(existing_search, new_search)
```
## Automated deduplication
Remove duplicate citations automatically using the `dedup_citations` function. Here we have specified the argument `merge=TRUE` to indicate that we want to merge duplicate records and have a record of which citations have been merged into one. We have specified in the `keep_source` argument that we wish to preferentially retain old citations. In practice, this means that the duplicate_id chosen for a set of records will preferentially be the record_id of a citation in the OLD systematic search. This is to facilitate easy record linkage - see later.
```{r}
results <- dedup_citations(all_citations, merge_citations = TRUE, keep_source = "old")
```
The `dedup_citations` function returns a list of two dataframes by default. The first contains unique citations after duplicates were removed automatically by ASySD. In most cases, this will remove the vast majority of duplicates. There will likely be some duplicates remaining which need manual review by a human (see next step).
```{r}
unique_citations <- results$unique
```
## Manual deduplication
To check for additional duplicates, get the dataframe of citations for manual review. You can review within R or export as a csv / excel file to go through each row of pairs.
```{r}
potential_duplicates <- results$manual_dedup
```
After reviewing the pairs, create a dataframe contianing **only the true duplicate pairs.** Here, all the suggested duplicates look like REAL duplicates. If you exported to a file, you could remove the rows which are not duplicates and re-upload the rows with true duplicates.
```{r}
true_duplicates <- potential_duplicates
```
Now, to get the final deduplication results, use the `dedup_citations_add_manual()`function. To account for additional duplicates you have reviewed, add them into the *additional_pairs* argument.
```{r}
final_results <- dedup_citations_add_manual(unique_citations, additional_pairs = true_duplicates, merge_citations = TRUE, keep_source = "old")
```
## Find new citations identified in update
Now we have a final set of unique citations, how can we find the new citations we added with our latest systematic search?
```{r}
library(dplyr)
new_citations <- final_results %>%
filter(source == "new")
new_citations %>%
tail(3) %>%
gt::gt() %>%
gt::cols_hide(c(abstract))
```
Lets also have a look at the citations identified in both searches by removing citations with a single source.
```{r}
crossover <- final_results %>%
filter(!source == "new") %>%
filter(!source == "old")
crossover %>%
tail(3) %>%
gt::gt() %>%
gt::cols_hide(c(abstract))
```
To keep good records, we don't want to lose track of identifiers for studies we have already included in a review. This is why specifying the citation to keep was important! To illustrate this, look specifically at the citations present in the old search.
```{r}
old_citations <- final_results %>%
filter(grepl("old", source)) # find all citations in old search
```
We can check that the duplicate ids here refer to the original record id in the existing_citations dataframe we imported. As you can see, they are all present. In the record_ids column you can see the different record_ids that have merged into a single citation. In case you make a mistake or don't specify the record_id to keep as the duplicate_id, you can use these to trace back your citations to the original dataframes.
```{r}
old_citations_check <- old_citations %>%
filter(duplicate_id %in% existing_search$record_id) #check that all citations use the OLD record_id as the duplicate_id
crossover %>%
tail(3) %>%
gt::gt() %>%
gt::cols_hide(c(abstract))
```
## Exporting results
Once deduplication is complete, you can export the new unique records to a file for import into reference managers or systematic review software.
```{r}
write_citations(new_citations, type="txt", filename="unique.txt")
```