---
output: github_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Version 0.5.1.9000
## New features
## Changes
## Bug fixes
# Version 0.5.1
## New features
## Changes
## Bug fixes
+ `links()` - Incorrect results in some situations. Resolved.
+ `links_af_probabilistic()` - Failed in some situations. Resolved.
# Version 0.5.0
## New features
+ New option (`"semi"`) for the `batched` argument in `links()`.
All matches are compared against the record-set in the next iteration.
Therefore, the number of record-pairs increases exponentially as new matches are found.
This means fewer record-pairs (memory usage) but a longer run time compared
to the `"no"` option. Conversely, it leads to more record-pairs (memory usage)
but a shorter run time compared to the `"yes"` option.
+ New argument (`batched`) in `episodes()`
+ New argument (`split`) in `episodes()`. Splits the analysis into `N` splits of `strata`.
This leads to fewer record-pairs (and memory usage) but a longer run time.
+ New argument (`decode`) in `as.data.frame.pid()`, `as.data.frame.epid()` and
`as.data.frame.pane()`
+ New function - `episodes_af_shift()`. A more vectorised approach to `episodes()` based on `epidm::group_time()`.
+ New function - `links_wf_episodes()`. Implementation of `episodes()` using `links()`.
## Changes
+ Optimised `episodes()` and `links()`. Each iteration now uses less
time and memory.
+ `link_id` slot in `pid` objects is now a `list`.
+ `links()` - records with missing values in a `sub_criteria` are now
skipped at the corresponding iteration.
+ Updated argument in `links()` - `recursive`. This now takes any of
three options: `[c("linked", "unlinked", "none")]`.
`[c("linked", "unlinked")]` collectively were previously `[TRUE]`,
while `["none"]` was previously `[FALSE]`.
+ `as.epids()` now calls `make_episodes()`.
+ The default value for the `window` argument in `partitions()` is now `NULL`
+ `as.data.frame()` and `as.data.list()` now only create elements/fields
from non-empty fields.
+ `id` and `gid` slots in `number_line` objects are now `integer(0)` by default.
+ `episode_group()`, `record_group()` and `range_match_legacy()` have been removed.
+ `["recursive"]` episodes from `episodes()` are now presented as `["rolling"]`
episodes with `reference_event = "all_records"`, i.e.
+ `Old syntax ~ episodes(..., episode_type = "recursive")`
+ `New syntax ~ episodes(..., episode_type = "rolling", reference_event = "all_records")`
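For scripts written against the previous logical `recursive` argument of `links()`, the mapping described in the list above could be sketched as follows. This is a minimal base R illustration; `upgrade_recursive()` is a hypothetical helper and is not part of `diyar`.

```{r}
# Hypothetical helper (not exported by diyar): translate the old logical
# value of `recursive` into the character options listed above.
upgrade_recursive <- function(x) {
  stopifnot(is.logical(x), length(x) == 1, !is.na(x))
  if (x) c("linked", "unlinked") else "none"
}

upgrade_recursive(TRUE)
upgrade_recursive(FALSE)
```

The helper only covers the two previous values; choosing `"linked"` or `"unlinked"` on its own is a new option with no old equivalent.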
## Bug fixes
+ When `recursive` was `TRUE`, `links()` ended prematurely and therefore missed some matches. Resolved.
+ `recurrence_sub_criteria` in `episodes()` was not implemented correctly and led to incorrect linkage results in some instances. Resolved.
+ `overlap_method()` - logical tests recycled incorrectly. Resolved.
+ `check_links` argument - Option `"g"` implemented as option `"l"`. Resolved.
+ `make_pairs_wf_source()`. Created incorrect pairs. Resolved.
+ `case_sub_criteria` and `recurrence_sub_criteria` in `episodes()` led to incorrect results. Resolved.
# Version 0.4.2
## New features
+ New arguments in `merge_ids()` - `shrink` and `expand`.
+ New S3 method for class 'd_report' - `plot`.
+ New S3 method for class 'sub_criteria' - `format`.
+ New function - `true()`. Predefined logical test for use with `sub_criteria()`.
+ New function - `false()`. Predefined logical test for use with `sub_criteria()`.
+ New argument in `links()`- `batched`. Specify if all record pairs are created or compared at once (`"no"`) or in batches (`"yes"`).
+ New argument in `links()`- `repeats_allowed`. Specify if record-pairs with duplicate elements should be created.
+ New argument in `links()`- `permutations_allowed`. Specify if permutations of the same record-pair should be created.
+ New argument in `links()`- `ignore_same_source`. Specify if record-pairs from different datasets should be created.
<!-- + New argument in `links_wf_probabilistic()`- `return_weights`. XXXXXX. -->
+ New argument in `eval_sub_criteria()`- `depth`. First order of recursion.
+ New function - `sets()` and `make_sets()`. Create permutations of record-sets.
## Changes
+ `links()` - When `shrink` is `TRUE`, records in a record-group must meet every listed match `criteria` and `sub_criteria`. For example, if `pid_cri` is 3, then the record must have matched another on the first three match criteria.
+ `links()` - `pid@iteration` now tracks when a record was dealt with instead of when it was assigned to a record-group. For example, a record can be closed (matched or not matched) at iteration 1 but assigned to a record-group at iteration 5.
+ `make_pairs()` - `x.*` and `y.*` values in the output are now swapped.
+ `sub_criteria` can now export any data created by `match_func`. To do this, `match_func` must export a `list`, where the first element is a logical object. See an example below.
```{r warning = FALSE}
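library(diyar)
# Ten values, with each of the first five months appearing twice
val <- rep(month.abb[1:5], 2); val
# Logical test that also exports the compared values.
# The first element of the returned list must be the logical result.
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
eval_sub_criteria(sub.cri.1)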
```
+ `links()` can now export any data created within a `sub_criteria`. To do this, the `sub_criteria` must be created as described above. See an example below.
```{r warning = FALSE}
val <- 1:5
# Match when `x - y` is at most one, and export the difference
# alongside the logical result.
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
```
## Bug fixes
+ `summary.epid()` - Incorrect count for '`by episode type`'. Resolved.
+ `episodes()` - Incorrect results in some instances with `skip_order`. Resolved.
+ `make_ids()` - Did not capture all records that should be in a record-group when matches are recursive. Resolved.
+ `make_pairs()` - Incorrect record-pairs in some instances. Resolved.
+ `eval_sub_criteria()` - When the output of `match_func` was length one, it was not recycled. Resolved.
+ `reverse_number_line()` - Incorrect results in some instances. Resolved.
+ `links()`- Incorrect `iteration` (`pids` slot) for non-matches. Resolved.
+ `links()` and `episodes()` - Timing for each iteration was incorrect. Resolved.
# Version 0.4.1
## New features
+ New function - `overlap_method_names()`. Overlap methods for the corresponding overlap method codes.
+ Memory usage added to `*with_report` options for display.
## Changes
+ `"chain"` overlap method split into `"x_chain_y"` and `"y_chain_x"`. `"chain"` will continue to be supported as a keyword for `"x_chain_y" OR "y_chain_x"` methods.
+ `"across"` overlap method split into `"x_across_y"` and `"y_across_x"`. `"across"` will continue to be supported as a keyword for `"x_across_y" OR "y_across_x"` methods.
+ `"inbetween"` overlap method split into `"x_inbetween_y"` and `"y_inbetween_x"`. `"inbetween"` will continue to be supported as a keyword for `"x_inbetween_y" OR "y_inbetween_x"` methods.
+ Optimised `overlaps()`.
+ Some overlap method codes have changed. Please review any previously specified codes with `overlap_method_names()`.
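The relationship between a directional method and its keyword can be sketched in base R. The definitions below are assumptions made for illustration (`"x_chain_y"` is taken to mean that `x` ends exactly where `y` starts); they are not `diyar`'s implementation, and the three function names are hypothetical stand-ins.

```{r}
# Assumed definitions for illustration only; not diyar's implementation.
# Intervals are described by numeric start and end points.
x_chain_y <- function(x_start, x_end, y_start, y_end) x_end == y_start
y_chain_x <- function(x_start, x_end, y_start, y_end) y_end == x_start
# The keyword is the logical OR of the two directional methods
chain <- function(...) x_chain_y(...) | y_chain_x(...)

# x = [1, 5] and y = [5, 9]: x ends where y starts
x_chain_y(1, 5, 5, 9)
y_chain_x(1, 5, 5, 9)
chain(1, 5, 5, 9)
# Swapping the intervals flips the directional results but not the keyword
chain(5, 9, 1, 5)
```

Under these assumptions, the same OR pattern would apply to `"across"` and `"inbetween"`.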
## Bug fixes
+ `make_batch_pairs()` (internal) created invalid record pairs. Resolved.
# Version 0.4.0
## New features
+ New function - `reframe()`. Modify the attributes of a `sub_criteria` object.
+ New function - `link_records()`. Record linkage by creating all record-pairs as opposed to batches as with `links()`.
+ New function - `make_pairs()`. Create every combination of record-pairs for a given dataset.
+ New function - `make_pairs_wf_source()`. Create record-pairs from different sources only.
+ New function - `make_ids()`. Convert an edge list to a group identifier.
+ New function - `merge_ids()`. Merge two group identifiers.
+ New function - `attrs()`. Pass a set of attributes to one instance of `match_funcs` or `equal_funcs`.
## Changes
+ Optimised `episodes_wf_splits()`
+ Optimised `episodes()` and `links()`. Reduced processing times.
+ Three new options for the `display` argument: `"progress_with_report"`, `"stats_with_report"` and `"none_with_report"`. Each creates a `d_report`, a status of the analysis over its run time.
+ `eval_sub_criteria()`. Record-pairs are no longer created in the function. Therefore, `index_record` and `sn` arguments have been replaced with `x_pos` and `y_pos`.
+ `link_records()` and `links_wf_probabilistic()`. The `cmp_threshold` argument has been renamed to `attr_threshold`.
+ `show_labels` argument in `schema()`. Two new options - `"wind_nm"` and `"length"` to replace `"length_label"`.
## Bug fixes
+ Incorrect `wind_id` list in `episodes(..., data_link = "XX")`. Resolved.
+ Incorrect `link_id` in `links(..., recursive = TRUE)`. Resolved.
+ `iteration` not recorded in some situations with `episodes()`. Resolved.
+ `skip_order` ends an open episode. Resolved.
+ `NA` in `dist_wind_index` and `dist_epid_index` when `sn` is supplied. Resolved.
+ `overlap_method_codes()` - overlap method codes not recycled properly. Resolved.
# Version 0.3.1
## New features
+ New function - `delink()`. Unlink identifiers.
+ New function - `episodes_wf_splits()`. Wrapper function of `episodes()`. Better optimised for handling datasets with many duplicate records.
+ New function - `combi()`. Numeric codes for unique combination of vectors.
+ New function - `attr_eval()`. Recursive evaluation of a function on each attribute of a `sub_criteria`.
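The idea behind `combi()`, one numeric code per unique combination of values across several vectors, can be sketched in base R. `combi_sketch()` is a hypothetical stand-in written for this note; it assumes codes are assigned in order of first appearance, which may differ from how `combi()` numbers them.

```{r}
# Illustrative base R sketch; `combi_sketch()` is hypothetical and may
# number the combinations differently from diyar's `combi()`.
combi_sketch <- function(...) {
  # Build one key per position by joining the values of every vector
  key <- do.call(paste, c(list(...), sep = "\r"))
  # Code each key by the position of its first occurrence among unique keys
  match(key, unique(key))
}

combi_sketch(c("a", "a", "b", "a"), c(1, 1, 2, 3))
```

Positions 1 and 2 share the same pair of values and therefore receive the same code.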
## Changes
+ Two new `case_nm` values - `Case_CR` and `Recurrence_CR` which are `Case` and `Recurrence` without a sub-criteria match.
## Bug fixes
+ Corrected length arrows in `schema.epid`.
+ Corrected outcome of `eval_sub_criteria` with 1 result.
# Version 0.3.0
## New features
+ New function - `links_wf_probabilistic()`. Probabilistic record linkage.
+ New function - `partitions()`. Split events into sections in time.
+ New function - `schema()`. Plot schema diagrams for `pid`, `epid`, `pane` and `number_line` objects.
+ New functions - `encode()` and `decode()`. Encode and decode slots values to minimise memory usage.
+ New arguments in `episodes()` - `case_sub_criteria` and `recurrence_sub_criteria`. Additional matching conditions for temporal links.
+ New arguments in `episodes()` - `case_length_total` and `recurrence_length_total`. Number of temporal links required for a `window`/`episode`.
+ New argument in `links()` - `recursive`. Control if matches can spawn new matches.
+ New argument in `links()` - `check_duplicates`. Control the checking of logical tests on duplicate values. If `FALSE`, results are recycled for the duplicates.
+ `as.data.frame` and `as.list` S3 methods for the `pid`, `number_line`, `epid`, `pane` objects.
+ New option for `episode_type` in `episodes()` - "recursive". For recursive episodes where every linked event can be used as a subsequent index event.
+ `recurrence_from_last` renamed to `reference_event` and given two new options.
## Changes
+ `episodes()` and `links()`. Speed improvements.
+ Default time zone for an `epid_interval` or `pane_interval` with `POSIXct` objects is now "GMT".
+ `number_line_sequence()` - splits number_line objects. Also available as a `seq` method.
+ `epid_total`, `pid_total` and `pane_total` slots are populated by default. No need to use `group_stats` to get these.
+ `to_df()` - Removed. Use `as.data.frame()` instead.
+ `to_s4()` - Now an internal function. It's no longer exported.
+ `compress_number_line()` - Now an internal function. It's no longer exported. Use `episodes()` instead.
+ `sub_criteria()` - produces a `sub_criteria` object. Nested "AND" and "OR" conditions are now possible.
+ `case_overlap_methods`, `recurrence_overlap_methods` and `overlap_methods` now take `integer` codes for different combinations of overlap methods. See `overlap_methods$options` for the full list. `character` inputs are still supported.
## Bug fixes
+ `"Single-record"` was wrong in `links` summary output. Resolved.
# Version 0.2.0
## New features
+ Better support for `Inf` in `number_line` objects.
+ Can now use multiple `case_length` or `recurrence_length` for the same event.
+ Can now use multiple `overlap_methods` for the corresponding `case_length` and `recurrence_length`.
+ New function `links()` to replace `record_group()`.
+ New function `sub_criteria()`. The new way of supplying a `sub_criteria` in `links()`.
+ New functions `exact_match()`, `range_match()` and `range_match_legacy()`. Predefined logical tests for use with `sub_criteria()`. User-defined tests can also be used. See `?sub_criteria`.
+ New function `custom_sort()` for nested sorting.
+ New function `epid_lengths()` to show the required `case_length` or `recurrence_length` for an analysis. Useful in confirming the required `case_length` or `recurrence_length` for episode tracking.
+ New function `epid_windows()`. Shows the period a `date` will overlap with given a particular `case_length` or `recurrence_length`. Useful in confirming the required `case_length` or `recurrence_length` for episode tracking.
+ New argument - `strata` in `links()`. Useful for stratified data linkage. As in stratified episode tracking, a record with a missing `strata` (`NA_character_`) is skipped from data linkage.
+ New argument - `data_links` in `links()`. Unlink record groups that do not include records from certain data sources.
+ New convenience functions
+ `listr()`. Format `atomic` vectors as a written list.
+ `combns()`. An extension of `combn` to generate permutations not ordinarily captured by `combn`.
+ New `iteration` slot for `pid` and `epid` objects
+ New `overlap_method` - `reverse()`
## Changes
+ `number_line()` - `l` and `r` must have the same length or be `1`.
+ `episodes()` - `case_nm` differentiates between duplicates of `"Case"` (`"Duplicate_C"`) and `"Recurrent"` events (`"Duplicate_R"`).
+ Strata and episode-level options for most arguments. This gives greater flexibility within the same instance of `episodes()`.
+ Episode-level - The behaviour for each episode is determined by the corresponding option for its index event (`"Case"`).
+ `episode_type` - simultaneously track both `"fixed"` and `"rolling"` episodes.
+ `skip_if_b4_lengths` - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
+ `episode_unit` - simultaneously track episodes by different units of time.
+ `case_for_recurrence` - simultaneously track `"rolling"` episodes with and without an additional case window for recurrent events.
+ `recurrence_from_last` - simultaneously track `"rolling"` episodes with reference windows calculated from the first and last event of the previous window.
+ Strata-level - The behaviour for each episode is determined by the corresponding option for its `strata`. Options must be the same in each strata.
+ `from_last` - simultaneously track episodes in both directions of time - past to present and present to past.
+ `episodes_max` - simultaneously track different number of episodes within the dataset.
+ `include_overlap_method` - `"overlap"` and `"none"` will not be combined with other methods.
+ `"overlap"` - mutually inclusive with the other methods, so their inclusion is not necessary.
+ `"none"` - mutually exclusive and prioritised over the other methods (including `"overlap"`), so their inclusion is not necessary.
+ Events can now have missing cut-off points (`NA_real_`) or periods (`number_line(NA_real_, NA_real_)`) for `case_length` and `recurrence_length`. This ensures that the event does not become an index case; however, it can still be part of a different episode. For reference, an event with a missing `strata` (`NA_character_`) becomes neither an index case nor part of any episode.
## Bug fixes
+ `fixed_episodes`, `rolling_episodes` and `episode_group` - `include_index_period` didn't work in certain situations. Corrected.
+ `fixed_episodes`, `rolling_episodes` and `episode_group` - `dist_from_wind` was wrong in certain situations. Corrected.
# Version 0.1.0
## New features
+ New argument in `record_group()` - `strata`. Perform record linkage separately within subsets of a dataset.
+ New argument in `overlap()`, `compress_number_line()`, `fixed_episodes()`, `rolling_episodes()` and `episode_group()` - `overlap_methods` and `methods`. These replace `overlap_method` and `method` respectively. Use different sets of methods within the same dataset when grouping episodes or collapsing `number_line` objects. `overlap_method` and `method` only permit 1 method per dataset.
+ New slot in `epid` objects - `win_nm`. Shows the type of window each event belongs to, i.e. case or recurrence window.
+ New slot in `epid` objects - `win_id`. Unique ID for each window. The ID is the `sn` of the reference event for each window
+ Format of `epid` objects updated to reflect this
+ New slot in `epid` objects - `dist_from_wind`. Shows the duration of each event from its window's reference event
+ New slot in `epid` objects - `dist_from_epid`. Shows the duration of each event from its episode's reference event
+ New argument in `episode_group()` and `rolling_episodes()` - `recurrence_from_last`. Determine if reference events should be the first or last event from the previous window.
+ New argument in `episode_group()` and `rolling_episodes()` - `case_for_recurrence`. Determine if recurrent events should have their own case windows or not.
+ New argument in `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `data_links`. Unlink episodes that do not include records from certain `data_source(s)`.
+ `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `case_length` and `recurrence_length` arguments. You can now use a range (`number_line` object).
+ New argument in `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `include_index_period`. If `TRUE`, overlaps with the index event or period are grouped together even if they are outside the cut-off range (`case_length` or `recurrence_length`).
+ New slot in `pid` objects - `link_id`. Shows the record (`sn` slot) to which every record in the dataset has matched.
+ New function - `invert_number_line()`. Invert the `left` and/or `right` points to the opposite end of the number line
+ New accessor functions - `left_point(x)<-`, `right_point(x)<-`, `start_point(x)<-` and `end_point(x)<-`
## Changes
+ `overlap()` renamed to `overlaps()`. `overlap()` is now a convenience `overlap_method` to capture ANY kind of overlap.
+ `"none"` is another convenience `overlap_method` for NO kind of overlap
+ `expand_number_line()` - new options for `point`; `"left"` and `"right"`
+ `compress_number_line()` - compressed `number_line` object inherits the direction of the widest `number_line` among overlapping group of `number_line` objects
+ `overlap_methods` - have been changed such that each pair of `number_line` objects can only overlap in one way. E.g.
+ `"chain"` and `"aligns_end"` used to be possible but this is now considered a `"chain"` overlap only
+ `"aligns_start"` and `"aligns_end"` used to be possible but this is now considered an `"exact"` overlap
+ `number_line_sequence()` - Output is now a `list`.
+ `number_line_sequence()` - now works across multiple `number_line` objects.
+ `to_df()` - can now change `number_line` objects to data.frames.
+ `to_s4()` can do the reverse.
+ `epid` objects are the default outputs for `fixed_episodes()`, `rolling_episodes()` and `episode_group()`
+ `pid` objects are the default outputs for `record_group()`
+ In episode grouping, the `case_nm` for events that were skipped due to `rolls_max` or `episodes_max` is now `"Skipped"`.
+ In `episode_group()` and `record_group()`, `sn` can be negative numbers but must still be unique
+ Optimised `episode_group()` and `record_group()`. Runs just a little bit faster ...
+ Relaxed the requirement for `x` and `y` to have the same lengths in overlap functions.
+ The behaviour of overlap functions will now be the same as that of standard R logical tests
+ `episode_group` - `case_length` and `recurrence_length` arguments. Now accepts negative numbers.
+ negative "lengths" will collapse two periods into one, if the second one is within some days before the `end_point()` of the first period.
+ if the "lengths" are larger than the `number_line_width()`, both will be collapsed if the second one is within some days (or any other `episode_unit`) before the `start_point()` of the first period.
+ cheat sheet updated
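The recycling behaviour of standard R logical tests, which the overlap functions are described above as following, can be seen in base R. This sketch uses plain numeric vectors only and makes no assumption about how the overlap functions are implemented.

```{r}
# Base R only: how standard logical tests recycle operands of unequal length
x <- c(1, 2, 3, 4)
y <- c(1, 2)

# y is recycled to c(1, 2, 1, 2) before the comparison
x == y
# A length-one operand is recycled across every element of x
x >= 2
```

When the longer length is not a multiple of the shorter one, base R still recycles but issues a warning.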
## Bug fixes
+ Recurrence was not checked if the initial case event had no duplicates. Resolved.
+ `case_nm` wasn't right for rolling episodes. Resolved.
# Version 0.0.3
## Changes
+ [#7](https://github.com/OlisaNsonwu/diyar/issues/7) `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - optimized to take less time when working with large datasets
+ `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `date` argument now supports numeric values
+ `compress_number_line()` - the output (`gid` slot) is now a group identifier just like in `epid` objects (`epid_interval`)
# Version 0.0.2
## New features
+ `pid` S4 object class for results of `record_group()`. This will replace the current default (`data.frame`) in the next major release
+ `epid` S4 object class for results of `episode_group()`, `fixed_episodes()` and `rolling_episodes()`. This will replace the current default (`data.frame`) in the next release
+ `to_s4()` and `to_s4` argument in `record_group()`, `episode_group()`, `fixed_episodes()` and `rolling_episodes()`. Changes their output from a `data.frame` (current default) to `epid` or `pid` objects
+ `to_df()` changes `epid` or `pid` objects to a `data.frame`
+ `deduplicate` argument from `fixed_episodes()` and `rolling_episodes()` added to `episode_group()`
## Changes
+ `fixed_episodes()` and `rolling_episodes()` are now wrapper functions of `episode_group()`. Functionality remains the same but now includes all arguments available to `episode_group()`
+ Changed the output of `fixed_episodes()` and `rolling_episodes()` from `number_line` to `data.frame`, pending the change to `epid` objects
+ `pid_cri` column returned in `record_group` is now `numeric`. `0` indicates no match.
+ columns can now be used as `criteria` multiple times in `record_group()`
+ [#6](https://github.com/OlisaNsonwu/diyar/issues/6) `number_line` objects can now be used as a `criteria` in `record_group()`
## Bug fixes
+ [#3](https://github.com/OlisaNsonwu/diyar/issues/3) - Resolved a bug with `episode_unit` in `episode_group()`
+ [#4](https://github.com/OlisaNsonwu/diyar/issues/4) - Resolved a bug with `bi_direction` in `episode_group()`
# Version 0.0.1
## Features
+ `fixed_episodes()` and `rolling_episodes()` - Group records into fixed or rolling episodes of events or period of events.
+ `episode_group()` - A more comprehensive implementation of `fixed_episodes()` and `rolling_episodes()`, with additional features such as user defined case assignment.
+ `record_group()` - Multistage deterministic linkage that addresses missing data.
+ `number_line` S4 object.
+ Used to represent a range of numeric values to match using `record_group()`
+ Used to represent a period in time to be grouped using `fixed_episodes()`, `rolling_episodes()` and `episode_group()`
+ Used as the returned output of `fixed_episodes()` and `rolling_episodes()`