Releases: OlisaNsonwu/diyar
v.0.5.1
v.0.5.0
Version 0.5.0
New features
- New option (`"semi"`) for the `batched` argument in `links()`. All matches are compared against the record-set in the next iteration. Therefore, the number of record-pairs increases exponentially as new matches are found. This means fewer record-pairs (lower memory usage) but a longer run time compared to the `"no"` option. Conversely, it leads to more record-pairs (higher memory usage) but a shorter run time compared to the `"yes"` option.
- New argument (`batched`) in `episodes()`.
- New argument (`split`) in `episodes()`. Split the analysis into `N` splits of `strata`. This leads to fewer record-pairs (and lower memory usage) but a longer run time.
- New argument (`decode`) in `as.data.frame.pid()`, `as.data.frame.epid()` and `as.data.frame.pane()`.
- New function - `episodes_af_shift()`. A more vectorised approach to `episodes()` based on `epidm::group_time()`.
- New function - `links_wf_episodes()`. Implementation of `episodes()` using `links()`.
Changes
- Optimised `episodes()` and `links()`. Each iteration now uses less time and memory.
- `link_id` slot in `pid` objects is now a `list`.
- `links()` - records with missing values in a `sub_criteria` are now skipped at the corresponding iteration.
- Updated argument in `links()` - `recursive`. This now takes any of three options: `c("linked", "unlinked", "none")`. `c("linked", "unlinked")` together correspond to the previous `TRUE`, while `"none"` corresponds to the previous `FALSE`.
- `as.epids()` now calls `make_episodes()`.
- The default value for the `window` argument in `partitions()` is now `NULL`.
- `as.data.frame()` and `as.data.list()` now only create elements/fields from non-empty fields.
- `id` and `gid` slots in `number_line` objects are now `integer(0)` by default.
- `episode_group()`, `record_group()` and `range_match_legacy()` have been removed.
- `"recursive"` episodes from `episodes()` are now presented as `"rolling"` episodes with `reference_event = "all_records"`, i.e.
  Old syntax ~ `episodes(..., episode_type = "recursive")`
  New syntax ~ `episodes(..., episode_type = "rolling", reference_event = "all_records")`
Bug fixes
- When `recursive` was `TRUE`, `links()` ended prematurely and therefore missed some matches. Resolved.
- `recurrence_sub_criteria` in `episodes()` was not implemented correctly and led to incorrect linkage results in some instances. Resolved.
- `overlap_method()` - logical tests were recycled incorrectly. Resolved.
- `check_links` argument - option `"g"` was implemented as option `"l"`. Resolved.
- `make_pairs_wf_source()` - created incorrect pairs. Resolved.
- `case_sub_criteria` and `recurrence_sub_criteria` in `episodes()` led to incorrect results. Resolved.
v.0.4.3
Version 0.4.3
New features
Changes
Bug fixes
- `case_sub_criteria` and `recurrence_sub_criteria` in `episodes()` led to incorrect results. Resolved.
0.4.2
New features
- New arguments in `merge_ids()` - `shrink` and `expand`.
- New S3 method for class ‘d_report’ - `plot`.
- New S3 method for class ‘sub_criteria’ - `format`.
- New function - `true()`. Predefined logical test for use with `sub_criteria()`.
- New function - `false()`. Predefined logical test for use with `sub_criteria()`.
- New argument in `links()` - `batched`. Specify if all record-pairs are created or compared at once (`"no"`) or in batches (`"yes"`).
- New argument in `links()` - `repeats_allowed`. Specify if record-pairs with duplicate elements should be created.
- New argument in `links()` - `permutations_allowed`. Specify if permutations of the same record-pair should be created.
- New argument in `links()` - `ignore_same_source`. Specify if only record-pairs from different datasets should be created.
- New argument in `eval_sub_criteria()` - `depth`. First order of recursion.
- New functions - `sets()` and `make_sets()`. Create permutations of record sets.
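The `repeats_allowed` and `permutations_allowed` arguments determine how many record-pairs are generated. The base-R sketch below is only an illustration, not diyar's implementation. It assumes that "duplicate elements" means a record paired with itself and that "permutations" means keeping both `(i, j)` and `(j, i)`; the helper `record_pairs()` is hypothetical. It shows that the pair count grows quadratically with the number of records, which is why creating every pair at once can be memory-intensive.

```r
# Hypothetical helper, not part of diyar: enumerate pairs of record positions.
# Assumes "repeats" = self-pairs such as (2, 2) and
# "permutations" = keeping both (1, 2) and (2, 1).
record_pairs <- function(n, repeats_allowed = FALSE, permutations_allowed = FALSE) {
  # Every ordered pair (x, y) of record positions 1..n
  pairs <- expand.grid(x = seq_len(n), y = seq_len(n))
  # Drop self-pairs unless repeats are allowed
  if (!repeats_allowed) pairs <- pairs[pairs$x != pairs$y, ]
  # Keep one orientation of each pair unless permutations are allowed
  if (!permutations_allowed) pairs <- pairs[pairs$x <= pairs$y, ]
  pairs
}

nrow(record_pairs(5))                               # n(n - 1)/2
#> [1] 10
nrow(record_pairs(5, repeats_allowed = TRUE))       # n(n + 1)/2
#> [1] 15
nrow(record_pairs(5, permutations_allowed = TRUE))  # n(n - 1)
#> [1] 20
nrow(record_pairs(5, TRUE, TRUE))                   # n^2
#> [1] 25
```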
Changes
- `links()` - When `shrink` is `TRUE`, records in a record group must meet every listed match `criteria` and `sub_criteria`. For example, if `pid_cri` is 3, then the record must have matched another on the first three match criteria.
- `links()` - `pid@iteration` now tracks when a record was dealt with instead of when it was assigned to a record group. For example, a record can be closed (matched or not matched) at iteration 1 but assigned to a record group at iteration 5.
- `make_pairs()` - `x.*` and `y.*` values in the output are now swapped.
- `sub_criteria` can now export any data created by `match_func`. To do this, `match_func` must export a `list`, where the first element is a logical object. See the example below.
```r
library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
# The first element of the returned list is the logical test;
# the second element is the data to export
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#> x_val y_val is_match
#> 1 Jan Jan TRUE
#> 2 Feb Jan FALSE
#> 3 Mar Jan FALSE
#> 4 Apr Jan FALSE
#> 5 May Jan FALSE
#> 6 Jan Jan TRUE
#> 7 Feb Jan FALSE
#> 8 Mar Jan FALSE
#> 9 Apr Jan FALSE
#> 10 May Jan FALSE
```
- `links()` can now export any data created within a `sub_criteria`. To do this, the `sub_criteria` must be created as described above. See the example below.
```r
val <- 1:5
# Match records that differ by at most 1, and export the comparison details
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> x_val y_val diff is_match
#> 1 5 1 4 FALSE
#> 2 4 1 3 FALSE
#> 3 3 1 2 FALSE
#> 4 2 1 1 TRUE
#> 5 1 1 0 TRUE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> x_val y_val diff is_match
#> 1 5 3 2 FALSE
#> 2 4 3 1 TRUE
#> 3 3 3 0 TRUE
#>
#>
#> $export$cri.1$iteration.3
#> $export$cri.1$iteration.3$mf.0.1
#> x_val y_val diff is_match
#> 1 5 5 0 TRUE
```
Bug fixes
- `summary.epid()` - Incorrect count for ‘by episode type’. Resolved.
- `episodes()` - Incorrect results in some instances with `skip_order`. Resolved.
- `make_ids()` - Did not capture all records that should be in a record-group when matches are recursive. Resolved.
- `make_pairs()` - Incorrect record-pairs in some instances. Resolved.
- `eval_sub_criteria()` - When the output of `match_func` was length one, it was not recycled. Resolved.
- `reverse_number_line()` - Incorrect results in some instances. Resolved.
- `links()` - Incorrect `iteration` (`pids` slot) for non-matches. Resolved.
- `links()` and `episodes()` - Timing for each iteration was incorrect. Resolved.
v.0.4.1
New features
- New function - `overlap_method_names()`. Overlap methods for the corresponding overlap method codes.
Changes
- `"chain"` overlap method split into `"x_chain_y"` and `"y_chain_x"`. `"chain"` will continue to be supported as a keyword for `"x_chain_y" OR "y_chain_x"` methods.
- `"across"` overlap method split into `"x_across_y"` and `"y_across_x"`. `"across"` will continue to be supported as a keyword for `"x_across_y" OR "y_across_x"` methods.
- `"inbetween"` overlap method split into `"x_inbetween_y"` and `"y_inbetween_x"`. `"inbetween"` will continue to be supported as a keyword for `"x_inbetween_y" OR "y_inbetween_x"` methods.
- Optimised `overlaps()`.
- Changed overlap method codes. Please review any previously specified codes with `overlap_method_names()`.
Bug fixes
- `make_batch_pairs()` (internal) created invalid record pairs. Resolved.
v.0.4.0
New features
- New function - `reframe()`. Modify the attributes of a `sub_criteria` object.
- New function - `link_records()`. Record linkage by creating all record-pairs at once, as opposed to in batches as with `links()`.
- New function - `make_pairs()`. Create every combination of record-pairs for a given dataset.
- New function - `make_pairs_wf_source()`. Create record-pairs from different sources only.
- New function - `make_ids()`. Convert an edge list to a group identifier.
- New function - `merge_ids()`. Merge two group identifiers.
- New function - `attrs()`. Pass a set of attributes to one instance of `match_funcs` or `equal_funcs`.
Changes
- Optimised `episodes_wf_splits()`.
- Optimised `episodes()` and `links()`. Reduced processing times.
- Three new options for the `display` argument: `"progress_with_report"`, `"stats_with_report"` and `"none_with_report"`. These create a `d_report`; a status of the analysis over its run time.
- `eval_sub_criteria()` - Record-pairs are no longer created in the function. Therefore, the `index_record` and `sn` arguments have been replaced with `x_pos` and `y_pos`.
- `link_records()` and `links_wf_probabilistic()` - The `cmp_threshold` argument has been renamed to `attr_threshold`.
- `show_labels` argument in `schema()` - Two new options, `"wind_nm"` and `"length"`, replace `"length_label"`.
Bug fixes
- Incorrect `wind_id` list in `episodes()` when `data_link` is used. Resolved.
- Incorrect `link_id` in `links()` when `recursive` is used. Resolved.
- `iteration` not recorded in some situations with `episodes()`. Resolved.
- `skip_order` ends an open episode. Resolved.
- `NA` in `dist_wind_index` and `dist_epid_index` when `sn` is supplied. Resolved.
- `overlap_method_codes()` - overlap method codes not recycled properly. Resolved.
v.0.3.1
New features
- New function - `delink()`. Unlink identifiers.
- New function - `episodes_wf_splits()`. Wrapper function of `episodes()` for better optimised handling of duplicate records.
- New function - `combi()`. Numeric codes for unique combinations of vectors.
- New function - `attr_eval()`. Recursive evaluation of a function on each attribute of a `sub_criteria`.
Changes
- Two new `case_nm` values - `Case_CR` and `Recurrence_CR`, which are `Case` and `Recurrence` events without a sub-criteria match.
Bug fixes
- Corrected length arrows in `schema.epid`.
- Corrected outcome of `eval_sub_criteria` with 1 result.
v.0.3.0
New features
- New function - `links_wf_probabilistic()`. Probabilistic record linkage.
- New function - `partitions()`. Split events into sections in time.
- New function - `schema()`. Plot schema diagrams for `pid`, `epid`, `pane` and `number_line` objects.
- New functions - `encode()` and `decode()`. Encode and decode slot values to minimise memory usage.
- New arguments - `case_sub_criteria` and `recurrence_sub_criteria` in `episodes()`. Additional matching conditions for temporal links.
- New arguments - `case_length_total` and `recurrence_length_total` in `episodes()`. Number of temporal links required for a `window`/`episode`.
- New argument - `recursive` in `links()`. Control if matches can spawn new matches.
- New argument - `check_duplicates` in `links()`. Control the checking of logical tests on duplicate values. If `FALSE`, results are recycled for the duplicates.
- `as.data.frame` and `as.list` methods for the `pid`, `number_line`, `epid` and `pane` objects.
- A new type of episode - "recursive" episodes.
- `recurrence_from_last` renamed to `reference_event` and given two new options.
- Optimised `episodes()` and `links()`. Speed improvements.
Changes
- Default time zone for an `epid_interval` or `pane_interval` with `POSIXct` objects is now "GMT".
- `number_line_sequence()` - splits `number_line` objects. Also available as a `seq` method.
- `epid_total`, `pid_total` and `pane_total` slots are populated by default. No need to use `group_stats` to get these.
- `to_df()` - Removed. Use `as.data.frame()` instead.
- `to_s4()` - Now an internal function. It’s no longer exported.
- `compress_number_line()` - Now an internal function. It’s no longer exported. Use `episodes()` instead.
- `sub_criteria()` - produces a `sub_criteria` object. Nested "AND" and "OR" conditions are now possible.
- `case_overlap_methods`, `recurrence_overlap_methods` and `overlap_methods` now take `integer` codes for different combinations of overlap methods. See `overlap_methods$options` for the full list. `character` inputs are still supported.
Bug fixes
- `"Single-record"` was wrong in `links` summary output. Resolved.
v.0.2.0
New features
- Better support for `Inf` in `number_line` objects.
- Can now use multiple `case_lengths` or `recurrence_lengths` for the same event.
  - Can now use multiple `overlap_methods` for the corresponding `case_lengths` and `recurrence_lengths`.
- New function `links()` to replace `record_group()`.
- New function `sub_criteria()`. The new way of supplying a `sub_criteria` in `links()`.
- New functions `exact_match()`, `range_match()` and `range_match_legacy()`. Predefined logical tests for use with `sub_criteria()`. User-defined tests can also be used. See `?sub_criteria`.
- New function `custom_sort()` for nested sorting.
- New function `epid_lengths()` to show the required `case_length` or `recurrence_length` for an analysis. Useful in confirming the required `case_length` or `recurrence_length` for episode tracking.
- New function `epid_windows()`. Shows the period a `date` will overlap with, given particular `case_lengths` or `recurrence_lengths`. Useful in confirming the required `case_length` or `recurrence_length` for episode tracking.
- New argument - `strata` in `links()`. Useful for stratified data linkage. As in stratified episode tracking, a record with a missing `strata` (`NA_character_`) is skipped during data linkage.
- New argument - `data_links` in `links()`. Unlink record groups that do not include records from certain data sources.
- New convenience functions:
  - `listr()`. Format `atomic` vectors as a written list.
  - `combns()`. An extension of `combn` to generate permutations not ordinarily captured by `combn`.
- New `iteration` slot for `pid` and `epid` objects.
- New `overlap_method` - `reverse()`.
Changes
- `number_line()` - `l` and `r` must have the same length or be `1`.
- `episodes()` - `case_nm` differentiates between duplicates of `"Case"` (`"Duplicate_C"`) and `"Recurrent"` events (`"Duplicate_R"`).
- Strata and episode-level options for most arguments. This gives greater flexibility within the same instance of `episodes()`.
  - Episode-level - The behaviour for each episode is determined by the corresponding option for its index event (`"Case"`).
    - `episode_type` - simultaneously track both `"fixed"` and `"rolling"` episodes.
    - `skip_if_b4_lengths` - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
    - `episode_unit` - simultaneously track episodes by different units of time.
    - `case_for_recurrence` - simultaneously track `"rolling"` episodes with and without an additional case window for recurrent events.
    - `recurrence_from_last` - simultaneously track `"rolling"` episodes with reference windows calculated from the first and last event of the previous window.
  - Strata-level - The behaviour for each episode is determined by the corresponding option for its `strata`. Options must be the same within each stratum.
    - `from_last` - simultaneously track episodes in both directions of time - past to present and present to past.
    - `episodes_max` - simultaneously track different numbers of episodes within the dataset.
- `include_overlap_method` - `"overlap"` and `"none"` will not be combined with other methods.
  - `"overlap"` - mutually inclusive with the other methods, so their inclusion is not necessary.
  - `"none"` - mutually exclusive and prioritised over the other methods (including `"overlap"`), so their inclusion is not necessary.
- Events can now have missing cut-off points (`NA_real_`) or periods (`number_line(NA_real_, NA_real_)`) in `case_length` and `recurrence_length`. This ensures that the event does not become an index case; however, it can still be part of a different episode. For reference, an event with a missing `strata` (`NA_character_`) does not become an index case nor part of any episode.
Bug fixes
- `fixed_episodes`, `rolling_episodes` and `episode_group` - `include_index_period` didn't work in certain situations. Corrected.
- `fixed_episodes`, `rolling_episodes` and `episode_group` - `dist_from_wind` was wrong in certain situations. Corrected.
v0.1.0
New features
- `record_group()` - `strata` argument. Perform record grouping separately within subsets of a dataset.
- `overlap()`, `compress_number_line()`, `fixed_episodes()`, `rolling_episodes()` and `episode_group()` - `overlap_methods` and `methods` arguments replace `overlap_method` and `method` respectively. Use different sets of methods within the same dataset when grouping episodes or collapsing `number_line` objects. `overlap_method` and `method` only permit 1 method per dataset.
- `epid` objects - `win_nm` slot. Shows the type of window each event belongs to, i.e. case or recurrence window.
- `epid` objects - `win_id` slot. Unique ID for each window. The ID is the `sn` of the reference event for each window.
  - Format of `epid` objects updated to reflect this.
- `epid` objects - `dist_from_wind` slot. Shows the duration of each event from its window's reference event.
- `epid` objects - `dist_from_epid` slot. Shows the duration of each event from its episode's reference event.
- `episode_group()` and `rolling_episodes()` - `recurrence_from_last` argument. Determine if reference events should be the first or last event from the previous window.
- `episode_group()` and `rolling_episodes()` - `case_for_recurrence` argument. Determine if recurrent events should have their own case windows or not.
- `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `data_links` argument. Ungroup episodes that do not include records from certain `data_source(s)`.
- `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `case_length` and `recurrence_length` arguments. You can now use a range (`number_line` object).
- `episode_group()`, `fixed_episodes()` and `rolling_episodes()` - `include_index_period` argument. If `TRUE`, overlaps with the index event or period are grouped together even if they are outside the cut-off range (`case_length` or `recurrence_length`).
- `pid` objects - `link_id` slot. Shows the record (`sn` slot) to which every record in the dataset has matched.
- `invert_number_line()` - Invert the `left` and/or `right` points to the opposite end of the number line.
- `left_point(x)<-`, `right_point(x)<-`, `start_point(x)<-` and `end_point(x)<-` accessor functions.
Changes
- `overlap()` renamed to `overlaps()`.
- `overlap()` is now a convenience `overlap_method` for ANY kind of overlap.
- `"none"` is another convenience `overlap_method` for NO kind of overlap.
- `expand_number_line()` - new options for `point`: `"left"` and `"right"`.
- `compress_number_line()` - the compressed `number_line` object inherits the direction of the widest `number_line` among the overlapping group of `number_line` objects.
- `overlap_methods` - have been changed such that each pair of `number_line` objects can only overlap in one way. E.g.
  - `"chain"` and `"aligns_end"` used to be possible but this is now considered a `"chain"` overlap only.
  - `"aligns_start"` and `"aligns_end"` used to be possible but this is now considered an `"exact"` overlap.
- `number_line_sequence()` - Output is now a `list`.
- `number_line_sequence()` - now works across multiple `number_line` objects.
- `to_df()` - can now change `number_line` objects to data.frames. `to_s4()` can do the reverse.
- `epid` objects are the default outputs for `fixed_episodes()`, `rolling_episodes()` and `episode_group()`.
- `pid` objects are the default outputs for `record_group()`.
- In episode grouping, the `case_nm` for events that were skipped due to `rolls_max` or `episodes_max` is now `"Skipped"`.
- In `episode_group()` and `record_group()`, `sn` can be negative numbers but must still be unique.
- Optimised `episode_group()` and `record_group()`. Runs just a little bit faster...
- Relaxed the requirement for `x` and `y` to have the same lengths in overlap functions.
  - The behaviour of overlap functions will now be the same as that of standard R logical tests.
- `episode_group` - `case_length` and `recurrence_length` arguments now accept negative numbers.
  - Negative "lengths" will collapse two periods into one if the second one is within some days before the `end_point()` of the first period.
  - If the "lengths" are larger than the `number_line_width()`, both will be collapsed if the second one is within some days (or any other `episode_unit`) before the `start_point()` of the first period.
- Cheat sheet updated.