Skip to content

Releases: OlisaNsonwu/diyar

v.0.5.1

20 Nov 10:14
Compare
Choose a tag to compare

Version 0.5.1

New features

Changes

Bug fixes

  • links() - Incorrect results in some situations. Resolved.
  • links_af_probabilistic() - Failed in some situations. Resolved.

v.0.5.0

06 Nov 10:53
Compare
Choose a tag to compare

Version 0.5.0

New features

  • New option ("semi") for the batched argument in links(). All
    matches are compared against the record-set in the next iteration.
    Therefore, the number of record-pairs increase exponentially as new
    matches are found. This means fewer record-pairs (memory usage) but a
    longer run time compared to the "no" option. Conversely, it leads to
    more record-pairs (memory usage) but a shorter run time compared to
    the "yes" option.
  • New argument (batched) in episodes()
  • New argument (split) in episodes(). Split the analysis in
    N-splits of strata. This leads to fewer record-pairs (and memory
    usage) but a longer run time.
  • New argument (decode) in as.data.frame.pid(),
    as.data.frame.epid() and as.data.frame.pane()
  • New function - episodes_af_shift(). A more vectorised approach to
    episodes() based on epidm::group_time().
  • New function - links_wf_episodes(). Implantation of episodes()
    using links().

Changes

  • Optimised episodes() and links(). Each iteration now uses less
    time and memory.
  • link_id slot in pid objects is now a list.
  • links() - records with missing values in a sub_criteria are now
    skipped at the corresponding iteration.
  • Updated argument in links()- recursive. This now takes any of
    three options [c("linked", "unlinked", "none")] .
    [c("linked", "unlinked")] collectively were previously [TRUE],
    while ["none"] was previously [FALSE].
  • as.epids() now calls make_episodes().
  • The default value for the window argument in partitions() is now
    NULL
  • as.data.frame() and as.data.list() now only creates
    elements/fields from non-empty fields
  • id and gid slots in number_line objects are now integer(0) by
    default.
  • episode_group(), record_group() and range_match_legacy() have
    been removed.
  • ["recurisve"] episodes from episodes() are now presented as
    ["rolling"] episodes with reference_event = "all_records" i.e
    • Old syntax ~ episodes(..., episode_type == "recursive")
    • New syntax ~ episodes(..., episode_type == "rolling", reference_event = "all_records")

Bug fixes

  • When recursive was TRUE, links() ended prematurely and therefore
    missed some matches. Resolved.
  • recurrence_sub_criteria in episodes() was not implemented
    correctly and lead to incorrect linkage result in some instances.
    Resolved.
  • overlap_method() - logical tests recycled incorrectly. Resolved.
  • check_links argument - Option "g" implemented as option "l".
    Resolved.
  • make_pairs_wf_source(). Created incorrect pairs. Resolved.
  • case_sub_criteria and recurrence_sub_criteria in episodes() led
    to incorrect results. Resolved.

v.0.4.3

25 Dec 21:08
Compare
Choose a tag to compare

Version 0.4.3

New features

Changes

Bug fixes

  • case_sub_criteria and recurrence_sub_criteria in episodes()
    led to incorrect results. Resolved.

0.4.2

21 Dec 11:07
Compare
Choose a tag to compare

New features

  • New argument in merge_ids() - shrink and expand.
  • New S3 method for class ‘d_report’ - plot.
  • New S3 method for class ‘sub_criteria’ - format.
  • New function - true(). Predefined logical test for use with
    sub_criteria().
  • New function - false(). Predefined logical test for use with
    sub_criteria().
  • New argument in links()- batched. Specify if all record pairs
    are created or compared at once ("no") or in batches ("yes").
  • New argument in links()- repeats_allowed. Specify if
    the record pairs with duplicate elements should be created.
  • New argument in links()- permutations_allowed. Specify if
    permutations of the same record pair should be created.
  • New argument in links()- ignore_same_source. Specify if
    record pairs from different datasets should be created.
  • New argument in eval_sub_criteria()- depth. First order of
    recursion.
  • New function - sets() and make_sets(). Create permutations of
    record sets.

Changes

  • links() - When shrink is TRUE, records in a record group must
    meet every listed match criteria and sub_criteria. For example,
    if pid_cri is 3, then the record must have met matched another on
    the first three match criteria.
  • links() - pid@iteration now tracks when a record was dealt with
    instead of when it was assigned to a record group. For example, a
    record can be closed (matched or not matched) at iteration 1 but
    assigned to a record group at iteration 5.
  • make_pairs() - x.* and y.* values in the output are now
    swapped.
  • sub_criteria can now export any data created by match_func. To
    do this, match_func must export a list, where the first element
    is a logical object. See the example below.
library(diyar)
val <- rep(month.abb[1:5], 2); val
#>  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
  output <- list(x == y, 
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)

format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#>  [1] 1 0 0 0 0 1 0 0 0 0
#> 
#> $mf.0.1
#>    x_val y_val is_match
#> 1    Jan   Jan     TRUE
#> 2    Feb   Jan    FALSE
#> 3    Mar   Jan    FALSE
#> 4    Apr   Jan    FALSE
#> 5    May   Jan    FALSE
#> 6    Jan   Jan     TRUE
#> 7    Feb   Jan    FALSE
#> 8    Mar   Jan    FALSE
#> 9    Apr   Jan    FALSE
#> 10   May   Jan    FALSE
  • links can now export any data created within a sub_criteria. To
    do this, the sub_criteria must be created as described above. See
    an example below
val <- 1:5
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match, 
                 data.frame(x_val = x, y_val = y, diff = diff,  is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder", 
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#> 
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#>   x_val y_val diff is_match
#> 1     5     1    4    FALSE
#> 2     4     1    3    FALSE
#> 3     3     1    2    FALSE
#> 4     2     1    1     TRUE
#> 5     1     1    0     TRUE
#> 
#> 
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#>   x_val y_val diff is_match
#> 1     5     3    2    FALSE
#> 2     4     3    1     TRUE
#> 3     3     3    0     TRUE
#> 
#> 
#> $export$cri.1$iteration.3
#> $export$cri.1$iteration.3$mf.0.1
#>   x_val y_val diff is_match
#> 1     5     5    0     TRUE

Bug fixes

  • summary.epid() - Incorrect count for ‘by episode type’.
    Resolved.
  • episodes() - Incorrect results in some instances with
    skip_order. Resolved.
  • make_ids() - Did not capture all records in that should be in a
    record-group when matches are recursive. Resolved.
  • make_pairs() - Incorrect record-pairs in some instances. Resolved.
  • eval_sub_criteria() - When output of match_func is length one,
    it’s not recycled. Resolved.
  • reverse_number_line() - Incorrect results in some instances.
    Resolved.
  • links()- Incorrect iteration (pids slot) for non-matches.
    Resolved.
  • links() and episodes() - Timing for each iteration was
    incorrect. Resolved.

v.0.4.1

10 Dec 22:44
Compare
Choose a tag to compare

New features

New function - overlap_method_names(). Overlap methods for a
corresponding overlap method codes.

Changes

  • "chain" overlap method split into "x_chain_y" and "y_chain_x".
    "chain" will continue to be supported as a keyword for
    "x_chain_y" OR "y_chain_x" method
  • "across" overlap method split into "x_across_y" and
    "y_across_x". "across" will continue to be supported as a
    keyword for "x_across_y" OR "y_across_x" methods
  • "inbetween" overlap method split into "x_inbetween_y" and
    "y_inbetween_x". "inbetween" will continue to be supported as a
    keyword for "x_inbetween_y" OR "y_inbetween_x" methods
  • Optimised overlaps().
  • Changed overlap method codes. Please review any previously specified
    codes with overlap_method_names().

Bug fixes

  • make_batch_pairs() (internal) created invalid record pairs.
    Resolved.

v.0.4.0

01 Dec 22:20
Compare
Choose a tag to compare

New features

  • New function - reframe(). Modify the attributes of a
    sub_criteria object.
  • New function - link_records(). Record linkage by creating all
    record pairs as opposed to batches as with link().
  • New function - make_pairs(). Create every combination of
    records-pairs for a given dataset.
  • New function - make_pairs_wf_source(). Create records-pairs from
    different sources only.
  • New function - make_ids(). Convert an edge list to a group
    identifier.
  • New function - merge_ids(). Merge two group identifiers.
  • New function - attrs(). Pass a set of attributes to one instance
    of match_funcs or equal_funcs.

Changes

  • Optimised episodes_wf_splits()
  • Optimised episodes() and links(). Reduced processing times.
  • Three new options for the display argument.
    "progress_with_report", "stats_with_report" and
    "none_with_report". Creates a d_report; a status of the analysis
    over its run time.
  • eval_sub_criteria(). Record-pairs are no longer created in the
    function. Therefore, index_record and sn arguments have been
    replaced with x_pos and y_pos.
  • link_records() and links_wf_probabilistic(). The cmp_threshold
    argument has been renamed to attr_threshold.
  • show_labels argument in schema(). Two new options - "wind_nm"
    and "length" to replace "length_label".

Bug fixes

  • Incorrect wind_id list in episodes() when data_link is used.
    Resolved.
  • Incorrect link_id in links() when recursive is used. Resolved.
  • iteration not recorded in some situations with episodes().
    Resolved.
  • skip_order ends an open episode. Resolved.
  • NA in dist_wind_index and dist_epid_index when sn is
    supplied. Resolved.
  • overlap_method_codes() - overlap method codes not recycled
    properly. Resolved.

v.0.3.1

19 Aug 18:38
Compare
Choose a tag to compare

New features

  • New function - delink(). Unlink identifiers.
  • New function - episodes_wf_splits(). Wrapper function of
    episodes() for better optimised handling of duplicates records.
  • New function - combi(). Numeric codes for unique combination of
    vectors.
  • New function - attr_eval(). Recursive evaluation of a function on
    each attribute of a sub_criteria.

Changes

  • Two new case_nm values - Case_CR and Recurrence_CR which are
    Case and Recurrence without a sub-criteria match.

Bug fixes

  • Corrected length arrows in schema.epid.
  • Corrected outcome of eval_sub_criteria with 1 result.

v.0.3.0

29 Apr 06:39
Compare
Choose a tag to compare

New features

  • New function - links_wf_probabilistic(). Probabilistic record
    linkage.
  • New function - partitions(). Spilt events into sections in time.
  • New function - schema(). Plot schema diagrams for pid, epid,
    pane and number_line objects.
  • New functions - encode() and decode(). Encoding and decoding
    slots values to minimise memory usage.
  • New argument - case_sub_criteria and recurrence_sub_criteria in
    episodes(). Additional matching conditions for temporal links.
  • New argument - case_length_total and recurrence_length_total in
    episodes(). Number of temporal links required for a
    window/episode.
  • New argument - recursive in links(). Control if matches can
    spawn new matches.
  • New argument - check_duplicates in links(). Control the checking
    of logical tests on duplicate values. If FALSE, results are
    recycled for the duplicates.
  • as.data.frame and as.list for the pid, number_line, epid,
    pane objects.
  • A new type of episode - “recursive” episodes.
  • recurrence_from_last renamed to reference_event and given two
    new options.
  • Optimised episodes() and links(). Speed improvements.

Changes

  • Default time zone for an epid_interval or pane_interval with
    POSIXct objects is now “GMT”.
  • number_line_sequence() - splits number_line objects. Also
    available as a seq method.
  • epid_total, pid_total and pane_total slots are populated by
    default. No need to used group_stats to get these.
  • to_df() - Removed. Use as.data.frame() instead.
  • to_s4() - Now an internal function. It’s no longer exported.
  • compress_number_line() - Now an internal function. It’s no longer
    exported. Use episodes() instead.
  • sub_criteria() - produces a sub_criteria object. Nested “AND”
    and “OR” conditions are now possible.
  • case_overlap_methods, recurrence_overlap_methods and
    overlap_methods now take integer codes for different
    combinations of overlap methods. See overlap_methods$options for
    the full list. character inputs are still supported.

Bug fixes

  • "Single-record" was wrong in links summary output. Resolved.

v.0.2.0

20 Sep 12:08
Compare
Choose a tag to compare

New features

  • Better support for Inf in number_line objects.
  • Can now use multiple case_lengths or recurrence_lengths for the same event.
    • Can now use multiple overlap_methods for the corresponding case_lengths and recurrence_lengths.
  • New function links() to replace record_group().
  • New function sub_criteria(). The new way of supplying a sub_criteria in links().
  • New functions exact_match(), range_match() and range_match_legacy(). Predefined logical tests for use with sub_criteria(). User-defined tests can also be used. See ?sub_criteria.
  • New function custom_sort() for nested sorting.
  • New function epid_lengths() to show the required case_length or recurrence_length for an analyses. Useful in confirming the required case_length or recurrence_length for episode tracking.
  • New function epid_windows(). Shows the period a date will overlap with given particular case_lengths or recurrence_lengths. Useful in confirming the required case_length or recurrence_length for episode tracking.
  • New argument - strata in links(). Useful for stratified data linkage. As in stratified episode tracking, a record with a missing strata (NA_character_) is skipped from data linkage.
  • New argument - data_links in links(). Unlink record groups that do not include records from certain data sources
  • New convenience functions
    • listr(). Format atomic vectors as a written list.
    • combns(). An extension of combn to generate permutations not ordinarily captured by combn.
  • New iteration slot for pid and epid objects
  • New overlap_method - reverse()

Changes

  • number_line() - l and r must have the same length or be 1.
  • episodes() - case_nm differentiates between duplicates of "Case" ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
  • Strata and episode-level options for most arguments. This gives greater flexibility within the same instance of episodes().
    • Episode-level - The behaviour for each episode is determined by the corresponding option for its index event ("Case").
      • episode_type - simultaneously track both "fixed" and "rolling" episodes.
      • skip_if_b4_lengths - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
      • episode_unit - simultaneously track episodes by different units of time.
      • case_for_recurrence - simultaneously track "rolling" episodes with and without an additional case window for recurrent events.
      • recurrence_from_last - simultaneously track "rolling" episodes with reference windows calculated from the first and last event of the previous window.
    • Strata-level - The behaviour for each episode is determined by the corresponding option for its strata. Options must be the same in each strata.
      • from_last - simultaneously track episodes in both directions of time - past to present and present to past.
      • episodes_max - simultaneously track different number of episodes within the dataset.
  • include_overlap_method - "overlap" and "none" will not be combined with other methods.
    • "overlap" - mutually inclusive with the other methods, so their inclusion is not necessary.
    • "none" - mutually exclusive and prioritised over the other methods (including "none"), so their inclusion is not necessary.
  • Events can now have missing cut-off points (NA_real_) or periods (number_line(NA_real_, NA_real_)) case_length and recurrence_length. This ensures that the event does not become an index case however, it can still be part of different episode. For reference, an event with a missing strata (NA_character_) ensures that the event does not become an index case nor part of any episode.

Bug fixes

  • fixed_episodes, rolling_episodes and episode_group - include_index_period didn't work in certain situations. Corrected.
  • fixed_episodes, rolling_episodes and episode_group - dist_from_wind was wrong in certain situations. Corrected.

v0.1.0

13 Jun 23:27
Compare
Choose a tag to compare

##New features

  • record_group() - strata argument. Perform record grouping separately within subsets of a dataset.
  • overlap(), compress_number_line(), fixed_sepisodes(), rolling_episodes() and episode_group() - overlap_methods and methods arguments replaces overlap_method and method respectively. Use different sets of methods within the same dataset when grouping episodes or collapsing number_line objects. overlap_method and method only permits 1 method per per dataset.
  • epid objects - win_nm slot. Shows the type of window each event belongs to i.e. case or recurrence window
  • epid objects - win_id slot. Unique ID for each window. The ID is the sn of the reference event for each window
    • Format of epid objects updated to reflect this
  • epid objects - dist_from_wind slot. Shows the duration of each event from its window's reference event
  • epid objects - dist_from_epid slot. Shows the duration of each event from its episode's reference event
  • episode_group() and rolling_episodes() - recurrence_from_last argument. Determine if reference events should be the first or last event from the previous window.
  • episode_group() and rolling_episodes() - case_for_recurrence argument. Determine if recurrent events should have their own case windows or not.
  • episode_group(), fixed_episodes() and rolling_episodes() - data_links argument. Ungroup episodes that do not include records from certain data_source(s).
  • episode_group(), fixed_episodes() and rolling_episodes() - case_length and recurrence_length arguments. You can now use a range (number_line object).
  • episode_group(), fixed_episodes() and rolling_episodes() - case_length and recurrence_length arguments. You can now use a range (number_line object).
  • episode_group(), fixed_episodes() and rolling_episodes() - include_index_period argument. If TRUE, overlaps with the index event or period are groupped together even if they are outside the cut-off range (case_length or recurrence_length).
  • pid objects - link_id slot. Shows the record (sn slot) to which every record in the dataset has matched to.
  • invert_number_line() - Invert the left and/or right points to the opposite end of the number line
  • left_point(x)<-, right_point(x)<-, start_point(x)<- and end_point(x)<- accessor functions

##Changes

  • overlap() renamed to overlaps(). overlap() is now a convenience overlap_method for ANY kind of overlap
  • "none" is another convenience overlap_method for NO kind of overlap
  • expand_number_line() - new options for point; "left" and "right"
  • compress_number_line() - compressed number_line object inherits the direction of the widest number_line among overlapping group of number_line objects
  • overlap_methods - have been changed such that each pair of number_line objects can only overlap in one way. E.g.
    • "chain" and "aligns_end" used to be possible but this is now considered a "chain" overlap only
    • "aligns_start" and "aligns_end" use to be possible but this is now considered an "exact" overlap
  • number_line_sequence() - Output is now a list.
  • number_line_sequence() - now works across multiple number_line objects.
  • to_df() - can now change number_line objects to data.frames.
    • to_s4() can do the reverse.
  • epid objects are the default outputs for fixed_episodes(), rolling_episodes() and episode_group()
  • pid objects are the default outputs for record_group()
  • In episode grouping, the case_nm for events that were skipped due to rolls_max or episodes_max is now "Skipped".
  • In episode_group() and record_group(), sn can be negative numbers but must still be unique
  • Optimised episode_group() and record_group(). Runs just a little bit faster ...
  • Relaxed the requirement for x and y to have the same lengths in overlap functions.
    • The behaviour of overlap functions will now be the same as that of standard R logical tests
  • episode_group - case_length and recurrence_length arguments. Now accepts negative numbers.
    • negative "lengths" will collapse two periods into one, if the second one is within some days before the end_point() of the first period.
      • if the "lengths" are larger than the number_line_width(), both will be collapsed if the second one is within some days (or any other episode_unit) before the start_point() of the first period.
  • cheat sheet updated