multiple similarities on the same column #6

hswerdfe · 2021-04-14T13:03:09Z

In some cases people may want multiple similarity types on the same column. At first glance, this does not seem to be well supported.

I was using the following function to generate multiple similarities for the same variable, but the result seems to interact poorly with score_simsum later in the evaluation. Support for multiple similarity scores on the same column would allow techniques like problink_em and other [un]supervised techniques to find the best similarity metric for each column, which may differ from column to column.

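#' Compare pairs on the `by` variables once per comparator and append one
#' similarity column per (variable, comparator) pair, e.g. name_lcs, name_jw.
library(dplyr)   # bind_cols(), select(), rename_with(), %>%
library(purrr)   # map_dfc()
library(reclin)  # assumed source of compare_pairs(), lcs(), jaro_winkler()

compare_pairs_multi <- function(p,
                                by,
                                default_comparators = list("lcs" = lcs(), "jw" = jaro_winkler()),
                                ...) {
    bind_cols(
        p,
        names(default_comparators) %>%
            map_dfc(function(comp_nm) {
                p %>%
                    compare_pairs(by = by,
                                  default_comparator = default_comparators[[comp_nm]], ...) %>%
                    # all_of() avoids the ambiguous select(by) lookup
                    select(all_of(by)) %>%
                    # suffix each column with the comparator name
                    rename_with(~ paste0(.x, "_", comp_nm))
            })
    )
}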
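
For illustration only: one possible workaround on the scoring side is to collapse the per-comparator columns back to a single "best" score per variable before summing, so that each variable contributes one term to the sum. The issue does not say why score_simsum misbehaves, so this is a sketch under the assumption that the extra columns are the problem. The helper `collapse_best`, the toy data frame, and its values are hypothetical; only the `<variable>_<comparator>` naming follows compare_pairs_multi above. It uses base R only and does not call reclin.

```r
# Toy stand-in for the output of compare_pairs_multi(); the values are
# made up purely for illustration.
pairs <- data.frame(
    x = c(1L, 1L, 2L),
    y = c(1L, 2L, 2L),
    name_lcs = c(0.90, 0.20, 0.75),
    name_jw  = c(0.95, 0.40, 0.70),
    city_lcs = c(1.00, 0.10, 0.60),
    city_jw  = c(1.00, 0.30, 0.80)
)

# Hypothetical helper: for each variable in `by`, take the row-wise maximum
# over its per-comparator columns (<variable>_<comparator>) and store it in
# a column named after the variable.
collapse_best <- function(p, by, comparators) {
    for (v in by) {
        cols <- paste0(v, "_", comparators)
        p[[v]] <- do.call(pmax, c(unname(as.list(p[cols])), na.rm = TRUE))
    }
    p
}

best <- collapse_best(pairs, by = c("name", "city"), comparators = c("lcs", "jw"))

# Manual sum of the collapsed scores, one term per variable.
best$simsum_best <- rowSums(best[c("name", "city")])
```

Taking the maximum is just one choice; a mean or a learned weighting per comparator would fit the same shape.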
