multiple similarities on the same column #6

Open
hswerdfe opened this issue Apr 14, 2021 · 0 comments

Comments

@hswerdfe

In some cases, people may want to apply multiple similarity measures to the same column. At first glance, this does not seem to be well supported.

I was using the following function to generate multiple similarities for the same variable, but the result seems to interact poorly with score_simsum later in the evaluation. Support for multiple similarity scores on the same column would allow techniques like problink_em and other [un]supervised techniques to find the best similarity metric for each column, which may differ from column to column.

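library(dplyr)  # bind_cols, select, all_of, rename_with, %>%
library(purrr)  # map_dfc
library(reclin) # compare_pairs, lcs, jaro_winkler

#' Compare pairs with several comparators on the same column(s)
#'
#' @param p pairs object accepted by compare_pairs()
#' @param by character vector of column names to compare
#' @param default_comparators named list of comparators; the names become column suffixes
#' @param ... further arguments passed on to compare_pairs()
compare_pairs_multi <- function(p, 
                                by, 
                                default_comparators = list("lcs" = lcs(), "jw" = jaro_winkler()), 
                                ...){
    bind_cols(
        p,  
        # one set of "<column>_<comparator>" columns per comparator
        names(default_comparators) %>% 
            map_dfc(function(comp_nm){
                  p %>% compare_pairs(by = by, 
                          default_comparator = default_comparators[[comp_nm]], ...)  %>%
                          select(all_of(by)) %>%  # all_of() avoids the ambiguous bare character vector
                          rename_with(~paste0(.x, "_", comp_nm))
            })
    )
}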
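
As an illustration of what a summed score over the suffixed columns could look like, here is a minimal base R sketch. It assumes the `<column>_<comparator>` naming produced by compare_pairs_multi above. The `pairs` data frame and its values are made up for the example, and this is a hand-rolled sum, not reclin's score_simsum.

```r
# Hypothetical pairs table: one row per candidate pair, one similarity column
# per (variable, comparator) combination, named "<column>_<comparator>".
pairs <- data.frame(
  x = c(1, 1, 2),
  y = c(1, 2, 2),
  lastname_lcs = c(1.0, 0.4, 0.9),
  lastname_jw  = c(1.0, 0.5, 0.8)
)

# Build the names of the columns to score from the variables and suffixes.
by <- "lastname"
suffixes <- c("lcs", "jw")
score_cols <- as.vector(outer(by, suffixes, paste, sep = "_"))

# Sum the similarities per pair, counting missing comparisons as 0.
sims <- as.matrix(pairs[, score_cols, drop = FALSE])
sims[is.na(sims)] <- 0
pairs$simsum <- rowSums(sims)
```

Naming the score columns explicitly, rather than relying on whatever the pairs object remembers from the comparison step, may sidestep the interaction described above, though I have not checked that against score_simsum itself.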