Merge values not taking most frequent string #5
Solution was to add this function:

    // Given a CharacterVector, return the string that appears most frequently.
    // Ties are determined by the string that appears first alphabetically.
    // [[Rcpp::export]]
    String most_freq_str(CharacterVector x) {
      IntegerVector x_tab = table(x);
      CharacterVector tab_names = x_tab.attr("names");
      return(tab_names[which_max(x_tab)]);
    }

The new function gets sourced during both

Going back to the example:

    refinr::key_collision_merge(c("cat bike", "bike cat", "bike cat"))
    #> "bike cat" "bike cat" "bike cat"
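
For anyone who wants to check the tie-breaking rule without an R toolchain, here is a rough standalone C++ sketch of the same idea. It is not the refinr code: `most_freq_str_sketch` is a hypothetical name, and a `std::map` stands in for `table()`, assuming both visit the distinct strings in sorted order and that `which_max` reports the first maximum.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical standalone analogue of most_freq_str(); not part of refinr.
// Returns the most frequent string in x. Ties go to the string that sorts first.
std::string most_freq_str_sketch(const std::vector<std::string>& x) {
    // std::map keeps its keys sorted, standing in for the names of table(x).
    std::map<std::string, int> counts;
    for (const std::string& s : x) {
        ++counts[s];
    }

    std::string best;
    int best_count = 0;
    // Strict '>' keeps the first maximum encountered, so among tied strings
    // the alphabetically smallest key wins.
    for (const auto& kv : counts) {
        if (kv.second > best_count) {
            best = kv.first;
            best_count = kv.second;
        }
    }
    return best;  // empty string when x is empty
}
```

Called on `{"cat bike", "bike cat", "bike cat"}` the sketch returns `"bike cat"`, and on a tie such as `{"b", "a"}` it returns `"a"`.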
Seeing a bug in which the edit value assigned to a cluster is not the most frequent string in that cluster. Example:

I think the issue is within this line from file `key_collision_merge_funcs.cpp`:

    // Get the string that appears most often in curr_vect.
    String most_freq_string = curr_vect[which_max(table(curr_vect))];

I think the solution is to apply `.sort()` to `curr_vect` prior to calling `table` on it. Need to do some more testing.
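
One possible reading of the quoted line is that `which_max` yields a position in the table (one entry per distinct string) while that position is then used to subscript `curr_vect` itself. The standalone C++ sketch below imitates that mismatch, purely as an illustration. All function names in it are hypothetical, and `std::map` is assumed to order the distinct strings the way `table` does.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Distinct strings with their counts; std::map keeps the keys sorted.
static std::map<std::string, int> count_strings(const std::vector<std::string>& x) {
    std::map<std::string, int> counts;
    for (const std::string& s : x) ++counts[s];
    return counts;
}

// Position, among the sorted distinct strings, of the first maximal count.
std::size_t table_argmax(const std::vector<std::string>& x) {
    std::size_t best_pos = 0, pos = 0;
    int best_count = 0;
    for (const auto& kv : count_strings(x)) {
        if (kv.second > best_count) { best_count = kv.second; best_pos = pos; }
        ++pos;
    }
    return best_pos;
}

// Imitates the quoted line: the table position subscripts the input vector.
// Requires a non-empty x.
std::string pick_from_input(const std::vector<std::string>& x) {
    return x[table_argmax(x)];
}

// Imitates the proposed change: sort the input first, then do the same.
std::string pick_from_sorted_input(std::vector<std::string> x) {
    std::sort(x.begin(), x.end());
    return x[table_argmax(x)];
}

// The table position subscripts the table's own names. Requires a non-empty x.
std::string pick_from_names(const std::vector<std::string>& x) {
    const auto counts = count_strings(x);
    auto it = counts.begin();
    std::advance(it, table_argmax(x));
    return it->first;
}
```

For `{"cat bike", "bike cat", "bike cat"}`, `pick_from_input` returns `"cat bike"` whereas `pick_from_names` returns `"bike cat"`. In this sketch the sorted variant also returns `"bike cat"` for that input, but for `{"a", "a", "b", "b", "b"}` it returns `"a"`, so if the real functions behave as assumed here, sorting alone may not be enough and the more-testing caveat is well placed.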