// Given a CharacterVector, return the string that appears most frequently.
// Ties are determined by the string that appears first alphabetically.
// [[Rcpp::export]]
String most_freq_str(CharacterVector x) {
  IntegerVector x_tab = table(x);
  CharacterVector tab_names = x_tab.attr("names");
  return(tab_names[which_max(x_tab)]);
}
The new function gets sourced during both key_collision_merge and n_gram_merge.
Seeing a bug in which the edit value assigned to a cluster is not the most frequent string in that cluster. Example:
I think the issue is within this line from file key_collision_merge_funcs.cpp:
// Get the string that appears most often in curr_vect.
String most_freq_string = curr_vect[which_max(table(curr_vect))];
I think the solution is to apply .sort() to curr_vect prior to calling table on it. Need to do some more testing.
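To make the suspected index mismatch concrete, here is a minimal sketch in plain C++ rather than Rcpp. It assumes that table() returns counts ordered by sorted unique value, which is what reading names off the table in most_freq_str relies on. The helpers tabulate, which_max_pos, most_freq_buggy and most_freq_by_name are hypothetical stand-ins, not functions from the package.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for table(): counts per unique string.
// std::map iterates its keys in sorted order.
std::map<std::string, int> tabulate(const std::vector<std::string>& x) {
    std::map<std::string, int> counts;
    for (const auto& s : x) ++counts[s];
    return counts;
}

// Hypothetical stand-in for which_max() on the table: position of the
// first maximum count, walking the unique values in sorted order.
std::size_t which_max_pos(const std::map<std::string, int>& counts) {
    std::size_t best = 0, i = 0;
    int best_count = -1;
    for (const auto& kv : counts) {
        if (kv.second > best_count) { best_count = kv.second; best = i; }
        ++i;
    }
    return best;
}

// Buggy pattern: the table position is used as an index into the raw
// input, whose elements are neither unique nor sorted.
std::string most_freq_buggy(const std::vector<std::string>& x) {
    assert(!x.empty());
    return x[which_max_pos(tabulate(x))];
}

// Name-based lookup: the table position is used to index the table's
// own sorted unique values, mirroring tab_names[which_max(x_tab)].
std::string most_freq_by_name(const std::vector<std::string>& x) {
    assert(!x.empty());
    const auto counts = tabulate(x);
    auto it = counts.begin();
    std::advance(it, static_cast<std::ptrdiff_t>(which_max_pos(counts)));
    return it->first;
}
```

With the input {"zebra", "apple", "zebra", "zebra"}, the table position of the maximum is 1 (apple = 1, zebra = 3), so the buggy pattern returns element 1 of the raw input, "apple", while the name-based lookup returns "zebra". Note that under the same assumption, sorting the input first may not be enough on its own: for {"a", "a", "b", "b", "b"} the position is still 1, and element 1 of the sorted input is "a".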