Adding new cpp functions #28
I have updated the benchmark.
devtools::load_all("~/Software/philentropy/")
#> ℹ Loading philentropy
set.seed(2021-07-14)
m = matrix(sample(1:10, size = 1000, replace = TRUE), ncol = 10)
f1 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      x = rbind(m[i, ], m[j, ])
      dist_sum = dist_sum + distance(x, method = "euclidean", mute.message = TRUE)
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}
f2 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      dist_sum = dist_sum + euclidean(m[i, ], m[j, ], FALSE)
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}
f3 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      dist_sum = dist_sum + single_distance(m[i, ], m[j, ], "euclidean", FALSE, "")
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}
f4 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dists = dist_one_many(m[i, ], m, "euclidean", FALSE, "")
    r[i] = mean(dists)
  }
  r
}
f5 = function(){
  dists = dist_many_many(m, m, "euclidean", FALSE, "")
  r = rowMeans(dists)
  r
}
bench::mark(f1(), f2(), f3(), f4(), f5())
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 5 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 f1()       297.39ms 306.42ms      3.26   39.27MB     8.16
#> 2 f2()        57.12ms  67.73ms     14.5    24.39MB    23.6
#> 3 f3()        74.83ms  78.22ms     12.2    24.39MB    19.2
#> 4 f4()         9.46ms   9.98ms     93.1     1.15MB    11.9
#> 5 f5()         6.22ms   6.81ms    138.    121.14KB     9.98
Created on 2021-07-14 by the reprex package (v2.0.0)
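As a sanity check on what f1() to f5() compute, the same per-row mean Euclidean distance can be sketched in base R alone. This is only an illustration: it assumes stats::dist() is an acceptable reference, it does not use philentropy, and mean_dist_loop() is a hypothetical helper that is not part of the benchmark above.

```r
set.seed(2021-07-14)
m = matrix(sample(1:10, size = 1000, replace = TRUE), ncol = 10)

# Reference: full pairwise Euclidean distance matrix (zero diagonal included),
# then the mean of each row. This matches the loops above, which also visit j == i.
ref = unname(rowMeans(as.matrix(dist(m, method = "euclidean"))))

# Hypothetical helper (not from philentropy): explicit double loop in base R.
mean_dist_loop = function(m){
  r = numeric(nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    for (j in seq_len(nrow(m))){
      dist_sum = dist_sum + sqrt(sum((m[i, ] - m[j, ])^2))
    }
    r[i] = dist_sum / nrow(m)
  }
  r
}

# TRUE when both approaches agree up to numerical tolerance
all.equal(ref, mean_dist_loop(m))
```

Comparing any of the benchmarked functions against a reference like this would confirm that the speed-up does not change the result.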
Hi @Nowosad,
I am very happy with the proposed changes and will now merge them into the master branch. Your help with improving
Many thanks,
Dear @HajkD,
I am a regular user of philentropy as I find it very useful.
The package is usually very fast, but in some of my use cases the current speed is not enough. Therefore, I added some new Rcpp functions:
- single_distance() - to calculate a distance between two vectors without the need to rbind them first
- dist_one_to_many() - to calculate distances between one vector and many vectors (in the form of a matrix) of other values
- dist_many_to_many() - to calculate distances between many vectors (a matrix) and many vectors (a matrix)
There are two main goals of the changes:
single_distance)
You can see an early draft of my proposed changes at #27. If you like the overall idea, then I will implement the rest of the metrics in a similar fashion.
I have also benchmarked the new functions on a simple problem that requires computing many distances.
- f1() uses the existing distance() function - it is slower than the rest, but it makes it easy to select the distance measure.
- f2() uses a hard-coded distance measure - it is fast, but hard to customize.
- f3() is slightly slower than f2(), but much faster than f1(), and it allows selecting the distance measure.
- f4() is much faster than the rest of the functions and it is easy to customize.
Let me know what you think about this idea.
All the best,
Jakub
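The three call shapes described above (one vector against one vector, one vector against a matrix, a matrix against a matrix) can be sketched in base R for the Euclidean case. The helper names below are hypothetical and chosen only for illustration; this is not the Rcpp implementation proposed in the issue, just a minimal picture of what each shape returns.

```r
# Hypothetical base-R helpers, named here only for illustration.

# Two vectors: returns a single number.
euclid_one_one = function(x, y) sqrt(sum((x - y)^2))

# One vector against every row of a matrix: returns a vector of length nrow(m).
euclid_one_many = function(x, m) sqrt(rowSums(sweep(m, 2, x)^2))

# Every row of a against every row of b: returns an nrow(a) x nrow(b) matrix.
euclid_many_many = function(a, b){
  out = matrix(0, nrow = nrow(a), ncol = nrow(b))
  for (i in seq_len(nrow(a))) out[i, ] = euclid_one_many(a[i, ], b)
  out
}

a = rbind(c(0, 0), c(3, 4))
b = rbind(c(0, 0), c(6, 8), c(3, 4))

euclid_one_one(a[1, ], a[2, ])   # 5
euclid_one_many(a[2, ], b)       # 5 5 0
euclid_many_many(a, b)           # 2 x 3 matrix of distances
```

Moving the loop over rows out of R and into compiled code is where the one-to-many and many-to-many variants would be expected to gain their speed.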