Adding new cpp functions #28

Closed
Nowosad opened this issue Jul 13, 2021 · 2 comments

Nowosad commented Jul 13, 2021

Dear @HajkD,

I am a regular user of philentropy and find it very useful.
The package is usually very fast, but in some of my use cases its current speed is not enough. Therefore, I added some new Rcpp functions:

  1. single_distance() - to calculate a distance between two vectors without the need to rbind them first
  2. dist_one_many() - to calculate distances between one vector and many vectors (given as the rows of a matrix)
  3. dist_many_many() - to calculate distances between many vectors (a matrix) and many other vectors (a second matrix)
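
To make the three call shapes concrete, here is a minimal base-R sketch for the Euclidean case only. The `euclid_*` helper names are hypothetical and chosen just for this illustration; this is not the Rcpp code from #27 and says nothing about its actual signatures.

```r
# Hypothetical base-R helpers illustrating the three call shapes.
# Assumption: Euclidean distance only; vectors are the rows of a matrix.

# one vector vs. one vector
euclid_one_one <- function(p, q) {
  sqrt(sum((p - q)^2))
}

# one vector vs. every row of a matrix: returns one distance per row of Q
euclid_one_many <- function(p, Q) {
  # sweep() subtracts p[j] from column j of Q; rowSums() reduces each row
  sqrt(rowSums(sweep(Q, 2, p)^2))
}

# every row of P vs. every row of Q: returns an nrow(P) x nrow(Q) matrix
euclid_many_many <- function(P, Q) {
  out <- vapply(
    seq_len(nrow(P)),
    function(i) euclid_one_many(P[i, ], Q),
    numeric(nrow(Q))
  )
  # vapply() stacks results column-wise, so fill the result row by row
  matrix(out, nrow = nrow(P), ncol = nrow(Q), byrow = TRUE)
}

P <- rbind(c(0, 0), c(3, 4))
euclid_one_one(P[1, ], P[2, ])   # 5
euclid_one_many(P[1, ], P)       # 0 5
euclid_many_many(P, P)           # 2 x 2 matrix, zeros on the diagonal
```

The point of the one-to-many and many-to-many shapes is that the loop over rows moves out of the calling code and into a single call, so no pair of vectors has to be combined with rbind() first.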

There are two main goals of the changes:

  1. To allow easier (more customizable) selection of distances (single_distance())
  2. To speed up one-to-many and many-to-many calculations

You can see an early draft of my proposed changes at #27. If you like the overall idea, then I will implement the rest of the metrics in a similar fashion.

I have also benchmarked the new functions on a simple problem of comparing many distances.

f1() uses the existing distance() function: it is slower than the rest, but it makes the distance measure easy to select. f2() uses a hard-coded distance measure: it is fast, but hard to customize. f3() is slightly slower than f2(), but much faster than f1(), and it also lets you select the distance measure. f4() is much faster than the rest of the functions and is easy to customize.

Let me know what you think about this idea.

All the best,
Jakub


Nowosad commented Jul 14, 2021

I have updated the benchmark. f5() uses the dist_many_many() function, which could be related to #22.

devtools::load_all("~/Software/philentropy/")
#> ℹ Loading philentropy
set.seed(2021-07-14)
m = matrix(sample(1:10, size = 1000, replace = TRUE), ncol = 10)

f1 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      x = rbind(m[i, ], m[j, ])
      dist_sum = dist_sum + distance(x, method = "euclidean", mute.message = TRUE)
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}

f2 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      dist_sum = dist_sum + euclidean(m[i, ], m[j, ], FALSE)
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}

f3 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dist_sum = 0
    dist_count = 0
    for (j in seq_len(nrow(m))){
      dist_sum = dist_sum + single_distance(m[i, ], m[j, ], "euclidean", FALSE, "")
      dist_count = dist_count + 1
    }
    r[i] = dist_sum / dist_count
  }
  r
}

f4 = function(){
  r = vector(length = nrow(m))
  for (i in seq_len(nrow(m))){
    dists = dist_one_many(m[i, ], m, "euclidean", FALSE, "")
    r[i] = mean(dists)
  }
  r
}

f5 = function(){
  dists = dist_many_many(m, m, "euclidean", FALSE, "")
  r = rowMeans(dists)
  r
}

bench::mark(f1(), f2(), f3(), f4(), f5())
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 5 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 f1()       297.39ms 306.42ms      3.26   39.27MB     8.16
#> 2 f2()        57.12ms  67.73ms     14.5    24.39MB    23.6 
#> 3 f3()        74.83ms  78.22ms     12.2    24.39MB    19.2 
#> 4 f4()         9.46ms   9.98ms     93.1     1.15MB    11.9 
#> 5 f5()         6.22ms   6.81ms    138.    121.14KB     9.98

Created on 2021-07-14 by the reprex package (v2.0.0)
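
For anyone who wants to check that the variants agree, the quantity that f1() to f5() compute (the mean Euclidean distance from each row of m to all rows, including itself) can be reproduced in base R. This is only a sketch of such a cross-check, not part of the benchmark above; the seed is arbitrary and `ref` and `loop` are names made up for this example.

```r
set.seed(1)  # arbitrary seed for this sketch, not the one used in the benchmark
m <- matrix(sample(1:10, size = 1000, replace = TRUE), ncol = 10)

# stats::dist() returns all pairwise Euclidean distances between rows;
# as.matrix() expands it to a full square matrix with a zero diagonal
ref <- unname(rowMeans(as.matrix(dist(m))))

# the same quantity with an explicit double loop over rows
loop <- vapply(seq_len(nrow(m)), function(i) {
  mean(vapply(
    seq_len(nrow(m)),
    function(j) sqrt(sum((m[i, ] - m[j, ])^2)),
    numeric(1)
  ))
}, numeric(1))

# both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(ref, loop)))
```

Comparing the result of each benchmarked function against such a reference with all.equal() is one way to confirm that the faster variants return the same values as f1().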


HajkD commented Aug 18, 2021

Hi @Nowosad

I am very happy with the proposed changes and will now merge them into the master branch. Your help with improving philentropy is greatly appreciated!

Many thanks,
Hajk
