descr
JonasRieger committed Aug 26, 2020
1 parent 889f93f commit e0fb1fb
Showing 2 changed files with 39 additions and 10 deletions.
28 changes: 21 additions & 7 deletions R/rboTopics.R
@@ -6,14 +6,28 @@
#'
#' @details
#' The RBO Similarity for two topics \eqn{\bm z_{i}} and \eqn{\bm z_{j}}
-#' is calculated by ...
+#' is calculated by
+#' \deqn{RBO(\bm z_{i}, \bm z_{j} \mid k, p) = 2p^k\frac{\left|Z_{i}^{(k)} \cap Z_{j}^{(k)}\right|}{\left|Z_{i}^{(k)}\right| + \left|Z_{j}^{(k)}\right|} + \frac{1-p}{p} \sum_{d=1}^k 2 p^d\frac{\left|Z_{i}^{(d)} \cap Z_{j}^{(d)}\right|}{\left|Z_{i}^{(d)}\right| + \left|Z_{j}^{(d)}\right|}}
+#' where \eqn{Z_{i}^{(d)}} is the vocabulary set of topic \eqn{\bm z_{i}} down to
+#' rank \eqn{d}. Ties in ranks are resolved by taking the minimum.
+#'
+#' The value \code{wordsconsidered} describes the number of words per topic
+#' ranked at rank \eqn{k} or above.
+#'
+#' @references Webber, William, Alistair Moffat and Justin Zobel (2010).
+#' "A similarity measure for indefinite rankings".
+#' In: \emph{ACM Transactions on Information Systems} 28(4), p.20:1--20:38,
+#' DOI 10.1145/1852102.1852106,
+#' URL \url{http://doi.acm.org/10.1145/1852102.1852106}
#'
#' @family TopicSimilarity functions
#'
#' @param topics [\code{named matrix}]\cr
#' The counts of vocabularies/words (row wise) in topics (column wise).
-#' @param k tba
-#' @param p tba
+#' @param k [\code{integer(1)}]\cr
+#' Maximum depth for evaluation. Words down to this rank are considered for the calculation of similarities.
+#' @param p [0,1]\cr
+#' Weighting parameter. Lower values emphasize top-ranked words.
#' @param progress [\code{logical(1)}]\cr
#' Should a nice progress bar be shown? Turning it off could lead to significantly
#' faster calculation. Default is \code{TRUE}.
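
The formula in the documentation above can be sketched for a single pair of topics as follows. This is a minimal illustration, not the package's implementation: `rbo_pair` and the example vectors are hypothetical names, and computing ranks with `rank(-x, ties.method = "min")` is an assumption based on the statement that ties are resolved by taking the minimum.

```r
# Hypothetical helper (not part of the package): RBO similarity of two topics
# given as count vectors over the same vocabulary, in the same word order.
rbo_pair = function(x, y, k, p){
  # Rank 1 = most frequent word; tied words share the minimum rank (assumption).
  rx = rank(-x, ties.method = "min")
  ry = rank(-y, ties.method = "min")
  # Weighted agreement 2 * p^d * |Zx n Zy| / (|Zx| + |Zy|) for each depth d = 1..k.
  w = vapply(seq_len(k), function(d){
    zx = rx <= d  # words of topic x down to rank d
    zy = ry <= d  # words of topic y down to rank d
    2 * sum(zx & zy) / (sum(zx) + sum(zy)) * p^d
  }, numeric(1))
  # Term at depth k plus the (1-p)/p-weighted sum over all depths.
  w[k] + sum(w) * (1 - p) / p
}

a = c(apple = 10, bread = 7, cheese = 3, date = 0)
b = c(apple = 0, bread = 0, cheese = 4, date = 9)
rbo_pair(a, a, k = 3, p = 0.9)  # identical topics give 1 (up to floating-point error)
rbo_pair(a, b, k = 2, p = 0.9)  # disjoint top-2 word sets give 0
```

For identical topics every agreement ratio equals 1, so the geometric sum collapses to `p^k + (1 - p^k) = 1`, which is a quick sanity check of the weighting.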
@@ -72,7 +86,7 @@ rboTopics.parallel = function(topics, k, p, pm.backend, ncpus){
tmp1 = ranks[,i] < d+1
tmp2 = ranks[,(i+1):N] < d+1
2 * colSums(tmp1&tmp2) / (sum(tmp1)+colSums(tmp2)) * p^d
-# colSums(tmp1&tmp2) / colSums(tmp1|tmp2) * p^d
+# colSums(tmp1&tmp2) / colSums(tmp1|tmp2) * p^d ## would be Jaccard instead of overlap
}))
tmp[k,] + colSums(tmp) * (1-p) / p
})
@@ -95,7 +109,7 @@ rboTopics.parallel = function(topics, k, p, pm.backend, ncpus){
tmp1 = ranks[,N-1] < d+1
tmp2 = ranks[,N] < d+1
2 * sum(tmp1&tmp2) / (sum(tmp1)+sum(tmp2)) * p^d
-# sum(tmp1&tmp2) / sum(tmp1|tmp2) * p^d
+# sum(tmp1&tmp2) / sum(tmp1|tmp2) * p^d ## would be Jaccard instead of overlap
})
sims[lower.tri(sims)] = c(unlist(rearrangedlist), tmp[k] + sum(tmp) * (1-p) / p)

@@ -121,7 +135,7 @@ rboTopics.serial = function(topics, k, p, progress = TRUE){
tmp1 = ranks[,i] < d+1
tmp2 = ranks[,(i+1):N] < d+1
2 * colSums(tmp1&tmp2) / (sum(tmp1)+colSums(tmp2)) * p^d
-# colSums(tmp1&tmp2) / colSums(tmp1|tmp2) * p^d
+# colSums(tmp1&tmp2) / colSums(tmp1|tmp2) * p^d ## would be Jaccard instead of overlap
}))
sims[(i+1):N,i] = tmp[k,] + colSums(tmp) * (1-p) / p
pb$tick()
@@ -130,7 +144,7 @@ rboTopics.serial = function(topics, k, p, progress = TRUE){
tmp1 = ranks[,N-1] < d+1
tmp2 = ranks[,N] < d+1
2 * sum(tmp1&tmp2) / (sum(tmp1)+sum(tmp2)) * p^d
-# sum(tmp1&tmp2) / sum(tmp1|tmp2) * p^d
+# sum(tmp1&tmp2) / sum(tmp1|tmp2) * p^d ## would be Jaccard instead of overlap
})
sims[N, N-1] = tmp[k] + sum(tmp) * (1-p) / p
pb$tick()
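
The commented-out alternative in the hunks above differs from the active line only in the denominator. The following sketch contrasts the two agreement measures at a single depth `d`; the `ranks` matrix is a made-up example, and the variable names `overlap` and `jaccard` are chosen here for illustration only.

```r
# Hypothetical ranks of four words (rows) in two topics (columns).
ranks = cbind(t1 = c(1, 2, 3, 4), t2 = c(2, 1, 4, 3))
d = 3

tmp1 = ranks[, 1] < d + 1  # words of topic 1 down to rank d
tmp2 = ranks[, 2] < d + 1  # words of topic 2 down to rank d

# Measure used in the diff: 2 * |A n B| / (|A| + |B|)
overlap = 2 * sum(tmp1 & tmp2) / (sum(tmp1) + sum(tmp2))
# Commented-out alternative: |A n B| / |A u B|
jaccard = sum(tmp1 & tmp2) / sum(tmp1 | tmp2)

overlap  # 2 * 2 / (3 + 3) = 0.6666667
jaccard  # 2 / 4 = 0.5
```

When there are no ties, both sets contain exactly `d` words and the first measure reduces to `|A n B| / d`; the sum in the denominator only matters when ties make the two sets differ in size.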
21 changes: 18 additions & 3 deletions man/rboTopics.Rd

Some generated files are not rendered by default.
