Size of Corpus != size of summed up speeches #86
Comments
This is certainly an issue! Thanks for raising it. A new polmineR version on the development branch (v0.7.11.9024) addresses this and removes the bug.

    devtools::install_github("PolMine/polmineR", ref = "dev")

To explain: Two or three updates ago, I substantially reworked the [...] In the "SL" corpus, this concerns the speakers "Beck", "LawaIl" (typo!) and "Weisweiler", see:

    library(polmineR)
    use("PopParl")
    corpus("SL") %>% subset(speaker == "Beck") %>% slot("cpos")
    corpus("SL") %>% subset(speaker == "LawaIl") %>% slot("cpos")
    corpus("SL") %>% subset(speaker == "Weisweiler") %>% slot("cpos")

Not handling this situation adequately is definitely a bug. Using the new polmineR version, the bug is gone. See the following example, which is a somewhat simplified version of the instructive example you provided:

    library(polmineR)
    use("PopParl")
    corpus("SL") %>% size()
    sl_speeches <- corpus("SL") %>% as.speeches(s_attribute_name = "speaker")
    merge(sl_speeches) %>% size()
    summary(sl_speeches)[["size"]] %>% sum()
This indeed seems to solve the issue. Thank you for the in-depth explanation, which makes a lot of sense.
I am not sure whether this is an actual issue, but I observed the following odd behavior in some corpora.
I get the size of a corpus as follows:

    size_of_corpus <- size("SL")

If I split the corpus into speeches and merge the resulting partition bundle again, the size is the same:
```r
sl_speeches <- as.speeches("SL", s_attribute_name = "speaker")
size_of_remerged_speeches <- merge(sl_speeches) %>% size()
size_of_corpus == size_of_remerged_speeches
```
The odd part is: if I iterate over the sl_speeches partition bundle and sum up the size of each speech, the resulting sum of sizes is slightly smaller than the actual size of the merged partition bundle.

    size_of_summed_up_speeches <- lapply(sl_speeches@objects, function(x) x@size) %>% Reduce("+", .)
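
For readers without the corpus at hand, the expectation behind this comparison can be sketched in base R. This is only an illustration with made-up data, not polmineR code: it assumes that each speech is described by a two-column matrix of corpus positions (region start and end) and that its size is the number of tokens those regions cover. The names `speeches` and `region_size` are hypothetical.

```r
# Toy stand-in for a partition bundle: each "speech" is a two-column matrix
# of corpus positions (column 1: region start, column 2: region end).
# The values are invented for illustration and not taken from the SL corpus.
speeches <- list(
  a = matrix(c(0L, 9L, 20L, 29L), ncol = 2, byrow = TRUE),
  b = matrix(c(10L, 19L), ncol = 2, byrow = TRUE),
  c = matrix(c(30L, 34L), ncol = 2, byrow = TRUE)
)

# Assumed definition of size: number of tokens covered by the regions.
region_size <- function(cpos) sum(cpos[, 2] - cpos[, 1] + 1L)

# Variant 1: sum up the sizes of the individual speeches.
size_summed <- sum(vapply(speeches, region_size, integer(1)))

# Variant 2: merge all regions first, then compute the size once.
merged <- do.call(rbind, speeches)
size_merged <- region_size(merged)

# With non-overlapping regions, both variants should give the same number.
stopifnot(size_summed == size_merged)
```

If the two numbers differ on real data, as they do in my case, the per-speech sizes and the region matrices apparently do not describe the same set of tokens.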
The entire sample script:
version: polmineR 0.7.11.9023