Commit

fix file.path problems in tests
lkoppers committed Sep 13, 2018
1 parent 0b781b7 commit 741ba10
Showing 3 changed files with 8 additions and 8 deletions.
6 changes: 3 additions & 3 deletions tests/testthat/test_LDAgen.R
@@ -11,9 +11,9 @@ counttest <- LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iteratio
csvTest2 <- read.csv(file.path(tempdir(),"lda-result-k3i20b70s24602alpha0.33eta0.33.csv"))
expect_equal(csvTest, csvTest2)

-expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, folder=tempdir(), num.words = 10L, LDA = TRUE))
-expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, folder=tempdir(), num.words = 10L, LDA = FALSE))
-expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, folder=tempdir(), num.words = 10L, LDA = TRUE))
+expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, num.words = 10L, LDA = TRUE))
+expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, num.words = 10L, LDA = FALSE))
+expect_equal(lda1, LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=24601, num.words = 10L, LDA = TRUE))
})
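
The change above removes the explicit `folder=tempdir()` argument from the `LDAgen` calls, presumably so the output files land where the test's `file.path(tempdir(), ...)` read (see the CSV check in the hunk above) expects to find them. A minimal base-R sketch of the path construction involved; the file name is copied from the hunk above, and the assumption that `LDAgen`'s default output location sits under `tempdir()` is ours, not stated in the diff:

```r
## file.path() joins path components with the platform separator,
## so the test can locate the result file written under tempdir().
out_file <- file.path(tempdir(), "lda-result-k3i20b70s24602alpha0.33eta0.33.csv")

## file.path() inserts exactly one separator between components:
file.path("some", "dir", "result.csv")  # "some/dir/result.csv" on Unix-alikes
```

Passing a bare directory such as `tempdir()` as a file-name prefix instead of building the full path with `file.path` is the kind of mismatch the commit message ("fix file.path problems in tests") points at.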


4 changes: 2 additions & 2 deletions tests/testthat/test_mergeLDA.R
@@ -15,12 +15,12 @@ text <- list(A= "Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiu
text2 <- cleanTexts(text = text)
wordlist <- makeWordlist(text2)
LDAdoc <- LDAprep(text2, wordlist$words)
-lda1 <- LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=124601, folder=tempdir(), num.words = 10L, LDA = TRUE)
+lda1 <- LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=124601, num.words = 10L, LDA = TRUE)

text2 <- cleanTexts(text = text[c("B", "C", "F", "H")])
wordlist <- makeWordlist(text2)
LDAdoc <- LDAprep(text2, wordlist$words)
-lda2 <- LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=124601, folder=tempdir(), num.words = 10L, LDA = TRUE)
+lda2 <- LDAgen(documents=LDAdoc, K = 3L, vocab=wordlist$words, num.iterations = 20L, burnin = 70L, seed=124601, num.words = 10L, LDA = TRUE)


## mL1 <- mergeLDA(x=list(lda1=lda1, lda2=lda2))
6 changes: 3 additions & 3 deletions vignettes/Vignette.Rmd
@@ -75,7 +75,7 @@ The current version of the package can be installed with the \texttt{devtools} p
devtools::install_github("DoCMA-TU/tosca")
library(tosca)
```
-The actual version on CRAN can be installed with \texttt{instal.packages}.
+The actual version on CRAN can be installed with \texttt{install.packages}.
```{r, eval = FALSE}
install.packages("tosca")
library(tosca)
@@ -372,7 +372,7 @@ load(file.path(tempdir(),"lda-result-k10i200b70s123alpha0.1eta0.1.RData"))
For validation of the LDA result and further analysis, the result is loaded back to the workspace.

## Validation of LDA Results - \texttt{intruderWords}, \texttt{intruderTopics}
-For validation of LDA results there are two functions in the package. \texttt{intruderWords} and \texttt{intruderTopics} are extended version of the method Chang et al. present in their paper "Reading Tea Leaves: How Humans Interpret Topic Models". These functions expect user input, the user works like a document labeler. The LDA result is committed by setting \texttt{beta = result\$topics}. From the function \texttt{intruderWords} the labeler gets a set of words. The number of words can be set by \texttt{numOutwords} (default: 5). This set represents one topic. It includes a number of intruders (default: \texttt{numIntruder = 1}), which can also be zero. In general, if the user identifies the intruder(s) correctly this is an identifier for a good topic allocation. You can set options \texttt{numTopwords} (default: 30) to control how many top words of each topic are considered for this validation. In addition it is possible to enable or disable the possibility for the user to mark nonsense topics. By default this option is enabled (\texttt{noTopic = TRUE}). The true intruder can be printed to the console after each step with \texttt{printSolution = TRUE} (default: \texttt{FALSE}).
+For validation of LDA results there are two functions in the package. \texttt{intruderWords} and \texttt{intruderTopics} are extended version of the method Chang et al. (2009) present in their paper "Reading Tea Leaves: How Humans Interpret Topic Models". These functions expect user input, the user works like a document labeler. The LDA result is committed by setting \texttt{beta = result\$topics}. From the function \texttt{intruderWords} the labeler gets a set of words. The number of words can be set by \texttt{numOutwords} (default: 5). This set represents one topic. It includes a number of intruders (default: \texttt{numIntruder = 1}), which can also be zero. In general, if the user identifies the intruder(s) correctly this is an identifier for a good topic allocation. You can set options \texttt{numTopwords} (default: 30) to control how many top words of each topic are considered for this validation. In addition it is possible to enable or disable the possibility for the user to mark nonsense topics. By default this option is enabled (\texttt{noTopic = TRUE}). The true intruder can be printed to the console after each step with \texttt{printSolution = TRUE} (default: \texttt{FALSE}).

The LDA result of the example corpus is checked by \texttt{intruderWords} with the number of intruders being either 0 or 1.
```{r, eval = FALSE}
@@ -570,7 +570,7 @@ LDAresult <- LDAgen(documents = pagesLDA, K = 10L, vocab = words5)
After generating the lda model further analysis depends on the specific aims of the project.

# Conclusion
-Uor package tosca is a addition to the existing textmining packages on CRAN. It contains functions for a typical pipeline used for content analysis and uses the implementation of standard preprocessing of existing packages. Additional tosca provides functionality for visual exploration of corpora and topics resulting from the latent Dirichlet allocation. tosca focusses on analysis over time, so it needs texts with a date as meta data. The actual version of the package offers an implementation of Chang's intruder topics and intruder words. For future versions a framework for effective sampling in (sub-) corpora is under preparation. There are plans for a better connection to the frameworks of the tm and the quanteda package.
+Our package tosca is an addition to the existing textmining packages on CRAN. It contains functions for a typical pipeline used for content analysis and uses the implementation of standard preprocessing of existing packages. Additionaly tosca provides functionality for visual exploration of corpora and topics resulting from the latent Dirichlet allocation. tosca focusses on analysis over time, so it needs texts with a date as meta data. The actual version of the package offers an implementation of intruder topics and intruder words (Chang et al., 2009). For future versions a framework for effective sampling in (sub-) corpora is under preparation. There are plans for a better connection to the frameworks of the tm and the quanteda package.

```{r, include = FALSE, eval = FALSE}
library(knitr)