Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for latent semantic analysis #46

Open
BobMuenchen opened this issue Mar 7, 2017 · 0 comments
Open

Adding support for latent semantic analysis #46

BobMuenchen opened this issue Mar 7, 2017 · 0 comments
Labels
feature a feature request or enhancement

Comments

@BobMuenchen
Copy link

I think it would be fairly easy to add support for the lsa package to tidytext and broom. See example below.

# Put some docs in a vector
library("dplyr")
doc1 <- c("pets dog cat ferret")
doc2 <- c("sandwiches turkey ham")
doc3 <- c("cat ferret cat bird")
doc4 <- c("turkey beef sandwiches")
myvector <- c(doc1,doc2,doc3,doc4)
mydf <- data_frame(id = 1:4, text = myvector)

# Create a corpus
library("quanteda")
mycorpus <- corpus(mydf, text_field = "text")
mytokens <- tokens(mycorpus)
mydfm <- dfm(mytokens)

# Perform LSA
mytdm <- convert(mydfm, to = "lsa")
mytdm_weighted = lw_logtf(mytdm) * gw_idf(mytdm)
myLSAspace = lsa(mytdm_weighted, dims=2)

# Here's how broom::augment could add 
# factor scores back to the original data frame
factor_scores <- as_data_frame(myLSAspace$dk)
(augmented <- bind_cols(mydf, factor_scores))

# Here's how tidytext:tidy could tidy the factor loadings
library("tidyverse")
# as.data.frame is used to maintain row names until
# rownames_to_column can get them
loadings_tidy <- as.data.frame(myLSAspace$tk) %>%
  rownames_to_column() %>%
  rename(term = rowname) %>%
  gather(factor, loading, # The new variables.
         starts_with("V"), # These go into "loading".
         -term) %>%  # term is not "gathered".
  arrange(factor, desc(loading)) %>% # Sort
  select(factor, term, loading) # Change var order to enhance readablity.

print(loadings_tidy)
@dgrtwo dgrtwo self-assigned this May 2, 2017
@dgrtwo dgrtwo added this to the v0.1.4 milestone May 2, 2017
@juliasilge juliasilge modified the milestones: v0.1.4, v0.1.5 Sep 29, 2017
@juliasilge juliasilge removed this from the v0.1.5 milestone Apr 3, 2020
@juliasilge juliasilge added the feature a feature request or enhancement label Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants