Extending functions like checkResiduals() and make.heldout() to take the same inputs as stm() #134

Closed
juliasilge opened this Issue Jan 20, 2018 · 8 comments

juliasilge commented Jan 20, 2018

Hello! Thank you so much for your work on this great package; I am so pleased to have such a great package for topic modeling in R.

As I have been developing support for stm in tidytext, one thing I have been having trouble with is the way inputs are passed to the functions used for evaluating models. The main stm() function accepts a sparse matrix or a quanteda dfm as its documents argument, but functions like make.heldout() and checkResiduals() do not. This is a barrier to evaluating models that were trained on either a sparse matrix or a quanteda dfm. What do you think about changing the documents parameter in these evaluation functions to accept the same data structure types as the main training function stm()?
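For context on the data structures involved, here is a minimal base R sketch of the list-of-matrices layout that stm's documents argument traditionally uses: one two-row integer matrix per document, with 1-based vocabulary indices in the first row and counts in the second. The helper name `dtm_to_stm_documents` and the toy matrix are assumptions made up for illustration. This is not how stm or quanteda perform the conversion internally.

```r
# Hypothetical helper (not part of stm): turn a dense document-term count
# matrix into the list-of-matrices layout plus a vocabulary vector.
dtm_to_stm_documents <- function(dtm) {
  stopifnot(is.matrix(dtm), !is.null(colnames(dtm)))
  documents <- lapply(seq_len(nrow(dtm)), function(i) {
    counts <- dtm[i, ]
    present <- which(counts > 0)
    # Row 1: 1-based vocabulary indices; row 2: counts for those terms.
    rbind(as.integer(present), as.integer(counts[present]))
  })
  names(documents) <- rownames(dtm)
  list(documents = documents, vocab = colnames(dtm))
}

# Toy input: two documents over a three-term vocabulary (made-up data).
dtm <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("model", "topic", "word"))
)

converted <- dtm_to_stm_documents(dtm)
str(converted$documents)
```

A function that only understands this list layout cannot consume a sparse matrix or dfm directly, which is the mismatch described above.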

Owner

bstewart commented Jan 20, 2018

Julia, thanks so much for the kind words! It was really lovely to see your tweet about stm. The tidytext package is awesome and I love the idea of more cross-platform support. Let me look into this and update it. I think it is a great idea.

bstewart self-assigned this Jan 20, 2018

chris-billingham commented Jan 20, 2018

Hi there, I just want to echo the call for interoperability with other packages. Much of the work I do bounces between tm, quanteda, and tidytext implementations, so anything that reduces the friction would be appreciated. I'm also putting together a vignette that looks at stm three ways (if you will), which I'd be happy to share closer to completion.

bstewart commented Jan 20, 2018

Thanks Chris! Huge thanks to @patperry and @kbenoit who have done the interoperability with corpus and quanteda respectively.

I'd definitely be interested in seeing your vignette whenever it is ready.

bstewart pushed a commit that referenced this issue Jan 25, 2018

Brandon Stewart

bstewart commented Jan 25, 2018

@juliasilge I've started working on this. A pair of functions where this really isn't straightforward is alignCorpus and fitNewDocuments. The issue is that both need to preserve unused vocabulary indices, which is normally not allowed. You didn't mention either of these, but unfortunately I don't see a straightforward way of addressing this without rewriting asSTMCorpus, which does most of the conversion work. Is that going to get in your way? If so, any suggestions?

Everything else should be set in the development branch, though, and I'll get that merged into master after a bit more testing.
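To illustrate the general problem with unused vocabulary indices, here is a small base R sketch. The vocabulary and counts are made up, and this does not reflect how alignCorpus, fitNewDocuments, or asSTMCorpus are actually implemented. It only shows why re-indexing a new document against the terms it happens to contain breaks alignment with an existing vocabulary.

```r
# Vocabulary of an already fitted model (made-up example).
vocab <- c("model", "topic", "word")

# A new document that never uses "topic".
new_counts <- c(model = 1, topic = 0, word = 4)
used <- names(new_counts)[new_counts > 0]

# Compacting to only the terms that occur renumbers "word" from 3 to 2.
compact_index <- match(used, used)

# Indexing against the original vocabulary keeps the gap at position 2.
aligned_index <- match(used, vocab)

# Looking the compact indices up in the model's vocabulary returns the
# wrong term for the second entry.
vocab[compact_index]
vocab[aligned_index]
```

With the compact indices, the second term resolves to "topic" instead of "word", so counts would be attributed to the wrong vocabulary entry unless the unused index is preserved.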

juliasilge commented Jan 25, 2018

For the current work I'm doing, those functions are not blockers.

Eventually, being able to predict probabilities for a new document would be nice across different kinds of inputs, but that is obviously more complicated, just like you said. I am very satisfied to wait on that, and/or try to help out with what might make sense there.

I am so excited to see these changes!

bstewart commented Jan 25, 2018

Great! I'll close this issue out when I've uploaded the version to CRAN.

I'm going to start a new issue to track the longer-run goal of figuring out how to do this for those last two functions.

bstewart commented Jan 28, 2018

This version is now on CRAN.

bstewart closed this Jan 28, 2018

juliasilge commented Jan 29, 2018

Thank you so much for working on this so quickly! I am looking forward to using these functions this week.
