[MRG] Dynamic Topic Models in python. #739

bhargavvader · 2016-06-09T13:42:34Z

A Python translation of Dynamic Topic Models, as described by Blei et al. in this paper; the original implementation, written in C, is here.

Dynamic Topic Models are used to model topics in time-stamped documents such that the topics evolve from one time-slice to the next. This is useful for finding out how key words in topics change over time, and for finding documents that are far apart in time but contextually similar.
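As a rough illustration of what a "time-slice" means here, the sketch below partitions a chronologically ordered corpus into consecutive slices, given the number of documents in each slice. The helper name `split_by_time_slice`, the count-per-slice convention, and the toy document names are assumptions made for this example only; they are not taken from the code in this PR.

```python
from itertools import islice


def split_by_time_slice(docs, time_slice):
    """Split chronologically ordered docs into consecutive time slices.

    time_slice[i] is the number of documents in slice i (assumed convention).
    """
    if sum(time_slice) != len(docs):
        raise ValueError("time_slice counts must add up to the corpus size")
    it = iter(docs)
    # Each islice call consumes the next n documents from the shared iterator.
    return [list(islice(it, n)) for n in time_slice]


# Hypothetical toy corpus, already sorted by time.
docs = ["doc_2014_a", "doc_2014_b", "doc_2015_a", "doc_2016_a", "doc_2016_b"]
slices = split_by_time_slice(docs, [2, 1, 2])
```

A dynamic topic model would then fit one set of topic parameters per slice, with each slice's topics tied to those of the previous slice.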

This was my Google Summer of Code 2016 project.

Future Works:

- Make the corpus streaming, similar to LdaModel.
- Include Document Influence Model (DIM) mode. Most of the infrastructure for this is in place.
- See if LdaPost can be replaced by LdaModel completely without breaking anything.
  - In particular, a lot of DIM depends on LdaPost being in place.
- Heavy lifting goes on in the sslm class; efforts can be made to cythonise mathematical methods.
  - In particular, update_obs and the optimization take a lot of time.
- Try to make it distributed, especially around the E and M steps.
- Get rid of any C/C++ coding styles left behind.

tmylk · 2016-06-21T23:16:53Z

Could you please update it to be more up to date with your private fork?

bhargavvader · 2016-06-30T13:07:08Z

@tmylk all the methods are more or less done. I still have to see scipy implementations of the gsl methods and replace them, as well as add tests for the entire module.

bhargavvader · 2016-07-04T07:22:54Z

Note: as of now, the classes and methods are not well arranged, and there are a few mock classes (which will be removed) to help me with testing. Once all the testing is done and a crude DTM is fit, I will rearrange the structure of the DTM code.

tmylk · 2016-08-03T13:53:17Z

gensim/models/ldaseqmodel.py

+import sys
+
+# this is a mock LDA class to help with testing until this is figured out
+class mockLDA(utils.SaveLoad):


Should the mock class be in the test folder rather than in the prod file?

bhargavvader · 2016-08-04T18:49:28Z

Cleaned up code, added docstrings so that it is more reviewable now. Made changes to many core classes so most of the tests would be failing; will fix them and further clean tomorrow.

tmylk · 2016-08-15T15:16:26Z

gensim/models/ldaseqmodel.py

+        totals = numpy.zeros(counts.shape[1])
+
+        for w in range(0, W):
+            self.variance[w], self.fwd_variance[w] = self.compute_post_variance(w, self.chain_variance)


`zip(*...)` with a list comprehension is easier to read.
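A minimal sketch of what this suggestion could look like, comparing the explicit loop with a `zip(*...)` version. The `compute_post_variance` function below is a hypothetical stand-in that only returns two arrays of the right shape; it is not the method from this PR, and the values of `W`, `T`, and `chain_variance` are made up for the example.

```python
import numpy


def compute_post_variance(word, chain_variance, T=3):
    # Hypothetical stand-in: returns (variance, fwd_variance) for one word,
    # each of length T + 1. The real method computes these differently.
    fwd_variance = numpy.full(T + 1, chain_variance * (word + 1))
    variance = fwd_variance / 2.0
    return variance, fwd_variance


W, T, chain_variance = 4, 3, 0.005

# Loop version, as in the reviewed diff.
variance_loop = numpy.zeros((W, T + 1))
fwd_variance_loop = numpy.zeros((W, T + 1))
for w in range(W):
    variance_loop[w], fwd_variance_loop[w] = compute_post_variance(w, chain_variance)

# zip(*...) version: collect the per-word pairs, then transpose them into
# one sequence of variances and one sequence of forward variances.
pairs = [compute_post_variance(w, chain_variance) for w in range(W)]
variance, fwd_variance = (numpy.array(x) for x in zip(*pairs))
```

Both versions produce arrays of shape `(W, T + 1)`; the second avoids indexing into preallocated arrays.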

tmylk · 2016-08-16T10:54:32Z

gensim/models/ldaseqmodel.py

+
+    1) Include DIM mode. Most of the infrastructure for this is in place.
+    2) See if LdaPost can be replaces by LdaModel completely without breakign anything.
+    3) Heavy lifting going on in the sslm class - efforts can be made to cythonise mathematical methods.


How much time is spent there according to the line profiler? Is it really a perf bottleneck?

`fit_sslm` takes up the majority of the time, with `update_obs` being particularly slow.
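One way such a claim could be checked is sketched below, using the standard library's `cProfile` (a function-level profiler, swapped in here for the line profiler mentioned above). The `fit_sslm` and `update_obs` functions are hypothetical stand-ins that only mimic the call structure; they are not the implementations from this PR.

```python
import cProfile
import io
import pstats


def update_obs(n):
    # Hypothetical stand-in for an expensive inner routine.
    return sum(i * i for i in range(n))


def fit_sslm(iterations=5):
    # Hypothetical stand-in for the outer fitting loop.
    return [update_obs(20000) for _ in range(iterations)]


profiler = cProfile.Profile()
profiler.enable()
fit_sslm()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The cumulative-time column shows how much of the time spent in the outer function is attributable to the inner one.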

bhargavvader · 2016-08-16T19:02:16Z

@tmylk , @piskvorky , I've added a tutorial notebook and tests to accompany the code.
Could you do a review so I can start working on changes?

tmylk · 2016-08-17T04:48:49Z

@piskvorky I think this is ready for your review. (Though I still have some comments about the code and notebook to discuss with Bhargav today.)

piskvorky · 2016-08-18T04:59:18Z

OK, I'll have a look, probably this weekend. Thanks @bhargavvader !

One thing I see immediately is that this PR needs a better description. When people come here from the CHANGELOG/web/google/blog etc, they shouldn't be greeted with "This is a very, very rough draft". Add some high-level motivational/solution explanation section to the description (possibly taken from the notebook, if it's there).

bhargavvader · 2016-08-18T11:16:30Z

@piskvorky , haha, yes. I've changed the title and description with a brief explanation and TODO for further works.

piskvorky · 2016-08-19T00:11:12Z

@tmylk I see a lot of TODOs in the description, and a request for review. Was this really meant to be merged?

bhargavvader · 2016-08-19T04:55:51Z

@piskvorky, the TODOs in the description are really future work to improve the code: things like improving performance and making it distributed. I've renamed the section from TODO to Future Works to reflect this.
I am opening another PR (#831) to work on the tutorial notebook, and on incorporating suggestions.

bhargavvader added 5 commits June 9, 2016 19:09

DTM sample classes, helper methods

fe9367c

Formatting

2a23639

sslm_init

2574a4b

Finished init_lda_from_ss

3255551

FIxed failing test

f258c12

bhargavvader added 7 commits June 22, 2016 10:16

Added new classes and methods

3b2643f

Fixed failing test

0cd25e3

Merge branch 'develop' of https://github.com/piskvorky/gensim into DTM

80b687f

Update with new methods, tests

18ae9ad

Added test data files

17f7873

Added more functions

96b7f38

All methods completed

7c60f90

bhargavvader added 4 commits July 1, 2016 19:39

Added functions

0989ba3

Added more methods

bf4c416

Replaces gsl functions

7987f35

Added tests

2389177

bhargavvader and others added 5 commits July 6, 2016 19:04

Wrote all tests

6fe8524

Changed structure

9b2b347

Improved optimize

142e1c7

Added Blei LDA

d00eff7

Format changes

1fddf69

tmylk reviewed Aug 3, 2016

Added docstrings, made corpus streamable

14c5501

Updated inits

4eff614

tmylk reviewed Aug 15, 2016

Addressed suggestions

c7e9275

tmylk reviewed Aug 16, 2016

bhargavvader added 2 commits August 16, 2016 19:01

Addressed comments, cleaned code

30b4d45

Updated Notebook

9c7b0eb

bhargavvader changed the title ~~[WIP] DTM sample classes, helper methods.~~ [MRG] Dynamic Topic Models in python. Aug 18, 2016

tmylk merged commit 1ae1338 into piskvorky:develop Aug 18, 2016

bhargavvader mentioned this pull request Aug 25, 2016

Improvements to Dynamic Topic Models #840

Open

5 tasks

bhargavvader deleted the DTM branch February 23, 2017 10:34
