LDAModel depracation warning (issue 494) #513

mattilyra · 2015-11-07T21:37:50Z

I decided to not call utils.grouper with as_numpy=False and break the Pyro stuff as I thought we can work around that. So now there's just a check in .inference that makes sure the indices are ints when a numpy array is passed in.

This seemed like a better choice as it doesn't introduce any breaking changes unnecessarily.

…y converting term IDs to ints.

piskvorky · 2015-11-08T03:26:05Z

gensim/models/ldamodel.py

@@ -367,7 +367,15 @@ def inference(self, chunk, collect_sstats=False):
        # Lee&Seung trick which speeds things up by an order of magnitude, compared
        # to Blei's original LDA-C code, cool!).
        for d, doc in enumerate(chunk):
-            ids = [id for id, _ in doc]
+            if isinstance(doc, numpy.ndarray):
+                if len(doc) == 0:


len(array) will return the length along the first axis (=number of rows). Is that always the case here, is it what we want?

piskvorky · 2015-11-08T03:31:55Z

Thanks Matti.

We can do these ad-hoc fixes in various places, though I still think it's cleaner to solve this at the place of origin = where the float values are generated (as_numpy=False).

Also, what's the advantage of your if solution vs. a direct ids = [int(id) for id, _ in doc]?

mattilyra · 2015-11-08T08:38:26Z

If the doc is already a numpy array it's faster to cast it to int_ (the default numpy integer, 64bit I guess on most machines) instead of calling int() in the list comprehension - at least on the machine I tried.

I didn't consider the case of 1-d documents, the len(doc) also won't be semantically consistent with them. I'll change the call to utils.grouper in the end that'll end up being much cleaner as you say.

I thought I was being so clever.

piskvorky · 2015-11-08T09:09:07Z

:D

You are clever, just in this case I think simplicity and consistency are more important (it's not the bottleneck).

…for calling .update with chunks_as_numpy=True/False.

mattilyra · 2015-11-08T18:51:36Z

Ok, so I decided to keep the check for the IDs being ints since I assume numpy will complain about it, pyro or no pyro. The default is now for the chunks to be not numpy, and there's an option on .update to make them numpy.

I had an issue with some tests failing because my distro wasn't picking up on the added keyword parameter to .update, but I think that was just a local issue on my machine.

piskvorky · 2015-11-09T01:42:24Z

That looks like a nice solution! Thanks.

If it later turns out this "make sure ids are ints" functionality is more widely used, we could factor it out into a separate function, something like matutils.check_vector().

Travis reports all tests as failing though:

======================================================================
ERROR: testTransform (gensim.test.test_ldamodel.TestLdaMulticore)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/piskvorky/gensim/gensim/test/test_ldamodel.py", line 382, in setUp
    self.model = self.class_(corpus, id2word=dictionary, num_topics=2, passes=100)
  File "/home/travis/build/piskvorky/gensim/gensim/models/ldamulticore.py", line 141, in __init__
    gamma_threshold=gamma_threshold)
  File "/home/travis/build/piskvorky/gensim/gensim/models/ldamodel.py", line 331, in __init__
    self.update(corpus, chunks_as_numpy=use_numpy)
TypeError: update() got an unexpected keyword argument 'chunks_as_numpy'

piskvorky · 2015-11-09T01:44:57Z

Ah I see, the fails are coming from LdaMulticore, which inherits from LdaModel class but probably overrides its update(). Can you add the same parameter there as well?

piskvorky · 2015-11-17T06:28:53Z

Thank you very much Matti, ready to merge!

Can you add yourself + a short description of this change to CHANGELOG?

mattilyra · 2015-11-17T08:53:22Z

I always forget the CHANGELOG

LDAModel depracation warning (issue 494)

piskvorky · 2015-11-17T13:16:06Z

Great, thanks!

tmylk · 2015-11-17T20:57:42Z

Unfortunately LDA doesn't converge on Windows.

====================================================================


FAIL: testTransform (gensim.test.test_ldamodel.TestLdaMulticore)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python35-x64\lib\site-packages\gensim\test\test_ldamodel.py", line 79, in testTransform
    self.assertTrue(passed)
AssertionError: False is not true
-------------------- >> begin captured logging << --------------------

piskvorky · 2015-11-18T05:30:02Z

That's probably unrelated to this PR -- or do you see a connection?

In general, on these "Windows failures": maybe the RNG works differently on Windows, which causes tests that pass on *NIX to fail on Windows?

tmylk · 2015-11-18T08:20:57Z

It was passing on Sunday and it's the only change I can find around it.
Numpy and scipy versions remained the same.
On 18 Nov 2015 05:30, "Radim Řehůřek" notifications@github.com wrote:

That's probably unrelated to this PR -- or do you see a connection?

—
Reply to this email directly or view it on GitHub
#513 (comment).

mattilyra added 3 commits October 22, 2015 10:00

Added a special case for handling being ndarray in LDAModel.inference.

3c898ca

Added a catch for empty documents.

196388a

Added a catch for when utils.grouper return numpy array and explicitl…

951b562

…y converting term IDs to ints.

piskvorky reviewed Nov 8, 2015
View reviewed changes

Changed the chunks to be python lists by default and added an option …

0c946ea

…for calling .update with chunks_as_numpy=True/False.

Added chunks_as_numpy to LdaMulticore.update.

2d814cb

Added message to changelog.

c5fd181

piskvorky added a commit that referenced this pull request Nov 17, 2015

Merge pull request #513 from mattilyra/pr_494

d25ae02

LDAModel depracation warning (issue 494)

piskvorky merged commit d25ae02 into piskvorky:develop Nov 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LDAModel depracation warning (issue 494) #513

LDAModel depracation warning (issue 494) #513

mattilyra commented Nov 7, 2015

piskvorky Nov 8, 2015

piskvorky commented Nov 8, 2015

mattilyra commented Nov 8, 2015

piskvorky commented Nov 8, 2015

mattilyra commented Nov 8, 2015

piskvorky commented Nov 9, 2015

piskvorky commented Nov 9, 2015

piskvorky commented Nov 17, 2015

mattilyra commented Nov 17, 2015

piskvorky commented Nov 17, 2015

tmylk commented Nov 17, 2015

piskvorky commented Nov 18, 2015

tmylk commented Nov 18, 2015

LDAModel depracation warning (issue 494) #513

LDAModel depracation warning (issue 494) #513

Conversation

mattilyra commented Nov 7, 2015

piskvorky Nov 8, 2015

Choose a reason for hiding this comment

piskvorky commented Nov 8, 2015

mattilyra commented Nov 8, 2015

piskvorky commented Nov 8, 2015

mattilyra commented Nov 8, 2015

piskvorky commented Nov 9, 2015

piskvorky commented Nov 9, 2015

piskvorky commented Nov 17, 2015

mattilyra commented Nov 17, 2015

piskvorky commented Nov 17, 2015

tmylk commented Nov 17, 2015

piskvorky commented Nov 18, 2015

tmylk commented Nov 18, 2015