
[MRG] Topic coherence update 3 #793

Closed

Conversation

devashishd12
Contributor

@devashishd12 devashishd12 commented Jul 16, 2016

Changes:

  • Added a backtracking dictionary that caches the context vector for each (w_prime or w_star, w) pair. Each such tuple maps one-to-one to a context vector; previously, every context vector was recomputed from scratch. Changed the window size to 110. (A minimal caching sketch follows this list.)
  • Added c_uci coherence measure.
  • Added c_npmi coherence measure.
  • Added window_size parameter to CoherenceModel init.
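
A minimal sketch of the caching idea, assuming co-occurrence counts are already collected; the helper names and the NPMI-style confirmation below are illustrative placeholders, not the actual gensim internals:

import numpy as np

backtrack = {}  # (word, tuple(top_words)) -> cached context vector

def npmi(w_i, w_j, cooccur, occur, num_windows):
    # illustrative NPMI-style confirmation from co-occurrence counts
    joint = cooccur.get((w_i, w_j), 0) / float(num_windows)
    if joint == 0:
        return -1.0
    p_i = occur[w_i] / float(num_windows)
    p_j = occur[w_j] / float(num_windows)
    return np.log(joint / (p_i * p_j)) / -np.log(joint)

def context_vector(word, top_words, cooccur, occur, num_windows):
    # compute each (word, top_words) context vector only once and reuse it
    key = (word, tuple(top_words))
    if key not in backtrack:
        backtrack[key] = np.array(
            [npmi(word, w, cooccur, occur, num_windows) for w in top_words])
    return backtrack[key]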

@devashishd12
Contributor Author

@tmylk should I add the benchmark testing notebooks for the 20NG and Movies datasets too?

@devashishd12
Contributor Author

Correcting the tests.

@devashishd12 devashishd12 changed the title Improved backtracking algorithm for context vector calculation Topic coherence update 3 Jul 18, 2016
@devashishd12
Contributor Author

devashishd12 commented Jul 18, 2016

@tmylk I've added the c_uci and c_npmi coherence measures and a window_size parameter to the CoherenceModel init. This PR also addresses issue #765.
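
A hedged usage sketch of the new options (the toy texts and model below are placeholders; the coherence value 'c_npmi' and the window_size parameter are what this PR adds):

from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system']]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)

# sliding-window based measure added in this PR
cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                    coherence='c_npmi', window_size=10)
print(cm.get_coherence())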

@devashishd12
Contributor Author

@tmylk I've added the benchmark testing notebook on the Movies dataset.

@devashishd12 devashishd12 changed the title Topic coherence update 3 [MRG] Topic coherence update 3 Jul 20, 2016
@@ -91,7 +101,7 @@ def __init__(self, model=None, topics=None, texts=None, corpus=None, dictionary=
        else:
            self.dictionary = dictionary
Contributor


What is the point of checking if isinstance(model.id2word, FakeDict): above? Why is it not enough to check for None?

Contributor Author

@devashishd12 devashishd12 Aug 4, 2016


None doesn't actually work here. When no word->id mapping is provided while creating an LdaModel, this line is called, which returns a FakeDict from here. So this code:

from gensim.models import LdaModel

# corpus is any bag-of-words corpus; no id2word mapping is passed
tm1 = LdaModel(corpus=corpus, num_topics=2)
if tm1.id2word is None:
    print('aye')
else:
    print('naye')

actually prints naye, but if I change the check to isinstance(tm1.id2word, FakeDict) it behaves correctly. Am I correct here?

@devashishd12
Contributor Author

@tmylk I've addressed your initial comments. I hope I've addressed them correctly.

@devashishd12
Contributor Author

@tmylk should I change all Args to Parameters?

@@ -8,12 +8,12 @@
This module contains functions to compute confirmation on a pair of words or word subsets.

Contributor


Explain how the indirect confirmation measures work and why they are useful, similar to the competing car brands explanation in the paper.
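
(For reference, a toy illustration of the idea being requested, not the gensim code: each word gets a context vector of its direct confirmations with the topic's top words, and two words are indirectly confirmed via the cosine similarity of those vectors, so e.g. two car brands that rarely co-occur directly still score high because their contexts look alike. The words and numbers below are made up.)

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy context vectors over the top words ['car', 'engine', 'driver']:
# each entry is a made-up direct confirmation (e.g. NPMI) with that word.
context = {
    'bmw':      np.array([0.90, 0.80, 0.70]),
    'mercedes': np.array([0.85, 0.75, 0.72]),
}
print(cosine(context['bmw'], context['mercedes']))  # close to 1.0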

for top_words in topics:
    s_one_one_t = []
    for w_prime in top_words:
        w_prime_index = int(np.where(top_words == int(w_prime))[0])  # To get index of w_prime in top_words
Contributor


same as above

@devashishd12
Contributor Author

@tmylk I've addressed your comments.

@tmylk
Contributor

tmylk commented Aug 18, 2016

Merged in 6f53b31

@tmylk tmylk closed this Aug 18, 2016
@devashishd12 devashishd12 deleted the benchmark_testing branch August 21, 2016 08:44
@devashishd12
Contributor Author

devashishd12 commented Oct 3, 2016

Linking to #750 and #710.
