
[MRG] Topic coherence update 3 #793

Closed

Conversation

devashishd12
Contributor

@devashishd12 devashishd12 commented Jul 16, 2016

Changes:

  • Added a backtracking dictionary that stores the context vector for each (w_prime or w_star, w) pair. Each such tuple maps one-to-one to a context vector; previously, every context vector was recalculated on each use. Changed the window size to 110.
  • Added c_uci coherence measure.
  • Added c_npmi coherence measure.
  • Added window_size parameter to CoherenceModel init.
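The backtracking dictionary described in the first bullet is essentially memoization: each context vector is computed once and reused on later lookups instead of being recalculated. A minimal sketch of the idea, assuming a dict keyed by the (w_prime, w) pair; the cache name and compute function here are illustrative, not gensim's actual internals:

```python
import numpy as np

# Hypothetical cache mapping a (w_prime, w) pair to its context vector.
_context_cache = {}

def context_vector(w_prime, w, compute_fn):
    """Return the context vector for (w_prime, w), computing it at most once."""
    key = (w_prime, w)
    if key not in _context_cache:
        _context_cache[key] = compute_fn(w_prime, w)
    return _context_cache[key]

# Count how often the expensive computation actually runs.
calls = []
def fake_compute(w_prime, w):
    calls.append((w_prime, w))
    return np.ones(3)

context_vector(1, 2, fake_compute)
context_vector(1, 2, fake_compute)  # second call is served from the cache
print(len(calls))  # the computation ran only once -> 1
```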

@devashishd12
Contributor Author

@tmylk should I add the benchmark testing notebooks for 20NG and Movies dataset too?

@devashishd12
Contributor Author

Correcting the tests.

@devashishd12 devashishd12 changed the title Improved backtracking algorithm for context vector calculation Topic coherence update 3 Jul 18, 2016
@devashishd12
Contributor Author

devashishd12 commented Jul 18, 2016

@tmylk I've added the c_uci and c_npmi coherence measures and a window_size parameter to CoherenceModel init. This PR also addresses issue #765.
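For reference, c_npmi is built on normalized pointwise mutual information (NPMI), which rescales PMI into roughly [-1, 1]. A small sketch of the textbook formulas, with an epsilon guard against log(0) — this is not gensim's implementation, just the underlying definition:

```python
import math

def pmi(p_xy, p_x, p_y, eps=1e-12):
    # Pointwise mutual information: log(p(x, y) / (p(x) * p(y))).
    return math.log((p_xy + eps) / (p_x * p_y))

def npmi(p_xy, p_x, p_y, eps=1e-12):
    # Normalized PMI: PMI / -log p(x, y); close to 1 for words that
    # always co-occur, near 0 for independent words.
    return pmi(p_xy, p_x, p_y, eps) / -math.log(p_xy + eps)

print(round(npmi(0.1, 0.1, 0.1), 3))   # words that always co-occur -> 1.0
print(round(npmi(0.01, 0.1, 0.1), 3))  # independent words -> 0.0
```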

@devashishd12
Contributor Author

@tmylk I've added the benchmark testing notebook on movies dataset.

@devashishd12 devashishd12 changed the title Topic coherence update 3 [MRG] Topic coherence update 3 Jul 20, 2016
@@ -91,7 +101,7 @@ def __init__(self, model=None, topics=None, texts=None, corpus=None, dictionary=
else:
self.dictionary = dictionary
Contributor

What is the point of checking if isinstance(model.id2word, FakeDict): above? Why is it not enough to check for None?

Contributor Author

@devashishd12 devashishd12 Aug 4, 2016

None doesn't work here, actually. When no word->id mapping is provided while creating an LdaModel, this line is called, which returns a FakeDict from here. So this code:

tm1 = LdaModel(corpus=corpus, num_topics=2)
if tm1.id2word is None:
    print('aye')
else:
    print('naye')

actually prints naye, but if I change the check to isinstance(tm1.id2word, FakeDict) it behaves correctly.
Am I correct here?

@devashishd12
Contributor Author

@tmylk I've addressed your initial comments. I hope I've addressed them correctly.

@devashishd12
Contributor Author

@tmylk should I change all Args to Parameters?

@@ -8,12 +8,12 @@
This module contains functions to compute confirmation on a pair of words or word subsets.

Contributor

Explain how the indirect confirmation measures work and why they are useful, similar to the competing car brands explanation in the paper.
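For context, the indirect confirmation measure in Röder et al. represents each word by a vector of its direct confirmations with all top words, then compares words via vector similarity (typically cosine). That is why two words that rarely co-occur directly, like the competing car brands from the paper, can still confirm each other: their context vectors look alike. A minimal cosine sketch with made-up vectors, not gensim's actual code:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two context vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative only: two brand words with similar direct-confirmation profiles.
v_bmw = np.array([0.9, 0.8, 0.1])
v_audi = np.array([0.85, 0.75, 0.15])
print(round(cosine(v_bmw, v_audi), 3))  # close to 1 despite different words
```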

for top_words in topics:
    s_one_one_t = []
    for w_prime in top_words:
        w_prime_index = int(np.where(top_words == int(w_prime))[0])  # To get index of w_prime in top_words
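The np.where lookup in the diff can be tried in isolation. A small sketch with a made-up top_words array; [0][0] is used here to pick the first matching index explicitly, avoiding the size-1 array-to-int conversion:

```python
import numpy as np

top_words = np.array([42, 7, 19])
w_prime = 7
# np.where returns a tuple of index arrays; [0][0] is the first match.
w_prime_index = int(np.where(top_words == w_prime)[0][0])
print(w_prime_index)  # -> 1
```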
Contributor

same as above

@devashishd12
Contributor Author

@tmylk I've addressed your comments.

@tmylk
Contributor

tmylk commented Aug 18, 2016

Merged in 6f53b31

@tmylk tmylk closed this Aug 18, 2016
@devashishd12 devashishd12 deleted the benchmark_testing branch August 21, 2016 08:44
@devashishd12
Contributor Author

devashishd12 commented Oct 3, 2016

Linking to #750 and #710.
