Credibility Of LDA #34

amritbhanu · 2016-10-11T16:41:47Z

IDEA:

ACTUAL
          T1        T2         T3      .. .. . . .
Doc1
Doc2
Doc3

PREDICTED - Selected from Dominant topic from doc topic distribution.
          W1        W2         W3      .. .. . . .
Doc1
Doc2
Doc3

**According to the literature, if a document is assigned to a single dominant 
topic (hard assignment), the top words of that dominant topic should appear in the 
actual document. If they do not:
- either the probability of the dominant topic is very low and another topic 
could be made dominant,
- or the top words were selected poorly, and better word weights could recover 
the same dominant topic.**
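
A minimal sketch of the hard assignment described above, assuming the doc-topic distribution is available as one list of probabilities per document. The name `dominant_topic` and the example numbers are hypothetical, not taken from the experiments in this issue.

```python
def dominant_topic(doc_topic_row):
    """Return (topic index, probability) of the most probable topic."""
    best = max(range(len(doc_topic_row)), key=lambda t: doc_topic_row[t])
    return best, doc_topic_row[best]


# Hypothetical doc-topic distribution: rows are documents, columns are topics.
doc_topic = [
    [0.70, 0.20, 0.10],  # clear dominant topic
    [0.40, 0.35, 0.25],  # weak dominant topic, another topic is close behind
]

assignments = [dominant_topic(row) for row in doc_topic]
for doc_id, (topic, prob) in enumerate(assignments, start=1):
    print(f"Doc{doc_id}: dominant topic T{topic + 1} with probability {prob:.2f}")
```

The second row illustrates the first failure case: the dominant topic wins by a small margin, so a different topic could plausibly be made dominant.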

Experiment:

Once the top n words are selected from each topic, each topic is represented by those n words.
The dominant topic of a document is selected to represent it; we call that the actual topic.
For the predicted topic, we check every topic's n words against the document and count how many of them, m, appear in it. The topic with the largest m then represents the document.
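
The prediction step above can be sketched roughly as follows. This is an illustration under assumptions: documents are already tokenized, m counts distinct top words found in the document (not occurrences), and ties go to the lowest topic index. `predict_topic` and the word lists are hypothetical.

```python
def predict_topic(doc_tokens, topic_top_words):
    """Pick the topic whose top-n words overlap most with the document.

    Assumption: m is the number of distinct top words present in the
    document; ties are broken in favour of the lowest topic index.
    """
    doc_vocab = set(doc_tokens)
    overlaps = [len(doc_vocab & set(words)) for words in topic_top_words]
    return max(range(len(overlaps)), key=lambda t: overlaps[t])


# Hypothetical topics, each represented by its top n=3 words.
topic_top_words = [
    ["bug", "crash", "error"],
    ["install", "build", "compile"],
    ["menu", "button", "click"],
]

doc = "the build failed with a compile error".split()
predicted = predict_topic(doc, topic_top_words)
print("predicted topic index:", predicted)
```

Counting occurrences instead of distinct words, or weighting each word by its topic weight, would be equally reasonable variants; the issue text does not say which one was used.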

Suppose we have x documents, e.g. x=4, and k (number of topics)=3.
For x=4, we have [D1,D2,D3,D4]
Actual=[1,1,2,0]
Predicted=[1,0,2,0]
Actual and predicted agree on D1, D3 and D4, so the score is 3/4=0.75
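
The score in this example is the fraction of documents whose predicted topic equals the actual one. A minimal sketch, with `agreement_score` as a hypothetical helper name:

```python
def agreement_score(actual, predicted):
    """Fraction of documents where the predicted topic matches the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one document")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# The four-document example from this issue: [D1, D2, D3, D4].
actual = [1, 1, 2, 0]
predicted = [1, 0, 2, 0]
score = agreement_score(actual, predicted)
print(f"score = {score:.2f}")
```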

Results:

Higher is better.

Conclusion:

Tuned LDA with the top 7 words performs much better than untuned LDA (default, k=10) with the top 7 words.
Tuned LDA with the top 7 words performs as well as or better than untuned LDA (default, k=10) with the top 10 words.
With tuning, we get a better set of top 7 words defining each topic.


timm · 2016-10-12T02:21:55Z

plz clarify:

was this with using lda terms for a subsequent use of SVM?
the above results show 5 cases where tuned was as good or better than other things. so why are you reporting this as a negative result?

amritbhanu · 2016-10-12T02:29:37Z

We have 2 tracks in LDA now:

one for reporting stable conclusions (related to model stability).
another one for using LDA features in SVM (related to classification).

This one is related to the first track. We want to report stable topic generation, and that after tuning only the top 7 words matter, rather than reporting 10 words with the defaults.

I am reporting positive results.

timm · 2016-10-12T02:30:37Z

ur reporting positive results for...

for reporting stable conclusions.
another one for using LDA features into svm.

amritbhanu · 2016-10-12T02:32:03Z

just for the first right now.

amritbhanu added the Result label Oct 11, 2016

amritbhanu assigned timm Oct 11, 2016

amritbhanu mentioned this issue Oct 11, 2016

Weekly Report - 10/11/2016 #35
