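ACTUAL

| | T1 | T2 | T3 | ... |
|---|---|---|---|---|
| Doc1 | | | | |
| Doc2 | | | | |
| Doc3 | | | | |

PREDICTED - Selected from Dominant topic from doc topic distribution.

| | W1 | W2 | W3 | ... |
|---|---|---|---|---|
| Doc1 | | | | |
| Doc2 | | | | |
| Doc3 | | | | |
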
**According to the literature, if a document is assigned to a single dominant topic (hard assignment), the top words of that dominant topic should appear in the actual document. If they do not, then either:**
- **the probability of the dominant topic is very low, and another topic might be a better choice of dominant topic; or**
- **the top words were poorly selected, and better word weights could recover the same dominant topic.**
Experiment:
Once the top n words have been selected from each topic, each topic is represented by those n words.
The dominant topic from the document-topic distribution is selected to represent a document; we call this the actual topic.
For each topic, now represented by its n words, we count how many of those n words (say m) occur in the document. The topic with the largest m is taken as the predicted topic for that document.
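
The word-overlap step described above can be sketched roughly as follows. This is only an illustration, not the code used for the experiment: the function name `predict_topic`, the toy word lists, and the tie-breaking rule (lowest topic index wins) are all assumptions, since the description does not say how ties are handled.

```python
def predict_topic(doc_tokens, topic_top_words):
    """Return the index of the topic whose top-n words overlap most with the document.

    doc_tokens: list of tokens in one document.
    topic_top_words: list of lists; entry t holds the top-n words of topic t.
    Assumption: ties are broken in favour of the lowest topic index.
    """
    doc_vocab = set(doc_tokens)
    # m for each topic: how many of its n top words occur in the document
    overlaps = [len(doc_vocab & set(words)) for words in topic_top_words]
    # max() returns the first index that reaches the maximum overlap
    return max(range(len(overlaps)), key=overlaps.__getitem__)


# Hypothetical toy data, with n = 3 top words per topic
topic_top_words = [
    ["ball", "team", "goal"],
    ["vote", "party", "law"],
    ["cell", "gene", "dna"],
]
doc = "the party passed a law after the vote".split()
predicted = predict_topic(doc, topic_top_words)
```

Counting distinct words rather than word frequencies keeps m bounded by n, which matches the "m out of n" wording above; a frequency-weighted variant would be a different design choice.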
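Suppose we have x documents and k topics, e.g. x = 4 and k = 3.
For x = 4, the documents are [D1, D2, D3, D4].
Actual = [1, 1, 2, 0]
Predicted = [1, 0, 2, 0]
The score is the fraction of documents whose predicted topic matches the actual topic. Here D1, D3, and D4 match, so the score is 3/4 = 0.75.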
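
A minimal sketch of the scoring step, assuming the score is simply the fraction of documents whose predicted topic equals the actual (dominant) topic. The function name `agreement_score` and the label lists below are made up for illustration and are not taken from the experiment.

```python
def agreement_score(actual, predicted):
    """Fraction of documents whose predicted topic equals the actual topic."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Hypothetical labels for five documents and k = 3 topics
actual = [0, 2, 1, 1, 0]
predicted = [0, 1, 1, 1, 2]
score = agreement_score(actual, predicted)  # 3 of the 5 positions agree
```

This is the same quantity as plain classification accuracy, so an existing accuracy helper could be used in its place.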
Results:
Higher is better.
Conclusion:
With tuning, the top 7 words perform much better than the untuned (default, k=10) top 7 words.
With tuning, the top 7 words perform as well as or better than the untuned (default, k=10) top 10 words.
Tuning therefore gives a better set of top 7 words to define each topic.
- One for reporting stable conclusions (related to model stability).
- Another for using LDA features in an SVM (related to classification).
This one relates to the first track. We want to report stable topic generation, and that after tuning only the top 7 words are needed, rather than reporting 10 words with the default settings.
IDEA:
Experiment:
Results:
Conclusion: