```bibtex
@PhDThesis{yang2015improving,
  title  = {Improving the Usability of Topic Models},
  author = {Yang, Yi},
  year   = {2015},
  school = {Northwestern University}
}
```
Problems:
Gibbs sampling inference for LDA runs too slowly on large datasets with many topics.
The topics learned by LDA are sometimes difficult for end users to interpret.
LDA suffers from an instability problem: different runs on the same data can produce different topics.
Motivation:
Users would like to efficiently train a large topic model that incorporates prior knowledge.
Terminologies:
Markov random field
First-Order Logic
Stability Measures:
A common workaround for instability: run the algorithm many times and keep the model with the highest likelihood.
Two measures: document-level stability and token-level stability.
Settings: the number of topics was set to 20 and the number of iterations to 1000, with a uniform α = 1.0 and a uniform β = 0.01.
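The thesis's exact stability definitions are not reproduced in these notes; the following is one plausible sketch of token-level stability between two runs. Since topic labels are arbitrary across runs, topics must first be matched; brute-force relabeling is used here, which is feasible only for small T (the function name and matching strategy are assumptions, not the thesis's method):

```python
from itertools import permutations

def token_stability(z1, z2, T):
    """Token-level stability: the maximum, over topic relabelings,
    of the fraction of tokens assigned to the same matched topic
    in two runs. z1 and z2 are flat lists of topic ids per token."""
    assert len(z1) == len(z2)
    best = 0.0
    for perm in permutations(range(T)):  # O(T!) -- small T only
        agree = sum(1 for a, b in zip(z1, z2) if perm[a] == b)
        best = max(best, agree / len(z1))
    return best
```

For the T = 20 setting above, exhaustive matching is infeasible; a greedy or Hungarian-style assignment over the topic-topic agreement matrix would be used instead.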
General:
LDA can be viewed as a dimension reduction tool for document modeling, reducing the dataset dimension from the vocabulary size V to the number of topics T.
Users often have external knowledge about word correlations, which can be taken into account to improve the semantic coherence of the learned topics.
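To illustrate the V → T reduction: a document's V-dimensional bag of words is summarized by a T-dimensional topic-proportion vector θ. A hedged sketch using the standard smoothed estimate from per-document topic counts (n_dk as produced by a Gibbs sampler):

```python
def doc_topic_proportions(n_dk, alpha=1.0):
    """T-dim representation of one document:
    theta_t = (n_dt + alpha) / (N_d + T * alpha),
    where n_dt counts the document's tokens assigned to topic t."""
    T = len(n_dk)
    N = sum(n_dk)
    return [(n + alpha) / (N + T * alpha) for n in n_dk]
```

So a document over a 50,000-word vocabulary is reduced to, e.g., a 20-dimensional probability vector when T = 20.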
Methods:
SC-LDA can handle different kinds of knowledge, such as word correlation, document correlation,
document labels, and so on. One advantage of SC-LDA over existing methods is that it converges very quickly.
the mental map Jane has built for the paper collection is disrupted, resulting in confusion and frustration. The tool has become less useful to Jane unless she puts in some effort to update her mental map, which significantly increases her cognitive load
Interactive Topic Modeling (ITM): [26] proposes the first interactive framework that allows users to iteratively refine the topics discovered by LDA by adding constraints enforcing that sets of words must appear together in the same topic.
[47] proposes Fast-LDA, which constructs an adaptive upper bound on the sampling distribution and thereby achieves faster inference.
Summary:
Labeled LDA can only handle document label knowledge. Dirichlet Forest LDA, Quad-LDA, NMF-LDA, and ITM can only handle word correlation knowledge. MRTF can only handle document correlation knowledge. Logic-LDA can handle word correlation, document label, and other kinds of knowledge; however, each piece of knowledge has to be encoded as First-Order Logic.