forked from mimno/Mallet

A Mallet fork with support for CA-LDA described in paper: Immersive Recommendation at WWW'2016

changun/CA-LDA

 
 


CA-LDA


  • The model is implemented on top of Mallet 2.0.7's ParallelTopicModel, which it requires.
  • Trained models are available for download:

Input/Output

  • Given a document from one of the contexts it was trained on (e.g. Mail, Meetup, Twitter, and News), CA-LDA returns the document's K-dimensional topic distribution along with the proportion of background words in the document.
  • The returned K-dimensional topic distribution can be used to estimate the similarity between two documents from different contexts, with the influence of the context-dependent background words removed.
  • Cosine similarity is the recommended similarity metric.
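
As an illustration of the recommended metric, here is a minimal sketch of cosine similarity between two topic distributions. It uses only the Java standard library. The class name `TopicSimilarity` and the sample vectors are hypothetical and not part of CA-LDA. The inputs are assumed to be K-dimensional topic vectors with the background proportion already dropped.

```java
final class TopicSimilarity {

    /**
     * Cosine similarity between two equal-length vectors.
     * Returns 0.0 when either vector has zero norm, so the result is always defined.
     */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical 3-topic distributions for a Mail document and a News document.
        double[] mailDoc = {0.7, 0.2, 0.1};
        double[] newsDoc = {0.6, 0.3, 0.1};
        System.out.println("similarity = " + cosine(mailDoc, newsDoc));
    }
}
```

Because topic proportions are non-negative, the similarity falls between 0 and 1, with values closer to 1 indicating more similar topic mixtures.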

Usage

Preprocessing

  • Create a preprocessing Pipe.
    • The pipe structure needs to be exactly the same as the one we used when training the model (see example).
  • Put raw documents into an InstanceList through the pipe (see example).

Inference

  • Load the model using ObjectInputStream.readObject().
  • Call model.getInferencer(contextName) to get a TopicInferencer for a specific context.
  • Infer the topic distribution by calling inferencer.getSampledDistribution(instance, 100, 1, 5), i.e. 100 sampling iterations, a thinning interval of 1, and a burn-in of 5.
  • The function returns a double[] of length K+1, which consists of the proportions of the K topics plus the proportion of background words (the last element of the array).
    • Usually, we discard the background proportion and use only the K-dimensional topic distribution to estimate document similarity.
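
Two of the steps above can be sketched with the Java standard library alone. The helper below assumes the model file is a gzip-compressed Java serialization stream, as the `.bin.gz` extension in the demo suggests. If the file is not compressed, the `GZIPInputStream` wrapper should be dropped. The class and method names (`InferenceHelpers`, `readGzippedObject`, `topicsOnly`, `backgroundProportion`) are hypothetical and not part of CA-LDA or Mallet. The returned `Object` would still need to be cast to the model class shipped with this repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

final class InferenceHelpers {

    /** Reads one serialized object from a gzip-compressed stream (assumed file format). */
    static Object readGzippedObject(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(in))) {
            return ois.readObject();
        }
    }

    /** Drops the trailing background proportion from a (K+1)-length result. */
    static double[] topicsOnly(double[] sampled) {
        if (sampled.length < 1) {
            throw new IllegalArgumentException("Expected an array of length K+1");
        }
        return Arrays.copyOf(sampled, sampled.length - 1);
    }

    /** Returns the background-word proportion stored in the last element. */
    static double backgroundProportion(double[] sampled) {
        if (sampled.length < 1) {
            throw new IllegalArgumentException("Expected an array of length K+1");
        }
        return sampled[sampled.length - 1];
    }

    public static void main(String[] args) {
        // Hypothetical inference result for K = 3: three topic proportions plus background.
        double[] sampled = {0.4, 0.25, 0.15, 0.2};
        System.out.println("topics     = " + Arrays.toString(topicsOnly(sampled)));
        System.out.println("background = " + backgroundProportion(sampled));
    }
}
```

In practice, `readGzippedObject` would be called with a `FileInputStream` on the downloaded model file, and `topicsOnly` would be applied to the array returned by `getSampledDistribution` before computing similarities.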

Demo

  • For details, please see Demo App

    java -classpath [CLASSES_LOCATION] cc.mallet.examples.RunContextAwareLDA [MODEL_FILE] [DOCUMENT] [CONTEXT_NAME]
    • Sample result from inferring topics in one of the EnronSent documents (using the Mail context)
    > java -classpath CALDA.jar cc.mallet.examples.RunContextAwareLDA CA_LDA_dim_500.bin.gz enronsent04 Mail
    
    Loading model ...
    
    Found 4 contexts: Mail, Meetup, News, Twitter
    
    Topic distribution:
    0.044 of Topic 217: energi power solar electr gas renew fuel wind util effici
    0.029 of Topic 159: messag send account list inform receiv contact pleas updat issu
    0.025 of Topic 424: invest stock market investor compani fund financi firm report ceo
    0.023 of Topic 279: law legal court rule lawyer protect licens patent copyright contract
    0.023 of Topic 395: program fund propos feder provid public budget administr nation bill
    ....
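
A ranked listing like the one above can be produced from the inferred distribution by sorting topic indices by weight. The following is a minimal sketch using only the Java standard library. The class `TopTopics` and its method are hypothetical and are not the demo app's actual code. The topic words shown in the demo come from the trained model and are omitted here.

```java
import java.util.Arrays;

final class TopTopics {

    /** Returns the indices of the n largest weights, highest first (ties keep index order). */
    static int[] topIndices(double[] weights, int n) {
        Integer[] idx = new Integer[weights.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Sort indices by descending weight; the object sort is stable.
        Arrays.sort(idx, (a, b) -> Double.compare(weights[b], weights[a]));
        int k = Math.max(0, Math.min(n, idx.length));
        int[] top = new int[k];
        for (int i = 0; i < k; i++) {
            top[i] = idx[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical K-dimensional topic distribution (background proportion already removed).
        double[] topics = {0.10, 0.45, 0.05, 0.30, 0.10};
        for (int t : topIndices(topics, 3)) {
            System.out.printf("%.3f of Topic %d%n", topics[t], t);
        }
    }
}
```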
    
