Karpovich S. N. Multi-label text classification with Probabilistic Topic Model ml-PLSI.
Keywords: Multi-label classification, supervised learning, topic model, natural language processing.
SUMMARY The paper proposes a method for multi-label classification of documents using a topic model. Many clustering and classification algorithms assign a single label to each document, although a document can be relevant to several labels, which makes the multi-label task highly relevant. A comparative analysis of multi-label classification algorithms is presented. The article describes the tools used to implement the multi-label classification algorithm. The topic model is built by supervised learning. We estimated the classification quality and produced a list of proposed categories for a word. The developed approach proved effective. Probabilistic estimates of the assignment of a document to a category allow the approach to be used in collective recognition and associative classification. In future work we will further investigate the capabilities of multi-label classification with probabilistic topic models.
BigARTM is released under the New BSD License, which allows unlimited redistribution for any purpose (including commercial use) as long as its copyright notices and the license’s disclaimers of warranty are maintained.