# Example of GibbsLDA

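This example requires three NLTK corpora: `nltk.corpus.reuters`, `nltk.corpus.words`, and `nltk.corpus.stopwords`.

You can download them interactively with `nltk.download()`, or individually with `nltk.download('reuters')`, `nltk.download('words')`, and `nltk.download('stopwords')`.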

In [1]:
import numpy as np

from ptm import GibbsLDA
from ptm.nltk_corpus import get_reuters_cnt_ids
from ptm.utils import convert_cnt_to_list, get_top_words

## Loading the Reuters corpus from NLTK

Load 1,000 documents from the NLTK Reuters corpus, keeping a vocabulary of at most 10,000 words.

In [2]:
n_doc = 1000
voca, doc_ids, doc_cnt = get_reuters_cnt_ids(num_doc=n_doc, max_voca=10000)
docs = convert_cnt_to_list(doc_ids, doc_cnt)
print('Vocabulary size:%d' % len(voca))

Vocabulary size:4654

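Judging from the names, `get_reuters_cnt_ids` returns each document as a bag of words (distinct word ids in `doc_ids`, their counts in `doc_cnt`), and `convert_cnt_to_list` expands that into a flat list of word ids with one entry per token, which is what the sampler iterates over. The sketch below assumes that structure; `expand_counts` is a hypothetical stand-in, not part of ptm.

```python
from typing import List, Sequence


def expand_counts(doc_ids: Sequence[Sequence[int]],
                  doc_cnt: Sequence[Sequence[int]]) -> List[List[int]]:
    """Turn (word id, count) bag-of-words documents into flat token lists.

    Hypothetical helper: assumes doc_ids[d][j] occurs doc_cnt[d][j] times in document d.
    """
    docs = []
    for ids, cnts in zip(doc_ids, doc_cnt):
        tokens = []
        for word_id, count in zip(ids, cnts):
            # Repeat each word id once per occurrence in the document.
            tokens.extend([word_id] * count)
        docs.append(tokens)
    return docs
```

For example, `expand_counts([[0, 3], [2]], [[2, 1], [3]])` gives `[[0, 0, 3], [2, 2, 2]]`.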

### Inference through Gibbs sampling
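For reference, the standard collapsed Gibbs sampler for LDA resamples the topic $z_i$ of each token $i$ (word $w_i$ in document $d$) from the conditional below. This assumes symmetric Dirichlet priors $\alpha$ (document-topic) and $\beta$ (topic-word); whether `GibbsLDA` uses exactly this form is an assumption, not something this example shows.

$$
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \left(n_{d,k}^{-i} + \alpha\right) \frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}
$$

Here $n_{d,k}^{-i}$ is the number of tokens in document $d$ assigned to topic $k$, $n_{k,w}^{-i}$ the number of times word $w$ is assigned to topic $k$, and $n_k^{-i}$ the total number of tokens assigned to topic $k$, all counted without token $i$; $V$ is the vocabulary size (`len(voca)`).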

In [3]:
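max_iter = 100  # number of sampling iterations passed to fit()
n_topic = 30    # number of latent topics
model = GibbsLDA(n_doc, len(voca), n_topic)  # documents, vocabulary size, topics
model.fit(docs, max_iter=max_iter)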

2016-02-10 10:39:05 INFO:GibbsLDA:[ITER] 0, 0.95, -492103.33
2016-02-10 10:39:06 INFO:GibbsLDA:[ITER] 1, 0.97, -449060.17
2016-02-10 10:39:07 INFO:GibbsLDA:[ITER] 2, 0.99, -425090.80
2016-02-10 10:39:08 INFO:GibbsLDA:[ITER] 3, 1.04, -409972.35
2016-02-10 10:39:09 INFO:GibbsLDA:[ITER] 4, 0.98, -399907.10
2016-02-10 10:39:10 INFO:GibbsLDA:[ITER] 5, 0.96, -392280.97
2016-02-10 10:39:11 INFO:GibbsLDA:[ITER] 6, 0.97, -387048.46
2016-02-10 10:39:12 INFO:GibbsLDA:[ITER] 7, 1.03, -383034.02
2016-02-10 10:39:13 INFO:GibbsLDA:[ITER] 8, 1.01, -378981.12
2016-02-10 10:39:14 INFO:GibbsLDA:[ITER] 9, 0.97, -376289.35

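The loop behind `model.fit` can be illustrated with a minimal, pure-Python collapsed Gibbs sampler. This is a textbook sketch, not ptm's implementation: the function `gibbs_lda`, its defaults `alpha=0.1` and `beta=0.01`, and the returned count tables are assumptions made for illustration. Each step removes one token from the counts, scores every topic with $(n_{d,k}+\alpha)(n_{k,w}+\beta)/(n_k+V\beta)$, and draws a new topic for that token.

```python
import random
from typing import List, Sequence, Tuple


def gibbs_lda(docs: Sequence[Sequence[int]], n_voca: int, n_topic: int,
              alpha: float = 0.1, beta: float = 0.01, max_iter: int = 50,
              seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA (hypothetical sketch).

    Returns (doc-topic counts, topic-word counts) after max_iter sweeps.
    """
    rng = random.Random(seed)
    dt = [[0] * n_topic for _ in docs]           # n_{d,k}
    tw = [[0] * n_voca for _ in range(n_topic)]  # n_{k,w}
    nk = [0] * n_topic                           # n_k

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topic)
            zd.append(k)
            dt[d][k] += 1
            tw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    v_beta = n_voca * beta
    for _ in range(max_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove token i from the counts (the "-i" in the conditional).
                dt[d][k] -= 1
                tw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised conditional p(z_i = t | z_-i, w) for every topic t.
                weights = [(dt[d][t] + alpha) * (tw[t][w] + beta) / (nk[t] + v_beta)
                           for t in range(n_topic)]
                k = rng.choices(range(n_topic), weights=weights)[0]
                # Add the token back under its newly drawn topic.
                z[d][i] = k
                dt[d][k] += 1
                tw[k][w] += 1
                nk[k] += 1
    return dt, tw
```

On a toy corpus such as `[[0, 0, 1, 1], [2, 2, 3, 3]]` with `n_voca=4, n_topic=2`, each row of the doc-topic table still sums to its document's length after sampling. Plain lists keep the sketch dependency-free, but for 1,000 Reuters documents a loop like this is much slower than a vectorised implementation.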
### Print the 10 most probable words for each topic
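Ranking the words of a topic amounts to sorting one row of a topic-word weight table. The helper below is a hypothetical equivalent of `get_top_words`; its name `top_words`, and the assumption that row `topic` of `model.TW` holds per-word weights for that topic, are illustrative. Because normalising a row does not change its order, raw counts and probabilities give the same top words.

```python
from typing import List, Sequence


def top_words(topic_word: Sequence[Sequence[float]], voca: Sequence[str],
              topic: int, n_words: int = 10) -> List[str]:
    """Return the n_words vocabulary entries with the largest weight in row `topic`."""
    row = topic_word[topic]
    # Sort word ids by descending weight and keep the first n_words.
    order = sorted(range(len(row)), key=lambda w: row[w], reverse=True)
    return [voca[w] for w in order[:n_words]]
```

For example, `top_words([[1, 5, 3], [4, 0, 2]], ['a', 'b', 'c'], 0, n_words=2)` returns `['b', 'c']`.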

In [4]:
for ti in range(n_topic):
    # rank the vocabulary by its weight in row ti of model.TW
    top_words = get_top_words(model.TW, voca, ti, n_words=10)
    print('Topic', ti, ':\t', ','.join(top_words))

Topic 0 :	 chemical,group,company,also,total,general,rubber,capital,sold,used
Topic 1 :	 dollar,yen,japan,bank,central,g,west,exchange,currency,policy
Topic 2 :	 april,record,one,may,prior,pay,div,split,dividend,note
Topic 3 :	 deficit,government,major,finance,economic,trade,cut,current,also,industrial
Topic 4 :	 stocks,production,use,start,end,supply,x,demand,total,cotton
Topic 5 :	 oil,dome,gas,debt,days,term,energy,plan,new,natural
Topic 6 :	 quarter,first,earnings,company,share,per,ago,period,fiscal,strong
Topic 7 :	 would,told,price,house,committee,de,government,official,consumer,meat
Topic 8 :	 fed,reserve,federal,two,market,repurchase,system,spokesman,wednesday,one
Topic 9 :	 week,march,february,april,last,fell,average,report,previous,ended
Topic 10 :	 nil,o,e,c,n,f,p,b,total,d
Topic 11 :	 fund,free,mine,april,yesterday,port,grain,franklin,management,stockpile
Topic 12 :	 national,new,coffee,sale,international,american,york,business,sell,unit
Topic 13 :	 stock,company,share,corp