What is LDA?

Latent Dirichlet allocation (LDA) is a topic model that generates topics based on word frequency from a set of documents. LDA is particularly useful for finding reasonably accurate mixtures of topics within a given document set.

LDA walkthrough

This walkthrough covers the process of generating an LDA model from a highly simplified document set. It is not an exhaustive explanation of LDA. The goal is to guide users through the key steps of preparing their data and to provide example output.

Packages required

This walkthrough uses the following Python packages:

NLTK, a natural language toolkit for Python. A useful package for any natural language processing.

For Mac/Unix with pip: $ sudo pip install -U nltk.

stop_words, a Python package containing stop words.

For Mac/Unix with pip: $ sudo pip install stop-words.

gensim, a topic modeling package containing our LDA model.

For Mac/Unix with pip: $ sudo pip install gensim.


So what does LDA actually do?


This explanation is a little lengthy, but it is useful for understanding the models generated in this walkthrough.

LDA assumes documents are produced from a mixture of topics. Those topics then generate words based on their probability distribution, like the ones in our walkthrough model. In other words, LDA assumes a document is made from the following steps:

1. Determine the number of words in a document. Let’s say our document has 6 words.
2. Determine the mixture of topics in that document. For example, the document might contain 1/2 the topic “health” and 1/2 the topic “vegetables.”
3. Using each topic’s multinomial distribution, output words to fill the document’s word slots. In our example, the “health” topic is 1/2 our document, or 3 words. The “health” topic might have the word “diet” at 20% probability or “exercise” at 15%, so it will fill the document word slots based on those probabilities.

Given this assumption of how documents are created, LDA backtracks and tries to figure out what topics would create those documents in the first place.
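The generative steps above can be sketched in a few lines of plain Python. This is only a toy illustration, not how gensim implements LDA: the topic names and the 20% and 15% probabilities come from the example, while the remaining words, their probabilities, and the helper name `generate_document` are made up for the sketch. Note that a topic is sampled independently for each word slot, so "health" fills half of the slots on average rather than exactly 3.

```python
import random

# Hypothetical topics with made-up word probabilities (illustration only).
topics = {
    "health": {"diet": 0.20, "exercise": 0.15, "doctor": 0.65},
    "vegetables": {"broccoli": 0.50, "carrot": 0.30, "spinach": 0.20},
}

# Step 2: the document's topic mixture (half "health", half "vegetables").
topic_mixture = {"health": 0.5, "vegetables": 0.5}


def generate_document(num_words, topic_mixture, topics, rng):
    """Fill each word slot by sampling a topic, then a word from that topic."""
    words = []
    for _ in range(num_words):
        # Pick a topic for this slot according to the document's mixture.
        topic = rng.choices(list(topic_mixture),
                            weights=list(topic_mixture.values()))[0]
        # Pick a word according to that topic's word distribution.
        word_probs = topics[topic]
        word = rng.choices(list(word_probs),
                           weights=list(word_probs.values()))[0]
        words.append(word)
    return words


rng = random.Random(0)  # seeded so the sketch is repeatable
document = generate_document(6, topic_mixture, topics, rng)  # Step 1: 6 words
print(document)
```

Training an LDA model runs this story in reverse: given only the words, it estimates topic mixtures and word distributions that would plausibly have produced them.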

In [1]:
doc_a = "Brocolli is good to eat. My brother likes to eat good brocolli, but not my mother."
doc_b = "My mother spends a lot of time driving my brother around to baseball practice."
doc_c = "Some health experts suggest that driving may cause increased tension and blood pressure."
doc_d = "I often feel pressure to perform well at school, but my mother never seems to drive my brother to do better."
doc_e = "Health professionals say that brocolli is good for your health."

# compile sample documents into a list
doc_set = [doc_a, doc_b, doc_c, doc_d, doc_e]

Tokenization

In [2]:
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')

In [6]:
raw = doc_a.lower()
tokens = tokenizer.tokenize(raw)
print(tokens)

['brocolli', 'is', 'good', 'to', 'eat', 'my', 'brother', 'likes', 'to', 'eat', 'good', 'brocolli', 'but', 'not', 'my', 'mother']


Stop words

In [8]:
from stop_words import get_stop_words
# create English stop words list
en_stop = get_stop_words('en')

In [10]:
stopped_tokens = [i for i in tokens if i not in en_stop]
print(stopped_tokens)

['brocolli', 'good', 'eat', 'brother', 'likes', 'eat', 'good', 'brocolli', 'mother']


Stemming

In [14]:
from nltk.stem.porter import PorterStemmer

# Create p_stemmer of class PorterStemmer
p_stemmer = PorterStemmer()

In [23]:
# stem token
texts = [p_stemmer.stem(i) for i in stopped_tokens]

print(texts)

['brocolli', 'good', 'eat', 'brother', 'like', 'eat', 'good', 'brocolli', 'mother']


Constructing a document-term matrix

In [25]:
from gensim import corpora, models

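# Dictionary expects a list of tokenized documents (a list of lists of tokens).
# Passing the flat token list `texts` directly raises:
#   TypeError: doc2bow expects an array of unicode tokens on input, not a single string
# so wrap the single document in a list.
dictionary = corpora.Dictionary([texts])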

In [9]:
from nltk.tokenize import RegexpTokenizer
from stop_words import get_stop_words
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
import gensim

In [28]:
tokenizer = RegexpTokenizer(r'\w+')
# create English stop words list
en_stop = get_stop_words('en')
# create p_stemmer of class PorterStemmer
p_stemmer = PorterStemmer()
# create sample documents
# doc_a = "Brocolli is good to eat. My brother likes to eat good brocolli, but not my mother."
# doc_b = "My mother spends a lot of time driving my brother around to baseball practice."
# doc_c = "Some health experts suggest that driving may cause increased tension and blood pressure."
# doc_d = "‘Lihaaf’ is a text that challenges some of the key tenets of a certain kind of feminism.What are we to make of the Begum’s transformation into a sexual predator? Are we to see her de-formation as itself a response to her patriarchy domination by the Nawab and by her immediate hyper conservative milieu? Or would we say, since we want to called her agential that she’s a hero and villain of her own making.Begum Jan is more complex than a simple victim of patriarchy. Women back in 1900’s married in their teen’s and early twenties. They have too little, even no rights, and largely depend on their fathers, husbands or sons. In this short and simple yet bravely told story, Ismat Chughtai address an unspeakable issue to her audience – deeply religious and conservative Indians. ."
# doc_e = "Begum Jan is trapped in a disappointing marriage. She is neglected by her husband Nawab. Begum’s frustration and the events that took around her, the author captured them in a sympathetic, almost comical manner through the eyes of a child that does not completely understand what she’s witnessing. Begum, surrounded by attendees, supplies and luxurious, is lonely and yearn for great love and attentions. The funny things, the child witnessed of Begum and Rabbu’s relationship traumatised the child. Left alone by her husband, she takes charge of her life and navigates her way through the binding of the patriarchal setup to express her sexual urges and satiate them. She might be secluded in her husband’s household but she used the imposed seclusion to her advantage, she created a world for herself. Once in there, she is no longer at the mercy of her husband, she can unhesitatingly voice an ‘itch’ – on her entire existence revolved – and find the necessary means in Rabbu to tend it. And she does. Would the Begum has turned to Rabbu if her husband has been more attentive to her needs? Or perhaps if her situation was- despite knowing from a young age she was gay- she might knew telling the parents would cause a rift that might put insurmountable. Even if she was in a little bit more of modern and understanding family, most psychologists in the 19th and 20th centuries classified homosexuality as a form of mental illness. Many were subjected to psychiatric ‘treatment’ with the aim to cure their homosexuality. And eventually she had to face it. However that was not in her case, even if Begum ‘come clean’ before her parents, before her societies, it would be impossible. In Islam, as in many Christian orthodox denominations as in Orthodox Judaism, homosexuality is seen as a sin. ." 
# doc_f = "Most of all, for a woman ignored and victimized by patriarchy, and self-empowered, Begum Jan’s behaviour towards the protagonist in absence of Rabbu however was questionable. Reading the text merely as a feminist text has also led to our misplaced identification of who is the feminist in Chughtai’s story. The child narrator who can think of an egalitarian, open relationship with her brothers and common male friends and who even at her most terrified, gathers courage and speaks up. Resulting her mother to send her to Begum Jan and the zenana, that was supposed to empower her, punished her instead- silencing and pacifying her. No wonder it led to a great controversy back then, when Ismat Chughtai boldly bashing the social orders of deeply conservative societies where upbringing of a person was actually based on deep rooted conditionings."
# sample document (a multi-paragraph essay stored as a single string)
doc_a = """‘Lihaaf’ is a text that challenges some of the key tenets of a certain kind of feminism. 
What are we to make of the Begum’s transformation into a sexual predator? Are we to see her ‘’de-formation’’ as itself a response to her patriarchy domination by the Nawab and by her immediate hyper conservative milieu? Or would we say, since we want to called her agential that she’s a hero and villain of her own making. 
Begum Jan is more complex than a simple victim of patriarchy. 
Women back in 1900’s married in their teen’s and early twenties. They have too little, even no rights, and largely depend on their fathers, husbands or sons. 
In this short and simple yet bravely told story, Ismat Chughtai address an unspeakable issue to her audience – deeply religious and conservative Indians. 
Begum Jan is trapped in a disappointing marriage. She is neglected by her husband Nawab. Begum’s frustration and the events that took around her, the author captured them in a sympathetic, almost comical manner through the eyes of a child that does not completely understand what she’s witnessing. Begum, surrounded by attendees, supplies and luxurious, is lonely and yearn for great love and attentions. The funny things, the child witnessed of Begum and Rabbu’s relationship traumatised the child. 
Left alone by her husband, she takes charge of her life and navigates her way through the binding of the patriarchal setup to express her sexual urges and satiate them. She might be secluded in her husband’s household but she used the imposed seclusion to her advantage, she created a world for herself. Once in there, she is no longer at the mercy of her husband, she can unhesitatingly voice an ‘itch’ – on her entire existence revolved – and find the necessary means in Rabbu to tend it. And she does. 
Would the Begum has turned to Rabbu if her husband has been more attentive to her needs? 
Or perhaps if her situation was- despite knowing from a young age she was gay- she might knew telling the parents would cause a rift that might put insurmountable. Even if she was in a little bit more of modern and understanding family, most psychologists in the 19th and 20th centuries classified homosexuality as a form of mental illness. Many were subjected to psychiatric ‘treatment’ with the aim to cure their homosexuality. And eventually she had to face it. However that was not in her case, even if Begum ‘come clean’ before her parents, before her societies, it would be impossible. 
In Islam, as in many Christian orthodox denominations as in Orthodox Judaism, homosexuality is seen as a sin."""
 
# compile sample documents into a list
# doc_set = [doc_a, doc_b, doc_c, doc_d, doc_e, doc_f]
doc_set = [doc_a]

# list of tokenized documents, filled in by the loop below
texts = []

for doc in doc_set:
    # clean and tokenize the document string
    raw = doc.lower()
    tokens = tokenizer.tokenize(raw)
    print(tokens)

    # remove stop words from tokens
    stopped_tokens = [t for t in tokens if t not in en_stop]

    # stem tokens
    stemmed_tokens = [p_stemmer.stem(t) for t in stopped_tokens]

    texts.append(stemmed_tokens)

# turn the tokenized documents into an id -> term dictionary
dictionary = corpora.Dictionary(texts)
# print(dictionary)

# convert the tokenized documents into a document-term matrix
corpus = [dictionary.doc2bow(text) for text in texts]

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=20)

['lihaaf', 'is', 'a', 'text', 'that', 'challenges', 'some', 'of', 'the', 'key', 'tenets', 'of', 'a', 'certain', 'kind', 'of', 'feminism', 'what', 'are', 'we', 'to', 'make', 'of', 'the', 'begum', 's', 'transformation', 'into', 'a', 'sexual', 'predator', 'are', 'we', 'to', 'see', 'her', 'de', 'formation', 'as', 'itself', 'a', 'response', 'to', 'her', 'patriarchy', 'domination', 'by', 'the', 'nawab', 'and', 'by', 'her', 'immediate', 'hyper', 'conservative', 'milieu', 'or', 'would', 'we', 'say', 'since', 'we', 'want', 'to', 'called', 'her', 'agential', 'that', 'she', 's', 'a', 'hero', 'and', 'villain', 'of', 'her', 'own', 'making', 'begum', 'jan', 'is', 'more', 'complex', 'than', 'a', 'simple', 'victim', 'of', 'patriarchy', 'women', 'back', 'in', '1900', 's', 'married', 'in', 'their', 'teen', 's', 'and', 'early', 'twenties', 'they', 'have', 'too', 'little', 'even', 'no', 'rights', 'and', 'largely', 'depend', 'on', 'their', 'fathers', 'husbands', 'or', 'sons', 'in', 'this', 'short', 'and', 

In [31]:
print(ldamodel.print_topics(num_topics=1, num_words=4))

[(1, '0.006*"begum" + 0.006*"s" + 0.006*"husband" + 0.006*"rabbu"')]


# Method II


In [22]:
import pandas as pd

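# Adjust the path to wherever the CSV file is stored on your machine.
# on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False.
data = pd.read_csv("C:/Users/MikeEd/Downloads/Compressed/abcnews-date-text.csv", on_bad_lines='skip')
# .copy() avoids a SettingWithCopyWarning when the 'index' column is added
data_text = data[['headline_text']].copy()
data_text['index'] = data_text.index
documents = data_text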

Take a peek at the data

In [23]:
print(len(documents))
print(documents[:5])

1186018
                                       headline_text  index
0  aba decides against community broadcasting lic...      0
1     act fire witnesses must be aware of defamation      1
2     a g calls for infrastructure protection summit      2
3           air nz staff in aust strike for pay rise      3
4      air nz strike to affect australian travellers      4


Data Pre-processing

In [24]:
import gensim
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
from nltk.stem import WordNetLemmatizer, SnowballStemmer
import numpy as np
import nltk

np.random.seed(2018)  # fix the seed for reproducibility
stemmer = SnowballStemmer('english')

# WordNet data is required by WordNetLemmatizer
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\MikeEd\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

Lemmatize and stem preprocessing function

In [25]:
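def lemmatize_stemming(text):
    """Lemmatize a token as a verb, then stem it with the Snowball stemmer."""
    return stemmer.stem(WordNetLemmatizer().lemmatize(text, pos='v'))


def preprocess(text):
    """Tokenize text, drop stop words and tokens of 3 or fewer characters, then lemmatize and stem."""
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
            result.append(lemmatize_stemming(token))
    return result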

Select a document to preview after preprocessing

In [26]:
doc_sample = documents[documents['index'] == 4310].values[0][0]
print('original document: ')
words = doc_sample.split(' ')
print(words)
print('\n\n tokenized and lemmatized document: ')
print(preprocess(doc_sample))

original document: 
['ratepayers', 'group', 'wants', 'compulsory', 'local', 'govt', 'voting']


 tokenized and lemmatized document: 
['ratepay', 'group', 'want', 'compulsori', 'local', 'govt', 'vote']


In [27]:
processed_docs = documents['headline_text'].map(preprocess)
processed_docs[:10]

0            [decid, communiti, broadcast, licenc]
1                               [wit, awar, defam]
2           [call, infrastructur, protect, summit]
3                      [staff, aust, strike, rise]
4             [strike, affect, australian, travel]
5               [ambiti, olsson, win, tripl, jump]
6           [antic, delight, record, break, barca]
7    [aussi, qualifi, stosur, wast, memphi, match]
8            [aust, address, secur, council, iraq]
9                         [australia, lock, timet]
Name: headline_text, dtype: object

Bag of Words on the Data set

Create a dictionary from ‘processed_docs’ that maps each word in the training set to an integer id and records how often it appears.

In [29]:
help(gensim.corpora.Dictionary())

Help on Dictionary in module gensim.corpora.dictionary object:

class Dictionary(gensim.utils.SaveLoad, collections.abc.Mapping)
 |  Dictionary encapsulates the mapping between normalized words and their integer ids.
 |  
 |  Notable instance attributes:
 |  
 |  Attributes
 |  ----------
 |  token2id : dict of (str, int)
 |      token -> tokenId.
 |  id2token : dict of (int, str)
 |      Reverse mapping for token2id, initialized in a lazy manner to save memory (not created until needed).
 |  cfs : dict of (int, int)
 |      Collection frequencies: token_id -> how many instances of this token are contained in the documents.
 |  dfs : dict of (int, int)
 |      Document frequencies: token_id -> how many documents contain this token.
 |  num_docs : int
 |      Number of documents processed.
 |  num_pos : int
 |      Total number of corpus positions (number of processed words).
 |  num_nnz : int
 |      Total number of non-zeroes in the BOW matrix (sum of the number of unique
 |      word

In [30]:
dictionary = gensim.corpora.Dictionary(processed_docs)

# preview the first 11 (id, token) pairs
for count, (k, v) in enumerate(dictionary.items()):
    print(k, v)
    if count >= 10:
        break

0 broadcast
1 communiti
2 decid
3 licenc
4 awar
5 defam
6 wit
7 call
8 infrastructur
9 protect
10 summit


In [31]:
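# Drop tokens that appear in fewer than 15 documents or in more than 50% of
# all documents, then keep at most the 100000 most frequent remaining tokens.
dictionary.filter_extremes(no_below=15, no_above=0.5, keep_n=100000)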

Gensim doc2bow
For each document, use doc2bow to build a list of (token id, count) pairs reporting which words appear and how many times. Save the result to ‘bow_corpus’, then check the sample document selected earlier.

In [32]:
bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs]
bow_corpus[4310]

[(162, 1), (240, 1), (292, 1), (589, 1), (838, 1), (3567, 1), (3568, 1)]

In [33]:
bow_doc_4310 = bow_corpus[4310]
for token_id, token_count in bow_doc_4310:
    print("Word {} (\"{}\") appears {} time.".format(token_id, dictionary[token_id], token_count))

Word 162 ("govt") appears 1 time.
Word 240 ("group") appears 1 time.
Word 292 ("vote") appears 1 time.
Word 589 ("local") appears 1 time.
Word 838 ("want") appears 1 time.
Word 3567 ("compulsori") appears 1 time.
Word 3568 ("ratepay") appears 1 time.
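To make the (token id, count) pairs less opaque, here is a rough pure-Python sketch of what a bag-of-words conversion does. The `token2id` mapping and the helper `to_bow` are hypothetical stand-ins; a real gensim `Dictionary` assigns its own ids, so the numbers will not match the output above.

```python
from collections import Counter

# Stemmed tokens of the sample headline (document 4310) shown in this walkthrough.
tokens = ['ratepay', 'group', 'want', 'compulsori', 'local', 'govt', 'vote']

# Hypothetical token -> id mapping (alphabetical); gensim assigns different ids.
token2id = {token: idx for idx, token in enumerate(sorted(set(tokens)))}


def to_bow(doc_tokens, token2id):
    """Return sorted (token_id, count) pairs, skipping tokens not in the mapping."""
    counts = Counter(token2id[t] for t in doc_tokens if t in token2id)
    return sorted(counts.items())


# Repeat 'vote' and add an unknown word to show counting and filtering.
bow = to_bow(tokens + ['vote', 'unknownword'], token2id)
print(bow)
```

Word order is discarded in this representation: only which tokens occur, and how often, is kept.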


# TF-IDF
Create a TF-IDF model object by calling models.TfidfModel on ‘bow_corpus’ and save it to ‘tfidf’. Then apply the transformation to the entire corpus and call the result ‘corpus_tfidf’. Finally, preview the TF-IDF scores for the first document.

In [34]:
from pprint import pprint

from gensim import corpora, models

tfidf = models.TfidfModel(bow_corpus)
corpus_tfidf = tfidf[bow_corpus]

# preview the TF-IDF scores of the first document
for doc in corpus_tfidf:
    pprint(doc)
    break

[(0, 0.5850076620505259),
 (1, 0.38947256567331934),
 (2, 0.4997099083387053),
 (3, 0.5063271308533074)]
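For intuition about where such scores come from, the sketch below computes TF-IDF weights by hand on a toy corpus. It assumes the weighting count * log2(N / document frequency) followed by scaling each document to unit length, which is my understanding of gensim's defaults rather than something verified here; the toy corpus and the helper `tfidf_vector` are made up for illustration.

```python
import math

# Toy bag-of-words corpus: each document is a list of (token_id, count) pairs.
toy_corpus = [
    [(0, 1), (1, 2)],
    [(0, 1), (2, 1)],
    [(1, 1), (2, 3)],
]


def tfidf_vector(doc, corpus):
    """Weight each term by count * log2(N / df), then scale to unit length."""
    num_docs = len(corpus)
    # Document frequency: number of documents that contain each token id.
    doc_freq = {}
    for other in corpus:
        for token_id, _ in other:
            doc_freq[token_id] = doc_freq.get(token_id, 0) + 1
    weights = [(token_id, count * math.log2(num_docs / doc_freq[token_id]))
               for token_id, count in doc]
    norm = math.sqrt(sum(w * w for _, w in weights))
    if norm == 0:
        return []  # every term occurs in every document
    return [(token_id, w / norm) for token_id, w in weights]


vec = tfidf_vector(toy_corpus[0], toy_corpus)
print(vec)
```

The squared scores of a document sum to 1 under this scheme, which is also true of the four scores printed above.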


#### Running LDA using Bag of Words
Train our LDA model using gensim.models.LdaMulticore and save it to ‘lda_model’.

In [36]:
lda_model = gensim.models.LdaMulticore(bow_corpus, num_topics=5, id2word=dictionary, passes=2, workers=2)

In [37]:
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

Topic: 0 
Words: 0.009*"health" + 0.008*"brisban" + 0.008*"rural" + 0.007*"tasmanian" + 0.007*"tasmania" + 0.007*"nation" + 0.007*"warn" + 0.006*"report" + 0.006*"communiti" + 0.006*"sydney"
Topic: 1 
Words: 0.015*"queensland" + 0.012*"market" + 0.012*"donald" + 0.011*"news" + 0.009*"coast" + 0.009*"dead" + 0.008*"miss" + 0.008*"south" + 0.008*"bushfir" + 0.007*"rise"
Topic: 2 
Words: 0.030*"australia" + 0.024*"australian" + 0.016*"elect" + 0.014*"world" + 0.009*"test" + 0.008*"final" + 0.007*"open" + 0.006*"farm" + 0.006*"win" + 0.005*"year"
Topic: 3 
Words: 0.026*"polic" + 0.015*"charg" + 0.013*"court" + 0.013*"death" + 0.012*"murder" + 0.010*"crash" + 0.010*"woman" + 0.009*"face" + 0.009*"die" + 0.009*"alleg"
Topic: 4 
Words: 0.019*"trump" + 0.013*"say" + 0.012*"govern" + 0.010*"chang" + 0.008*"school" + 0.008*"live" + 0.006*"countri" + 0.006*"fund" + 0.006*"plan" + 0.006*"power"


#### Running LDA using TF-IDF

In [39]:
lda_model_tfidf = gensim.models.LdaMulticore(corpus_tfidf, num_topics=10, id2word=dictionary, passes=2, workers=4)
for idx, topic in lda_model_tfidf.print_topics(-1):
    print('Topic: {} Word: {}'.format(idx, topic))

Topic: 0 Word: 0.010*"govern" + 0.008*"monday" + 0.007*"michael" + 0.007*"royal" + 0.007*"david" + 0.006*"commiss" + 0.006*"sport" + 0.006*"budget" + 0.005*"histori" + 0.005*"abbott"
Topic: 1 Word: 0.019*"live" + 0.008*"cattl" + 0.008*"week" + 0.008*"search" + 0.007*"septemb" + 0.007*"miss" + 0.007*"june" + 0.006*"plane" + 0.006*"cancer" + 0.006*"anim"
Topic: 2 Word: 0.026*"trump" + 0.020*"news" + 0.017*"market" + 0.016*"rural" + 0.008*"share" + 0.007*"nation" + 0.007*"price" + 0.007*"rise" + 0.007*"dollar" + 0.007*"australian"
Topic: 3 Word: 0.010*"hobart" + 0.009*"scott" + 0.008*"sexual" + 0.008*"street" + 0.007*"island" + 0.006*"tree" + 0.006*"islam" + 0.006*"right" + 0.006*"human" + 0.006*"music"
Topic: 4 Word: 0.015*"interview" + 0.008*"morrison" + 0.006*"novemb" + 0.006*"marriag" + 0.006*"syria" + 0.006*"extend" + 0.006*"kill" + 0.005*"facebook" + 0.005*"hong" + 0.005*"kong"
Topic: 5 Word: 0.018*"south" + 0.015*"north" + 0.011*"west" + 0.010*"coast" + 0.009*"queensland" + 0.009*"

#### Classification of the topics
Performance evaluation by classifying a sample document using the LDA Bag of Words model.

In [40]:
processed_docs[4310]

['ratepay', 'group', 'want', 'compulsori', 'local', 'govt', 'vote']

In [42]:
for index, score in sorted(lda_model[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model.print_topic(index, 10)))


Score: 0.6430763602256775	 
Topic: 0.019*"trump" + 0.013*"say" + 0.012*"govern" + 0.010*"chang" + 0.008*"school" + 0.008*"live" + 0.006*"countri" + 0.006*"fund" + 0.006*"plan" + 0.006*"power"

Score: 0.15764841437339783	 
Topic: 0.030*"australia" + 0.024*"australian" + 0.016*"elect" + 0.014*"world" + 0.009*"test" + 0.008*"final" + 0.007*"open" + 0.006*"farm" + 0.006*"win" + 0.005*"year"

Score: 0.14912059903144836	 
Topic: 0.009*"health" + 0.008*"brisban" + 0.008*"rural" + 0.007*"tasmanian" + 0.007*"tasmania" + 0.007*"nation" + 0.007*"warn" + 0.006*"report" + 0.006*"communiti" + 0.006*"sydney"

Score: 0.025121551007032394	 
Topic: 0.015*"queensland" + 0.012*"market" + 0.012*"donald" + 0.011*"news" + 0.009*"coast" + 0.009*"dead" + 0.008*"miss" + 0.008*"south" + 0.008*"bushfir" + 0.007*"rise"

Score: 0.025033066049218178	 
Topic: 0.026*"polic" + 0.015*"charg" + 0.013*"court" + 0.013*"death" + 0.012*"murder" + 0.010*"crash" + 0.010*"woman" + 0.009*"face" + 0.009*"die" + 0.009*"alleg"
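If only the single best-matching topic is needed, the sorted loop can be reduced to a small helper. The sketch below works on plain (topic id, score) pairs so that it runs without a trained model; the scores are copied from the output above, the topic ids are inferred by matching the printed topic strings against the earlier topic listing, and `dominant_topic` is a hypothetical helper name.

```python
# (topic_id, score) pairs for document 4310, copied from the output above.
topic_scores = [
    (0, 0.14912059903144836),
    (1, 0.025121551007032394),
    (2, 0.15764841437339783),
    (3, 0.025033066049218178),
    (4, 0.6430763602256775),
]


def dominant_topic(topic_scores):
    """Return the (topic_id, score) pair with the highest score, or None if empty."""
    if not topic_scores:
        return None
    return max(topic_scores, key=lambda pair: pair[1])


best = dominant_topic(topic_scores)
print("Dominant topic: {} (score {:.3f})".format(best[0], best[1]))
```

With a trained model, the same helper could be applied to the pairs returned by `lda_model[bow_corpus[4310]]`.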


#### Performance evaluation by classifying sample document using LDA TF-IDF model

In [43]:
for index, score in sorted(lda_model_tfidf[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model_tfidf.print_topic(index, 10)))


Score: 0.8874712586402893	 
Topic: 0.010*"govern" + 0.008*"monday" + 0.007*"michael" + 0.007*"royal" + 0.007*"david" + 0.006*"commiss" + 0.006*"sport" + 0.006*"budget" + 0.005*"histori" + 0.005*"abbott"

Score: 0.012506821192800999	 
Topic: 0.014*"elect" + 0.008*"labor" + 0.007*"liber" + 0.006*"parti" + 0.006*"say" + 0.006*"senat" + 0.006*"govern" + 0.006*"andrew" + 0.005*"christma" + 0.005*"jam"

Score: 0.012503585778176785	 
Topic: 0.017*"countri" + 0.013*"donald" + 0.012*"hour" + 0.009*"stori" + 0.007*"wednesday" + 0.007*"care" + 0.006*"climat" + 0.006*"chang" + 0.005*"farmer" + 0.005*"water"

Score: 0.012503149919211864	 
Topic: 0.026*"trump" + 0.020*"news" + 0.017*"market" + 0.016*"rural" + 0.008*"share" + 0.007*"nation" + 0.007*"price" + 0.007*"rise" + 0.007*"dollar" + 0.007*"australian"

Score: 0.012503040954470634	 
Topic: 0.010*"hobart" + 0.009*"scott" + 0.008*"sexual" + 0.008*"street" + 0.007*"island" + 0.006*"tree" + 0.006*"islam" + 0.006*"right" + 0.006*"human" + 0.006*"mu

In [44]:
unseen_document = 'How a Pentagon deal became an identity crisis for Google'
bow_vector = dictionary.doc2bow(preprocess(unseen_document))

for index, score in sorted(lda_model[bow_vector], key=lambda tup: -1*tup[1]):
    print("Score: {}\t Topic: {}".format(score, lda_model.print_topic(index, 5)))

Score: 0.36694255471229553	 Topic: 0.009*"health" + 0.008*"brisban" + 0.008*"rural" + 0.007*"tasmanian" + 0.007*"tasmania"
Score: 0.2000488042831421	 Topic: 0.026*"polic" + 0.015*"charg" + 0.013*"court" + 0.013*"death" + 0.012*"murder"
Score: 0.19951216876506805	 Topic: 0.019*"trump" + 0.013*"say" + 0.012*"govern" + 0.010*"chang" + 0.008*"school"
Score: 0.1994728296995163	 Topic: 0.015*"queensland" + 0.012*"market" + 0.012*"donald" + 0.011*"news" + 0.009*"coast"
Score: 0.03402366116642952	 Topic: 0.030*"australia" + 0.024*"australian" + 0.016*"elect" + 0.014*"world" + 0.009*"test"
