In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

In [2]:
df = pd.read_csv('moview_reviews.txt')

In [3]:
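# Drop English stop words, ignore words that occur in more than 10% of the reviews,
# and keep only the 5,000 most frequent remaining words
count = CountVectorizer(stop_words='english', max_df=.1, max_features=5000)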

In [4]:
X = count.fit_transform(df['review'].values)

In [5]:
lda = LatentDirichletAllocation(n_components=10, random_state=123, learning_method='batch')

In [18]:
X_topics = lda.fit_transform(X)

In [9]:
n_top_words = 5

In [12]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = count.get_feature_names_out()

In [20]:
for topic_idx, topic in enumerate(lda.components_):
    print("Topic %d:" % (topic_idx + 1))
    print(" ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]]))

Topic 1:
worst minutes script awful stupid
Topic 2:
family mother father children girl
Topic 3:
war american dvd history german
Topic 4:
human audience cinema art feel
Topic 5:
police dead murder car guy
Topic 6:
horror house sex girl woman
Topic 7:
role performance comedy actor performances
Topic 8:
series episode episodes tv season
Topic 9:
book version original effects read
Topic 10:
action guy fight guys hero
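
The slice `topic.argsort()[:-n_top_words - 1:-1]` used in the loop above can be hard to read. The following minimal sketch uses a made-up weight vector (not values from the fitted `lda.components_`) to show that the slice returns the indices of the largest weights in descending order:

```python
import numpy as np

# Hypothetical word weights for one topic (illustrative values only)
topic = np.array([0.2, 3.1, 0.7, 5.4, 1.9, 0.1])
n_top_words = 3

# argsort() orders indices from smallest to largest weight;
# the slice walks backward from the end and stops after n_top_words items
top_idx = topic.argsort()[:-n_top_words - 1:-1]
print(top_idx)

# Equivalent form that may be easier to read: reverse, then take the first n
alt_idx = topic.argsort()[::-1][:n_top_words]
print(alt_idx)
```

Both expressions select the same indices, so either could be used to look up the top words in `feature_names`.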


### Based on the five most important words for each topic, we may guess that LDA identified the following topics:
    1. Generally bad movies (not really a topic category)
    2. Movies about families
    3. War movies
    4. Art movies
    5. Crime movies
    6. Horror movies
    7. Comedy movies
    8. Movies somehow related to TV shows
    9. Movies based on books
    10. Action movies

#### To confirm that the categories make sense based on the reviews, let's print the reviews of three movies from the horror movie category (horror movies belong to category 6, at index position 5):

In [21]:
horror = X_topics[:, 5].argsort()[::-1]
for iter_idx, movie_idx in enumerate(horror[:3]):
    print('\nHorror movie #%d:' % (iter_idx + 1))
    print(df['review'][movie_idx][:300], '...')


Horror movie #1:
<br /><br />Horror movie time, Japanese style. Uzumaki/Spiral was a total freakfest from start to finish. A fun freakfest at that, but at times it was a tad too reliant on kitsch rather than the horror. The story is difficult to summarize succinctly: a carefree, normal teenage girl starts coming fac ...

Horror movie #2:
Before I talk about the ending of this film I will talk about the plot. Some dude named Gerald breaks his engagement to Kitty and runs off to Craven Castle in Scotland. After several months Kitty and her aunt venture off to Scottland. Arriving at Craven Castle Kitty finds that Gerald has aged and he ...

Horror movie #3:
This film marked the end of the "serious" Universal Monsters era (Abbott and Costello meet up with the monsters later in "Abbott and Costello Meet Frankentstein"). It was a somewhat desparate, yet fun attempt to revive the classic monsters of the Wolf Man, Frankenstein's monster, and Dracula one "la ...


We printed the first 300 characters of the top three horror movie reviews. Even though
we don't know which exact movie each review belongs to, they read like reviews of
horror movies (although one might argue that Horror movie #2 could also be a good fit
for topic category 1: Generally bad movies).
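
One way to probe this kind of overlap is to look at a review's full topic distribution rather than only its rank within a single topic. The sketch below uses a small made-up document-topic matrix as a stand-in for `X_topics`; the name `doc_topics`, the values, and the three-topic layout are assumptions for illustration only.

```python
import numpy as np

# Hypothetical document-topic matrix: 4 reviews x 3 topics
# (a stand-in for X_topics; each row sums to 1)
doc_topics = np.array([
    [0.70, 0.18, 0.12],
    [0.45, 0.05, 0.50],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])

# Dominant topic for each review (0-based column index)
dominant = doc_topics.argmax(axis=1)
print(dominant)

# Rank reviews by their weight on topic index 2, highest first
ranked = doc_topics[:, 2].argsort()[::-1]
print(ranked)

# For the top-ranked reviews, show the two strongest topics to spot mixed membership
for doc_idx in ranked[:2]:
    top_two = doc_topics[doc_idx].argsort()[::-1][:2]
    print(doc_idx, top_two, doc_topics[doc_idx, top_two])
```

Applied to the real `X_topics`, a highly ranked horror review whose second-strongest topic is index 0 would support the reading that it also fits the "Generally bad movies" category.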