# Visualizing Text Data

<img src="https://i.imgur.com/H2QBPTl.png" width=700 align="center">

In [1]:
# pip install spacy
import spacy
import pandas as pd 

Tokenization

In [2]:
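# Download the model once from the Anaconda prompt: python -m spacy download en_core_web_sm
# On the author's machine the en_core_web_sm folder also had to be copied from
# C:\Users\delim\AppData\Local\Continuum\Anaconda3\Lib\site-packages
# to C:\Users\delim\AppData\Local\Continuum\Anaconda3\Anaconda3\Lib\site-packages\spacy\data\
spacy.load('en_core_web_sm')  # only checks that the model loads; the result is not used below
from spacy.lang.en import English
parser = English()  # blank English pipeline, used here for tokenization only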
def tokenize(text):
    lda_tokens = []
    tokens = parser(text)
    for token in tokens:
        if token.orth_.isspace():
            continue
        elif token.like_url:
            lda_tokens.append('URL')
        elif token.orth_.startswith('@'):
            lda_tokens.append('SCREEN_NAME')
        else:
            lda_tokens.append(token.lower_)
    return lda_tokens

Lemmatization

In [3]:
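import nltk
# nltk.download('wordnet')  # run once if the WordNet corpus is missing
from nltk.corpus import wordnet as wn
def get_lemma(word):
    # wn.morphy returns None when WordNet has no base form; fall back to the word itself
    lemma = wn.morphy(word)
    if lemma is None:
        return word
    else:
        return lemma

from nltk.stem.wordnet import WordNetLemmatizer
def get_lemma2(word):
    # alternative lemmatizer; not used by the preprocessing function below
    return WordNetLemmatizer().lemmatize(word)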

Stopwords 

In [4]:
nltk.download('stopwords')
en_stop = set(nltk.corpus.stopwords.words('english'))

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\delim\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Put them all together -> preprocessing function

In [5]:
def prepare_text_for_lda(text):
    tokens = tokenize(text)
    tokens = [token for token in tokens if len(token) > 4]
    tokens = [token for token in tokens if token not in en_stop]
    tokens = [get_lemma(token) for token in tokens]
    return tokens
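The order of the steps matters here: the length filter runs on the raw token, before lemmatization. That would explain why short lemmas such as 'star' and 'try' can still show up in the preprocessed output even though tokens of four characters or fewer are dropped. The sketch below illustrates this in plain Python. `TOY_STOPWORDS`, `TOY_LEMMAS` and `toy_prepare` are hypothetical stand-ins for the NLTK stop list, WordNet and `prepare_text_for_lda`, and a whitespace split replaces the spaCy tokenizer.

```python
# Toy stand-ins (assumptions for illustration, not the NLTK/spaCy resources used above).
TOY_STOPWORDS = {"because", "there", "would"}
TOY_LEMMAS = {"stars": "star", "trying": "try", "books": "book"}

def toy_prepare(text):
    tokens = text.lower().split()                            # crude whitespace tokenizer
    tokens = [t for t in tokens if len(t) > 4]               # 1. length filter on raw tokens
    tokens = [t for t in tokens if t not in TOY_STOPWORDS]   # 2. stop-word filter
    return [TOY_LEMMAS.get(t, t) for t in tokens]            # 3. lemmatize what is left

print(toy_prepare("I gave four stars because trying new books is fun"))
# prints ['star', 'try', 'book']
```

Here "four" is dropped by the length filter, while "stars" passes it and is only shortened to "star" afterwards.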

Open csv file

In [6]:
reviews_df = pd.read_csv('all_amazon_reviews.csv')

Read Content

In [7]:
reviews_df = reviews_df['content']
reviews_df.head()

0    I gave this book four stars because I did legi...
1    I don't understand these 1-2 star ratings, I r...
2    I think I would have enjoyed this more as a no...
3    If you liked the harry potter series, you will...
4    It wasn't as good as Rowling's novels, and the...
Name: content, dtype: object

Print first 10 reviews and count the total no. of reviews 

In [8]:
num_lines = 0
for line in reviews_df:
    num_lines = num_lines + 1
    if num_lines <= 10:  # print the first 10 reviews
        print(line)
print("total number of lines=", num_lines)

I gave this book four stars because I did legitimately enjoy reading it, and I would recommend it to other Harry Potter fans (and maybe some non-fans!) to read as well. I won't compare this to the original series, because that'd be like trying to compare anything with perfection in my world. There were some things that did bother me about this book though, which I'll get to in a bit.

I'll start with what I liked:
It has been nine years since the last Harry Potter book came out, in which I stood in line dressed as Luna Lovegood nervously avoiding everything in fear of spoilers, and as a die-hard fan I loved the excuse to return to a world I almost feel that I grew up in. I liked reading this play because I've always been eager for more Harry, and this gave me more Harry.
I actually thought it worked well as a play, too. Plays/scripts will by virtue of their format not be able to offer as much detail and description as a book might, especially when reading a script vs seeing it as it wa

Preprocess each review for LDA, then add a random sample of roughly 21% of the reviews to a list

In [9]:
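import random
text_data = []
for line in reviews_df:
    if not isinstance(line, str):  # missing reviews are read as float NaN and cannot be tokenized
        continue
    tokens = prepare_text_for_lda(line)
    if random.random() > .79:  # keep roughly 21% of the reviews
        print(tokens)
        text_data.append(tokens)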

['star', 'legitimately', 'enjoy', 'reading', 'would', 'recommend', 'harry', 'potter', 'maybe', 'compare', 'original', 'series', 'try', 'compare', 'anything', 'perfection', 'world', 'things', 'bother', 'though', 'start', 'like', 'years', 'since', 'harry', 'potter', 'stand', 'dress', 'lovegood', 'nervously', 'avoid', 'everything', 'spoiler', 'love', 'excuse', 'return', 'world', 'almost', 'like', 'reading', 'always', 'eager', 'harry', 'harry', 'actually', 'thought', 'work', 'play', 'script', 'virtue', 'format', 'offer', 'detail', 'description', 'might', 'especially', 'reading', 'script', 'seeing', 'mean', 'given', 'expect', 'appreciate', 'description', 'include', 'enough', 'smile', 'along', 'favorite', 'character', 'scorpius', 'definitely', 'cry', 'things', 'grain', 'things', 'thing', 'overview', 'anyway', 'point', 'given', 'story', 'previous', 'canon', 'detail', 'spoiler', 'trolley', 'witch', 'certain', 'office', 'seem', 'major', 'relationship', 'seem', 'fetch', 'consider', 'reality', 'u

['script', 'novel', 'would', 'better', 'however', 'script', 'enough', 'story', 'harry', 'potter', 'universe']
['enjoy', 'original', 'book', 'something', 'unbridle', 'expectation', 'original', 'include', 'world', 'description', 'satisfy', 'details', 'hungry', 'teenage', 'angst', 'relevant', 'topic', 'child', 'famous', 'parent', 'little', 'swallow', 'suggest', 'temper', 'hope']
['really', 'harry', 'potter', 'sequel', 'adult', 'version', 'harry', 'potter', 'hole', 'story', 'overall', 'would', 'recommend', 'hope', 'though']
['sitting', 'thoroughly', 'enjoy', 'engage', 'story', 'like', 'albus', 'potter', 'enough', 'depth', 'character', 'warrant', 'another', 'seven', 'series', 'think', 'rough', 'edge', 'reading', 'script', 'would', 'favorite', 'almost', 'everything', 'could', 'reading', 'script', 'choppy', 'without', 'visuals', 'music', 'character', 'scope', 'background', 'information', 'explore', 'feelings', 'usual', 'method', 'utilize', 'novel', 'flesh', 'character', 'evoke', 'emotion', 's

['anything', 'write', 'harry', 'potter', 'world', 'bound', 'bring', 'nostalgia', 'harry', 'potter', 'curse', 'child', 'certainly', 'favorite', 'character', 'appearance', 'reunion', 'episode', 'might', 'watch', 'favorite', 'joke', 'line', 'reappear', 'reminisce', 'times', 'magic', 'original', 'miss', 'still', 'enjoy', 'story', 'love', 'revisit', 'world']
['think', 'feeling', 'better', 'stage', 'believe', 'rowling', 'chosen', 'write', 'would', 'getting', 'better', 'review', 'rowling', 'writing', 'allow', 'exist', 'around', 'original', 'seven', 'book', 'instead', 'everything', 'strip', 'actor', 'anything', 'rowling', 'could', 'still', 'create', 'excite', 'frighten', 'compel', 'encourage', 'worth', 'though', 'never', 'travel', 'london', 'actually', 'could', 'never', 'afford', 'rowling', 'short', 'spoiler', 'anyone', 'disappoint', 'homoerotic', 'undertone', 'scorpius', 'albus', 'develop']
['review', 'mention', 'format', 'letdown', 'nothing', 'letdown', 'buying', 'script', 'hesitant', 'first

['amaze', 'like', 'everything', 'recommend', 'reading', 'choose', 'rating', 'deserve']
['years', 'rush', 'friendship', 'develop', 'recommend', 'hairy', 'potter', 'closing', 'stretching', 'little', 'giving', 'little', 'information', 'could', 'book', 'instead', 'rushing']
['await', 'chapter', 'harry', 'potter', 'disappoint', 'interest', 'reading', 'little', 'format', 'love']
['harry', 'potter', 'universe', 'pretty', 'anything', 'stamp', 'reading', 'review', 'doubt', 'script', 'cearly', 'working', 'script', 'stage', 'direction', 'dialog', 'write', 'reunion', 'episode', 'favorite', 'sitcom', 'family', 'people', 'universe', 'years', 'future', 'cameo', 'dialog', 'better', 'event', 'years', 'flaw', 'harry', 'potter', 'casual']
['immediately', 'transport', 'wizard', 'world', 'travel', 'handle', 'hypothetical', 'series', 'stage', 'direction', 'worth', 'read']
['really', 'enjoy', 'partly', 'think', 'help', 'expectation', 'think', 'great', 'building', 'original', 'story', 'blending', 'original', 

['great', 'story', 'really', 'write']
['fanfic', 'script', 'enjoy', 'fanfic', 'enjoy', 'reading', 'course', 'expect', 'original', 'harry', 'potter', 'book', 'probably', 'disappoint', 'hogwarts', 'reunite', 'character']
['wonderful', 'book!!1']
['surprise', 'developing', 'relationship', 'grip', 'story', 'harry', 'ginny', 'hermione', 'draco', 'parent']
['star', '......', 'enjoy', 'storyline', 'dialog', 'albus', 'scorpious', 'unrealistic', 'complaint', 'albus', 'whiney', 'teenage', 'angst', 'people', 'remember', 'harry', 'order', 'phoenix', 'could', 'barely', 'stand', 'character', 'remain', 'turn', 'brainless', 'buffoon', 'rowling', 'would', 'surprise', 'story', 'premise', 'approval', 'script', 'take', 'adjust', 'potterheads']
['story', 'accurately', 'refer', 'earlier', 'book', 'thickening', 'bringing', 'excite', 'found', 'memory', 'delightful']
['different', 'reading', 'script', 'instead', 'story', 'write', 'enjoy', 'harry', 'potter', 'series']
[]
['take', 'character', 'pull', 'immediate

['purchase', 'really', 'enjoy']
['reading', 'childhood', 'always', 'wonder', 'parent', 'really', 'family', 'would', 'longer', 'really', 'great', 'screenplay']
['start']
['daughter', 'want', 'cheap', 'price']
['granddaughter', 'enjoy']
['bring', 'teenage', 'years', 'original', 'amaze', 'great', 'hero', 'years', 'character', 'absolutely', 'amaze']
['love', 'better', 'thought', 'would']
['story', 'compare', 'original', 'series', 'little', 'lack', 'write']
['great', 'product', 'thank']
['great', 'need', 'longer', 'going', 'story']
['love', 'classic', 'rowling', 'screenplay', 'format']
['folks', 'struggle', 'parent', 'folks', 'struggle', 'parent', 'never', 'example', 'parent', 'darkness', 'fate', 'change']
['great', 'twist', 'spend', 'entire', 'reading', 'mental', 'movie', 'wishing', 'already', 'movie']
['rowling', 'series', 'definitely', 'enjoy', 'story', 'turn', 'reading', 'unnoticed', 'expert', 'prose', 'vibrant', 'imagery', 'character', 'development', 'could', 'stand', 'little', 'develo

['love']
['story', 'since', 'script', 'narrative', 'totally', 'different', 'book', 'harry', 'potter', 'series', 'however', 'harry', 'potter']
['really', 'enjoy', 'write', 'format', 'reading', 'style', 'writing', 'harry', 'friend', 'normally', 'child', 'write', 'completely', 'opposite', 'parent', 'great', 'reading', 'story', 'seeing', 'change', 'hermione', 'draco', 'parent', 'surprise', 'found', 'slack', 'jaw', 'turning', 'page', 'alternate', 'reality', 'reason', 'star', 'parts', 'become', 'tedious']
['love', 'reading']
['interest', 'would', 'probably', 'novel', 'would', 'detail', 'however', 'still']
['describe']
['write', 'excellent', 'story', 'except', 'harry', 'potter', 'story', 'rowling', 'milk', 'success', 'harry', 'potter', 'series', 'always', 'thought', 'harry', 'potter', 'harry', 'curse', 'child', 'impression']
['enjoy', 'harry', 'potter', 'series', 'disappoint', 'write', 'great', 'addition', 'story']
['another', 'great', 'rowling', 'another', 'chapter', 'harry', 'potter', 'seri

['great', 'could']
['review', 'claim', 'presentation', 'stilted', 'dimensional', 'guessing', 'reader', 'little', 'exposure', 'script', 'accustom', 'visualize', 'stage', 'action', 'found', 'limited', 'stage', 'direction', 'delightful', 'allow', 'imagine', 'creative', 'light', 'stage', 'interpret', 'another', 'complaint', 'hear', 'harry', 'action', 'character', 'seeing', 'harry', 'father', 'first', 'since', 'accompany', 'young', 'albus', 'crossing', 'epilogue', 'deathly', 'hallow', 'although', 'grow', 'child', 'harry', 'clumsy', 'fatherhood', 'relationship', 'something', 'believe', 'parent', 'relate', 'times']
['always', 'thought', 'really', 'clever', 'update', 'promo', 'stage', 'right']
['great', 'arrive', 'damage', 'paperback', 'cover', 'shipping']
['buy', 'yesterday', 'finish', 'today', 'excitement', 'character', 'wonderful', 'highly', 'recommend', 'harry', 'potter']
['worth', 'suspense', 'encourage']
['happy', 'harry', 'potter', 'recommend', 'enjoy', 'series']
['love', 'great', 'mess

['great']
['story', 'confuse', 'parts', 'potter']
['workings']
['sequel', 'first']
['enjoy', 'script', 'require', 'particularly', 'enjoy', 'ramification', 'travel', 'think', 'agsin']
['sister', 'ask', 'thought', 'spellbind', 'pull', 'quickly', 'sitting', 'equate', 'depth', 'delight', 'reading', 'previous', 'book', 'anyone', 'like', 'harry', 'potter', 'hinder', 'course', 'format', 'script', 'magic', 'novel', 'miss', 'limited', 'dialogue', 'stage', 'direction', 'means', 'story', 'enchant', 'lack', 'body', 'delight', 'novel', 'setting', 'description', 'sparse', 'truly', 'character', 'without', 'watch', 'actor', 'stage', 'nuance', 'would', 'given', 'introspection', 'narration', 'however', 'truly', 'love', 'glimpse', 'future', 'beloved', 'harry', 'ginny', 'hermione', 'seeing', 'navigate', 'adult', 'responsibility', 'parenthood', 'glimpse', 'beloved', 'character', 'mcgonagall', 'draco', 'finding', 'indeed', 'already', 'whole', 'still', 'frosting', 'enjoy']
['another', 'great']
['great', 'har

['waste', 'money']
['boring']
['really', 'dislike', 'harry', 'middle', 'story', 'love', 'story', 'change', 'image', 'forever', 'jkrowling', 'better']
['surface', 'level', 'enjoy', 'latest', 'harry', 'potter', 'enough', 'forward', 'momentum', 'move', 'along', 'child', 'character', 'actually', 'really', 'excuse', 'harry', 'potter', 'world', 'things', 'basic', 'basic', 'characterization', 'harry', 'hermione', 'draco', 'could', 'describe', 'spotty', 'overall', 'wrong', 'stage', 'direction', 'would', 'things', 'building', 'tension', 'understanding', 'relationship', 'little', 'better', 'language', 'need', 'actual', 'novel', 'draw', 'concept', 'emotion', 'match', 'serious', 'later', 'harry', 'potter', 'book', 'try', 'translate', 'spoiler', 'complaint', 'bring', 'descendant', 'voldemort', 'potential', 'interest', 'story', 'perceive', 'victim', 'maybe', 'could', 'dramatic', 'tension', 'actually', 'reducing', 'hint', 'drop', 'conceive', 'travel', 'reason', 'could', 'happen', 'allow', 'conflict',

['first', 'write', 'rowling', 'dialogue', 'feel', 'really', 'stilted', 'unnatural', 'show', 'promise', 'wonder', 'would', 'better', 'remember', 'draft', 'final', 'script', 'really', 'rowling', 'would', 'write', 'novel', 'version', 'would', 'great', 'stand', 'reading', 'things', 'format', 'purchase', 'different', 'writing', 'medium']
['really', 'character', 'original', 'charm', 'story', 'silly', 'seem', 'rowling', 'feedback', 'remind', 'playwright', 'choice', 'quote', 'quite', 'disappoint']
['story']
['horrible', 'horrible', 'stand', 'expectation', 'harry', 'potter', 'book', 'quick', 'seem', 'vividness', 'descriptive', 'nature', 'rowling', 'found', 'prior', 'harry', 'book', 'miss', 'interest', 'upcoming', 'tale', 'hogwarts', 'turn', 'reading', 'definitely', 'somewhat', 'disappointment', 'regard', 'prior', 'book', 'series']
['expect', 'script', 'format']
['realize', 'scrpt', 'rather', 'novel', 'would', 'buy', 'slog', 'script', 'write', 'hear', 'stage', 'effects', 'rawlings', 'instead', '

['want', 'thought', 'great', 'book', 'great', 'start', 'think', 'finish', 'realize', 'reread', 'enjoy', 'excite', 'friend', 'rowland', 'canon', 'harry', 'playwright', 'ruin', 'harry', 'ruin', 'enough', 'deserve', 'better', 'however', 'like', 'draco', 'bring', 'darkness', 'favorite', 'things', 'someday', 'broadway', 'write', 'three', 'character', 'poorly', 'write', 'bastardize']
['words', 'describe', 'disappointment', 'solid', 'character', 'scorpius', 'malfoy', 'insult', 'rowling']
['harry', 'potter', 'book', 'least', 'skip', 'book', 'start', 'epilogue', 'probably', 'copy', 'already', 'married', 'problem', 'stupid', 'contrive', 'everyone', 'character', 'harry', 'hermione', 'caricature', 'childhood', 'self', 'harry', 'hermione', 'really', 'position', 'years', 'bumble', 'idiot', 'hate', 'seeing', 'reduce', 'boring', 'people', 'character', 'albus', 'scorpius', 'whiner', 'really', 'would', 'great', 'would', 'continue', 'story', 'character', 'really', 'pretty', 'exclude', 'together', 'concen

['honestly', 'harry', 'potter', 'franchise', 'whole', 'spoiler', 'whole', 'voldemort', 'daughter', 'bellatrix', 'mother', 'remind', 'fanfictions', 'people', 'writing', 'years', 'anything', 'delphi', 'character', 'really', 'write', 'enjoy', 'novel', 'still', 'little', 'disappoint']
['difficult', 'follow', 'format', 'series', 'harry', 'married', 'still', 'pretty', 'personality']
['giving', 'star', 'think', 'could', 'something', 'harry', 'potter', 'start', 'aware', 'script', 'write', 'entirely', 'rowling', 'however', 'miss', 'whole', 'thing', 'miss', 'beautiful', 'details', 'bring', 'world', 'poorly', 'write', 'times', 'painful', 'force', 'finish', 'coming', 'total', 'potterhead', 'make', 'sorry', 'spend', 'money']
['enjoy', 'certainly', 'rowling', 'actually', 'write', 'break', 'rule', 'establish', 'universe', 'sometimes', 'character', 'behave', 'feel', 'natural', 'personal', 'influence', 'character', 'downplay']
['reading', 'version', 'harry', 'potter', 'enjoy', 'image', 'potter', 'chara

['definitely', 'writing', 'style']
['laugh', 'guarantee', 'genuine', 'mirth', 'growing', 'sense', 'horror', 'perhaps', 'lifelong', 'series', 'probably', 'quality', 'hilarity', 'character', 'taking', 'thing', 'maybe', 'golden']
['play/', 'script']
['write']
[]
['seem', 'force', 'formulaic', 'heart', 'potter', 'offering']
['definitely', 'nostalgia', 'enjoy', 'direction', 'character', 'still', 'worth', 'kindle', 'price']
['disappoint', 'script', 'lack', 'moodiness', 'exposition', 'ambience', 'book', 'immersive', 'experience', 'forever', 'finish', 'bore']
['first', 'reading', 'always', 'reading', 'novel', 'story', 'voice', 'author', 'voice', 'rowlings']
['2.5hours', 'finish', 'reading', '2.5hours', 'rowling', 'character', 'foreign', 'apart', 'generation', 'reading', 'harry', 'potter', 'honest', 'better', 'fiction', 'decide', 'worth']
['enjoy', 'original', 'book', 'still', 'intrigue', 'sitting']
['apparently', 'great', 'stage', 'rowling', 'whoever', 'write', 'write', 'play', 'direction', 'm

['rowling', 'trash', 'legacy', 'deviate', 'character', 'thinking', 'would', 'never', 'would', 'never', 'track', 'character', 'book', 'disappoint', 'script', 'firstly', 'actually', 'novel', 'blame', 'format', 'departure', 'quality', 'clearly', 'rowling', 'favor', 'read', 'fiction', 'start', 'finish', 'character', 'change', 'character', 'concept', 'scream', 'fiction', 'fiction', 'point', 'story', 'universe', 'uncomfortable', 'original', 'story', 'struggle', 'action', 'character', 'feel', 'retread', 'material', 'flashback', 'theme', 'anything', 'harry', 'potter', 'universe', 'least', 'nothing', 'could', 'live', 'without', 'brief', 'moment', 'wizarding', 'world', 'sadly', 'harry', 'potter', 'story', 'strongly', 'recommend', 'memory', 'book']
['great', 'anyone', 'wishing', 'revisit', 'amaze', 'world', 'harry', 'potter', 'however', 'short', 'hours', 'writing', 'times', 'understand', 'write']
['voldemort', 'bellatrix', 'never', 'eleven', 'remotely', 'physical', 'relationship', 'story', 'aside

['buy', 'daughter', 'desperately', 'want', 'rowling', 'book', 'could', 'believe', 'disappoint', 'reading', 'hate', 'worst', 'rowling', 'character', 'development', 'story', 'worth', 'paper', 'print', 'shock', 'rowling', 'several', 'copy', 'every', 'potter', 'series', 'potter', 'daughter', 'schedule', 'alaskan', 'cruise', 'want', 'cancel', 'could', 'among', 'first', 'spend', 'convince', 'alaska', 'better', 'choice', 'would', 'return', 'vacation', 'cruise', 'sorry', 'folks', 'disrespect', 'miss']
['taking', 'nowhere', 'water', 'names', 'familiar', 'excitement', 'harry', 'potter', 'wonder', 'pedantic', 'waste', 'money']
['return', 'kindle', 'version', 'importantly', 'character', 'trivialize', 'sully', 'every', 'conceivable', 'happen', 'drawing', 'board', 'write', 'character', 'book', 'think', 'rowlings', 'angry', 'permission', 'writer', 'vapid', 'cardboard', 'screen', 'format', 'first', 'place', 'ready', 'screen', 'version', 'reading', 'swill', 'obviously', 'loyal', 'anger', 'sophomoric', 

['getting']
['horrible', 'story', 'unsure', 'getting', 'predictable', 'parts', 'boring']
['shame', 'rowling', 'absolutely', 'harry', 'potter', 'series', 'harry', 'potter', 'expect', 'anything', 'cheap', 'write', 'typo', 'ignore', 'characteristic', 'character', 'hogwarts', 'house', 'reading', 'thinking', 'future', 'basically', 'future', 'story', 'harry', 'potter', 'waste', 'money', 'reading']
['harry', 'potter', 'could', 'write']
['although', 'great', 'revisit', 'harry', 'storyline', 'interest', 'rowling', 'thriller', 'hours', 'disappoint', 'write', 'format', 'detrimental', 'enjoyment']
['story', 'harry', 'potter', 'character', 'include', 'details', 'supply', 'actor', 'staging', 'story', 'rather', 'spoiler', 'spoiler', 'spoiler', 'dialogue', 'level', 'rowling', 'elements', 'insert', 'order', 'satisfy', 'familiar', 'character', 'particular', 'context', 'extensive', 'teasing', 'affair', 'nowhere', 'either', 'remove', 'numerous', 'character', 'become', 'purely', 'functional', 'somewhat', '

['harry', 'living', 'awful']
['character', 'muddle', 'confuse', 'looking', 'speaking', 'lack', 'writing', 'however', 'interest', 'story', 'book', 'mean', 'stage', 'broadway', 'rowling']
['review', 'originally', 'publish', 'start', 'finish', 'conflict', 'pretty', 'entertain', 'throughout', 'whole', 'thing', 'pretty', 'disappoint', 'underwhelmed', 'overall', 'start', 'love', 'albus', 'scorpius', 'seriously', 'unsubtle', 'devotion', 'finally', 'potter', 'malfoy', 'story', 'waiting', 'seriously', 'though', 'ignore', 'premise', 'story', 'outrageous', 'entertain', 'love', 'easily', 'connection', 'albus', 'scorpius', 'struggle', 'try', 'place', 'school', 'hold', 'legacy', 'father', 'weasley', 'still', 'every', 'universe', 'nothing', 'however', 'found', 'story', 'predictable', 'original', 'character', 'completely', 'character', 'development', 'canon', 'harry', 'potter', 'book', 'consider', 'canon', 'first', 'universe', 'would', 'harry', 'james', 'potter', 'terrible', 'things', 'control', 'durs

['harry', 'potter', 'series', 'book', 'seven', 'series', 'three', 'hogwarts', 'library', 'complimentary', 'popular', 'reason', 'flavor', 'trend', 'setter', 'rather', 'followers', 'timeless', 'engage', 'write', 'mystery', 'novel', 'interest', 'smart', 'movie', 'adaptation', 'movie', 'quite', 'opposite', 'people', 'familiar', 'series', 'certainly', 'miss', 'harry', 'potter', 'curse', 'child', 'refer', 'eight', 'story', 'certainly', 'since', 'rehearse', 'script', 'understandable', 'expect', 'problem', 'problem', 'everything', 'unlike', 'book', 'story', 'derivative', 'predictable', 'indulgent', 'character', 'original', 'book', 'since', 'personality', 'certainly', 'smart', 'writing', 'bite', 'mystery', 'sense', 'adventure', 'anything', 'close', 'actual', 'lesson', 'replace', 'childish', 'attempt', 'border', 'humorless', 'parody', 'story', 'fanfiction', 'falls', 'notable', 'effort', 'avoid', 'example', 'picture', 'significant', 'shopping', 'reply', 'lapse', 'logic', 'offer', 'solution', 'pro

['start', 'harry', 'potter', 'since', 'first', 'book', 'times', 'count', 'queen', 'however', 'terrible', 'refuse', 'acknowledge', 'wording', 'world', 'basically', 'horribly', 'write', 'fiction', 'publish', 'potterheads', 'worth']
['someone', 'book', 'times', 'heartbreaking', 'expect', 'window', 'harry', 'adult', 'world', 'selfish', 'cowardly', 'shell', 'harry', 'love', 'dearly', 'hermione', 'still', 'longer', 'clever', 'motherly', 'still', 'stupid', 'drunkard', 'happen', 'answer', 'homie', 'write', 'serious', 'garbage', 'clarify', 'study', 'theatre', 'years', 'countless', 'play', 'script', 'format', 'hinder', 'ability', 'understand', 'visualize', 'story', 'complexity', 'almost', 'complexity', 'character', 'speak', 'eloquently', 'meaningfully', 'spoke', 'texting', 'friend', 'stream', 'author', 'consciousness', 'found', 'twist', 'turn', 'surprise', 'pollute', 'every', 'muddy', 'message--', 'idea', 'author', 'intentional', 'construction', 'single', 'central', 'theme', 'would', 'pinpoint',

['excite', 'disappoint', 'script', 'whatever', 'script', 'buy', 'thought', 'would', 'contain', 'better', 'story', 'rubbish', 'lack', 'fantastic', 'adventure', 'book', 'worth', 'money', 'especially', 'reading', 'script']
['review', 'thinking', 'middle', 'imagine', 'potter', 'fanatic', 'family', 'matter', 'format', 'make', 'different', 'narrative', 'novel', 'enjoy', 'story', 'revisit', 'favorite', 'world', 'surprise', 'really', 'getting', 'revisit', 'scene', 'spoiler', 'deal', 'travel', 'funny', 'ginny', 'excellent', 'harry', 'typical', 'neurotic', 'harry', 'minister', 'magic', 'surprise', 'like', 'wrapping', 'around', 'going', 'stage', 'short', 'hours', 'greatest', 'definitely', 'enjoyable']
['fanfic']
['really', 'enjoy', 'certiain', 'charecters', 'include', '-albus', '-snape', '-scorpius', '-delphi', 'flip', 'include', 'derange', 'hermione', 'stupid', 'harry', 'split', 'frendships', 'never', 'live', 'first', 'book', 'super', 'finish', 'thought', 'worst', 'fanfiction', 'thing', 'trolley

['kinda', 'never', 'happen']
['absolutely', 'adore', 'harry', 'potter', 'series', 'excite', 'however', 'disappoint', 'become', 'typically', 'play', 'expect', 'revisit', 'character', 'price', 'ridiculous', 'consider', 'falls', 'short', 'honoring', 'timeless', 'series', 'precede', 'enough', 'alone', 'ending', 'satisfy', 'stop']
['instant', 'classic', 'great', 'story', 'incredible', 'rain', 'really', 'pour', 'promise', 'stick']
['harry', 'potter', 'reread', 'book', 'twice', 'excite', 'installment', 'story', 'depth', 'story', 'details', 'moment', 'getting', 'spend', 'favorite', 'character', 'however', 'finishing', 'substance', 'probably', 'partially', 'story', 'could', 'develop', 'better', 'potential', 'great', 'develop', 'fully', 'lack', 'feeling', 'reading', 'harry', 'potter', 'guess', 'thing', 'story', 'character', 'glimpse', 'world']
['suck']
['maybe', 'traditional', 'novel', 'make', 'story', 'depth', 'original', 'book', 'story', 'dialogue', 'make', 'quick', 'feeling', 'great', 'satisf

['horrible', 'gratuitous', 'story', 'happen']
['disappoint', 'sound', 'writing', 'style', 'fiction']
['script', 'emotional', 'roller', 'coaster', 'hours', 'going', 'break', 'without', 'try', 'reveal', 'nostalgia', 'particular', 'harry', 'share', 'first', 'piece', 'wizarding', 'wisdom', 'bestow', 'child', 'king', 'cross', 'station', 'moment', 'love', 'favorite', 'character', 'grow', 'harry', 'grappling', 'parent', 'parent', 'model', 'realistic', 'opinion', 'hermione', 'title', 'nothing', 'expect', 'james', 'potter', 'every', 'namesake', 'overall', 'decent', 'enjoy', 'despite', 'flaw', 'timeline', 'try', 'imagine', 'story', 'follow', 'jumping', 'forward', 'right', 'attempging', 'imagine', 'stage', 'impossible', 'hole', 'everywhere', 'twist', 'unbelievable', 'literally', 'believe', 'bogus', 'spoiler', 'character', 'interaction', 'awful', 'people', 'anymore', 'hermione', 'suck', 'harry', 'disrespectful', 'inconsiderate', 'everyone', 'ginny', 'health', 'freak', 'drunk', 'mention', 'hermione

['super', 'harry', 'potter', 'found', 'story', 'believe', 'taint', 'first', 'book']
['great', 'difficulty', 'quality', 'writing']
['aware', 'disappoint']
['would', 'release', 'order', 'anxiously', 'wait', 'unfortunately', 'disappointment']
['looking', 'harry', 'potter', 'rowling', 'fiction', 'saying', 'giving']
['traveling', 'stuff', 'would', 'expect', 'great', 'impression', 'disappoint', 'honest']
['poorly', 'write', 'harry', 'potter', 'universe', 'rowling', 'write', 'seriously', 'never', 'release']
['horrible', 'rowling', 'write', 'harry', 'potter', 'book', 'mystery', 'wrap', 'fantasy', 'butterfly', 'effect', 'wrap', 'writing', 'character', 'developement', 'story', 'travel', 'character', 'different', 'timeline', 'waste', 'since', 'different', 'watch', 'overview', 'youtube', 'spend', 'minutes', 'getting', 'disappoint', 'rather', 'reading', 'whole', 'thing', 'getting', 'disappoint']
['literally', 'disappoint']
['expect', 'write', 'spoof', 'harry', 'potter', 'actual', 'harry', 'potter',

TypeError: object of type 'float' has no len()
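A `TypeError` like the one above is what one would expect if the `content` column has missing values, which pandas reads as float `NaN`. The sketch below shows one way to guard against that while also making the random sample repeatable. It is only an illustration: `sample_tokens`, `keep_prob` and `seed` are hypothetical names, and `str.split` stands in for `prepare_text_for_lda`.

```python
import random

def sample_tokens(lines, prepare, keep_prob=0.21, seed=0):
    """Preprocess string entries and keep a reproducible random subset."""
    rng = random.Random(seed)          # local, seeded generator
    kept = []
    for line in lines:
        if not isinstance(line, str):  # skips NaN floats from missing cells
            continue
        tokens = prepare(line)
        if rng.random() < keep_prob:
            kept.append(tokens)
    return kept

reviews = ["great story", float("nan"), "boring script", "loved every page"]
print(sample_tokens(reviews, str.split, keep_prob=1.0))
# prints [['great', 'story'], ['boring', 'script'], ['loved', 'every', 'page']]
```

Using a seeded `random.Random` instance means the same reviews are selected on every run, so the dictionary size and topics further down become easier to compare between runs.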

## LDA with Gensim

First, we create a dictionary from the data, then convert the data to a bag-of-words corpus, and save both the dictionary and the corpus for future use.

In [10]:
from gensim import corpora
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
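To make the bag-of-words idea concrete, here is a minimal plain-Python sketch of the same two steps: map each distinct token to an integer id, then describe each document as (id, count) pairs. `build_vocab` and `to_bow` are hypothetical helpers written for illustration. They are not gensim's implementation, and gensim may assign ids in a different order.

```python
from collections import Counter

def build_vocab(docs):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for the known tokens in one document."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())

docs = [["harry", "potter", "harry"], ["script", "potter"]]
vocab = build_vocab(docs)
print(vocab)                             # {'harry': 0, 'potter': 1, 'script': 2}
print([to_bow(d, vocab) for d in docs])  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

Each inner list has the same shape as one entry of `corpus`: the first number identifies a word, the second says how often it occurs in that document.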

In [11]:
# Number of unique tokens in the dictionary
len(dictionary)

4858

In [12]:
corpus

[[(0, 1),
  (1, 1),
  (2, 1),
  (3, 1),
  (4, 1),
  (5, 1),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 2),
  (15, 1),
  (16, 1),
  (17, 1),
  (18, 1),
  (19, 2),
  (20, 2),
  (21, 1),
  (22, 1),
  (23, 1),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 1),
  (33, 3),
  (34, 1),
  (35, 4),
  (36, 1),
  (37, 1),
  (38, 2),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 1),
  (44, 1),
  (45, 1),
  (46, 1),
  (47, 1),
  (48, 1),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 1),
  (53, 1),
  (54, 1),
  (55, 1),
  (56, 2),
  (57, 1),
  (58, 1),
  (59, 4),
  (60, 1),
  (61, 1),
  (62, 1),
  (63, 1),
  (64, 2),
  (65, 1),
  (66, 2),
  (67, 1),
  (68, 2),
  (69, 1),
  (70, 1),
  (71, 1),
  (72, 2),
  (73, 1),
  (74, 1),
  (75, 1),
  (76, 1),
  (77, 1),
  (78, 1),
  (79, 3),
  (80, 1),
  (81, 1),
  (82, 1),
  (83, 1),
  (84, 1),
  (85, 1),
  (86, 1),
  (87, 1),
  (88, 1),
  (89, 1),
  (90, 2),
  (91, 1)

In [13]:
# save corpus and dictionary to disk so that we can use them later during visualization
import pickle
with open('corpus.pkl', 'wb') as f:  # the file is closed when the block exits
    pickle.dump(corpus, f)
dictionary.save('dictionary.gensim')

In [14]:
import gensim
NUM_TOPICS = 5
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)

In [15]:
topics = ldamodel.print_topics(num_words=6)
for topic in topics:
    print(topic)

(0, '0.011*"small" + 0.011*"fiction" + 0.010*"mean" + 0.009*"thank" + 0.008*"right" + 0.007*"emotional"')
(1, '0.031*"story" + 0.027*"character" + 0.022*"harry" + 0.015*"potter" + 0.014*"would" + 0.014*"book"')
(2, '0.037*"disappoint" + 0.029*"potter" + 0.029*"write" + 0.026*"harry" + 0.021*"story" + 0.019*"script"')
(3, '0.026*"would" + 0.023*"write" + 0.023*"rowling" + 0.020*"harry" + 0.015*"script" + 0.015*"series"')
(4, '0.043*"harry" + 0.035*"potter" + 0.017*"story" + 0.017*"book" + 0.017*"character" + 0.015*"would"')
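Each printed topic is a string of `weight*"word"` terms joined by `+`. If the weights are needed as numbers, for example to sort or plot them, the string can be parsed with a small regular expression. This is a sketch that assumes the printed format shown above. `parse_topic` is a hypothetical helper, and gensim's own `show_topic` method may be a more direct route to the same pairs.

```python
import re

# One term looks like 0.037*"disappoint"
TERM = re.compile(r'([0-9.]+)\*"([^"]+)"')

def parse_topic(topic_string):
    """Split a print_topics-style string into (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic_string)]

example = '0.037*"disappoint" + 0.029*"potter" + 0.029*"write"'
print(parse_topic(example))
# prints [('disappoint', 0.037), ('potter', 0.029), ('write', 0.029)]
```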


Finding 10 topics in the data

In [16]:
ldamodel_3 = gensim.models.ldamodel.LdaModel(corpus, num_topics = 10, id2word=dictionary, passes=15)
topics = ldamodel_3.print_topics(num_words=4)
for topic in topics:
    print(topic)

(0, '0.075*"harry" + 0.072*"potter" + 0.036*"write" + 0.033*"book"')
(1, '0.030*"harry" + 0.027*"potter" + 0.025*"character" + 0.021*"story"')
(2, '0.057*"story" + 0.027*"character" + 0.027*"book" + 0.026*"potter"')
(3, '0.037*"harry" + 0.021*"albus" + 0.015*"would" + 0.015*"potter"')
(4, '0.086*"disappoint" + 0.043*"writing" + 0.042*"style" + 0.025*"rowling"')
(5, '0.034*"screen" + 0.013*"book" + 0.013*"magic" + 0.013*"literally"')
(6, '0.037*"harry" + 0.031*"character" + 0.024*"potter" + 0.021*"hermione"')
(7, '0.030*"blank" + 0.021*"write" + 0.017*"nonsense" + 0.014*"cry"')
(8, '0.030*"write" + 0.026*"character" + 0.024*"script" + 0.023*"rowling"')
(9, '0.038*"better" + 0.023*"rowling" + 0.017*"shame" + 0.014*"hate"')


Finding 25 topics in the data

In [17]:
ldamodel_2 = gensim.models.ldamodel.LdaModel(corpus, num_topics = 25, id2word=dictionary, passes=15)
topics = ldamodel_2.print_topics(num_words=4)
for topic in topics:
    print(topic)

(4, '0.095*"format" + 0.071*"movie" + 0.045*"write" + 0.036*"better"')
(8, '0.071*"disappoint" + 0.062*"story" + 0.050*"original" + 0.029*"poorly"')
(6, '0.155*"harry" + 0.153*"potter" + 0.048*"rowling" + 0.039*"write"')
(7, '0.038*"inconsistency" + 0.035*"cheesy" + 0.034*"waste" + 0.032*"inconsistent"')
(16, '0.038*"harry" + 0.022*"potter" + 0.021*"would" + 0.019*"albus"')
(10, '0.072*"potter" + 0.063*"harry" + 0.049*"could" + 0.036*"write"')
(20, '0.113*"order" + 0.031*"absolutely" + 0.031*"review" + 0.031*"difficulty"')
(22, '0.027*"really" + 0.024*"would" + 0.020*"series" + 0.019*"rowling"')
(14, '0.056*"series" + 0.050*"write" + 0.041*"could" + 0.032*"production"')
(1, '0.055*"character" + 0.035*"book" + 0.031*"would" + 0.029*"harry"')
(12, '0.101*"story" + 0.049*"plenty" + 0.035*"book" + 0.032*"character"')
(24, '0.030*"story" + 0.027*"character" + 0.027*"potter" + 0.027*"harry"')
(23, '0.052*"reading" + 0.048*"story" + 0.042*"script" + 0.033*"character"')
(21, '0.049*"amazon" + 

Finding 50 topics in the data

In [18]:
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = 50, id2word=dictionary, passes=15)
topics = ldamodel.print_topics(num_words=7)
for topic in topics:
    print(topic)

(8, '0.078*"harry" + 0.050*"potter" + 0.045*"things" + 0.042*"story" + 0.038*"big" + 0.029*"rowling" + 0.028*"really"')
(6, '0.079*"adventure" + 0.063*"wonder" + 0.049*"sometimes" + 0.047*"break" + 0.045*"brilliant" + 0.041*"travel" + 0.041*"effort"')
(36, '0.093*"amaze" + 0.057*"pace" + 0.056*"generation" + 0.046*"thank" + 0.043*"really" + 0.041*"justice" + 0.037*"right"')
(46, '0.057*"write" + 0.052*"reading" + 0.038*"really" + 0.034*"script" + 0.030*"potter" + 0.025*"rowling" + 0.024*"author"')
(23, '0.073*"overall" + 0.071*"slightly" + 0.049*"nonsense" + 0.040*"hole" + 0.037*"words" + 0.036*"battle" + 0.034*"betray"')
(10, '0.107*"harry" + 0.100*"potter" + 0.066*"story" + 0.045*"great" + 0.044*"little" + 0.035*"series" + 0.029*"original"')
(25, '0.062*"character" + 0.028*"book" + 0.018*"still" + 0.018*"magic" + 0.016*"happen" + 0.013*"hermione" + 0.011*"enjoy"')
(22, '0.164*"love" + 0.058*"harry" + 0.050*"potter" + 0.043*"really" + 0.043*"relationship" + 0.040*"story" + 0.037*"time

In [19]:
dictionary = gensim.corpora.Dictionary.load('dictionary.gensim')
with open('corpus.pkl', 'rb') as f:  # the file is closed when the block exits
    corpus = pickle.load(f)
lda = ldamodel

In [20]:
# pip install pyLDAvis==2.1.1

# pyLDAvis

In [21]:
import pyLDAvis.gensim
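The import above matches the pinned pyLDAvis 2.1.1. In pyLDAvis 3.x the same module is, to my knowledge, named `pyLDAvis.gensim_models`, so a notebook meant to run on either version could try both names. The sketch below shows that pattern with a hypothetical helper, `import_first`. It is demonstrated on a standard-library module so that it runs without pyLDAvis installed.

```python
import importlib

def import_first(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates} could be imported")

# In this notebook the call would look like (module names are an assumption, see above):
#   gensimvis = import_first(["pyLDAvis.gensim_models", "pyLDAvis.gensim"])
# Demonstrated here with a standard-library fallback:
mod = import_first(["module_that_does_not_exist_xyz", "json"])
print(mod.__name__)  # prints json
```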

#### 10 topics

In [22]:
lda = ldamodel_3
lda_display = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  topic_term_dists = topic_term_dists.ix[topic_order]
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  return pd.concat([default_term_info] + list(topic_dfs))


#### 25 topics

In [23]:
lda = ldamodel_2
lda_display = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)



#### 50 topics

In [24]:

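lda = ldamodel  # the 50-topic model from In [18]; lda was reassigned to the 25-topic model in the previous cell
lda_display = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)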
pyLDAvis.display(lda_display)

