<a href="https://colab.research.google.com/github/HuitingSheng/Amazon-User-Review-Sentiment-Analysis/blob/main/Amazon_user_review_sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Clustering and Topic Modeling 

In this project, we use unsupervised learning models to cluster unlabeled documents into groups, visualize the results, and identify their latent topics and structure.

## Contents

* [Part 1: Load Data](#Part-1:-Load-Data)
* [Part 2: Tokenizing and Stemming](#Part-2:-Tokenizing-and-Stemming)
* [Part 3: TF-IDF](#Part-3:-TF-IDF)
* [Part 4: K-means clustering](#Part-4:-K-means-clustering)
* [Part 5: Topic Modeling - Latent Dirichlet Allocation](#Part-5:-Topic-Modeling---Latent-Dirichlet-Allocation)


# Part 1: Load Data

In [None]:
import numpy as np
import pandas as pd
import nltk
# import gensim
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt
from nltk.corpus import stopwords

# nltk.download('punkt') 
# nltk.download('stopwords')

In [None]:
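# Load the Watches reviews from the public Amazon reviews dataset
url = "https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Watches_v1_00.tsv.gz"
# Warn about and skip malformed rows (on pandas < 1.3, use error_bad_lines=False instead)
df = pd.read_csv(url, sep="\t", on_bad_lines="warn")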


b'Skipping line 8704: expected 15 fields, saw 22\nSkipping line 16933: expected 15 fields, saw 22\nSkipping line 23726: expected 15 fields, saw 22\n'
b'Skipping line 85637: expected 15 fields, saw 22\n'
b'Skipping line 132136: expected 15 fields, saw 22\nSkipping line 158070: expected 15 fields, saw 22\nSkipping line 166007: expected 15 fields, saw 22\nSkipping line 171877: expected 15 fields, saw 22\nSkipping line 177756: expected 15 fields, saw 22\nSkipping line 181773: expected 15 fields, saw 22\nSkipping line 191085: expected 15 fields, saw 22\nSkipping line 196273: expected 15 fields, saw 22\nSkipping line 196331: expected 15 fields, saw 22\n'
b'Skipping line 197000: expected 15 fields, saw 22\nSkipping line 197011: expected 15 fields, saw 22\nSkipping line 197432: expected 15 fields, saw 22\nSkipping line 208016: expected 15 fields, saw 22\nSkipping line 214110: expected 15 fields, saw 22\nSkipping line 244328: expected 15 fields, saw 22\nSkipping line 248519: expected 15 fields,

In [None]:
# Load data into dataframe
# df = pd.read_csv('watch_reviews.tsv', sep='\t', error_bad_lines=False)
# 'amazon_us_reviews/Mobile_Electronics_v1_00', with_info = True

In [None]:
df.head()

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
0,US,3653882,R3O9SGZBVQBV76,B00FALQ1ZC,937001370,"Invicta Women's 15150 ""Angel"" 18k Yellow Gold ...",Watches,5,0,0,N,Y,Five Stars,Absolutely love this watch! Get compliments al...,2015-08-31
1,US,14661224,RKH8BNC3L5DLF,B00D3RGO20,484010722,Kenneth Cole New York Women's KC4944 Automatic...,Watches,5,0,0,N,Y,I love thiswatch it keeps time wonderfully,I love this watch it keeps time wonderfully.,2015-08-31
2,US,27324930,R2HLE8WKZSU3NL,B00DKYC7TK,361166390,Ritche 22mm Black Stainless Steel Bracelet Wat...,Watches,2,1,1,N,Y,Two Stars,Scratches,2015-08-31
3,US,7211452,R31U3UH5AZ42LL,B000EQS1JW,958035625,Citizen Men's BM8180-03E Eco-Drive Stainless S...,Watches,5,0,0,N,Y,Five Stars,"It works well on me. However, I found cheaper ...",2015-08-31
4,US,12733322,R2SV659OUJ945Y,B00A6GFD7S,765328221,Orient ER27009B Men's Symphony Automatic Stain...,Watches,4,0,0,N,Y,"Beautiful face, but cheap sounding links",Beautiful watch face. The band looks nice all...,2015-08-31


In [None]:
# Drop rows whose review body is missing
df.dropna(subset=['review_body'], inplace=True)

In [None]:
df.reset_index(inplace=True, drop=True)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 960056 entries, 0 to 960203
Data columns (total 15 columns):
marketplace          960056 non-null object
customer_id          960056 non-null int64
review_id            960056 non-null object
product_id           960056 non-null object
product_parent       960056 non-null int64
product_title        960054 non-null object
product_category     960056 non-null object
star_rating          960056 non-null int64
helpful_votes        960056 non-null int64
total_votes          960056 non-null int64
vine                 960056 non-null object
verified_purchase    960056 non-null object
review_headline      960049 non-null object
review_body          960056 non-null object
review_date          960052 non-null object
dtypes: int64(5), object(10)
memory usage: 117.2+ MB


In [None]:
# Use the first 10,000 reviews as our training data
data = df.loc[:9999, 'review_body'].tolist()

In [None]:
# data

# Part 2: Tokenizing and Stemming

We load the stop-word list and the stemmer from the NLTK library.
Stop words are words such as "a", "the", or "in" that carry little meaning on their own.
Stemming reduces a word to its root form, so that "watches" and "watched" both map to "watch".
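
As a rough illustration of these two steps, here is a minimal sketch in plain Python. The tiny stop-word set and the suffix-stripping `toy_stem` helper are hypothetical stand-ins written for this example; they are not NLTK's stop-word list or its Snowball stemmer, which handle far more cases.

```python
import re

# Hypothetical mini stop-word list; the notebook uses NLTK's full English list.
TOY_STOP_WORDS = {"a", "the", "in", "i", "my", "this", "it", "and", "is"}

def toy_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_tokenize_and_stem(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in TOY_STOP_WORDS]
    return [toy_stem(t) for t in kept]

print(toy_tokenize_and_stem("I watched the watches in this box"))
```

Both "watched" and "watches" collapse to the same token, which is what lets later steps count them as one term.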

In [None]:
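# Use nltk's English stop words, plus a few corpus-specific additions.
# Note: this rebinds `stopwords` from the nltk corpus module to a plain list.
stopwords = stopwords.words('english')
# stopwords.append("n't")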
stopwords.append("'s")
stopwords.append("'m")
stopwords.append("br") 
stopwords.append("watch")

print ("We use " + str(len(stopwords)) + " stop-words from nltk library.")
print (stopwords[:10])

We use 183 stop-words from nltk library.
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]


Use defined functions to analyze (i.e. tokenize, stem) our reviews.

In [None]:
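import string

import spacy
from emoji_translate.emoji_translate import Translator

emo = Translator(exact_match_only=False, randomize=True)  # only needed for the optional demojify step
punctuation = string.punctuation
nlp = spacy.load('en_core_web_sm', disable=["parser", "ner"])
stemmer = SnowballStemmer("english")

def tokenization_and_lemmatization(text):
    """Lowercase, drop stop words and non-alphabetic tokens, then lemmatize."""
    text = text.lower()
    # text = emo.demojify(text)
    doc = nlp(text)
    tokens = [token.text for token in doc
              if token.text not in stopwords
              and token.text not in punctuation
              and token.text.isalpha()]
    # Re-parse the filtered tokens so lemmas are computed on the cleaned text
    new_doc = nlp(" ".join(tokens))
    return [token.lemma_ for token in new_doc]

def tokenization_and_stemming(text):
    """Tokenize with NLTK, drop stop words and non-alphabetic tokens, then stem.

    Assumed reconstruction: later cells call this function, but its
    definition is missing from the notebook. Requires nltk.download('punkt').
    """
    tokens = [word.lower() for word in nltk.word_tokenize(text)]
    kept = [t for t in tokens if t.isalpha() and t not in stopwords]
    return [stemmer.stem(t) for t in kept]

In [None]:
tokenization_and_lemmatization("My husband LOVED it😁👍🏽👍🏽👍🏽👍🏽")

['husband', 'love']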


In [None]:
tokenization_and_stemming(df.review_body[0])

['absolutely',
 'love',
 'get',
 'compliments',
 'almost',
 'every',
 'time',
 'wear',
 'dainty']

In [None]:
df.review_body[990]

'a very nice watch for a good price.'

# Part 3: TF-IDF

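TF (term frequency) measures how often a term appears in a document.

IDF (inverse document frequency) down-weights terms that appear in many documents, so words common to most reviews contribute less than words that distinguish a review.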
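
To make the weighting concrete, the sketch below computes TF-IDF by hand in plain Python. It assumes the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which scikit-learn documents as the `TfidfVectorizer` defaults. The three toy documents and the helper names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized toy documents.
docs = [
    ["great", "price", "great", "look"],
    ["band", "broke"],
    ["great", "band"],
]

n_docs = len(docs)
# Document frequency: the number of documents that contain each term.
doc_freq = Counter(term for doc in docs for term in set(doc))

def idf(term):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf(doc):
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    counts = Counter(doc)
    weights = {t: c * idf(t) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

for doc in docs:
    print({t: round(w, 3) for t, w in tfidf(doc).items()})
```

In the second document, "broke" ends up with a larger weight than "band" because "band" also appears in another document.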

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_model = TfidfVectorizer(max_df=0.99, max_features=1000,
                                 min_df=0.01, stop_words='english',
                                 use_idf=True, tokenizer=tokenization_and_stemming, ngram_range=(1,2))

tfidf_matrix = tfidf_model.fit_transform(data)  # fit the vectorizer to the reviews

print("In total, there are " + str(tfidf_matrix.shape[0]) +
      " reviews and " + str(tfidf_matrix.shape[1]) + " terms.")


In total, there are 10000 reviews and 238 terms.


In [None]:
tfidf_matrix

<10000x238 sparse matrix of type '<class 'numpy.float64'>'
	with 69911 stored elements in Compressed Sparse Row format>

In [None]:
tfidf_matrix.toarray() 

array([[0.        , 0.52828291, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.29794121, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [None]:
tfidf_matrix.todense()

matrix([[0.        , 0.52828291, 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        ...,
        [0.        , 0.        , 0.29794121, ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]])

In [None]:
print(type(tfidf_matrix.toarray()))

<class 'numpy.ndarray'>


In [None]:
print(type(tfidf_matrix.todense()))

<class 'numpy.matrix'>


Save the terms identified by TF-IDF.

In [None]:
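# terms kept by the vectorizer
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() there)
tf_selected_words = tfidf_model.get_feature_names()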

In [None]:
# print out words
tf_selected_words

['abl',
 'absolut',
 'accur',
 'actual',
 'adjust',
 'alarm',
 'alreadi',
 'alway',
 'amaz',
 'amazon',
 'anoth',
 'appear',
 'arriv',
 'attract',
 'automat',
 'awesom',
 'bad',
 'band',
 'batteri',
 'beauti',
 'best',
 'better',
 'big',
 'bit',
 'black',
 'blue',
 'bought',
 'box',
 'bracelet',
 'brand',
 'broke',
 'button',
 'buy',
 'ca',
 'came',
 'case',
 'casio',
 'chang',
 'cheap',
 'clasp',
 'classi',
 'clear',
 'clock',
 'color',
 'come',
 'comfort',
 'compliment',
 'cool',
 'cost',
 'coupl',
 'crystal',
 'cute',
 'dark',
 'date',
 'daughter',
 'day',
 'deal',
 'definit',
 'design',
 'dial',
 'differ',
 'difficult',
 'digit',
 'disappoint',
 'display',
 'durabl',
 'easi',
 'easi read',
 'easili',
 'eleg',
 'end',
 'everi',
 'everyday',
 'everyth',
 'exact',
 'excel',
 'expect',
 'expens',
 'face',
 'far',
 'fast',
 'featur',
 'feel',
 'fell',
 'figur',
 'fine',
 'fit',
 'function',
 'gave',
 'gift',
 'glass',
 'gold',
 'good',
 'good look',
 'good qualiti',
 'got',
 'great',
 '

# Part 4: K-means clustering

In [None]:
# k-means clustering
from sklearn.cluster import KMeans

# number of clusters
num_clusters = 5

km = KMeans(n_clusters=num_clusters)
km.fit(tfidf_matrix)

# cluster label assigned to each review
clusters = km.labels_.tolist()
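
For intuition about what fitting does, here is a minimal sketch of the classic k-means loop in plain Python: assign each point to its nearest center, then move each center to the mean of its points. It illustrates the general algorithm on made-up 2-D points with hypothetical starting centers; it is not scikit-learn's implementation, which adds smarter initialization and convergence checks.

```python
def assign(points, centers):
    """Return the index of the nearest center for each point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old)  # leave an empty cluster's center in place
            continue
        new_centers.append(
            [sum(m[d] for m in members) / len(members) for d in range(len(old))]
        )
    return new_centers

# Hypothetical toy data: two obvious groups in two dimensions.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
centers = [[0.0, 0.0], [6.0, 5.0]]  # hypothetical starting centers

for _ in range(10):
    labels = assign(points, centers)
    centers = update(points, labels, centers)

print(labels, centers)
```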

## 4.1. Analyze K-means Result

In [None]:
# Build a DataFrame that pairs each review with its cluster label.
product = { 'review': df[:10000].review_body, 'cluster': clusters}
frame = pd.DataFrame(product, columns = ['review', 'cluster'])

In [None]:
frame.head(10)

Unnamed: 0,review,cluster
0,Absolutely love this watch! Get compliments al...,2
1,I love this watch it keeps time wonderfully.,3
2,Scratches,2
3,"It works well on me. However, I found cheaper ...",2
4,Beautiful watch face. The band looks nice all...,2
5,"i love this watch for my purpose, about the pe...",2
6,"for my wife and she loved it, looks great and ...",0
7,I was about to buy this thinking it was a Swis...,2
8,Watch is perfect. Rugged with the metal &#34;B...,0
9,Great quality and build.<br />The motors are r...,2


In [None]:
print ("Number of reviews included in each cluster:")
frame['cluster'].value_counts().to_frame()

Number of reviews included in each cluster:


Unnamed: 0,cluster
2,6902
0,1173
3,748
1,644
4,533


In [None]:
km.cluster_centers_


array([[0.00130689, 0.00385818, 0.00510487, ..., 0.00483826, 0.00964834,
        0.01243986],
       [0.        , 0.        , 0.00249225, ..., 0.00364404, 0.00328115,
        0.00236999],
       [0.00544808, 0.00573283, 0.00613407, ..., 0.00929717, 0.02426404,
        0.01635063],
       [0.00104253, 0.02175625, 0.        , ..., 0.00083496, 0.00461507,
        0.00478863],
       [0.        , 0.        , 0.00211771, ..., 0.        , 0.00690848,
        0.        ]])

In [None]:
km.cluster_centers_.shape

(5, 238)
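
The centroid matrix has one row per cluster and one column per term, so the largest entries in a row point at the terms that characterize that cluster. The sketch below shows this ranking step in plain Python. The vocabulary and weights are made up, and `top_terms` is a hypothetical helper that mirrors the idea of sorting each centroid row in descending order.

```python
# Hypothetical vocabulary and centroid weights, for illustration only.
terms = ["band", "great", "love", "price"]
centroids = [
    [0.05, 0.60, 0.10, 0.30],
    [0.40, 0.02, 0.55, 0.01],
]

def top_terms(weights, vocabulary, k):
    """Return the k terms with the largest weights, highest first."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return [vocabulary[j] for j in ranked[:k]]

for i, row in enumerate(centroids):
    print("Cluster", i, top_terms(row, terms, k=2))
```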

In [None]:
print ("<Document clustering result by K-means>")

# km.cluster_centers_ holds the weight of each term in each centroid.
# Sort each row in decreasing order and take the top k terms.
order_centroids = km.cluster_centers_.argsort()[:, ::-1] 

Cluster_keywords_summary = {}
for i in range(num_clusters):
    print ("Cluster " + str(i) + " words:", end='')
    Cluster_keywords_summary[i] = []
    for ind in order_centroids[i, :6]: #replace 6 with n words per cluster
        Cluster_keywords_summary[i].append(tf_selected_words[ind])
        print (tf_selected_words[ind] + ",", end='')
    print ()
    
    cluster_reviews = frame[frame.cluster==i].review.tolist()
    print ("Cluster " + str(i) + " reviews (" + str(len(cluster_reviews)) + " reviews): ")
    print (", ".join(cluster_reviews))
    print ()

<Document clustering result by K-means>
Cluster 0 words:great,look,look great,price,great price,great look,
Cluster 0 reviews (1173 reviews): 
for my wife and she loved it, looks great and a great price!, Watch is perfect. Rugged with the metal &#34;Bull Bars&#34;. The red accents are a great touch and I get compliments when wearing it. If you are worried about being able to read this in sunlight or in the dark don't! The LED ilumination works great! I might even get this in a different color for my next G-Shock purchase!, Works great but the watch a used it on was slim so I had to use a quarter to rase it up the right height, Perfect Condition, Arrived on Time,Works & Looks Great, amazing product keeps everything safe and secure organized great quality for my designer belts perfect for homes or traveling, Comfortable, looks great, very lightweight.The band is a little on the short side, but it's usable and not a big deal to replace., Beautiful watch! Looks better on the wrist than in 

# Part 5: Topic Modeling - Latent Dirichlet Allocation

In [None]:
# Use LDA for clustering
from sklearn.decomposition import LatentDirichletAllocation
lda = LatentDirichletAllocation(n_components=5)

In [None]:
# document-topic matrix: one row per review, one column per topic
lda_output = lda.fit_transform(tfidf_matrix)
print(lda_output.shape)
print(lda_output)

(10000, 5)
[[0.28208723 0.05955054 0.53931565 0.05949737 0.05954921]
 [0.0886215  0.08393129 0.65860226 0.08489857 0.08394637]
 [0.59970019 0.10000151 0.1000017  0.1002954  0.10000121]
 ...
 [0.680637   0.04780896 0.18248899 0.04466992 0.04439512]
 [0.82167652 0.04387159 0.04351991 0.04400548 0.04692651]
 [0.75586759 0.05640191 0.05612302 0.05864476 0.07296272]]
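
Each row of the document-topic matrix can be read as a distribution over topics: the entries sum to roughly 1, and the largest entry marks the dominant topic. The short check below uses the first three rows printed above; the variable names are chosen for this example only.

```python
# First three rows of the document-topic matrix printed above.
rows = [
    [0.28208723, 0.05955054, 0.53931565, 0.05949737, 0.05954921],
    [0.0886215, 0.08393129, 0.65860226, 0.08489857, 0.08394637],
    [0.59970019, 0.10000151, 0.1000017, 0.1002954, 0.10000121],
]

dominants = []
for i, row in enumerate(rows):
    # The dominant topic is the index of the largest weight in the row.
    dominant = max(range(len(row)), key=lambda k: row[k])
    dominants.append(dominant)
    print(f"Doc{i}: sum={sum(row):.4f}, dominant topic={dominant}")
```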


In [None]:
# topics and words matrix
topic_word = lda.components_
print(topic_word.shape)
print(topic_word)

(5, 238)
[[ 32.37358521   3.18973616  38.6440784  ...  33.35280165  80.86188137
  106.40396886]
 [  0.20229706   0.20121821   5.89234086 ...  23.17885914   0.32825415
    0.2076596 ]
 [  0.20225321  57.57362456   0.20112406 ...   0.20262754   0.20216227
   21.27544424]
 [  7.64367836   0.20135271   3.39300776 ...  16.82028608  10.82927219
    1.99370471]
 [  0.49363055   0.20135312   3.92858379 ...   0.26109484  96.81369815
    3.67137619]]


In [None]:
# column names
topic_names = ["Topic" + str(i) for i in range(lda.n_components)]

# index names
doc_names = ["Doc" + str(i) for i in range(len(data))]

df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns=topic_names, index=doc_names)

# get dominant topic for each document
topic = np.argmax(df_document_topic.values, axis=1)
df_document_topic['topic'] = topic

df_document_topic.head(10)

Unnamed: 0,Topic0,Topic1,Topic2,Topic3,Topic4,topic
Doc0,0.28,0.06,0.54,0.06,0.06,2
Doc1,0.09,0.08,0.66,0.08,0.08,2
Doc2,0.6,0.1,0.1,0.1,0.1,0
Doc3,0.74,0.07,0.06,0.06,0.07,0
Doc4,0.44,0.04,0.04,0.23,0.25,0
Doc5,0.64,0.08,0.13,0.07,0.07,0
Doc6,0.06,0.06,0.25,0.06,0.58,4
Doc7,0.75,0.06,0.06,0.06,0.06,0
Doc8,0.49,0.04,0.05,0.04,0.38,0
Doc9,0.76,0.06,0.06,0.06,0.07,0


In [None]:
df_document_topic['topic'].value_counts().to_frame()

Unnamed: 0,topic
0,3973
4,1724
3,1572
1,1483
2,1248


In [None]:
# topic-word matrix
print(lda.components_)
df_topic_words = pd.DataFrame(lda.components_)

# column and index
df_topic_words.columns = tfidf_model.get_feature_names()
df_topic_words.index = topic_names

df_topic_words.head()

[[ 32.37358521   3.18973616  38.6440784  ...  33.35280165  80.86188137
  106.40396886]
 [  0.20229706   0.20121821   5.89234086 ...  23.17885914   0.32825415
    0.2076596 ]
 [  0.20225321  57.57362456   0.20112406 ...   0.20262754   0.20216227
   21.27544424]
 [  7.64367836   0.20135271   3.39300776 ...  16.82028608  10.82927219
    1.99370471]
 [  0.49363055   0.20135312   3.92858379 ...   0.26109484  96.81369815
    3.67137619]]


Unnamed: 0,abl,absolut,accur,actual,adjust,alarm,alreadi,alway,amaz,amazon,anoth,appear,arriv,attract,automat,awesom,bad,band,batteri,beauti,best,better,big,bit,black,blue,bought,box,bracelet,brand,broke,button,buy,ca,came,case,casio,chang,cheap,clasp,...,solid,someth,son,star,start,stop,strap,style,stylish,super,sure,tell,thank,thing,think,thought,time,timex,took,tri,turn,use,valu,want,watch,water,water resist,way,wear,week,weight,white,wife,wish,work,work great,worn,worth,wrist,year
Topic0,32.373585,3.189736,38.644078,36.091479,48.713382,40.295173,20.651773,42.861499,4.497841,43.49855,52.836893,25.490459,51.936842,41.830678,31.516788,1.294485,23.555614,145.795007,112.12917,32.528495,26.248755,37.974972,44.679615,43.777563,40.845189,22.557478,78.253833,17.53562,23.026442,28.780569,18.742925,50.24708,73.091331,54.953273,28.231759,42.138764,51.762124,54.346029,32.814676,21.215595,...,22.553043,36.170279,0.202958,46.838076,34.42198,77.968581,68.037973,25.340353,3.566493,9.376344,33.665498,54.83232,0.203239,53.140439,58.813699,30.968504,254.014556,56.192354,23.086367,43.58876,37.632091,175.972592,5.951981,54.388452,123.145315,74.667674,27.39061,34.875136,141.40039,77.113175,19.85416,29.028893,0.368635,35.916862,155.147461,0.203852,38.442339,33.352802,80.861881,106.403969
Topic1,0.202297,0.201218,5.892341,0.214256,0.201424,0.200337,0.20143,1.723584,7.994119,20.884791,12.104948,5.875577,28.647374,0.201764,0.201686,130.913284,0.201243,1.139527,0.201366,155.338101,54.979205,42.02557,7.239838,0.329749,0.202586,0.208206,3.220919,6.181405,0.201763,4.906747,0.212886,0.200517,42.481438,0.770919,0.201718,6.225289,0.200975,0.201122,1.938383,0.212485,...,9.835201,0.419991,0.200783,4.651106,0.200417,0.200782,0.607843,0.205539,0.20299,3.571302,4.688211,0.201577,125.166451,4.157765,0.216081,10.866194,17.917951,0.201269,0.201837,0.201025,0.200859,1.789913,65.406161,1.115692,22.476673,0.200739,0.200341,3.455651,1.089557,0.321456,7.159021,0.200674,0.201719,0.20124,0.493341,0.200892,0.20053,23.178859,0.328254,0.20766
Topic2,0.202253,57.573625,0.201124,0.202136,0.201536,1.012923,0.202106,0.202386,18.379147,0.20746,0.307152,0.20209,0.20798,0.201759,0.200943,0.20128,0.200459,0.217582,0.201083,70.15022,0.201368,0.20308,0.328984,0.201361,0.201889,0.201238,85.857521,0.202931,0.201785,0.20095,0.200839,0.200332,2.097346,0.283973,0.202177,0.201852,0.202241,0.201609,0.201333,0.201254,...,0.200863,0.201312,66.056932,0.201926,0.200858,0.200485,0.201122,15.959928,13.798486,0.201904,1.730458,0.201514,0.201475,0.204228,0.201552,0.203123,22.061208,0.245994,0.201359,0.200528,0.200204,0.204681,0.201267,0.203094,18.921174,0.200727,0.200685,1.385671,31.699317,0.201679,0.201815,0.200899,67.253812,0.20183,0.203384,0.200444,0.201489,0.202628,0.202162,21.275444
Topic3,7.643678,0.201353,3.393008,1.092415,3.17566,0.200193,18.553507,0.202158,55.476978,2.841704,3.946193,0.202108,0.460706,0.470041,0.200875,0.201128,35.748487,98.340737,24.403504,0.589147,0.201094,6.154187,4.468697,2.258407,1.673556,0.848092,0.307819,32.176412,12.530298,10.361027,75.29601,0.200518,33.78583,0.612843,53.281964,2.13317,2.790172,4.194004,96.796182,14.717482,...,15.143639,10.659552,0.202534,9.111937,0.200967,1.383113,46.468312,0.201739,3.773426,24.347723,5.670697,0.201199,0.289924,12.198623,5.750413,0.488298,35.290073,0.528273,13.670133,5.340434,0.201484,16.999995,0.201127,35.109903,2.550024,0.201528,0.200329,7.414213,11.110704,22.86479,0.458757,0.200928,0.201184,0.201414,131.56372,0.200743,0.207201,16.820286,10.829272,1.993705
Topic4,0.493631,0.201353,3.928584,9.925521,13.77828,0.200552,1.537704,6.925079,0.203483,0.616847,2.466773,6.561582,7.11511,0.202844,0.200771,0.201792,0.298446,85.042752,0.201162,9.881112,0.201204,15.640281,84.935666,38.156489,29.014741,28.633264,3.875844,3.667978,7.274853,0.20134,0.200477,0.201359,7.414887,15.404038,4.184328,7.024717,0.208801,0.975366,5.939532,5.729533,...,0.203788,0.25125,0.201382,5.459144,0.200256,0.200333,21.427091,20.02368,38.124245,4.294359,3.735955,0.79846,0.20162,4.807706,9.038799,7.893909,21.278625,0.205596,0.202208,0.201006,0.200547,6.986535,0.201855,34.895259,25.124023,0.201868,0.200496,28.118908,34.721629,2.228923,22.635782,8.826735,0.46029,12.845103,53.830428,60.255674,0.202749,0.261095,96.813698,3.671376


In [None]:
# Collect the top n keywords for each topic
def print_topic_words(tfidf_model, lda_model, n_words):
    words = np.array(tfidf_model.get_feature_names())
    topic_words = []
    # Each row of components_ holds one topic's weight for every term
    for topic_words_weights in lda_model.components_:
        # Indices of the n_words largest weights, in descending order
        top_words = topic_words_weights.argsort()[::-1][:n_words]
        topic_words.append(words.take(top_words))
    return topic_words

topic_keywords = print_topic_words(tfidf_model=tfidf_model, lda_model=lda, n_words=15)        

df_topic_words = pd.DataFrame(topic_keywords)
df_topic_words.columns = ['Word '+str(i) for i in range(df_topic_words.shape[1])]
df_topic_words.index = ['Topic '+str(i) for i in range(df_topic_words.shape[0])]
df_topic_words

Unnamed: 0,Word 0,Word 1,Word 2,Word 3,Word 4,Word 5,Word 6,Word 7,Word 8,Word 9,Word 10,Word 11,Word 12,Word 13,Word 14
Topic 0,time,use,easi,work,like,band,look,wear,day,read,watch,face,batteri,set,need
Topic 1,good,beauti,qualiti,awesom,thank,price,like,look,fast,look good,valu,great,ship,item,good qualiti
Topic 2,love,excel,gift,perfect,husband,bought,expect,beauti,compliment,wife,son,eleg,absolut,daughter,classi
Topic 3,nice,product,work,band,cheap,broke,nice look,look,money,ok,amaz,recommend,came,pin,strap
Topic 4,great,look,look great,price,small,wrist,band,big,great price,exact,pictur,like,cute,fit,size


In [None]:
!pip install pyLDAvis==2.1.2



In [None]:
import pkg_resources
#pkg_resources.require("pyLDAvis==`2.1.2`")  # modified to use specific numpy
#import numpy
import pyLDAvis


In [None]:
 ! pip freeze | grep pyLDAvis

pyLDAvis==2.1.2



In [None]:
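import pyLDAvis.sklearn

In [None]:
# Assumes pyLDAvis 2.1.2 (pinned above), which ships the pyLDAvis.sklearn helper
# for scikit-learn models; the gensim helper does not apply to this LDA model.
pyLDAvis.enable_notebook()
vis = pyLDAvis.sklearn.prepare(lda, tfidf_matrix, tfidf_model)
vis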