<a href="https://colab.research.google.com/github/IggyZhao/Python-Skiil-Iggy/blob/master/NLP_Document_Clustering_and_Topic_Modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Clustering and Topic Modeling

In this project, I use unsupervised learning models to cluster unlabeled documents into groups, visualize the results, and identify their latent topics.

## Contents

* [Part 0: Setup Google Drive Environment](#Part-0:-Setup-Google-Drive-Environment)
* [Part 1: Load Data](#Part-1:-Load-Data)
* [Part 2: Tokenizing and Stemming](#Part-2:-Tokenizing-and-Stemming)
* [Part 3: TF-IDF](#Part-3:-TF-IDF)
* [Part 4: K-means clustering](#Part-4:-K-means-clustering)
* [Part 5: Topic Modeling - Latent Dirichlet Allocation](#Part-5:-Topic-Modeling---Latent-Dirichlet-Allocation)


# Part 0: Setup Google Drive Environment

In [None]:
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [None]:
# https://drive.google.com/open?id=192JMR7SIqoa14vrs7Z9BXO3iK89pimJL
file = drive.CreateFile({'id':'1e4ZybEWVLMaV0KGVGWhLGN2Q-njaz2vz'})
file.GetContentFile('data.csv')  
# https://drive.google.com/open?id=1e4ZybEWVLMaV0KGVGWhLGN2Q-njaz2vz

# Part 1: Load Data

In [None]:
import numpy as np
import pandas as pd
import nltk
import gensim

from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# Load data into dataframe
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('data.csv', sep='\t', header=0, on_bad_lines='skip')

In [None]:
df.head()

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
0,US,3653882,R3O9SGZBVQBV76,B00FALQ1ZC,937001370,"Invicta Women's 15150 ""Angel"" 18k Yellow Gold ...",Watches,5,0,0,N,Y,Five Stars,Absolutely love this watch! Get compliments al...,2015-08-31
1,US,14661224,RKH8BNC3L5DLF,B00D3RGO20,484010722,Kenneth Cole New York Women's KC4944 Automatic...,Watches,5,0,0,N,Y,I love thiswatch it keeps time wonderfully,I love this watch it keeps time wonderfully.,2015-08-31
2,US,27324930,R2HLE8WKZSU3NL,B00DKYC7TK,361166390,Ritche 22mm Black Stainless Steel Bracelet Wat...,Watches,2,1,1,N,Y,Two Stars,Scratches,2015-08-31
3,US,7211452,R31U3UH5AZ42LL,B000EQS1JW,958035625,Citizen Men's BM8180-03E Eco-Drive Stainless S...,Watches,5,0,0,N,Y,Five Stars,"It works well on me. However, I found cheaper ...",2015-08-31
4,US,12733322,R2SV659OUJ945Y,B00A6GFD7S,765328221,Orient ER27009B Men's Symphony Automatic Stain...,Watches,4,0,0,N,Y,"Beautiful face, but cheap sounding links",Beautiful watch face. The band looks nice all...,2015-08-31


In [None]:
# Drop rows whose review_body is missing
df.dropna(subset=['review_body'],inplace=True)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 960056 entries, 0 to 960203
Data columns (total 15 columns):
marketplace          960056 non-null object
customer_id          960056 non-null int64
review_id            960056 non-null object
product_id           960056 non-null object
product_parent       960056 non-null int64
product_title        960054 non-null object
product_category     960056 non-null object
star_rating          960056 non-null int64
helpful_votes        960056 non-null int64
total_votes          960056 non-null int64
vine                 960056 non-null object
verified_purchase    960056 non-null object
review_headline      960049 non-null object
review_body          960056 non-null object
review_date          960052 non-null object
dtypes: int64(5), object(10)
memory usage: 117.2+ MB


In [None]:
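# use the first 1000 reviews as our training data
# iloc slices by position, so we get exactly 1000 rows even though dropna left gaps in the index
data = df['review_body'].iloc[:1000].tolist()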

# Part 2: Tokenizing and Stemming

Load the stop-word list and a stemmer from the NLTK library.
Stop words are words like "a", "the", or "in" that carry little meaning on their own.
Stemming reduces a word to its root form, e.g. "compliments" becomes "compliment".
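
To make the three steps concrete before bringing in NLTK, here is a minimal pure-Python sketch of the same pipeline: tokenize, drop stop words, drop tokens without letters, then stem. Everything in it is a stand-in I made up for illustration: `TOY_STOPWORDS` is a tiny hypothetical list, and `toy_stem` is crude suffix stripping, not the Snowball algorithm used below.

```python
import re

# Hypothetical mini stop-word list; the notebook uses NLTK's English list instead.
TOY_STOPWORDS = {"a", "the", "in", "this", "i", "it"}

def toy_stem(word):
    """Crude suffix stripping for illustration only; NOT the Snowball algorithm."""
    for suffix in ("ing", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_preprocess(text):
    # Split into lowercase word-like tokens and standalone punctuation marks.
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())
    # Drop stop words.
    kept = [t for t in tokens if t not in TOY_STOPWORDS]
    # Keep only tokens that contain at least one letter (drops "2", "!", ".").
    kept = [t for t in kept if re.search("[a-z]", t)]
    # Reduce each remaining token to a rough root.
    return [toy_stem(t) for t in kept]

result = toy_preprocess("I love this watch! It keeps time in 2 zones.")
print(result)  # ['love', 'watch', 'keep', 'time', 'zone']
```

A real stemmer handles far more cases (for example "absolutely" to "absolut"), which is why the notebook relies on NLTK rather than rules like these.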

In [None]:
# Use nltk's English stopwords.
stopwords = nltk.corpus.stopwords.words('english')
stopwords.append("'s")
stopwords.append("'m")
stopwords.append("n't")
stopwords.append("br")

print ("We use " + str(len(stopwords)) + " stop-words from nltk library.")
print (stopwords[:10])

We use 183 stop-words from nltk library.
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]


Define a function that tokenizes, filters, and stems each review.

In [None]:
from nltk.stem.snowball import SnowballStemmer
# REGULAR EXPRESSION
import re

stemmer = SnowballStemmer("english")

# tokenization and stemming
def tokenization_and_stemming(text):
    tokens = []
    # exclude stop words and tokenize the document, generate a list of string 
    for word in nltk.word_tokenize(text):
        if word.lower() not in stopwords:
            tokens.append(word.lower())

    filtered_tokens = []
    
    # filter out any tokens not containing letters (e.g., numeric tokens, raw punctuation)
    for token in tokens:
        if re.search('[a-zA-Z]', token):
            filtered_tokens.append(token)
            
    # stemming
    stems = [stemmer.stem(t) for t in filtered_tokens]
    return stems

In [None]:
# try the function on the first review
tokenization_and_stemming(data[0])

['absolut',
 'love',
 'watch',
 'get',
 'compliment',
 'almost',
 'everi',
 'time',
 'wear',
 'dainti']

# Part 3: TF-IDF

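TF (Term Frequency): how often a term occurs in a document.

IDF (Inverse Document Frequency): a weight that shrinks as the term appears in more documents.

***example:*** with ngram_range (1,2)

document 1: "Arthur da Jason"

document 2: "Jason da da huang"

dictionary: [Arthur, da, Jason, huang, Arthur da, da Jason, Jason da, da da, da huang]

Scoring the four unigrams [Arthur, da, Jason, huang] with the simplified weight tf / df (count of the term in the document divided by the number of documents that contain it):

document 1: tf-idf [1, 0.5, 0.5, 0];  document 2: tf-idf [0, 1, 0.5, 1]

2-gram:

document 1: Arthur da, da Jason; document 2: Jason da, da da, da huang

3-gram:

document 1: Arthur da Jason;  document 2: Jason da da, da da huang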
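
To check the toy example above, here is a minimal sketch that reproduces its vectors and n-grams. It assumes the scores are the simplified weight tf / df (term count divided by the number of documents containing the term), which matches the numbers shown. The helper names `ngrams` and `tf_over_df` are mine, not part of any library.

```python
from collections import Counter

docs = {
    "document 1": "Arthur da Jason".split(),
    "document 2": "Jason da da huang".split(),
}
vocabulary = ["Arthur", "da", "Jason", "huang"]

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Document frequency: in how many documents does each term occur?
df = {term: sum(term in tokens for tokens in docs.values()) for term in vocabulary}

def tf_over_df(tokens):
    """Simplified weight: term count in this document / document frequency."""
    counts = Counter(tokens)
    return [counts[term] / df[term] for term in vocabulary]

scores = {name: tf_over_df(tokens) for name, tokens in docs.items()}
for name, row in scores.items():
    print(name, row)
# document 1 [1.0, 0.5, 0.5, 0.0]
# document 2 [0.0, 1.0, 0.5, 1.0]

print(ngrams(docs["document 2"], 2))  # ['Jason da', 'da da', 'da huang']
```

Note that `TfidfVectorizer` uses a different, log-based idf and normalizes each row, so its numbers will not match this simplified version.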

In [None]:
# define vectorizer parameters
# TfidfVectorizer will help us to create tf-idf matrix
# max_df : maximum document frequency for the given word
# min_df : minimum document frequency for the given word
# max_features: maximum number of words
# use_idf: if not true, we only calculate tf
# stop_words : built-in stop words
# tokenizer: how to tokenize the document
# ngram_range: (min_value, max_value), eg. (1, 3) means the result will include 1-gram, 2-gram, 3-gram
tfidf_model = TfidfVectorizer(max_df=0.99, max_features=1000,
                                 min_df=0.01, stop_words='english',
                                 use_idf=True, tokenizer=tokenization_and_stemming, ngram_range=(1,1))

tfidf_matrix = tfidf_model.fit_transform(data) # fit the vectorizer to the reviews

print ("In total, there are " + str(tfidf_matrix.shape[0]) + \
      " reviews and " + str(tfidf_matrix.shape[1]) + " terms.")


In total, there are 1000 reviews and 241 terms.
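
Only 241 terms survive even though `max_features=1000`, because `min_df` and `max_df` prune the vocabulary first. With float values they are fractions of the corpus, so on 1000 reviews `min_df=0.01` keeps terms found in at least 10 reviews and `max_df=0.99` drops terms found in more than 990. A minimal sketch of that filter on a hypothetical four-review corpus (the documents and thresholds below are made up for illustration):

```python
# Hypothetical, already-tokenized mini corpus.
docs = [
    ["watch", "love", "band"],
    ["watch", "band", "broke"],
    ["watch", "love"],
    ["watch", "gift"],
]
min_df, max_df = 0.5, 0.99  # fractions of documents, chosen to suit 4 documents

n_docs = len(docs)

# Count the number of documents each term appears in (not total occurrences).
doc_freq = {}
for tokens in docs:
    for term in set(tokens):
        doc_freq[term] = doc_freq.get(term, 0) + 1

# Keep terms whose document count lies within [min_df * n_docs, max_df * n_docs].
kept = sorted(
    term for term, count in doc_freq.items()
    if min_df * n_docs <= count <= max_df * n_docs
)
print(kept)  # ['band', 'love']
```

Here "watch" is dropped for appearing in every document, while "broke" and "gift" are dropped for appearing in only one.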


In [None]:
# check the parameters
tfidf_model.get_params()

{'analyzer': 'word',
 'binary': False,
 'decode_error': 'strict',
 'dtype': numpy.float64,
 'encoding': 'utf-8',
 'input': 'content',
 'lowercase': True,
 'max_df': 0.99,
 'max_features': 1000,
 'min_df': 0.01,
 'ngram_range': (1, 1),
 'norm': 'l2',
 'preprocessor': None,
 'smooth_idf': True,
 'stop_words': 'english',
 'strip_accents': None,
 'sublinear_tf': False,
 'token_pattern': '(?u)\\b\\w\\w+\\b',
 'tokenizer': <function __main__.tokenization_and_stemming>,
 'use_idf': True,
 'vocabulary': None}
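
Two of the parameters above shape the actual numbers in the matrix: `smooth_idf=True` and `norm='l2'`. As a rough sketch of what they mean, the snippet below applies the smoothed idf formula from the scikit-learn documentation, idf = ln((1 + n) / (1 + df)) + 1, and then scales each row to unit length. It is a pure-Python illustration on a hypothetical two-document corpus, not the library's implementation.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized toy corpus.
docs = [["arthur", "da", "jason"], ["jason", "da", "da", "huang"]]
vocabulary = sorted({term for doc in docs for term in doc})
n_docs = len(docs)

# Document frequency of each term.
df = {term: sum(term in doc for doc in docs) for term in vocabulary}

# Smoothed idf: ln((1 + n) / (1 + df)) + 1
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocabulary}

def tfidf_row(tokens):
    """Raw count times idf, then L2-normalized so the row has length 1."""
    counts = Counter(tokens)
    raw = [counts[term] * idf[term] for term in vocabulary]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm for value in raw]

rows = [tfidf_row(doc) for doc in docs]
for row in rows:
    print(dict(zip(vocabulary, (round(value, 3) for value in row))))
```

In the first document, "arthur" ends up with a higher weight than "da" or "jason" because it occurs in only one of the two documents.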

Save the terms identified by TF-IDF.

In [None]:
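# words (get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it)
tf_selected_words = tfidf_model.get_feature_names_out().tolist()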

In [None]:
# print out words
tf_selected_words

['abl',
 'absolut',
 'accur',
 'actual',
 'adjust',
 'alarm',
 'alreadi',
 'alway',
 'amaz',
 'amazon',
 'anoth',
 'arm',
 'arriv',
 'automat',
 'awesom',
 'bad',
 'band',
 'batteri',
 'beauti',
 'best',
 'better',
 'big',
 'bit',
 'black',
 'blue',
 'bought',
 'box',
 'bracelet',
 'brand',
 'break',
 'bright',
 'broke',
 'button',
 'buy',
 'ca',
 'came',
 'case',
 'casio',
 'chang',
 'cheap',
 'clasp',
 'classi',
 'clock',
 'color',
 'come',
 'comfort',
 'compliment',
 'cool',
 'cost',
 'crown',
 'crystal',
 'dark',
 'date',
 'daughter',
 'day',
 'deal',
 'definit',
 'deliveri',
 'design',
 'dial',
 'differ',
 'difficult',
 'disappoint',
 'display',
 'dress',
 'durabl',
 'easi',
 'easili',
 'end',
 'everi',
 'everyday',
 'everyth',
 'exact',
 'excel',
 'expect',
 'expens',
 'face',
 'fair',
 'far',
 'fast',
 'featur',
 'feel',
 'fell',
 'fine',
 'finish',
 'fit',
 'function',
 'gave',
 'gift',
 'gold',
 'good',
 'got',
 'great',
 'hand',
 'happi',
 'hard',
 'heavi',
 'high',
 'hold',


In [None]:
tfidf_matrix

<1000x241 sparse matrix of type '<class 'numpy.float64'>'
	with 7377 stored elements in Compressed Sparse Row format>

# Part 4: K-means clustering

In [None]:
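# k-means clustering
from sklearn.cluster import KMeans

# number of clusters
num_clusters = 5

# fit k-means on the tf-idf matrix
# no random_state is set, so cluster labels can differ between runs
km = KMeans(n_clusters=num_clusters)
km.fit(tfidf_matrix)

clusters = km.labels_.tolist()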
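
For intuition about what `km.fit` does, here is a toy version of the assign/update loop (Lloyd's algorithm) on six 2-D points. It is only a sketch: the function names and the points are made up, and the starting centroids are fixed by hand, whereas scikit-learn's `KMeans` defaults to k-means++ initialization.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of its points (unchanged if it has none)."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def toy_kmeans(points, centroids, max_iter=100):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = toy_kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the notebook the "points" are the 241-dimensional tf-idf rows, so each centroid is a vector of average term weights.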

## 4.1. Analyze K-means Result

In [None]:
# create a DataFrame pairing each review with its cluster label
product = { 'review': df[:1000].review_body, 'cluster': clusters}
frame = pd.DataFrame(product, columns = ['review', 'cluster'])

In [None]:
frame.head(10)

Unnamed: 0,review,cluster
0,Absolutely love this watch! Get compliments al...,4
1,I love this watch it keeps time wonderfully.,4
2,Scratches,0
3,"It works well on me. However, I found cheaper ...",0
4,Beautiful watch face. The band looks nice all...,0
5,"i love this watch for my purpose, about the pe...",4
6,"for my wife and she loved it, looks great and ...",2
7,I was about to buy this thinking it was a Swis...,0
8,Watch is perfect. Rugged with the metal &#34;B...,2
9,Great quality and build.<br />The motors are r...,2


In [None]:
print ("Number of reviews included in each cluster:")
frame['cluster'].value_counts().to_frame()

Number of reviews included in each cluster:


Unnamed: 0,cluster
0,649
4,108
2,105
1,71
3,67


In [None]:
km.cluster_centers_

array([[0.00566771, 0.00443746, 0.00384245, ..., 0.00654075, 0.01754582,
        0.01324199],
       [0.        , 0.        , 0.        , ..., 0.        , 0.00918964,
        0.        ],
       [0.00305421, 0.        , 0.        , ..., 0.00201562, 0.00354235,
        0.02149202],
       [0.        , 0.        , 0.        , ..., 0.        , 0.00710388,
        0.        ],
       [0.        , 0.04175663, 0.        , ..., 0.0125092 , 0.01747128,
        0.00404807]])

In [None]:
print ("<Document clustering result by K-means>")

# km.cluster_centers_ holds the weight of each term in each centroid.
# Sort each row in decreasing order and take the top k terms.
order_centroids = km.cluster_centers_.argsort()[:, ::-1] 

Cluster_keywords_summary = {}
for i in range(num_clusters):
    print ("Cluster " + str(i) + " words:", end='')
    Cluster_keywords_summary[i] = []
    for ind in order_centroids[i, :6]: #replace 6 with n words per cluster
        Cluster_keywords_summary[i].append(tf_selected_words[ind])
        print (tf_selected_words[ind] + ",", end='')
    print ()
    
    cluster_reviews = frame[frame.cluster==i].review.tolist()
    print ("Cluster " + str(i) + " reviews (" + str(len(cluster_reviews)) + " reviews): ")
    print (", ".join(cluster_reviews))
    print ()

<Document clustering result by K-means>
Cluster 0 words:watch,like,look,band,work,time,
Cluster 0 reviews (649 reviews): 
Scratches, It works well on me. However, I found cheaper prices in other places after making the purchase, Beautiful watch face.  The band looks nice all around.  The links do make that squeaky cheapo noise when you swing it back and forth on your wrist which can be embarrassing in front of watch enthusiasts.  However, to the naked eye from afar, you can't tell the links are cheap or folded because it is well polished and brushed and the folds are pretty tight for the most part.<br /><br />I love the new member of my collection and it looks great.  I've had it for about a week and so far it has kept good time despite day 1 which is typical of a new mechanical watch, I was about to buy this thinking it was a Swiss Army Infantry watch-- the description uses the words infantry and army--- when I realized it must be a fraud for $12.00. This should not be offered on Amaz
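
The keyword lists above come from ranking each centroid's term weights in decreasing order and keeping the first few indices. A pure-Python sketch of that ranking step, using a hypothetical five-term vocabulary and made-up centroid weights:

```python
# Hypothetical vocabulary and centroid weights, for illustration only.
terms = ["band", "love", "time", "watch", "work"]
centroid = [0.05, 0.30, 0.10, 0.40, 0.02]

# Indices of the weights from largest to smallest,
# the same ordering that argsort()[::-1] gives when there are no ties.
order = sorted(range(len(centroid)), key=lambda j: centroid[j], reverse=True)

top_terms = [terms[j] for j in order[:3]]
print(top_terms)  # ['watch', 'love', 'time']
```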

# Part 5: Topic Modeling - Latent Dirichlet Allocation

In [None]:
# Use LDA for clustering
from sklearn.decomposition import LatentDirichletAllocation
lda = LatentDirichletAllocation(n_components=5)

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
# LDA expects raw term counts, so use CountVectorizer instead of tf-idf weights
# (the tfidf_ prefix in the variable names is kept only for continuity)
tfidf_model_lda = CountVectorizer(max_df=0.99, max_features=500,
                                 min_df=0.01, stop_words='english',
                                 tokenizer=tokenization_and_stemming, ngram_range=(1,1))

tfidf_matrix_lda = tfidf_model_lda.fit_transform(data) # fit the vectorizer to the reviews

print ("In total, there are " + str(tfidf_matrix_lda.shape[0]) + \
      " reviews and " + str(tfidf_matrix_lda.shape[1]) + " terms.")


In total, there are 1000 reviews and 241 terms.


In [None]:
# document topic matrix for tfidf_matrix_lda
lda_output = lda.fit_transform(tfidf_matrix_lda)
print(lda_output.shape)
print(lda_output)

(1000, 5)
[[0.02528738 0.02548483 0.02509881 0.02519328 0.89893569]
 [0.05111851 0.05232907 0.05030722 0.05071231 0.79553288]
 [0.2        0.2        0.2        0.2        0.2       ]
 ...
 [0.10012728 0.59691765 0.10209538 0.10085503 0.10000467]
 [0.05127899 0.05085008 0.05049291 0.79633507 0.05104296]
 [0.04099129 0.04334694 0.0424227  0.83184234 0.04139673]]


In [None]:
# topics and words matrix
topic_word = lda.components_
print(topic_word.shape)
print(topic_word)

(5, 241)
[[ 1.48955293  0.20156742  6.78415769 ...  0.22985472  3.30585497
   0.20059615]
 [ 0.20098056  0.20330722  1.35484628 ...  1.67721118  6.96539177
   0.20303351]
 [ 0.20337886  0.20217454  0.20348286 ...  0.20124769  0.20108616
  14.058465  ]
 [ 0.22787056  0.20474323  4.45150978 ...  4.03325327 31.25251035
  21.71407632]
 [14.8782171  17.18820758  0.20600339 ... 11.85843314 30.27515676
  20.82382902]]


In [None]:
# column names
topic_names = ["Topic" + str(i) for i in range(lda.n_components)]

# index names
doc_names = ["Doc" + str(i) for i in range(len(data))]

df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns=topic_names, index=doc_names)

# get dominant topic for each document
topic = np.argmax(df_document_topic.values, axis=1)
df_document_topic['topic'] = topic

df_document_topic.head(10)

Unnamed: 0,Topic0,Topic1,Topic2,Topic3,Topic4,topic
Doc0,0.03,0.03,0.03,0.03,0.9,4
Doc1,0.05,0.05,0.05,0.05,0.8,4
Doc2,0.2,0.2,0.2,0.2,0.2,0
Doc3,0.22,0.03,0.21,0.03,0.51,4
Doc4,0.01,0.24,0.15,0.38,0.23,3
Doc5,0.04,0.04,0.04,0.04,0.84,4
Doc6,0.03,0.89,0.03,0.03,0.03,1
Doc7,0.03,0.03,0.03,0.03,0.88,4
Doc8,0.01,0.23,0.01,0.01,0.74,4
Doc9,0.03,0.56,0.36,0.03,0.03,1
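
The `topic` column is just the position of the largest weight in each row. The sketch below repeats that step in pure Python on three rows copied from the table above. Ties go to the first index, as with `np.argmax`, which is why the all-0.2 row for Doc2 lands on topic 0. My guess is that Doc2 ("Scratches") has no tokens left in the vocabulary, so LDA falls back to a uniform distribution, but I have not verified that here.

```python
# Rows copied from the document-topic table above.
doc_topic = {
    "Doc0": [0.03, 0.03, 0.03, 0.03, 0.9],
    "Doc2": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Doc4": [0.01, 0.24, 0.15, 0.38, 0.23],
}

def dominant_topic(weights):
    """Index of the largest weight; the first index wins on ties."""
    return max(range(len(weights)), key=lambda i: weights[i])

dominant = {doc: dominant_topic(weights) for doc, weights in doc_topic.items()}
print(dominant)  # {'Doc0': 4, 'Doc2': 0, 'Doc4': 3}
```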


In [None]:
df_document_topic['topic'].value_counts().to_frame()

Unnamed: 0,topic
1,337
4,206
3,171
0,157
2,129


In [None]:
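# print the raw topic-word matrix
print(lda.components_)
# wrap it in a DataFrame
df_topic_words = pd.DataFrame(lda.components_)

# column and index
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
df_topic_words.columns = tfidf_model_lda.get_feature_names_out()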
df_topic_words.index = topic_names

df_topic_words.head()

[[ 1.48955293  0.20156742  6.78415769 ...  0.22985472  3.30585497
   0.20059615]
 [ 0.20098056  0.20330722  1.35484628 ...  1.67721118  6.96539177
   0.20303351]
 [ 0.20337886  0.20217454  0.20348286 ...  0.20124769  0.20108616
  14.058465  ]
 [ 0.22787056  0.20474323  4.45150978 ...  4.03325327 31.25251035
  21.71407632]
 [14.8782171  17.18820758  0.20600339 ... 11.85843314 30.27515676
  20.82382902]]


Unnamed: 0,abl,absolut,accur,actual,adjust,alarm,alreadi,alway,amaz,amazon,anoth,arm,arriv,automat,awesom,bad,band,batteri,beauti,best,better,big,bit,black,blue,bought,box,bracelet,brand,break,bright,broke,button,buy,ca,came,case,casio,chang,cheap,...,star,start,stop,strap,sturdi,style,stylish,super,sure,surpris,swim,tell,thank,thing,think,thought,time,timex,tini,tri,turn,use,valu,ve,want,watch,water,way,wear,week,weight,went,wife,wind,wish,work,worn,worth,wrist,year
Topic0,1.489553,0.201567,6.784158,6.431502,8.941229,0.200115,1.433146,0.205697,0.200948,3.374795,0.201597,0.204448,11.658968,9.551758,24.128459,1.541368,8.256848,0.200333,2.893363,3.628799,3.375022,0.20335,0.203841,0.20141,0.201208,2.832164,0.201896,0.201536,1.287251,3.534081,0.204469,1.235966,0.202509,0.207206,0.20202,3.779288,20.524243,0.202443,3.441039,1.687549,...,7.882321,0.202712,0.200929,0.202429,0.232643,1.827032,2.379618,0.204952,1.670934,0.200949,5.742681,3.033923,5.66923,0.465373,0.204797,1.872493,41.443296,0.201702,0.20208,0.200894,0.200142,6.77588,4.401179,2.914255,0.202369,128.065289,25.522102,0.200354,8.3711,10.909778,3.977984,2.854132,0.203582,0.202589,1.568811,0.20139,0.202798,0.229855,3.305855,0.200596
Topic1,0.200981,0.203307,1.354846,9.052364,0.408312,0.201234,0.20353,2.964377,16.778172,0.202183,0.201853,0.201639,0.202916,6.171702,0.202742,7.663402,1.169036,0.20226,50.429438,0.203639,8.585147,40.207576,1.754346,1.701054,4.238251,14.904114,1.525399,2.117813,6.805613,0.202396,0.202292,0.200011,0.206161,6.873595,3.912153,0.817406,4.21914,0.201033,0.201286,14.720923,...,3.652592,0.200418,0.202471,28.232044,1.380548,5.64542,9.845513,15.19126,0.204073,0.201436,3.761587,11.714502,5.936599,8.732535,6.198,6.052254,21.502884,0.206186,0.200579,0.200717,0.922473,6.115451,13.641247,0.203099,7.605622,231.755901,0.201672,6.89456,29.807224,0.202163,7.899745,1.558369,20.192149,0.200945,0.20028,0.323317,3.406673,1.677211,6.965392,0.203034
Topic2,0.203379,0.202175,0.203483,0.290484,0.204802,16.191115,0.689892,0.204013,2.956527,0.204416,0.202683,0.204343,0.204612,0.200461,0.202487,16.38922,0.201579,35.724434,1.236274,0.201001,0.202185,0.205396,0.204452,0.206906,3.631628,6.390736,0.200827,2.343562,2.637571,0.201464,0.200008,15.314978,6.374212,2.036674,0.902155,0.201404,0.200794,0.202399,12.735315,0.202061,...,4.763152,0.205324,18.131251,0.201639,0.202216,1.824378,1.462987,0.200019,1.58082,11.85705,0.205806,1.686112,0.201041,0.201834,10.340087,1.975684,0.205208,1.325668,0.200748,19.105545,0.203581,35.552838,0.206243,2.875721,5.354672,58.899364,8.870981,0.205902,6.9321,25.625737,0.201311,0.201974,0.201214,0.200005,0.201207,96.703642,0.200258,0.201248,0.201086,14.058465
Topic3,0.227871,0.204743,4.45151,0.203751,0.203657,0.205898,5.31744,5.362094,1.531501,3.528443,5.844541,10.099804,0.203012,0.20244,0.200009,0.202146,177.327099,1.007784,0.20441,6.852685,6.747343,3.05041,6.076407,9.77146,1.994485,0.202468,12.862083,0.202598,3.063477,1.580137,0.202987,4.046757,0.20214,18.077874,0.202477,21.228011,1.438341,18.191652,2.484291,21.186488,...,8.118322,4.092488,0.201026,0.204093,0.211958,0.201485,0.564692,0.20208,4.32055,0.251925,0.200009,2.009342,9.99137,0.204244,5.735165,6.895808,20.1341,12.481187,0.201264,1.849366,0.203197,12.514706,2.544226,9.965655,16.690539,245.308241,0.202239,8.980404,18.803269,13.000359,0.2049,8.473102,0.200765,0.200858,0.200882,15.435702,0.201607,4.033253,31.25251,21.714076
Topic4,14.878217,17.188208,0.206003,1.021899,15.242001,0.201638,5.355992,5.263819,0.532851,20.690164,21.549326,5.289765,9.730491,1.873639,0.266303,0.203863,1.045438,10.865189,25.236514,5.113877,11.090303,17.333268,14.760954,14.11917,20.934429,25.670518,0.209795,9.134491,0.206088,9.481921,10.190244,4.202288,10.014978,33.804651,21.781194,6.97389,3.617481,0.202473,1.13807,0.202978,...,6.583613,11.299058,10.264323,19.159795,9.972636,21.501685,5.74719,0.20169,14.223623,0.488639,5.089918,4.556122,0.201761,15.396014,14.521951,0.203761,105.714512,7.785256,12.195329,0.643478,12.470606,23.041125,0.207105,26.04127,18.146799,303.971204,0.203006,1.71878,48.086306,1.261963,3.71606,0.912423,0.20229,14.195603,9.82882,21.335949,9.988663,11.858433,30.275157,20.823829
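
The values in this table are unnormalized weights (pseudo-counts), not probabilities. Dividing each row by its sum turns it into a distribution over words for that topic. A small sketch of that normalization, using only three of Topic2's 241 columns copied from the table, so the resulting shares are relative to those three words rather than true topic probabilities:

```python
# Three entries copied from the Topic2 row above (a subset, for illustration).
topic2_weights = {"work": 96.703642, "watch": 58.899364, "batteri": 35.724434}

# Normalize so the weights sum to 1.
total = sum(topic2_weights.values())
topic2_share = {word: weight / total for word, weight in topic2_weights.items()}

for word, share in sorted(topic2_share.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {share:.3f}")
```

Applied to the full matrix, the same idea is `lda.components_ / lda.components_.sum(axis=1)[:, np.newaxis]`.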


In [None]:
# collect the top n keywords for each topic
def print_topic_words(tfidf_model, lda_model, n_words):
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    words = np.array(tfidf_model.get_feature_names_out())
    topic_words = []
    # each row of components_ holds one topic's word weights
    for topic_words_weights in lda_model.components_:
        # indices of the n_words largest weights, in decreasing order
        top_words = topic_words_weights.argsort()[::-1][:n_words]
        topic_words.append(words.take(top_words))
    return topic_words

topic_keywords = print_topic_words(tfidf_model=tfidf_model_lda, lda_model=lda, n_words=15)        

df_topic_words = pd.DataFrame(topic_keywords)
df_topic_words.columns = ['Word '+str(i) for i in range(df_topic_words.shape[1])]
df_topic_words.index = ['Topic '+str(i) for i in range(df_topic_words.shape[0])]
df_topic_words

Unnamed: 0,Word 0,Word 1,Word 2,Word 3,Word 4,Word 5,Word 6,Word 7,Word 8,Word 9,Word 10,Word 11,Word 12,Word 13,Word 14
Topic 0,watch,nice,price,time,excel,water,awesom,fast,case,pleas,happi,product,day,ship,got
Topic 1,watch,great,look,love,good,nice,beauti,big,realli,price,light,wear,gift,like,qualiti
Topic 2,work,watch,like,product,good,batteri,use,pretti,week,color,look,month,tri,stop,realli
Topic 3,watch,band,look,like,perfect,replac,size,small,wrist,fit,good,qualiti,expect,leather,order
Topic 4,watch,time,love,hand,day,wear,make,purchas,color,like,look,date,buy,second,dial
