# **Google Reviews Analytics: Topic Modeling of Domino's Reviews Using LDA**

Steps of LDA (Latent Dirichlet Allocation):

1. Choose the number of topics (k) to extract from the corpus.

2. Preprocess the review corpus by removing stop words and punctuation, and by reducing words to their root forms with stemming or lemmatization.

3. Create a vocabulary list of all unique words in the corpus.

4. Convert each review into a bag-of-words representation, in which each word is represented by its index in the vocabulary list and its count in the review.

5. Initialize the model by randomly assigning each word in each review to one of the k topics.

6. For each review r in the corpus, iterate through each word w in r and compute the probability of each of the k topics for w, given the current topic assignments of all other words in the review and the current topic-word distribution.

7. Sample a new topic assignment for word w from the probability distribution computed in step 6.

8. Repeat steps 6 and 7 for all reviews in the corpus until convergence.

9. Output the topic-word distribution and the document-topic distribution as the final result.
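The steps above describe a collapsed Gibbs sampler. As a rough illustration only, they can be sketched in plain Python as follows. The function name `gibbs_lda`, the toy documents, and the values of `alpha`, `beta`, and `n_iter` are assumptions made for this sketch, and a fixed number of sweeps stands in for a real convergence check. Note that scikit-learn's `LatentDirichletAllocation`, used later in this notebook, fits the model with variational Bayes rather than with this sampler.

```python
import random


def gibbs_lda(docs, k, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)

    # Step 3: vocabulary of unique words, each mapped to an integer id.
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)

    # Step 4: each document as a sequence of word ids
    # (equivalent information to id/count pairs).
    ids = [[word_id[w] for w in doc] for doc in docs]

    doc_topic = [[0] * k for _ in ids]        # words in doc d assigned to topic t
    topic_word = [[0] * v for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                     # total words assigned to topic t

    # Step 5: random initial topic for every word occurrence.
    assign = []
    for d, doc in enumerate(ids):
        z_d = []
        for w in doc:
            z = rng.randrange(k)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    # Step 8: repeat the sweep a fixed number of times.
    for _ in range(n_iter):
        for d, doc in enumerate(ids):
            for n, w in enumerate(doc):
                # Remove the current assignment of this occurrence from the counts.
                z = assign[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Step 6: unnormalized probability of each topic for this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + v * beta)
                    for t in range(k)
                ]

                # Step 7: sample a new topic and restore the counts.
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Step 9: smoothed, normalized topic-word and document-topic distributions.
    phi = [[(topic_word[t][w] + beta) / (topic_total[t] + v * beta)
            for w in range(v)] for t in range(k)]
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + k * alpha)
              for t in range(k)] for d, doc in enumerate(ids)]
    return vocab, phi, theta


# Hypothetical, already-tokenized reviews.
docs = [
    ["pizza", "tasty", "pizza"],
    ["service", "slow", "staff"],
    ["tasty", "pizza", "cheese"],
    ["staff", "service", "friendly"],
]
vocab, phi, theta = gibbs_lda(docs, k=2)
for t, row in enumerate(phi):
    top = sorted(zip(vocab, row), key=lambda x: x[1], reverse=True)[:3]
    print(f"Topic {t}:", [w for w, _ in top])
```

Each row of `phi` is one topic's distribution over the vocabulary, and each row of `theta` is one review's distribution over topics, so every row sums to 1.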


In [None]:
!pip install nltk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

stop_words = set(stopwords.words('english'))

In [None]:
import pandas as pd
df = pd.read_csv('/content/google_reviews.csv')
df

Unnamed: 0,Text
0,Service is not good as wanted. Very limited st...
1,Dominos has become every ones choice because o...
2,As u know dominos is favorite for all of us.\n...
3,"Small place,\nGood staff\nGot my order very qu..."
4,Less Crowded. A small place with limited Seats...
...,...
315,Too little space to stand or sit
316,Nice place for having food
317,Good place to have pizza
318,Seating space is very small.


In [None]:
from nltk import word_tokenize

In [None]:
import nltk
from nltk.corpus import sentiwordnet as swn
from nltk.tag import pos_tag, map_tag
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

In [None]:
import nltk
nltk.download('omw-1.4')
  

[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [None]:
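# Requires the NLTK 'punkt' and 'wordnet' resources in addition to the downloads above.
def clean_text(review):
    le = WordNetLemmatizer()
    word_tokens = word_tokenize(review)
    # Keep tokens longer than 3 characters that are not stop words, then lemmatize them.
    # The stop-word check is case-sensitive, so capitalized words such as "Very" are kept.
    tokens = [le.lemmatize(w) for w in word_tokens if w not in stop_words and len(w) > 3]
    cleaned_text = " ".join(tokens)
    return cleaned_text

df['cleaned_text'] = df['Text'].apply(clean_text)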
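`clean_text` compares each token with the lowercase NLTK stop list without lowercasing it first, so capitalized stop words survive the filter. A minimal sketch of a case-insensitive variant is shown below. The function `filter_tokens`, the tiny `DEMO_STOP_WORDS` set, and the regex tokenizer are assumptions made for illustration; the sketch skips lemmatization so that it runs without any NLTK downloads.

```python
import re

# Tiny hand-written stop list for illustration; the notebook uses NLTK's English list.
DEMO_STOP_WORDS = {"this", "very", "they", "with", "have", "that"}


def filter_tokens(review, stop_words=DEMO_STOP_WORDS, min_len=4):
    """Lowercase the review, split it into letter runs, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


print(filter_tokens("Very small place. They have good pizza!"))
# → ['small', 'place', 'good', 'pizza']
```

Lowercasing before the lookup also merges variants such as "Good" and "good" into a single vocabulary entry.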

TF-IDF vectorization of the cleaned text

In [None]:
# Recent scikit-learn releases require stop_words to be a list rather than a set
vect = TfidfVectorizer(stop_words=list(stop_words), max_features=1000)
vect_text = vect.fit_transform(df['cleaned_text'])

In [None]:
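from sklearn.decomposition import LatentDirichletAllocation

# 7 topics, online variational Bayes; max_iter=1 makes a single pass over the reviews
lda_model = LatentDirichletAllocation(n_components=7, learning_method='online',
                                      random_state=42, max_iter=1)
# lda_top[i] is the topic distribution of review i
lda_top = lda_model.fit_transform(vect_text)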

In [None]:
vocab = vect.get_feature_names_out()
for i, comp in enumerate(lda_model.components_):
    # Pair each term with its weight in topic i and keep the 7 heaviest terms
    top_terms = sorted(zip(vocab, comp), key=lambda x: x[1], reverse=True)[:7]
    print(f"Topic {i}: " + " ".join(term for term, _ in top_terms))

Topic 0: tasty pizza mosquito like domino aswome le
Topic 1: good service place taste pizza nice small
Topic 2: working love clean class good ambience food
Topic 3: yummy delicious pizza family nice delivered minimum
Topic 4: always pizza amazing dine dominos open great
Topic 5: nice pizza outlet food domino place small
Topic 6: place much awesome customer pizza arrangement nice


In [None]:
print("Review 0: ") #Service is not good as wanted. Very limited staff.otherwise food quality is good.
for i,topic in enumerate(lda_top[0]):
  print("Topic ",i,": ",topic*100,"%")

Review 0: 
Topic  0 :  3.828965764916065 %
Topic  1 :  36.83632966998653 %
Topic  2 :  3.830523458616287 %
Topic  3 :  3.8287345388726792 %
Topic  4 :  3.83063459680743 %
Topic  5 :  44.01734912097454 %
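To summarize a review by its single most likely topic, one option is to take the index of the largest share. The sketch below uses the Review 0 shares from the output above, rounded to four decimals; the helper name `dominant_topic` is an assumption made for illustration.

```python
# Topic shares for Review 0, rounded from the output above.
review_0 = [0.0383, 0.3684, 0.0383, 0.0383, 0.0383, 0.4402, 0.0383]


def dominant_topic(dist):
    """Return the index of the topic with the largest share."""
    return max(range(len(dist)), key=lambda i: dist[i])


top = dominant_topic(review_0)
print(f"Dominant topic: {top} ({review_0[top]:.1%})")
# → Dominant topic: 5 (44.0%)
```

For the whole corpus, `lda_top.argmax(axis=1)` gives the same index for every review at once, since `lda_top` is a NumPy array with one row per review.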
Topic  6 :  3.8274628498264636 %


In [None]:
print("Review 1: ") # Dominos has become every ones choice because of its affordable range of pizzas and variety of options available. This place is small for dine in but service and taste of pizza is maintained. Big groups might not get big place to sit but u can always order and pickup ur orders
for i,topic in enumerate(lda_top[1]):
  print("Topic ",i,": ",topic*100,"%")


# Topics Visualization

In [None]:
import pyLDAvis
import pyLDAvis.gensim_models  # named pyLDAvis.gensim in pyLDAvis 2.x
from gensim import corpora
from gensim.models import LdaModel

In [None]:
reviews = df['cleaned_text']

In [None]:
# Split each cleaned review into its own list of word tokens
texts = [text.split() for text in reviews]

In [None]:
dictionary1 = corpora.Dictionary(texts)

In [None]:
print(dictionary1)

In [None]:
corpus = [dictionary1.doc2bow(text) for text in texts]

In [None]:
# Assumption: show_topic and pyLDAvis need a gensim LdaModel, which no cell here fits,
# so one is trained on the corpus. passes and random_state are assumed values.
gensim_lda = LdaModel(corpus=corpus, id2word=dictionary1, num_topics=7,
                      passes=10, random_state=42)

In [None]:
word_dict = {}
for i in range(7):
    # show_topic returns (word, probability) pairs; keep only the words
    words = gensim_lda.show_topic(i, topn=20)
    word_dict['Topic # ' + '{:02d}'.format(i + 1)] = [word for word, _ in words]
pd.DataFrame(word_dict)

In [None]:
lda_display = pyLDAvis.gensim_models.prepare(gensim_lda, corpus, dictionary1, sort_topics=True)
pyLDAvis.display(lda_display)

