# Anderson Yoshizato - Cohort 10 - Jun/2021


# Project 1

DESCRIPTION

Help a leading mobile brand understand the voice of the customer by analyzing the reviews of its product on Amazon and the topics that customers are talking about. You will perform topic modeling on specific parts of speech and then interpret the emerging topics.

Problem Statement: 

A popular mobile phone brand, Lenovo, has launched its budget smartphone in the Indian market. The client wants to understand the VOC (voice of the customer) on the product. This will be useful not just to evaluate the current product but also to get some direction for developing the product pipeline. The client is particularly interested in the different aspects that customers care about. Product reviews by customers on a leading e-commerce site should provide a good view.

Domain: Amazon reviews for a leading phone brand

Analysis to be done: POS tagging, topic modeling using LDA, and topic interpretation

Content: 

Dataset: ‘K8 Reviews v0.2.csv’

Columns:

Sentiment: The sentiment against the review (4,5 star reviews are positive, 1,2 are negative)

Reviews: The main text of the review

Steps to perform:

Discover the topics in the reviews and present them to the business in a consumable format. Employ techniques in syntactic processing and topic modeling.

Perform specific cleanup, POS tagging, and restriction to relevant POS tags, then perform topic modeling using LDA. Finally, give business-friendly names to the topics and make a table for the business.

In [1]:
import pandas as pd
import nltk

### Read the .csv file using Pandas. Take a look at the top few records.

In [2]:
review = pd.read_csv('K8 Reviews v0.2.csv')

In [3]:
review.shape

(14675, 2)

In [4]:
review.head(5)

Unnamed: 0,sentiment,review
0,1,Good but need updates and improvements
1,0,"Worst mobile i have bought ever, Battery is dr..."
2,1,when I will get my 10% cash back.... its alrea...
3,1,Good
4,0,The worst phone everThey have changed the last...


In [5]:
review.columns

Index(['sentiment', 'review'], dtype='object')

In [6]:
review.sentiment.value_counts(normalize=True)

0    0.52552
1    0.47448
Name: sentiment, dtype: float64

In [7]:
review.isnull().sum()

sentiment    0
review       0
dtype: int64

In [8]:
review.review.sample().values[0]

'Very poor battery life and no quick chargeDual camera is just for sake of name worst camera'

### Normalize the casing of the review text and extract the text into a list for easier manipulation.

In [9]:
review0 = review.review.values
len(review0)

14675

In [10]:
review0[:5]

array(['Good but need updates and improvements',
       "Worst mobile i have bought ever, Battery is draining like hell, backup is only 6 to 7 hours with internet uses, even if I put mobile idle its getting discharged.This is biggest lie from Amazon & Lenove which is not at all expected, they are making full by saying that battery is 4000MAH & booster charger is fake, it takes at least 4 to 5 hours to be fully charged.Don't know how Lenovo will survive by making full of us.Please don;t go for this else you will regret like me.",
       'when I will get my 10% cash back.... its already 15 January..',
       'Good',
       'The worst phone everThey have changed the last phone but the problem is still same and the amazon is not returning the phone .Highly disappointing of amazon'],
      dtype=object)

In [11]:
review_lwr = [rev.lower() for rev in review0]

In [12]:
review_lwr[:10]

['good but need updates and improvements',
 "worst mobile i have bought ever, battery is draining like hell, backup is only 6 to 7 hours with internet uses, even if i put mobile idle its getting discharged.this is biggest lie from amazon & lenove which is not at all expected, they are making full by saying that battery is 4000mah & booster charger is fake, it takes at least 4 to 5 hours to be fully charged.don't know how lenovo will survive by making full of us.please don;t go for this else you will regret like me.",
 'when i will get my 10% cash back.... its already 15 january..',
 'good',
 'the worst phone everthey have changed the last phone but the problem is still same and the amazon is not returning the phone .highly disappointing of amazon',
 "only i'm telling don't buyi'm totally disappointedpoor batterypoor camerawaste of money",
 'phone is awesome. but while charging, it heats up allot..really a genuine reason to hate lenovo k8 note',
 'the battery level has worn down',
 "it'

### Tokenize the reviews using NLTK's word_tokenize function.

In [13]:
from nltk.tokenize import word_tokenize

In [14]:
tkn = word_tokenize(review_lwr[0])

In [15]:
tkn

['good', 'but', 'need', 'updates', 'and', 'improvements']

In [16]:
review_token = [word_tokenize(rev) for rev in review_lwr]

In [17]:
print(review_token[1])

['worst', 'mobile', 'i', 'have', 'bought', 'ever', ',', 'battery', 'is', 'draining', 'like', 'hell', ',', 'backup', 'is', 'only', '6', 'to', '7', 'hours', 'with', 'internet', 'uses', ',', 'even', 'if', 'i', 'put', 'mobile', 'idle', 'its', 'getting', 'discharged.this', 'is', 'biggest', 'lie', 'from', 'amazon', '&', 'lenove', 'which', 'is', 'not', 'at', 'all', 'expected', ',', 'they', 'are', 'making', 'full', 'by', 'saying', 'that', 'battery', 'is', '4000mah', '&', 'booster', 'charger', 'is', 'fake', ',', 'it', 'takes', 'at', 'least', '4', 'to', '5', 'hours', 'to', 'be', 'fully', 'charged.do', "n't", 'know', 'how', 'lenovo', 'will', 'survive', 'by', 'making', 'full', 'of', 'us.please', 'don', ';', 't', 'go', 'for', 'this', 'else', 'you', 'will', 'regret', 'like', 'me', '.']
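The token list above still contains fused tokens such as 'discharged.this' and 'us.please', because reviewers often left out the space after a full stop. A minimal sketch of a cleanup that could run on the raw text before lowercasing and tokenizing; the helper name `add_missing_spaces` and both regular expressions are illustrative assumptions, not part of the original notebook:

```python
import re

# A sentence mark followed directly by a letter, as in "phone.Highly".
_PUNCT_THEN_LETTER = re.compile(r"([.,!?])(?=[A-Za-z])")
# A lowercase letter followed directly by an uppercase one, as in "everThey".
_LOWER_THEN_UPPER = re.compile(r"([a-z])(?=[A-Z])")

def add_missing_spaces(text):
    """Insert the spaces that were left out between sentences."""
    text = _PUNCT_THEN_LETTER.sub(r"\1 ", text)
    return _LOWER_THEN_UPPER.sub(r"\1 ", text)

sample = "The worst phone everThey have changed the last phone.Highly disappointing"
print(add_missing_spaces(sample))
```

The same rules would also split abbreviations such as "e.g." and camel-case names such as "iPhone", so the result is worth checking on a sample of reviews before adopting it.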


### Perform parts-of-speech tagging on each sentence using the NLTK POS tagger.

In [18]:
review_tag = [nltk.pos_tag(tokens) for tokens in review_token]

In [19]:
print(review_tag[0])

[('good', 'JJ'), ('but', 'CC'), ('need', 'VBP'), ('updates', 'NNS'), ('and', 'CC'), ('improvements', 'NNS')]


### For the topic model, we want to include only nouns.

### Find out all the POS tags that correspond to nouns.

### Limit the data to only terms with these tags.

In [20]:
(review_tag)

[[('good', 'JJ'),
  ('but', 'CC'),
  ('need', 'VBP'),
  ('updates', 'NNS'),
  ('and', 'CC'),
  ('improvements', 'NNS')],
 [('worst', 'JJS'),
  ('mobile', 'NN'),
  ('i', 'NN'),
  ('have', 'VBP'),
  ('bought', 'VBN'),
  ('ever', 'RB'),
  (',', ','),
  ('battery', 'NN'),
  ('is', 'VBZ'),
  ('draining', 'VBG'),
  ('like', 'IN'),
  ('hell', 'NN'),
  (',', ','),
  ('backup', 'NN'),
  ('is', 'VBZ'),
  ('only', 'RB'),
  ('6', 'CD'),
  ('to', 'TO'),
  ('7', 'CD'),
  ('hours', 'NNS'),
  ('with', 'IN'),
  ('internet', 'JJ'),
  ('uses', 'NNS'),
  (',', ','),
  ('even', 'RB'),
  ('if', 'IN'),
  ('i', 'JJ'),
  ('put', 'VBP'),
  ('mobile', 'JJ'),
  ('idle', 'NN'),
  ('its', 'PRP$'),
  ('getting', 'VBG'),
  ('discharged.this', 'NN'),
  ('is', 'VBZ'),
  ('biggest', 'JJS'),
  ('lie', 'NN'),
  ('from', 'IN'),
  ('amazon', 'NN'),
  ('&', 'CC'),
  ('lenove', 'NN'),
  ('which', 'WDT'),
  ('is', 'VBZ'),
  ('not', 'RB'),
  ('at', 'IN'),
  ('all', 'DT'),
  ('expected', 'VBN'),
  (',', ','),
  ('they', 'PRP'),


In [21]:
# Keep only tokens whose Penn Treebank tag is a noun tag (NN, NNS, NNP, NNPS).
review_nouns=[]
for txt in review_tag:
    review_nouns.append([(word, tag) for word, tag in txt if tag.startswith('NN')])
review_nouns

[[('updates', 'NNS'), ('improvements', 'NNS')],
 [('mobile', 'NN'),
  ('i', 'NN'),
  ('battery', 'NN'),
  ('hell', 'NN'),
  ('backup', 'NN'),
  ('hours', 'NNS'),
  ('uses', 'NNS'),
  ('idle', 'NN'),
  ('discharged.this', 'NN'),
  ('lie', 'NN'),
  ('amazon', 'NN'),
  ('lenove', 'NN'),
  ('battery', 'NN'),
  ('charger', 'NN'),
  ('hours', 'NNS'),
  ('don', 'NN')],
 [('i', 'NN'), ('%', 'NN'), ('cash', 'NN'), ('..', 'NN')],
 [],
 [('phone', 'NN'),
  ('everthey', 'NN'),
  ('phone', 'NN'),
  ('problem', 'NN'),
  ('amazon', 'NN'),
  ('phone', 'NN'),
  ('amazon', 'NN')],
 [('camerawaste', 'NN'), ('money', 'NN')],
 [('phone', 'NN'),
  ('allot', 'NN'),
  ('..', 'NNP'),
  ('reason', 'NN'),
  ('k8', 'NNS')],
 [('battery', 'NN'), ('level', 'NN')],
 [('problems', 'NNS'),
  ('phone', 'NN'),
  ('hanging', 'NN'),
  ('problems', 'NNS'),
  ('note', 'NN'),
  ('station', 'NN'),
  ('ahmedabad', 'NN'),
  ('years', 'NNS'),
  ('phone', 'NN'),
  ('lenovo', 'NN')],
 [('lot', 'NN'), ('glitches', 'NNS'), ('thing',

In [22]:
review_nouns[1]

[('mobile', 'NN'),
 ('i', 'NN'),
 ('battery', 'NN'),
 ('hell', 'NN'),
 ('backup', 'NN'),
 ('hours', 'NNS'),
 ('uses', 'NNS'),
 ('idle', 'NN'),
 ('discharged.this', 'NN'),
 ('lie', 'NN'),
 ('amazon', 'NN'),
 ('lenove', 'NN'),
 ('battery', 'NN'),
 ('charger', 'NN'),
 ('hours', 'NNS'),
 ('don', 'NN')]
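The filter above relies on the fact that all Penn Treebank noun tags share the NN prefix: NN, NNS, NNP, and NNPS. A small sketch that spells the tag set out, which makes the intent easier to audit; `NOUN_TAGS` and `keep_nouns` are hypothetical names, and the tagged sample is the first review shown earlier:

```python
# The four Penn Treebank noun tags.
NOUN_TAGS = frozenset({"NN", "NNS", "NNP", "NNPS"})

def keep_nouns(tagged_tokens):
    """Return the words whose tag is one of the noun tags."""
    return [word for word, tag in tagged_tokens if tag in NOUN_TAGS]

tagged = [('good', 'JJ'), ('but', 'CC'), ('need', 'VBP'),
          ('updates', 'NNS'), ('and', 'CC'), ('improvements', 'NNS')]
print(keep_nouns(tagged))
```

The tagger is not perfect on informal text: in the output above it labels 'i', '%', and '..' as nouns, so the later stopword and punctuation steps are still needed.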

### Lemmatize
### Different forms of the terms need to be treated as one.
### No need to provide POS tag to lemmatizer for now.

In [23]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [24]:
review_lemma=[]
for txt in review_nouns:
    review_lemma.append([lemmatizer.lemmatize(word) for word, tag in txt])
review_lemma

[['update', 'improvement'],
 ['mobile',
  'i',
  'battery',
  'hell',
  'backup',
  'hour',
  'us',
  'idle',
  'discharged.this',
  'lie',
  'amazon',
  'lenove',
  'battery',
  'charger',
  'hour',
  'don'],
 ['i', '%', 'cash', '..'],
 [],
 ['phone', 'everthey', 'phone', 'problem', 'amazon', 'phone', 'amazon'],
 ['camerawaste', 'money'],
 ['phone', 'allot', '..', 'reason', 'k8'],
 ['battery', 'level'],
 ['problem',
  'phone',
  'hanging',
  'problem',
  'note',
  'station',
  'ahmedabad',
  'year',
  'phone',
  'lenovo'],
 ['lot', 'glitch', 'thing', 'option'],
 ['wrost'],
 ['phone', 'charger', 'damage', 'month'],
 ['item', 'battery', 'life'],
 ['i',
  'battery',
  'problem',
  'motherboard',
  'problem',
  'month',
  'mobile',
  'life'],
 ['phone', 'slim', 'battry', 'backup', 'screen'],
 ['headset'],
 ['time', 'i'],
 ['product',
  'prize',
  'range',
  'specification',
  'comparison',
  'mobile',
  'range',
  'i',
  'phone',
  'seal',
  'i',
  'credit',
  'card',
  'i',
  '..',
  '..
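`WordNetLemmatizer.lemmatize` treats every word as a noun unless a `pos` argument is given, which is adequate here because only noun-tagged tokens remain. If adjectives or verbs were added to the corpus later, the Penn Treebank tags would have to be translated to WordNet's single-letter codes first. A sketch of that mapping, assuming a hypothetical helper `penn_to_wordnet` whose result would be passed as `lemmatizer.lemmatize(word, pos=...)`:

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'r' or 'n')."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback

for tag in ["NNS", "VBG", "JJS", "RB", "CD"]:
    print(tag, "->", penn_to_wordnet(tag))
```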

### Remove stopwords and punctuation (if there are any).

In [25]:
from nltk.corpus import stopwords
print(stopwords.words('english'))

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

In [26]:
stop_words = set(stopwords.words('english'))
review_stop=[]
for txt in review_lemma:
    review_stop.append([word for word in txt if word not in stop_words])
review_stop

[['update', 'improvement'],
 ['mobile',
  'battery',
  'hell',
  'backup',
  'hour',
  'us',
  'idle',
  'discharged.this',
  'lie',
  'amazon',
  'lenove',
  'battery',
  'charger',
  'hour'],
 ['%', 'cash', '..'],
 [],
 ['phone', 'everthey', 'phone', 'problem', 'amazon', 'phone', 'amazon'],
 ['camerawaste', 'money'],
 ['phone', 'allot', '..', 'reason', 'k8'],
 ['battery', 'level'],
 ['problem',
  'phone',
  'hanging',
  'problem',
  'note',
  'station',
  'ahmedabad',
  'year',
  'phone',
  'lenovo'],
 ['lot', 'glitch', 'thing', 'option'],
 ['wrost'],
 ['phone', 'charger', 'damage', 'month'],
 ['item', 'battery', 'life'],
 ['battery', 'problem', 'motherboard', 'problem', 'month', 'mobile', 'life'],
 ['phone', 'slim', 'battry', 'backup', 'screen'],
 ['headset'],
 ['time'],
 ['product',
  'prize',
  'range',
  'specification',
  'comparison',
  'mobile',
  'range',
  'phone',
  'seal',
  'credit',
  'card',
  '..',
  '..',
  'deal',
  'amazon',
  '..'],
 ['battery', '..', 'solution', '

In [27]:
review_punct=[]
for txt in review_stop:
    # Keep purely alphabetic tokens; this drops punctuation such as '..' and '%',
    # and also alphanumeric tokens such as 'k8'.
    review_punct.append([word for word in txt if word.isalpha()])
review_punct

[['update', 'improvement'],
 ['mobile',
  'battery',
  'hell',
  'backup',
  'hour',
  'us',
  'idle',
  'lie',
  'amazon',
  'lenove',
  'battery',
  'charger',
  'hour'],
 ['cash'],
 [],
 ['phone', 'everthey', 'phone', 'problem', 'amazon', 'phone', 'amazon'],
 ['camerawaste', 'money'],
 ['phone', 'allot', 'reason'],
 ['battery', 'level'],
 ['problem',
  'phone',
  'hanging',
  'problem',
  'note',
  'station',
  'ahmedabad',
  'year',
  'phone',
  'lenovo'],
 ['lot', 'glitch', 'thing', 'option'],
 ['wrost'],
 ['phone', 'charger', 'damage', 'month'],
 ['item', 'battery', 'life'],
 ['battery', 'problem', 'motherboard', 'problem', 'month', 'mobile', 'life'],
 ['phone', 'slim', 'battry', 'backup', 'screen'],
 ['headset'],
 ['time'],
 ['product',
  'prize',
  'range',
  'specification',
  'comparison',
  'mobile',
  'range',
  'phone',
  'seal',
  'credit',
  'card',
  'deal',
  'amazon'],
 ['battery', 'solution', 'battery', 'life'],
 ['smartphone'],
 [],
 ['galery', 'problem', 'speaker',
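`str.isalpha()` removes punctuation tokens such as '..' and '%', but it also removes alphanumeric tokens: 'k8' is present before this step and gone after it. If model names should survive, a looser filter is one option. A sketch under that assumption; `is_wordlike` is a hypothetical helper:

```python
def is_wordlike(token):
    """True for alphanumeric tokens that contain at least one letter."""
    return token.isalnum() and any(ch.isalpha() for ch in token)

doc = ['phone', 'allot', '..', 'reason', 'k8', '%', '15']
print([tok for tok in doc if tok.isalpha()])    # filter used in the notebook
print([tok for tok in doc if is_wordlike(tok)])  # looser alternative
```

Whether keeping such tokens helps the topics is an empirical question; it would also let through tokens like '4000mah'.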

### Create a topic model using LDA on the cleaned-up data with 12 topics.

In [28]:
import gensim
from gensim import corpora

dictionary = gensim.corpora.Dictionary(review_punct)

In [29]:
count = 0
for k, v in dictionary.iteritems():
    print(k, v)
    count += 1
    if count > 25:
        break

0 improvement
1 update
2 amazon
3 backup
4 battery
5 charger
6 hell
7 hour
8 idle
9 lenove
10 lie
11 mobile
12 us
13 cash
14 everthey
15 phone
16 problem
17 camerawaste
18 money
19 allot
20 reason
21 level
22 ahmedabad
23 hanging
24 lenovo
25 note


In [30]:
review_bow_corpus = [dictionary.doc2bow(doc) for doc in review_punct]

In [31]:
review_bow_corpus[1]

[(2, 1),
 (3, 1),
 (4, 2),
 (5, 1),
 (6, 1),
 (7, 2),
 (8, 1),
 (9, 1),
 (10, 1),
 (11, 1),
 (12, 1)]

In [32]:
bow_doc = review_bow_corpus[1]
for word_id, word_count in bow_doc:
    print("Word {} (\"{}\") appears {} time.".format(word_id, dictionary[word_id], word_count))

Word 2 ("amazon") appears 1 time.
Word 3 ("backup") appears 1 time.
Word 4 ("battery") appears 2 time.
Word 5 ("charger") appears 1 time.
Word 6 ("hell") appears 1 time.
Word 7 ("hour") appears 2 time.
Word 8 ("idle") appears 1 time.
Word 9 ("lenove") appears 1 time.
Word 10 ("lie") appears 1 time.
Word 11 ("mobile") appears 1 time.
Word 12 ("us") appears 1 time.
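To make the bag-of-words representation concrete, the following sketch reproduces the ids and counts shown above for the first two cleaned reviews using only the standard library. It is an illustration of the idea, not gensim's implementation; `build_token_ids` and `to_bow` are hypothetical names, and assigning ids in sorted order per document is an assumption chosen because it matches the ids printed above:

```python
from collections import Counter

def build_token_ids(docs):
    """Assign an integer id to each new token, document by document."""
    token2id = {}
    for doc in docs:
        for token in sorted(set(doc)):
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(doc, token2id):
    """Return sorted (token_id, count) pairs for one document."""
    counts = Counter(token2id[tok] for tok in doc if tok in token2id)
    return sorted(counts.items())

docs = [
    ['update', 'improvement'],
    ['mobile', 'battery', 'hell', 'backup', 'hour', 'us', 'idle',
     'lie', 'amazon', 'lenove', 'battery', 'charger', 'hour'],
]
token2id = build_token_ids(docs)
print(to_bow(docs[1], token2id))
```

Word order is discarded in this representation; only the counts per token id reach the LDA model.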


In [33]:
lda_model = gensim.models.LdaMulticore(review_bow_corpus, num_topics=12, id2word=dictionary, chunksize=1000, passes=20, workers=8, random_state=1)

### Print out the top terms for each topic.

In [34]:
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

Topic: 0 
Words: 0.245*"battery" + 0.093*"performance" + 0.076*"camera" + 0.072*"backup" + 0.041*"life" + 0.028*"day" + 0.015*"mah" + 0.013*"mode" + 0.011*"thanks" + 0.010*"thing"
Topic: 1 
Words: 0.157*"camera" + 0.102*"quality" + 0.087*"phone" + 0.026*"processor" + 0.020*"battery" + 0.013*"ram" + 0.011*"sound" + 0.011*"picture" + 0.010*"front" + 0.010*"game"
Topic: 2 
Words: 0.428*"phone" + 0.022*"time" + 0.016*"amazon" + 0.014*"lenovo" + 0.010*"charger" + 0.009*"heat" + 0.008*"hour" + 0.008*"battery" + 0.008*"month" + 0.008*"turbo"
Topic: 3 
Words: 0.142*"issue" + 0.089*"battery" + 0.041*"product" + 0.039*"hour" + 0.030*"time" + 0.030*"heating" + 0.028*"month" + 0.028*"drain" + 0.023*"use" + 0.023*"charge"
Topic: 4 
Words: 0.143*"problem" + 0.047*"service" + 0.036*"heating" + 0.036*"network" + 0.031*"charger" + 0.028*"issue" + 0.027*"lenovo" + 0.023*"phone" + 0.021*"day" + 0.019*"sim"
Topic: 5 
Words: 0.119*"note" + 0.055*"speaker" + 0.028*"headphone" + 0.028*"experience" + 0.027*"s

### What is the coherence of the model with the c_v metric?

In [35]:
from gensim.models import CoherenceModel
# Compute Coherence Score
coherence_model_lda = CoherenceModel(model=lda_model, texts=review_punct, dictionary=dictionary, coherence='c_v')
coherence_lda = coherence_model_lda.get_coherence()
print(coherence_lda)

0.5838248743685525


In [36]:
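# Note: log_perplexity returns the per-word likelihood bound (a log value),
# not the perplexity itself; the perplexity is 2 ** (-bound).
print('Perplexity :', lda_model.log_perplexity(review_bow_corpus))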

Perplexity : -6.413544682344618
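The value printed above is gensim's per-word likelihood bound, which is why it is negative; gensim reports the corresponding perplexity as 2 raised to the negated bound. A short sketch of the conversion, using the two bounds reported in this notebook (`perplexity_from_bound` is a hypothetical helper name):

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word likelihood bound into a perplexity."""
    return 2 ** (-per_word_bound)

bound_12_topics = -6.413544682344618
bound_10_topics = -6.387417164608824

print(round(perplexity_from_bound(bound_12_topics), 1))
print(round(perplexity_from_bound(bound_10_topics), 1))
```

Both values are computed on the training corpus, so they say little about generalization; a held-out split would be needed for that.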


In [37]:
import pyLDAvis.gensim_models
import pyLDAvis
# Visualize the topics
pyLDAvis.enable_notebook()
LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model, review_bow_corpus, dictionary)
LDAvis_prepared

### Analyze the topics through the business lens.

### Determine which of the topics can be combined.

### Create topic model using LDA with what you think is the optimal number of topics

In [38]:
lda_model1 = gensim.models.LdaMulticore(review_bow_corpus, num_topics=10, id2word=dictionary, chunksize=1000, passes=20, workers=8, random_state=1)

In [39]:
for idx, topic in lda_model1.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

Topic: 0 
Words: 0.234*"battery" + 0.061*"camera" + 0.060*"performance" + 0.059*"backup" + 0.033*"day" + 0.030*"hour" + 0.029*"life" + 0.026*"phone" + 0.018*"time" + 0.016*"hr"
Topic: 1 
Words: 0.124*"camera" + 0.078*"quality" + 0.071*"phone" + 0.023*"processor" + 0.015*"hai" + 0.013*"sound" + 0.012*"battery" + 0.010*"h" + 0.010*"performance" + 0.009*"stock"
Topic: 2 
Words: 0.354*"phone" + 0.035*"money" + 0.020*"value" + 0.018*"amazon" + 0.017*"time" + 0.016*"issue" + 0.015*"lenovo" + 0.010*"month" + 0.007*"heat" + 0.007*"battery"
Topic: 3 
Words: 0.388*"product" + 0.037*"service" + 0.034*"amazon" + 0.024*"issue" + 0.020*"super" + 0.019*"replacement" + 0.013*"worth" + 0.013*"customer" + 0.013*"device" + 0.013*"money"
Topic: 4 
Words: 0.128*"problem" + 0.074*"issue" + 0.051*"heating" + 0.044*"network" + 0.026*"lenovo" + 0.023*"charger" + 0.021*"phone" + 0.020*"battery" + 0.020*"camera" + 0.016*"day"
Topic: 5 
Words: 0.093*"note" + 0.023*"music" + 0.022*"sim" + 0.021*"headphone" + 0.020

### What is the coherence of the model?

In [40]:
coherence_model_lda1 = CoherenceModel(model=lda_model1, texts=review_punct, dictionary=dictionary, coherence='c_v')
coherence_lda1 = coherence_model_lda1.get_coherence()
print(coherence_lda1)

0.5772434392717956
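With two coherence scores available, the comparison can be made explicit. The sketch below picks the candidate with the highest score; `best_num_topics` is a hypothetical helper, and `score_fn` stands for any callable that trains a model with `k` topics and returns its c_v coherence. Here the scores already reported in this notebook are used in place of retraining:

```python
def best_num_topics(candidates, score_fn):
    """Return the best candidate and a dict of all scores."""
    scores = {k: score_fn(k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Coherence values reported above for the 12-topic and 10-topic models.
recorded = {12: 0.5838248743685525, 10: 0.5772434392717956}
best_k, scores = best_num_topics(recorded, recorded.get)
print(best_k, scores[best_k])
```

On coherence alone the 12-topic model is marginally ahead, but the gap is small enough that interpretability of the topics can reasonably decide the choice.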


In [41]:
print('Perplexity :', lda_model1.log_perplexity(review_bow_corpus))

Perplexity : -6.387417164608824


In [42]:
import pyLDAvis.gensim_models
import pyLDAvis
# Visualize the topics
pyLDAvis.enable_notebook()
LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model1, review_bow_corpus, dictionary)
LDAvis_prepared

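In the pyLDAvis view, the slider λ controls how terms are ranked within a topic. If we set λ equal to 1, relevance is given purely by the probability of the word within that topic, p(w | t). Setting λ to 0 ranks words by their lift, p(w | t) / p(w), which favors words that are specific to the topic even if they are rare.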
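The relevance measure used by pyLDAvis combines the two quantities as λ · log p(w | t) + (1 − λ) · log(p(w | t) / p(w)). A small sketch of how λ changes the ranking; the topic probabilities for "phone" and "value" are taken from topic 2 above, while the corpus-wide probabilities p(w) are made-up values for illustration only:

```python
import math

def relevance(p_w_given_t, p_w, lam):
    """Relevance of a term to a topic for a given lambda in [0, 1]."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

# term: (p(w | t) from topic 2, assumed corpus-wide p(w))
terms = {"phone": (0.354, 0.10), "value": (0.020, 0.004)}

for lam in (1.0, 0.0):
    ranked = sorted(terms, key=lambda w: relevance(*terms[w], lam), reverse=True)
    print(lam, ranked)
```

With λ = 1 the frequent term "phone" ranks first; with λ = 0 the more topic-specific "value" overtakes it under these assumed probabilities.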

### The business should be able to interpret the topics.

### Name each of the identified topics.

### Create a table with the topic name and the top 10 terms in each to present to the business.

In [43]:
import pandas as pd

data = {'Camera quality':['camera','quality','phone','processor','sound','battery','performance','android','stock','update'],
        'Phone Value':['phone','money','value','amazon','time','issue','lenovo','battery','heat','charger'],
        'Product Problem':['problem','issue','heating','network','lenovo','charger','phone','battery','camera','day'],
        'Phone Screen':['phone','screen','day','delivery','camera','time','budget','issue','lenovo','performance'],
        'Battery Performance':['battery','performance','camera','backup','day','hour','life','phone','time','hr'],
        'Features 1':['feature','camera','call','speaker','handset','option','note','clairty','battery','mode'],
        'Features 2':['note','sim','phone','music','headphone','card','excellent','volta','option','earphone'],
        'Product Service':['product','service','amazon','issue','super','replacement','device','customer','worth','money'],
        'Return':['mobile','waste','money','return','heat','item','plz','problem','set','plz'],
        'Price':['price','range','glass','smartphone','gorilla','feature','awesome','ram','note','piece']}
df = pd.DataFrame(data)
df

Unnamed: 0,Camera quality,Phone Value,Product Problem,Phone Screen,Battery Performance,Features 1,Features 2,Product Service,Return,Price
0,camera,phone,problem,phone,battery,feature,note,product,mobile,price
1,quality,money,issue,screen,performance,camera,sim,service,waste,range
2,phone,value,heating,day,camera,call,phone,amazon,money,glass
3,processor,amazon,network,delivery,backup,speaker,music,issue,return,smartphone
4,sound,time,lenovo,camera,day,handset,headphone,super,heat,gorilla
5,battery,issue,charger,time,hour,option,card,replacement,item,feature
6,performance,lenovo,phone,budget,life,note,excellent,device,plz,awesome
7,android,battery,battery,issue,phone,clairty,volta,customer,problem,ram
8,stock,heat,camera,lenovo,time,battery,option,worth,set,note
9,update,charger,day,performance,hr,mode,earphone,money,plz,piece
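Typing the terms by hand makes slips easy, so the table could instead be derived from the model output. A sketch under stated assumptions: `topic_table` is a hypothetical helper, and its `topics` argument has the shape that gensim's `show_topics(formatted=False)` returns, namely a list of `(topic_id, [(word, probability), ...])` pairs. The sample data below are the first three terms of topics 0 and 3 of the 10-topic model:

```python
def topic_table(topics, names):
    """Map each topic label to its list of top terms, in rank order."""
    table = {}
    for topic_id, terms in topics:
        label = names.get(topic_id, "Topic {}".format(topic_id))
        table[label] = [word for word, _ in terms]
    return table

topics = [
    (0, [("battery", 0.234), ("camera", 0.061), ("performance", 0.060)]),
    (3, [("product", 0.388), ("service", 0.037), ("amazon", 0.034)]),
]
names = {0: "Battery Performance", 3: "Product Service"}
print(topic_table(topics, names))
```

In the notebook, the result of `lda_model1.show_topics(num_topics=-1, num_words=10, formatted=False)` could be passed as `topics`, and the returned dictionary could go straight into `pd.DataFrame`.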


### Improvements to consider for better model performance:
#### 1. Revise the stopword list: add uninformative terms found in the analysis (for example 'plz', 'hai', 'h').
#### 2. Evaluate including adjective tags in the corpus.
#### 3. Try different models.
#### 4. Tune the hyperparameters.
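As an illustration of the first improvement, the stopword set can be extended with domain-specific terms before filtering. This is a sketch only: `BASE_STOP_WORDS` is a small stand-in for `stopwords.words('english')` so that the example runs without NLTK data, and the entries in `DOMAIN_STOP_WORDS` are hypothetical choices picked from tokens that appear in the topic output:

```python
# Small stand-in for the NLTK English stopword list.
BASE_STOP_WORDS = {"i", "me", "my", "the", "and", "but"}
# Hypothetical additions: tokens that carry little meaning in this corpus.
DOMAIN_STOP_WORDS = {"plz", "hai", "h"}

def remove_stop_words(docs, stop_words):
    """Drop every token that appears in the stopword set."""
    return [[tok for tok in doc if tok not in stop_words] for doc in docs]

docs = [["mobile", "waste", "money", "plz"], ["camera", "quality", "hai", "h"]]
print(remove_stop_words(docs, BASE_STOP_WORDS | DOMAIN_STOP_WORDS))
```

Very frequent terms such as "phone" or "product" are also candidates, although removing them changes the topics noticeably, so the effect on coherence should be checked after each change.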