# Sentiment Analysis using NLTK
**Problem statement** - 
Sometimes a user writes "Good", "Nice App" or other positive text in a review but gives a 1-star rating. The goal is to identify reviews where the sentiment of the review text does not match the rating, specifically reviews whose text is positive but whose rating is negative, so that the support team can point this out to the user.

In [1]:
# Importing necessary libraries
import numpy as np 
import pandas as pd

In [2]:
#pip install textblob



In [2]:
df = pd.read_csv("/content/chrome_reviews.csv")  # Reading dataset

In [3]:
df.head()

Unnamed: 0,ID,Review URL,Text,Star,Thumbs Up,User Name,Developer Reply,Version,Review Date,App ID
0,3886,https://play.google.com/store/apps/details?id=...,This is very helpfull aap.,5,0,INDIAN Knowledge,,83.0.4103.106,2020-12-19,com.android.chrome
1,3887,https://play.google.com/store/apps/details?id=...,Good,3,2,Ijeoma Happiness,,85.0.4183.127,2020-12-19,com.android.chrome
2,3888,https://play.google.com/store/apps/details?id=...,Not able to update. Neither able to uninstall.,1,0,Priti D BtCFs-29,,85.0.4183.127,2020-12-19,com.android.chrome
3,3889,https://play.google.com/store/apps/details?id=...,Nice app,4,0,Ajeet Raja,,77.0.3865.116,2020-12-19,com.android.chrome
4,3890,https://play.google.com/store/apps/details?id=...,Many unwanted ads,1,0,Rams Mp,,87.0.4280.66,2020-12-19,com.android.chrome


In [5]:
df.shape

(7204, 10)

In [6]:
# Importing NLTK and the other text-processing libraries
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re
from nltk.stem import WordNetLemmatizer 
from textblob import TextBlob
lemmatizer = WordNetLemmatizer()

In [8]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [9]:
stop_words = set(stopwords.words('english'))  # English stop words
stop_words.remove('not')   # keep 'not' and 'no': they carry negative sentiment, which the analysis needs
stop_words.remove('no')
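
To see why `not` and `no` are kept, here is a minimal sketch. It assumes a tiny hand-written stop-word set (`BASE_STOP_WORDS`, a hypothetical stand-in for NLTK's much longer English list) and uses plain `str.split` instead of `word_tokenize`:

```python
# Hypothetical miniature stop-word list; NLTK's real English list is much longer.
BASE_STOP_WORDS = {"is", "the", "this", "a", "not", "no"}


def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated token that appears in stop_words."""
    return " ".join(tok for tok in text.lower().split() if tok not in stop_words)


# Same idea as the cell above: take the negation words out of the stop-word set.
stop_words_keep_negation = BASE_STOP_WORDS - {"not", "no"}

review = "This is not good"
print(remove_stop_words(review, BASE_STOP_WORDS))           # negation is lost
print(remove_stop_words(review, stop_words_keep_negation))  # negation survives
```

With the full stop-word set the review collapses to a positive-looking word, which is exactly the kind of false positive this notebook wants to avoid.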

In [11]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [14]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [15]:
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [16]:
# Cleaning the text, i.e. removing punctuation, symbols, digits and stop words.
clean_text = []                                            # list that holds the cleaned reviews
for review in df['Text']:
    review = re.sub(r'[^\w\s]', '', str(review))           # remove every character that is not a word character or whitespace (punctuation, emoji)
    review = re.sub(r'\d', '', review)                     # remove digits
    review_token = word_tokenize(review.lower().strip())   # lower-case, strip leading and trailing spaces, then split the sentence into words
    review_without_stopwords = []                          # tokens of this review that are not stop words
    for token in review_token:
        if token not in stop_words:
            token = lemmatizer.lemmatize(token)            # reduce the word to its lemma, which is itself a meaningful word
            review_without_stopwords.append(token)
    cleaned_review = " ".join(review_without_stopwords)    # join the tokens back into one space-separated string
    clean_text.append(cleaned_review)
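
The two regular expressions in the loop above can be tried in isolation. This is only a sketch of the regex step: the helper name `strip_noise` is made up for illustration, `str.split` stands in for `word_tokenize`, and stop-word removal and lemmatization are left out.

```python
import re


def strip_noise(review):
    """Remove punctuation/symbols and digits, then lower-case and normalise spaces."""
    review = re.sub(r"[^\w\s]", "", str(review))  # drop anything that is not a word character or whitespace
    review = re.sub(r"\d", "", review)            # drop digits
    return " ".join(review.lower().split())       # whitespace split is a stand-in for word_tokenize


print(strip_noise("Very bad app 😞 10/10!!"))
print(strip_noise("Not able to update. Neither able to uninstall."))
```

Note that `\w` also matches the underscore, so underscores survive this step.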

In [17]:
df["cleaned_review"] = clean_text    # add the cleaned reviews as a new column
one_star_reviews = df[df.Star ==1]   # keep only the reviews with a 1-star rating

In [18]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [19]:
sia = SentimentIntensityAnalyzer()
senti_list = []

for i in one_star_reviews["cleaned_review"]:
    score = sia.polarity_scores(i)                   # VADER polarity scores of the cleaned text
    blob_score = TextBlob(i).sentiment.polarity      # TextBlob polarity, computed for reference only; not used in the label below
    if score['pos'] >= 0.6:
        senti_list.append('Positive')
    else:
        senti_list.append('Negative or Neutral')

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
one_star_reviews = one_star_reviews.assign(sentiment=senti_list)   # each 1-star review is now labelled 'Positive' or 'Negative or Neutral'


In [20]:
one_star_reviews.head()

Unnamed: 0,ID,Review URL,Text,Star,Thumbs Up,User Name,Developer Reply,Version,Review Date,App ID,cleaned_review,sentiment
2,3888,https://play.google.com/store/apps/details?id=...,Not able to update. Neither able to uninstall.,1,0,Priti D BtCFs-29,,85.0.4183.127,2020-12-19,com.android.chrome,not able update neither able uninstall,Negative or Neutral
4,3890,https://play.google.com/store/apps/details?id=...,Many unwanted ads,1,0,Rams Mp,,87.0.4280.66,2020-12-19,com.android.chrome,many unwanted ad,Negative or Neutral
8,3894,https://play.google.com/store/apps/details?id=...,Very bad app 😞,1,0,Akshat Bhardwaj,,78.0.3904.96,2020-12-19,com.android.chrome,bad app,Negative or Neutral
9,3895,https://play.google.com/store/apps/details?id=...,Many times I tried to update its not updating....,1,0,Aditi Rathor,,86.0.4240.198,2020-12-19,com.android.chrome,many time tried update not updating whenever t...,Negative or Neutral
12,3898,https://play.google.com/store/apps/details?id=...,App is not getting update and it is not gettin...,1,0,Daksh Gulati,,83.0.4103.106,2020-12-19,com.android.chrome,app not getting update not getting open saying...,Negative or Neutral
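
The labelling rule applied to the 1-star reviews can be sketched as a small function. The score dictionaries below are hand-written, illustrative values in the shape that `polarity_scores` returns (`neg`, `neu`, `pos`, `compound`); they are not real VADER output. The name `label_review` and the `threshold` parameter are assumptions made for this sketch.

```python
def label_review(scores, threshold=0.6):
    """Return 'Positive' when the 'pos' share of the text reaches the threshold."""
    return "Positive" if scores["pos"] >= threshold else "Negative or Neutral"


# Illustrative scores only (hand-written, not produced by VADER).
mostly_positive = {"neg": 0.0, "neu": 0.2, "pos": 0.8, "compound": 0.5}
mostly_neutral = {"neg": 0.1, "neu": 0.7, "pos": 0.2, "compound": 0.1}

print(label_review(mostly_positive))
print(label_review(mostly_neutral))
```

Because `pos` is the proportion of the text rated positive, a threshold of 0.6 mostly lets through short reviews that consist almost entirely of positive words.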


In [21]:
positive_review_with_one_star = one_star_reviews[one_star_reviews.sentiment == 'Positive']
positive_review_with_one_star = positive_review_with_one_star.drop(columns="cleaned_review")   # non-inplace drop avoids SettingWithCopyWarning


In [22]:
positive_review_with_one_star.head()

Unnamed: 0,ID,Review URL,Text,Star,Thumbs Up,User Name,Developer Reply,Version,Review Date,App ID,sentiment
42,3928,https://play.google.com/store/apps/details?id=...,Okk kind but bad then brave,1,0,shradha baradiya,,87.0.4280.101,2020-12-19,com.android.chrome,Positive
101,4113,https://play.google.com/store/apps/details?id=...,Good,1,0,Sohail Soomro,,74.0.3729.136,2020-12-21,com.android.chrome,Positive
158,4143,https://play.google.com/store/apps/details?id=...,Good,1,0,Md Rubel khan,,,2020-12-21,com.android.chrome,Positive
258,5217,https://play.google.com/store/apps/details?id=...,It is the best app for browsing,1,0,Favour Nwaejikoma,,56.0.2924.87,2020-12-21,com.android.chrome,Positive
291,5251,https://play.google.com/store/apps/details?id=...,Ok,1,0,Rajesh Prabhu,,76.0.3809.111,2020-12-21,com.android.chrome,Positive


In [24]:
positive_review_with_one_star.shape

(99, 11)

In [None]:
positive_review_with_one_star.to_csv('output.csv')