#### Problem statement: Users sometimes write "Good", "Nice app", or other positive text in a review but give a 1-star rating. Your goal is to identify the reviews where the semantics of the review text do not match the rating.

#### Your goal is to identify ratings where the review text is positive but the rating is negative, so that the support team can point this out to users.

Deploy it using Flask, Streamlit, etc., and share the live link.

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv('chrome_reviews.csv')

In [3]:
data.head()

Unnamed: 0,ID,Review URL,Text,Star,Thumbs Up,User Name,Developer Reply,Version,Review Date,App ID
0,3886,https://play.google.com/store/apps/details?id=...,This is very helpfull aap.,5,0,INDIAN Knowledge,,83.0.4103.106,2020-12-19,com.android.chrome
1,3887,https://play.google.com/store/apps/details?id=...,Good,3,2,Ijeoma Happiness,,85.0.4183.127,2020-12-19,com.android.chrome
2,3888,https://play.google.com/store/apps/details?id=...,Not able to update. Neither able to uninstall.,1,0,Priti D BtCFs-29,,85.0.4183.127,2020-12-19,com.android.chrome
3,3889,https://play.google.com/store/apps/details?id=...,Nice app,4,0,Ajeet Raja,,77.0.3865.116,2020-12-19,com.android.chrome
4,3890,https://play.google.com/store/apps/details?id=...,Many unwanted ads,1,0,Rams Mp,,87.0.4280.66,2020-12-19,com.android.chrome


In [4]:
data['Star'].unique()

array([5, 3, 1, 4, 2], dtype=int64)

In [5]:
data.columns

Index(['ID', 'Review URL', 'Text', 'Star', 'Thumbs Up', 'User Name',
       'Developer Reply', 'Version', 'Review Date', 'App ID'],
      dtype='object')

In [6]:
data.isnull().sum()

ID                    0
Review URL            0
Text                  1
Star                  0
Thumbs Up             0
User Name             0
Developer Reply    7109
Version              85
Review Date           0
App ID                0
dtype: int64

In [7]:
data.shape

(7204, 10)

In [8]:
data = data.drop(columns=['ID','Review URL','Developer Reply', 'Version', 'Review Date', 'App ID'])

In [9]:
data['Text'].sample(10)

7124                                                 Good
2017    I've just accessed my apps and games library s...
3013                                                 Good
5655         It's quite a unique app and it helps so much
2544                                                 Good
603                                    সুমন xxxx 11223344
6414    I am not able to update this app don't know wh...
1786                                       Very nice tube
4883                                              viusasa
2840                                             Very bad
Name: Text, dtype: object

In [10]:
data['Text'] = data['Text'].astype(str)

In [11]:
data["Text"]

0                              This is very helpfull aap.
1                                                    Good
2          Not able to update. Neither able to uninstall.
3                                                Nice app
4                                       Many unwanted ads
                              ...                        
7199                                            Bagusss..
7200                                        Bad version 😔
7201    One thing that I have to say I can't spelled t...
7202                                            Excellent
7203    After update it lag and always slow same goes ...
Name: Text, Length: 7204, dtype: object

In [12]:
df = data[['Text','Star']]
df.head()

Unnamed: 0,Text,Star
0,This is very helpfull aap.,5
1,Good,3
2,Not able to update. Neither able to uninstall.,1
3,Nice app,4
4,Many unwanted ads,1


In [13]:
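## dropping NaN values (dropna returns a new frame, so the result must be assigned)
## note: 'Text' was cast to str in In [10], so its one missing value is now the string 'nan'
df = df.dropna()
df.info()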

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7204 entries, 0 to 7203
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Text    7204 non-null   object
 1   Star    7204 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 112.7+ KB


In [14]:
#import natural language tool kit
import nltk
import re #regular expressions module

nltk.download('stopwords')
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [15]:
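#declaring porter stemmer
port = PorterStemmer()
#build the stopword set once; calling stopwords.words() for every word is slow
stop_words = set(stopwords.words("english"))
def text_cleaner(text): #function to clean text
    cleaned = re.sub('[^a-zA-Z]', " ", text) #replace every non-letter with a space
    cleaned = cleaned.lower()
    cleaned = cleaned.split()
    cleaned = [port.stem(word) for word in cleaned if word not in stop_words] #drop stopwords, stem the rest
    cleaned = ' '.join(cleaned)
    return cleaned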
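
One side effect of the stopword filter is worth noting: NLTK's English stopword list contains negations such as "not" and "no", so a review like "Not able to update" loses its negation before sentiment scoring. A minimal sketch of a negation-preserving filter follows. It assumes a tiny hand-written stopword set (`TOY_STOPWORDS`) and a hypothetical helper name (`clean_keep_negations`), neither of which is part of the original notebook, and it leaves out stemming to stay dependency-free.

```python
import re

# Assumption: a tiny illustrative stopword set, not NLTK's full English list.
TOY_STOPWORDS = {"this", "is", "to", "the", "a", "not", "no", "very"}
# Negation words we choose to keep even though they are stopwords.
NEGATIONS = {"not", "no", "nor", "never"}

def clean_keep_negations(text):
    """Lowercase, replace non-letters with spaces, drop stopwords except negations."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    kept = [w for w in words if w not in TOY_STOPWORDS or w in NEGATIONS]
    return " ".join(kept)

print(clean_keep_negations("Not able to update."))
```

Whether keeping negations actually improves the downstream labels would need to be checked on this dataset; the sketch only shows the mechanism.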

In [16]:
df["Cleaned_Text"] = df["Text"].apply(lambda x: text_cleaner(str(x))) #declare cleaned text feature
df["Length"] = df["Text"].apply(lambda x:len(str(x))) #declare length feature
df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Cleaned_Text"] = df["Text"].apply(lambda x: text_cleaner(str(x))) #declare cleaned text feature


Unnamed: 0,Text,Star,Cleaned_Text,Length
0,This is very helpfull aap.,5,helpful aap,26
1,Good,3,good,4
2,Not able to update. Neither able to uninstall.,1,abl updat neither abl uninstal,46
3,Nice app,4,nice app,8
4,Many unwanted ads,1,mani unwant ad,17


In [17]:
from textblob import TextBlob
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_vader(text, sid):
    ss = sid.polarity_scores(text)
    ss.pop('compound')
    return max(ss, key=ss.get)

In [18]:
import matplotlib.pyplot as plt #needed by plot_sentiment_barchart

def sentiment_textblob(text):
    x = TextBlob(text).sentiment.polarity

    if x < 0:
        return 'negative'
    elif x == 0:
        return 'neutral'
    else:
        return 'positive'

def plot_sentiment_barchart(text, method='TextBlob'):
    if method == 'TextBlob':
        sentiment = text.map(lambda x: sentiment_textblob(x))
    elif method == 'Vader':
        nltk.download('vader_lexicon')
        sid = SentimentIntensityAnalyzer()
        sentiment = text.map(lambda x: sentiment_vader(x, sid=sid))
    else:
        raise ValueError("method must be 'TextBlob' or 'Vader'")

    counts = sentiment.value_counts()
    plt.bar(counts.index, counts)

In [19]:
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [20]:
df["Score"] = df["Cleaned_Text"].apply(lambda review:sid.polarity_scores(review))

In [21]:
df["Compound_Score"]  = df['Score'].apply(lambda score_dict: score_dict['compound'])

In [22]:
df["Result"] = df["Compound_Score"].apply(lambda c: 'positive' if c > 0 else ('negative' if c < 0 else 'neutral'))
df.head()

Unnamed: 0,Text,Star,Cleaned_Text,Length,Score,Compound_Score,Result
0,This is very helpfull aap.,5,helpful aap,26,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive
1,Good,3,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive
2,Not able to update. Neither able to uninstall.,1,abl updat neither abl uninstal,46,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neutral
3,Nice app,4,nice app,8,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive
4,Many unwanted ads,1,mani unwant ad,17,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neutral
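
The rule in In [22] treats any non-zero compound score as positive or negative. The VADER documentation suggests leaving a neutral band around zero instead (compound >= 0.05 is positive, compound <= -0.05 is negative). A minimal sketch of that variant, assuming a hypothetical helper `label_from_compound` and a `threshold` parameter that are not in the original notebook:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label, with a neutral band around zero."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores taken from the table above, plus two made-up values for illustration.
for score in (0.4404, 0.0, 0.01, -0.3):
    print(score, label_from_compound(score))

# In the notebook this could replace the lambda in In [22], e.g.:
# df["Result"] = df["Compound_Score"].apply(label_from_compound)
```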


In [23]:
df_focus = df[(df.Result == "positive")]
df_focus.head()

Unnamed: 0,Text,Star,Cleaned_Text,Length,Score,Compound_Score,Result
0,This is very helpfull aap.,5,helpful aap,26,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive
1,Good,3,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive
3,Nice app,4,nice app,8,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive
5,This app good,4,app good,13,"{'neg': 0.0, 'neu': 0.256, 'pos': 0.744, 'comp...",0.4404,positive
10,Good,5,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive


In [24]:
Suggestion = []
for row in df_focus["Star"] :
    if row >= 3 :
         Suggestion.append("Correct rating")
    else :
         Suggestion.append("Check rating given")
            
df_focus["Suggestion"] = Suggestion
df_focus.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_focus["Suggestion"] = Suggestion


Unnamed: 0,Text,Star,Cleaned_Text,Length,Score,Compound_Score,Result,Suggestion
0,This is very helpfull aap.,5,helpful aap,26,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive,Correct rating
1,Good,3,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive,Correct rating
3,Nice app,4,nice app,8,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive,Correct rating
5,This app good,4,app good,13,"{'neg': 0.0, 'neu': 0.256, 'pos': 0.744, 'comp...",0.4404,positive,Correct rating
10,Good,5,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive,Correct rating


In [25]:
df_focus.Suggestion.value_counts()

Correct rating        3192
Check rating given     491
Name: Suggestion, dtype: int64
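
The loop in In [24] can also be written as a small function, which makes the star cut-off explicit and easy to change. This is a sketch only: `suggestion_for` and `min_ok_star` are hypothetical names, and the default of 3 simply mirrors the `row >= 3` test used above.

```python
def suggestion_for(star, min_ok_star=3):
    """Return the suggestion label for a positive-sentiment review with this star rating."""
    return "Correct rating" if star >= min_ok_star else "Check rating given"

# The five star values found in the dataset (see In [4]).
stars = [5, 3, 1, 4, 2]
labels = [suggestion_for(s) for s in stars]
print(labels)

# In the notebook the same function could be applied column-wise, e.g.:
# df_focus["Suggestion"] = df_focus["Star"].map(suggestion_for)
```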

In [26]:
df_focus.sample(10)

Unnamed: 0,Text,Star,Cleaned_Text,Length,Score,Compound_Score,Result,Suggestion
1173,good,5,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive,Correct rating
3481,The app is helpful but keeps on shutting down ...,2,app help keep shut everi sec,58,"{'neg': 0.0, 'neu': 0.649, 'pos': 0.351, 'comp...",0.4019,positive,Check rating given
6933,Nice apps,5,nice app,9,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive,Correct rating
6583,This app is cool,4,app cool,16,"{'neg': 0.0, 'neu': 0.303, 'pos': 0.697, 'comp...",0.3182,positive,Correct rating
2299,Very very nice app,5,nice app,18,"{'neg': 0.0, 'neu': 0.263, 'pos': 0.737, 'comp...",0.4215,positive,Correct rating
5625,Great app,5,great app,9,"{'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'comp...",0.6249,positive,Correct rating
1133,Ok,5,ok,2,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.296,positive,Correct rating
5368,Everything is best,5,everyth best,18,"{'neg': 0.0, 'neu': 0.192, 'pos': 0.808, 'comp...",0.6369,positive,Correct rating
5172,nice,5,nice,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4215,positive,Correct rating
488,Good,5,good,4,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4404,positive,Correct rating


In [None]:
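# Keywords are passed through text_cleaner so they match the stemmed Cleaned_Text values
keyword = [text_cleaner(k) for k in ['good','nice','thank you','best','awesome','helpful']]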

In [None]:
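# Keep the positive-sentiment reviews flagged in In [24] (label "Check rating given")
final = df_focus[df_focus["Suggestion"] == "Check rating given"]
# Keep only reviews whose cleaned text is exactly one of the keywords
final = final[final["Cleaned_Text"].isin(keyword)]
# Drop Length, Score, Compound_Score and Result (column positions 3 to 6)
final = final.drop(columns=final.columns[3:7])
display(final.head())
print(f"There are {len(final)} reviews that are positive but have a bad rating")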
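
Note that `isin(keyword)` keeps only reviews whose entire cleaned text equals a keyword, so a cleaned review such as "nice app" is not matched by "nice". If matching on any single token is wanted instead, a sketch along these lines could be used. `contains_keyword` is a hypothetical helper, and `STEMMED_KEYWORDS` is a hand-picked subset of keywords whose cleaned forms appear unchanged in the outputs above.

```python
# Assumption: keywords already in their cleaned (stemmed, lowercased) form.
STEMMED_KEYWORDS = {"good", "nice", "best"}

def contains_keyword(cleaned_text, keywords=STEMMED_KEYWORDS):
    """True if any whitespace-separated token of the cleaned text is a keyword."""
    return any(token in keywords for token in cleaned_text.split())

# Cleaned texts copied from the tables above.
for text in ("nice app", "mani unwant ad", "everyth best"):
    print(text, contains_keyword(text))

# In the notebook this could replace the isin filter, e.g.:
# final = final[final["Cleaned_Text"].apply(contains_keyword)]
```

Token-level matching flags more reviews than exact matching, including mixed ones such as "app help keep shut everi sec", so the stricter `isin` filter may still be preferable when precision matters more than coverage.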

### Part 1, question 2 completed