We attempt to analyse each review by extracting the problems and benefits of the application that it mentions.
To do so, we utilise the Hugging Face question-answering pipeline.
Similarly, Hugging Face provides a sentiment-analysis pipeline for checking the sentiment of each review; the cells below score sentiment with VADER.

In [50]:
#libraries
import pandas as pd


In [51]:
#import data
appStore = pd.read_csv('AppStoreData.csv')
googlePlay = pd.read_csv('PlayStoreData.csv')

In [52]:
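#combine review data from both stores, dropping empty reviews so every entry is text
as_review = appStore['review'].dropna().astype(str)
gp_review = googlePlay['text'].dropna().astype(str)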

reviews = as_review.tolist() + gp_review.tolist()
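Concatenating the two lists loses track of which store each review came from. The sketch below keeps that information in a separate column, assuming the column names used above ('review' for the App Store file, 'text' for the Google Play file). The function `combine_reviews`, the `source` column and the toy frames are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

def combine_reviews(app_store: pd.DataFrame, google_play: pd.DataFrame) -> pd.DataFrame:
    """Stack both stores' reviews into one frame, tagging each row with its store.

    Assumes review text is in 'review' (App Store) and 'text' (Google Play);
    the 'source' column is a hypothetical addition.
    """
    as_part = pd.DataFrame({'review': app_store['review'], 'source': 'App Store'})
    gp_part = pd.DataFrame({'review': google_play['text'], 'source': 'Google Play'})
    combined = pd.concat([as_part, gp_part], ignore_index=True)
    # Drop missing reviews so later text-based steps only ever receive strings
    return combined.dropna(subset=['review']).reset_index(drop=True)

# Toy data standing in for AppStoreData.csv and PlayStoreData.csv
toy_app_store = pd.DataFrame({'review': ['Great app', None]})
toy_google_play = pd.DataFrame({'text': ['Keeps crashing']})
toy_combined = combine_reviews(toy_app_store, toy_google_play)
toy_reviews = toy_combined['review'].tolist()
print(toy_combined)
```

Keeping the store label makes it possible to compare sentiment or NPS between the App Store and Google Play later, at the cost of carrying a DataFrame instead of a plain list.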

In [53]:
#inspect the combined reviews for unusual comments before cleaning
print(reviews)
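The cell above only prints the reviews; no cleaning rule is applied yet. One possible rule, sketched here as an assumption rather than the authors' method, drops non-string entries, collapses whitespace, removes reviews shorter than a minimum word count and removes exact duplicates. The function `clean_reviews` and the `min_words` threshold are hypothetical.

```python
import re

def clean_reviews(reviews, min_words=2):
    """Return a cleaned copy of `reviews` using a hypothetical filtering rule.

    - non-string entries (e.g. NaN from empty CSV cells) are dropped
    - runs of whitespace are collapsed and the text is trimmed
    - reviews with fewer than `min_words` words are dropped (assumed threshold)
    - exact duplicates are removed, keeping the first occurrence
    """
    cleaned, seen = [], set()
    for review in reviews:
        if not isinstance(review, str):
            continue
        text = re.sub(r'\s+', ' ', review).strip()
        if len(text.split()) < min_words or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

toy_cleaned = clean_reviews(["Great  app!", float('nan'), "Great app!", "ok", "  Can't work . "])
print(toy_cleaned)
```

A word-count threshold is a blunt instrument: very short reviews such as "Can't work ." still carry sentiment, so `min_words` should be kept low or checked against the data.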



In [54]:
#VADER
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize the SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Initialize lists to store data
review_texts = []
positive_scores = []
negative_scores = []
neutral_scores = []
compound_scores = []
nps_indiv = []
nps_category = []  # New column for NPS categories

# Perform sentiment analysis and store scores in lists
for review in reviews:
    vs = analyzer.polarity_scores(review)
    review_texts.append(review)
    positive_scores.append(vs['pos'])
    negative_scores.append(vs['neg'])
    neutral_scores.append(vs['neu'])
    compound_scores.append(vs['compound'])
    
    # Map compound scores to nps_indiv based on specified intervals
    if -1 <= vs['compound'] <= -9/11:
        nps_indiv.append(0)
    elif -9/11 < vs['compound'] <= -7/11:
        nps_indiv.append(1)
    elif -7/11 < vs['compound'] <= -5/11:
        nps_indiv.append(2)
    elif -5/11 < vs['compound'] <= -3/11:
        nps_indiv.append(3)
    elif -3/11 < vs['compound'] <= -1/11:
        nps_indiv.append(4)
    elif -1/11 < vs['compound'] <= 1/11:
        nps_indiv.append(5)
    elif 1/11 < vs['compound'] <= 3/11:
        nps_indiv.append(6)
    elif 3/11 < vs['compound'] <= 5/11:
        nps_indiv.append(7)
    elif 5/11 < vs['compound'] <= 7/11:
        nps_indiv.append(8)
    elif 7/11 < vs['compound'] <= 9/11:
        nps_indiv.append(9)
    else:
        nps_indiv.append(10)
    
    # Map nps_indiv scores to NPS categories
    if nps_indiv[-1] >= 9:  # Promoters
        nps_category.append('Promoter')
    elif nps_indiv[-1] >= 7:  # Passives
        nps_category.append('Passive')
    else:  # Detractors
        nps_category.append('Detractor')

# Create dataframe
score_df = pd.DataFrame({
    'Review': review_texts,
    'Positive Score': positive_scores,
    'Negative Score': negative_scores,
    'Neutral Score': neutral_scores,
    'Compound Score': compound_scores,
    'nps_indiv': nps_indiv,
    'nps_category': nps_category  # Adding the new column for NPS categories
})

# Display the dataframe
print(score_df)

                                                Review  Positive Score  \
0    Great banking app with attractive interest rat...           0.367   
1    A bank like no other, no bank have such amazin...           0.147   
2    Notice that the drop in interest rate of 0.8% ...           0.201   
3    Sending money into my GXS account is a breeze ...           0.059   
4    I have to say that the UI/UX is one of the bes...           0.141   
..                                                 ...             ...   
413  Not ready to roll out completely. Aint even al...           0.115   
414                                       Can't work .           0.000   
415  Can not download yet, just always show pending...           0.000   
416  Looks cool and sleek! Can I get an invite if I...           0.208   
417  It's doesn't work, they're just trying to coll...           0.079   

     Negative Score  Neutral Score  Compound Score  nps_indiv nps_category  
0             0.024          0.609
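The if/elif chain in the cell above splits the compound score (from -1 to 1) into 11 equal-width bins of width 2/11, maps them to an NPS-style score from 0 to 10, and then labels 9 to 10 as Promoter, 7 to 8 as Passive and 0 to 6 as Detractor. The same mapping can be sketched in vectorised form with `pd.cut` and `np.select`; this is a swapped-in technique, not the original code, and `NPS_EDGES` and `compound_to_nps` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Same 11 intervals as the if/elif chain: [-1, -9/11], (-9/11, -7/11], ..., (9/11, 1]
NPS_EDGES = [-1] + [k / 11 for k in range(-9, 10, 2)] + [1]

def compound_to_nps(compound: pd.Series) -> pd.DataFrame:
    """Map VADER compound scores to a 0-10 score and an NPS category."""
    nps_indiv = pd.cut(compound, bins=NPS_EDGES, labels=list(range(11)),
                       right=True, include_lowest=True).astype(int)
    # Check the higher threshold first, mirroring the original category logic
    nps_category = np.select([nps_indiv >= 9, nps_indiv >= 7],
                             ['Promoter', 'Passive'], default='Detractor')
    return pd.DataFrame({'nps_indiv': nps_indiv, 'nps_category': nps_category})

toy_compound = pd.Series([-1.0, -0.5, 0.0, 0.5, 0.75, 0.9622])
toy_mapped = compound_to_nps(toy_compound)
print(toy_mapped)
```

`right=True` reproduces the `<=` upper bounds of the original chain, and `include_lowest=True` keeps a score of exactly -1 in the first bin.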

In [56]:
#NPS by topic (topics are randomly assigned placeholders until topic splitting works)
import pandas as pd
import numpy as np

def net_promoter_score(score_df, topic_column):
    topics = score_df[topic_column].unique()  # Get unique topics from the specified column
    topic_results = {}  # Dictionary to store results for each topic

    for topic in topics:
        # Filter the DataFrame for the current topic
        topic_df = score_df[score_df[topic_column] == topic]

        # Count the occurrences of each label
        label_counts = topic_df['nps_category'].value_counts()

        # Count each NPS category within the topic
        promoter_count = label_counts.get('Promoter', 0)
        detractor_count = label_counts.get('Detractor', 0)
        passive_count = label_counts.get('Passive', 0)
        total_count = promoter_count + detractor_count + passive_count

        # NPS = % promoters - % detractors, ranging from -100 to 100
        nps = ((promoter_count - detractor_count) / total_count) * 100

        # Store the result for the current topic
        topic_results[topic] = round(nps, 2)

    return topic_results

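# Example usage on the first ten reviews with randomly assigned placeholder topics
example = score_df.loc[:9].copy()  # copy avoids SettingWithCopyWarning on assignment
TOPICS = ['A', 'B']
example['Topic'] = np.random.choice(TOPICS, size=len(example))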
print(example)


grouped_df = example.groupby('Topic')
for name, group in grouped_df:
    print(f"Topic: {name}")
    print(group)
    print()

# Calculate NPS for each topic
nps_score = net_promoter_score(example, 'Topic')
print("NPS Score by topic:")
print(nps_score)


                                              Review  Positive Score  \
0  Great banking app with attractive interest rat...           0.367   
1  A bank like no other, no bank have such amazin...           0.147   
2  Notice that the drop in interest rate of 0.8% ...           0.201   
3  Sending money into my GXS account is a breeze ...           0.059   
4  I have to say that the UI/UX is one of the bes...           0.141   
5  Have been waiting for a slot for the account s...           0.206   
6  Great app, with awesome design and theme unlik...           0.265   
7  Hit with fast reversal when transferring money...           0.099   
8  Very easy to use, user friendly and intuitive....           0.355   
9  What a joke! Nothing happens after you input y...           0.175   

   Negative Score  Neutral Score  Compound Score  nps_indiv nps_category Topic  
0           0.024          0.609          0.9622         10     Promoter     B  
1           0.051          0.801          0.9

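`net_promoter_score` loops over topics and computes NPS = (promoters - detractors) / total × 100 for each. For example, a topic with 2 promoters, 1 passive and 1 detractor scores (2 - 1) / 4 × 100 = 25. The same calculation can be sketched without an explicit loop using `groupby` and `value_counts(normalize=True)`; this is an alternative formulation, and `nps_by_topic` and the toy data are hypothetical.

```python
import pandas as pd

def nps_by_topic(df: pd.DataFrame, topic_column: str) -> pd.Series:
    """NPS per topic: (share of promoters - share of detractors) * 100."""
    shares = (
        df.groupby(topic_column)['nps_category']
        .value_counts(normalize=True)   # fraction of each category within a topic
        .unstack(fill_value=0)          # one column per category
        .reindex(columns=['Promoter', 'Passive', 'Detractor'], fill_value=0)
    )
    return ((shares['Promoter'] - shares['Detractor']) * 100).round(2)

toy_df = pd.DataFrame({
    'Topic': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'nps_category': ['Promoter', 'Promoter', 'Detractor', 'Passive',
                     'Detractor', 'Passive', 'Passive'],
})
toy_nps = nps_by_topic(toy_df, 'Topic').to_dict()
print(toy_nps)
```

The `reindex` step guarantees that all three category columns exist, so a topic with no promoters or no detractors (such as topic C, which has only passives) still gets a valid score of 0.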
