### Sentiment and Thematic Analysis
#### Quantify review sentiment and identify themes to uncover satisfaction drivers and pain points.
* Sentiment Analysis with distilbert-base-uncased-finetuned-sst-2-english
* Thematic Analysis
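For the thematic step, one simple starting point is rule-based keyword tagging. Each theme maps to a few regular-expression patterns, and every review is labeled with the themes it matches. The sketch below is illustrative only: the theme names and keywords in `THEMES` and the helper `tag_themes` are hypothetical and would need tuning against a sample of the real reviews. The sample strings are reviews from this dataset.

```python
import re
from collections import Counter

# Hypothetical theme -> keyword patterns (illustrative, not tuned on the real data)
THEMES = {
    "reliability": [r"\bnot work", r"\bdidn'?t work", r"\bcrash", r"\berror", r"\bslow"],
    "ease_of_use": [r"\beasy", r"\beasier", r"\bsimple", r"\bconvenient"],
    "security": [r"\bsafe", r"\bsecure", r"\bsecurity", r"\bfraud"],
}

def tag_themes(text):
    """Return the sorted list of themes whose patterns match the review text."""
    if not isinstance(text, str):
        return []
    lowered = text.lower()
    return sorted(
        theme
        for theme, patterns in THEMES.items()
        if any(re.search(p, lowered) for p in patterns)
    )

reviews = [
    "why didn't work this app?",
    "The app makes our life easier. Thank you CBE!",
    "the most advanced app. but how to stay safe?",
    "Good application",
]

# Count how often each theme appears across the reviews
theme_counts = Counter(t for r in reviews for t in tag_themes(r))
for review in reviews:
    print(tag_themes(review), "<-", review)
print(theme_counts)
```

Keyword rules are transparent and easy to audit. Unsupervised alternatives such as TF-IDF keyword extraction or topic models can suggest candidate themes, but they still need a manual naming step.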

#### Load the Scraped and Cleaned Data

In [1]:
import pandas as pd

# Load the scraped and cleaned reviews
df = pd.read_csv("../data/processed/bank_reviews_clean.csv")
df.head(5)


| Unnamed: 0 | review_id | review | rating | date | bank | source |
|---|---|---|---|---|---|---|
| 0 | 5112423d-e618-44ba-ba49-62677cb76cd6 | This application is very important and advanta... | 5 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store |
| 1 | bcb34681-1dd4-4781-b400-4393bb10b1d9 | why didn't work this app? | 1 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store |
| 2 | c69f051a-00f8-4144-8423-b7ebcd328d2d | The app makes our life easier. Thank you CBE! | 5 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store |
| 3 | f8002d06-b5c5-4ed1-9d51-a9a379304cf8 | the most advanced app. but how to stay safe? | 5 | 2025-11-27 | Commercial Bank of Ethiopia | Google Play Store |
| 4 | 81000db5-aa51-467e-826c-fc96160e96a8 | Good application | 4 | 2025-11-27 | Commercial Bank of Ethiopia | Google Play Store |
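The `Unnamed: 0` column is most likely a leftover index. `DataFrame.to_csv` writes the index as a header-less first column by default, and `read_csv` names such a column `Unnamed: 0`. Passing `index_col=0` when loading, or `index=False` when saving, avoids it. The minimal round-trip below demonstrates the behavior on a two-row frame. It is a sketch and does not read the notebook's actual CSV.

```python
import io
import pandas as pd

# A tiny frame standing in for the cleaned reviews
original = pd.DataFrame({
    "review": ["Good application", "why didn't work this app?"],
    "rating": [4, 1],
})

# to_csv writes the index as an unnamed first column by default
csv_text = original.to_csv()

# Reading it back without index_col turns that column into "Unnamed: 0"
without_index_col = pd.read_csv(io.StringIO(csv_text))
# index_col=0 restores it as the index instead of a data column
with_index_col = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index_col.columns))  # ['Unnamed: 0', 'review', 'rating']
print(list(with_index_col.columns))     # ['review', 'rating']
```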


#### Sentiment Analysis Using Hugging Face DistilBERT

In [2]:
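from transformers import pipeline

# DistilBERT fine-tuned on SST-2: a binary POSITIVE/NEGATIVE classifier
sentiment_model = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

def get_sentiment(text):
    """Return (label, score) for one review; (None, None) if the text is missing or empty."""
    if not isinstance(text, str) or text.strip() == "":
        return None, None  # keep the pair shape so the zip(*...) unpacking below works
    # truncation=True cuts inputs that exceed the model's 512-token limit
    result = sentiment_model(text, truncation=True)[0]
    return result['label'], result['score']

# score is the model's confidence in the predicted label
df['sentiment_label'], df['sentiment_score'] = zip(*df['review'].apply(get_sentiment))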


Device set to use cpu
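`get_sentiment` must return a `(label, score)` pair for every row, including missing or empty reviews. Otherwise the `zip(*...)` unpacking fails as soon as it reaches a bare `None`. The minimal reproduction below uses a hypothetical stand-in classifier, `fake_model`, so that it runs without downloading a model. The function names `get_sentiment_broken` and `get_sentiment_fixed` are illustrative.

```python
# Hypothetical stand-in for the Hugging Face pipeline, so the example runs
# without downloading a model; it always predicts POSITIVE with score 0.9.
def fake_model(text):
    return [{"label": "POSITIVE", "score": 0.9}]

def get_sentiment_broken(text):
    if not isinstance(text, str) or text.strip() == "":
        return None  # a bare None cannot be unpacked into (label, score)
    result = fake_model(text)[0]
    return result["label"], result["score"]

def get_sentiment_fixed(text):
    if not isinstance(text, str) or text.strip() == "":
        return None, None  # keep the (label, score) shape for empty reviews
    result = fake_model(text)[0]
    return result["label"], result["score"]

reviews = ["Good application", "", None]

try:
    zip(*map(get_sentiment_broken, reviews))
except TypeError as exc:
    print("broken version fails:", exc)

labels, scores = zip(*map(get_sentiment_fixed, reviews))
print(labels)  # ('POSITIVE', None, None)
print(scores)  # (0.9, None, None)
```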


In [3]:
df.head(5)

| Unnamed: 0 | review_id | review | rating | date | bank | source | sentiment_label | sentiment_score |
|---|---|---|---|---|---|---|---|---|
| 0 | 5112423d-e618-44ba-ba49-62677cb76cd6 | This application is very important and advanta... | 5 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store | POSITIVE | 0.998468 |
| 1 | bcb34681-1dd4-4781-b400-4393bb10b1d9 | why didn't work this app? | 1 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store | NEGATIVE | 0.999132 |
| 2 | c69f051a-00f8-4144-8423-b7ebcd328d2d | The app makes our life easier. Thank you CBE! | 5 | 2025-11-28 | Commercial Bank of Ethiopia | Google Play Store | POSITIVE | 0.999696 |
| 3 | f8002d06-b5c5-4ed1-9d51-a9a379304cf8 | the most advanced app. but how to stay safe? | 5 | 2025-11-27 | Commercial Bank of Ethiopia | Google Play Store | NEGATIVE | 0.95651 |
| 4 | 81000db5-aa51-467e-826c-fc96160e96a8 | Good application | 4 | 2025-11-27 | Commercial Bank of Ethiopia | Google Play Store | POSITIVE | 0.999855 |
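Star ratings and model labels do not always agree. Review 3 is rated 5 stars but labeled NEGATIVE, possibly because the question "how to stay safe?" reads as a concern to the model. A cross-tab of rating against label, together with an agreement rate under an assumed star-to-label mapping, helps flag reviews like this for manual inspection. The sketch below uses only the five rows shown above. The mapping (4 to 5 stars means POSITIVE, 1 to 2 stars means NEGATIVE, 3 stars are excluded) is an assumption, not part of the notebook.

```python
import pandas as pd

# The five rows shown above (rating and predicted label only)
sample = pd.DataFrame({
    "rating": [5, 1, 5, 5, 4],
    "sentiment_label": ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"],
})

# Assumed mapping from stars to an expected label; 3-star reviews map to None
expected = sample["rating"].map(
    lambda r: "POSITIVE" if r >= 4 else ("NEGATIVE" if r <= 2 else None)
)
mask = expected.notna()  # exclude neutral (3-star) reviews from the check
agreement = (sample.loc[mask, "sentiment_label"] == expected[mask]).mean()

crosstab = pd.crosstab(sample["rating"], sample["sentiment_label"])
print(crosstab)
print(f"label/rating agreement: {agreement:.0%}")  # 80%
```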


#### Aggregate Sentiment by Bank and Rating

In [7]:
# sentiment_score is the confidence in the predicted label, not a polarity scale
bank_rating_sentiment = df.groupby(['bank', 'rating']).agg({'sentiment_score': 'mean'}).reset_index()
print(bank_rating_sentiment)


                           bank  rating  sentiment_score
0             Bank of Abyssinia       1         0.992800
1             Bank of Abyssinia       2         0.966670
2             Bank of Abyssinia       3         0.998618
3             Bank of Abyssinia       4         0.969993
4             Bank of Abyssinia       5         0.975590
5   Commercial Bank of Ethiopia       1         0.992261
6   Commercial Bank of Ethiopia       2         0.992607
7   Commercial Bank of Ethiopia       3         0.978476
8   Commercial Bank of Ethiopia       4         0.977407
9   Commercial Bank of Ethiopia       5         0.981973
10                  Dashen Bank       1         0.988763
11                  Dashen Bank       2         0.996925
12                  Dashen Bank       3         0.970207
13                  Dashen Bank       4         0.944521
14                  Dashen Bank       5         0.988043
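Every mean in the table above falls between about 0.94 and 1.0, even for 1-star reviews. This happens because `sentiment_score` is the model's confidence in whichever label it predicted, not a scale from negative to positive. A 1-star review labeled NEGATIVE with score 0.99 and a 5-star review labeled POSITIVE with score 0.99 therefore contribute the same value. A signed polarity (+score for POSITIVE, -score for NEGATIVE), reported alongside the share of POSITIVE labels, gives an aggregate that separates satisfied reviewers from dissatisfied ones. The sketch below applies this to the five rows shown earlier. The column names `polarity`, `mean_polarity`, `share_positive` and `n_reviews` are illustrative choices.

```python
import pandas as pd

# The five Commercial Bank of Ethiopia rows shown earlier (rating, label, score)
sample = pd.DataFrame({
    "bank": ["Commercial Bank of Ethiopia"] * 5,
    "rating": [5, 1, 5, 5, 4],
    "sentiment_label": ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"],
    "sentiment_score": [0.998468, 0.999132, 0.999696, 0.95651, 0.999855],
})

# Flip the sign of the confidence for NEGATIVE predictions -> polarity in [-1, 1]
sign = sample["sentiment_label"].map({"POSITIVE": 1, "NEGATIVE": -1})
sample["polarity"] = sign * sample["sentiment_score"]

summary = (
    sample.groupby(["bank", "rating"])
    .agg(
        mean_polarity=("polarity", "mean"),
        share_positive=("sentiment_label", lambda s: (s == "POSITIVE").mean()),
        n_reviews=("polarity", "size"),
    )
    .reset_index()
)
print(summary)
```

On these rows the 1-star group gets a mean polarity of about -0.999 and the 5-star group about 0.347 (two POSITIVE, one NEGATIVE). The confidence-only mean hides this contrast. Reporting `n_reviews` also shows how few reviews some rating groups contain.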
