# Task 2.2: Topic Modeling & Keyword Extraction

---

## Objective
Extract key themes and topics from the reviews to understand specific customer pain points and delights.

**Techniques:**
1. N-gram Analysis (Bigrams/Trigrams)
2. Keyword Extraction

**Output:** Insights on top themes per bank.

---

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from pathlib import Path

INPUT_FILE = Path("../data/processed/sentiment_results.csv")
df = pd.read_csv(INPUT_FILE)
print(f"Loaded {len(df)} records.")

Loaded 1500 records.


## 🔑 N-Gram Analysis Function

In [None]:
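def get_top_ngrams(corpus, n=2, top_k=10):
    """Return the top_k most frequent n-grams in corpus as (phrase, count) pairs."""
    # Count n-grams of exactly length n, ignoring English stop words
    vec = CountVectorizer(ngram_range=(n, n), stop_words='english')
    bag_of_words = vec.fit_transform(corpus)
    # Sum each column to get the corpus-wide count of every n-gram
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, int(sum_words[0, idx])) for word, idx in vec.vocabulary_.items()]
    # Sort by frequency, highest first
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:top_k]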

def analyze_bank(bank_name):
    """Print the top bigrams for one bank, overall and for negative reviews only."""
    print(f"\n=== {bank_name} Analysis ===")
    subset = df[df['bank_name'] == bank_name]
    
    # Cleaned text corpus
    corpus = subset['cleaned_text'].dropna().astype(str).tolist()
    
    # Bigrams
    print("Top 5 Themes (Bigrams):")
    for phrase, freq in get_top_ngrams(corpus, n=2, top_k=5):
        print(f"  - {phrase}: {freq}")
        
    # Negative Bigrams (Pain Points)
    neg_corpus = subset[subset['sentiment_label'] == 'Negative']['cleaned_text'].dropna().astype(str).tolist()
    if neg_corpus:
        print("Top 3 Pain Points (Negative Bigrams):")
        for phrase, freq in get_top_ngrams(neg_corpus, n=2, top_k=3):
            print(f"  - {phrase}: {freq}")

## Execution

In [None]:
for bank in df['bank_name'].unique():
    analyze_bank(bank)


=== CBE Analysis ===
Top 5 Themes (Bigrams):
  - doesn work: 30
  - mobile banking: 29
  - transaction history: 25
  - using app: 24
  - easy use: 22
Top 3 Pain Points (Negative Bigrams):
  - doesn work: 18
  - mobile banking: 10
  - error message: 9

=== BOA Analysis ===
Top 5 Themes (Bigrams):
  - mobile banking: 49
  - banking app: 35
  - doesn work: 31
  - developer options: 19
  - worst app: 18
Top 3 Pain Points (Negative Bigrams):
  - mobile banking: 22
  - banking app: 19
  - worst app: 16

=== Dashen Analysis ===
Top 5 Themes (Bigrams):
  - dashen bank: 73
  - super app: 64
  - easy use: 32
  - dashen super: 29
  - user friendly: 29
Top 3 Pain Points (Negative Bigrams):
  - worst app: 8
  - banking app: 6
  - worst banking: 4
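The `doesn work` entries above are most likely `doesn't work`. `CountVectorizer`'s default `token_pattern` keeps only runs of two or more word characters, so the apostrophe splits the contraction and the lone `t` is dropped. A minimal sketch of one workaround follows, assuming `cleaned_text` still contains straight apostrophes; the pattern and the function name `get_top_ngrams_keep_contractions` are illustrative choices, not part of the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: allow apostrophes inside a token so contractions stay whole
APOSTROPHE_TOKENS = r"(?u)\b\w[\w']+\b"

def get_top_ngrams_keep_contractions(corpus, n=2, top_k=10):
    """Hypothetical variant of get_top_ngrams that keeps tokens like "doesn't"."""
    vec = CountVectorizer(
        ngram_range=(n, n),
        stop_words="english",
        token_pattern=APOSTROPHE_TOKENS,
    )
    counts = vec.fit_transform(corpus).sum(axis=0)
    freq = [(term, int(counts[0, idx])) for term, idx in vec.vocabulary_.items()]
    return sorted(freq, key=lambda x: x[1], reverse=True)[:top_k]

# Toy sentences for illustration only
sample = ["App doesn't work", "it doesn't work at all"]
top = get_top_ngrams_keep_contractions(sample, n=2, top_k=1)
print(top)
```

The built-in English stop-word list contains no apostrophe forms, so contractions such as `don't` would need to be added to `stop_words` explicitly if they should be filtered out.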
