### Objective
#### 1️⃣ Build the Intent Confusion Matrix
- true intent vs predicted intent
- identify top confusion pairs

#### 2️⃣ Analyze Fallback Patterns
- Which intents trigger fallback most?
- Which utterances fail?


### Install dependencies: `!pip install scikit-learn`



In [7]:
import json
import pandas as pd
from pathlib import Path
from sklearn.metrics import confusion_matrix


#### Load JSON data


In [9]:
data_path = Path("../data/raw_conversations/banking_conversations.json")

with open(data_path, "r") as f:
    conversations = json.load(f)

df = pd.DataFrame(conversations)
df.head(10)

| | conversation_id | timestamp | channel | user_utterance | true_intent | predicted_intent | confidence_score | entities | fallback_triggered | escalated_to_agent | resolved |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | conv_001 | 2025-01-05T10:15:00 | chat | What is my checking account balance? | Check_Account_Balance | Check_Account_Balance | 0.92 | {'account_type': 'checking'} | False | False | True |
| 1 | conv_002 | 2025-01-05T10:18:00 | chat | I see a charge I don't recognize from Amazon | Dispute_Transaction | Transaction_History | 0.61 | {'merchant_name': 'Amazon'} | False | True | False |
| 2 | conv_003 | 2025-01-05T10:22:00 | chat | My debit card was stolen | Card_Lost_Or_Stolen | Card_Lost_Or_Stolen | 0.95 | {} | False | False | True |
| 3 | conv_004 | 2025-01-05T10:30:00 | voice | Can you tell me when my last five transactions... | Transaction_History | Transaction_History | 0.88 | {'transaction_count': 5} | False | False | True |
| 4 | conv_005 | 2025-01-05T10:35:00 | chat | I need help updating my phone number | Update_Personal_Details | Default_Fallback | 0.42 | {'field': 'phone_number'} | True | True | False |


In [10]:


conf_matrix = pd.crosstab(
    df["true_intent"],
    df["predicted_intent"],
    normalize="index"
)

conf_matrix


| true_intent \ predicted_intent | Card_Lost_Or_Stolen | Check_Account_Balance | Default_Fallback | Transaction_History |
|---|---|---|---|---|
| Card_Lost_Or_Stolen | 1.0 | 0.0 | 0.0 | 0.0 |
| Check_Account_Balance | 0.0 | 1.0 | 0.0 | 0.0 |
| Dispute_Transaction | 0.0 | 0.0 | 0.0 | 1.0 |
| Transaction_History | 0.0 | 0.0 | 0.0 | 1.0 |
| Update_Personal_Details | 0.0 | 0.0 | 1.0 | 0.0 |
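
The row-normalized crosstab has five rows but only four columns, because intents that were never predicted get no column and `Default_Fallback` never appears as a true intent. A possible way to get a square matrix of raw counts is to pass the union of all labels to scikit-learn's `confusion_matrix`. The sketch below rebuilds the five (true, predicted) pairs from the sample inline so it runs on its own; the names `labels` and `counts` are illustrative.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# The five (true, predicted) pairs from the sample conversations.
df = pd.DataFrame({
    "true_intent": [
        "Check_Account_Balance", "Dispute_Transaction", "Card_Lost_Or_Stolen",
        "Transaction_History", "Update_Personal_Details",
    ],
    "predicted_intent": [
        "Check_Account_Balance", "Transaction_History", "Card_Lost_Or_Stolen",
        "Transaction_History", "Default_Fallback",
    ],
})

# Union of true and predicted labels, so rows and columns line up.
labels = sorted(set(df["true_intent"]) | set(df["predicted_intent"]))

# Raw counts (not normalized) in a square, labelled DataFrame.
counts = pd.DataFrame(
    confusion_matrix(df["true_intent"], df["predicted_intent"], labels=labels),
    index=pd.Index(labels, name="true_intent"),
    columns=pd.Index(labels, name="predicted_intent"),
)
print(counts)
```

Raw counts also keep the sample size visible, which the normalized view hides: a value of 1.0 in a row may rest on a single conversation.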


In [11]:
len(df)

5

## 🧠 Interpreting the Confusion Matrix (Like a Pro)
#### Rows = what the customer actually wanted (true_intent)
#### Columns = what the bot thought they wanted (predicted_intent)
#### Diagonal values = correct predictions
#### Off-diagonal values = errors
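
The off-diagonal cells can be ranked directly to find the top confusion pairs. The following is a minimal sketch, assuming the same `true_intent` and `predicted_intent` columns as the loaded data (rebuilt inline here so the block runs on its own); `errors` and `top_pairs` are illustrative names. Note that it counts a fallback as an error pair as well.

```python
import pandas as pd

# The five (true, predicted) pairs from the sample conversations.
df = pd.DataFrame({
    "true_intent": [
        "Check_Account_Balance", "Dispute_Transaction", "Card_Lost_Or_Stolen",
        "Transaction_History", "Update_Personal_Details",
    ],
    "predicted_intent": [
        "Check_Account_Balance", "Transaction_History", "Card_Lost_Or_Stolen",
        "Transaction_History", "Default_Fallback",
    ],
})

# Keep only misclassified rows (the off-diagonal cells).
errors = df[df["true_intent"] != df["predicted_intent"]]

# Count each (true, predicted) pair and sort by frequency.
top_pairs = (
    errors.groupby(["true_intent", "predicted_intent"])
    .size()
    .sort_values(ascending=False)
    .rename("count")
    .reset_index()
)
print(top_pairs.head(10))
```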

## 🔍 Typical Banking Confusion Patterns
#### 1️⃣ Dispute_Transaction ↔ Transaction_History

##### Why this happens:

- Utterances like:
    - “I see a charge I don’t recognize”

- Linguistically ambiguous:
    - Is the user asking what the charge is?
    - Or disputing it?

- Linguistic explanation:
    - Semantic overlap (both mention “transaction”)
    - Pragmatic intent differs (informational vs action)

#### 👉 Business impact:
- High-risk intent being misrouted → poor CX + compliance risk


## Update_Personal_Details → Default_Fallback

#### Why this happens:

- Users phrase updates in many ways:
    - “Change my number”
    - “Update contact info”
    - “Fix my profile”

- Linguistic explanation:
    - High lexical variability
    - Missing paraphrases in training data

#### 👉 Business impact:
- Unnecessary escalation → higher call center cost
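
The two fallback questions from the objective (which intents trigger fallback most, and which utterances fail) can be answered from the `fallback_triggered` column. This is a rough sketch over the five sample rows, rebuilt inline so it is self-contained; `fallback_rate` and `failed` are illustrative names.

```python
import pandas as pd

# Relevant columns from the five sample conversations.
df = pd.DataFrame({
    "user_utterance": [
        "What is my checking account balance?",
        "I see a charge I don't recognize from Amazon",
        "My debit card was stolen",
        "Can you tell me when my last five transactions...",
        "I need help updating my phone number",
    ],
    "true_intent": [
        "Check_Account_Balance", "Dispute_Transaction", "Card_Lost_Or_Stolen",
        "Transaction_History", "Update_Personal_Details",
    ],
    "fallback_triggered": [False, False, False, False, True],
})

# Share of conversations per true intent that ended in fallback.
fallback_rate = (
    df.groupby("true_intent")["fallback_triggered"]
    .mean()
    .sort_values(ascending=False)
)
print(fallback_rate)

# The utterances that hit fallback, for manual review.
failed = df.loc[df["fallback_triggered"], ["true_intent", "user_utterance"]]
print(failed)
```

Reading the failed utterances alongside the training phrases for that intent is one way to spot missing paraphrases.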

## Check_Account_Balance → Correct (High Accuracy)

#### Why this works well:
- Clear keywords:
    - “balance”
    - “checking”
    - “savings”

- Linguistic explanation:
    - Strong lexical cues
    - Low ambiguity

#### 👉 Business impact:
- High containment → cost savings
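
Containment can be estimated per intent from the `escalated_to_agent` column. The sketch below assumes one possible definition, namely that a conversation counts as contained when it was not escalated to an agent; that definition and the `contained` column are assumptions, not something the data defines. The rows are rebuilt inline from the sample.

```python
import pandas as pd

# Relevant columns from the five sample conversations.
df = pd.DataFrame({
    "true_intent": [
        "Check_Account_Balance", "Dispute_Transaction", "Card_Lost_Or_Stolen",
        "Transaction_History", "Update_Personal_Details",
    ],
    "escalated_to_agent": [False, True, False, False, True],
})

# Assumption: contained = handled by the bot without agent escalation.
df["contained"] = ~df["escalated_to_agent"]

# Containment rate per true intent, highest first.
containment = (
    df.groupby("true_intent")["contained"]
    .mean()
    .sort_values(ascending=False)
)
print(containment)
print("Overall containment:", df["contained"].mean())
```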


# Intent Confusion Analysis

- In this five-conversation sample, the confusion matrix shows the one Dispute_Transaction utterance misclassified as Transaction_History (confidence 0.61), which is consistent with semantic overlap in how users talk about transactions.
- The one Update_Personal_Details request fell through to Default_Fallback (confidence 0.42), which is consistent with the wide linguistic variation in how users express personal detail changes.
- The Check_Account_Balance utterance was classified correctly with high confidence (0.92), which is consistent with strong keyword-based intent signals and sufficient training coverage.
- With one utterance per intent, these are patterns to validate on a larger sample rather than measured rates. If they hold, they point to the need for improved intent differentiation, more diverse training utterances, and refined entity-based disambiguation strategies.