In [3]:
import json
import pandas as pd
from pathlib import Path


In [4]:
data_path = Path("../data/raw_conversations/banking_conversations.json")

with open(data_path, "r") as f:
    conversations = json.load(f)

df = pd.DataFrame(conversations)
df.head()


Unnamed: 0,conversation_id,timestamp,channel,user_utterance,true_intent,predicted_intent,confidence_score,entities,fallback_triggered,escalated_to_agent,resolved
0,conv_001,2025-01-05T10:15:00,chat,What is my checking account balance?,Check_Account_Balance,Check_Account_Balance,0.92,{'account_type': 'checking'},False,False,True
1,conv_002,2025-01-05T10:18:00,chat,I see a charge I don't recognize from Amazon,Dispute_Transaction,Transaction_History,0.61,{'merchant_name': 'Amazon'},False,True,False
2,conv_003,2025-01-05T10:22:00,chat,My debit card was stolen,Card_Lost_Or_Stolen,Card_Lost_Or_Stolen,0.95,{},False,False,True
3,conv_004,2025-01-05T10:30:00,voice,Can you tell me when my last five transactions...,Transaction_History,Transaction_History,0.88,{'transaction_count': 5},False,False,True
4,conv_005,2025-01-05T10:35:00,chat,I need help updating my phone number,Update_Personal_Details,Default_Fallback,0.42,{'field': 'phone_number'},True,True,False


In [5]:
df.info


<bound method DataFrame.info of   conversation_id            timestamp channel  \
0        conv_001  2025-01-05T10:15:00    chat   
1        conv_002  2025-01-05T10:18:00    chat   
2        conv_003  2025-01-05T10:22:00    chat   
3        conv_004  2025-01-05T10:30:00   voice   
4        conv_005  2025-01-05T10:35:00    chat   

                                      user_utterance              true_intent  \
0               What is my checking account balance?    Check_Account_Balance   
1       I see a charge I don't recognize from Amazon      Dispute_Transaction   
2                           My debit card was stolen      Card_Lost_Or_Stolen   
3  Can you tell me when my last five transactions...      Transaction_History   
4               I need help updating my phone number  Update_Personal_Details   

        predicted_intent  confidence_score                      entities  \
0  Check_Account_Balance              0.92  {'account_type': 'checking'}   
1    Transaction_History    

In [None]:
##Sanity Checks

In [12]:
df.isna()

Unnamed: 0,conversation_id,timestamp,channel,user_utterance,true_intent,predicted_intent,confidence_score,entities,fallback_triggered,escalated_to_agent,resolved,intent_correct
0,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False


In [14]:
##Find sum of null values

In [7]:
df.isna().sum()

conversation_id       0
timestamp             0
channel               0
user_utterance        0
true_intent           0
predicted_intent      0
confidence_score      0
entities              0
fallback_triggered    0
escalated_to_agent    0
resolved              0
dtype: int64

In [15]:
## Compute Intent Accuracy

In [10]:
df["intent_correct"] = df["true_intent"] == df["predicted_intent"]
df["intent_correct"]

0     True
1    False
2     True
3     True
4    False
Name: intent_correct, dtype: bool

In [11]:
intent_accuracy = df["intent_correct"].mean()
intent_accuracy

np.float64(0.6)

In [16]:
## Fallback & Escalation Rates

In [18]:
fallback_rate = df["fallback_triggered"].mean()
fallback_rate

np.float64(0.2)

In [19]:
escalation_rate = df["escalated_to_agent"].mean()
escalation_rate

np.float64(0.4)

##  Metrics:
- Intent accuracy = 0.6 (60%)
- Fallback rate = 0.4 (40%)
- Escalation rate = 0.2 (20%)

### High fallback ‚Üí intent coverage gaps

### High escalation ‚Üí cost & CX impact

## Initial Findings

- Intent accuracy is 6%, indicating misclassification issues in certain banking intents.

- Fallback and escalation rates suggest opportunities to improve intent coverage and entity extraction.

- Transaction-related queries show higher escalation risk due to ambiguity.

## What Each Metric Really Means (Not Just the Definition)
#### 1Ô∏è‚É£Intent Accuracy = 60%

#### What it means:
- The chatbot correctly understood the user‚Äôs intent 6 out of 10 times.
- 4 out of 10 conversations were misunderstood

### Why this matters in banking:

- Banking intents are often semantically close
(‚Äútransaction issue‚Äù vs ‚Äúdispute‚Äù vs ‚Äúhistory‚Äù)

- A 60% accuracy is below acceptable production standards
(real-world target: 85‚Äì90%+)

### Business impact:

##### Misclassified intents lead to:
- Wrong responses
- Customer frustration
- Increased fallback & escalation

### üëâInsight:
- Intent taxonomy and training data need refinement, especially for transaction-related flows.
-------------------------------------------------------------------------------------------------------------------
#### 2Ô∏è‚É£ Fallback Rate = 40%

#### What it means:
- In 4 out of 10 conversations, the bot could not confidently handle the request
- The bot likely said something like:

‚ÄúSorry, I didn‚Äôt understand that.‚Äù

#### Why this is a red flag:

#### Fallbacks indicate:
- Missing intents
- Poor utterance coverage
- Entity extraction failures

#### Banking-specific risk:
- Customers expect high precision in financial queries
- Frequent fallback reduces trust in AI systems

###üëâ Insight:
- There are clear intent coverage gaps and insufficient linguistic variation in training utterances.

#### 3Ô∏è‚É£ Escalation Rate = 20%

#### What it means:
- 1 in 5 conversations required a human agent
- Why this matters to the business:
- Every escalation = cost
- Human agents are expensive
- Chatbots are deployed to reduce this

#### But here‚Äôs the nuance (important):
- Some escalations are good (fraud, disputes)
- But unnecessary escalations = AI failure

#### üëâ Insight:
- Escalations are driven by low confidence and ambiguity in transaction-related queries.

### üîó How These Metrics Connect (This Is the Smart Part)

#### These metrics are not independent:
- Low Intent Accuracy
        ‚Üì
- High Fallback Rate
        ‚Üì
- Higher Escalations
        ‚Üì
- Higher Cost + Poor CX


#### Your data tells a coherent story:
- Misclassification ‚Üí fallback ‚Üí escalation
- That‚Äôs exactly what AI performance analysts look for.

## üìù Summary:

#### Initial Findings:
- The conversational AI system demonstrates an intent classification accuracy of 60%, indicating notable misclassification across banking-related intents, particularly for transaction and dispute queries.

- A fallback rate of 40% suggests gaps in intent coverage and insufficient handling of linguistic variability in user utterances.

- An escalation rate of 20% highlights increased dependency on human agents, likely driven by low model confidence and ambiguity in transaction-related conversations.

- These metrics collectively indicate opportunities to improve intent taxonomy, training data quality, and entity extraction to reduce fallback occurrences and optimize customer experience while lowering operational costs.

### Key Note:

‚ÄúI analyzed conversational logs, identified intent misclassification and fallback drivers, and translated NLP performance metrics into customer experience and cost-impact insights.‚Äù



### üí° CX = Customer Experience

#### Customer Experience (CX) is the overall impression a customer has when interacting with a company ‚Äî especially how easy, fast, and satisfying that interaction feels.

### In conversational AI, CX is shaped by things like:

- Did the bot understand me?
- Did I get my answer quickly?
- Did I have to repeat myself?
- Did I end up needing a human anyway?

### üß† CX in the Context of our Project

### When we talk about CX here, we‚Äôre specifically referring to the user‚Äôs experience with the banking chatbot.

##### Good CX looks like:
- User asks a question ‚Üí bot understands correctly
- Minimal back-and-forth
- No fallback messages
- Issue resolved in one flow

##### Poor CX looks like:
- ‚ÄúSorry, I didn‚Äôt understand that‚Äù
- Wrong intent ‚Üí wrong response
- Multiple clarifications
- Forced escalation to an agent

#### üîó How CX Connects to our Metrics

##### Let‚Äôs map it to the numbers we already calculated:
- 60% intent accuracy ‚Üí 40% of users misunderstood ‚Üí frustrating
- 40% fallback rate ‚Üí frequent ‚ÄúI didn‚Äôt get that‚Äù moments
- 20% escalation rate ‚Üí users pushed to human agents

##### üëâ All of these directly degrade CX.

#### ‚ÄúEnhance customer experience‚Äù:
- Reduce friction
- Improve accuracy
- Resolve issues faster
- Build trust in AI systems

### Key Note: 
‚ÄúIn conversational AI, CX refers to how effectively and smoothly users can resolve their issues without confusion, fallbacks, or unnecessary escalation to human agents.‚Äù