## Customer Call Analytics using NLP & Semantic Embeddings

This project applies end-to-end Natural Language Processing (NLP) techniques to customer service call data, including speech-to-text, sentiment analysis, named entity recognition, semantic similarity, and unsupervised clustering, to extract actionable business insights.

# !pip install SpeechRecognition
# !pip install pydub
# !pip install spacy
# !python3 -m spacy download en_core_web_sm
# !pip install pandas nltk requests
# !pip install scikit-learn sentence-transformers


In [2]:
import pandas as pd

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import speech_recognition as sr
from pydub import AudioSegment

import spacy


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\sapan\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Audio Download & Speech-to-Text (ASR)
- Convert a customer call audio recording into text using automatic speech recognition.

In [12]:
import requests

url = "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav"
output_file = "sample_customer_call.wav"

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
with open(output_file, "wb") as f:
    f.write(response.content)

print(f"Audio file downloaded as {output_file}")


Audio file downloaded as sample_customer_call.wav


Note: The audio is downloaded for demonstration purposes only and is not transcribed here. The main analysis uses the provided call transcripts.

### Sentiment Analysis (VADER)

**Predict the sentiment of each customer call using VADER, a lexicon-based sentiment analyzer tuned for short, informal text such as social media posts.**

In [16]:
df = pd.read_csv("customer_call.csv")

In [17]:
df.head()

Unnamed: 0,index,text,sentiment_label
0,0,how's it going Arthur I just placed an order w...,negative
1,1,yeah hello I'm just wondering if I can speak t...,neutral
2,2,hey I receive my order but it's the wrong size...,negative
3,3,hi David I just placed an order online and I w...,neutral
4,4,hey I bought something from your website the o...,negative


**Initialize VADER sentiment model**

VADER is a lexicon- and rule-based model (not deep learning).
It outputs 4 scores:
- pos, neu, neg (0–1): the proportion of the text falling in each category
- compound (-1 to +1): a normalized overall sentiment score
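
To make the compound score less opaque, here is a minimal sketch of how a lexicon-based scorer can turn word-level valences into a value between -1 and +1. The normalization `x / sqrt(x² + alpha)` with `alpha = 15` is the one used by VADER. Everything else is simplified: `TOY_LEXICON` and its valence values are hypothetical, and the sketch ignores VADER's rules for negation, intensifiers, punctuation, and capitalization.

```python
import math

# Hypothetical valences for illustration only; the real VADER lexicon
# contains thousands of human-rated entries.
TOY_LEXICON = {"wrong": -2.0, "damaged": -2.0, "thanks": 1.9, "great": 3.0}


def normalize(total, alpha=15.0):
    """Squash an unbounded valence sum into the interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def toy_compound(text):
    """Sum the valences of known words, then normalize the sum."""
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return normalize(total)


def to_label(compound, threshold=0.05):
    """Apply the same +/-0.05 thresholds used in this notebook."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(to_label(toy_compound("wrong package delivered")))        # negative
print(to_label(toy_compound("thanks the wrong item arrived")))  # neutral
```

The second example shows how a single polite word can cancel a complaint word, which is one way a lexicon-based scorer can mislabel courteous complaints.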

In [18]:
sid = SentimentIntensityAnalyzer()

In [19]:
# Classify sentiment from the compound score produced by VADER's SentimentIntensityAnalyzer
def find_sentiment(text):
    scores = sid.polarity_scores(text)
    compound_score = scores['compound']

    if compound_score >= 0.05:
        return 'positive'
    elif compound_score <= -0.05:
        return 'negative'
    else:
        return 'neutral'

**Apply predictions to each row**

In [20]:
df['sentiment_predicted'] = df['text'].apply(find_sentiment)


In [None]:
true_positive = len(df.loc[
    (df['sentiment_predicted'] == df['sentiment_label']) &
    (df['sentiment_label'] == 'positive')
])

In [54]:
true_positive

2

In [34]:
errors = df[df["sentiment_label"] != df["sentiment_predicted"]]
errors[["text", "sentiment_label", "sentiment_predicted"]].head()


Unnamed: 0,text,sentiment_label,sentiment_predicted
14,I've just bought a product new guys and I want...,neutral,positive
58,I purchase something from your online store ye...,negative,positive
93,I got my order yesterday and the order number ...,neutral,positive
98,hi I recently ordered a new phone and I'm just...,neutral,positive
47,hi I placed an order a couple days ago and I w...,negative,positive


Misclassifications often occur with polite complaints or implicit dissatisfaction, highlighting limitations of rule-based sentiment models.

In [55]:
from sklearn.metrics import classification_report
print(classification_report(df["sentiment_label"], df["sentiment_predicted"]))

              precision    recall  f1-score   support

    negative       0.75      0.56      0.64        43
     neutral       0.70      0.49      0.58        57
    positive       0.07      1.00      0.12         2

    accuracy                           0.53       102
   macro avg       0.51      0.68      0.45       102
weighted avg       0.71      0.53      0.59       102



**Negative**

- When the model predicts negative, it’s usually right (75%)
- But it misses ~44% of actual negative complaints

**Why**

- Customers often complain politely
- VADER struggles with indirect dissatisfaction (“I was just wondering why…”)

**Neutral**

- Neutral is frequently confused with negative or positive
- Model finds only half of true neutral calls

**Why**

- Customer service language is ambiguous
- Many neutral calls contain emotional words


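**Positive**

- There are only 2 positive examples
- The model predicts “positive” too readily
- Almost all positive predictions are wrong (precision 0.07)

**Why**

- This is class imbalance: with so few truly positive calls, even a modest number of false positives overwhelms the correct ones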

In [57]:
errors.shape

(48, 7)
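
The rounded figures in the classification report pin down the underlying counts, which makes the imbalance concrete. The counts below are inferred from the report rather than read from the data: they are the integers consistent with the rounded precision, recall, and support values, given that every call receives exactly one of the three labels.

```python
# Per-class counts implied by the rounded report (inferred, not measured).
support = {"negative": 43, "neutral": 57, "positive": 2}      # true calls per class
correct = {"negative": 24, "neutral": 28, "positive": 2}      # correctly predicted
predicted = {"negative": 32, "neutral": 40, "positive": 30}   # times each label was predicted

total = sum(support.values())

for label in support:
    precision = correct[label] / predicted[label]
    recall = correct[label] / support[label]
    print(f"{label:>8}  precision={precision:.2f}  recall={recall:.2f}")

accuracy = sum(correct.values()) / total
n_errors = total - sum(correct.values())
print(f"accuracy={accuracy:.2f}  errors={n_errors}")
```

Under these counts, 28 of the 30 "positive" predictions are wrong, and the 48 misclassified rows match `errors.shape`.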

## Named Entity Recognition (NER)

**Extract named entities (e.g., people, dates) from customer conversations.**

In [22]:
nlp = spacy.load("en_core_web_sm")

In [23]:
def extract_entities(text):
    doc = nlp(text)
    entities = [ent.text for ent in doc.ents]
    return entities


In [None]:
df['named_entities'] = df['text'].apply(extract_entities)

In [25]:
df.head()

Unnamed: 0,index,text,sentiment_label,sentiment_predicted,named_entities
0,0,how's it going Arthur I just placed an order w...,negative,negative,[Arthur]
1,1,yeah hello I'm just wondering if I can speak t...,neutral,positive,[yesterday]
2,2,hey I receive my order but it's the wrong size...,negative,negative,[]
3,3,hi David I just placed an order online and I w...,neutral,neutral,[David]
4,4,hey I bought something from your website the o...,negative,neutral,[]


**Find most frequent entity overall**

In [26]:
all_entities = [ent for entities in df['named_entities'] for ent in entities]
entities_df = pd.DataFrame(all_entities, columns=['entity'])
entities_counts = entities_df['entity'].value_counts().reset_index()
entities_counts.columns = ['entity', 'count']
most_freq_ent = entities_counts["entity"].iloc[0]

In [58]:
most_freq_ent

'yesterday'

### Named Entity Analysis

Named Entity Recognition (NER) was applied using spaCy’s `en_core_web_sm`
model to extract entities from customer call transcripts.

The most frequently occurring entity was **"yesterday"**, classified as a
temporal (`DATE`) entity. This reflects the conversational nature of customer
support interactions, where callers frequently reference recent events such as
order placement or delivery timing.

This insight highlights the importance of temporal context in customer
complaints and suggests that time-based features could be valuable for
downstream analysis or escalation workflows.
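
`extract_entities` keeps only `ent.text`, so the entity type is not stored alongside the text. The sketch below shows one way to count text and label together, which would make the `DATE` observation directly checkable. The `sample` data is hypothetical; in the notebook, each inner list could be built as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import Counter


def count_entities(entity_lists):
    """Count (text, label) pairs across all calls, ignoring case in the text."""
    counts = Counter()
    for entities in entity_lists:
        for text, label in entities:
            counts[(text.lower(), label)] += 1
    return counts


# Hypothetical per-call entities, one inner list per transcript.
sample = [
    [("Arthur", "PERSON")],
    [("yesterday", "DATE")],
    [],
    [("David", "PERSON"), ("Yesterday", "DATE")],
]

counts = count_entities(sample)
print(counts.most_common(1))  # [(('yesterday', 'DATE'), 2)]
```

Lowercasing before counting is a design choice made here so that "Yesterday" and "yesterday" are treated as the same entity.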


## Find most similar complaint (Semantic Similarity)

**Process each call into a spaCy Doc**

In [28]:
df['processed_text'] = df['text'].apply(lambda text: nlp(text))

In [29]:
input_query = "wrong package delivery"
processed_query = nlp(input_query)


In [30]:
df['similarity'] = df['processed_text'].apply(
    lambda text: processed_query.similarity(text)
)




In [31]:
df = df.sort_values(by='similarity', ascending=False)
most_similar_text = df["text"].iloc[0]
print("Most similar text: ", most_similar_text)


Most similar text:  wrong package delivered


## Semantic Similarity (Modern)

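The spaCy similarity above relies on `en_core_web_sm`, which ships without static word vectors, so its scores are only a rough signal. Transformer-based sentence embeddings give a more reliable similarity search.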

In [43]:
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")




In [44]:
def compute_similarity(corpus, query):
    corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    return scores.cpu().numpy()
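
For reference, the cosine similarity that `util.cos_sim` computes can be sketched in plain Python. This is an illustrative version, not the library's implementation: the three-dimensional vectors are hypothetical stand-ins for real embeddings (`all-MiniLM-L6-v2` produces 384 dimensions), and the function names are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "embeddings".
toy_query = [1.0, 0.0, 1.0]
toy_corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

print(rank_by_similarity(toy_query, toy_corpus)[0][0])  # 1
```

Because cosine similarity depends only on direction, `[2.0, 0.0, 2.0]` ranks first even though its length differs from the query's.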

In [45]:
# Semantic Similarity
query = "wrong package delivery"
df["similarity"] = compute_similarity(df["text"].tolist(), query)

In [47]:
df = df.sort_values("similarity", ascending=False)

In [48]:
df.head()

Unnamed: 0,index,text,sentiment_label,sentiment_predicted,named_entities,processed_text,similarity
81,81,wrong package delivered,negative,negative,[],"(wrong, package, delivered)",0.938691
41,41,the shipment I received is wrong,negative,negative,[],"(the, shipment, I, received, is, wrong)",0.726414
44,44,the shipment I received is wrong,negative,negative,[],"(the, shipment, I, received, is, wrong)",0.726414
33,33,a couple of days ago I got a message saying th...,negative,neutral,[a couple of days ago],"(a, couple, of, days, ago, I, got, a, message,...",0.653485
39,39,hello someone from your team delivered my pack...,negative,negative,[today],"(hello, someone, from, your, team, delivered, ...",0.636427


In [49]:
print("\nMost similar complaint to query:\n")
print(df.iloc[0]["text"])


Most similar complaint to query:

wrong package delivered


## Optional Clustering (Unsupervised)

In [51]:
from sklearn.cluster import KMeans
def cluster_texts(embeddings, k=4):
    model = KMeans(n_clusters=k, random_state=42)
    return model.fit_predict(embeddings)

In [52]:
embeddings = embedder.encode(df["text"].tolist())
df["cluster"] = cluster_texts(embeddings, k=4)

In [53]:
print("\nSample clustered complaints:\n")
for c in range(4):
    print(f"\nCluster {c}:")
    print(df[df["cluster"] == c]["text"].head(2).values)


Sample clustered complaints:


Cluster 0:
['I just placed an order I was wondering how long shipping time would be expected to be'
 'hi I just recently placed an order with your company I was just wondering if you know the status of my shipment']

Cluster 1:
["I'm calling out to talk about a package I got yesterday it so I got it but I need to do I need some help with setting it up"
 'hey mate how you doing just calling in regards to the phone I just purchased from you guys faulty not working and there was damaged on the way here']

Cluster 2:
['wrong package delivered' 'the shipment I received is wrong']

Cluster 3:
["just received the product from you guys and it didn't meet my expectations can I please get a refund"
 "hey I receive my order but it's the wrong size can I get a refund please"]


### Unsupervised Complaint Clustering

Customer complaints were embedded using a transformer-based sentence
embedding model and grouped using KMeans clustering.

With k = 4 and no labels, the two sample calls printed per cluster suggest
the following themes:

- **Cluster 0 – Order Status & Shipping**: inquiries about delivery time and
  shipment status
- **Cluster 1 – Product Issues & Setup**: damaged products and setup assistance
- **Cluster 2 – Wrong Delivery**: incorrect or mismatched shipments
- **Cluster 3 – Refunds & Returns**: dissatisfaction, wrong size, or refund
  requests

Judging by these samples, the clusters map onto familiar customer support
categories, which suggests that semantic embeddings are a promising basis for
automated issue categorization. A fuller review of each cluster's members
would be needed to confirm this.
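
To show what KMeans does with the embeddings, here is a minimal sketch of the underlying idea (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation: it takes fixed initial centroids instead of using k-means++ initialization, and the two-dimensional `toy_points` are hypothetical stand-ins for sentence embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iters=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(toy_points, initial_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

The result depends on the starting centroids, which is why the notebook fixes `random_state=42` to keep cluster assignments reproducible.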
