<p align="center" width="100%">
    <img width="40%" src="customer_support_icon.JPG"> 
</p>

A retail company wants to improve its customer service using speech recognition and natural language processing (NLP). As the machine learning engineer on this initiative, you are tasked with converting customer support audio calls into text and exploring ways to extract insights from the transcripts.

This project uses three open-source packages, `SpeechRecognition`, `Pydub`, and `spaCy`, along with NLTK's VADER analyzer for sentiment scoring. Your objectives are:
  - Transcribe a sample customer audio call, stored at `sample_customer_call.wav`, with the `SpeechRecognition` package.
  - Analyze sentiment, identify common named entities, and find the customer call most similar to a given query, using a subset of pre-transcribed call data stored at `customer_call_transcriptions.csv`.

The cells below work through each objective in turn.

In [24]:
!pip install SpeechRecognition
!pip install pydub
!pip install spacy
!python3 -m spacy download en_core_web_sm

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-sm==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [25]:
# Import required libraries
import pandas as pd

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import speech_recognition as sr
from pydub import AudioSegment

import spacy

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/repl/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [26]:
# Create the recognizer and point it at the sample call recording
recognizer = sr.Recognizer()
transcribed_text_audio = sr.AudioFile('sample_customer_call.wav')


In [27]:
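# Read the whole file into an AudioData object
with transcribed_text_audio as source:
    transcribed_audio = recognizer.record(source)

# Send the audio to Google's Web Speech API for transcription (requires network access)
transcribed_text = recognizer.recognize_google(transcribed_audio)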

In [28]:
print(transcribed_text)

hello I'm experiencing an issue with your product I'd like to speak to someone about a replacement


In [29]:
audio_segment = AudioSegment.from_file('sample_customer_call.wav', format='wav')

In [30]:
# Basic audio statistics: channel count and frame rate in Hz
number_of_channels = audio_segment.channels
frame_rate = audio_segment.frame_rate

print("Number of Channels : ", number_of_channels)
print("Frame Rate : ", frame_rate)

Number of Channels :  1
Frame Rate :  44100
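
To make the channel count and frame rate more concrete, here is a minimal sketch that reads the same properties with Python's standard-library `wave` module. It does not use the project's recording: `make_test_tone` and `describe_wav` are hypothetical helpers, and the audio is a synthetic one-second tone built in memory, so the numbers below describe that tone only.

```python
import io
import math
import struct
import wave


def make_test_tone(frame_rate=44100, seconds=1.0, freq_hz=440.0):
    """Build a mono 16-bit WAV in memory (synthetic stand-in for a recording)."""
    n_frames = int(frame_rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / frame_rate))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)   # mono
        wav_out.setsampwidth(2)   # 2 bytes per sample = 16-bit audio
        wav_out.setframerate(frame_rate)
        wav_out.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    buffer.seek(0)
    return buffer


def describe_wav(file_obj):
    """Return channel count, frame rate, and duration in seconds."""
    with wave.open(file_obj, "rb") as wav_in:
        channels = wav_in.getnchannels()
        frame_rate = wav_in.getframerate()
        duration = wav_in.getnframes() / frame_rate
    return {"channels": channels, "frame_rate": frame_rate, "duration_s": duration}


info = describe_wav(make_test_tone())
print(info)
```

The duration follows directly from the two header values: the number of frames divided by the frame rate. Passing a file path such as `'sample_customer_call.wav'` to `describe_wav` should work the same way for an uncompressed PCM WAV file.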


In [31]:
df = pd.read_csv("customer_call_transcriptions.csv")
sid = SentimentIntensityAnalyzer()

def find_sentiment(text):
    """Map VADER's compound score to a positive/negative/neutral label."""
    compound_score = sid.polarity_scores(text)['compound']

    # Thresholds of +/-0.05 on the compound score
    if compound_score >= 0.05:
        return 'positive'
    elif compound_score <= -0.05:
        return 'negative'
    return 'neutral'

# Apply directly to the text column; no row-wise apply is needed
df['sentiment_predicted'] = df['text'].apply(find_sentiment)

In [32]:
# True positives: calls labelled positive that were also predicted positive
true_positive = len(df.loc[(df['sentiment_predicted'] == 'positive') &
                           (df['sentiment_label'] == 'positive')])

print("True Positives: ", true_positive)

True Positives:  2
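
A true-positive count is easier to interpret next to false positives and false negatives. The sketch below shows one way to derive precision and recall for a single class in plain Python. `per_class_metrics` is a hypothetical helper, and the label lists are made up for illustration; they are not taken from `customer_call_transcriptions.csv`.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, target):
    """Count TP/FP/FN for one class and derive precision and recall."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == target and truth == target:
            counts["tp"] += 1
        elif pred == target:
            counts["fp"] += 1   # predicted target, but the label disagrees
        elif truth == target:
            counts["fn"] += 1   # labelled target, but the prediction missed it
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "neutral", "negative"]
metrics = per_class_metrics(y_true, y_pred, "positive")
print(metrics)
```

With the notebook's DataFrame, the same helper could be called as `per_class_metrics(df['sentiment_label'], df['sentiment_predicted'], 'positive')`.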


In [33]:
nlp = spacy.load('en_core_web_sm')

def extract_entities(text):
    """Return the named entities spaCy finds in `text` as a list of Span objects."""
    doc = nlp(text)
    return list(doc.ents)

df['named_entities'] = df['text'].apply(extract_entities)

In [34]:
all_entities= [ent for entities in df['named_entities'] for ent in entities]

In [35]:
print(all_entities)

[Arthur, yesterday, David, the other day, the other day, Caesar, this morning, Michael, yesterday, 64321, this morning, Jacob, yesterday, this morning, yesterday, a couple of days ago, that day, this morning, today, yesterday, Jacob, Daniel I, yesterday, a couple days ago, the other day, two, two, two, yesterday, yesterday, yesterday, today, yesterday, the Wrong Colours, this afternoon, this afternoon, yesterday, Tony, John, yesterday, Steve, the last 30 minutes, Australian, Australian, November the 3rd, yesterday, 64321, this morning, AUD, yesterday, 1863 3845, iPhone, two, yesterday]


In [38]:
# Count entities by their text rather than by the spaCy objects themselves,
# which are distinct per document even when the text matches.
entity_texts = pd.Series([ent.text for ent in all_entities])
entities_count = entity_texts.value_counts().rename_axis('entity').reset_index(name='count')

In [40]:
entities_count.head(4)

          entity  count
0      yesterday     15
1   this morning      5
2            two      4
3  the other day      3


In [42]:
most_freq_ent = entities_count["entity"].iloc[0]
print("Most frequent entity: ", most_freq_ent)

Most frequent entity:  yesterday
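
As a quick cross-check on entity frequencies, the standard library's `collections.Counter` can tally plain strings without going through a DataFrame. This is a minimal sketch on a hypothetical sample list; in the notebook, the strings would come from the text of each entity (for example `[ent.text for ent in all_entities]`).

```python
from collections import Counter

# Hypothetical sample of entity strings, for illustration only.
sample_entities = [
    "yesterday", "Jacob", "this morning", "yesterday",
    "two", "Jacob", "yesterday",
]

# Counter groups equal strings together, so repeated mentions add up.
entity_counts = Counter(sample_entities)
top_entity, top_count = entity_counts.most_common(1)[0]
print(top_entity, top_count)
```

Counting strings matters here because two entity objects produced from different transcripts are separate objects even when they read the same, so only their text is a reliable grouping key.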


In [43]:
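# Reuse the `nlp` pipeline loaded earlier to process every transcript.
# Note: en_core_web_sm ships without static word vectors, so its similarity
# scores are less reliable than those of en_core_web_md or en_core_web_lg.
df['processed_text'] = df['text'].apply(nlp)

input_query = "wrong package delivery"
processed_query = nlp(input_query)

# Score each transcript against the query and rank from most to least similar
df['similarity'] = df['processed_text'].apply(lambda doc: processed_query.similarity(doc))
df = df.sort_values(by='similarity', ascending=False)

most_similar_text = df["text"].iloc[0]
print("Most similar text: ", most_similar_text)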

Most similar text:  wrong package delivered
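
spaCy's `Doc.similarity` defaults to a cosine similarity over document vectors. To show the ranking idea without any model, the sketch below swaps in plain bag-of-words counts, which is a much cruder representation than spaCy's vectors. `bow_vector` and `cosine_similarity` are hypothetical helpers, and the transcripts are made up for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Lower-cased, whitespace-split word counts (a deliberately simple stand-in)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical transcripts for illustration only.
calls = [
    "wrong package delivered",
    "my payment was declined twice",
    "the package arrived late",
]
query = bow_vector("wrong package delivery")

# Rank transcripts from most to least similar to the query
ranked = sorted(
    calls,
    key=lambda text: cosine_similarity(query, bow_vector(text)),
    reverse=True,
)
print(ranked[0])
```

Exact word overlap is the only signal in this sketch, so "delivery" and "delivered" count as different words. Vector-based similarity from a spaCy model with word vectors is intended to capture that kind of relatedness.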
