<a href="https://colab.research.google.com/github/Parissai/ML-DS-Playground/blob/main/customer_support_call.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Customer Support Calls

1. Is the audio compatible with future speech recognition modeling?
   - Convert `sample_customer_call.wav` into text and store the result in `transcribed_text`.
   - Find the frame rate and number of channels of this audio and save your answer as two numeric variables: `frame_rate`, `number_channels`.

2. How many calls have a true positive sentiment?
   - Perform sentiment analysis on `customer_call_transcriptions.csv` and find the number of true positive predictions; save an integer value to `true_positive`.
   - Use the compound score in the `vader` module and threshold values of 0.05 and -0.05 to set a sentiment to positive, neutral or negative.

3. What is the most frequently named entity across all of the transcriptions?
   - Save your answer as a string variable `most_freq_ent`.

4. Which call is the most similar to "wrong package delivery"?
   - Save your answer as a string variable `most_similar_text`.

In [None]:
!pip install SpeechRecognition
!pip install pydub
!pip install spacy
!python3 -m spacy download en_core_web_sm

Collecting SpeechRecognition
  Downloading SpeechRecognition-3.10.4-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.10.4
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1
Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
ord

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install gitpython

Collecting gitpython
  Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython)
  Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython)
  Downloading smmap-5.0.1-py3-none-any.whl (24 kB)
Installing collected packages: smmap, gitdb, gitpython
Successfully installed gitdb-4.0.11 gitpython-3.1.43 smmap-5.0.1


In [None]:
import pandas as pd
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import speech_recognition as sr
from pydub import AudioSegment
import spacy

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [None]:
# Task 1 - Speech to Text: convert the sample audio call, sample_customer_call.wav, to text and store the result in transcribed_text

# Define a recognizer object
recognizer = sr.Recognizer()

url = "/content/drive/MyDrive/data/sample_customer_call.wav"

# Convert the audio file to audio data
transcribe_audio_file = sr.AudioFile(url)
with transcribe_audio_file as source:
    transcribe_audio = recognizer.record(source)

# Convert the audio data to text
transcribed_text = recognizer.recognize_google(transcribe_audio)

# Review transcribed text
print("Transcribed text: ", transcribed_text)

Transcribed text:  hello I'm experiencing an issue with your product I'd like to speak to someone about a replacement


In [None]:
# Task 1 - Speech to Text: store the number of channels and the frame rate of the audio file

# Read number of channels and frame rate from the audio file
audio_segment = AudioSegment.from_file(url)
number_channels = audio_segment.channels
frame_rate = audio_segment.frame_rate

# Print number of channels and frame rate
print("Number of channels: ", number_channels)
print("Frame rate: ", frame_rate)

Number of channels:  1
Frame rate:  44100
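
The same two properties can be cross-checked without pydub. The sketch below uses Python's standard-library `wave` module, which reads uncompressed PCM WAV files only. It is an illustration rather than part of the original solution: the helper name `wav_stats` and the generated file `demo_call.wav` are hypothetical stand-ins, and the silent one-second clip is synthetic, not the real customer call.

```python
import os
import tempfile
import wave


def wav_stats(path):
    """Return (frame_rate, number_channels, sample_width_bytes) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate(),
            wav_file.getnchannels(),
            wav_file.getsampwidth(),
        )


# Hypothetical stand-in file: one second of 16-bit mono silence at 44.1 kHz
demo_path = os.path.join(tempfile.mkdtemp(), "demo_call.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)       # mono
    wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wav_file.setframerate(44100)   # samples per second
    wav_file.writeframes(b"\x00\x00" * 44100)

demo_frame_rate, demo_channels, demo_sample_width = wav_stats(demo_path)
print("Frame rate:", demo_frame_rate)
print("Number of channels:", demo_channels)
print("Sample width (bytes):", demo_sample_width)
```

Pointing `wav_stats` at the real file path should report the same channel count and frame rate as the pydub cell, provided the file is plain PCM.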


In [None]:
# Import customer call transcriptions data
df = pd.read_csv("/content/drive/MyDrive/data/customer_call_transcriptions.csv")
df.head()

Unnamed: 0,text,sentiment_label
0,how's it going Arthur I just placed an order w...,negative
1,yeah hello I'm just wondering if I can speak t...,neutral
2,hey I receive my order but it's the wrong size...,negative
3,hi David I just placed an order online and I w...,neutral
4,hey I bought something from your website the o...,negative


In [None]:
# Task 2 - Sentiment Analysis: use the vader module from the nltk library to predict the sentiment of each text in customer_call_transcriptions.csv from its compound score, and store the predictions in a new sentiment_predicted column
sid = SentimentIntensityAnalyzer()

# Analyze sentiment by evaluating compound score generated by Vader SentimentIntensityAnalyzer
def find_sentiment(text):
    scores = sid.polarity_scores(text)
    compound_score = scores['compound']

    if compound_score >= 0.05:
        return 'positive'
    elif compound_score <= -0.05:
        return 'negative'
    else:
        return 'neutral'

df['sentiment_predicted'] = df['text'].apply(find_sentiment)

In [None]:
# Task 2 - Sentiment Analysis: calculate number of texts with positive label that are correctly labeled as positive
true_positive = len(df.loc[(df['sentiment_predicted'] == df['sentiment_label']) &
                (df['sentiment_label'] == 'positive')])

print("True positives: ", true_positive)

True positives:  2
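
To make the thresholding and the true-positive count concrete, here is a minimal pure-Python sketch. The function name `label_from_compound` and the rows in `demo_rows` are hypothetical; the scores are made up for illustration and are not VADER output for the real transcriptions.

```python
def label_from_compound(compound_score, threshold=0.05):
    """Map a compound score to a label; both thresholds are inclusive."""
    if compound_score >= threshold:
        return "positive"
    if compound_score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical (compound score, true label) pairs, not taken from the dataset
demo_rows = [
    (0.62, "positive"),   # predicted positive, labeled positive: true positive
    (0.05, "positive"),   # exactly on the threshold still counts as positive
    (0.30, "neutral"),    # predicted positive, labeled neutral: false positive
    (-0.40, "negative"),  # predicted negative, labeled negative
    (0.00, "positive"),   # predicted neutral, labeled positive: missed
]

# A true positive needs both the true label and the prediction to be positive
demo_true_positive = sum(
    1
    for score, label in demo_rows
    if label == "positive" and label_from_compound(score) == "positive"
)
print("True positives:", demo_true_positive)
```

Only the first two rows qualify, because a true positive requires the label and the prediction to agree on the positive class.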


In [None]:
# Task 3 - Named Entity Recognition: find named entities for each text in the df object and store entities in a named_entities column

# Load spaCy small English Language model
nlp = spacy.load("en_core_web_sm")

# NER using spaCy
def extract_entities(text):
    doc = nlp(text)
    entities = [ent.text for ent in doc.ents]
    return entities

# Apply NER to the entire text column
df['named_entities'] = df['text'].apply(extract_entities)

# Flatten the list of named entities
all_entities = [ent for entities in df['named_entities'] for ent in entities]

# Create a DataFrame with the counts
entities_df = pd.DataFrame(all_entities, columns=['entity'])
entities_counts = entities_df['entity'].value_counts().reset_index()
entities_counts.columns = ['entity', 'count']

# Extract most frequent named entity
most_freq_ent = entities_counts["entity"].iloc[0]
print("Most frequent entity: ", most_freq_ent)


Most frequent entity:  yesterday
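
The flatten-and-count step can also be written with `collections.Counter` instead of an intermediate DataFrame. The sketch below runs on hypothetical entity lists (`demo_named_entities`), which stand in for the `named_entities` column and are not the real spaCy output.

```python
from collections import Counter

# Hypothetical per-call entity lists standing in for df['named_entities']
demo_named_entities = [
    ["yesterday", "Arthur"],
    [],
    ["David", "yesterday"],
    ["two days", "yesterday"],
]

# Flatten the nested lists and count each entity string
entity_counts = Counter(
    ent for entities in demo_named_entities for ent in entities
)

# most_common(1) returns a list holding one (entity, count) pair
demo_most_freq_ent, demo_count = entity_counts.most_common(1)[0]
print("Most frequent entity:", demo_most_freq_ent, "with count", demo_count)
```

Counting is case-sensitive in both approaches, so "Yesterday" and "yesterday" would be tallied separately. If date expressions should be excluded, filtering on spaCy's `ent.label_` before counting is one option.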


In [None]:
# Task 4 - Find most similar text: score each customer call against the query "wrong package delivery" using the spaCy small English language model, then pick the call with the highest similarity score

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Process the text column
df['processed_text'] = df['text'].apply(nlp)

# Input query
input_query = "wrong package delivery"
processed_query = nlp(input_query)

# Calculate similarity scores and sort dataframe with respect to similarity scores
df['similarity'] = df['processed_text'].apply(lambda text: processed_query.similarity(text))
df = df.sort_values(by='similarity', ascending=False)

# Find the most similar text
most_similar_text = df["text"].iloc[0]
print("Most similar text: ", most_similar_text)


Most similar text:  wrong package delivered
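
By default, spaCy's `Doc.similarity` returns the cosine similarity between two document vectors. The sketch below shows that ranking step in isolation, with one deliberate substitution: it uses simple word-count vectors instead of spaCy's vectors, so the scores will not match the notebook's. The helper names and the calls in `demo_calls` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[token] * vec_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical calls, not taken from the dataset
demo_calls = [
    "wrong package delivered",
    "I want to speak to someone about a replacement",
    "my package arrived late",
]

query_vec = bag_of_words("wrong package delivery")
scores = [cosine_similarity(query_vec, bag_of_words(call)) for call in demo_calls]

# Pick the call with the highest score
demo_most_similar = demo_calls[scores.index(max(scores))]
print("Most similar text:", demo_most_similar)
```

Word-count vectors only reward exact token overlap, so "delivery" and "delivered" do not match here. Note also that `en_core_web_sm` ships without static word vectors, and spaCy's documentation suggests a medium or large model when similarity scores matter.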


