### Use Wisper model to Analyze the Audio wav files and convert them into Text

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

#### Conversational Loader 

In [6]:
import os
import time
## load Wisper API 
import json
import pandas as pd 
import whisper
from whisper.audio import SAMPLE_RATE
import torch 
from collections import OrderedDict
audio_path = '../data/Audio/'
print("Defaul sample rate selected :",SAMPLE_RATE)

Defaul sample rate selected : 16000


In [7]:
excel_path = "/mnt/e/Personal/Samarth/repository/NLP-Basic2Advanced/DMAC_project/data/Data_Mortgage.xlsx"
excel_data = pd.read_excel(excel_path)
excel_data.dropna(subset='Opportunity',inplace=True)
excel_data.head()

Unnamed: 0,Date,Company / Account,Opportunity,Unnamed: 3,Lead,Assigned,Priority,Status,Task,Ameyo Recording URL,Call Type,CallDurationInSeconds
1,4/4/2023,Mohammed Jaffer,Mohammed Jaffer,Mohammed Jaffer,,Yaseen Syed Ali,Low,Completed,True,https://prypto-api.aswat.co/surveillance/recor...,Outbound,422
2,4/4/2023,G Abbas,G Abbas,G Abbas,,Yaseen Syed Ali,Low,Completed,True,https://prypto-api.aswat.co/surveillance/recor...,Outbound,237
6,4/4/2023,Ahsan Khan,Ahsan Khan,Ahsan Khan,,Yaseen Syed Ali,Low,Completed,True,https://prypto-api.aswat.co/surveillance/recor...,Outbound,74
11,4/5/2023,Fayiqa Iftikhar,Fayiqa Iftikhar,Fayiqa Iftikhar,,Yaseen Syed Ali,Low,Completed,True,https://prypto-api.aswat.co/surveillance/recor...,Outbound,481
13,4/5/2023,Smith Suresh Shetty,Smith Suresh Shetty,Smith Suresh Shetty,,Yaseen Syed Ali,Low,Completed,True,https://prypto-api.aswat.co/surveillance/recor...,Outbound,269


In [8]:
def download_recording(data):
    url_link = data['Ameyo Recording URL']
    account_name = data['Company / Account']
    assigned_name = data['Assigned']
    os.system('wget {} {}'.format(url_link,'audio_{}_{}.mp3'.format(account_name,assigned_name)))
    return 
# excel_data.apply(lambda x: download_recording(x),axis=1)

In [9]:
# Loading the Whisper model into Device 
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

whisper_model_name = 'small' # hyperparameter -- Improving this will improve info 
stime = time.time()
wisper_model = whisper.load_model(whisper_model_name,device=device)
# emotion_classifier = pipeline("text-classification",model='bhadresh-savani/distilbert-base-uncased-emotion')
print("Models loaded into memory -- {:.2f}sec".format(time.time()-stime))

Models loaded into memory -- 5.53sec


In [10]:
data_path = '../data/Audio'
audio_files = [fname for fname in os.listdir(data_path) if fname.endswith(('mp3','wav'))]
print("Total number of audio files found : ",len(audio_files))

Total number of audio files found :  309


### Conversion from Speech to text with traslation and emotion Detection

The below functions load the audio file with Sampling rate of 16000 hz/s. We clip the audio file to 30 seconds to get the language of the conversation. If the language is not EN we translate and save the info in dict file else we simply translate . 

In [11]:
conversation_dict = OrderedDict()

#### Get Sentiment analysis From VADER

This will be saved as metadata for analyzing the sentiment of the person talking

In [12]:
# Sentiment Analyzer 
import nltk 
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sia=SentimentIntensityAnalyzer()

def get_sentiment(text_result):
    emotion = None 
    if sia.polarity_scores(text_result)['compound']>0:
        emotion = 'Positive'
    elif sia.polarity_scores(text_result)['compound']<0:
        emotion = 'Negative'
    else :
        emotion = 'Neutral '
    return emotion

In [None]:
def get_language(audio,duration=30):
    clip_audio = whisper.pad_or_trim(audio,length=SAMPLE_RATE*duration)
    result = wisper_model.transcribe(clip_audio)
    return result['language']

def speech2text(audio_file,meta_info=[]):
    audio = whisper.load_audio(audio_file)
    audio_language = get_language(audio)
    if meta_info is None:
        meta_info = ['','']
    if audio_language =='en':
        result = wisper_model.transcribe(audio)
    else:
        print("Spoken Language is not english.  Translating {} to english".format(audio_language))
        # result = wisper_model.transcribe(audio, task='translate')
        return 
    
    result = wisper_model.transcribe(audio_file)

    return {
            'filename':os.path.basename(audio_file),
            'content':{
                'text':result['text'],
                'language':audio_language,
                'Customer':meta_info[0],
                'Assigned': meta_info[1],
                'properties':{
                    'segments':len(result['segments']),
                    'audio_duration':audio.shape[0]/SAMPLE_RATE,
                    }
                }
            }

In [None]:
from tqdm import tqdm
conversation_dict['Data'] = list()
for fname in tqdm(audio_files):
    file_path = os.path.join(data_path,fname)
    meta_information = excel_data[['Opportunity','Assigned']][excel_data['Ameyo Recording URL'].str.contains(fname)].values.tolist()[0]
    conversation_dict['Data'].append(speech2text(audio_file=file_path,meta_info=meta_information))
    break

# with open("../data/Audio_data.json", "w") as outfile:
#     json.dump(conversation_dict, outfile,indent=4)

In [15]:
with open("../data/Audio_data.json", "w") as outfile:
    json.dump(conversation_dict, outfile,indent=4)

### Asking Chat gpt to make some conversations 

#### Load the Conversations as Json Loader 

In [8]:

# with open("../data/Audio_data.json" ,'r') as fd:
#     data = json.load(fd)

# for content in data['Data']:
#     content['content']['text'] = "{} is assigned to customer {}. Conversation:{}".format(content['content']['Assigned'],content['content']['Customer'],content['content']['text'])

# with open("../data/Audio_data_modified.json", "w") as outfile:
#     json.dump(data, outfile,indent=4)

In [24]:
from langchain.document_loaders import JSONLoader
from langchain.text_splitter import CharacterTextSplitter

def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["language"] = record.get("language")
    metadata["audio_properties"] = record.get("properties")
    metadata["Customer"] = record.get("Customer")
    metadata["Assigned"] = record.get("Assigned")
    return metadata

loader = JSONLoader(
    file_path='../data/Audio_data_modified.json',
    jq_schema='.Data[].content',
    content_key="text",
    metadata_func=metadata_func
)

In [25]:
conversation = loader.load()
conversation[:4]

[Document(page_content="Juraira Manzoor is assigned to customer Dina Mariam. Conversation: Hello. Hello, good afternoon. Yes. Am I speaking to Ms. Deena? Deena Mariam. Yes, correct. I am Juradu calling you from Fripco Services. How are you doing today? So, what are you doing? Fripco Services. Regarding your enquiry for the mod gauge, I believe you are looking to buy one studio or one bedroom apartment. And you are... Yes, correct. Yeah, I have received an enquiry from AX properties from Ms. Jamal. Yes, yes. Mr. Jamal, sorry. Yes. Yeah, so Ms. Deena, I have to ask you a few questions so that I can have your profile and we can take your case forward. Yeah, you are a resident of Dubai, right? Yes. May I know how old are you? 25. 25. Yes. Okay. All right. Are you salaried or self-employed? Both. Salaried and self-employed. So, but I believe it's easier to get a mortgage as salary. So, let's say salary. You can get it as a self-employed as well, but it depends upon what is the turnover of t

In [26]:
text_splitter = CharacterTextSplitter(chunk_size=500,chunk_overlap=0) # hyperparameters 
texts = text_splitter.split_documents(documents=conversation)  # metadatas=[doc.metadata for doc in docs]


In [27]:
print("Total texts documents ",len(texts))
print(type(texts),type(texts[0]))

Total texts documents  60
<class 'list'> <class 'langchain.schema.Document'>


### Conversation data into Embedddings 

In [28]:
from langchain.embeddings.base import Embeddings
from typing import List
from sentence_transformers import SentenceTransformer
from langchain.embeddings.openai import OpenAIEmbeddings

import openai
import os 
os.environ['OPENAI_API_KEY'] = "" 

class LocalHuggingFaceEmbeddings(Embeddings):
    def __init__(self, model_id): 
        # Should use the GPU by default
        self.model = SentenceTransformer(model_id)
        
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents using a locally running
           Hugging Face Sentence Transformer model
        Args:
            texts: The list of texts to embed.
        Returns:
            List of embeddings, one for each text.
        """
        embeddings =self.model.encode(texts)
        return embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed a query using a locally running HF 
        Sentence trnsformer. 
        Args:
            text: The text to embed.
        Returns:
            Embeddings for the text.
        """
        embedding = self.model.encode(text)
        return list(map(float, embedding))

local_embeddings = LocalHuggingFaceEmbeddings('multi-qa-mpnet-base-dot-v1')
open_ai_embeddings = OpenAIEmbeddings()

In [29]:
from langchain.vectorstores import FAISS
vectorstore =  FAISS.from_documents(documents=texts,
                                    embedding = open_ai_embeddings
                                    ) # turn dcos into Vectors and store them in RAM also add metadata 
vectorstore.save_local('../data/faiss_dmac_conv')

## Retrival QA
Better version of Query 

###  underneath the hood 
- Underneath the hood is taking our query that we're going which is send to embed.
- Its turned into a vector, send it into device vector store.
- Then it's going to find similar vectors to that vector and it's going to bring those vectors back.
- Translate them into the text and we're going to send this as the context exactly like the vector chain, but with a bit of different parameters, but nothing very different here.

In [30]:
from langchain.chains import RetrievalQA ,VectorDBQAWithSourcesChain ,RetrievalQAWithSourcesChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

new_vector_store = FAISS.load_local("../data/faiss_dmac_conv/",embeddings=open_ai_embeddings)

### QA Retrival Chain to Answer specific questions 

In [31]:
prompt_template = """
Analyze conversations between customer and sales person. 
Use the context to answer.If question is related to customers , answer in points with names of the customers.
If you don't know the answer, just say No idea.

{context}

Question: {question}
Answer in Points: 
"""

prompt = PromptTemplate(input_variables=["context","question"],template=prompt_template)

In [32]:
chain_type_kwargs = {"prompt": prompt}

qa = RetrievalQA.from_chain_type(
                        llm=OpenAI(temperature=0),
                        chain_type="stuff",
                        retriever = new_vector_store.as_retriever(),
                        chain_type_kwargs=chain_type_kwargs
                        )

In [44]:
query = "What product Faraht Khan is interested in ?"
res = qa.run(query)
print("query:{}".format(query))
print("response:\n{}".format(res))


query:What product Faraht Khan is interested in ?
response:
Farhat Khan is interested in the Golden Giza product. He needs to have a property value of two million or more and must be fully paid off. He also needs to have the property be at least 50% complete if it is off plan.


In [45]:
query = "What is next action with Faraht Khan ?"
res = qa.run(query)
print("query:{}".format(query))
print("response:\n{}".format(res))

query:What is next action with Faraht Khan ?
response:
1. Confirm that Faraht Khan is eligible for the Golden Giza. 
2. Ask Faraht Khan to provide documents via email. 
3. Ask Faraht Khan to pay an initial payment of 2,000. 
4. Ask Faraht Khan to be in Dubai for two weeks for medical checkup and biometric.


In [37]:
query = "Customers whos' unit price is more than 1.6 million "
res = qa.run(query)
print("query:{}".format(query))
print("response:\n{}".format(res))


query:Customers whos' unit price is more than 1.6 million 
response:
- Sakar Bhasin: Property cost two million dirhams 
- Rimantas Macevicius: Property cost two million euros 
- Iness Ouichaoui: Original mortgage value was 750,000 
- Farhat Khan: Property bill needs to be two million and above


In [42]:
query = "Summarize conversations with Farhat Khan"
res = qa.run(query)
print("query:{}".format(query))
print("response:\n{}".format(res))

query:Summarize conversations with Farhat Khan 
response:
- Farhat Khan is looking for Golden Giza services from Pripko Moagy Services. 
- The criteria for eligibility is that the property value should be two million or above and it should be fully paid. 
- If the property is off plan, it should be at least 50% ready. 
- The cost of the services is around 16,500, with a government fee of 8,000 to 10,000. 
- The initial payment is 2,000 and Farhat Khan needs to be in Dubai for two weeks for the medical checkup and biometric.


In [43]:
query = "For Faraht khan any follow up leads"
res = qa.run(query)
print("query:{}".format(query))
print("response:\n{}".format(res))


query:For Faraht khan any follow up leads
response:
- Follow up leads for Faraht Khan include: 
- Contacting Michael Lee from the Visa Department for more information on government fees 
- Initial payment of 2,000 
- Emailing documents to TLD department 
- Being in Dubai for two weeks for medical checkup and biometric
