# Multilingual Speech Recognition with Retrieval Augmented Generation Model

## Navigation

### 1. OpenAI Whisper
### 2. Audio Transcription
### 3. Non Parametric Memory
### 4. RAG Model
### 5. Gradio Interface
### 6. Model Evaluation

As suggested in the document, I have used OpenAI Whisper for multilingual speech recognition.


# 1. OpenAI Whisper

In [3]:
# load model with git
!pip install git+https://github.com/openai/whisper.git

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-udas7bll
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-udas7bll
  Resolved https://github.com/openai/whisper.git to commit 1cea4357687b676b293cb5473e1ade25f5b1cef7
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper==20231106)
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Building wheels for collected packages: openai-whisper
  Building wheel for openai-whisper (pyproject.toml) ... done
  Created wheel for openai-whisper: filename=openai_whisper-20231106-py3-none-a

Import Whisper into the runtime, then define a transcription function whose parameters (model type and audio path) have default string values. The *base* model is chosen for faster processing.

# 2. Audio Transcription

In [4]:
import whisper

def transcribe(model_type='base', input_path="/content/Football Commentary.mp3"):

  model = whisper.load_model(model_type)

  # load audio and pad/trim it to fit 30 seconds
  audio = whisper.load_audio(input_path)
  audio = whisper.pad_or_trim(audio)

  # make log-Mel spectrogram and move to the same device as the model
  mel = whisper.log_mel_spectrogram(audio).to(model.device)

  # detect the spoken language
  _, probs = model.detect_language(mel)
  print(f"Detected language: {max(probs, key=probs.get)}")

  # decode the audio
  options = whisper.DecodingOptions()
  result = whisper.decode(model, mel, options)

  # return the recognized text
  return result.text

In [5]:
# run transcribe on audio
transcription = transcribe()
transcription

100%|████████████████████████████████████████| 139M/139M [00:01<00:00, 112MiB/s]


Detected language: en


"from Rosario Argentina on behalf of every little boy wearing his shirt. Messi on a million backs. Messi for a million flashbolts. One kick of the ball. One kick of the football. He's done it before. He's done it many times before in the new camp around Spain around Europe. He's done it brilliantly for his nation in this competition. He must do it now. Messi must."

Because the RAG model cannot perform its tasks in multiple languages, we use the ASR model above to handle multilingual speech.

In [6]:
!git lfs install
!git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
!git clone https://huggingface.co/google/flan-t5-base

Git LFS initialized.
Cloning into 'all-MiniLM-L6-v2'...
remote: Enumerating objects: 46, done.
remote: Counting objects: 100% (46/46), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 46 (delta 14), reused 46 (delta 14), pack-reused 0
Unpacking objects: 100% (46/46), 311.32 KiB | 2.96 MiB/s, done.
Filtering content: 100% (3/3), 260.15 MiB | 48.68 MiB/s, done.
Cloning into 'flan-t5-base'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 58 (delta 0), reused 0 (delta 0), pack-reused 55
Unpacking objects: 100% (58/58), 621.31 KiB | 3.39 MiB/s, done.
Filtering content: 100% (5/5), 3.87 GiB | 51.12 MiB/s, done.


In [7]:
!pip install -q langchain
!pip install -q torch
!pip install -q transformers
!pip install -q faiss-cpu
!pip install -q pypdf
!pip install -q sentence-transformers


# 3. Non Parametric Memory

The retriever returns the documents with the highest similarity scores for a query. Below, I use a research paper on arthritis to demonstrate.
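To make the idea of similarity-scored retrieval concrete, here is a minimal sketch that ranks toy vectors by cosine similarity. The function names (`cosine_similarity`, `top_k`) and the three-dimensional vectors are hypothetical and only for illustration. Real sentence embeddings have hundreds of dimensions, and the vector store used in this notebook may rank by a different distance measure than cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy embeddings standing in for real document vectors.
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.05, 0.0]

print(top_k(toy_query, toy_docs))  # indices of the two closest documents
```

The third toy document is orthogonal to the query, so it is never returned for small `k`. This is the same behavior that keeps unrelated chunks out of the retrieved context.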

In [8]:
from langchain.document_loaders import PyPDFLoader
pdfLoader = PyPDFLoader("/content/paper1.pdf")
documents = pdfLoader.load()

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(documents)
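As a rough illustration of what `chunk_size=1000` and `chunk_overlap=150` mean, the sketch below cuts a string into fixed character windows that share 150 characters with their neighbor. This is a simplification and not the library's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so its chunks will differ. The helper `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Hypothetical 2500-character sample text.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.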

In [10]:
from langchain.embeddings import HuggingFaceEmbeddings
modelPath = "/content/all-MiniLM-L6-v2"
model_kwargs = {'device':'cpu'}
encode_kwargs = {'normalize_embeddings':False}
embeddings = HuggingFaceEmbeddings(
  model_name = modelPath,
  model_kwargs = model_kwargs,
  encode_kwargs=encode_kwargs
)

In [11]:
from langchain.vectorstores import FAISS
db = FAISS.from_documents(docs, embeddings)
# default query
question = "Explain Arthritis?"
searchDocs = db.similarity_search(question)
print(searchDocs[0].page_content)

treatment  propositions  and appraise  their potential  beneﬁts.
©2022 The Author(s).  Published  by Elsevier  Masson  SAS. This is an open access  article  under the CC 
BY-NC-ND  license  (http://creativecommons.org/licenses/by-nc-nd/4.0/ ).
1. Introduction
Arthritis  is a term which  is used for various  inﬂammatory  con-
ditions  that affect different  parts of the body such as joints,  bones,  
and muscles.  It can be of several  types such as Osteoarthritis  
(OA), Rheumatoid  Arthritis  (RA), juvenile  Arthritis,  psoriatic  arthri-
tis, and gouty  Arthritis,  which  can result  in stiffness,  pain, redness  
and swelling  in the joints [47]. According  to [5], it has been re-
vealed  that about  3.6 million  (15%) of people  are affected  from 
arthritis  which  includes  17.9% females  and 12.1% males.  Moreover,  
62% of patients  affected  from arthritis  had Osteoarthritis,  12.7% 
had rheumatoid  arthritis,  and 32.1% had suffered  from an unspeci-


# 4. RAG Model

In [12]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM,pipeline
from langchain import HuggingFacePipeline

tokenizer = AutoTokenizer.from_pretrained("/content/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("/content/flan-t5-base", max_length=1000)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
llm = HuggingFacePipeline(
    pipeline = pipe,
    model_kwargs={"temperature": 1, "max_length": 512},
)

In [13]:
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
  llm=llm,
  chain_type="stuff",
  retriever=db.as_retriever()
)
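The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below imitates that idea in plain Python. The prompt wording and the helper names (`build_stuff_prompt`, `rough_word_count`) are assumptions for illustration, not LangChain's actual template or API.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def rough_word_count(text):
    """Whitespace word count; a real tokenizer usually produces more tokens than this."""
    return len(text.split())

# Hypothetical chunks standing in for the retriever's results.
toy_chunks = [
    "Arthritis affects joints, bones, and muscles.",
    "Osteoarthritis is one type of arthritis.",
]
prompt = build_stuff_prompt("Explain Arthritis?", toy_chunks)
print(prompt)
print(rough_word_count(prompt))
```

Because every chunk is included verbatim, several chunks of up to 1000 characters plus a long query can exceed a model's input limit, so it is worth checking the prompt length before calling the model.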

In [14]:
print(f"llm type: {type(llm)}")
print(f"db type: {type(db)}")
print(f"question type: {type(question)}")

llm type: <class 'langchain.llms.huggingface_pipeline.HuggingFacePipeline'>
db type: <class 'langchain.vectorstores.faiss.FAISS'>
question type: <class 'str'>


In [15]:
queries = """Arthritis  is a term which  is used for various  inﬂammatory  con-
ditions  that affect different  parts of the body such as joints,  bones,
and muscles.  It can be of several  types such as Osteoarthritis
(OA), Rheumatoid  Arthritis  (RA), juvenile  Arthritis,  psoriatic  arthri-
tis, and gouty  Arthritis,  which  can result  in stiffness,  pain, redness
and swelling  in the joints [47]. According  to [5], it has been re-
vealed  that about  3.6 million  (15%) of people  are affected  from
arthritis  which  includes  17.9% females  and 12.1% males.  Moreover,
62% of patients  affected  from arthritis  had Osteoarthritis,  12.7%
had rheumatoid  arthritis,"""

try:
    output = qa_chain(queries)
    print(output["result"])
except Exception as e:
    print(f"Error: {e}")

Token indices sequence length is longer than the specified maximum sequence length for this model (1238 > 512). Running this sequence through the model will result in indexing errors


Arthritis and its types Arthritis is a degenerative disorder associated with human joints that can result in disability. There are numerous types of arthritis such as rheumatoid arthritis, Osteoarthritis, Juvenile Arthritis, Psoriatic arthritis and gouty Arthritis, which can result in stiffness, pain, redness and swelling in the joints [47].


In [16]:
output['result']
# summary

'Arthritis and its types Arthritis is a degenerative disorder associated with human joints that can result in disability. There are numerous types of arthritis such as rheumatoid arthritis, Osteoarthritis, Juvenile Arthritis, Psoriatic arthritis and gouty Arthritis, which can result in stiffness, pain, redness and swelling in the joints [47].'

# 5. Gradio Interface

In [1]:
!pip install -q gradio


In [46]:
import gradio as gr
import numpy as np

def rag_tasks(text, operation, language):
    if operation == "Summarize":
      result = qa_chain("Summarize the text: "+ text)
      return result['result']
    else:
      result = qa_chain("Translate to "+ language + ":" + text)
      return result['result']

demo = gr.Interface(
    rag_tasks,
    [
        "text",
        gr.Radio(["Summarize", "Translate"]), gr.Radio(["French", "Spanish", "German"])
    ],
    "text",
    live=False,
)

demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://e9831f4c8fc100618d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [38]:
qa_chain("""Arthritis  is a term which  is used for various  inﬂammatory  con-
ditions  that affect different  parts of the body such as joints,  bones,
and muscles.  It can be of several  types such as Osteoarthritis
(OA), Rheumatoid  Arthritis  (RA), juvenile  Arthritis,  psoriatic  arthri-
tis, and gouty  Arthritis,  which  can result  in stiffness,  pain, redness
and swelling  in the joints [47]. According  to [5], it has been re-
vealed  that about  3.6 million  (15%) of people  are affected  from
arthritis  which  includes  17.9% females  and 12.1% males.  Moreover,
62% of patients  affected  from arthritis  had Osteoarthritis,  12.7%
had rheumatoid  arthritis""")['result']

'Arthritis and its types Arthritis is a degenerative disorder associated with human joints that can result in disability. There are numerous types of arthritis such as rheumatoid arthritis, Osteoarthritis, Juvenile Arthritis, Psoriatic arthritis and gouty Arthritis, which can result in stiffness, pain, redness and swelling in the joints [47].'

# 6. Model Evaluation

In [18]:
!pip install datasets evaluate rouge_score

Collecting datasets
  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)
Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.5-py3-none-any.whl (7.8 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)

In [19]:
from datasets import load_dataset

multi_news = load_dataset("multi_news")

Downloading builder script:   0%|          | 0.00/3.83k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/2.82k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/58.8M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/66.9M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/7.30M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/69.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/7.31M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/44972 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5622 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5622 [00:00<?, ? examples/s]

In [20]:
multi_news["train"][0]

{'document': 'National Archives \n \n Yes, it’s that time again, folks. It’s the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. \n \n A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists polled by Dow Jones Newswires, compared to 227,000 jobs added in February. The unemployment rate is expected to hold steady at 8.3%. \n \n Here at MarketBeat HQ, we’ll be offering color commentary before and after the data crosses the wires. Feel free to weigh-in yourself, via the comments section. And while you’re here, why don’t you sign up to follow us on Twitter. \n \n Enjoy the show. ||||| Employers pulled back sharply on hiring last month, a reminder that the U.S. economy 

In [21]:
references = []

for i in range(10):
  references.append(multi_news['train'][i]['summary'])

In [22]:
candidates = []

for i in range(10):
  flag = qa_chain(multi_news['train'][i]['document'])
  candidates.append(flag['result'])

In [23]:
references

['– The unemployment rate dropped to 8.2% last month, but the economy only added 120,000 jobs, when 203,000 new jobs had been predicted, according to today\'s jobs report. Reaction on the Wall Street Journal\'s MarketBeat Blog was swift: "Woah!!! Bad number." The unemployment rate, however, is better news; it had been expected to hold steady at 8.3%. But the AP notes that the dip is mostly due to more Americans giving up on seeking employment.',
 '– Shelly Sterling plans "eventually" to divorce her estranged husband Donald, she tells Barbara Walters at ABC News. As for her stake in the Los Angeles Clippers, she plans to keep it, the AP notes. Sterling says she would "absolutely" fight any NBA decision to force her to sell the team. The team is her "legacy" to her family, she says. "To be honest with you, I\'m wondering if a wife of one of the owners … said those racial slurs, would they oust the husband? Or would they leave the husband in?"',
 '– A twin-engine Embraer jet that the FAA 

In [24]:
candidates

['– The unemployment rate fell sharply last month, but the economy added 120,000 jobs in March, down from more than 200,000 in each of the previous three months. The unemployment rate dropped to 8.2%, the lowest since January 2009. The official unemployment tally only includes those seeking work. The economy has added 858,000 jobs since December, the best four months of hiring in two years. But Federal Reserve Chairman Ben Bernanke has cautioned that the current hiring pace is unlikely to continue without more consumer spending.',
 '– Shelly Sterling is a co-owner of the Los Angeles Clippers, and she\'s a fan of the team. "I\'ve been with the team for 33 years, through the good times and the bad times," she said. "I\'ve been with the team for 33 years, through the good times and the bad times." She also says she\'s "eventually" going to divorce her husband, and that she\'s "eventually" going to divorce him. "I\'m wondering if a wife of one of the owners, and there\'s 30 owners, did som

In [25]:
import evaluate

# Load the ROUGE metric
rouge = evaluate.load('rouge')

results = rouge.compute(predictions=candidates, references=references)
print(results)

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

{'rouge1': 0.33230180790739217, 'rouge2': 0.11183092258120962, 'rougeL': 0.20284902897629678, 'rougeLsum': 0.20316932648590144}
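To give a feel for what the `rouge1` number measures, here is a simplified sketch of a unigram-overlap F1 score. It assumes lowercase whitespace tokenization, which is not what the `rouge` metric does internally (the library applies its own tokenization and optional stemming), so its values will not match the output above. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between a predicted and a reference summary."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the economy added jobs", "the economy only added 120,000 jobs")
print(score)
```

In this toy pair every predicted word appears in the reference (precision 1.0), but the prediction covers only four of the six reference words, which pulls the F1 score down.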


In [26]:
books = load_dataset("opus_books", "en-fr")

Downloading builder script:   0%|          | 0.00/6.08k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/161k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/20.5k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/12.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/127085 [00:00<?, ? examples/s]

In [27]:
books['train'][0]

{'id': '0', 'translation': {'en': 'The Wanderer', 'fr': 'Le grand Meaulnes'}}

In [28]:
references2 = []

for i in range(10):
  references2.append(books['train'][i]['translation']['fr'])

In [29]:
candidates2 = []

for i in range(10):
  flag = qa_chain(books['train'][i]['translation']['en'])
  candidates2.append(flag['result'])

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    re

In [30]:
# Load the BLEU evaluation metric
bleu = evaluate.load("bleu")

# Compute the BLEU score
results = bleu.compute(predictions=candidates2, references=references2)

# Print the results
print(results)

Downloading builder script:   0%|          | 0.00/5.94k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.55k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.34k [00:00<?, ?B/s]

{'bleu': 0.0, 'precisions': [0.10869565217391304, 0.08333333333333333, 0.037037037037037035, 0.0], 'brevity_penalty': 0.30915483498901647, 'length_ratio': 0.46, 'translation_length': 46, 'reference_length': 100}
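The result above can be unpacked with the standard BLEU formulation: the score is the brevity penalty times the geometric mean of the n-gram precisions. The sketch below recomputes both parts from the reported lengths and precisions. It assumes no smoothing, and the helper names (`brevity_penalty`, `bleu_from_precisions`) are hypothetical rather than part of the `evaluate` library.

```python
import math

def brevity_penalty(translation_length, reference_length):
    """Penalize candidates that are shorter than the references."""
    if translation_length == 0:
        return 0.0
    if translation_length > reference_length:
        return 1.0
    return math.exp(1 - reference_length / translation_length)

def bleu_from_precisions(precisions, bp):
    """Brevity penalty times the geometric mean of the n-gram precisions (no smoothing)."""
    if min(precisions) == 0:
        return 0.0  # a single zero precision makes the geometric mean zero
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return bp * math.exp(log_mean)

# Values copied from the result printed above.
precisions = [0.10869565217391304, 0.08333333333333333, 0.037037037037037035, 0.0]
bp = brevity_penalty(translation_length=46, reference_length=100)

print(bp)
print(bleu_from_precisions(precisions, bp))
```

Two things stand out: the candidates are less than half as long as the references (46 versus 100 tokens), and no 4-gram matches at all, which alone is enough to force the overall score to 0.0.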
