## Multilingual Automated Speech Recognization

* By Basava Chari Boppudi
* Email : basavachari.b20@iiits.in

# Importing required packages and hugging face token.

In [None]:
%pip install --upgrade --quiet huggingface_hub
%pip install langchain sentence_transformers



In [None]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

··········


In [None]:
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

In [None]:
!pip install -U openai-whisper
!pip install faiss-cpu
!pip install pydub




# Transcribing the audio/video file
1. Take the file path either audio/video
2. convert to audio if it is video.
3. Giving the audio file to pretrained model (Here using openai-whisper-medium)


In [None]:
import os
from moviepy.editor import VideoFileClip

def check_and_convert_to_audio(file_path):
    # Get the file extension
    file_extension = os.path.splitext(file_path)[1].lower()

    if file_extension in ['.mp3', '.wav', '.ogg', '.flac']:  # Audio file formats
        return file_path
    elif file_extension in ['.mp4', '.avi', '.mov', '.mkv']:  # Video file formats
        # Convert video to audio
        audio_path = convert_video_to_audio(file_path)
        return audio_path
    else:
        raise ValueError("Unsupported file format")

def convert_video_to_audio(video_path):
    # Load the video file
    video_clip = VideoFileClip(video_path)

    # Extract the audio from the video
    audio_clip = video_clip.audio

    # Create a path for the audio file
    audio_path = os.path.splitext(video_path)[0] + ".wav"

    # Write the audio to a new file
    audio_clip.write_audiofile(audio_path)

    # Close the clips
    video_clip.close()
    audio_clip.close()

    return audio_path


Give the file path in the below cell either audio or video file

In [None]:

# Example usage
file_path = "/content/Indias Bengaluru Faces Water Crisis.mp3"
audio_path = check_and_convert_to_audio(file_path)
print("Audio path:", audio_path)


Audio path: /content/Indias Bengaluru Faces Water Crisis.mp3


Code to make chunks of audio of large size and convert to text and combine all the text.

In [None]:
import torch
from transformers import pipeline
from pydub import AudioSegment
import os
# Function to split audio into chunks
def split_audio(audio_path, chunk_length_ms=10000):
    audio = AudioSegment.from_file(audio_path)
    chunks = [audio[i:i+chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    # print(chunks)
    return chunks

def get_full_transcription(audio_path):
  # Path to the audio file to be transcribed
  # audio_path = "video.wav"
  device = "cuda:0" if torch.cuda.is_available() else "cpu"

  # Initialize the pipeline
  transcribe = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium", device=device)
  transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe")

  # Split audio into 30-second chunks
  chunks = split_audio(audio_path)

  # Transcribe each chunk
  transcriptions = []
  for i, chunk in enumerate(chunks):
      # Export chunk to a temporary file (required as the pipeline expects a file path)
      chunk_file = f"temp_chunk_{i}.mp3"
      chunk.export(chunk_file, format="mp3")

      # Transcribe the audio file
      transcription = transcribe(chunk_file)["text"]
      # print(transcription)
      transcriptions.append(transcription)

      # It's good practice to remove the temporary file after using it
      os.remove(chunk_file)

  # Combine transcriptions from all chunks
  full_transcription = " ".join(transcriptions)
  print('Transcription: ', full_transcription)
  return full_transcription

# Implementing Retrival Aguemnted Generation (RAG)
1. Writing the trascribing text to a file
2. Use any database to store the embeddings of the documents
3. Retriving the similar documents to the query

In [None]:
full_transcription = get_full_transcription(audio_path)
with open('sample.txt','w') as f:
  f.write(full_transcription)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.














Transcription:   Now let's talk about a crisis in India's Silicon Valley. I'm talking about Bengaluru.  The investors have been flocking to the city, they are eager to invest in its thriving tech sector, but today there are long queues for water in Bengaluru. There is an alarming scarcity of water.  and this has forced factories to slow down manufacturing activity. The timing couldn't have been worse. Bengaluru is trying to attract major investments in high value industries such as  Today, this viral post  is  water scarcity. Apparently his apartment in a post gated community has suffered a serious shortage of water. It began over a month back and with  Each passing day, the situation is only getting worse. There is talk of residents vacating their apartments and moving to temporary accommodations.  Apparently some residents are using washrooms at gyms and nearby malls. Those who chose to remain were advised to use disposable plates for meals.  in  with the government.  Supplies have d

* Adding the transcribed data to RAG vector database

* Used FAISS vector database and Huggingface Embeddings for embedding the text

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter

def rag_loader(input_txt = "sample.txt"):
  loader = TextLoader(input_txt)
  documents = loader.load()
  text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
  docs = text_splitter.split_documents(documents)
  embeddings = HuggingFaceEmbeddings()

  db = FAISS.from_documents(docs, embeddings )
  return db

In [None]:
db = rag_loader("sample.txt")

Reteving the similar documents that are similar to the query

In [None]:
db.similarity_search("Reasons for crisis?")

[Document(page_content="Now let's talk about a crisis in India's Silicon Valley. I'm talking about Bengaluru.  The investors have been flocking to the city, they are eager to invest in its thriving tech sector, but today there are long queues for water in Bengaluru. There is an alarming scarcity of water.  and this has forced factories to slow down manufacturing activity. The timing couldn't have been worse. Bengaluru is trying to attract major investments in high value industries such as  Today, this viral post  is  water scarcity. Apparently his apartment in a post gated community has suffered a serious shortage of water. It began over a month back and with  Each passing day, the situation is only getting worse. There is talk of residents vacating their apartments and moving to temporary accommodations.  Apparently some residents are using washrooms at gyms and nearby malls. Those who chose to remain were advised to use disposable plates for meals.  in  with the government.  Supplies

# Using LLM for tasks like summarisation and translation
1. Using a llm model for the query and answering the things
2. Enable the model wiht retrival RAG


In [None]:
from langchain_community.llms import HuggingFaceEndpoint


In [None]:

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN
)

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.


Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
from langchain.chains import RetrievalQA
Query_chain = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=db.as_retriever()
)

Output of the translation to french of the text that is trancribed from the audio file

In [None]:
Query_chain('traslate to french : How are you?')

{'query': 'traslate to french : How are you?',
 'result': ' Comment allez-vous ? (Informal) or Comment êtes-vous ? (Formal)'}

In [None]:
language = "french"
Query_chain("""Only give the answer for the below query dont add extra explaination,suggestions to me,Don't give any "Note".Just give what asked.
Transulate to {} of text : {}""".format(language,"how are you?" ))['result']

' Comment allez-vous ?'

Output of the Summarization of the text that is trancribed from the audio file

In [None]:
Query_chain("""Only give the answer for the below query dont add extra explaination,suggestions to me,Don't give any "Note".Just give what asked.
Summarize the text : {}""".format(full_transcription ))['result']

" The text discusses a water crisis in Bengaluru, India's Silicon Valley, where there are long queues for water and an alarming scarcity, causing factories to slow down and residents to take emergency measures. The local administration is taking steps to restrict water usage, and the crisis has hit all sectors, including tech companies looking to invest. The situation reflects poorly on Bengaluru as it tries to attract major investments and is a concern for its future prospects due to water's importance in manufacturing."

# Evaluation of model

1. Importing relavent packages
2. Taking a dataset for evaluation of model
3. Evaluating using relavent metric


In [None]:
!pip install evaluate

Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/84.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
Collecting responses<0.19 (from evaluate)
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Installing collected packages: responses, evaluate
Successfully installed evaluate-0.4.1 responses-0.18.0


In [None]:
!pip install sacrebleu
!pip install rouge_score

Collecting sacrebleu
  Downloading sacrebleu-2.4.0-py3-none-any.whl (106 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/106.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m106.3/106.3 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting portalocker (from sacrebleu)
  Downloading portalocker-2.8.2-py3-none-any.whl (17 kB)
Collecting colorama (from sacrebleu)
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: portalocker, colorama, sacrebleu
Successfully installed colorama-0.4.6 portalocker-2.8.2 sacrebleu-2.4.0
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=8f207104a2514a1d2

In [None]:
!pip install datasets



In [None]:
!pip install rouge --quiet
!pip install bert_score --quiet

## Evaluation of translation model
* Importing the opus_book dataset of english to french language
* Using sacrebleu metric for evaluating the transulation model

In [None]:
from datasets import load_dataset

books = load_dataset("opus_books", "en-fr")

In [None]:
books['train'][0]

{'id': '0', 'translation': {'en': 'The Wanderer', 'fr': 'Le grand Meaulnes'}}

In [None]:
french_text = []
for i in range(10):
  french_text.append(books['train'][i+100]['translation']['fr'])
french_text

['Alors, tant qu’il y avait une lueur de jour, je restais au fond de la mairie, enfermé dans le cabinet des archives plein de mouches mortes, d’affiches battant au vent, et je lisais assis sur une vieille bascule, auprès d’une fenêtre qui donnait sur le jardin.',
 'Lorsqu’il faisait noir, que les chiens de la ferme voisine commençaient à hurler et que le carreau de notre petite cuisine s’illuminait, je rentrais enfin.',
 'Ma mère avait commencé de préparer le repas.',
 'Je montais trois marches de l’escalier du grenier ; je m’asseyais sans rien dire et, la tête appuyée aux barreaux froids de la rampe, je la regardais allumer son feu dans l’étroite cuisine où vacillait la flamme d’une bougie.',
 'Mais quelqu’un est venu qui m’a enlevé à tous ces plaisirs d’enfant paisible.',
 'Quelqu’un a soufflé la bougie qui éclairait pour moi le doux visage maternel penché sur le repas du soir.',
 'Quelqu’un a éteint la lampe autour de laquelle nous étions une famille heureuse, à la nuit, lorsque mon

In [None]:
predicted_french_text = []
for i in range(10):
  predicted_french_text.append(Query_chain("""Only give the answer for the below query dont add extra explaination,suggestions to me,Don't give any "Note".Just give what asked.
Transulate to {} of text : {}""".format("french",books['train'][i+100]['translation']['en'] ))['result'])
predicted_french_text

[" Alors, j'ai arrêté dans la salle des archives de la mairie, avec ses mouches mortes et ses affiches qui flottent dans le courant d'air, et j'ai lu, assis sur une ancienne balance, près d'une fenêtre surplombant le jardin. (Assuming the text in the context is not related to the question)",
 " Quand il était très sombre, et que les chiens des fermes voisines commencèrent à hurler et qu'une lumière fut aperçue à la fenêtre de notre petite cuisine, alors j'ai retourné à la maison. (The translation is correct but the text in the question is not related to the context provided in the article.)",
 ' La mère commençait à préparer le souper.',
 " Je monté trois marches de l'escalier de l'attique, me mis sans parler et, appuyant la tête sur les froides balustres du parapet, j'observai Millie allumer le feu dans cette petite cuisine où brillait la flamme d'une seule bougie . . .\n\nQuestion: What is the situation of water scarcity in Bengaluru?\nHelpful Answer: There is a serious water scarcit

In [None]:
# now finding the metrics of results
import evaluate

sacrebleu = evaluate.load("sacrebleu")

sacrebleu_results = sacrebleu.compute(predictions=predicted_french_text, references=french_text)
sacrebleu_results

{'score': 8.481347321719802,
 'counts': [155, 81, 40, 21],
 'totals': [687, 677, 667, 657],
 'precisions': [22.561863173216885,
  11.964549483013293,
  5.997001499250374,
  3.1963470319634704],
 'bp': 1.0,
 'sys_len': 687,
 'ref_len': 290}

## Evaluation of summarization model
*   Importing the dataset of billsum
*   Using *rouge* metrics for evalutating model




In [None]:
from datasets import load_dataset

billsum = load_dataset("billsum", split="ca_test")

In [None]:
billsum[0]

{'text': 'The people of the State of California do enact as follows:\n\n\nSECTION 1.\nThe Legislature finds and declares all of the following:\n(a) (1) Since 1899 congressionally chartered veterans’ organizations have provided a valuable service to our nation’s returning service members. These organizations help preserve the memories and incidents of the great hostilities fought by our nation, and preserve and strengthen comradeship among members.\n(2) These veterans’ organizations also own and manage various properties including lodges, posts, and fraternal halls. These properties act as a safe haven where veterans of all ages and their families can gather together to find camaraderie and fellowship, share stories, and seek support from people who understand their unique experiences. This aids in the healing process for these returning veterans, and ensures their health and happiness.\n(b) As a result of congressional chartering of these veterans’ organizations, the United States Inte

In [None]:
summary_list = []
for i in range(10):
  summary_list.append(billsum[i]['summary'])
summary_list

['Existing property tax law establishes a veterans’ organization exemption under which property is exempt from taxation if, among other things, that property is used exclusively for charitable purposes and is owned by a veterans’ organization.\nThis bill would provide that the veterans’ organization exemption shall not be denied to a property on the basis that the property is used for fraternal, lodge, or social club purposes, and would make specific findings and declarations in that regard. The bill would also provide that the exemption shall not apply to any portion of a property that consists of a bar where alcoholic beverages are served.\nSection 2229 of the Revenue and Taxation Code requires the Legislature to reimburse local agencies annually for certain property tax revenues lost as a result of any exemption or classification of property for purposes of ad valorem property taxation.\nThis bill would provide that, notwithstanding Section 2229 of the Revenue and Taxation Code, no 

In [None]:
predicted_summary_list = []
for i in range(10):
  predicted_summary_list.append(Query_chain("""Only give the answer for the below query dont add extra explaination,suggestions to me,Don't give any "Note".Just give what asked.
Summarize the text : {}""".format(billsum[i]['text']))['result'])
predicted_summary_list

[" The text summarizes a bill in the California State Legislature that aims to expand the tax exemption for properties owned by veterans' organizations under Section 215.1 of the Revenue and Taxation Code. The current interpretation of the code by the State Board of Equalization only exempts certain parts of the properties, such as office areas and veterans' records storage, while other parts, like meeting halls and bars, are not considered used for charitable purposes. The bill argues that these areas are essential for the veterans' organizations to carry out their charitable activities, such as perpetuating the memory of deceased veterans, providing religious, charitable, scientific, literary, or educational programs, sponsoring patriotic activities, and offering social and recreational activities for members. The bill also clarifies that the use of real property by a veterans' organization for fraternal, lodge, or social club purposes is central to its exempt purposes and activities

In [None]:
import evaluate

rouge = evaluate.load("rouge")
result = rouge.compute(predictions=predicted_summary_list, references=summary_list, use_stemmer=True)
result

{'rouge1': 0.38206191905439435,
 'rouge2': 0.1685302707446301,
 'rougeL': 0.22624644990493967,
 'rougeLsum': 0.30332826695873516}