<a href="https://colab.research.google.com/github/RahulKoppula/-MoM.summerization.model/blob/main/BDL_Summerization_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Cell 1: Install Required Python Packages

This cell installs all the necessary libraries for transcription and summarization.

In [1]:
!pip install --quiet openai-whisper torch transformers soundfile nltk
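Whisper decodes audio by calling the `ffmpeg` command-line tool, so `ffmpeg` must be on the `PATH` as well as the pip packages being installed. Colab images usually ship with it. A minimal sanity check might look like the sketch below. `check_environment` is a hypothetical helper, not part of any library, and its package list simply mirrors the pip command above.

```python
import shutil
from importlib import metadata

def check_environment(packages=("openai-whisper", "torch", "transformers", "soundfile", "nltk")):
    """Return installed versions (None if missing) and whether ffmpeg is on PATH.

    Hypothetical helper: the default package list mirrors the pip command above.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    # Whisper shells out to the ffmpeg binary to load audio files
    return {"packages": versions, "ffmpeg": shutil.which("ffmpeg") is not None}

report = check_environment()
for name, version in report["packages"].items():
    print(f"{name:15} {version or 'NOT INSTALLED'}")
print("ffmpeg on PATH:", report["ffmpeg"])
```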

Cell 2: Upload Your Audio File

This cell opens a file picker, uploads your audio file and stores its filename in `audio_path` for the later cells.

In [2]:
from google.colab import files
uploaded = files.upload()
# Get the filename (assuming only one file is uploaded)
audio_path = list(uploaded.keys())[0]
print(f"Uploaded file: {audio_path}")


Saving Harry_Truman_Announcing_Surrender_Of_Japan.ogg.mp3 to Harry_Truman_Announcing_Surrender_Of_Japan.ogg (2).mp3
Uploaded file: Harry_Truman_Announcing_Surrender_Of_Japan.ogg (2).mp3
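The cell assumes a single upload and takes the first key of `uploaded`. Colab appears to have appended ` (2)` to the saved name because a file with the same name already existed. If several files might be uploaded, a slightly more defensive selection could look like the sketch below. `pick_audio_file` and the extension set are assumptions and are not part of `google.colab`.

```python
from pathlib import Path

# Assumed set of extensions; ffmpeg (used by Whisper) can decode many more formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}

def pick_audio_file(filenames):
    """Return the first filename with a known audio extension (hypothetical helper)."""
    audio_files = [name for name in filenames if Path(name).suffix.lower() in AUDIO_EXTENSIONS]
    if not audio_files:
        raise ValueError(f"No audio file found among: {list(filenames)}")
    if len(audio_files) > 1:
        print(f"Several audio files uploaded; using {audio_files[0]!r}")
    return audio_files[0]

# In the notebook this would replace the first-key lookup:
#   audio_path = pick_audio_file(uploaded.keys())
print(pick_audio_file(["notes.txt", "Harry_Truman_Announcing_Surrender_Of_Japan.ogg (2).mp3"]))
```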


Cell 3: Transcribe Audio to Readable Text

This cell loads the Whisper `base` model, transcribes the uploaded audio and splits the transcript into sentences for readability.

In [3]:
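import os

import whisper
import nltk
from nltk.tokenize import sent_tokenize

# Sentence-tokenizer data: older NLTK releases use 'punkt', newer ones look up 'punkt_tab'
nltk.download('punkt')
nltk.download('punkt_tab')

# Load the Whisper model once and reuse it for every transcription
whisper_model = whisper.load_model("base")

def transcribe_audio(filename):
    """Transcribe an audio file with Whisper and return the plain text."""
    if not os.path.exists(filename):
        raise FileNotFoundError(f"{filename} does not exist.")
    result = whisper_model.transcribe(filename)
    return result['text']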

# Use audio_path from upload
transcribed_text = transcribe_audio(audio_path)

# Split into sentences for readability
sentences = sent_tokenize(transcribed_text)

print("Transcription (Formatted):\n")
for sentence in sentences:
    print(sentence.strip())



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Transcription (Formatted):

My fellow Americans, Supreme Allied Commander General McArthur and Allied representatives on the battleship Missouri in Tokyo Bay.
The thoughts and hopes of all America, indeed of all the civilized world, are centered tonight on the battleship Missouri.
There on that small piece of American soil, anchored in Tokyo Harbor, the Japanese have just officially laid down their arms.
They have signed terms of unconditional surrender.
Four years ago, the thoughts and fears of the whole civilized world were centered on another piece of American soil, Pearl Harbor.
The mighty threat to civilization which began there is now laid at rest.
It was a long road to Tokyo and a bloody one.
We shall not forget Pearl Harbor.
The Japanese militarists will not forget the USS Missouri.
The evil done by the Japanese warlords can never be repaired or forgotten.
But their power to destroy and kill has been taken from them.
Their armies and what is left of their navy are now impotent.
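`transcribe_audio` returns only `result['text']`. The dictionary returned by Whisper's `transcribe()` also holds `result['segments']`: a list of dicts with `start` and `end` times in seconds and the segment `text`. For meeting minutes, a timestamped transcript can be easier to navigate than sentence-split text. The sketch below is a minimal formatter that assumes that segment structure. The helper names are hypothetical and the example segments are made up.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS, or H:MM:SS for recordings longer than an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"

def format_segments(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into timestamped lines."""
    lines = []
    for seg in segments:
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)

# In the notebook: result = whisper_model.transcribe(audio_path)
#                  print(format_segments(result["segments"]))
example_segments = [
    {"start": 0.0, "end": 6.4, "text": " First segment."},
    {"start": 6.4, "end": 3725.2, "text": " Second segment."},
]
print(format_segments(example_segments))
```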

Cell 4: Generate a Concise Summary of the Transcription

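This cell summarizes the transcript with the pre-trained `facebook/bart-large-cnn` model. To handle long transcripts, it packs whole sentences into chunks of about 500 characters, summarizes each chunk separately and joins the partial summaries.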
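BART's input limit (1,024 tokens for `facebook/bart-large-cnn`) is measured in tokens, not characters. Character-based chunking is therefore only an approximation. An alternative is to pack sentences against a token budget, as in the sketch below.

- `chunk_by_tokens` is a hypothetical helper.
- The word-count default is a rough stand-in for real token counts. In the notebook, one could pass `count_tokens=lambda s: len(summarizer.tokenizer.encode(s))` instead.
- The 700-token default is an assumed margin below the model limit.

```python
def chunk_by_tokens(sentences, max_tokens=700, count_tokens=lambda s: len(s.split())):
    """Greedily pack consecutive sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own chunk; it would
    still need truncation before being sent to the model.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

example_sentences = ["One two three.", "Four five.", "Six seven eight nine."]
print(chunk_by_tokens(example_sentences, max_tokens=5))
```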

In [4]:

from transformers import pipeline
from nltk.tokenize import sent_tokenize
import textwrap

# Load the text summarization pipeline (BART model)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_text(text, max_chunk_length=500):
    """Summarize text by packing whole sentences into chunks of roughly
    max_chunk_length characters (a single longer sentence becomes its own
    chunk), summarizing each chunk with BART and joining the results."""
    sentences = sent_tokenize(text)
    current_chunk = ""
    chunks = []
    for sentence in sentences:
        if len(current_chunk) + len(sentence) + 1 <= max_chunk_length:
            current_chunk += sentence + " "
        else:
            # Skip the empty chunk produced when the first sentence alone is too long
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    if current_chunk:
        chunks.append(current_chunk.strip())

    summaries = []
    for chunk in chunks:
        try:
            summary = summarizer(chunk, max_length=100, min_length=25, do_sample=False)
            summaries.append(summary[0]['summary_text'])
        except Exception as e:
            # Report the failure and continue with the remaining chunks
            print(f"Error summarizing chunk: {e}")
    return " ".join(summaries)

# Summarize the transcribed text
concise_summary = summarize_text(transcribed_text)

# Split the summary into sentences for bullet listing
summary_sentences = sent_tokenize(concise_summary)

print("Summary (Bullet Points):\n")
for sentence in summary_sentences:
    # Wrap text for neat display, bullet at the start
    wrapped = textwrap.fill(sentence.strip(), width=100, initial_indent="• ", subsequent_indent="  ")
    print(wrapped)



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu
Your max_length is set to 100, but your input_length is only 80. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)
Your max_length is set to 100, but your input_length is only 99. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=49)
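Your max_length is set to 100, but your input_length is only 79. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=39)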

Summary (Bullet Points):

• The thoughts and hopes of all America, indeed of all the civilized world, are centered tonight on
  the battleship Missouri.
• There on that small piece of American soil, anchored in Tokyo Harbor, the Japanese have just
  officially laid down their arms.
• The evil done by the Japanese warlords can never be repaired or forgotten.
• Their power to destroy and kill has been taken from them.
• We shall not forget Pearl Harbor.
• "Their armies and what is left of their navy are now impotent," he said.
• "To all of us, there comes first the sense of gratitude to Almighty God" Our first thoughts, of
  course, thoughts of gracefulness and deep obligation go out to those of our loved ones who have
  been killed or mained in this terrible war.
• God granted in our pride this hour, we may not forget the hard tests that are still before us.
• No victory can make good their loss.
• On land and sea and in the air, American men and women have given their lives.
• No victo
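The warnings in this cell's output come from passing a fixed `max_length=100` to chunks whose tokenized length is below 100. One way to avoid most of them is to derive `max_length` and `min_length` from each chunk's token count. The sketch below does this with `summary_lengths`, a hypothetical helper.

- The half-length ratio follows the warning's own suggestion (80 tokens → `max_length=40`).
- The floor, cap and `min_length` fraction are assumptions.

```python
def summary_lengths(input_tokens, ratio=0.5, cap=100, floor=10, min_fraction=0.5):
    """Return (min_length, max_length) for a chunk of input_tokens tokens.

    max_length is about ratio * input length, kept within [floor, cap];
    min_length is a fraction of max_length, so it never exceeds it.
    """
    max_length = max(floor, min(cap, int(input_tokens * ratio)))
    min_length = max(1, int(max_length * min_fraction))
    return min_length, max_length

# Possible use inside summarize_text (an assumption, not in the original notebook):
#   n_tokens = len(summarizer.tokenizer.encode(chunk))
#   lo, hi = summary_lengths(n_tokens)
#   summary = summarizer(chunk, min_length=lo, max_length=hi, do_sample=False)
for n in (80, 99, 400):
    print(n, summary_lengths(n))
```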

Cell 5: Complete Pipeline – Transcribe and Summarize

This cell defines a function that transcribes and summarizes an audio file in a single call, then runs it on the uploaded file. It is intended as the entry point for a web-based UI.

In [5]:
def transcribe_and_summarize(audio_file):
    """Transcribe an audio file and summarize the transcript in one step."""
    transcription = transcribe_audio(audio_file)
    summary = summarize_text(transcription)
    return {"transcription": transcription, "summary": summary}

# Usage with the uploaded file
result = transcribe_and_summarize(audio_path)
print("\n--- Full Transcription ---\n")
print(result['transcription'])

print("\n--- Summary ---\n")
print(result['summary'])


Your max_length is set to 100, but your input_length is only 80. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)
Your max_length is set to 100, but your input_length is only 99. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=49)
Your max_length is set to 100, but your input_length is only 79. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=39)
Your max_length is set to 100, but your input_length is only 85. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=42)
Your


--- Full Transcription ---

 My fellow Americans, Supreme Allied Commander General McArthur and Allied representatives on the battleship Missouri in Tokyo Bay. The thoughts and hopes of all America, indeed of all the civilized world, are centered tonight on the battleship Missouri. There on that small piece of American soil, anchored in Tokyo Harbor, the Japanese have just officially laid down their arms. They have signed terms of unconditional surrender. Four years ago, the thoughts and fears of the whole civilized world were centered on another piece of American soil, Pearl Harbor. The mighty threat to civilization which began there is now laid at rest. It was a long road to Tokyo and a bloody one. We shall not forget Pearl Harbor. The Japanese militarists will not forget the USS Missouri. The evil done by the Japanese warlords can never be repaired or forgotten. But their power to destroy and kill has been taken from them. Their armies and what is left of their navy are now impoten
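For a web UI, it may help to pass the transcription and summarization steps in as functions. The pipeline can then be tested without loading Whisper or BART. Below is a minimal sketch under that assumption. `run_pipeline` and the stand-in functions are hypothetical. In the notebook it would be called as `run_pipeline(audio_path, transcribe_audio, summarize_text)`.

```python
from typing import Callable, Dict

def run_pipeline(
    audio_file: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Run transcription, then summarization, and return both texts."""
    transcription = transcribe(audio_file).strip()
    # Skip the summarizer entirely when nothing was transcribed
    summary = summarize(transcription) if transcription else ""
    return {"transcription": transcription, "summary": summary}

# Stand-in functions for a quick check without loading any model
def fake_transcribe(path: str) -> str:
    return f" Transcript of {path}. "

def fake_summarize(text: str) -> str:
    return text.split(".")[0] + "."

print(run_pipeline("meeting.mp3", fake_transcribe, fake_summarize))
```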