## Podcast to blog using GPU

Whisper large-v3 is supported in Hugging Face Transformers. To run the model, first install the Transformers library.

In [1]:
!pip install --upgrade pip
!pip install --upgrade transformers datasets[audio] accelerate




In [2]:
!pip install --upgrade transformers



The model can be used with the pipeline class to transcribe audio of arbitrary length.

In [3]:
import transformers
print(transformers.__version__)  # Confirm that a recent release is installed

4.56.1


- Pipeline: Simple and optimized inference class for many machine learning tasks like text generation, image segmentation, automatic speech recognition, document question answering, and more.
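As a rough sketch of what that looks like for speech recognition, the helper below builds a chunked ASR pipeline. The model ID `openai/whisper-large-v3` and the file name `episode.mp3` are assumptions rather than values taken from this notebook, and `pick_device` and `build_asr_pipeline` are hypothetical helper names. The heavy imports are deferred into the function so that defining it does not load anything.

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device string the pipeline should run on."""
    return "cuda:0" if cuda_available else "cpu"


def build_asr_pipeline(model_id: str = "openai/whisper-large-v3"):
    """Build a chunked speech-recognition pipeline (imports are deferred)."""
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,  # assumed model ID, not taken from the notebook
        device=pick_device(torch.cuda.is_available()),
        chunk_length_s=30,  # process long audio in 30-second windows
    )


# Usage (downloads the model on first call; "episode.mp3" is a placeholder):
# asr = build_asr_pipeline()
# print(asr("episode.mp3")["text"])
```

Chunking is what lets a single call handle recordings much longer than the model's 30-second window.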

In [4]:
# import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# from datasets import load_dataset

ModuleNotFoundError: Could not import module 'AutoProcessor'. Are this object's requirements defined correctly?

- Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation.
- The large-v3 model performs better across a wide variety of languages, with a 10% to 20% reduction in errors compared to Whisper large-v2.
- Whisper large-v3 has the following advantages:
    * The spectrogram input uses 128 Mel frequency bins instead of 80
    * A new language token for Cantonese
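
To make the first difference concrete, the sketch below computes the shape of the log-Mel spectrogram the encoder sees for one audio window. The constants (16 kHz sampling, a hop of 160 samples, 30-second windows) come from the Whisper reference implementation rather than from this notebook, so treat them as assumptions; `mel_input_shape` is a hypothetical helper name.

```python
SAMPLE_RATE = 16_000   # Hz; audio is resampled to this rate (assumed constant)
HOP_LENGTH = 160       # samples between successive spectrogram frames (assumed)
CHUNK_SECONDS = 30     # length of one processing window in seconds (assumed)


def mel_input_shape(n_mels: int) -> tuple[int, int]:
    """Return (mel bins, frames) for one window of audio."""
    n_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH
    return (n_mels, n_frames)


print(mel_input_shape(80))   # large-v2 -> (80, 3000)
print(mel_input_shape(128))  # large-v3 -> (128, 3000)
```

The number of frames is unchanged; only the frequency resolution of each frame increases.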


In [5]:
# Install faster-whisper (cleaner dependencies)
!pip install faster-whisper

# Simple, reliable code
from faster_whisper import WhisperModel
import torch



In [6]:
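def setup_fast_whisper():
    """Load Whisper large-v3 with faster-whisper, on the GPU if one is available."""
    print("🚀 Loading Faster-Whisper...")

    # Use the GPU when present; otherwise fall back to 8-bit weights on the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    compute_type = "float32" if device == "cuda" else "int8"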
    
    # Load model
    model = WhisperModel(
        "large-v3", 
        device=device, 
        compute_type=compute_type
    )
    
    print(f"✅ Model loaded on {device}")
    return model
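
The function above runs in full precision on the GPU. One possible variation, sketched here under the assumption that the GPU handles half precision well, is to choose `float16` instead, which stores each weight in 2 bytes rather than 4. `choose_compute_type` and its `prefer_half` flag are hypothetical names; the returned strings are compute types that faster-whisper accepts.

```python
def choose_compute_type(device: str, prefer_half: bool = True) -> str:
    """Pick a faster-whisper compute type for the given device string."""
    if device == "cuda":
        # float16 halves weight storage compared with float32
        return "float16" if prefer_half else "float32"
    # 8-bit integer weights keep CPU memory use manageable
    return "int8"


for device in ("cuda", "cpu"):
    print(device, "->", choose_compute_type(device))
```

Whether half precision is actually faster depends on the GPU, so it is worth timing both settings on a short clip before committing to one.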

In [None]:
def transcribe(model, audio_path):
    """Transcribe an audio or video file and return the text as one string."""
    print(f"🎙️ Transcribing: {audio_path}")

    segments, info = model.transcribe(
        audio_path,
        beam_size=5,
        language="en",
        word_timestamps=True,
        vad_filter=True,  # Voice activity detection
        vad_parameters=dict(min_silence_duration_ms=500)
    )

    # segments is a generator, so decoding actually runs while we iterate over it
    transcript = " ".join(segment.text.strip() for segment in segments)

    print("✅ Transcription completed!")
    print(f"📊 Language: {info.language}")
    print(f"🕐 Duration: {info.duration:.2f} seconds")

    return transcript

# Usage
model = setup_fast_whisper()
transcript = transcribe(model, "/kaggle/input/vaibhavee/2025-08-07 18-02-57.mkv")

# Save the raw transcript; the blog-generation cells read it back from this path
with open("/kaggle/working/generated_blog_post.md", "w", encoding="utf-8") as f:
    f.write(transcript)
print(transcript)

🚀 Loading Faster-Whisper...
✅ Model loaded on cuda
🎙️ Transcribing: /kaggle/input/vaibhavee/2025-08-07 18-02-57.mkv
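
The transcription cell keeps only the text of each segment. If timestamps are useful later, for example to locate quotes, a minimal sketch like the one below keeps each segment's start time. It assumes segment objects expose `start`, `end`, and `text` attributes, as faster-whisper's do; the `Segment` namedtuple is only a stand-in so the example runs without a model, and the helper names are hypothetical.

```python
from collections import namedtuple

# Stand-in for faster-whisper segment objects (start, end in seconds, plus text)
Segment = namedtuple("Segment", ["start", "end", "text"])


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments) -> list[str]:
    """Turn segments into lines prefixed with their start time."""
    return [f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments]


demo = [
    Segment(0.0, 4.2, " Welcome to the show."),
    Segment(65.5, 70.0, " Let's talk about fees."),
]
print("\n".join(segments_to_lines(demo)))
```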


## Blog generation

In [None]:
!pip install unstructured[docx] langchain langchainhub langchain_community 

### Loading the documents

In [None]:
from langchain_community.document_loaders import DirectoryLoader

# Load only Markdown files so other outputs in the working directory are skipped
loader = DirectoryLoader("/kaggle/working/", glob="*.md")
blog = loader.load()
len(blog)

### Splitting the text

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(blog)
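
To show what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally, since that class prefers to break on separators such as paragraph and line breaks; `split_fixed` is a hypothetical helper used only to illustrate the two parameters.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


print(split_fixed("abcdefghij", chunk_size=4))                   # no overlap
print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2))  # shared edges
```

With an overlap of 0, as in the cell above, a sentence that straddles a boundary is cut in two; a small overlap repeats the boundary text in both chunks.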


In [None]:
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Requires a running Ollama server with the llama3 model already pulled
llm = Ollama(model="llama3:latest")



blog_prompt = PromptTemplate.from_template(
    """
    You are a professional blog writer specializing in educational content. Transform the following transcription into an engaging, well-structured 700-word blog post about BS DEGREE ELECTRONICS PROGRAMS.
    Ensure the blog is engaging, creative, informative about the ELECTRONICS PROGRAM, and easy to read. Write in a single, coherent narrative like a story. Highlight fee waiver information, if any.
    Do not use the word "podcast" in the blog.
    Include practical advice and insights for students.
    Do not mention any private institution names.
    Include relevant quotes from the discussion.
    Mention the incidents of students meeting with professors and how it helped them.

    Transcript:
    {transcript}

    Create an inspiring 700-word blog:
    """
)


In [None]:
parser = StrOutputParser()

chain = (
    {"transcript": RunnablePassthrough()}
    | blog_prompt
    | llm
    | parser
)
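
The pipe syntax can be read as left-to-right function composition. The toy version below mirrors the data flow of the chain using plain Python functions. Every name in it is hypothetical, `fake_llm` is a stand-in that never calls a model, and `str.strip` only approximates the output parser, which in the real chain returns the model's text as a plain string.

```python
from functools import reduce


def compose(*steps):
    """Chain callables left to right, like `a | b | c` in the chain."""
    def run(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


def wrap_input(text: str) -> dict:
    # Mirrors {"transcript": RunnablePassthrough()}: raw input becomes a dict
    return {"transcript": text}


def render_prompt(variables: dict) -> str:
    # Mirrors the prompt template: placeholders are filled from the dict
    return "Transcript:\n{transcript}\n\nCreate a blog:".format(**variables)


def fake_llm(prompt: str) -> str:
    # Stand-in for the model call; it only reports the prompt size
    return f"  (model output for a {len(prompt)}-character prompt)  "


toy_chain = compose(wrap_input, render_prompt, fake_llm, str.strip)
print(toy_chain("hello world"))
```

Each step receives the previous step's output, which is why the chain can be invoked with the transcript string alone.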


In [None]:
with open("/kaggle/working/generated_blog_post.md", "r", encoding="utf-8") as f:
    transcript_text = f.read()

blog_post = chain.invoke(transcript_text)

print(blog_post)
