**Project Goal: YouTube Video Question Answering & Summarization with Local LLM**



Build an intelligent system that can automatically process any YouTube video,
extract its transcript, and allow users to:
* Summarize the video content in a concise, human‑like way
* Ask natural language questions about the video and get direct answers

**Step 1: Install necessary libraries**

In [None]:
!pip install langchain langchain-community langchain-huggingface youtube-transcript-api transformers accelerate bitsandbytes pytube yt-dlp


Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.3.1-py3-none-any.whl.metadata (996 bytes)
Collecting youtube-transcript-api
  Downloading youtube_transcript_api-1.2.2-py3-none-any.whl.metadata (24 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.46.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting pytube
  Downloading pytube-15.0.0-py3-none-any.whl.metadata (5.0 kB)
Collecting yt-dlp
  Downloading yt_dlp-2025.7.21-py3-none-any.whl.metadata (175 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m175.4/175.4 kB[0m [31m8.5 MB/s[0m eta [36m0:00:00[0m
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata 

**Step 2: Load YouTube audio and transcribe it using OpenAI Whisper**

In [None]:
from langchain_community.document_loaders import YoutubeLoader
from youtube_transcript_api import YouTubeTranscriptApi # Explicitly import here

def load_youtube_transcript(url):
    """
    Use LangChain's YoutubeLoader to fetch transcript.
    Returns list of Document objects.
    """
    # Using YoutubeLoader might still be the issue, let's try direct access first
    video_id = url.split("=")[1]
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_transcript(['en', 'a.en']) # Try to find English or auto-generated English
        full_transcript = " ".join([t['text'] for t in transcript.fetch()])
        from langchain.schema import Document
        docs = [Document(page_content=full_transcript, metadata={"source": url, "video_id": video_id})]
        print("Successfully loaded transcript using youtube-transcript-api directly.")
        return docs
    except Exception as e:
        print(f"Error loading transcript directly: {e}")
        # Fallback to LangChain loader if direct method fails (though it's failing now)
        print("Attempting to use LangChain's YoutubeLoader as a fallback.")
        try:
            loader = YoutubeLoader.from_youtube_url(url, add_video_info=False)
            docs = loader.load()
            print("Successfully loaded transcript using LangChain's YoutubeLoader.")
            return docs
        except Exception as lang_e:
            print(f"Error loading transcript with LangChain: {lang_e}")
            return []

**Step 3: Split transcript into chunks (for long videos)**

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=1000, chunk_overlap=100):
    """
    Split documents into chunks suitable for LLM input.
    """
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = splitter.split_documents(documents)
    return docs


**Step 4: Load Hugging Face LLM with LangChain**

In [None]:
import os
os.environ["HF_TOKEN"] = "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"

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

def load_local_llm():
    model_id = "meta-llama/Llama-3.2-3B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto", token = os.getenv("HF_TOKEN"))
    if tokenizer.pad_token_id is None:
       tokenizer.pad_token_id = tokenizer.eos_token_id

    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1024)

    llm = HuggingFacePipeline(pipeline=pipe)
    return llm


**Step 5: Build prompt template & do summarization or QA**

In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
import warnings
warnings.filterwarnings("ignore")

def summarize_docs(llm, docs):
    """
    Load video, split, run summarization chain, and return clean summary.
    """
    chain = load_summarize_chain(
        llm,
        chain_type="map_reduce",  # handles long text better
        verbose=False
    )
    final_summary = chain.run(docs)
    return final_summary.strip()


**Step 6: QA from transcript**

In [None]:
def ask_question(llm, docs, question):
    """
    Ask a question about the transcript (combine all chunks).
    """
    full_text = " ".join([doc.page_content for doc in docs])
    template = """
    You are a helpful AI assistant.

    Here is the transcript:
    \"\"\"
    {transcript}
    \"\"\"

    Now answer the question:
    \"\"\"
    {question}
    \"\"\"

    Return only the answer.
    """
    prompt = PromptTemplate.from_template(template)
    final_prompt = prompt.format(transcript=full_text, question=question)

    answer = llm.invoke(final_prompt)
    return answer


**Step 7: Load the LLM**

In [None]:
llm = load_local_llm()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


**Step 8: Final driver code**

In [None]:
!pip install --upgrade youtube-transcript-api



In [None]:
def load_and_split_transcript(url, chunk_size=1000, chunk_overlap=100):
    documents = load_youtube_transcript(url)
    docs = split_documents(documents, chunk_size, chunk_overlap)
    return docs

In [None]:
url = "https://www.youtube.com/watch?v=62DxELjuRec"
docs = load_and_split_transcript(url)

Error loading transcript directly: type object 'YouTubeTranscriptApi' has no attribute 'list_transcripts'
Attempting to use LangChain's YoutubeLoader as a fallback.
Error loading transcript with LangChain: type object 'YouTubeTranscriptApi' has no attribute 'list_transcripts'


In [None]:
docs

[]

In [None]:
summary = summarize_docs(llm, docs)
print(summary)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Write a concise summary of the following:


""


CONCISE SUMMARY: 
1. **Economic growth and inflation**: The OECD report highlights that the global economy is experiencing a slowdown, with many countries facing significant challenges in maintaining economic growth and controlling inflation.
2. **Supply chain disruptions**: The report notes that supply chain disruptions, caused by the COVID-19 pandemic and other factors, continue to impact global trade and economic activity.
3. **Investment and productivity**: The OECD report emphasizes the need for governments to invest in education, research, and innovation to improve productivity and competitiveness.
4. **Global governance and trade**: The report stresses the importance of global governance and trade agreements in promoting economic stability and cooperation among countries.
5. **Climate change and environmental concerns**: The OECD report notes that climate change and environmental concerns are becoming increasingly important issues

In [None]:
answer = ask_question(llm, docs, "What is great depression as per the video?")
print(answer)


    You are a helpful AI assistant.

    Here is the transcript:
    """
    
    """

    Now answer the question:
    """
    What is great depression as per the video?
    """

    Return only the answer.
     The Great Depression was a global economic downturn that lasted from 1929 to the late 1930s. It was the longest and most severe depression of the 20th century. The Depression was characterized by widespread unemployment, poverty, and a sharp decline in international trade. The stock market crash of 1929 is often seen as the trigger for the Depression, but other factors such as overproduction, underconsumption, and a decline in international trade also contributed to the crisis. The Great Depression had a profound impact on society, leading to widespread poverty, homelessness, and despair. It also led to a significant increase in government intervention in the economy, including the establishment of programs such as the New Deal and the Social Security system.


In [None]:
answer = ask_question(llm, docs, "What is black thursday?")
print(answer)


    You are a helpful AI assistant.

    Here is the transcript:
    """
    
    """

    Now answer the question:
    """
    What is black thursday?
    """

    Return only the answer.
    



    Black Thursday was a stock market crash that occurred on October 24, 1929, also known as Black Tuesday's precursor. It is considered one of the worst days in U.S. stock market history, with stock prices plummeting by as much as 12% in a single day. The crash was a key event in the Great Depression, which began just a few weeks later. Black Thursday marked the beginning of a period of severe economic downturn, including widespread job losses, business failures, and a sharp decline in consumer spending. The crash also led to a sharp increase in unemployment, which peaked at over 25% in 1933. The stock market crash of 1929 was a major turning point in the history of the United States, and it had far-reaching consequences for the economy, politics, and society as a whole.


In [None]:
from youtube_transcript_api import YouTubeTranscriptApi

url = "https://www.youtube.com/watch?v=62DxELjuRec"
video_id = url.split("=")[1]

try:
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    print("Successfully listed transcripts using youtube-transcript-api directly.")
    # You can further process the transcript_list here if needed
except Exception as e:
    print(f"Error when trying to list transcripts directly: {e}")

Error when trying to list transcripts directly: type object 'YouTubeTranscriptApi' has no attribute 'list_transcripts'
