# Imports

In [None]:
import sys
import os
sys.path.append(os.path.dirname(os.getcwd()))

In [2]:
from langchain_community.llms import Ollama
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import LLMChain
from langchain.text_splitter import TokenTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document
from mp4_downloader import process_youtube_video
from transcribe import WhisperTranscriber

In [2]:
process_youtube_video("https://www.youtube.com/watch?v=0bb3-bjgf88")
transcriber = WhisperTranscriber()
transcribed_text = transcriber.transcribe(os.path.join("Saved_Media", "audio.mp3"))  # portable path instead of a hard-coded backslash
transcribed_text

[youtube] Extracting URL: https://www.youtube.com/watch?v=0bb3-bjgf88
[youtube] 0bb3-bjgf88: Downloading webpage
[youtube] 0bb3-bjgf88: Downloading ios player API JSON
[youtube] 0bb3-bjgf88: Downloading web creator player API JSON
[youtube] 0bb3-bjgf88: Downloading m3u8 information
[info] 0bb3-bjgf88: Downloading 1 format(s): 18
[download] Destination: Saved_Media\video.mp4
[download] 100% of   21.84MiB in 00:00:01 at 13.67MiB/s    
[youtube] Extracting URL: https://www.youtube.com/watch?v=0bb3-bjgf88
[youtube] 0bb3-bjgf88: Downloading webpage
[youtube] 0bb3-bjgf88: Downloading ios player API JSON
[youtube] 0bb3-bjgf88: Downloading web creator player API JSON
[youtube] 0bb3-bjgf88: Downloading m3u8 information
[info] 0bb3-bjgf88: Downloading 1 format(s): 251
[download] Destination: Saved_Media\audio.webm
[download] 100% of    7.14MiB in 00:00:00 at 19.54MiB/s    
Video and audio download completed!
MoviePy - Writing audio in Saved_Media\audio.mp3
MoviePy - Done.
Audio conversion to MP3 completed!
Using device: cuda


Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


" Let's get lean as Torvald's thoughts on Rust in the Linux kernel, which has been a contentious topic over the last few weeks, especially between some of the long-term maintainers and new contributors, and what their individual challenges are by bringing Rust into the Linux kernel with even one maintainer stepping down for their almost four-year role in the Rust for Linux Project, citing some non-technical nonsense that was going on, and that they were lacking the energy to deal with. Anyways, here's a recent conversation that Lean has had with Dirk, who is the head of the  the open source program office app horizon, let's get into what leanest things about rust in the Linux kernel and these recent events.  the folks from Azahi working on the Apple Silicon GFX drivers and the DRM's get people. So why is this so hard? I think I actually enjoyed. I enjoy arguments. I think one of the nice parts about Rust has been how it's live and up some of the discussions. And I mean some of the argu

# Loading and Testing the Model

In [3]:
def initialize_llm():
    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    return Ollama(base_url="http://localhost:11434", model="unquantphi", callback_manager=callback_manager)

llm = initialize_llm()

system_prompt = "You are a helpful AI assistant that provides clear and concise information."
user_prompt = "Write step by step the answer to 2x+3=7."

# Build the full prompt with the chat markers the model expects (<|system|>, <|user|>, <|assistant|>, <|end|>)
full_prompt = f"<|system|> {system_prompt}<|end|><|user|> {user_prompt}<|end|><|assistant|> "

# Run the test
print("Testing LLM with prompt:")
print(full_prompt)
response = llm.invoke(full_prompt)  # .invoke() replaces the deprecated direct call llm(...)

Testing LLM with prompt:
<|system|> You are a helpful AI assistant that provides clear and concise information.<|end|><|user|> Write step by step the answer to 2x+3=7.<|end|><|assistant|> 
1. First, you need to understand what an equation is: it's a mathematical statement where two expressions are equal (e.g., "5 + 4 = 9"). In this case, we have "2x + 3 = 7", which means that whatever value x takes on, when substituted into the expression '2x + 3', should result in an outcome of 7.

  2. Our goal here is to solve for x by isolating it on one side of the equation. To do this, we'll need to perform operations that will eliminate any other terms or numbers from its immediate vicinity but won't change our original aim (the value of 'x'). 

3. The first step in solving a linear equation is usually straightforward: get rid of constant terms on the same side as your variable(s). In this case, we can subtract 3 from both sides to achieve that goal. So let us do it!
  2x + 3 - 3 = 7 - 3
    Sim
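
The prompt string in the cell above is assembled by hand. A minimal sketch of a reusable helper follows, assuming the model expects exactly the `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>` markers used in that cell. The name `build_chat_prompt` is hypothetical and not part of any library.

```python
def build_chat_prompt(system_prompt: str, user_prompt: str) -> str:
    """Wrap a system and a user message in the chat markers used in this notebook.

    Hypothetical helper: it only reproduces the string layout of the test cell.
    """
    return (
        f"<|system|> {system_prompt}<|end|>"
        f"<|user|> {user_prompt}<|end|>"
        "<|assistant|> "
    )


prompt = build_chat_prompt(
    "You are a helpful AI assistant that provides clear and concise information.",
    "Write step by step the answer to 2x+3=7.",
)
print(prompt)
```

Keeping the markers in one function means a change of chat format only has to be made in one place.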

# Initializing LLM

In [4]:
def initialize_llm():
    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    return Ollama(base_url="http://localhost:11434", model="quantphi", callback_manager=callback_manager)

llm = initialize_llm()

## Initializing the text splitter

In [5]:
text_splitter = TokenTextSplitter(
    chunk_size=10000,  
    chunk_overlap=200
)
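
To make the two parameters concrete: `chunk_size` is the window length in tokens and `chunk_overlap` is how many tokens each window shares with the previous one. The sketch below illustrates that sliding-window idea in plain Python on a list of integers standing in for token ids. It is an illustration under those assumptions, not the library's implementation, and `split_tokens` is a hypothetical name.

```python
def split_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into overlapping windows (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
        start += step
    return chunks


tokens = list(range(10))  # stand-in for real token ids
chunks = split_tokens(tokens, chunk_size=4, chunk_overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With `chunk_size=10000` and `chunk_overlap=200` as configured above, each new chunk would start 9,800 tokens after the previous one under this scheme.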

# Prompt Templates

In [6]:
map_template = """<|system|>
You are an AI assistant specialized in summarizing portions of YouTube video transcripts. Your task is to provide clear and concise summaries of transcript chunks.
<|end|>
<|user|>
Please summarize the following portion of a YouTube video transcript. Focus on the main points and key ideas:

{text}

Provide a brief summary of this chunk.
<|end|>
<|assistant|>
"""

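# Note: ChatPromptTemplate wraps the template in a single human message, so a plain-text LLM
# receives it with a "Human: " prefix (visible in the verbose output of the summarization run).
map_prompt = ChatPromptTemplate.from_template(map_template)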

# Define the refinement template
refine_template = """<|system|>
You are an AI assistant specialized in refining and structuring summaries of YouTube video transcripts. Your task is to create a comprehensive and well-organized final summary.
<|end|>
<|user|>
We have summarized a YouTube video transcript in parts. Here's a summary of what we have so far:

{existing_answer}

We have some new information to add:

{text}

Please refine the existing summary by incorporating this new information. Ensure the final summary is well-structured and follows this format:

1. Main Topic:
2. Key Points:
3. Notable Details:
4. Conclusion or Call to Action (if any):

If any section is not applicable, you may omit it.
<|end|>
<|assistant|>
"""

refine_prompt = ChatPromptTemplate.from_template(refine_template)

# Initializing the summarization chain

In [7]:
summarize_chain = load_summarize_chain(
    llm,
    chain_type="refine",
    question_prompt=map_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
    verbose=True
)
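
For intuition about what `chain_type="refine"` does, here is a minimal sketch of the refine pattern in plain Python: summarize the first chunk, then repeatedly ask the model to update that summary with each following chunk. It assumes templates with the same `{text}` and `{existing_answer}` placeholders as above. `refine_summarize` and `fake_llm` are hypothetical stand-ins, not LangChain APIs.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    llm: Callable[[str], str],
    question_template: str,
    refine_template: str,
) -> str:
    """Run the refine pattern: one initial summary, then one update per extra chunk."""
    if not chunks:
        return ""
    # First chunk: produce the initial summary.
    summary = llm(question_template.format(text=chunks[0]))
    # Remaining chunks: fold each one into the running summary.
    for chunk in chunks[1:]:
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
    return summary


calls = []


def fake_llm(prompt: str) -> str:
    """Stub model that records each prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"summary-{len(calls)}"


final = refine_summarize(
    ["part A", "part B", "part C"],
    fake_llm,
    "Summarize: {text}",
    "Existing: {existing_answer}\nNew: {text}",
)
print(final)       # summary-3
print(len(calls))  # 3
```

The model is called once per chunk and the calls are sequential, since each step needs the previous summary. Larger chunks therefore mean fewer round trips but longer prompts.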


# Summarizing YouTube video

In [8]:
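def summarize_transcript(transcript):
    """Split the transcript into token chunks and summarize them with the refine chain."""
    chunks = text_splitter.split_text(transcript)
    docs = [Document(page_content=chunk) for chunk in chunks]

    # .invoke() replaces the deprecated direct call summarize_chain({...})
    result = summarize_chain.invoke({"input_documents": docs})
    return result["output_text"]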

print("Getting YouTube Video's Audio...")
process_youtube_video("https://www.youtube.com/watch?v=E0Hmnixke2g")
print("Transcribing Audio...")
transcriber = WhisperTranscriber()
transcribed_text = transcriber.transcribe(os.path.join("Saved_Media", "audio.mp3"))  # portable path instead of a hard-coded backslash
print("Summarizing YouTube transcript...")
summary = summarize_transcript(transcribed_text)
print("\nFinal Summary:")
print(summary)

Getting YouTube Video's Audio...
[youtube] Extracting URL: https://www.youtube.com/watch?v=E0Hmnixke2g
[youtube] E0Hmnixke2g: Downloading webpage
[youtube] E0Hmnixke2g: Downloading ios player API JSON
[youtube] E0Hmnixke2g: Downloading web creator player API JSON
[youtube] E0Hmnixke2g: Downloading m3u8 information
[info] E0Hmnixke2g: Downloading 1 format(s): 18
[download] Destination: Saved_Media\video.mp4
[download] 100% of   22.43MiB in 00:00:02 at 10.48MiB/s    
[youtube] Extracting URL: https://www.youtube.com/watch?v=E0Hmnixke2g
[youtube] E0Hmnixke2g: Downloading webpage
[youtube] E0Hmnixke2g: Downloading ios player API JSON
[youtube] E0Hmnixke2g: Downloading web creator player API JSON
[youtube] E0Hmnixke2g: Downloading m3u8 information
[info] E0Hmnixke2g: Downloading 1 format(s): 251
[download] Destination: Saved_Media\audio.webm
[download] 100% of   14.49MiB in 00:00:01 at 13.06MiB/s    
Video and audio download completed!
MoviePy - Writing audio in Saved_Media\audio.mp3
MoviePy - Done.
Audio conversion to MP3 completed!
Transcribing Audio...
Using device: cuda
Summarizing YouTube transcript...


Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3mHuman: <|system|>
You are an AI assistant specialized in summarizing portions of YouTube video transcripts. Your task is to provide clear and concise summaries of transcript chunks.
<|end|>
<|user|>
Please summarize the following portion of a YouTube video transcript. Focus on the main points and key ideas:

 In the next 17 minutes, I will give you an overview of the most important machine learning algorithms to help you decide which one is right for your problem. My name is Tim, and I have been a data scientist for over 10 years and taught all of these algorithms to hundreds of students in real life machine learning boot camps. There's a simple strategy for picking the right algorithm for your problem. In 17 minutes, you will know how to pick the right one for any problem and get a basic intuition of each algorithm and how they relate to each other. My goal is to give as many of you as possible an intuitive understanding of the major machine learn