The provided code is a class called "Summarizer" that uses the Langchain library to summarize YouTube videos. It has the following key functionalities:

Loading Data: The \_load_data() method utilizes the YoutubeLoader class from the Langchain library to load YouTube video data, including the video's content and metadata.

Chunking Data: The create_chunks() method splits the video content into smaller text chunks using the RecursiveCharacterTextSplitter class. This helps in processing large text inputs more efficiently.

Summarizing: The summarize() method initializes a summarization chain from the Langchain library using the load_summarize_chain() function. It then asynchronously runs the summarization chain on the text chunks generated earlier using the \_chain_run() method. The summary result is collected and returned along with the video metadata.

Overall, this code provides a convenient way to summarize YouTube videos by leveraging Langchain's text processing capabilities.


In [6]:
import asyncio
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import YoutubeLoader


class Summarizer:
    def __init__(self, url, llm):
        self.url = url
        self.llm = llm
        self.doc_chunks = []
        self.metadata = []

    def _load_data(self):
        loader = YoutubeLoader.from_youtube_url(self.url, add_video_info=True)

        return loader.load()

    def create_chunks(self):
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=3500,
            chunk_overlap=20,
            length_function=len,
        )
        text = self._load_data()
        self.doc_chunks = text_splitter.create_documents(
            [doc.page_content for doc in text]
        )
        self.metadata = text[0].metadata

        return

    async def _chain_run(self, chain, docs):
        return await chain.arun(docs)

    async def summarize(self):
        summarizer_chain = load_summarize_chain(llm=self.llm, chain_type="map_reduce")
        tasks = [self._chain_run(summarizer_chain, self.doc_chunks)]
        summary = await asyncio.gather(*tasks)

        return {"summary": summary[0], "metadata": self.metadata}

In [20]:
import concurrent.futures
import asyncio
import time


def process_summary(url):
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
    summarizer = Summarizer(url, llm)
    summarizer.create_chunks()

    return asyncio.run(summarizer.summarize())


def pool_executor(urls):
    results = []

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit each URL to the executor
        futures = [executor.submit(process_summary, url) for url in urls]

        # Wait for all futures to complete
        for future in concurrent.futures.as_completed(futures):
            try:
                result = future.result()
                results.append(result)
            except Exception as e:
                print(f"Error processing URL: {e}")

    return results

In [23]:
high_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=_8bMMqy37y8&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=1SWEF-lyW28&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=oc5tHbEK0IQ&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=jrd4snFDSVA&ab_channel=ThisWeekinStartups",
]
low_engagement_urls = [
    "https://www.youtube.com/watch?v=UeIV4KcSUlk",
    "https://www.youtube.com/watch?v=hNcLMN_bZCM",
    "https://www.youtube.com/watch?v=ANd4jPLnMAU",
    "https://www.youtube.com/watch?v=J8YnxrGEzT4",
]
low_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=UeIV4KcSUlk",
    "https://www.youtube.com/watch?v=hNcLMN_bZCM",
    "https://www.youtube.com/watch?v=ANd4jPLnMAU",
    "https://www.youtube.com/watch?v=J8YnxrGEzT4",
]

high_engagement_summaries = pool_executor(high_engagement_vids_urls)

low_engagement_summaries = pool_executor(low_engagement_vids_urls)

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 11b9bac68b83712f1839ff3d060be04c in your message.).
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out.
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out.
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include t

In [24]:
high_engagement_summaries

[{'summary': 'The hosts of "This Week in Startups" discuss the importance of incorporating social features into products to drive growth, the popularity of AI in Silicon Valley, and the potential impact of AI on jobs. They also discuss various AI tools and their capabilities, as well as the potential for AI in podcasting. The conversation touches on topics such as real estate, drug problems in San Francisco, and the importance of managing cash for startups. The hosts also mention various startups and events in the industry.',
  'metadata': {'source': 'jrd4snFDSVA&ab_channel=ThisWeekinStartups',
   'title': 'GPT-4 web browsing, plugin demos, AI layoffs + more with Sunny Madra & Vinny Lingham | E1737',
   'description': None,
   'view_count': 92613,
   'thumbnail_url': 'https://i.ytimg.com/vi/jrd4snFDSVA/hq720.jpg',
   'publish_date': datetime.datetime(2023, 5, 8, 0, 0),
   'length': 3215,
   'author': 'This Week in Startups'}},
 {'summary': 'The podcast discusses various AI tools and th

In [25]:
low_engagement_summaries

[{'summary': 'In this episode of "This Week in Startups," the speakers discuss the impact of the recent banking crisis on startups and investors, the potential adoption of a CBDC by the US, risk management, and the changing startup ecosystem. They also touch on the recent layoffs at Facebook and the importance of playing the startup game according to the current rules. The episode is sponsored by Cast AI, Vanta, and OrgSpace. The speakers highlight the need for transparency and regulation in the banking and crypto industries, as well as the importance of scenario planning for startups. They also discuss the benefits of innovative tools and technologies for investors and startups.',
  'metadata': {'source': 'UeIV4KcSUlk',
   'title': 'Banking crisis impact, more Meta cuts, and GPT-4 with Sunny Madra and Vinny Lingham | E1700',
   'description': None,
   'view_count': 20415,
   'thumbnail_url': 'https://i.ytimg.com/vi/UeIV4KcSUlk/hq720.jpg',
   'publish_date': datetime.datetime(2023, 3, 

In [61]:
def format_summaries(data):
    formatted_strings = []
    for i, obj in enumerate(data, start=1):
        summary = obj["summary"]
        views = obj["metadata"]["view_count"]
        title = obj["metadata"]["title"].split("|")[0].strip()

        formatted_string = (
            f"Video {i}\nTitle: {title}\nView Count: {views}\nSummary: {summary}\n"
        )
        formatted_strings.append(formatted_string)

    result = "\n".join(formatted_strings)
    return result


hien_prompt_str = format_summaries(high_engagement_summaries)
lowen_prompt_str = format_summaries(low_engagement_summaries)

In [31]:
hien_prompt_str

'Video 1\nTitle: GPT-4 web browsing, plugin demos, AI layoffs + more with Sunny Madra & Vinny Lingham\nView Count: 92613\nSummary: The hosts of "This Week in Startups" discuss the importance of incorporating social features into products to drive growth, the popularity of AI in Silicon Valley, and the potential impact of AI on jobs. They also discuss various AI tools and their capabilities, as well as the potential for AI in podcasting. The conversation touches on topics such as real estate, drug problems in San Francisco, and the importance of managing cash for startups. The hosts also mention various startups and events in the industry.\n\nVideo 2\nTitle: Demoing Google’s MusicLM, AssemblyAI, and other AI tools with Sunny Madra\nView Count: 104079\nSummary: The podcast discusses various AI tools and their potential impact on podcasting, including transcription tools, music creation tools, and even an AI-powered podcast generator. The speakers also discuss the challenges of copyright 

In [63]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

prompt_template = """ You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.\
Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,\
come up with new ideas for next videos.\
    
High_Engagement_Videos:
{high_perf_vids}
    
Low_Engagement_Videos:
{low_perf_vids}

Given the above High_Engagement_Videos and Low_Engagement_Videos, generate new ideas and themes.
New ideas should be related to the High_Engagement_Videos by keeping the titles and summaries of the episodes in context and must be based on\
the common patterns between the High_Engagement_Videos and the guests in those episodes.\
The new videos should not have any content from Low_Engagement_Videos.\
Make sure to not include any speaker name in your suggested video topics or themes.
Make sure to return at least 10 new ideas. Your response must be a csv file which contains the following columns:\
Topic, Theme, Summary. Summary should contain points to talk on the show and must be 100 words at max. Use | as a seperator\
and do not append any extra line in your csv response. Each row must have proper data and columns in each row must be three.
If you don't know the answer, just say "Hmm, I'm not sure."\
Don't try to make up an answer. 
"""
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["high_perf_vids", "low_perf_vids"],
)
ideas_chain = LLMChain(llm=llm, prompt=PROMPT, verbose=True)

In [64]:
response = ideas_chain.run(
    {
        "high_perf_vids": hien_prompt_str,
        "low_perf_vids": lowen_prompt_str,
    }
)



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,come up with new ideas for next videos.    
High_Engagement_Videos:
Video 1
Title: GPT-4 web browsing, plugin demos, AI layoffs + more with Sunny Madra & Vinny Lingham
View Count: 92613
Summary: The hosts of "This Week in Startups" discuss the importance of incorporating social features into products to drive growth, the popularity of AI in Silicon Valley, and the potential impact of AI on jobs. They also discuss various AI tools and their capabilities, as well as the potential for AI in podcasting. The conversation touches on topics such as real estate, drug problems in San Francisco, and the importance of managing cash for startups. The hosts also mention various startups and events in th

In [None]:
response

'Topic | Theme | Summary\n--- | --- | ---\nArtificial Intelligence | Future of AI in Healthcare | Discuss the potential of AI in healthcare, including diagnosis, treatment, and patient care. Touch on the ethical concerns surrounding the use of AI in healthcare and the need for transparency and regulation. Highlight the benefits of AI in improving patient outcomes and reducing healthcare costs. | \nEntrepreneurship | Building a Successful Startup Ecosystem | Discuss the key components of a successful startup ecosystem, including access to capital, talent, and mentorship. Touch on the importance of government support and policies that promote entrepreneurship. Highlight successful startup ecosystems around the world and the lessons that can be learned from them. | \nTechnology Trends | The Future of Podcasting | Discuss the potential of AI in podcasting, including transcription tools, music creation tools, and even an AI-powered podcast generator. Touch on the challenges of copyright own

In [None]:
import pandas as pd
from io import StringIO

csv_file = StringIO(response)
# Read the CSV data and create a DataFrame
df = pd.read_csv(csv_file, sep="|")