# AI in Content Generation: YouTube Ideas from Historical Content Performance


## Channel: This Week in Startups

The code below defines a `Summarizer` class that uses the LangChain library to summarize YouTube videos. It has three key pieces of functionality:

Loading Data: The `_load_data()` method uses LangChain's `YoutubeLoader` class to load the YouTube video data, including the video's content (its transcript) and metadata.

Chunking Data: The `create_chunks()` method splits the video content into smaller text chunks (up to 3,500 characters each, with a 20-character overlap) using the `RecursiveCharacterTextSplitter` class, so that each piece is small enough to send to the model.
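
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to split on separators such as paragraph and line breaks); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=3500, chunk_overlap=20):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text cut at a
    boundary still appears at the end of one chunk and the start of the next.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With the defaults taken from the notebook (3,500 and 20), an 8,000-character transcript would yield three chunks under this simplified scheme.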

Summarizing: The `summarize()` method builds a summarization chain with LangChain's `load_summarize_chain()` function, using the `map_reduce` chain type. It then runs the chain asynchronously on the text chunks generated earlier via the `_chain_run()` method, and returns the resulting summary together with the video metadata.
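
The `map_reduce` idea can be sketched without any LLM: summarize each chunk independently (map), then summarize the combined partial summaries (reduce). The sketch below is an illustration only. `fake_summarize` is a hypothetical stand-in that truncates text instead of calling a model, and the real LangChain chain may combine intermediate summaries in more than one round.

```python
import asyncio


async def fake_summarize(text, max_chars=40):
    """Stand-in for an LLM call: 'summarizes' by truncating the text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return text[:max_chars]


async def map_reduce_summary(chunks):
    # Map step: summarize every chunk concurrently.
    partial = await asyncio.gather(*(fake_summarize(c) for c in chunks))
    # Reduce step: summarize the concatenated partial summaries.
    return await fake_summarize(" ".join(partial), max_chars=120)


result = asyncio.run(map_reduce_summary(["first chunk text", "second chunk text"]))
print(result)
```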

Overall, this code provides a convenient way to summarize YouTube videos by leveraging LangChain's text processing capabilities.

Before executing the script, install the required packages listed in the `requirements.txt` file.

In [1]:
import asyncio
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import YoutubeLoader

import pandas as pd


class Summarizer:
    def __init__(self, url, llm):
        self.url = url
        self.llm = llm
        self.doc_chunks = []
        self.metadata = []

    def _load_data(self):
        loader = YoutubeLoader.from_youtube_url(self.url, add_video_info=True)

        return loader.load()

    def create_chunks(self):
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=3500,
            chunk_overlap=20,
            length_function=len,
        )
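        text = self._load_data()
        # Split each loaded document's text into overlapping chunks.
        self.doc_chunks = text_splitter.create_documents(
            [doc.page_content for doc in text]
        )
        # Keep the first document's metadata (title, view count, ...) for later use.
        self.metadata = text[0].metadata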

    async def _chain_run(self, chain, docs):
        return await chain.arun(docs)

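    async def summarize(self):
        # map_reduce summarizes each chunk, then combines the partial summaries.
        summarizer_chain = load_summarize_chain(llm=self.llm, chain_type="map_reduce")
        # Only one task is gathered here, so the summary is the first (and only) result.
        tasks = [self._chain_run(summarizer_chain, self.doc_chunks)]
        summary = await asyncio.gather(*tasks)

        return {"summary": summary[0], "metadata": self.metadata}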

In [2]:
import asyncio
import concurrent.futures


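def process_summary(url):
    # Each worker thread builds its own LLM client and Summarizer.
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
    summarizer = Summarizer(url, llm)
    summarizer.create_chunks()

    # asyncio.run() starts a fresh event loop in this thread to drive the async chain.
    return asyncio.run(summarizer.summarize())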


def pool_executor(urls):
    results = []

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit each URL to the executor
        futures = [executor.submit(process_summary, url) for url in urls]

        # Collect results as each future finishes (completion order, not submission order)
        for future in concurrent.futures.as_completed(futures):
            try:
                result = future.result()
                results.append(result)
            except Exception as e:
                print(f"Error processing URL: {e}")

    return results
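
Because `as_completed` yields futures in the order they finish, `results` is not guaranteed to follow the order of `urls`; the summary tables further down do list the high-engagement videos in a different order from the URL list. If input order matters, one option is to remember which input each future belongs to. The following is a sketch under that assumption; `run_in_order` is a hypothetical helper, and it expects the items to be unique and hashable (as URLs are).

```python
import concurrent.futures


def run_in_order(func, items, max_workers=4):
    """Run func over items in a thread pool.

    Returns a list of (item, result) pairs in input order; a failed item
    gets None as its result.
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the item that produced it.
        future_to_item = {executor.submit(func, item): item for item in items}
        for future in concurrent.futures.as_completed(future_to_item):
            item = future_to_item[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                print(f"Error processing {item}: {exc}")
                results[item] = None
    # Rebuild the output in the original input order.
    return [(item, results[item]) for item in items]
```

In the notebook's setting, `func` would be `process_summary` and `items` the list of video URLs.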

In [3]:
high_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=_8bMMqy37y8&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=1SWEF-lyW28&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=oc5tHbEK0IQ&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=jrd4snFDSVA&ab_channel=ThisWeekinStartups",
]

low_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=UeIV4KcSUlk",
    "https://www.youtube.com/watch?v=hNcLMN_bZCM",
    "https://www.youtube.com/watch?v=ANd4jPLnMAU",
    "https://www.youtube.com/watch?v=J8YnxrGEzT4",
]

high_engagement_summaries = pool_executor(high_engagement_vids_urls)

low_engagement_summaries = pool_executor(low_engagement_vids_urls)

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per min. Limit: 90000 / min. Current: 89096 / min. Contact us through our help center at help.openai.com if you continue to have issues..
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per min. Limit: 90000 / min. Current: 89975 / min. Contact us through our help center at help.openai.com if you continue to have issues..
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per min. Limi

Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per min. Limit: 90000 / min. Current: 89692 / min. Contact us through our help center at help.openai.com if you continue to have issues..
Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per min. Limit: 90000 / min. Current: 89689 / min. Contact us through our help center at help.openai.com if you continue to have issues..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-ZMYqvRsyrZeVCAGtcdg16fJA on tokens per
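
The log above shows the retry delay doubling from 1.0 to 2.0 to 4.0 seconds, which is the exponential backoff pattern. The sketch below illustrates that pattern in isolation; it is not LangChain's retry implementation, and `call_with_backoff` is a hypothetical helper. For brevity it retries on any exception, whereas real code would usually retry only on rate-limit or availability errors.

```python
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1.0, 2.0, 4.0, ... seconds
```

Another way to stay under a tokens-per-minute limit is to reduce concurrency, for example by passing a smaller `max_workers` to `ThreadPoolExecutor`.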

In [4]:
high_eng_df = pd.DataFrame.from_records(high_engagement_summaries)
high_eng_df

| | summary | metadata |
|---|---|---|
| 0 | The conversation between Jason Calacanis and B... | {'source': 'oc5tHbEK0IQ&ab_channel=ThisWeekinS... |
| 1 | The summary covers various topics including th... | {'source': '_8bMMqy37y8&ab_channel=ThisWeekinS... |
| 2 | The speaker discusses various topics related t... | {'source': '1SWEF-lyW28&ab_channel=ThisWeekinS... |
| 3 | The summary discusses various topics including... | {'source': 'jrd4snFDSVA&ab_channel=ThisWeekinS... |


In [5]:
low_eng_df = pd.DataFrame.from_records(low_engagement_summaries)
low_eng_df

| | summary | metadata |
|---|---|---|
| 0 | The summary discusses a crypto Roundtable disc... | {'source': 'UeIV4KcSUlk', 'title': 'Banking cr... |
| 1 | The podcast episode covers various topics incl... | {'source': 'ANd4jPLnMAU', 'title': 'Coinbase c... |
| 2 | The podcast episode features Coffeezilla, a Yo... | {'source': 'J8YnxrGEzT4', 'title': 'Logan Paul... |
| 3 | The provided text covers a range of topics inc... | {'source': 'hNcLMN_bZCM', 'title': 'Bing dodge... |


In [6]:
def format_summaries(data):
    formatted_strings = []
    for i, obj in enumerate(data, start=1):
        summary = obj["summary"]
        views = obj["metadata"]["view_count"]
        title = obj["metadata"]["title"].split("|")[0].strip()

        formatted_string = (
            f"Video {i}\nTitle: {title}\nView Count: {views}\nSummary: {summary}\n"
        )
        formatted_strings.append(formatted_string)

    result = "\n".join(formatted_strings)
    return result


high_eng_prompt = format_summaries(high_engagement_summaries)
low_eng_prompt = format_summaries(low_engagement_summaries)

In [7]:
print(high_eng_prompt)

Video 1
Title: Fireside chat with Jason Calacanis & Brad Gerstner hosted by Mubadala’s Ibrahim Ajami
View Count: 100551
Summary: The conversation between Jason Calacanis and Brad Gerstner covers various topics related to the technology industry, including their experiences in Silicon Valley, their respective companies and investments, and the challenges of building successful companies. They discuss the current state of the industry, the importance of innovation and perseverance, and the need for founders to stay focused on their products and customers. They also touch on the impact of AI, the potential of the metaverse, and the evolution of funding and technology. The speaker emphasizes the importance of data, the cloud, and AI in driving technological advancements and economic growth. They also discuss the role of venture capital, the importance of resilience and humility, and the benefits of in-person collaboration. The speaker encourages founders to be intellectually honest and tra

In [8]:
print(low_eng_prompt)

Video 1
Title: Banking crisis impact, more Meta cuts, and GPT-4 with Sunny Madra and Vinny Lingham
View Count: 20470
Summary: The summary discusses a crypto Roundtable discussion with entrepreneurs Sunny and Vinnie. They cover topics such as the impact of the banking contagion on the crypto industry, potential adoption of a CBDC by the US, risk management, social media's influence on banking, and the need for startups to adapt. The summary also mentions the layoffs at Zuckerberg's company and the importance of being Sock 2 compliant for startups. It discusses the challenges faced by banks and startups in the funding environment and the potential risks of inflated valuations. The speaker emphasizes the importance of transparency and stability in the cryptocurrency market, as well as the potential of stablecoins and the need for regulation. The summary also touches on interest rates, the value of podcasts, and the changes being made by Zuckerberg at Facebook. It concludes with a mention 

In [9]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

prompt_template = """ You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.\
Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,\
come up with new ideas for next videos.\
    
High_Engagement_Videos:
{high_engagement_videos}
    
Low_Engagement_Videos:
{low_engagement_videos}

Given the above High_Engagement_Videos and Low_Engagement_Videos, generate new ideas and themes.
New ideas should be related to the High_Engagement_Videos by keeping the titles and summaries of the episodes in context and must be based on\
the common patterns between the High_Engagement_Videos and the guests in those episodes.\
The new videos should not have any content from Low_Engagement_Videos.\
Make sure to not include any speaker name in your suggested video topics or themes.
Make sure to return at least 10 new ideas. Your response must be a csv file which contains the following columns:\
Topic, Theme, Summary. Summary should contain points to talk on the show and must be 100 words at max. Use | as a separator\
and do not append any extra line in your csv response. Each row must have proper data and columns in each row must be three.
If you don't know the answer, just say "Hmm, I'm not sure."\
Don't try to make up an answer. 
"""
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["high_engagement_videos", "low_engagement_videos"],
)
ideas_chain = LLMChain(llm=llm, prompt=PROMPT, verbose=True)

In [10]:
response = ideas_chain.run(
    {
        "high_engagement_videos": high_eng_prompt,
        "low_engagement_videos": low_eng_prompt,
    }
)



> Entering new LLMChain chain...
Prompt after formatting:
You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,come up with new ideas for next videos.
High_Engagement_Videos:
Video 1
Title: Fireside chat with Jason Calacanis & Brad Gerstner hosted by Mubadala’s Ibrahim Ajami
View Count: 100551
Summary: The conversation between Jason Calacanis and Brad Gerstner covers various topics related to the technology industry, including their experiences in Silicon Valley, their respective companies and investments, and the challenges of building successful companies. They discuss the current state of the industry, the importance of innovation and perseverance, and the need for founders to stay focused on their products and customers. They also touch on the impact of AI, the potential of the metaverse, an


> Finished chain.


In [27]:
response

'Topic|Theme|Summary\nAI in the Music Industry|Exploring the impact of AI on music creation and production|Discuss the role of AI in revolutionizing the music industry, including AI-powered music creation tools and platforms. Explore the benefits and limitations of using AI in music production, and the potential for AI to enhance creativity and collaboration among musicians. Highlight successful examples of AI-generated music and discuss the ethical considerations and copyright issues surrounding AI-generated music.\nThe Future of Podcasting|Examining the potential of AI in podcasting|Discuss the role of AI in podcasting, including AI-powered transcription and editing tools. Explore how AI can improve podcast discovery and recommendation algorithms, and enhance the listener experience through personalized content. Highlight the benefits of using AI in podcast production, such as automated editing and audio enhancement. Discuss the potential challenges and ethical considerations of AI i

In [28]:
import pandas as pd
from io import StringIO

csv_file = StringIO(response)
# Read the CSV data and create a DataFrame
df = pd.read_csv(csv_file, sep="|")
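
Since the model's reply is free text, a row with too few or too many `|`-separated fields is possible even though the prompt asks for exactly three columns. The following sketch checks the header and skips malformed rows before any DataFrame is built. It uses only the standard library; `parse_ideas` and `EXPECTED_COLUMNS` are hypothetical names introduced for illustration.

```python
import csv
from io import StringIO

EXPECTED_COLUMNS = ["Topic", "Theme", "Summary"]


def parse_ideas(raw):
    """Parse a |-separated reply into a list of dicts, skipping malformed rows."""
    reader = csv.reader(StringIO(raw.strip()), delimiter="|")
    header = [cell.strip() for cell in next(reader, [])]
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected header: {header}")

    rows = []
    for fields in reader:
        if len(fields) != len(EXPECTED_COLUMNS):
            continue  # skip blank lines and rows with the wrong number of fields
        rows.append(dict(zip(EXPECTED_COLUMNS, (f.strip() for f in fields))))
    return rows


sample = "Topic|Theme|Summary\nA|B|C\nbroken row\nD|E|F|G\n"
print(parse_ideas(sample))
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` in place of the `read_csv` call.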

In [29]:
df

| | Topic | Theme | Summary |
|---|---|---|---|
| 0 | AI in the Music Industry | Exploring the impact of AI on music creation a... | Discuss the role of AI in revolutionizing the ... |
| 1 | The Future of Podcasting | Examining the potential of AI in podcasting | Discuss the role of AI in podcasting, includin... |
| 2 | The Rise of Voice Assistants | Exploring the evolution and impact of voice as... | Discuss the history and evolution of voice ass... |
| 3 | The Metaverse: A New Frontier | Examining the concept and potential of the met... | Discuss the concept of the metaverse, a virtua... |
| 4 | The Future of Venture Capital | Analyzing the evolving landscape of venture ca... | Discuss the current state of venture capital, ... |
| 5 | The Power of Networking | Exploring the importance of networking in the ... | Discuss the role of networking in building suc... |
| 6 | The Future of Funding in Emerging Markets | Examining the potential for tech ecosystems in... | Discuss the opportunities and challenges of de... |
| 7 | The Impact of AI in Real Estate | Exploring the role of AI in the real estate in... | Discuss the applications of AI in the real est... |
| 8 | The Future of Technology in Education | Examining the role of technology in transformi... | Discuss the potential of technology in improvi... |
| 9 | The Evolution of Streaming Platforms | Analyzing the changing landscape of streaming ... | Discuss the evolution of streaming platforms, ... |
