# AI in Content Generation: YouTube Ideas from Historical Content Performance


## Channel: This Week in Startups

The code below defines a `Summarizer` class that uses the LangChain library to summarize YouTube videos. It has three main parts:

Loading data: The `_load_data()` method uses LangChain's `YoutubeLoader` class to fetch a YouTube video's transcript and metadata.

Chunking data: The `create_chunks()` method calls `_load_data()`, splits the transcript into smaller text chunks with the `RecursiveCharacterTextSplitter` class, and stores the video metadata. Chunking keeps each piece small enough to fit within the model's context window.

Summarizing: The `summarize()` method builds a summarization chain with LangChain's `load_summarize_chain()` function, using the `map_reduce` chain type. It then runs the chain asynchronously on the chunks through the `_chain_run()` method and returns the summary together with the video metadata.

Overall, this code provides a convenient way to summarize YouTube videos by leveraging LangChain's text processing capabilities.
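To make the role of `chunk_size` and `chunk_overlap` concrete, here is a minimal, dependency-free sketch of fixed-size chunking with overlap. It is a simplified illustration only: `RecursiveCharacterTextSplitter` additionally tries to split on natural separators such as paragraphs and sentences, which this sketch does not attempt. The function name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    stand-in, not the algorithm used by RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


transcript = "abcdefghijklmnopqrstuvwxyz"  # stand-in for a video transcript
chunks = chunk_text(transcript, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk repeats the last two characters of the previous one, which is the same idea as the 20-character overlap used in the notebook.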


In [None]:
import asyncio
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import YoutubeLoader

import pandas as pd


class Summarizer:
    def __init__(self, url, llm):
        self.url = url
        self.llm = llm
        self.doc_chunks = []
        self.metadata = []

    def _load_data(self):
        loader = YoutubeLoader.from_youtube_url(self.url, add_video_info=True)

        return loader.load()

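    def create_chunks(self):
        # Split the transcript into chunks of up to 3500 characters, with a
        # small overlap so that context is not lost at chunk boundaries.
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=3500,
            chunk_overlap=20,
            length_function=len,
        )
        docs = self._load_data()
        self.doc_chunks = text_splitter.create_documents(
            [doc.page_content for doc in docs]
        )
        # Keep the video metadata (title, view count, ...) of the first document.
        self.metadata = docs[0].metadata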

    async def _chain_run(self, chain, docs):
        return await chain.arun(docs)

    async def summarize(self):
        summarizer_chain = load_summarize_chain(llm=self.llm, chain_type="map_reduce")
        tasks = [self._chain_run(summarizer_chain, self.doc_chunks)]
        summary = await asyncio.gather(*tasks)

        return {"summary": summary[0], "metadata": self.metadata}
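The `map_reduce` chain type can be pictured as two passes: each chunk is summarized on its own, and the partial summaries are then summarized together. The sketch below illustrates that idea without LangChain or an LLM. `stub_llm` is a hypothetical stand-in that keeps the first two words of every line, and the real chain adds prompts and further handling that are not modelled here.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    # Map step: every chunk is summarized independently.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: the partial summaries are merged and summarized once more.
    return summarize("\n".join(partial_summaries))


def stub_llm(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep two words per line."""
    return " / ".join(" ".join(line.split()[:2]) for line in text.splitlines())


demo_chunks = ["writers strike over pay", "streaming changes revenue models"]
demo_summary = map_reduce_summarize(demo_chunks, stub_llm)
print(demo_summary)  # writers strike / streaming changes
```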

In [None]:
import concurrent.futures
import asyncio
import time


def process_summary(url):
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
    summarizer = Summarizer(url, llm)
    summarizer.create_chunks()

    return asyncio.run(summarizer.summarize())


def pool_executor(urls):
    results = []

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit each URL to the executor
        futures = [executor.submit(process_summary, url) for url in urls]

        # Wait for all futures to complete
        for future in concurrent.futures.as_completed(futures):
            try:
                result = future.result()
                results.append(result)
            except Exception as e:
                print(f"Error processing URL: {e}")

    return results
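Note that `as_completed` yields futures in the order they finish, so the list returned by `pool_executor` is not guaranteed to match the order of `urls`. If the order matters, one option is to remember which URL each future belongs to, as in the sketch below. `run_in_order` and `fake_summary` are hypothetical names, and `fake_summary` replaces `process_summary` so the example runs without network access or an API key.

```python
import concurrent.futures


def fake_summary(url: str) -> dict:
    """Hypothetical stand-in for process_summary; no network or LLM call."""
    return {"summary": f"summary of {url}", "metadata": {"source": url}}


def run_in_order(urls, worker):
    """Run worker on every URL in a thread pool; return results in input order."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Remember which URL each future was submitted for.
        future_to_url = {executor.submit(worker, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                print(f"Error processing {url}: {exc}")
    # Failed URLs are skipped; the rest keep the order of the input list.
    return [results[url] for url in urls if url in results]


ordered = run_in_order(["a", "b", "c"], fake_summary)
print([item["metadata"]["source"] for item in ordered])  # ['a', 'b', 'c']
```

Keeping the URL next to each future also makes the error message name the video that failed.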

In [None]:
high_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=_8bMMqy37y8&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=1SWEF-lyW28&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=oc5tHbEK0IQ&ab_channel=ThisWeekinStartups",
    "https://www.youtube.com/watch?v=jrd4snFDSVA&ab_channel=ThisWeekinStartups",
]

low_engagement_vids_urls = [
    "https://www.youtube.com/watch?v=UeIV4KcSUlk",
    "https://www.youtube.com/watch?v=hNcLMN_bZCM",
    "https://www.youtube.com/watch?v=ANd4jPLnMAU",
    "https://www.youtube.com/watch?v=J8YnxrGEzT4",
]

high_engagement_summaries = pool_executor(high_engagement_vids_urls)

low_engagement_summaries = pool_executor(low_engagement_vids_urls)

In [None]:
high_eng_df = pd.DataFrame.from_records(high_engagement_summaries)
high_eng_df

In [None]:
low_eng_df = pd.DataFrame.from_records(low_engagement_summaries)
low_eng_df

In [None]:
def format_summaries(data):
    formatted_strings = []
    for i, obj in enumerate(data, start=1):
        summary = obj["summary"]
        views = obj["metadata"]["view_count"]
        title = obj["metadata"]["title"].split("|")[0].strip()

        formatted_string = (
            f"Video {i}\nTitle: {title}\nView Count: {views}\nSummary: {summary}\n"
        )
        formatted_strings.append(formatted_string)

    result = "\n".join(formatted_strings)
    return result


high_eng_prompt = format_summaries(high_engagement_summaries)
low_eng_prompt = format_summaries(low_engagement_summaries)

In [None]:
print(high_eng_prompt)

In [None]:
print(low_eng_prompt)

In [9]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

prompt_template = """ You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.\
Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,\
come up with new ideas for next videos.\
    
High_Engagement_Videos:
{high_engagement_videos}
    
Low_Engagement_Videos:
{low_engagement_videos}

Given the above High_Engagement_Videos and Low_Engagement_Videos, generate new ideas and themes.
New ideas should be related to the High_Engagement_Videos by keeping the titles and summaries of the episodes in context and must be based on\
the common patterns between the High_Engagement_Videos and the guests in those episodes.\
The new videos should not have any content from Low_Engagement_Videos.\
Make sure to not include any speaker name in your suggested video topics or themes.
Make sure to return at least 10 new ideas. Your response must be a csv file which contains the following columns:\
Topic, Theme, Summary. Summary should contain points to talk about on the show and must be 100 words at max. Use | as a separator\
and do not append any extra line in your csv response. Each row must have proper data and columns in each row must be three.
If you don't know the answer, just say "Hmm, I'm not sure."\
Don't try to make up an answer. 
"""
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.5)
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["high_engagement_videos", "low_engagement_videos"],
)
ideas_chain = LLMChain(llm=llm, prompt=PROMPT, verbose=True)

In [10]:
response = ideas_chain.run(
    {
        "high_engagement_videos": high_eng_prompt,
        "low_engagement_videos": low_eng_prompt,
    }
)



> Entering new LLMChain chain...
Prompt after formatting:
 You are helpful AI assistant that helps to increase the engagement of youtube videos by analyzing the scripts of old videos.Looking at the given videos below in High_engagement_Videos and Low_engagement_videos sections,come up with new ideas for next videos.    
High_Engagement_Videos:
Video 1
Title: ChatGPT vs Hollywood writers and the WGA strike with Lon Harris
View Count: 137823
Summary: The summary of the given texts is that there are discussions and suggestions regarding various topics such as the need for stable employment and fair compensation for writers in the entertainment industry, the impact of streaming platforms on the industry, the potential use of AI in writing, the rebranding of HBO Max, the development of AI bots in podcasting, the effects of TikTok on mental health, and concerns about excessive smartphone use. Additionally, there are mentions of specific shows, movies, and products thro


> Finished chain.


In [12]:
response

'Topic,Theme,Summary\n1,The Future of AI in Entertainment,Explore the potential impact of AI on the entertainment industry and its various aspects such as writing, music generation, and podcasting. Discuss the challenges and opportunities AI presents for writers, musicians, and podcast hosts. Highlight the benefits and limitations of using AI tools in creative processes. Analyze the implications of AI on employment and job opportunities in the industry. Examine the role of streaming platforms and social media in shaping the future of entertainment. | Discuss the potential of AI in revolutionizing the entertainment industry and its impact on various creative processes. Analyze the challenges and opportunities AI presents for writers, musicians, and podcast hosts. Explore the implications of AI on employment in the industry and the role of streaming platforms and social media in shaping the future of entertainment.\n2,The Ethics of AI,Delve into the ethical considerations surrounding the

In [13]:
import pandas as pd
from io import StringIO

csv_file = StringIO(response)
# Read the CSV data and create a DataFrame
df = pd.read_csv(csv_file, sep="|")
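The recorded response above begins with a comma-separated header (`Topic,Theme,Summary`) even though the prompt asks for `|` as the separator, so it may be worth validating the structure before building a DataFrame. The following is a minimal sketch using only the standard library. `parse_ideas` and the sample string are hypothetical and are not part of the original notebook.

```python
import csv
from io import StringIO


def parse_ideas(response: str, delimiter: str = "|", n_columns: int = 3):
    """Parse a delimited model response and check that every row has n_columns."""
    reader = csv.reader(StringIO(response.strip()), delimiter=delimiter)
    # Drop blank lines and trim whitespace around each cell.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        raise ValueError("empty response")
    header, body = rows[0], rows[1:]
    bad_rows = [row for row in body if len(row) != n_columns]
    if len(header) != n_columns or bad_rows:
        raise ValueError(
            f"expected {n_columns} columns separated by {delimiter!r}; "
            f"header has {len(header)}, {len(bad_rows)} malformed row(s)"
        )
    return [dict(zip(header, row)) for row in body]


# Hypothetical, well-formed sample; not an actual model response.
sample = (
    "Topic|Theme|Summary\n"
    "AI and jobs|Future of work|Discuss how automation changes hiring.\n"
)
ideas = parse_ideas(sample)
print(ideas[0]["Topic"])  # AI and jobs
```

A response that fails this check could be retried or inspected by hand instead of being passed on with misaligned columns.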