Goal: give the model a document (here, a YouTube video transcript) and let it extract the topics discussed.

In [1]:
!pip install openai langchain 

Collecting openai
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0



[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [10]:
# LangChain basics
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain



# Langchain Loaders:
from langchain.document_loaders import YoutubeLoader

# Vector Store and retrievals
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
#import pinecone

# Chat Prompt templates for dynamic values
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)

# Supporting libraries
import os
from dotenv import load_dotenv

load_dotenv()

True
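
`load_dotenv()` returning `True` indicates that variables were loaded from a `.env` file, not that the OpenAI key was among them. A minimal sketch of an explicit check, assuming the key lives in `OPENAI_API_KEY` (the variable `ChatOpenAI` reads by default); the helper name `has_openai_key` is hypothetical:

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real environment.
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; ChatOpenAI calls will fail until it is.")
```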

In [11]:

llm3 = ChatOpenAI(
    temperature=0,
    model_name="gpt-3.5-turbo-0613",
    request_timeout=180,
)

#llm3= OpenAI(model_name="gpt-3.5-turbo-0613")

In [9]:
!pip install youtube-transcript-api

Collecting youtube-transcript-api
  Using cached youtube_transcript_api-0.6.1-py3-none-any.whl (24 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-0.6.1





In [14]:
!pip install tiktoken

Collecting tiktoken
  Using cached tiktoken-0.4.0-cp310-cp310-win_amd64.whl (635 kB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0





In [15]:
youtube_loader = YoutubeLoader.from_youtube_url("https://youtu.be/5p248yoa3oE?si=TATgA2GMtcQ_MjEA")
transcript = youtube_loader.load()

text_splitter = RecursiveCharacterTextSplitter(separators=["\n", " "], chunk_size=10000, chunk_overlap=2200)
docs = text_splitter.split_documents(transcript)
print(f"You have {len(docs)} docs. First doc is {llm3.get_num_tokens(docs[0].page_content)} tokens")


You have 2 docs. First doc is 1956 tokens
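
Note that `chunk_size` and `chunk_overlap` are counted in characters here (assuming the splitter's default length function), which is why a 10,000-character chunk comes out at roughly 2,000 tokens. The sketch below illustrates only the overlap idea with a fixed sliding window; it is not LangChain's algorithm, which also splits on the given separators. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(len(chunk), chunk)
```

The last 4 characters of each chunk reappear at the start of the next one, so a topic that straddles a boundary is still seen whole by at least one map call.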


In [22]:

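# NOTE: these examples are plain Python comments, so they are not part of the prompt sent to the model.
# % START OF EXAMPLES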
# - Sam's Elisabeth Murdoch Story: Sam got a call from Elizabeth Murdoch when he had just launched The Hustle. She wanted to generate video content.
# - Shaan's Rupert Murdoch Story: When Shaan was running Blab he was invited to an event organized by Rupert Murdoch during CES in Las Vegas.
# - Revenge Against The Spam Calls: A couple of businesses focused on protecting consumers: RoboCall, TrueCaller, DoNotPay, FitIt
# - Wildcard CEOs vs. Prudent CEOs: However, Munger likes to surround himself with prudent CEO's and says he would never hire Musk.
# - Chess Business: Priyav, a college student, expressed his doubts on the MFM Facebook group about his Chess training business, mychesstutor.com, making $12.5K MRR with 90 enrolled.
# - Restaurant Refiller: An MFM Facebook group member commented on how they pay AirMark $1,000/month for toilet paper and toilet cover refills for their restaurant. Shaan sees an opportunity here for anyone wanting to compete against AirMark.
# - Collecting: Shaan shared an idea to build a mobile only marketplace for a collectors' category; similar to what StockX does for premium sneakers.
# % END OF EXAMPLES

template="""
You are a helpful assistant that helps retrieve topics talked about in a youtube video transcript
- Your goal is to extract the topic names and brief 1-sentence description of the topic
- Topics include:
- AI news
- GPT Models
- Google Models
- LLMs
- llama Models
- AI tutorials
- OpenAI
- AI for business 
- AI for education
- AI for medicine 
- AI for art and music
- Deep Learning
- NLP
- Machine Learning
- Data science
- Opportunities in AI
- AI frameworks
- Langchain

- Provide a brief description of the topics after the topic name. Example: 'Topic: Brief Description'
- Use the same words and terminology that is said in the youtube video
- Ignore topics on policy and regulations.
- Do not respond with anything outside of the podcast. If you don't see any topics, say, 'No Topics'
- Do not respond with numbers, just bullet points
- Only pull topics from the transcript. Do not use the examples
- Make your titles descriptive but concise. Example: 'Shaan's Experience at Twitch' should be 'Shaan's Interesting Projects At Twitch'
- A topic should be substantial, more than just a one-off comment

"""
system_message_prompt_map = SystemMessagePromptTemplate.from_template(template)

human_template="Transcript: {text}" # Simply just pass the text as a human message
human_message_prompt_map = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt_map = ChatPromptTemplate.from_messages(messages=[system_message_prompt_map, human_message_prompt_map])

In [23]:
# % START OF EXAMPLES
# - Sam's Elisabeth Murdoch Story: Sam got a call from Elizabeth Murdoch when he had just launched The Hustle. She wanted to generate video content.
# - Shaan's Rupert Murdoch Story: When Shaan was running Blab he was invited to an event organized by Rupert Murdoch during CES in Las Vegas.
# % END OF EXAMPLES

template="""
You are a helpful assistant that helps retrieve topics talked about in a podcast transcript
- You will be given a series of bulleted topics that were found
- Your goal is to extract the topic names and a brief 1-sentence description of each topic
- Deduplicate any bullet points you see
- Only pull topics from the transcript. Do not use the examples.
"""
# Use *_combine names here so the map-step prompt objects are not overwritten
system_message_prompt_combine = SystemMessagePromptTemplate.from_template(template)

human_template="Transcript: {text}" # Simply just pass the text as a human message
human_message_prompt_combine = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt_combine = ChatPromptTemplate.from_messages(messages=[system_message_prompt_combine, human_message_prompt_combine])



In [24]:
chain = load_summarize_chain(llm3,
                             chain_type="map_reduce",
                             map_prompt=chat_prompt_map,
                             combine_prompt=chat_prompt_combine,
                             verbose=True
                            )
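
Conceptually, a `map_reduce` chain applies the map prompt to every chunk and then applies the combine prompt once to the concatenated map outputs. A rough sketch of that control flow, with the two LLM calls replaced by hypothetical stand-in functions (`fake_map`, `fake_combine`); the real chain may also add intermediate collapse steps when the combined text is too long:

```python
def map_reduce_topics(chunks, map_fn, combine_fn):
    """Run map_fn on every chunk, then combine_fn once on the joined results."""
    partial_results = [map_fn(chunk) for chunk in chunks]
    return combine_fn("\n".join(partial_results))


def fake_map(chunk):
    # Stand-in for the map prompt: treat every capitalised word as a topic.
    return "\n".join(f"- {word}" for word in chunk.split() if word[0].isupper())


def fake_combine(bullets):
    # Stand-in for the combine prompt: deduplicate bullets, keeping first-seen order.
    unique = dict.fromkeys(line for line in bullets.splitlines() if line)
    return "\n".join(unique)


result = map_reduce_topics(
    ["we discuss Langchain and OpenAI", "more on OpenAI and Llama"],
    fake_map,
    fake_combine,
)
print(result)
```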

In [25]:
topics_found = chain.run({"input_documents": docs})



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: 
You are a helpful assistant that helps retrieve topics talked about in a youtube video transcript
- Your goal is to extract the topic names and brief 1-sentence description of the topic
- Topics include:
- AI news
- GPT Models
- Google Models
- LLMs
- llama Models
- AI tutorials
- OpenAI
- AI for business 
- AI for education
- AI for medicine 
- AI for art and music
- Deep Learning
- NLP
- Machine Learning
- Data science
- Opportunities in AI
- AI frameworks
- Langchain

- Provide a brief description of the topics after the topic name. Example: 'Topic: Brief Description'
- Use the same words and terminology that is said in the youtube video
- Ignore topics on policy and regulations.
- Do not respond with anything outside of the podcast. If you don't see any topics, say, 'No Topics'
- Do not respond with numbers, just bullet points
- Only pu

In [26]:
print(topics_found)

Topics:
1. AI governance: The need for global governance and collaboration to address the challenges and threats posed by AI.
2. Implications of AI: The potential impact of AI on society, economy, democracy, and politics.
3. AI applications: The use of AI in various fields such as medicine, government, and personalized services.
4. AI advancements: The current state of AI technology and its potential for further development.
5. Policy and regulation: The need for proactive policy-making and regulation to address the ethical and societal implications of AI.
6. AI and job displacement: The concern over the automation of jobs and the potential loss of employment due to AI.
7. AI and misinformation: The risks associated with AI-generated propaganda and fake news.
8. AI and privacy: The need to address privacy concerns and protect personal data in the age of AI.
9. AI and social impact: The potential positive and negative effects of AI on society and communities.
10. AI and education: The r
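
The model returned a numbered list even though the prompt asked for bullet points, so any downstream parsing should tolerate both. A minimal regex-based sketch, assuming each topic sits on one line in the form `Name: Description`; the helper `parse_topics` is hypothetical and the sample lines are adapted from the output above:

```python
import re

# A leading "1." style number or a bullet character, then "Name: Description".
LINE_RE = re.compile(r"^\s*(?:\d+\.|[-*•])\s*(?P<name>[^:]+):\s*(?P<description>.+)$")


def parse_topics(text):
    """Return a list of {'topic_name', 'description'} dicts, skipping lines that do not match."""
    topics = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            topics.append(
                {
                    "topic_name": match["name"].strip(),
                    "description": match["description"].strip(),
                }
            )
    return topics


sample = """Topics:
1. AI governance: The need for global governance and collaboration to address the challenges and threats posed by AI.
- AI and privacy: The need to address privacy concerns and protect personal data in the age of AI.
"""
parsed = parse_topics(sample)
print(parsed)
```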

In [31]:
!pip install kor

Collecting kor
  Downloading kor-0.13.0-py3-none-any.whl (29 kB)
Collecting langchain>=0.0.205
  Downloading langchain-0.0.283-py3-none-any.whl (1.6 MB)


In [12]:

from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text, Number


In [13]:
schema = {
    "properties": {
        # The title of the topic
        "topic_name": {
            "type": "string",
            "description" : "The title of the topic listed"
        },
        # The description
        "description": {
            "type": "string",
            "description" : "The description of the topic listed"
        },
        "tag": {
            "type": "string",
            "description" : "The type of content being described",
            "enum" : ['AI Models & LLMs', 'AI job opportunities', 'AI frameworks', 'Deep Learning', 'Machine Learning', 'Data science']
        }
    },
    "required": ["topic_name", "description"],
}
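
The dict above is JSON-Schema-shaped (`properties`, `enum`, `required`). As a sketch of what those constraints mean in practice, here is a hand-rolled check for a single extracted record. It assumes the required keys are `topic_name` and `description` (based on the property names) and reuses the tag list from the `enum`; `validate_topic` is a hypothetical helper, not part of kor or LangChain.

```python
ALLOWED_TAGS = [
    "AI Models & LLMs",
    "AI job opportunities",
    "AI frameworks",
    "Deep Learning",
    "Machine Learning",
    "Data science",
]


def validate_topic(record, required=("topic_name", "description"), allowed_tags=ALLOWED_TAGS):
    """Return a list of human-readable problems; an empty list means the record is valid."""
    errors = []
    for key in required:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {key}")
    # "tag" is optional, but when present it must be one of the enum values.
    tag = record.get("tag")
    if tag is not None and tag not in allowed_tags:
        errors.append(f"tag not in enum: {tag}")
    return errors


print(validate_topic({"topic_name": "LLMs", "description": "Large language models", "tag": "Deep Learning"}))
print(validate_topic({"description": "No title given", "tag": "Robotics"}))
```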

In [None]:
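# kor needs an Object node (not the JSON-schema-style dict above) to build the extraction chain.
schema = Object(
    id="topic",
    description="A topic discussed in the video, with its title, description and tag",
    attributes=[
        Text(
            id="topic_name",
            description="The title of the topic listed",
        ),
        Text(
            id="description",
            description="The description of the topic listed",
        ),
        Text(
            id="tag",
            description="The type of content being described, e.g. 'AI frameworks' or 'Machine Learning'",
        ),
    ],
    many=True,
)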

In [15]:
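# kor expects the LLM first and the schema node second. Swapping them,
# i.e. create_extraction_chain(schema, llm3), raises the ValueError shown below.
# The node must also be a kor Object, not the plain dict from In [13].
chain = create_extraction_chain(llm3, schema)

ValueError: node must be an Object got <class 'langchain.chat_models.openai.ChatOpenAI'>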

In [None]:
topics_structured = chain.run(topics_found)

In [None]:
topics_structured
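
The cell above was never run, so the actual result is not shown. As a sketch only: assuming the chain returns a dict whose `data` key maps the schema node's id to the extracted records (an assumption about kor's return shape, not verified in this notebook), the records could be pulled out like this. `extract_records` and the sample dict are hypothetical.

```python
def extract_records(result, node_id):
    """Return the list of records stored under result['data'][node_id], or [] if absent."""
    data = result.get("data") or {}
    records = data.get(node_id, [])
    # Normalise a single record to a one-element list.
    return records if isinstance(records, list) else [records]


# Hand-built example of the assumed shape; not real model output.
example_result = {
    "data": {"topic": [{"topic_name": "AI governance", "description": "Global collaboration on AI."}]},
    "errors": [],
}
for record in extract_records(example_result, "topic"):
    print(record["topic_name"], "->", record["description"])
```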