In [None]:
!pip install langchain neo4j openai tiktoken pytube youtube_transcript_api env

In [None]:
import os

from pytube import Playlist
from langchain.text_splitter import TokenTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Neo4jVector
from langchain.document_loaders import YoutubeLoader

from langchain.chat_models import ChatOpenAI
from langchain.memory import ChatMessageHistory, ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain

## Using LangChain in combination with Neo4j to process YouTube playlists and perform Q&A flow

### Motivation
In a world of lengthy YouTube playlists, traditional learning can feel time-consuming and dull. Our motivation is to transform this process by making it dynamic and engaging. Rather than passively consuming content, I believe that sparking conversations can make learning more enjoyable and efficient.

### Goal
Our goal is to revolutionize how people interact with YouTube playlists. Users will actively engage in dynamic conversations inspired by the playlist content. We'll extract valuable information from video captions, process it, and store it in the Neo4j vector database. The conversational chain serves as a guide that leads users through a dialogue rooted in the playlist content. My mission is to provide an interactive and personalized educational dialogue, where users actively shape their learning journey.

#### Technologies used
Embarking on the exciting journey of conversational AI requires a firm grasp of the technological foundations that meet the needs of our mission. For our purpose, we combine two technologies: LangChain, an open-source framework that simplifies the orchestration of Large Language Models (LLMs), and Neo4j, a graph database built for efficient traversal of nodes and relationships.

LangChain serves as the linchpin in our quest for seamless interaction with LLMs. Its open-source nature lets developers easily build on the capabilities of these language models. In our demo application, LangChain provides the interface to the LLM and the building blocks for the conversational chain.

At the heart of this interaction lies Neo4j, a graph database meticulously designed to unravel the complexities of interconnected nodes and relationships. This database isn't used just for storage; it's a dynamic source of truth that we integrate into our conversational framework.

Picture this: a user starts the conversation with a query. The query is converted into a vector, and that vector is used to search the Neo4j database for the most relevant caption chunks. Those chunks are handed to the Large Language Model together with the question. The result? A fusion of stored knowledge and natural language understanding, with a response that is grounded in the playlist content and connected to the context of the user's inquiry.

Recognizing the importance of user experience, we introduce a conversational memory chain. Imagine a conversation where every question asked and every answer given becomes part of an evolving dialogue. This approach ensures that the interaction remains clear and coherent. By feeding all past questions and answers into the conversational memory chain alongside the latest query, we create a continuous narrative thread. The result? A more engaging, relevant, and user-centric conversation that evolves intelligently with each interaction.

### What I will cover in this tutorial
1. Processing of YouTube playlists; reading captions
2. Splitting each video's captions into documents
3. Feeding documents into Neo4j database
4. Constructing conversational retrieval chain
5. Performing queries

### Processing of YouTube playlists
Use the `Playlist` class from `pytube` to retrieve the URLs of all videos in the given playlist and derive their video IDs. For every video, extract the caption document using `YoutubeLoader`, then feed each document into a text splitter. It is important to clean and preprocess the data before feeding it into the text splitter. In our case, we only considered English captions. The right chunk size varies and should be set based on the nature of the documents. Smaller chunks, up to 256 tokens, capture information more granularly. Larger chunks provide the LLM with more context from each document. I decided to use a chunk size of 512 because context is more important here: larger chunks help preserve contextual connections across multiple videos.
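
To make the effect of chunk size and overlap concrete, here is a minimal sketch of sliding-window chunking. It is not the `TokenTextSplitter` implementation: the helper name `split_into_chunks` is hypothetical, and plain placeholder strings stand in for real tokenizer output (an assumption made to keep the example dependency-free).

```python
def split_into_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens, so text that sits
    on a chunk boundary appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
        start += step
    return chunks


# Placeholder "tokens"; a real splitter would use tokenizer output instead.
tokens = [f"w{i}" for i in range(10)]
chunks = split_into_chunks(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk starts with the last token of the previous one. The same idea applies at a chunk size of 512: a larger window gives the LLM more surrounding context per chunk, at the cost of less granular retrieval.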

In [None]:
# Process all videos from the playlist
playlist_url = "https://www.youtube.com/watch?v=1CqZo7nP8yQ&list=PL9Hl4pk2FsvUu4hzyhWed8Avu5nSUXYrb"
playlist = Playlist(playlist_url)
video_ids = [_v.split('v=')[-1] for _v in playlist.video_urls]
print(f"Processing {len(video_ids)} videos.")

In [None]:
# Setup username, passwords and api keys
# from env import setup_env
# setup_env()

In [None]:
# Read the captions of every video into documents (they are split into chunks in the next cell)
documents = []
for video_id in video_ids:
    try:
        loader = YoutubeLoader(video_id=video_id)
        documents.append(loader.load()[0])
    except Exception:  # e.g. the video has no English captions
        pass
print(f"Read captions for {len(documents)} videos.")

In [None]:
# Init text splitter with chunk size 512 (https://www.pinecone.io/learn/chunking-strategies/)
text_splitter = TokenTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=20)
# Split documents
splitted_documents = text_splitter.split_documents(documents)
print(f"{len(splitted_documents)} documents ready to be processed.")

### Feeding documents into Neo4j database
As mentioned earlier, all the documents will be stored inside the Neo4j database. In return, we obtain a vector index that will later be used in conjunction with LangChain. Creating a Neo4j database is fairly straightforward and can be done without any additional knowledge of how the database operates. Since we have already prepared and split all our documents, we use the `from_documents` function, which accepts a `List[Document]`. To simplify this process even further, we could also use the `from_texts` function. However, in that case we would pass plain strings and lose control over the documents. Therefore, I believe that `from_texts` should only be used when we quickly want to demonstrate an application. Setting `search_type` to `hybrid` allows us to search over both keywords and vectors.
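
To illustrate why searching over both keywords and vectors can help, here is a toy sketch in plain Python. It does not reproduce how Neo4j or LangChain rank results: the helper names, the two-dimensional vectors, and the choice to keep the higher of the two scores are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of the query words that also occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_rank(query, query_vector, chunks):
    """Order chunk texts by the higher of their vector and keyword scores."""
    scored = []
    for chunk in chunks:
        vector_score = cosine_similarity(query_vector, chunk["vector"])
        score = max(vector_score, keyword_score(query, chunk["text"]))
        scored.append((score, chunk["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Made-up chunks with made-up 2-dimensional "embeddings".
chunks = [
    {"text": "graph databases store nodes and relationships", "vector": [0.9, 0.1]},
    {"text": "the GenAI stack bundles several tools", "vector": [0.1, 0.9]},
    {"text": "cooking pasta takes ten minutes", "vector": [0.0, 1.0]},
]
ranking = hybrid_rank("GenAI stack", [1.0, 0.0], chunks)
print(ranking[0])
```

In this toy data the query vector is closest to the first chunk, so a vector-only search would rank it first. The keyword score lifts the chunk that literally mentions "GenAI stack" to the top, which is the kind of exact-term match a hybrid search is meant to catch.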

In [None]:
# Construct the vector index
neo4j_vector = Neo4jVector.from_documents(
    embedding=OpenAIEmbeddings(),
    documents=splitted_documents,
    url=os.environ['NEO4J_URI'],
    username=os.environ['NEO4J_USERNAME'],
    password=os.environ['NEO4J_PASSWORD'],
    search_type="hybrid"
)

![Graph1](youtube_playlist_1.png "Graph1")

### Constructing conversational retrieval chain
A conversational chain will be used to perform the Q&A flow while leveraging previous user inputs and LLM outputs. We first construct the memory object. By setting `k` to `3`, we tell the memory to keep only the last 3 exchanges (a question plus its answer). These exchanges are passed to the LLM when performing queries. Increasing this value gives the LLM more context during the Q&A flow. As the retriever, we use the Neo4j vector instance that we created before. We also set `max_tokens_limit` to ensure that the retrieved documents passed to the LLM stay below the token limit.
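
The windowing idea behind the memory object can be sketched without LangChain. The class below is a hypothetical stand-in, not the `ConversationBufferWindowMemory` implementation; it assumes that `k` counts question/answer exchanges and that older exchanges are simply dropped.

```python
from collections import deque


class WindowMemory:
    """Keeps only the k most recent question/answer exchanges."""

    def __init__(self, k):
        # A deque with maxlen discards the oldest entry when it is full.
        self.exchanges = deque(maxlen=k)

    def save(self, question, answer):
        self.exchanges.append((question, answer))

    def history(self):
        """Return the remembered exchanges as a flat list of messages."""
        messages = []
        for question, answer in self.exchanges:
            messages.append(("human", question))
            messages.append(("ai", answer))
        return messages


memory = WindowMemory(k=3)
for i in range(1, 6):
    memory.save(f"question {i}", f"answer {i}")

# Only exchanges 3, 4 and 5 remain; exchanges 1 and 2 were dropped.
for role, text in memory.history():
    print(role, text)
```

A small window keeps the prompt short and cheap, but a follow-up question that refers to something said more than `k` exchanges ago can no longer be resolved from memory.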

In [None]:
# Prepare Q&A object
chat_mem_history = ChatMessageHistory(session_id="1")
mem = ConversationBufferWindowMemory(k=3, memory_key="chat_history", chat_memory=chat_mem_history, return_messages=True)
q = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0.2),
    memory=mem,
    retriever=neo4j_vector.as_retriever(),
    verbose=True,
    max_tokens_limit=4000
)

![Graph2](youtube_playlist_2.png "Graph2")

In [None]:
# Perform Q&A flow - first question
response = q.run('What can you tell me about the GenAI stack?')
response

In [None]:
# Follow up question that requires previous answers (memory)
response = q.run('Who talked about it?')
response