In [None]:
!pip install langchain neo4j openai tiktoken pytube youtube_transcript_api env langchain_community

In [None]:
import os

from pytube import Playlist
from langchain.text_splitter import TokenTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Neo4jVector
from langchain.document_loaders import YoutubeLoader

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_message_histories.neo4j import Neo4jChatMessageHistory
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

## Using LangChain in combination with Neo4j to process YouTube playlists and perform Q&A flow

### Motivation
Lengthy YouTube playlists can make learning feel time-consuming and dull. Our motivation is to make this process dynamic and engaging. Rather than passively consuming content, we believe that holding a conversation about it makes learning more enjoyable and efficient.

### Goal
Our goal is to revolutionise how people interact with YouTube playlists. Instead of only watching, users engage in conversations grounded in the playlist content. We extract the video captions, process them, and store them in the Neo4j vector database. A conversational chain then serves as a guide that leads users through a dialogue rooted in the playlist content. Our mission is to provide an interactive and personalised educational dialogue in which users actively shape their learning journey.

#### Technologies used
Building a conversational AI application requires a firm grasp of the technologies it rests on. For our purpose we combine two technologies: LangChain, an open-source framework that simplifies the orchestration of Large Language Models (LLMs), and Neo4j, a robust graph database built for efficient traversal of nodes and relationships.

LangChain serves as the backbone for our interaction with LLMs. Its open-source nature lets developers easily build on the capabilities of these large language models. In our demo application, LangChain provides the interface to the LLM and is used to construct the conversational chain.

At the heart of this interaction lies Neo4j, a graph database designed to unravel the complexities of interconnected nodes and relationships.

Picture this: a user starts the conversation with a query. The query is converted into a vector, and that vector is used to search the Neo4j database for the most relevant caption chunks, which the LLM then uses to compose its answer. The result is a response that combines stored knowledge with natural language understanding and stays connected to the context of the user's inquiry.

Recognising the importance of user experience, we introduce a conversational memory chain. Imagine a conversation where every question asked and every answer given becomes part of an evolving dialogue. This approach keeps the interaction clear and coherent. By feeding the most recent questions and answers into the chain alongside the latest query, we create a continuous narrative thread. The result is a more engaging, relevant, and user-centric conversation that builds on each interaction.

### What will I cover in this tutorial
1. Processing of YouTube playlists; reading captions
2. Splitting each video's captions into documents
3. Feeding documents into Neo4j database
4. Constructing conversational retrieval chain
5. Performing queries

### Processing of YouTube playlists
Use the `Playlist` class from `pytube` to retrieve the URLs of all videos in the given playlist and extract the video IDs from them. For every video, `YoutubeLoader` loads the captions as a document, and each document is then fed into a text splitter. It is important to clean and preprocess the data before feeding it into the text splitter. In our case, we only considered videos with English captions. The right chunk size depends on the nature of the documents. Smaller chunks, up to 256 tokens, capture information more granularly, while larger chunks give the LLM more context from each document. We chose a chunk size of 512 tokens because context is more important here: it helps preserve contextual connections across multiple videos.
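
To make the effect of chunk size and overlap more concrete, here is a minimal sketch of sliding-window chunking. It is a simplified illustration, not the `TokenTextSplitter` implementation: the function name `chunk_tokens` is hypothetical, and plain strings stand in for the tokens that a real tokenizer such as `tiktoken` would produce.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=20):
    """Split a list of tokens into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens, so text that falls
    on a chunk boundary is still seen with some surrounding context.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the token list
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy input: ten placeholder tokens instead of real caption tokens
toy_tokens = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(toy_tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a window of 4 and an overlap of 1, each chunk starts with the last token of the previous one. A larger `chunk_size` produces fewer, longer chunks that carry more context per retrieved document.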

In [None]:
# Process all videos from the playlist
playlist_url = "https://www.youtube.com/watch?v=1CqZo7nP8yQ&list=PL9Hl4pk2FsvUu4hzyhWed8Avu5nSUXYrb"
playlist = Playlist(playlist_url)
video_ids = [_v.split('v=')[-1] for _v in playlist.video_urls]
print(f"Processing {len(video_ids)} videos.")

In [None]:
# Read the captions of each video and load them as documents (they are split in the next step)
documents = []
for video_id in video_ids:
    try:
        loader = YoutubeLoader(video_id=video_id)
        documents.append(loader.load()[0])
    except Exception:
        # Skip videos whose captions cannot be loaded (e.g. no English captions)
        continue
print(f"Read captions for {len(documents)} videos.")

In [None]:
# Init text splitter with chunk size 512 (https://www.pinecone.io/learn/chunking-strategies/)
text_splitter = TokenTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=20)
# Split documents
splitted_documents = text_splitter.split_documents(documents)
print(f"{len(splitted_documents)} documents ready to be processed.")

### Feeding documents into Neo4j database
As mentioned earlier, all the documents are stored in the Neo4j database. In return, we obtain a vector index that is later used in conjunction with [LangChain](https://www.langchain.com/). Creating a [Neo4j](https://neo4j.com/) database is fairly straightforward and requires no detailed knowledge of how the database works internally. Since we have already prepared and split all our documents, we use the `from_documents` function, which accepts a `List[Document]`. To simplify this process even further, we could use the `from_texts` function, which takes plain strings. In that case, however, we would lose the `Document` objects and would have to handle their metadata ourselves. Therefore, we believe that `from_texts` is best reserved for quick demonstrations.

Setting `search_type` to `hybrid` allows us to search over both keywords and vectors. Hybrid search combines results from a [full text search](https://neo4j.com/docs/cypher-manual/current/indexes-for-full-text-search/) and a vector similarity search, which typically relies on an approximate nearest-neighbour index such as [HNSW](https://www.pinecone.io/learn/series/faiss/hnsw/). To merge the results, a [Reciprocal Rank Fusion (RRF)](https://safjan.com/Rank-fusion-algorithms-from-simple-to-advanced/) algorithm is used, so the response contains a single result set ranked by RRF. Combining vector search with traditional keyword search gives more nuanced and contextually relevant results, improving the accuracy and depth of insights. This approach is particularly useful in applications such as ours, where each question requires a new answer.
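
To illustrate the idea behind Reciprocal Rank Fusion, here is a minimal, generic sketch. It is not the code that LangChain or Neo4j run internally: the function name, the chunk identifiers, and the two example rankings are made up for illustration, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into a single ranked list.

    Each id receives 1 / (k + rank) from every list it appears in
    (rank starts at 1), and ids are ordered by their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a vector search and a keyword search
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Chunks that appear near the top of both lists end up first, which is why `chunk_b` and `chunk_a` outrank the chunks found by only one of the two searches.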

In [None]:
# Set up environment variables (replace the placeholder values with your own credentials)
os.environ['OPENAI_API_KEY'] = "OPENAI_API_KEY"
os.environ['NEO4J_URI'] = "NEO4J_URI"
os.environ['NEO4J_USERNAME'] = "NEO4J_USERNAME"
os.environ['NEO4J_PASSWORD'] = "NEO4J_PASSWORD"

# Construct vector store
neo4j_vector = Neo4jVector.from_documents(
    embedding=OpenAIEmbeddings(),
    documents=splitted_documents,
    url=os.environ['NEO4J_URI'],
    username=os.environ['NEO4J_USERNAME'],
    password=os.environ['NEO4J_PASSWORD'],
    search_type="hybrid"
)

![Graph1](youtube_playlist_1.png "Graph1")

### Constructing conversational retrieval chain
A conversational chain is used to drive the Q&A flow. Setting `k` to `3` tells the window memory to keep only the last 3 exchanges (question-and-answer pairs) of the conversation. These are passed to the LLM with each query, and increasing `k` gives the LLM more conversational context during the Q&A flow. For retrieval, we use the Neo4j vector instance created earlier. Additionally, we set `max_tokens_limit` to cap the number of tokens from the retrieved documents that are passed to the LLM, so that we stay within the specified limit.
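
The windowing behaviour can be pictured with a small stand-alone sketch. It assumes that the window counts question-and-answer exchanges rather than single messages. The class `WindowMemory` and its methods are hypothetical and only imitate the idea, they are not LangChain's `ConversationBufferWindowMemory`.

```python
class WindowMemory:
    """Stores the whole conversation but exposes only the last k exchanges."""

    def __init__(self, k=3):
        self.k = k
        self.messages = []  # full history as (role, text) tuples

    def save(self, question, answer):
        """Record one exchange: a human question and the AI answer."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def load(self):
        """Return the messages of the last k exchanges (2 messages each)."""
        if self.k <= 0:
            return []
        return self.messages[-2 * self.k:]


memory = WindowMemory(k=3)
for i in range(1, 6):
    memory.save(f"question {i}", f"answer {i}")

history = memory.load()
for role, text in history:
    print(role, text)
```

After five exchanges, only exchanges 3 to 5 are handed to the LLM. Older turns stay stored but no longer influence how a follow-up question is interpreted.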

Understanding the conversational retrieval chain is much easier if we split the process of asking a question and getting back an answer into three parts:
1. Use the chat history and new question to create a "standalone question". This is done so that this question can be passed into the retrieval step to fetch relevant documents. If only the new question was passed in, then relevant context may be lacking. If the whole conversation was passed into retrieval, there may be unnecessary information there that would distract from retrieval.
2. This new question is passed to the retriever and relevant documents are returned.
3. The retrieved documents are passed to an LLM along with either the new question (default behavior) or the original question and chat history to generate a final response.
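
The three steps above can be sketched with plain Python stand-ins. This is only an illustration of the data flow: in the real chain, steps 1 and 3 are LLM calls and step 2 is a search against the Neo4j index. All function names, the toy corpus, and the placeholder answer are assumptions made for this example.

```python
import re


def word_set(text):
    """Lowercased set of words; a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def condense_question(chat_history, question):
    """Step 1: build a standalone question from the history and the new question."""
    if not chat_history:
        return question
    previous_question, _previous_answer = chat_history[-1]
    return f"{question} (context: {previous_question})"


def retrieve(query, corpus, top_n=2):
    """Step 2: return the chunks sharing the most words with the query."""
    query_words = word_set(query)
    ranked = sorted(
        corpus,
        key=lambda chunk: len(query_words & word_set(chunk)),
        reverse=True,
    )
    return ranked[:top_n]


def generate_answer(question, retrieved_chunks):
    """Step 3: combine the question and the retrieved chunks into a response."""
    return f"Answer to '{question}' based on: {' | '.join(retrieved_chunks)}"


# Made-up caption chunks, for illustration only
corpus = [
    "Cypher queries traverse relationships",
    "The GenAI stack combines Neo4j and LangChain",
    "Docker makes the GenAI stack easy to run",
]
chat_history = [("What is the GenAI stack?", "(placeholder answer)")]
follow_up = "Who talked about it?"

standalone = condense_question(chat_history, follow_up)
retrieved = retrieve(standalone, corpus)
final_answer = generate_answer(standalone, retrieved)
print(standalone)
print(retrieved)
```

On its own, the follow-up question shares no words with any chunk, so retrieval has nothing to go on. The standalone question carries "GenAI stack" over from the history, which is what lets step 2 find the relevant chunks.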

In [None]:
# Prepare Q&A object
chat_mem_history = Neo4jChatMessageHistory(session_id="1")
mem = ConversationBufferWindowMemory(
    k=3,
    memory_key="chat_history", 
    chat_memory=chat_mem_history, 
    return_messages=True
)
q = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0.2),
    memory=mem,
    retriever=neo4j_vector.as_retriever(),
    verbose=False,
    max_tokens_limit=4000
)

In [None]:
# Perform Q&A flow - first question
response = q.run('What can you tell me about the GenAI stack?')
response

In [None]:
# Follow up question that requires previous answers (memory)
response = q.run('Who talked about it?')
response

### Graph representation
Neo4j Aura offers a [workspace](https://neo4j.com/docs/browser-manual/current/deployment-modes/neo4j-aura/) where we can run Cypher queries and view a graphical representation of the graph that is built during our interactions. To show how the conversation history is stored, we can take a look at the following graph.

![Graph3](youtube_playlist_3.png "Graph3")

The above graph was displayed by executing the following Cypher query:
```
MATCH p=(n:Session {id: "1"})-[:LAST_MESSAGE]->()<-[:NEXT*0..3]-() 
RETURN p
```

Chat history is represented by two types of nodes: _ai_ and _human_. These nodes are connected by _NEXT_ relationships, which form a chain. Each node has its own id and content. There is also the main session node, which has a _LAST_MESSAGE_ relationship to the last message returned by the AI. By following the chain, we can see how questions were asked by the _human_ and how responses were returned by the _ai_. The graph above was produced by running the two questions shown earlier.
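
As a rough mental model of this structure, the sketch below rebuilds the chain with plain Python dictionaries and walks it backwards from the last message, much like the variable-length _NEXT_ pattern in the query does. The node ids, the dictionary layout, and the function name are assumptions made for illustration, and the answers are placeholders rather than real model output.

```python
# Message nodes keyed by a made-up id
messages = {
    "m1": {"type": "human", "content": "What can you tell me about the GenAI stack?"},
    "m2": {"type": "ai", "content": "(placeholder answer 1)"},
    "m3": {"type": "human", "content": "Who talked about it?"},
    "m4": {"type": "ai", "content": "(placeholder answer 2)"},
}
# NEXT relationships point from an earlier message to the following one
next_edges = {"m1": "m2", "m2": "m3", "m3": "m4"}
# The session node points at the most recent message via LAST_MESSAGE
session = {"id": "1", "last_message": "m4"}


def recent_messages(session, messages, next_edges, max_hops=3):
    """Walk back from the last message over at most max_hops NEXT edges."""
    previous = {later: earlier for earlier, later in next_edges.items()}
    chain = []
    current = session["last_message"]
    hops = 0
    while current is not None and hops <= max_hops:
        chain.append(current)
        current = previous.get(current)
        hops += 1
    # Reverse so the messages come out in chronological order
    return [messages[message_id] for message_id in reversed(chain)]


conversation = recent_messages(session, messages, next_edges)
for message in conversation:
    print(message["type"], ":", message["content"])
```

Starting from the session node and following up to three _NEXT_ hops yields at most four messages, which here covers both questions and both answers.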

### Conclusion 
In summary, our mission was to transform YouTube learning by fostering dynamic conversations. We processed a YouTube playlist using the `Playlist` class to extract video IDs and loaded the caption documents with `YoutubeLoader`. During preprocessing we kept only videos with English captions, and a text splitter divided the documents into chunks. With a chunk size of 512 tokens, we prioritized context, which is crucial for maintaining connections across multiple videos.

To drive the Q&A flow, we used a conversational retrieval chain with a window memory (`k` set to 3). Retaining the last 3 exchanges helped the Large Language Model (LLM) understand the context of follow-up queries. Retrieval used the Neo4j vector instance created earlier.

The conversational retrieval chain involved creating a "standalone question" from the chat history and new question. This question was then passed to the retriever, which returned relevant documents. Finally, the LLM, given the retrieved documents and either the new or original question with chat history, generated a comprehensive response.

_Disclaimer: This article was written with the help of ChatGPT._