# RAG Project with Knowledge Graph using Neo4j with DeepSeek R1
* Notebook by Adam Lang
* Date: 2/3/2025

# Overview
* In this notebook we implement retrieval-augmented generation (RAG) over a knowledge graph stored in a Neo4j Aura graph database.

# Technical Stack
* LangChain
* Neo4J Aura Graph DB
* GROQ API --> Deepseek R1 LLM
* Hugging Face --> Embeddings

# Most interesting takeaways from using DeepSeek R1
* DeepSeek R1 uses 3 components that make it so efficient:

1. **Chain of Thought**
* If you ask most AI models a tough question, they give you an answer but not the reasoning behind it.
* This is a problem. If the answer is wrong, you don't know where it went off track.
* Chain of Thought fixes this. Instead of spitting out an answer, the model explains its reasoning step by step. If it makes a mistake, you can see exactly where. More importantly, the model itself can see where.
* There are 3 processes within chain of thought that enhance a model's reasoning:
  * 1) **Reasoning Process** - the model articulates its reasoning step by step.
  * 2) **Error Identification** - the model identifies mistakes in its reasoning.
  * 3) **Self-Correction** - the model corrects errors based on the mistakes it identified in retrieval and generation.


2. **Reinforcement Learning**
3. **Distillation**

* Further reading on this: https://medium.com/@tahirbalarabe2/deepseek-r1-explained-chain-of-thought-reinforcement-learning-and-model-distillation-0eb165d928c9

# Install Dependencies
* Note: using `yfiles jupyter graphs` for visualizing graphs in jupyter notebooks
  * github repo: https://github.com/yWorks/yfiles-jupyter-graphs

In [45]:
%%capture
!pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental langchain-neo4j wikipedia tiktoken yfiles_jupyter_graphs langchain-groq langchain-huggingface sentence-transformers

# Setup Environment Variables - LLM & Neo4j
* Normally you would do this in a `.env` file when building an application. However, since this is an experimental notebook project, we will do it here.
* First we set up the LLM, which we access through the GROQ API.

In [18]:
## enter password
from getpass import getpass
import os

GROQ_API_KEY = getpass('Enter your GROQ API key: ')

Enter your GROQ API key: ··········


In [19]:
## set GROQ env variable
os.environ['GROQ_API_KEY'] = GROQ_API_KEY

## Neo4j imports
* We need to set up a Neo4j instance in Aura DB.
* I am doing this with a free Neo4j graph database.
* Note: if you store these values in a `.env` file instead, remove the string quotation marks shown below.

In [20]:
## init NEO4J env variables
NEO4J_URI="neo4j+s://50e3ee6e.databases.neo4j.io"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD=getpass('Enter your Neo4j password: ')  ## prompt for the password instead of hardcoding it
#AURA_INSTANCEID="50e3ee6e"
#AURA_INSTANCENAME="Instance01"

In [21]:
## set NEO4J environment variables
os.environ["NEO4J_URI"]=NEO4J_URI
os.environ["NEO4J_USERNAME"]=NEO4J_USERNAME
os.environ["NEO4J_PASSWORD"]=NEO4J_PASSWORD
#NEO4J_DATABASE=os.environ["NEO4J_DATABASE"]
#AURA_INSTANCENAME=os.environ["AURA_INSTANCENAME"]
#AUTH=(NEO4J_USERNAME, NEO4J_PASSWORD)

In [22]:
from langchain_neo4j import Neo4jGraph
from neo4j import GraphDatabase

## need to setup a driver for NEO4J Graph DB
#driver = GraphDatabase.driver(NEO4J_URI, auth=AUTH)


## init graph instance
graph = Neo4jGraph()

# Data
* This is using "dummy" data from Wikipedia. However, this can be done with just about any data source including CSV, TSV, PDF, etc.
* There are different ways of building a knowledge graph from structured data such as CSVs, or from unstructured text. Briefly, here are 3 methods:
1. Neo4j Runway
  * This is a package released by Neo4j that uses OpenAI and allows you to create your own knowledge graph from a CSV.
  * It requires prompting and a human in the loop to iterate on the graph data model.
  * Links:
    * https://a-s-g93.github.io/neo4j-runway/
    * https://medium.com/neo4j/easy-data-ingestion-with-neo4j-runway-and-arrows-app-1e5f121333a7
    * runway github: https://github.com/a-s-g93/neo4j-runway/tree/main

2. WhyHow.AI
  * They have numerous resources, but here is a great blog post to get you started: https://medium.com/enterprise-rag/tables-csvs-to-graphs-whyhow-ai-sdk-9ed6d95f2f8b

3. Rebel
  * Rebel is an open source model that you can utilize via Hugging Face.
  * The idea is that the model has been trained to extract triplets for building knowledge graphs from unstructured data.
  * The extracted triplets are helpful for inserting a graph into a property graph DB such as Neo4j. This is similar to what Neo4j Runway does, but REBEL is built on an open source BART-Large encoder-decoder transformer.
  * The original training dataset was based on wikipedia abstracts.
  * Rebel resources:
    * 1. REBEL huggingface: https://huggingface.co/Babelscape/rebel-large
    * 2. REBEL github: https://github.com/Babelscape/rebel?tab=readme-ov-file
    * 3. REBEL blogpost: https://medium.com/@sauravjoshi23/building-knowledge-graphs-rebel-llamaindex-and-rebel-llamaindex-8769cf800115



In [7]:
## dummy example using wikipedia data
from langchain_community.document_loaders import WikipediaLoader

## load documents
raw_documents = WikipediaLoader(query="Elizabeth I").load()





In [8]:
## view slice of raw docs
raw_documents[:100]

[Document(metadata={'title': 'Elizabeth I', 'summary': 'Elizabeth I (7 September 1533 – 24 March 1603) was Queen of England and Ireland from 17 November 1558 until her death in 1603. She was the last and longest reigning monarch of the House of Tudor. Her eventful reign, and its effect on history and culture, gave name to the Elizabethan era.\nElizabeth was the only surviving child of Henry VIII and his second wife, Anne Boleyn. When Elizabeth was two years old, her parents\' marriage was annulled, her mother was executed, and Elizabeth was declared illegitimate. Henry restored her to the line of succession when she was 10, via the Third Succession Act 1543. After Henry\'s death in 1547, Elizabeth\'s younger half-brother Edward VI ruled until his own death in 1553, bequeathing the crown to a Protestant cousin, Lady Jane Grey, and ignoring the claims of his two half-sisters, the Catholic Mary and the younger Elizabeth, in spite of statutes to the contrary. Edward\'s will was set aside w

In [9]:
len(raw_documents)

23

Summary
* There are 23 documents.

# Document Splitting & Chunking
* First we need to split the documents into chunks.
* We can use the `TokenTextSplitter` from LangChain, which counts tokens with OpenAI's `tiktoken` library, but there are numerous approaches to this depending on your data and use case.
* Since LLMs process text at the token level, it is worth considering a token-based splitting method (instead of character splitting) so that chunk sizes line up with the model's token limits. A full explanation would need a separate notebook.

## TokenTextSplitter
* `TokenTextSplitter` is designed to split text based on the number of tokens, which is particularly useful when working with models that have specific token limits.
* A great overview of various text splitters in LangChain: https://medium.com/@harsh.vardhan7695/mastering-text-splitting-in-langchain-735313216e01

* When to use:
1. Working with token-sensitive models
2. To ensure that chunks fit within model token limits
3. For more precise control over input size for language models
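
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of token-window splitting. It is not LangChain's implementation: the function name `split_by_tokens` is made up for this illustration, and whitespace-separated words stand in for model tokens (a real splitter would count tokens with a tokenizer such as `tiktoken`).

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size tokens that overlap by chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()  # assumption: whitespace words stand in for model tokens
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = split_by_tokens(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last token of the previous one, which is the point of the overlap: a sentence that straddles a chunk boundary is less likely to lose its context.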

### Choosing the Right Encoding
* Refer to the OpenAI cookbook for more details here: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
* Tiktoken works seamlessly with this tokensplitter as seen here: https://github.com/openai/tiktoken?tab=readme-ov-file
* If you want to dive into token counting read this blogpost: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
* There is also an open source token counter for other models than OpenAI: https://michaelcurrin.github.io/token-translator/



In [10]:
## text splitting
from langchain.text_splitter import TokenTextSplitter

## init TokenTextSplitter
text_splitter = TokenTextSplitter(encoding_name="cl100k_base", ## tiktoken encoding used to count tokens
                                  chunk_size=512,
                                  chunk_overlap=100)

## split/chunk documents --> use only 3 of the 23 documents as an example
documents = text_splitter.split_documents(raw_documents[:3])

# Load LLM Model
* We will try out the `deepseek-r1-distill-llama-70b` model from GROQ.
* What is most interesting about DeepSeek R1 is that it uses 3 important methods, which is what makes it so powerful:
1. **Chain of Thought** - you will see the model explain its thought process step by step as it answers a query.

2. **Reinforcement learning** - the model is trained with rewards for good reasoning rather than only on labeled examples.

3. **Distillation** - a smaller model is trained on the outputs of the larger one, shrinking it while keeping much of its capability.

In [25]:
from langchain_groq import ChatGroq

## init llm
llm = ChatGroq(groq_api_key=GROQ_API_KEY,
               model="deepseek-r1-distill-llama-70b",
               temperature=0,
               )

# Create Graph with `LLMGraphTransformer`
* `LLMGraphTransformer` is the LangChain class (from `langchain_experimental`) that uses the LLM to extract nodes and relationships from the documents and build a knowledge graph.
* Neo4j graph builder github: https://github.com/neo4j-labs/llm-graph-builder

In [26]:
from langchain_experimental.graph_transformers import LLMGraphTransformer

## init model
llm_graph_transformer = LLMGraphTransformer(llm=llm)

In [27]:
## create graph documents
graph_docs = llm_graph_transformer.convert_to_graph_documents(documents)

In [29]:
## view graph docs
graph_docs

[GraphDocument(nodes=[Node(id='Elizabeth I', type='Person', properties={}), Node(id='Queen Of England', type='Title', properties={}), Node(id='Ireland', type='Country', properties={}), Node(id='House Of Tudor', type='Dynasty', properties={}), Node(id='Henry Viii', type='Person', properties={}), Node(id='Anne Boleyn', type='Person', properties={}), Node(id='Edward Vi', type='Person', properties={}), Node(id='Mary I', type='Person', properties={}), Node(id='Lady Jane Grey', type='Person', properties={}), Node(id='William Cecil', type='Person', properties={}), Node(id='Baron Burghley', type='Title', properties={}), Node(id='English Protestant Church', type='Organization', properties={}), Node(id='Church Of England', type='Organization', properties={}), Node(id='James Vi Of Scotland', type='Person', properties={}), Node(id='Sir Francis Walsingham', type='Person', properties={}), Node(id='Video Et Taceo', type='Motto', properties={})], relationships=[Relationship(source=Node(id='Elizabeth I

In [30]:
## add to neo4j graph db
graph.add_graph_documents(
    graph_docs,
    baseEntityLabel=True,
    include_source=True,
)

In [35]:
# show graph result using cypher query
default_cypher_query = "MATCH (s)-[r:!MENTIONS]->(t) RETURN s,r,t LIMIT 50"

### Widget enabled
* The cell below enables custom widgets in the Colab environment, which is required to display and interact with the graph visualization generated by the code.

In [39]:
#from langchain_neo4j import Neo4jGraph
from neo4j import GraphDatabase
from yfiles_jupyter_graphs import GraphWidget

In [37]:
## widget manager
try:
  import google.colab
  from google.colab import output
  output.enable_custom_widget_manager()
except ImportError:  ## not running in Colab, so no widget manager is needed
  pass

In [42]:
## show graph
def showGraph(cypher: str=default_cypher_query):
  ## init neo4j driver session
  driver = GraphDatabase.driver(
      uri=os.environ['NEO4J_URI'],
      auth=(os.environ["NEO4J_USERNAME"],
            os.environ["NEO4J_PASSWORD"]))

  ## init session
  session = driver.session()
  widget = GraphWidget(graph=session.run(cypher).graph())
  widget.node_label_mapping = 'id'
  display(widget)
  return widget


In [43]:
## show the graph
showGraph()

GraphWidget(layout=Layout(height='800px', width='100%'))

GraphWidget(layout=Layout(height='800px', width='100%'))

# Retrievers
* We have to create 3 retrievers:
1. Unstructured Data Retriever (vectors & keywords)
2. Graph Retriever (knowledge graph in neo4j)
3. Final Retriever (brings it all together for input to LLM)


## 1. Unstructured Data Retriever
* Now we create a vector index. As in most RAG applications, this step creates vector embeddings for the document chunks.
* The purpose of this is to create a hybrid search allowing both:

1. Keyword retrieval
2. Vector embeddings retrieval
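
As a rough illustration of why combining the two signals helps, here is a toy hybrid ranker. This is only a sketch and not how `Neo4jVector` merges its full-text and vector results internally: the function names, the `alpha` weight, and the bag-of-words vectors (standing in for real embeddings) are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query: str, docs: List[str], alpha: float = 0.5) -> List[Tuple[float, str]]:
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    q_vec = Counter(query.lower().split())  # assumption: word counts stand in for embeddings
    scored = []
    for doc in docs:
        vec_score = cosine(q_vec, Counter(doc.lower().split()))
        kw_score = keyword_score(query, doc)
        scored.append((alpha * vec_score + (1 - alpha) * kw_score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "Elizabeth I was queen of England",
    "Peter the Great ruled Russia",
    "The House of Tudor ruled England",
]
ranked = hybrid_rank("queen of England", docs)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

With real embeddings the vector score can also surface documents that share no words with the query, which is what the keyword side alone would miss.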


In [46]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings ## use hugging face open source embeddings
from sentence_transformers import SentenceTransformer, util

## create embeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


In [55]:
from langchain_neo4j import Neo4jVector

## create vector index for Neo4j
vector_index = Neo4jVector.from_existing_graph(
    embedding=embeddings, ## your embedding model here
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)

## 2. Graph Retriever
* This uses a full-text index to match the entities found in the query to graph nodes, then collects each matched node's nearest neighbors.

In [57]:
from pydantic import BaseModel, Field
from typing import Tuple, List, Optional
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate

## pydantic class --> extract entities from text
class Entities(BaseModel):
  """
  Identifies information about the entities."""
  names: List[str] = Field(
      ...,
      description="All the person, organization, or business entities that appear in the text",
  )

## prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        (
        "system",
        "You are extracting organization and person entities from the text.",
        ),
        (
        "human",
        "Use the given format to extract information from the following "
        "input: {question}",
        ),

    ]
)
## create entity chain with LCEL syntax
entity_chain = (prompt
                  |
                llm.with_structured_output(Entities))

### Testing the retriever - can we detect entities?

In [62]:
#from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars

entities_test = entity_chain.invoke({"question": "Lebron James plays in the NBA and lives in Los Angeles, CA. Nike sells him shoes."})
entities_test

Entities(names=['Lebron James', 'NBA', 'Nike'])

Summary
* This only tests the prompt and the LLM chain; it does not test our Neo4j vector store. Still, we can see that the chain extracted the entities Lebron James, NBA, and Nike from the text.

### Full-text index mapping entities to the knowledge graph


In [63]:
from langchain_neo4j.vectorstores.neo4j_vector import remove_lucene_chars

## create a full-text index on the id property of entity nodes
graph.query(
    "CREATE FULLTEXT INDEX entity IF NOT EXISTS FOR (e:__Entity__) ON EACH [e.id]")

def generate_full_text_query(input: str) -> str:
    """
    Generate a full-text search query for a given input string.

    Each word gets a fuzzy match (~2 allows up to 2 character edits)
    and the words are combined with AND.
    """
    words = [el for el in remove_lucene_chars(input).split() if el]
    return " AND ".join(f"{word}~2" for word in words)
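
To see what kind of query string this produces, here is a small self-contained sketch. The helper `strip_lucene_chars` is a hypothetical stand-in for `remove_lucene_chars` (it simply replaces common Lucene operator characters with spaces), and `build_fuzzy_query` is an illustrative name, so treat the exact cleaning rules as an assumption.

```python
# Characters with special meaning in Lucene query syntax
LUCENE_SPECIAL_CHARS = '+-&|!(){}[]^"~*?:\\/'


def strip_lucene_chars(text: str) -> str:
    """Hypothetical stand-in for remove_lucene_chars: blank out Lucene operators."""
    return "".join(" " if ch in LUCENE_SPECIAL_CHARS else ch for ch in text).strip()


def build_fuzzy_query(text: str) -> str:
    """Join the cleaned words with AND, giving each a fuzzy match of edit distance 2."""
    words = strip_lucene_chars(text).split()
    return " AND ".join(f"{word}~2" for word in words)


print(build_fuzzy_query("Elizabeth I"))
print(build_fuzzy_query("Anne Boleyn?"))
```

The `~2` suffix asks the full-text index to tolerate small misspellings, so an entity the LLM extracts as "Elizabeth" can still match a node stored with slightly different spelling.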


In [68]:
from langchain_neo4j.vectorstores.neo4j_vector import remove_lucene_chars

# Fulltext index query
def structured_retriever(question: str) -> str:
    """
    Collects the neighborhood of entities mentioned
    in the question
    """
    result = ""
    entities = entity_chain.invoke({"question": question})
    for entity in entities.names:
        response = graph.query(
            """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
            YIELD node,score
            CALL {
              MATCH (node)-[r:!MENTIONS]->(neighbor)
              RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
              UNION
              MATCH (node)<-[r:!MENTIONS]-(neighbor)
              RETURN neighbor.id + ' - ' + type(r) + ' -> ' +  node.id AS output
            }
            RETURN output LIMIT 50
            """,
            {"query": generate_full_text_query(entity)},
        )
        result += "\n".join([el['output'] for el in response]) + "\n"  ## keep each entity's results on separate lines
    return result

### Testing the Retriever in the graph

In [69]:
print(structured_retriever("Who is Elizabeth I?"))



Elizabeth I - QUEEN_OF -> Queen Of England
Elizabeth I - QUEEN_OF -> Ireland
Elizabeth I - MEMBER_OF -> House Of Tudor
Elizabeth I - DAUGHTER_OF -> Henry Viii
Elizabeth I - DAUGHTER_OF -> Anne Boleyn
Elizabeth I - SUCCEEDED -> Edward Vi
Elizabeth I - SUCCEEDED -> Mary I
Elizabeth I - SUCCEEDED -> Lady Jane Grey
Elizabeth I - ADVISED_BY -> William Cecil
Elizabeth I - ESTABLISHED -> English Protestant Church
Elizabeth I - SUCCEEDED_BY -> James Vi Of Scotland
Elizabeth I - PROTECTED_BY -> Sir Francis Walsingham
Elizabeth I - MOTTO_OF -> Video Et Taceo
William Cecil - CREATED_AS -> Baron Burghley
English Protestant Church - EVOLVED_INTO -> Church Of England
Elizabeth - MEMBER_OF -> House Of Romanov
Elizabeth - DAUGHTER_OF -> Peter The Great
Elizabeth - DAUGHTER_OF -> Catherine
Elizabeth - MOTTO -> Video Et Taceo
Elizabeth - MINISTER -> Sir Francis Walsingham
Elizabeth - MANOEUVERED_BETWEEN -> France
Elizabeth - MANOEUVERED_BETWEEN -> Spain
Elizabeth - SUPPORTED_MILITARY_CAMPAIGNS_IN -> Ire

## 3. Final Retriever
* This concatenates the results of the 2 previous retrievers (the vector index and the knowledge graph) into a single context string, which we then pass to the LLM.

In [70]:
def retriever(question: str):
    print(f"Search query: {question}")
    structured_data = structured_retriever(question)
    unstructured_data = [el.page_content for el in vector_index.similarity_search(question)]
    final_data = f"""Structured data:
{structured_data}
Unstructured data:
{"#Document ".join(unstructured_data)}
    """
    return final_data
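
The shape of the context string handed to the LLM is easier to see with stand-in data. The sketch below mirrors the formatting used in `retriever` but takes its inputs as arguments; `build_context` is an illustrative name, and the sample strings are placeholders rather than real retriever output.

```python
from typing import List


def build_context(structured: str, unstructured: List[str]) -> str:
    """Combine graph triples and text chunks into one prompt context."""
    joined = "#Document ".join(unstructured)  # the marker goes between chunks only
    return f"Structured data:\n{structured}\nUnstructured data:\n{joined}"


context = build_context(
    "Elizabeth I - MEMBER_OF -> House Of Tudor",
    ["Elizabeth I was Queen of England. ", "She was the last monarch of the House of Tudor."],
)
print(context)
```

Note that `str.join` only places `#Document ` between chunks, so the first chunk carries no marker. If every chunk should be labeled, prefixing each one before joining is a simple alternative.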

# RAG Chain
* Now we define the RAG chain.

In [71]:
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

In [72]:
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [76]:
from langchain_core.messages import AIMessage, HumanMessage

## format chat history --- conversational memory
def _format_chat_history(chat_history: List[Tuple[str, str]]) -> List:
  buffer = []
  for human, ai in chat_history:
    buffer.append(HumanMessage(content=human))
    buffer.append(AIMessage(content=ai))
  return buffer

In [77]:
from langchain_core.runnables import (
    RunnableBranch,
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)
from langchain_core.output_parsers import StrOutputParser

## Final RAG chain
_search_query = RunnableBranch(
    # if input includes chat_history then we condense it with the follow-up question
    (
        RunnableLambda(lambda x: bool(x.get("chat_history"))).with_config(
            run_name="HasChatHistoryCheck"
        ), # condense follow-up question and chat into standalone_question
        RunnablePassthrough.assign(
            chat_history=lambda x: _format_chat_history(x["chat_history"])
        )
        | CONDENSE_QUESTION_PROMPT
        | llm  ## reuse the DeepSeek R1 model defined earlier
        | StrOutputParser(),
    ),
    # else we have no chat history so just pass through question
    RunnableLambda(lambda x: x['question']),
)
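
The branching logic above can be read as ordinary Python. The sketch below is a simplified stand-in, not LangChain code: `resolve_search_query` and `fake_condense` are hypothetical names, and `fake_condense` replaces the LLM call with a trivial string operation so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

ChatHistory = List[Tuple[str, str]]


def resolve_search_query(inputs: Dict, condense: Callable[[ChatHistory, str], str]) -> str:
    """Condense the question when chat history exists, otherwise pass it through."""
    history = inputs.get("chat_history")
    if history:
        return condense(history, inputs["question"])
    return inputs["question"]


def fake_condense(history: ChatHistory, question: str) -> str:
    # Hypothetical stand-in for the LLM: just attach the previous human turn.
    return f"{question} (asked after: {history[-1][0]})"


no_history = resolve_search_query({"question": "Where was she born?"}, fake_condense)
with_history = resolve_search_query(
    {
        "question": "Where was she born?",
        "chat_history": [("Which house did Elizabeth I belong to?", "House of Tudor")],
    },
    fake_condense,
)
print(no_history)
print(with_history)
```

Without a `chat_history` entry, a follow-up such as "Where was she born?" reaches the retriever unchanged, so nothing tells the retriever or the model who "she" is.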

In [78]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
Use natural language and be concise.
Answer: """
prompt = ChatPromptTemplate.from_template(template)

chain = (
    RunnableParallel(
        {
            "context": _search_query | retriever,
            ## pass only the question text to the prompt, not the whole input dict
            "question": RunnableLambda(lambda x: x["question"]),
        }
    )
    | prompt
    | llm
    | StrOutputParser()
)


## Test HYBRID RAG chain

In [82]:
from IPython.display import display, Markdown
result = chain.invoke({"question": "Which house did Elizabeth I belong to?"})

display(Markdown(result))

Search query: Which house did Elizabeth I belong to?




<think>
Okay, so I need to figure out which house Elizabeth I belonged to. Let me start by looking at the structured data provided. I see that under "Elizabeth I," there's a relation "MEMBER_OF -> House Of Tudor." That seems pretty straightforward. 

Looking at the unstructured text, it mentions that Elizabeth I was the last and longest reigning monarch of the House of Tudor. That confirms the information from the structured data. 

I don't see any conflicting information elsewhere. The other mentions of "House Of Romanov" are under a different Elizabeth, so that's probably referring to another historical figure, maybe Elizabeth Petrovna of Russia. 

So, putting it all together, Elizabeth I was a member of the House of Tudor.
</think>

Elizabeth I belonged to the House of Tudor.
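
As the output above shows, the model returns its reasoning inside `<think>` tags ahead of the final answer. If only the answer should be displayed, one option is to separate the two with a regular expression. This is a minimal sketch; `split_reasoning` is an illustrative helper, not part of LangChain or the GROQ API, and the sample string is a shortened placeholder rather than the full response.

```python
import re
from typing import Tuple

# Non-greedy match so several <think> blocks are handled separately
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a response that may contain <think> blocks."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


raw = (
    "<think>\nThe structured data says MEMBER_OF -> House Of Tudor.\n</think>\n\n"
    "Elizabeth I belonged to the House of Tudor."
)
reasoning, answer = split_reasoning(raw)
print(answer)
```

Keeping the reasoning as a separate value, rather than discarding it, preserves the ability to inspect where an answer went off track.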

In [83]:
result = chain.invoke({"question": "Where was she born?"})


display(Markdown(result))

Search query: Where was she born?




<think>
Okay, so I need to figure out where "she" was born based on the given context. Let me look through the documents provided.

First, I see that there are multiple documents, each talking about a different Elizabeth. The first document mentions Elizabeth, Empress of Russia, who was born at Kolomenskoye near Moscow. The second document also talks about an Elizabeth, but the context seems to shift to Elizabeth I of England, who was born at Greenwich Palace.

Wait, the question is about where "she" was born, but the context includes two different Elizabeths. I need to determine which one the question is referring to. The question doesn't specify, but looking at the structure, the user provided three documents. The first two are about Elizabeth of Russia, and the third is about Elizabeth I of England.

In the first document, under "Early life," it says Elizabeth was born at Kolomenskoye near Moscow. The second document also starts with Elizabeth Petrovna, Empress of Russia, born on December 18, 1709, at Kolomenskoye. The third document is about Elizabeth I of England, born at Greenwich Palace.

So, the question is asking about the birthplace of "she." Since the first two documents are about the Russian Empress and the third about the English queen, I need to see which one is being referred to. The question might be ambiguous, but perhaps the user is asking about the first Elizabeth mentioned, which is the Russian one.

Alternatively, maybe the user is referring to the most recent document, which is about Elizabeth I. But the way the question is phrased, it's a bit unclear. However, in the context provided, the first two documents are about the Russian Empress, and the third is a separate document about Elizabeth I.

Wait, looking again, the user provided three separate documents. The first two are about the Russian Elizabeth, and the third is about Elizabeth I. So, the question is likely referring to one of them. Since the user's question is in the same context, perhaps they are asking about the Russian Elizabeth.

But to be thorough, I should check both. The Russian Elizabeth was born at Kolomenskoye near Moscow, and Elizabeth I was born at Greenwich Palace. Since the question is about "she" without context, I might need to consider which one is more likely. However, the user might be referring to the first document they provided, which is about the Russian Empress.

Alternatively, perhaps the user is asking about the last document, which is about Elizabeth I. But the question is a bit ambiguous. To be safe, I should consider both possibilities.

Wait, the user's question is: "Where was she born?" without specifying which Elizabeth. So, I need to determine based on the context which one is being referred to. Since the first two documents are about the Russian Empress and the third is about the English queen, perhaps the user is asking about the Russian one, as the first document is more detailed about her birthplace.

But I'm not entirely sure. Maybe I should look for the most recent document. The third document is about Elizabeth I, born at Greenwich Palace. So, perhaps the user is referring to that one.

Wait, the user provided three separate documents, each starting with "text:". The first two are about the Russian Empress, and the third is about Elizabeth I. So, the question is likely referring to the last document, which is about Elizabeth I, born at Greenwich Palace.

But I'm a bit confused because the first two documents are about the Russian Elizabeth. Maybe the user is asking about the first one. Alternatively, perhaps the user is referring to the most recent document, which is the third one about Elizabeth I.

I think the safest approach is to check each document for the birthplace. The first document says she was born at Kolomenskoye near Moscow. The third document says she was born at Greenwich Palace. Since the question is about "she" without context, I might need to consider which one is more likely the subject.

Alternatively, perhaps the user is asking about the first document, which is about the Russian Empress. So, the answer would be Kolomenskoye near Moscow.

But I'm not entirely sure. Maybe I should look for the most detailed information. The first document provides the birthplace as Kolomenskoye, while the third document says Greenwich Palace.

Wait, the user's question is in the context of the provided documents, so I should answer based on the information given. Since the first two documents are about the Russian Empress, and the third is about Elizabeth I, perhaps the user is asking about the Russian one.

Alternatively, maybe the user is referring to the last document they provided, which is about Elizabeth I. So, the answer would be Greenwich Palace.

I think I need to make a decision. Since the first two documents are about the Russian Empress and the third is about Elizabeth I, and the question is about "she," perhaps the user is referring to the first one. So, the answer is Kolomenskoye near Moscow.

But I'm not entirely certain. Alternatively, the user might be referring to the last document, so the answer would be Greenwich Palace.

Wait, looking at the structure, the user provided three separate documents, each starting with "text:". The first two are about the Russian Empress, and the third is about Elizabeth I. So, the question is likely referring to the last document, which is about Elizabeth I, born at Greenwich Palace.

Therefore, the answer is Greenwich Palace.
</think>

Greenwich Palace.

In [85]:
result = chain.invoke({"question": "Who is her first cousin?"})


display(Markdown(result))

Search query: Who is her first cousin?




<think>
Okay, so I need to figure out who Elizabeth's first cousin is based on the provided context. Let me start by reading through the text carefully.

First, I see there are two Elizabeths mentioned: Elizabeth I of England and Elizabeth (Elizaveta Petrovna) of Russia. The question is about her first cousin, so I need to determine which Elizabeth the question refers to. Since the question is singular and the context includes both, I'll have to check both sections.

Looking at Elizabeth I's section, she was the daughter of Henry VIII and Anne Boleyn. Her siblings included Edward VI and Mary I. Her successors included James VI of Scotland, who was her first cousin twice removed. Wait, that's a bit more distant than a first cousin. So maybe the first cousin is someone else.

In the Russian Elizabeth's section, she was the daughter of Peter the Great and Catherine. Her siblings included Anna, and she had a nephew named Peter III, whom she declared as her heir. She also had a cousin named Ivan VI, who was briefly ruler before she took over. So Ivan VI is her cousin, but is he her first cousin?

Wait, first cousins share the same grandparents. Elizabeth's parents were Peter the Great and Catherine. Ivan VI's parents would be Ivan V (Peter's brother) and Catherine, but I'm not sure. Alternatively, since Elizabeth's father was Peter the Great, and Ivan VI's father was Ivan V, who was Peter's half-brother, that would make Ivan VI Elizabeth's nephew, not first cousin. Hmm, maybe I'm getting this wrong.

Wait, in the text, it says after Anna's rule, Ivan VI was her infant great-nephew, so that's a different generation. So Elizabeth's first cousin would be someone whose parents are her father's siblings. Since Peter the Great had a son Alexei, who was Elizabeth's half-brother, Alexei's son would be Peter III, who is Elizabeth's nephew, not cousin.

Wait, maybe I'm overcomplicating. Let me check the text again. In the Russian Elizabeth's section, it says she declared her nephew, the future Peter III, her heir. So Peter III is her nephew, not cousin. Then, who is her first cousin? It mentions her cousin Anna, who ruled from 1730–1740. Anna is her first cousin because their parents were siblings. Peter the Great and his sibling would have had children, making Anna and Elizabeth first cousins.

Wait, the text says: "then to her nephew Peter II, who died in 1730 and was succeeded by Elizabeth's first cousin Anna (r. 1730–1740)." So Anna is Elizabeth's first cousin. Therefore, Anna is her first cousin.

But wait, the question is about "her first cousin." If the question is about Elizabeth I, then her first cousin would be someone else. But in the context, the Russian Elizabeth's first cousin is Anna. However, the question might be referring to Elizabeth I, whose first cousin is mentioned as James VI of Scotland, but he's her first cousin twice removed, not first cousin.

Wait, the question is a bit ambiguous because there are two Elizabeths. But looking back, the question is: "Who is her first cousin?" without specifying which Elizabeth. However, in the context provided, the Russian Elizabeth's first cousin is Anna, while Elizabeth I's first cousin is Lady Jane Grey, but she was a cousin, not necessarily first cousin. Wait, no, Lady Jane Grey was a cousin, but perhaps more distantly related.

Wait, in the first text about Elizabeth I, it says she was succeeded by her first cousin twice removed, James VI of Scotland. So her first cousin would be someone else. But in the text, it's mentioned that Edward VI bequeathed the crown to Lady Jane Grey, ignoring Mary and Elizabeth. So Lady Jane Grey was a cousin, but perhaps a first cousin once removed or something.

Wait, perhaps I'm overcomplicating. The question is about "her first cousin." Since the context includes both Elizabeths, but the answer is more likely referring to the Russian Elizabeth because the second document ends with her, but I'm not sure.

Alternatively, perhaps the answer is that her first cousin is James VI of Scotland for Elizabeth I, but he's a first cousin twice removed. So maybe the answer is that she doesn't have a first cousin mentioned, but the closest is Anna for the Russian Elizabeth.

Wait, the question is based only on the provided context, so I need to check both sections.

In the first document about Elizabeth I, her first cousin twice removed is James VI. Her first cousins would be the children of her aunts and uncles. Since her father was Henry VIII, his siblings' children would be her first cousins. But the text doesn't mention them specifically.

In the second document about the Russian Elizabeth, her first cousin is Anna, who ruled from 1730–1740. So the answer would be Anna.

But the question is about "her," so I need to determine which Elizabeth the question refers to. Since the question is singular and the context includes both, but the last part of the context is about the Russian Elizabeth, perhaps the answer refers to her.

Alternatively, the question might be about Elizabeth I, but her first cousin isn't clearly mentioned. So perhaps the answer is Anna.

Wait, but the question is: "Who is her first cousin?" So I need to see in the context who is referred to as her first cousin.

Looking back, in the Russian Elizabeth's text, it says: "then to her nephew Peter II, who died in 1730 and was succeeded by Elizabeth's first cousin Anna (r. 1730–1740)." So Anna is her first cousin.

Therefore, the answer is Anna.
</think>

Anna