# Vector Retriever

You will create a vector retriever using the Neo4j GraphRAG Python package.

You will see how the vector index is used to retrieve similar results and how an LLM can use that retrieved context to generate a response.

---

Import the required Python modules and load the environment variables.

In [1]:
from neo4j import GraphDatabase
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever
from neo4j_graphrag.generation import GraphRAG

# Load environment variables
import os
from dotenv import load_dotenv
load_dotenv()

NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USER = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
NEO4J_DATABASE = os.getenv('NEO4J_DATABASE')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

Create and verify the connection to your Neo4j graph database.

In [2]:
driver = GraphDatabase.driver(
    NEO4J_URI,
    auth=(NEO4J_USER, NEO4J_PASSWORD),
)
driver.verify_connectivity()
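
If `verify_connectivity()` raises an error, one thing worth ruling out is an unset environment variable, because `os.getenv` returns `None` without complaint. The following is a minimal sketch of such a check. The helper name `missing_settings` and the `REQUIRED_SETTINGS` list are illustrative and not part of the Neo4j GraphRAG package.

```python
import os

# Names match the variables read with os.getenv() in the first cell.
REQUIRED_SETTINGS = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY"]


def missing_settings(env, names=REQUIRED_SETTINGS):
    """Return the names that are absent or empty in the mapping `env`."""
    return [name for name in names if not env.get(name)]


# Hypothetical usage: report gaps before opening a connection.
missing = missing_settings(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run this after `load_dotenv()` so that values from your `.env` file are already present in `os.environ`.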

## Initialize LLM and Embedder

Set up the Large Language Model (LLM) and the embedding model you will use in retrieval-augmented generation (RAG) workflows.

- **LLM**: Uses OpenAI's GPT-4o model via the `OpenAILLM` interface.
- **Embedder**: Uses OpenAI's embedding API via the `OpenAIEmbeddings` class.

In [4]:
# --- Initialize LLM and Embedder ---
llm = OpenAILLM(model_name='gpt-4o', api_key=OPENAI_API_KEY)
embedder = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
embedder

<neo4j_graphrag.embeddings.openai.OpenAIEmbeddings at 0x21cf4b9cb10>

## Initialize Vector Retriever

Set up the vector-based retriever for semantic search over your Neo4j knowledge graph.

> Vector search enables semantic retrieval of text chunks from your Neo4j graph.  
> Instead of keyword matching, it finds the most contextually similar passages to your query, even if the wording is different.
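
To make the idea of "contextually similar" concrete, here is a toy sketch of ranking by cosine similarity in plain Python. The three-dimensional vectors are invented for illustration; real embeddings have far more dimensions, and the similarity function your `chunkEmbeddings` index uses depends on how the index was created.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks.items()]
    return sorted(scored, reverse=True)[:k]


# Invented toy "embeddings": chunk text -> vector.
toy_chunks = {
    "supply chain disruption": [0.9, 0.1, 0.0],
    "quarterly dividend": [0.0, 0.2, 0.9],
    "regulatory investigation": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]  # stands in for an embedded question about risks

for score, text in top_k(toy_query, toy_chunks, k=2):
    print(f"{score:.4f}  {text}")
```

A vector index does the same kind of ranking, but uses an approximate nearest-neighbour structure so that it does not have to compare the query against every stored vector.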

In [6]:
# --- Initialize Vector Retriever ---
vector_retriever = VectorRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    return_properties=['text'])

The **VectorRetriever** instance:

- Connects to the Neo4j database using the provided `driver`.
- Queries the `chunkEmbeddings` vector index for semantic retrieval.
- Uses the `embedder` to generate an embedding for the query text.
- Returns the `text` property from matching chunks.

> **Tip:**  
> You can modify the `return_properties` list to include additional properties from the retrieved nodes.

## Simple Vector Search Diagnostic

You can use the vector retriever to search for semantically similar data.

Test the vector search by retrieving the top 10 most relevant text chunks from the Neo4j knowledge graph for the given query.

In [7]:
# --- Simple Vector Search ---
query = "What are the risks that Apple faces?"
result = vector_retriever.search(query_text=query, top_k=10)
for item in result.items:
    print(f"Score: {item.metadata['score']:.4f}, Content: {item.content[0:100]}..., id: {item.metadata['id']}")

Score: 0.9168, Content: {'text': "the Company to potential liabilities, increased costs, reputational harm, and other\nadver..., id: 4:561e1efb-9f89-4f78-b62a-54811093b9fa:3469
Score: 0.9159, Content: {'text': "the Company to potential liabilities, increased costs, reputational harm, and other\nadver..., id: 4:561e1efb-9f89-4f78-b62a-54811093b9fa:3468
Score: 0.9138, Content: {'text': "relief against the Company, and has\nfrom time to time required, and can in the future req..., id: 4:561e1efb-9f89-4f78-b62a-54811093b9fa:3467
Score: 0.9121, Content: {'text': "relief against the Company, and has\nfrom time to time required, and can in the future req..., id: 4:561e1efb-9f89-4f78-b62a-54811093b9fa:3465
Score: 0.9116, Content: {'text': " in both\nfrequency and sophistication with increasing use by actors of tools and techniqu..., id: 4:561e1efb-9f89-4f78-b62a-54811093b9fa:3464
Score: 0.9114, Content: {'text': "interruptions can harm or disrupt international\ncommerce and the global economy,


**How it works:**  
1. The example `query`, "What are the risks that Apple faces?", is defined.
2. `vector_retriever.search()` embeds the query and returns the top 10 matches ranked by vector similarity.
3. Each result is printed with:
    * The similarity score (`Score`)
    * The first 100 characters of the retrieved content (`Content`)
    * The unique identifier of the chunk (`id`)

This diagnostic helps you verify that the vector search is working and inspect the quality of the top results for your query.
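
The printed `Content` values look like the string form of a Python dictionary (`{'text': ...}`), which makes the snippets harder to read. The sketch below shows one way to pull out just the text, assuming `item.content` really is such a string. The helper name `extract_text` is hypothetical, and it falls back to the raw string when parsing fails.

```python
import ast


def extract_text(content, max_chars=100):
    """Return the 'text' value from a dict-like string, or the raw string."""
    try:
        parsed = ast.literal_eval(content)  # safely parse a Python literal
    except (ValueError, SyntaxError, TypeError):
        return content[:max_chars]
    if isinstance(parsed, dict) and "text" in parsed:
        return str(parsed["text"])[:max_chars]
    return content[:max_chars]


# Made-up sample in the same shape as the output above.
sample = str({"text": "example chunk text\nsecond line"})
print(extract_text(sample))
```

In the notebook you could call `extract_text(item.content)` inside the `for` loop instead of slicing `item.content` directly.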

> **Tip:**
> Inspecting the returned results to verify relevance can help you to adjust your chunking or embedding strategy.

## Retrieval-Augmented Generation (RAG) Query

You can use the `GraphRAG` class to create a retrieval-augmented generation (RAG) pipeline.

The `GraphRAG` class combines a Large Language Model (LLM) with a vector-based retriever to answer questions using both semantic search and generative reasoning.

In [8]:
# --- Initialize RAG and Perform Search ---
query = "What are the risks that Apple faces?"
rag = GraphRAG(
    llm=llm,
    retriever=vector_retriever
)
response = rag.search(query)

print(response.answer)

Apple faces several risks, including:

1. **Environmental, Social, and Governance (ESG) Considerations**: Apple is under pressure from various stakeholders to meet ESG goals, including climate change initiatives, diversity, and inclusion. Failing to meet these goals can lead to legal and regulatory proceedings and harm Apple’s reputation and financial condition.

2. **Regulatory and Legal Challenges**: Apple is subject to intense media, political, and regulatory scrutiny. Future changes in legislation related to the App Store, such as compliance with the EU Digital Markets Act by March 2024, may require Apple to alter its business practices, potentially impacting its financial performance.

3. **Antitrust Investigations**: Apple faces antitrust investigations worldwide. Adverse findings could lead to significant fines and required changes to its business practices, particularly related to the App Store, which could negatively impact Apple’s business and financial health.

4. **Complian

**How it works:**  
1. The `rag` pipeline is created from the LLM (`llm`) and the retriever (`vector_retriever`).
2. Calling `rag.search(query)` has the retriever find the most relevant text chunks in the Neo4j graph for the input query.
3. The LLM uses the retrieved chunks as context to generate a natural-language answer.
4. The `answer` in the `response` is printed.

The `GraphRAG` pipeline grounds its answers in your knowledge graph data, so responses reflect the retrieved context rather than only what the LLM learned during training.
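
Conceptually, the step between retrieval and generation is assembling a prompt that contains both the retrieved chunks and the question. The sketch below illustrates that idea only. The function `build_prompt` and its wording are invented for this example and are not the prompt template that `GraphRAG` uses internally.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and a question into a single prompt string."""
    # Number each chunk so the context is easy to inspect.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for retriever results.
example_chunks = ["First retrieved chunk.", "Second retrieved chunk."]
print(build_prompt("What are the risks that Apple faces?", example_chunks))
```

Printing the prompt like this is a useful way to reason about what the LLM can and cannot know when it writes its answer.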

---

Experiment with the vector retriever by modifying the `query`, for example:
- What products does Microsoft reference?
- What warnings have Nvidia given?
- What companies mention AI in their filings?
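
To try several questions in one go, a small helper can loop over them. This is a sketch under two assumptions: that `rag` is the `GraphRAG` instance created earlier, and that your installed version of the package accepts the `query_text` and `retriever_config` arguments on `search`. The helper name `ask_all` is hypothetical.

```python
def ask_all(rag, questions, top_k=5):
    """Run each question through `rag.search` and collect the answers."""
    answers = {}
    for question in questions:
        # retriever_config is passed through to the retriever (assumed API).
        response = rag.search(query_text=question, retriever_config={"top_k": top_k})
        answers[question] = response.answer
    return answers


questions = [
    "What products does Microsoft reference?",
    "What warnings have Nvidia given?",
    "What companies mention AI in their filings?",
]

# In the notebook, with `rag` from the cell above:
# for question, answer in ask_all(rag, questions).items():
#     print(question, "\n", answer, "\n")
```

Changing `top_k` controls how many chunks the retriever supplies as context, which can noticeably change the answers.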

[View the complete code](solutions/01_01_vector_retriever.py)

[Move on to the Advanced RAG: Vector Cypher Retriever Notebook](01_02_vector_cypher_retriever.ipynb)