# (Improved) Property Graph RAG in C#

Currently this notebook uses the following resources:
* Azure OpenAI
* Neo4j

If there is enough interest, I can add the changes needed to use OpenAI directly instead of Azure OpenAI. If that's something you'd like, let me know on Twitter (@haleyjason) or open an issue on GitHub.

## Setup

Add the references and `using` statements used in the rest of the notebook.


In [None]:
#r "nuget: Azure.AI.OpenAI, *-*"
#r "nuget: Azure, *-*"
#r "nuget: Azure.Identity, *-*"
#r "nuget: dotenv.net, *-*"
#r "nuget: Microsoft.DotNet.Interactive.AIUtilities, *-*"
#r "nuget: Microsoft.ML.Tokenizers, *-*"
#r "nuget: Microsoft.SemanticKernel.Core, *-*"
#r "nuget: Neo4j.Driver, *-*"

using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using dotenv.net;
using Azure.AI.OpenAI;
using Azure;
using Azure.Identity;
using OpenAI.Chat;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.RegularExpressions;
using System.IO;
using Microsoft.SemanticKernel.Text;
using Microsoft.ML.Tokenizers;
using Neo4j.Driver;


Load the environment variables. **The notebook assumes you have a .env file** with the following contents:

```cmd
AZURE_OPENAI_ENDPOINT="<your Azure OpenAI endpoint>"
AZURE_OPENAI_RESOURCE="<your Azure OpenAI resource name>"
AZURE_OPENAI_API_KEY="<your Azure OpenAI key>"
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT="<name of your embeddings deployment>"
AZURE_OPENAI_CHAT_DEPLOYMENT="<name of your chat deployment>"

NEO4J_URI="neo4j://localhost:7687"
NEO4J_USER="<neo4j user name>"
NEO4J_PASSWORD="<neo4j user password>"
NEO4J_DATABASE="<name of your neo4j database>"
```

> Note: I did my testing using text-embedding-ada-002 for embeddings and gpt-4o for the chat service

In [None]:
DotEnv.Load();

var envVars = DotEnv.Read();

AzureOpenAIClient client = new(new Uri(envVars["AZURE_OPENAI_ENDPOINT"]), 
    new AzureKeyCredential(envVars["AZURE_OPENAI_API_KEY"]));

var embeddings = envVars["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT"];
var llm = envVars["AZURE_OPENAI_CHAT_DEPLOYMENT"];
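If a key is missing from `.env`, the first `envVars[...]` lookup throws a `KeyNotFoundException` that doesn't say much about the cause. A minimal sketch of failing fast instead, assuming `envVars` behaves like an `IDictionary<string, string>`; `FindMissingKeys` is a hypothetical helper (not part of dotenv.net), and `sample` stands in for `envVars` so the block runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the required keys that are missing or blank.
static List<string> FindMissingKeys(IDictionary<string, string> vars, IEnumerable<string> required) =>
    required.Where(k => !vars.TryGetValue(k, out var v) || string.IsNullOrWhiteSpace(v)).ToList();

// Keys read by the notebook's code cells (from the .env template above).
string[] requiredKeys =
{
    "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_RESOURCE", "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT", "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD"
};

// Stand-in for envVars so the sketch runs on its own (made-up values).
var sample = new Dictionary<string, string>
{
    ["AZURE_OPENAI_ENDPOINT"] = "https://example.openai.azure.com/",
    ["NEO4J_URI"] = "neo4j://localhost:7687",
    ["NEO4J_USER"] = ""
};

var missing = FindMissingKeys(sample, requiredKeys);
Console.WriteLine($"Missing: {string.Join(", ", missing)}");
```

In the notebook you would call `FindMissingKeys(envVars, requiredKeys)` and stop if the returned list is not empty.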

## Neo4j connection

I've been running this with Neo4j Desktop, but there are other ways to run it. Please check out their [Installation Page](https://neo4j.com/docs/operations-manual/current/installation/) for more information.

Once you have a Neo4j database running, make sure its connection information is saved in the .env file mentioned earlier before running this next step.

In [None]:
IAuthToken token = AuthTokens.Basic(
                envVars["NEO4J_USER"],
                envVars["NEO4J_PASSWORD"]
            );
IDriver driver = GraphDatabase.Driver(envVars["NEO4J_URI"], token);

QueryConfig config = new QueryConfig();

## Ingestion

The ingestion phase is broken up into the following steps, which should make it easier to experiment with each one:
1. Define the data structures used when extracting the entities and populating the Neo4j database
2. Call the LLM to extract entities
3. Process the results into a unique list of entities and their relationships
4. Generate the Cypher to populate Neo4j
5. Populate the Neo4j database
6. Create and populate the vector and full-text indexes

## Declare the data structures and utility methods

In [None]:
public record DocunentMetadata(string id, string source);
public record ChunkMetadata(string id, string name, int sequence, string documentId, string text);
public record TripletRow(string head, string head_type, string relation, string tail, string tail_type);
public class EntityMetadata
{
    public string name { get; set; }
    public string type { get; set; }
    public string id { get; set; }
    public string text { get; set; }
    public Dictionary<string, ChunkMetadata> mentionedInChunks {get; set;} = new Dictionary<string, ChunkMetadata>();
}

public class Utilities
{    
    public static EntityMetadata PopulateEntityMetadata(ChunkMetadata chunkMetadata, TripletRow triplet, EntityMetadata entityMetadata, bool isHead = true)
    {
        entityMetadata.id = Guid.NewGuid().ToString("N");

        if (isHead)
        {
            entityMetadata.name = CreateName(triplet.head);
            entityMetadata.type = triplet.head_type;
            entityMetadata.text = triplet.head;
        }
        else
        {
            entityMetadata.name = CreateName(triplet.tail);
            entityMetadata.type = triplet.tail_type;
            entityMetadata.text = triplet.tail;
        }

        entityMetadata.mentionedInChunks.Add(chunkMetadata.id, chunkMetadata);
        
        return entityMetadata;
    }

    public static string CreateName(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        // Split the text into words
        string[] words = text.Split(new[] { ' ', '-', '_' }, StringSplitOptions.RemoveEmptyEntries);

        StringBuilder nameText = new StringBuilder();
        
        foreach (string word in words)
        {
            // Lowercase the word, prefixing "_" if it starts with a digit (Cypher variables can't start with a digit)
            var lword = word;
            if (char.IsDigit(word[0]))
            {
                lword = "_" + word;
            }

            nameText.Append(lword.ToLower());
        }
        return Regex.Replace(nameText.ToString(), "[^a-zA-Z0-9_]", "");
    }
    
    public static List<string> SplitPlainTextOnEmptyLine(string[] lines)
    {
        List<string> result = new List<string>();
        StringBuilder paragraphBuilder = new StringBuilder();

        foreach (string input in lines)
        {
            // An empty line ends the current paragraph; skip runs of empty lines
            if (string.IsNullOrWhiteSpace(input))
            {
                if (paragraphBuilder.Length > 0)
                {
                    result.Add(paragraphBuilder.ToString().Trim());
                    paragraphBuilder.Clear();
                }
                continue;
            }
            paragraphBuilder.Append($"{input} ");
        }

        // The last paragraph may not be followed by an empty line
        if (paragraphBuilder.Length > 0)
        {
            result.Add(paragraphBuilder.ToString().Trim());
        }

        return result;
    }
}
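Splitting on empty lines is what turns each blog summary into its own chunk. A minimal standalone sketch of that idea, assuming paragraphs are separated by one or more blank (or whitespace-only) lines; `SplitOnBlankLines` is hypothetical and, unlike `Utilities.SplitPlainTextOnEmptyLine`, keeps the line breaks inside each paragraph instead of joining the lines with spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups consecutive non-blank lines into paragraphs.
static List<string> SplitOnBlankLines(IEnumerable<string> lines)
{
    var paragraphs = new List<string>();
    var current = new StringBuilder();

    foreach (var line in lines)
    {
        if (string.IsNullOrWhiteSpace(line))
        {
            // A blank line ends the current paragraph (if there is one).
            if (current.Length > 0)
            {
                paragraphs.Add(current.ToString());
                current.Clear();
            }
            continue;
        }
        if (current.Length > 0)
            current.Append('\n'); // keep the original line break
        current.Append(line);
    }

    // The last paragraph may not be followed by a blank line.
    if (current.Length > 0)
        paragraphs.Add(current.ToString());

    return paragraphs;
}

// Made-up sample in the same shape as data/summaries.txt.
string[] sampleLines =
{
    "Title:\tPost One", "Author:\tJason", "",
    "", "Title:\tPost Two", "Author:\tJason"
};
var chunksSketch = SplitOnBlankLines(sampleLines);
Console.WriteLine(chunksSketch.Count); // 2
```

Keeping the line breaks preserves the `Title:`/`Author:` layout in the chunk text; whether that helps extraction is worth testing against your own data.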

## Entity extraction

In this step the LLM takes each chunk of text and extracts up to `maxTripletsPerChunk` (20) knowledge triplets (head, relation, tail) from it.

Steps include:
* Chunk the data into individual summaries (this was changed from the initial notebook)
* Provide some default entity and relation types for the prompt to use in directing the LLM when extracting entities (extraction works best if you customize these to match your data file's contents)
* Loop through all the chunks, calling the LLM for each chunk **(Warning: this can get expensive, so change the `paragraphs.Count` limit to 1 or 2 until you are happy with your results)**
* Parse each JSON result from the LLM calls and keep the `chunks` variable for later post-processing

In [None]:
ChatClient chatClient = client.GetChatClient(llm);
string fileName = "data/summaries.txt";
string fileText = File.ReadAllText(fileName);

DocunentMetadata documentMetatdata = new (Guid.NewGuid().ToString("N"), fileName);

var simpleLines = File.ReadAllLines(documentMetatdata.source);
var paragraphs = Utilities.SplitPlainTextOnEmptyLine(simpleLines);

string entityTypes = "BLOG_POST,PRESENTATION,EVENT,ORGANIZATION,PERSON,PLACE,TECHNOLOGY,SOFTWARE_SYSTEM,REVIEW,ACTION";
string relationTypes = "WRITTEN_BY,PRESENTED_BY,PART_OF,LOCATED_IN,LIVES_IN,TRAVELED_TO";

Dictionary<ChunkMetadata, List<TripletRow>> chunks = new Dictionary<ChunkMetadata, List<TripletRow>>();
int maxTripletsPerChunk = 20;
string preamble = "The given text document contains blog entry summaries with a Title, Author, Posted On date, Topics and Summary. Make sure to add the WRITTEN_BY relationship for the author.";
for (int i = 0; i < paragraphs.Count; i++)
{
    string text = paragraphs[i];

    ChunkMetadata chunkMetadata = new (Guid.NewGuid().ToString("N"), $"DocumentChunk{i}", i, documentMetatdata.id, text);

    string prompt = $@"Please extract up to {maxTripletsPerChunk} knowledge triplets from the provided text.
    {preamble}
    Each triplet should be in the form of (head, relation, tail) with their respective types.
    ######################
    ONTOLOGY:
    Entity Types: {entityTypes}
    Relation Types: {relationTypes}
    
    Use these entity types and relation types as a starting point, introduce new types if necessary based on the context.
    
    GUIDELINES:
    - Output in JSON format: [{{""head"": """", ""head_type"": """", ""relation"": """", ""tail"": """", ""tail_type"": """"}}]
    - Use the full form for entities (e.g., 'Artificial Intelligence' instead of 'AI')
    - Keep entities and relation names concise (3-5 words max)
    - Break down complex phrases into multiple triplets
    - Ensure the knowledge graph is coherent and easily understandable
    ######################
    EXAMPLE:
    Text: Jason Haley, chief engineer of Jason Haley Consulting, wrote a new blog post titled 'Study Notes: GraphRAG - Property Graphs' about creating a property graph RAG system using Semantic Kernel. 
    Output:
    [{{""head"": ""Jason Haley"", ""head_type"": ""PERSON"", ""relation"": ""WORKS_FOR"", ""tail"": ""Jason Haley Consulting"", ""tail_type"": ""COMPANY""}},
    {{""head"": ""Study Notes: GraphRAG - Property Graphs"", ""head_type"": ""BLOG_POST"", ""relation"": ""WRITTEN_BY"", ""tail"": ""Jason Haley"", ""tail_type"": ""PERSON""}},
    {{""head"": ""Study Notes: GraphRAG - Property Graphs"", ""head_type"": ""BLOG_POST"", ""relation"": ""TOPIC"", ""tail"": ""Semantic Kernel"", ""tail_type"": ""TECHNOLOGY""}},
    {{""head"": ""property graph RAG system"", ""head_type"": ""SOFTWARE_SYSTEM"", ""relation"": ""USES"", ""tail"": ""Semantic Kernel"", ""tail_type"": ""TECHNOLOGY""}}]
    ######################
    Text: {text}
    ######################
    Output:";

    ChatCompletion completion = chatClient.CompleteChat(
        [
            new UserChatMessage(prompt),
        ]);

    Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");
    List<TripletRow> rows =  JsonSerializer.Deserialize<List<TripletRow>>(completion.Content[0].Text.Replace("```json", "").Replace("```","").Replace("'", "").Trim());
    
    chunks.Add(chunkMetadata, rows);
}

Console.WriteLine($"Number of chunks: {chunks.Count}");
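The extraction loop strips the json code fences with `Replace` and deserializes the rest, so any extra prose around the JSON makes `JsonSerializer.Deserialize` throw and stops the loop. A slightly more defensive sketch, assuming the model's reply contains exactly one JSON array: cut from the first `[` to the last `]` and deserialize that into dictionaries. `ExtractTriplets` is hypothetical; it returns `null` instead of throwing so the caller can decide to retry or skip a chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: pull the JSON array out of an LLM reply and parse it.
static List<Dictionary<string, string>> ExtractTriplets(string reply)
{
    int start = reply.IndexOf('[');
    int end = reply.LastIndexOf(']');
    if (start < 0 || end <= start)
        return null; // no array in the reply

    try
    {
        return JsonSerializer.Deserialize<List<Dictionary<string, string>>>(
            reply.Substring(start, end - start + 1));
    }
    catch (JsonException)
    {
        return null; // malformed JSON
    }
}

// Made-up reply wrapped in a code fence, as models often return it.
string reply = "```json\n[{\"head\": \"Jason Haley\", \"head_type\": \"PERSON\", \"relation\": \"WORKS_FOR\", \"tail\": \"Jason Haley Consulting\", \"tail_type\": \"ORGANIZATION\"}]\n```";
var triplets = ExtractTriplets(reply);
Console.WriteLine(triplets?[0]["tail"]); // Jason Haley Consulting
```

If you adopt this, keep the apostrophe handling from the original cell (or escape string literals properly), since the Cypher generation step relies on entity text not containing `'`.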

Loop through the LLM results and create a dictionary of the entities. To create a relation from each entity to the document chunk(s) it was extracted from, we also keep a `mentionedInChunks` dictionary on each entity, since one entity can be mentioned in many chunks.

In [None]:
Dictionary<string,EntityMetadata> entities = new Dictionary<string,EntityMetadata>();

foreach (ChunkMetadata key in chunks.Keys)
{
    List<TripletRow> triplets = chunks[key];
    foreach (var triplet in triplets)
    {
        EntityMetadata entity;
        string pcHead = Utilities.CreateName(triplet.head);
        if (entities.ContainsKey(pcHead)) 
        {
            entity = entities[pcHead];
            if (!entity.mentionedInChunks.ContainsKey(key.id))
            {
                entity.mentionedInChunks.Add(key.id, key);
            }
        }
        else
        {
            entity = new EntityMetadata();   
            entities.Add(pcHead, Utilities.PopulateEntityMetadata(key, triplet, entity, true));
        }      

        string pcTail = Utilities.CreateName(triplet.tail);
        if (entities.ContainsKey(pcTail)) 
        {
            entity = entities[pcTail];
            if (!entity.mentionedInChunks.ContainsKey(key.id))
            {
                entity.mentionedInChunks.Add(key.id, key);
            }
        }
        else
        {
            entity = new EntityMetadata();   
            entities.Add(pcTail, Utilities.PopulateEntityMetadata(key, triplet, entity, false));
        }
    }
}

Console.WriteLine($"Unique entity count: {entities.Count}");
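The deduplication above hinges on `Utilities.CreateName`: any two spellings that normalize to the same string become one entity, and that string is also used as the Cypher variable later. The sketch below is a standalone copy of the same rules (split on spaces, hyphens, and underscores; lowercase; prefix `_` when a word starts with a digit; drop any other character) so you can try inputs quickly; `ToVariableName` is a hypothetical name for the copy.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical standalone copy of Utilities.CreateName for experimentation.
static string ToVariableName(string text)
{
    if (string.IsNullOrEmpty(text))
        return text;

    var name = new StringBuilder();
    foreach (var word in text.Split(new[] { ' ', '-', '_' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // Lowercase each word; a word starting with a digit gets a "_" prefix.
        name.Append(char.IsDigit(word[0]) ? "_" + word.ToLower() : word.ToLower());
    }
    // Drop anything that is not a letter, digit, or underscore.
    return Regex.Replace(name.ToString(), "[^a-zA-Z0-9_]", "");
}

Console.WriteLine(ToVariableName("Semantic Kernel"));   // semantickernel
Console.WriteLine(ToVariableName("semantic-kernel"));   // semantickernel
Console.WriteLine(ToVariableName("2024 Review"));       // _2024review
Console.WriteLine(ToVariableName("Study Notes: GraphRAG - Property Graphs")); // studynotesgraphragpropertygraphs
Console.WriteLine(ToVariableName("(2024)"));            // 2024
```

Note the last case: `(2024)` starts with `(` rather than a digit, so it comes out as `2024`, which is not a valid unquoted Cypher variable. That is one place to look if a generated statement fails.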

If you want to see the entities and list of which document chunks they were extracted from, you can run the following:

In [None]:
foreach(var key in entities.Keys)
{
    var e = entities[key];
    Console.WriteLine($"{key} Mentioned In {e.mentionedInChunks.Count} chunks");
}

This step generates the Cypher that populates Neo4j with the document, its chunks, and the entities extracted by the LLM.

The results of this step look something like this:
```cypher
MERGE (Document1:DOCUMENT { id: '54e9916c99ef4459ae8eabb227a5c341', name:'Document1', type:'DOCUMENT', source: 'data/summaries.txt'})
MERGE (DocumentChunk0:DOCUMENT_CHUNK { id: '8dcf15992ced4c8ba77e9dd6f9372241', name: 'DocumentChunk0', type: 'DOCUMENT_CHUNK', documentId: '54e9916c99ef4459ae8eabb227a5c341', sequence: '0', text: "Title:		(Personal Update) Learning AI
Author:		Jason 
Posted On:	Thursday, January 18, 2024
Topics:		AI, Learning, Azure, Personal Update
Summary:	This is the first of many blog posts I plan to make this year, stay tuned (please subscribe) for more soon. Learning AI Currently I am working my way through the four stages of competence with the topic of AI. This quarter (Q1 of 2024), I’m currently working on moving from stage 2 to stage 3 in the four stages of competence. For reference, those stages are: Unconscious incompetence Conscious incompetence Conscious competence Unconscious competence Last year I moved from stage 1 to stage 2: In the beginning of last year (2023) I had my head buried in the sand while all the other leaders in my industry were actively learning how to use the latest and greatest AI tool (ChatGPT).

Title:		RAG Demo Chronicles Author:		Jason
Posted On:	Wednesday, February 7, 2024
Topics:		AI, Learning, RAG, RAG Demo Series
"})
MERGE (learningai:ENTITY { name: 'learningai', type: 'BLOG_POST', id: '1226ef1f38a04c05b13f1f794089cd3e', text: 'Learning AI'})
MERGE (learningai)-[:MENTIONED_IN]->(DocumentChunk0)
MERGE (jason:ENTITY { name: 'jason', type: 'PERSON', id: '16b00aa6e21e4f30a3ed9577bbf65aba', text: 'Jason'})
MERGE (jason)-[:MENTIONED_IN]->(DocumentChunk0)
...
```

In [None]:

List<string> entityCypherText = new List<string>(); // Document, DocumentChunk and Entity

entityCypherText.Add($"MERGE (Document1:DOCUMENT {{ id: '{documentMetatdata.id}', name:'Document1', type:'DOCUMENT', source: '{documentMetatdata.source}'}})"); 

foreach (var chunk in chunks.Keys)
{
    entityCypherText.Add($"MERGE (DocumentChunk{chunk.sequence}:DOCUMENT_CHUNK {{ id: '{chunk.id}', name: '{chunk.name}', type: 'DOCUMENT_CHUNK', documentId: '{chunk.documentId}', sequence: '{chunk.sequence}', text: \"{chunk.text.Replace("\"", "'")}\"}})");
    entityCypherText.Add($"MERGE (Document1)-[:CONTAINS]->(DocumentChunk{chunk.sequence})");
}

HashSet<string> types = new HashSet<string>();
foreach(var entity in entities.Keys)
{
    var labels = entities[entity];
    var pcEntity = entity;

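    // Skip entities with an empty type (seen in some LLM output) or an empty name,
    // since the name is used as the Cypher variable
    if (string.IsNullOrEmpty(labels.type) || string.IsNullOrEmpty(pcEntity))
    {
        continue;
    }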
    entityCypherText.Add($"MERGE ({pcEntity}:ENTITY {{ name: '{pcEntity}', type: '{labels.type}', id: '{labels.id}', text: '{labels.text}'}})");

    if (!types.Contains(labels.type))
    {
        types.Add(labels.type);
    }

    foreach(var key in labels.mentionedInChunks.Keys)
    {
        var documentChunk = labels.mentionedInChunks[key];
        entityCypherText.Add($"MERGE ({pcEntity})-[:MENTIONED_IN]->(DocumentChunk{documentChunk.sequence})");
    }
}

HashSet<string> relationships = new HashSet<string>();
foreach (ChunkMetadata key in chunks.Keys)
{
    List<TripletRow> triplets = chunks[key];
    foreach (var triplet in triplets)
    {
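        var pcHead = Utilities.CreateName(triplet.head);
        var pcTail = Utilities.CreateName(triplet.tail);

        // Skip triplets whose head or tail was not emitted as a node above (empty name or empty type);
        // otherwise the MERGE would reference an undefined Cypher variable
        if (string.IsNullOrEmpty(pcHead) || string.IsNullOrEmpty(pcTail) ||
            !entities.TryGetValue(pcHead, out var headEntity) || string.IsNullOrEmpty(headEntity.type) ||
            !entities.TryGetValue(pcTail, out var tailEntity) || string.IsNullOrEmpty(tailEntity.type))
        {
            continue;
        }

        // Relationship types are written unquoted, so keep only letters, digits and underscores
        var relationName = Regex.Replace(triplet.relation ?? "", "[^A-Za-z0-9_]", "_");
        if (string.IsNullOrEmpty(relationName))
        {
            relationName = "RELATED_TO";
        }
        entityCypherText.Add($"MERGE ({pcHead})-[:{relationName}]->({pcTail})");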

        string headRelationship = $"MERGE (DocumentChunk{key.sequence})-[:MENTIONS]->({pcHead})";
        if (!relationships.Contains(headRelationship))
        {
            relationships.Add(headRelationship);
            entityCypherText.Add(headRelationship);
        }
        
        string tailRelationship = $"MERGE (DocumentChunk{key.sequence})-[:MENTIONS]->({pcTail})";
        if (!relationships.Contains(tailRelationship))
        {
            relationships.Add(tailRelationship);
            entityCypherText.Add(tailRelationship);
        }
    }
}
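The generation cell keeps the statements valid by replacing double quotes in chunk text with single quotes and (during extraction) deleting apostrophes from the LLM output. Cypher string literals also accept backslash escapes, and identifiers that aren't plain words can be wrapped in backticks (a backtick inside the name is doubled). A minimal sketch of that alternative, assuming you would rather preserve the original characters; `CypherString` and `CypherIdentifier` are hypothetical helpers.

```csharp
using System;

// Hypothetical helper: single-quoted Cypher string literal with backslash escapes.
// Backslashes are escaped first so the added escapes aren't doubled.
static string CypherString(string value) =>
    "'" + value.Replace("\\", "\\\\").Replace("'", "\\'") + "'";

// Hypothetical helper: backtick-quoted identifier (variable, label or relationship type).
static string CypherIdentifier(string name) =>
    "`" + name.Replace("`", "``") + "`";

string text = "Jason's post about \"Semantic Kernel\"";
string relation = "WORKS FOR";

Console.WriteLine($"MERGE (a:ENTITY {{ text: {CypherString(text)} }})");
Console.WriteLine($"MERGE (a)-[:{CypherIdentifier(relation)}]->(b)");
```

For values (as opposed to identifiers), query parameters, like the ones the embedding cells pass with `WithParameters`, avoid escaping altogether.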

If you want to see all the Cypher, you can run this next block:

In [None]:
foreach(var t in entityCypherText)
{
    Console.WriteLine(t);
}

If you want to see the unique list of entity types you can run this:

In [None]:
foreach(var t in types)
{
    Console.WriteLine(t);
}

## Populate the graph db


If you need to debug the Cypher text being passed to Neo4j, the earlier block prints every statement; this next block prints how many statements were generated. I had to debug some characters and duplicates getting through the logic when testing. I fixed the bugs I found, but there may be more.

In [None]:
Console.WriteLine(entityCypherText.Count);

Populate Neo4j with the generated cypher text.

In [None]:

StringBuilder all = new StringBuilder();
all.AppendJoin(Environment.NewLine, entityCypherText.ToArray());
await driver.ExecutableQuery(all.ToString()).WithConfig(config).ExecuteAsync();


I have enabled two plugins in Neo4j; the one you'll need for the following steps is GenAI, which provides the `genai.vector.encode` function used to create the embeddings.

Create a vector index on the DOCUMENT_CHUNK embedding field:

In [None]:
string createVectorIndex = @"CREATE VECTOR INDEX CHUNK_EMBEDDING IF NOT EXISTS
                            FOR (c:DOCUMENT_CHUNK) ON c.embedding
                            OPTIONS {indexConfig: {
                                `vector.dimensions`: 1536,
                                `vector.similarity_function`: 'cosine'
                            }}";

await driver.ExecutableQuery(createVectorIndex).WithConfig(config).ExecuteAsync();
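`vector.similarity_function: 'cosine'` tells the index to rank nodes by the angle between embeddings, and `vector.dimensions` has to match the embedding model (1536 for text-embedding-ada-002, which I used). A minimal sketch of the underlying calculation for two vectors of the same length; the `score` Neo4j returns from a vector query may be a normalized form of this value, so check the Neo4j vector index documentation before comparing the two directly.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
static double Cosine(float[] a, float[] b)
{
    // Like the index, require both vectors to have the same dimension.
    if (a.Length != b.Length)
        throw new ArgumentException($"Dimension mismatch: {a.Length} vs {b.Length}");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0)
        return 0; // treat a zero vector as unrelated
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 1, 0 }));  // 1
Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 0, 1 }));  // 0
Console.WriteLine(Cosine(new float[] { 2, 0 }, new float[] { -3, 0 })); // -1
```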

Populate the vector index by generating an embedding from each DOCUMENT_CHUNK's text field:

In [None]:
string populateEmbeddings = $@"
                            MATCH (n:DOCUMENT_CHUNK) WHERE n.text IS NOT NULL
                            WITH n, genai.vector.encode(
                                n.text,
                                'AzureOpenAI',
                                {{
                                    token: $token,
                                    resource: $resource,
                                    deployment: $deployment
                                }}) AS vector
                            CALL db.create.setNodeVectorProperty(n, 'embedding', vector)
                            ";
await driver.ExecutableQuery(populateEmbeddings)
    .WithParameters(new() { 
        {"token", envVars["AZURE_OPENAI_API_KEY"]}, 
        {"resource", envVars["AZURE_OPENAI_RESOURCE"]}, 
        {"deployment", envVars["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT"]}})
    .WithConfig(config)
    .ExecuteAsync();

Add a vector index on the ENTITY embedding field:

In [None]:
string createEntityVectorIndex = @"CREATE VECTOR INDEX TEXT_EMBEDDING IF NOT EXISTS
                                    FOR (e:ENTITY) ON e.embedding
                                    OPTIONS {indexConfig: {
                                        `vector.dimensions`: 1536,
                                        `vector.similarity_function`: 'cosine'
                                    }}";

await driver.ExecutableQuery(createEntityVectorIndex).WithConfig(config).ExecuteAsync();

Populate the ENTITY vector index:

In [None]:
string populateEntityEmbeddings = $@"
                            MATCH (n:ENTITY) WHERE n.text IS NOT NULL
                            WITH n, genai.vector.encode(
                                n.text,
                                'AzureOpenAI',
                                {{
                                    token: $token,
                                    resource: $resource,
                                    deployment: $deployment
                                }}) AS vector
                            CALL db.create.setNodeVectorProperty(n, 'embedding', vector)
                            ";
await driver.ExecutableQuery(populateEntityEmbeddings)
    .WithParameters(new() { 
        {"token", envVars["AZURE_OPENAI_API_KEY"]}, 
        {"resource", envVars["AZURE_OPENAI_RESOURCE"]}, 
        {"deployment", envVars["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT"]}})
    .WithConfig(config)
    .ExecuteAsync();

Create a full-text index on the entity's text field:

In [None]:

string createFulltextIndex = @"CREATE FULLTEXT INDEX ENTITY_TEXT IF NOT EXISTS 
                                FOR (n:ENTITY) ON EACH [n.text]";
await driver.ExecutableQuery(createFulltextIndex).WithConfig(config).ExecuteAsync();

Now if you open the Neo4j Browser for your database and run this query, you should see the entities and relationships:

```cypher
MATCH (n) RETURN (n)
```

![Summaries.txt Entities and Relations](./images/summaries-entities-relations.jpg)

## Retrieval

Now that we have a populated graph database, we get to decide which retrieval steps to include to provide useful graph data to the RAG workflow.

This notebook uses these steps:
1. Capture the user's input
2. Make a call to the LLM to get a keyword that sums up the user's request (this is a change from the first notebook)
3. (Optionally) do a full-text search on entities
4. Do a vector search on the entity text (the cell below embeds the full question text; pass the keyword from #2 instead if you want to search on it)
5. Deduplicate the triplets found in step 4
6. Do vector similarity search on the document chunks


In [None]:
//string questionText = "what are the blog post titles that are about Semantic Kernel?";
string questionText = "How many blog post did Jason write about Semantic Kernel and what are their titles?";


## Keyword extractor

In [None]:
ChatClient chatClient = client.GetChatClient(llm);

string prompt = $@"
Given a user question, pick or use 1 to 3 words to create a keyword that captures what the user is asking for.

QUERY: {questionText}
######################
KEYWORDS:
";
ChatCompletion completion = chatClient.CompleteChat(
    [
        new UserChatMessage(prompt),
    ]);

Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");
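The full-text cell below splits the model's reply on `~`, but the prompt above asks for a single keyword and never mentions a separator, so in practice you get one term, sometimes wrapped in quotes or followed by a newline. A minimal sketch of normalizing the reply, assuming you might later ask for several keywords separated by `~` or commas; `ParseKeywords` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helper: split a keyword reply on '~', ',' or newlines and clean up each term.
static string[] ParseKeywords(string reply) =>
    reply.Split(new[] { '~', ',', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
         .Select(k => k.Trim().Trim('"', '\'', '*')) // strip quotes and markdown bold markers
         .Where(k => k.Length > 0)
         .Distinct(StringComparer.OrdinalIgnoreCase)
         .ToArray();

var keywords = ParseKeywords("\"Semantic Kernel\" ~ blog posts\n");
Console.WriteLine(string.Join(" | ", keywords)); // Semantic Kernel | blog posts
```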

Data structure for search results and scores:

In [None]:
public record FulltextResult(string text, double score);

This is just one approach to getting additional information from the graph. 

If you want to do a full-text search on the entity text and get the related entities, you can do that with the following:

In [None]:
var synonyms = completion.Content[0].Text.Split("~");

var uniqueNodes = new HashSet<FulltextResult>();
foreach(var synonym in synonyms)
{
    var keyword = synonym.Trim();
    if (keyword.Length == 0)
    {
        continue;
    }
    Console.WriteLine(keyword);

    // Pass the keyword as a parameter so quotes in the LLM output can't break the query.
    // startNode/endNode print each relationship in its stored direction.
    string cypher = @"
                        CALL db.index.fulltext.queryNodes(""ENTITY_TEXT"", $query)
                        YIELD node AS e1, score
                        MATCH (e1)-[r]-(e2:ENTITY)
                        RETURN '(' + COALESCE(startNode(r).text,'') + ')-[:' + COALESCE(type(r),'') + ']->(' + COALESCE(endNode(r).text,'') + ')' as triplet, score
                    ";

    var textSearchResult = await driver.ExecutableQuery(cypher)
                    .WithParameters(new() { {"query", keyword} })
                    .WithConfig(config)
                    .ExecuteAsync();

    foreach(var r in textSearchResult.Result)
    {
        var tripletText = $"{r["triplet"]}";
        var fullTextResult = new FulltextResult(tripletText, Convert.ToDouble(r["score"]));
        if (uniqueNodes.Add(fullTextResult))
        {
            Console.WriteLine($"{fullTextResult.text} {fullTextResult.score}");
        }
    }
}

Console.WriteLine("");
Console.WriteLine($"{uniqueNodes.Count} Unique nodes with matches:");
foreach(var key in uniqueNodes)
{
    Console.WriteLine($"{key}");
}
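Whether the keyword is spliced into the query text or passed as a parameter, `db.index.fulltext.queryNodes` interprets it with Lucene query syntax, so a keyword containing characters such as `:`, `?` or `(` can change the query's meaning or fail to parse. A minimal sketch of backslash-escaping those characters first; the character list follows Lucene's classic query parser syntax, and `EscapeLucene` is a hypothetical helper. Uppercase words such as `AND` and `OR` are still treated as operators; this sketch does not handle them.

```csharp
using System;
using System.Text;

// Hypothetical helper: backslash-escape Lucene query syntax characters.
static string EscapeLucene(string text)
{
    const string special = "\\+-!():^[]\"{}~*?|&/";
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        if (special.IndexOf(c) >= 0)
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(EscapeLucene("GraphRAG: Property Graphs?")); // GraphRAG\: Property Graphs\?
```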

Perform a vector search on the entity embeddings (this cell embeds the full question text):

In [None]:
string question = $@"
                    WITH genai.vector.encode(
                            $question,
                            'AzureOpenAI',
                            {{
                                token: $token,
                                resource: $resource,
                                deployment: $deployment
                            }}) AS question_embedding
                        CALL db.index.vector.queryNodes(
                            'TEXT_EMBEDDING',
                            $top_k, 
                            question_embedding
                            ) 
                        YIELD node AS e1, score
                        MATCH (e1)-[r]-(e2:ENTITY)-[r2:MENTIONED_IN]->(dc)
                        RETURN '(' + COALESCE(startNode(r).text,'') + ')-[:' + COALESCE(type(r),'') + ']->(' + COALESCE(endNode(r).text,'') + ')' as triplet, dc.text as t, score
                    ";

var chunkResult = await driver.ExecutableQuery(question)
                .WithParameters(new() { 
                    {"question", questionText},
                    {"token", envVars["AZURE_OPENAI_API_KEY"]}, 
                    {"resource", envVars["AZURE_OPENAI_RESOURCE"]}, 
                    {"deployment", envVars["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT"]},
                    {"top_k", 5}})
                .WithConfig(config)
                .ExecuteAsync();

var uniqueNodes = new HashSet<FulltextResult>();
if (chunkResult.Result.Count() > 0)
{
    foreach(var r in chunkResult.Result)
    {
        var tripletText = $"{r["triplet"]}";
        var fullTextResult = new FulltextResult(tripletText, Convert.ToDouble(r["score"]));
        if (!uniqueNodes.Contains(fullTextResult))
        {
            uniqueNodes.Add(fullTextResult);
            Console.WriteLine($"{fullTextResult.text} {fullTextResult.score}");
        }   
    }
}
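The `HashSet<FulltextResult>` deduplicates on the whole record, so if the same triplet text comes back with different scores (for example, from different keywords in the full-text search), it is kept once per score. A minimal sketch, assuming you want one entry per triplet text with its best score, ordered best first; it uses tuples instead of `FulltextResult` so it runs standalone, and the sample triplets are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up (text, score) pairs standing in for the query results.
var results = new List<(string Text, double Score)>
{
    ("(Learning AI)-[:WRITTEN_BY]->(Jason)", 0.91),
    ("(Learning AI)-[:WRITTEN_BY]->(Jason)", 0.87),
    ("(Learning AI)-[:TOPIC]->(AI)", 0.80),
};

// Keep the highest score per triplet text, best first.
var best = results
    .GroupBy(r => r.Text)
    .Select(g => (Text: g.Key, Score: g.Max(r => r.Score)))
    .OrderByDescending(r => r.Score)
    .ToList();

foreach (var (text, score) in best)
    Console.WriteLine($"{text} {score}");
```

Ordering by score also makes the later `Take(50)` in the graph RAG prompt keep the strongest matches rather than an arbitrary subset.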

In this step, we perform the typical RAG functionality - a vector similarity search on the document chunk text:

In [None]:
string question = $@"
                    WITH genai.vector.encode(
                        $question,
                        'AzureOpenAI',
                        {{
                            token: $token,
                            resource: $resource,
                            deployment: $deployment
                        }}) AS question_embedding
                    CALL db.index.vector.queryNodes(
                        'CHUNK_EMBEDDING',
                        $top_k, 
                        question_embedding
                        ) YIELD node AS chunk, score 
                    RETURN chunk.id, chunk.text, score
                    ";

var chunkResult = await driver.ExecutableQuery(question)
                .WithParameters(new() { 
                    {"question", questionText},
                    {"token", envVars["AZURE_OPENAI_API_KEY"]}, 
                    {"resource", envVars["AZURE_OPENAI_RESOURCE"]}, 
                    {"deployment", envVars["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT"]},
                    {"top_k", 5}})
                .WithConfig(config)
                .ExecuteAsync();

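To add the chunk results to the LLM request, I format each result as a `Document: { text: ... }` line in `chunkTexts`. The first statement in the next cell also prints the raw vector search results as indented JSON so you can inspect them: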

In [None]:
Console.WriteLine(JsonSerializer.Serialize(chunkResult, new JsonSerializerOptions {
             WriteIndented = true
         }));

StringBuilder chunkTexts = new StringBuilder();
foreach(var r in chunkResult.Result)
{
    chunkTexts.AppendLine($"Document: {{ text: {r["chunk.text"].ToString()} }}");
}

Console.WriteLine(chunkTexts.ToString());
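Every retrieved chunk goes into the prompt, so with a larger `top_k` or longer chunks the context can grow quickly. A minimal sketch, assuming a rough character budget is an acceptable stand-in for a token count (the notebook references Microsoft.ML.Tokenizers, which you could use for an exact count instead): append `Document: { text: ... }` lines in score order and stop before exceeding the budget. `BuildContext` and the budget values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: add chunks (assumed already sorted by score) until the budget is reached.
static string BuildContext(IEnumerable<string> chunkTexts, int maxChars)
{
    var sb = new StringBuilder();
    foreach (var text in chunkTexts)
    {
        string line = $"Document: {{ text: {text} }}";
        if (sb.Length + line.Length + Environment.NewLine.Length > maxChars)
            break; // stop rather than cut a chunk in half
        sb.AppendLine(line);
    }
    return sb.ToString();
}

var chunksInScoreOrder = new[] { "First summary", "Second summary", "Third summary" };
string ctx = BuildContext(chunksInScoreOrder, maxChars: 80);
Console.Write(ctx);
```

Stopping at a whole chunk keeps each document intact for the LLM, at the cost of sometimes leaving part of the budget unused.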

## Perform the typical RAG request (no entity or relation information)

In [None]:
ChatClient chatClient = client.GetChatClient(llm);

string context = $@"Unstructured data:
{chunkTexts.ToString()}
";

string prompt = $@"Answer the question based only on the following context:
			    {context}
                ######################
                Question: {questionText}
                ######################
                Answer:";

string sysprompt = @"Be brief in your answers.
                    Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
                    For tabular information return it as an html table. Do not return markdown format. If the question is not in English, answer in the language used in the question.";

ChatCompletion completion = chatClient.CompleteChat(
    [
        new SystemChatMessage(sysprompt),
        new UserChatMessage(prompt),
    ]);

Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");

## Perform the graph RAG request (with entity or relation information)

In [None]:
ChatClient chatClient = client.GetChatClient(llm);

string context = $@"
######################
Structured data:
{string.Join(Environment.NewLine, uniqueNodes.Select(c => c.text).Take(50).ToArray())}
######################
Unstructured data:
{chunkTexts.ToString()}
";

string prompt = $@"
To plan the response, begin by examining the Neo4j entity relations and their structured data to determine if the answer is present within. Follow these steps:

1. Analyze the provided Neo4j entity relations and their structured data:
   - Look at the nodes, relationships, and properties in the graph.
   - Identify the entities and their connections relevant to the question.
2. Identify relevant information:
   - Extract data points and relationships that are pertinent to the question.
   - Consider how these relationships influence the answer.
3. Synthesize the identified information:
   - Combine the extracted information logically.
   - Formulate a coherent and comprehensive response.

Here is an example to guide the process:

######################
Example:
(Semantic Kernel)-[:TOPIC]->(Blog Post Title 1)
(Semantic Kernel)-[:HAS_TOPIC]->(Blog Post Title 2)
(Semantic Kernel)-[:INCLUDES_TOPIC]->(Blog Post Title 3)

Question:
What blog posts are about Semantic Kernel?

Answer:
Blog Post Title 1, Blog Post Title 2 and Blog Post Title 3 are about Semantic Kernel
######################
Answer the question based solely on the following context:
{context}

######################
Question: {questionText}
######################
Answer:";

string sysprompt = @"Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
                    For tabular information return it as an html table. Do not return markdown format. If the question is not in English, answer in the language used in the question.";

ChatCompletion completion = chatClient.CompleteChat(
    [
        new SystemChatMessage(sysprompt),
        new UserChatMessage(prompt),
    ]);

Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");

See the prompt that was sent to the LLM:

In [None]:

Console.WriteLine(sysprompt);
Console.WriteLine("######################");
Console.WriteLine(prompt);