# Knowledge-Graph-Based Retrieval Augmented Generation for FSU Search

#### The problem
Search on fsu.edu sites is fragmented and cumbersome. It reflects none of the recent advances in natural language processing, specifically Large Language Models and their ability to respond to queries intelligently and naturally.

![image.png](attachment:c99493ef-b153-4d79-a795-b4dc7d6f59e4.png)

We propose two main changes:
1. Combine these searches into one, with a single option to limit results to the current site.
2. Use Large Language Models to support conversational, question-and-answer search grounded in information drawn directly from FSU's databases and websites.

With these requirements in mind, we present a possible solution:
**By combining knowledge graphs, an expressive way to represent relational data, with Retrieval Augmented Generation (RAG), a technique for grounding language models in real data, we can let users get information about FSU faculty, staff, departments, buildings, events, classes, and more.**

Let's explore how we do this.
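
As a rough illustration of the pairing described above, the sketch below stores a few facts as (subject, relation, object) triples, retrieves the ones relevant to a question, and places them in front of the question as context for a language model. Every name and fact in it is made up, and the substring matching is a stand-in for real graph queries; it is a toy under those assumptions, not the approach used in the rest of this notebook.

```python
# Toy knowledge graph as (subject, relation, object) triples.
# All names and facts here are invented for illustration.
TRIPLES = [
    ("Jane Doe", "WORKS_IN", "Department of Examples"),
    ("Department of Examples", "LOCATED_IN", "Example Hall"),
    ("Intro to Examples", "TAUGHT_BY", "Jane Doe"),
]


def retrieve_facts(question, triples):
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question, facts):
    """Put the retrieved facts in front of the question for the language model."""
    lines = [f"{s} {r} {o}" for s, r, o in facts]
    return "Facts:\n" + "\n".join(lines) + "\n\nQuestion: " + question


question = "Where does Jane Doe work?"
facts = retrieve_facts(question, TRIPLES)
prompt = build_prompt(question, facts)
print(prompt)
```

The prompt, rather than the bare question, is what would be sent to the language model, so its answer can draw on facts from the graph.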

#### Making the knowledge graph

In [3]:
%pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental neo4j wikipedia tiktoken yfiles_jupyter_graphs pandas python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [4]:
from dotenv import load_dotenv
import os

from langchain_community.graphs import Neo4jGraph

# Warning control
import warnings
warnings.filterwarnings("ignore")

##### Reading in Faculty and Staff Data

In [5]:
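# Load from environment
load_dotenv('.env', override=True)
NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
# NEO4J_DATABASE = os.getenv('AURA_INSTANCEID')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
# Used by the embedding queries later in the notebook. The OPENAI_ENDPOINT
# variable name is an assumption; the default is OpenAI's public embeddings URL.
OPENAI_ENDPOINT = os.getenv('OPENAI_ENDPOINT', 'https://api.openai.com/v1/embeddings')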
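
Because `os.getenv` returns `None` for an unset variable, a missing `.env` entry would otherwise only show up later as a connection or authentication error. The sketch below is one possible early check; the helper name `missing_settings` and the list of required names are illustrative choices, not part of the original notebook.

```python
import os

# Settings the cells in this notebook read from the environment.
REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
```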

##### Initializing the Knowledge Graph

In [6]:
# Connect to the knowledge graph instance using LangChain
kg = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
)

##### Reading in Departments and Services Data

In [7]:
# with open("data/Dept_Address_URL.csv") as depts:
#     for i,dept in enumerate(depts):
#         if i == 0 or i == 1:
#             print(dept)
#         else:
#             pass
        
# with open("data/Fac and Staff Directory.csv") as fsd:
#     for i,fs in enumerate(fsd):
#         if i == 0 or i == 1:
#             print(fs)
#         else:
#             pass
# #Use with open() instead
# # depts = pd.read_csv("data/Dept_Address_URL.csv")
# # fac_staff = pd.read_csv("data/Fac and Staff Directory.csv")
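
Once the CSV files are read, each row has to become nodes and relationships in the graph. The sketch below shows one way that could look, using an in-memory sample in place of the real files. The column names (`name`, `department`, `email`), the `Person` and `Department` labels, and the `WORKS_IN` relationship are all assumptions made for illustration; the actual directory files may be laid out differently.

```python
import csv
import io

# Hypothetical sample standing in for the directory CSV; real column names may differ.
SAMPLE_CSV = """name,department,email
Jane Doe,Department of Examples,jdoe@example.edu
John Roe,Department of Examples,jroe@example.edu
"""

# Cypher that MERGEs a person, a department, and the relationship between them.
# The labels and relationship type are illustrative, not an established schema.
MERGE_PERSON = """
MERGE (p:Person {email: $email})
SET p.name = $name
MERGE (d:Department {name: $department})
MERGE (p)-[:WORKS_IN]->(d)
"""


def rows_from_csv(text):
    """Parse CSV text into a list of dicts with whitespace-trimmed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: value.strip() for key, value in row.items()} for row in reader]


rows = rows_from_csv(SAMPLE_CSV)
for row in rows:
    # With a live connection this would be: kg.query(MERGE_PERSON, params=row)
    print(row)
```

Using `MERGE` rather than `CREATE` means re-running the load should not duplicate people or departments that already exist.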

##### Crawling FSU Websites

In [8]:
import ScrapingUtils as SU

##### Updating Knowledge Graph

In [None]:
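# Create a vector index over the embedding property, if it does not exist yet.
# Note: this index targets Movie nodes and their tagline embeddings, not FSU data.
# 1536 matches the output size of OpenAI's text-embedding-ada-002 embeddings.
kg.query("""
  CREATE VECTOR INDEX movie_tagline_embeddings IF NOT EXISTS
  FOR (m:Movie) ON (m.taglineEmbedding)
  OPTIONS { indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
  }}"""
)

# List the vector indexes to confirm the index was created
kg.query("""
  SHOW VECTOR INDEXES
  """
)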
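
The index above is configured with the `cosine` similarity function. As a reminder of what that measures, here is a minimal pure-Python sketch of cosine similarity between two vectors. It is only an illustration of the formula; the database computes similarity internally, and the score it reports may be scaled differently from the raw cosine value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # vectors pointing the same way
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors
```

Cosine similarity depends only on direction, not magnitude, which is why the first pair scores as identical even though the vectors differ in length.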

#### Retrieval

In [None]:
# Compute an embedding for each tagline and store it on the node
kg.query("""
    MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
    WITH movie, genai.vector.encode(
        movie.tagline,
        "OpenAI",
        {
          token: $openAiApiKey,
          endpoint: $openAiEndpoint
        }) AS vector
    CALL db.create.setNodeVectorProperty(movie, "taglineEmbedding", vector)
    """,
    params={"openAiApiKey": OPENAI_API_KEY, "openAiEndpoint": OPENAI_ENDPOINT})

# Fetch one tagline with its embedding to check that the property was written
result = kg.query("""
    MATCH (m:Movie)
    WHERE m.tagline IS NOT NULL
    RETURN m.tagline, m.taglineEmbedding
    LIMIT 1
    """
)

#### Feeding into a language model

#### Question and answer

In [None]:
question = "What movies are about love?"

In [None]:
# Embed the question, then return the top_k nodes whose stored embeddings
# are most similar to it according to the vector index
kg.query("""
    WITH genai.vector.encode(
        $question,
        "OpenAI",
        {
          token: $openAiApiKey,
          endpoint: $openAiEndpoint
        }) AS question_embedding
    CALL db.index.vector.queryNodes(
        'movie_tagline_embeddings',
        $top_k,
        question_embedding
        ) YIELD node AS movie, score
    RETURN movie.title, movie.tagline, score
    """,
    params={"openAiApiKey": OPENAI_API_KEY,
            "openAiEndpoint": OPENAI_ENDPOINT,
            "question": question,
            "top_k": 5
            })
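
To feed the retrieved rows into a language model, they first need to be turned into prompt text. The sketch below shows one possible way to do that. The helper name `build_context_prompt`, the prompt wording, and the sample rows are all made up for illustration; the row keys mirror the `RETURN` clause of the query above, which LangChain's `Neo4jGraph.query` returns as one dictionary per row.

```python
def build_context_prompt(question, rows):
    """Format retrieved rows as numbered context lines followed by the question."""
    context = "\n".join(
        f"{i}. {row['movie.title']}: {row['movie.tagline']}"
        for i, row in enumerate(rows, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Invented rows in the shape the query above returns (one dict per result row).
sample_rows = [
    {"movie.title": "Example Film", "movie.tagline": "A story about love.", "score": 0.93},
    {"movie.title": "Another Film", "movie.tagline": "Love finds a way.", "score": 0.91},
]
prompt = build_context_prompt("What movies are about love?", sample_rows)
print(prompt)
```

Restricting the model to the supplied context is a common way to keep answers tied to what was actually retrieved, though the exact wording is a matter of experimentation.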