Knowledge graphs provide a way to model and store interlinked information in a format that both humans and machines can understand. A knowledge graph consists of nodes and edges, which represent entities and the relationships between them. Compared with traditional relational databases, graphs are more expressive and allow a richer semantic representation. They can also accommodate new entity types and relationships without being constrained by a fixed schema.

By combining knowledge graphs with embeddings (vector search), we can leverage multi-hop connectivity and contextual understanding of information to enhance reasoning and explainability in LLMs.
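To make the multi-hop idea concrete, here is a minimal pure-Python sketch, assuming a tiny hypothetical graph stored as (subject, relation, object) triples. A breadth-first search finds every node reachable within a given number of hops, which is the kind of traversal a graph database performs natively. The names `TRIPLES`, `neighbors` and `within_hops` are illustrative only.

```python
from collections import deque

# Hypothetical toy graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "PUBLISHED", "Paper 1"),
    ("Bob", "PUBLISHED", "Paper 1"),
    ("Bob", "PUBLISHED", "Paper 2"),
    ("Paper 2", "IN_TOPIC", "Robotics"),
]


def neighbors(node, triples):
    """Yield nodes adjacent to `node`, ignoring edge direction."""
    for s, _, o in triples:
        if s == node:
            yield o
        elif o == node:
            yield s


def within_hops(start, max_hops, triples):
    """Breadth-first search: map every node reachable in <= max_hops to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, triples):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


print(within_hops("Alice", 4, TRIPLES))
```

Here `Alice` reaches the `Robotics` topic in four hops (Alice → Paper 1 → Bob → Paper 2 → Robotics), a connection that similarity search over isolated text chunks would not necessarily surface.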

![](https://raw.githubusercontent.com/dcarpintero/generative-ai-101/main/static/knowledge-graphs.png)

In [None]:
!pip install -q neo4j langchain langchain_openai langchain-community python-dotenv

In [12]:
from dotenv import load_dotenv

load_dotenv()

True

In [14]:
import os
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)
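Because every connection in this notebook reads its credentials from environment variables, a missing `.env` entry usually surfaces later as a less obvious driver or API error. A small pre-flight check, sketched below with a hypothetical helper `missing_settings`, lists the unset variables before connecting. The variable names are the ones used throughout this notebook.

```python
import os
from typing import Mapping

# Environment variables read by the cells in this notebook (loaded from .env by load_dotenv()).
REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are unset or empty in `env` (hypothetical helper)."""
    return [key for key in REQUIRED_SETTINGS if not env.get(key)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```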

Loading the dataset into the graph:

In [15]:
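# Reuse the `graph` connection created above. The Cypher statement below reads the
# semicolon-delimited CSV and MERGEs Article, Researcher and Topic nodes and their relationships.
q_load_articles = """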
LOAD CSV WITH HEADERS
FROM 'https://raw.githubusercontent.com/dcarpintero/generative-ai-101/main/dataset/synthetic_articles.csv'
AS row
FIELDTERMINATOR ';'
MERGE (a:Article {title:row.Title})
SET a.abstract = row.Abstract,
    a.publication_date = date(row.Publication_Date)
FOREACH (researcher in split(row.Authors, ',') |
    MERGE (p:Researcher {name:trim(researcher)})
    MERGE (p)-[:PUBLISHED]->(a))
FOREACH (topic in [row.Topic] |
    MERGE (t:Topic {name:trim(topic)})
    MERGE (a)-[:IN_TOPIC]->(t))
"""

graph.query(q_load_articles)

[]
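The `MERGE` clauses are what make this load idempotent: a node or relationship is created only if a matching one does not already exist. The pure-Python sketch below mirrors that logic with sets, assuming a hypothetical two-row sample in the same semicolon-delimited layout. The column names match those referenced by the Cypher statement; the rows themselves are made up.

```python
import csv
import io

# Hypothetical sample in the layout LOAD CSV expects: ';' separates fields,
# and the Authors field holds a comma-separated list of names.
SAMPLE = """Title;Abstract;Publication_Date;Authors;Topic
Paper 1;First abstract.;2023-01-15;Alice, Bob;Robotics
Paper 2;Second abstract.;2023-06-01;Bob;Robotics
"""


def load(text):
    """Mimic the MERGE logic: nodes and edges are sets, so re-adding them is a no-op."""
    articles, researchers, topics = set(), set(), set()
    published, in_topic = set(), set()
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        title = row["Title"]
        articles.add(title)
        for name in row["Authors"].split(","):
            researchers.add(name.strip())  # trim(researcher)
            published.add((name.strip(), title))
        topic = row["Topic"].strip()  # trim(topic)
        topics.add(topic)
        in_topic.add((title, topic))
    return articles, researchers, topics, published, in_topic


articles, researchers, topics, published, in_topic = load(SAMPLE)
print(sorted(researchers), sorted(topics), len(published))
```

Loading the same rows twice yields exactly the same sets, just as re-running the `MERGE` statement does not duplicate nodes or relationships.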

In [16]:
graph.refresh_schema()
print(graph.get_schema)

Node properties:
Article {title: STRING, abstract: STRING, publication_date: DATE}
Researcher {name: STRING}
Topic {name: STRING}
Relationship properties:

The relationships:
(:Article)-[:IN_TOPIC]->(:Topic)
(:Researcher)-[:PUBLISHED]->(:Article)


![](https://raw.githubusercontent.com/dcarpintero/generative-ai-101/main/static/kg_sample_00.png)

In [19]:
# Build a vector index over the Article nodes: each node's title and abstract are
# embedded with OpenAI and the vector is stored in its `embedding` property.
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(api_key=os.getenv("OPENAI_API_KEY")),
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    index_name="articles",
    node_label="Article",
    # Article nodes only have title and abstract; topics live in separate Topic nodes
    text_node_properties=["title", "abstract"],
    embedding_node_property="embedding",
)
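Conceptually, the retriever embeds the question with the same model and returns the `Article` nodes whose stored `embedding` vectors are closest to it. The sketch below shows an exact, brute-force version of that ranking, assuming cosine similarity and made-up 3-dimensional vectors; a Neo4j vector index answers the same kind of query with an approximate nearest-neighbour search instead of scanning every node. `cosine`, `top_k` and `NODE_VECS` are illustrative names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query_vec, node_vecs, k=2):
    """Rank nodes by cosine similarity to the query vector (exact, brute-force search)."""
    scored = [(cosine(query_vec, vec), node) for node, vec in node_vecs.items()]
    return [node for _, node in sorted(scored, reverse=True)[:k]]


# Hypothetical 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
NODE_VECS = {
    "Paper 1": [0.9, 0.1, 0.0],
    "Paper 2": [0.1, 0.9, 0.0],
    "Paper 3": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], NODE_VECS))  # ['Paper 1', 'Paper 3']
```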

Question answering over the vector index:

In [21]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

vector_qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    chain_type="stuff",
    retriever=vector_index.as_retriever()
)
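With `chain_type="stuff"`, all documents returned by the retriever are inserted into a single prompt and answered in one LLM call. A minimal sketch of that step follows; the helper `stuff_prompt` and its template are hypothetical and not LangChain's exact wording.

```python
def stuff_prompt(question, documents):
    """Concatenate all retrieved documents into one context block ("stuffing")."""
    context = "\n\n".join(documents)
    # Hypothetical template; LangChain's built-in prompt is worded differently.
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "title: Paper 1\nabstract: First abstract.",
    "title: Paper 2\nabstract: Second abstract.",
]
print(stuff_prompt("Which papers discuss robotics?", docs))
```

Stuffing is simple and needs a single call, but the combined context must fit in the model's context window, so the number of retrieved documents bounds the prompt size.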

In [22]:
r = vector_qa.invoke(
    {
        "query": "which articles discuss how AI might affect our daily life? include the article titles and abstracts."
    }
)
print(r["result"])

The articles that discuss how AI might affect our daily life are:

1. **The Impact of AI on Employment: A Comprehensive Study**
   - *Abstract*: This study analyzes the potential effects of AI on various job sectors and suggests policy recommendations to mitigate negative impacts.

2. **The Societal Implications of Advanced AI: A Multidisciplinary Analysis**
   - *Abstract*: Our study brings together experts from various fields to analyze the potential long-term impacts of advanced AI on society, economy, and culture.

Unfortunately, the other articles provided do not directly address how AI might affect our daily life.


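Knowledge graphs excel at making connections between entities, which enables the extraction of patterns and the discovery of new insights, for example by following relationships across several hops.

This section demonstrates how to query the graph with natural-language questions: an LLM translates each question into a Cypher query, the query runs against Neo4j, and the results are passed to an LLM that formulates the answer.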
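The control flow of this two-stage pattern can be sketched with injected callables: one call turns the question plus the graph schema into Cypher, the database executes it, and a second call phrases the returned rows as an answer. `graph_qa` and the stub lambdas below are hypothetical stand-ins for the LLMs and the database, not LangChain's implementation.

```python
from typing import Any, Callable


def graph_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], list[dict[str, Any]]],
    answer: Callable[[str, list[dict[str, Any]]], str],
) -> str:
    """Two-stage graph QA: question -> Cypher -> rows -> natural-language answer."""
    cypher = generate_cypher(question, schema)  # stage 1: LLM writes Cypher from the schema
    rows = run_query(cypher)                    # the graph database executes the query
    return answer(question, rows)               # stage 2: LLM phrases the rows as an answer


# Stub callables (hypothetical): a fixed query and a fixed result row.
result = graph_qa(
    "How many articles has Emily Chen published?",
    "(:Researcher)-[:PUBLISHED]->(:Article)",
    generate_cypher=lambda q, s: (
        "MATCH (r:Researcher {name: 'Emily Chen'})-[:PUBLISHED]->(a:Article) "
        "RETURN COUNT(a) AS n"
    ),
    run_query=lambda c: [{"n": 7}],
    answer=lambda q, rows: f"{rows[0]['n']} articles.",
)
print(result)  # 7 articles.
```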

In [25]:
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

graph.refresh_schema()

cypher_chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4o", api_key=os.getenv("OPENAI_API_KEY")),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-4o", api_key=os.getenv("OPENAI_API_KEY")),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True
)
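`allow_dangerous_requests=True` acknowledges that the chain executes whatever Cypher the LLM generates, including statements that could modify or delete data. The safest mitigation is to connect with a database user that only has read permissions. As an extra, admittedly crude layer, a hypothetical keyword check like the one below could reject generated queries that contain write clauses before they run.

```python
import re

# Cypher clauses that write to or alter the database (non-exhaustive, for illustration;
# e.g. procedures invoked via CALL are not covered).
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Conservatively reject queries containing write keywords.

    May also reject harmless queries that mention these words inside string literals.
    """
    return WRITE_CLAUSES.search(cypher) is None


print(is_read_only("MATCH (r:Researcher) RETURN r.name"))  # True
print(is_read_only("MATCH (a:Article) DETACH DELETE a"))   # False
```

A keyword filter is easy to bypass and is no substitute for database-level permissions.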

For example, the question "How many articles has Emily Chen published?" should translate into a Cypher query like this:

```cypher
MATCH (r:Researcher {name: "Emily Chen"})-[:PUBLISHED]->(a:Article)
RETURN COUNT(a) AS numberOfArticles
```
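The query matches every `PUBLISHED` relationship leaving the researcher's node and counts the articles at the other end. The same aggregation over a plain, hypothetical edge list looks like this in Python (`PUBLISHED` and `articles_per_researcher` are illustrative names):

```python
from collections import Counter

# Hypothetical (researcher, article) pairs, i.e. the PUBLISHED relationships.
PUBLISHED = [
    ("Alice", "Paper 1"),
    ("Bob", "Paper 1"),
    ("Alice", "Paper 2"),
]


def articles_per_researcher(edges):
    """Rough equivalent of MATCH (r)-[:PUBLISHED]->(a) RETURN r.name, COUNT(a)."""
    return Counter(researcher for researcher, _ in edges)


print(articles_per_researcher(PUBLISHED)["Alice"])  # 2
```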

In [26]:
cypher_chain.invoke({"query": "How many articles has published Emily Chen?"})

> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (r:Researcher {name: "Emily Chen"})-[:PUBLISHED]->(a:Article)
RETURN COUNT(a) AS numberOfArticles
Full Context:
[{'numberOfArticles': 7}]

> Finished chain.

{'query': 'How many articles has published Emily Chen?',
 'result': 'Emily Chen has published 7 articles.'}

In [27]:
# the answer should be 'David Johnson'
cypher_chain.invoke({"query": "Which researcher has collaborated with the most peers?"})

> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (r:Researcher)-[:PUBLISHED]->(:Article)<-[:PUBLISHED]-(peer:Researcher)
WITH r, COUNT(DISTINCT peer) AS peerCount
RETURN r.name AS researcher, peerCount
ORDER BY peerCount DESC
LIMIT 1
Full Context:
[{'researcher': 'David Johnson', 'peerCount': 6}]

> Finished chain.

{'query': 'Which researcher has collaborated with the most peers?',
 'result': 'David Johnson has collaborated with the most peers, with a peer count of 6.'}

![](https://raw.githubusercontent.com/dcarpintero/generative-ai-101/main/static/kg_sample_03.png)
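The generated query relies on a two-hop pattern: from a researcher to an article and back to the other authors of that article. Because Cypher does not traverse the same relationship twice within one pattern match (and each researcher-article pair has a single `PUBLISHED` relationship), a researcher is never counted as their own peer. A pure-Python sketch of the same computation over a hypothetical edge list:

```python
from collections import defaultdict

# Hypothetical (researcher, article) PUBLISHED pairs.
PUBLISHED = [
    ("Alice", "Paper 1"), ("Bob", "Paper 1"), ("Carol", "Paper 1"),
    ("Bob", "Paper 2"), ("Dan", "Paper 2"),
]


def peer_counts(edges):
    """Count distinct co-authors per researcher, mirroring
    (r)-[:PUBLISHED]->(:Article)<-[:PUBLISHED]-(peer) with COUNT(DISTINCT peer)."""
    authors_by_article = defaultdict(set)
    for researcher, article in edges:
        authors_by_article[article].add(researcher)
    peers = defaultdict(set)
    for authors in authors_by_article.values():
        for researcher in authors:
            peers[researcher] |= authors - {researcher}  # a researcher is not their own peer
    return {researcher: len(p) for researcher, p in peers.items()}


counts = peer_counts(PUBLISHED)
print(max(counts, key=counts.get))  # Bob
```

One difference worth noting: this sketch reports 0 for a researcher who only publishes alone, whereas the Cypher `MATCH` drops such researchers entirely because the pattern requires at least one peer.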