# Knowledge Graph Query Engine

Creating a Knowledge Graph usually involves specialized and complex tasks. However, by combining an LLM with LlamaIndex's KnowledgeGraphIndex and a GraphStore, we can build a reasonably effective Knowledge Graph from any data source supported by [Llama Hub](https://llamahub.ai/).

Querying a Knowledge Graph also usually requires knowledge specific to the storage system, such as the Cypher query language. With the help of an LLM and the LlamaIndex KnowledgeGraphQueryEngine, the same queries can be asked in natural language!

In this demonstration, we will guide you through the steps to:

- Extract and Set Up a Knowledge Graph using LlamaIndex
- Query a Knowledge Graph using Cypher
- Query a Knowledge Graph using Natural Language

Let's first do the basic setup for LlamaIndex.

In [1]:
# For OpenAI
import os
import subprocess

openai_api = subprocess.run(['pass', 'show', 'api/tokens/openai'], stdout=subprocess.PIPE)
os.environ["OPENAI_API_KEY"] = openai_api.stdout.decode('utf-8').rstrip('\n')

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output

from llama_index import (
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.llms import OpenAI

from IPython.display import Markdown, display


# define LLM
# NOTE: at the time of demo, text-davinci-002 did not have rate-limit errors
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.


In [None]:
# For Azure OpenAI
import os
import json
import openai
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index import LangchainEmbedding
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
)

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.llms import LangChainLLM

import logging
import sys

from IPython.display import Markdown, display

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

openai.api_type = "azure"
openai.api_base = "INSERT AZURE API BASE"
openai.api_version = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
openai.api_key = os.getenv("OPENAI_API_KEY")

lc_llm = AzureOpenAI(
    deployment_name="INSERT DEPLOYMENT NAME",
    temperature=0,
    openai_api_version=openai.api_version,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
    },
)
llm = LangChainLLM(lc_llm)

# You need to deploy your own embedding model as well as your own chat completion model
embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model="text-embedding-ada-002",
        deployment="INSERT DEPLOYMENT NAME",
        openai_api_key=openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
        openai_api_version=openai.api_version,
    ),
    embed_batch_size=1,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embedding_llm,
)

## Prepare for NebulaGraph

Before moving on to creating the Knowledge Graph, let's make sure we have a running NebulaGraph instance with the data schema defined.

In [2]:
# Create a NebulaGraph (version 3.5.0 or newer) cluster with:
# Option 0 for machines with Docker: `curl -fsSL nebula-up.siwei.io/install.sh | bash`
# Option 1 for Desktop: NebulaGraph Docker Extension https://hub.docker.com/extensions/weygu/nebulagraph-dd-ext

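# If the space does not exist yet, create it with the following commands from NebulaGraph's console
# (the space name must match `space_name` below; `llamaindex` is only an example):
# CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);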
# :sleep 10;
# USE llamaindex;
# CREATE TAG entity(name string);
# CREATE EDGE relationship(relationship string);
# :sleep 10;
# CREATE TAG INDEX entity_index ON entity(name(256));

%pip install ipython-ngql nebula3-python

os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"  # default is "nebula"
# host:port of the NebulaGraph instance; use 127.0.0.1:9669 for a local install
os.environ["NEBULA_ADDRESS"] = "10.180.146.77:9669"

space_name = "military"
# defaults, can be omitted when creating from an empty KG
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]  # default, can be omitted when creating from an empty KG

Note: you may need to restart the kernel to use updated packages.
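
The console commands in the comments above hard-code the space name `llamaindex`, while the cell uses `space_name = "military"`. As a minimal sketch, the same statements can be generated for any space name. The helper name `nebula_schema_statements` is hypothetical and not part of LlamaIndex or NebulaGraph; it only formats strings to paste into the NebulaGraph console.

```python
from typing import List


def nebula_schema_statements(space_name: str) -> List[str]:
    """Return the console statements from the comments above for one space.

    Hypothetical helper: it only builds strings and does not talk to NebulaGraph.
    """
    return [
        f"CREATE SPACE {space_name}"
        "(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);",
        ":sleep 10;",  # console command: wait for the space to become available
        f"USE {space_name};",
        "CREATE TAG entity(name string);",
        "CREATE EDGE relationship(relationship string);",
        ":sleep 10;",  # console command: wait for the schema to propagate
        "CREATE TAG INDEX entity_index ON entity(name(256));",
    ]


for statement in nebula_schema_statements("military"):
    print(statement)
```

Generating the statements from `space_name` keeps the schema and the `NebulaGraphStore` configuration pointing at the same space.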


Prepare the StorageContext with NebulaGraphStore as the graph_store:

In [3]:
graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
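
`NebulaGraphStore` is constructed above without credentials, so it presumably picks up the connection settings from the `NEBULA_USER`, `NEBULA_PASSWORD`, and `NEBULA_ADDRESS` environment variables set earlier. A small check like the following sketch can catch a missing or malformed variable before the store is created. The function `read_nebula_env` is a hypothetical helper, not a LlamaIndex API.

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED_VARS = ("NEBULA_USER", "NEBULA_PASSWORD", "NEBULA_ADDRESS")


def read_nebula_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str, int]:
    """Validate the NEBULA_* variables and return (user, password, host, port)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Split on the last colon so the port is separated from the host.
    host, sep, port = env["NEBULA_ADDRESS"].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("NEBULA_ADDRESS must look like 'host:port'")
    return env["NEBULA_USER"], env["NEBULA_PASSWORD"], host, int(port)


# Example with explicit values instead of the real environment.
print(
    read_nebula_env(
        {
            "NEBULA_USER": "root",
            "NEBULA_PASSWORD": "nebula",
            "NEBULA_ADDRESS": "127.0.0.1:9669",
        }
    )
)
```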

## (Optional) Build the Knowledge Graph with LlamaIndex

With LlamaIndex and the LLM defined above, we can build a Knowledge Graph from the given documents.

If we already have a Knowledge Graph in the NebulaGraphStore, this step can be skipped.

### Step 1, load data from Wikipedia for "Shenyang J-16"

In [4]:
# from llama_index import download_loader

# WikipediaReader = download_loader("WikipediaReader")
from llama_index import WikipediaReader
loader = WikipediaReader()

documents = loader.load_data(
    pages=["Shenyang J-16"], auto_suggest=False
)


for document in documents:
    print(document.text)

歼-16（又称J-16，代号“潜龙”）是中国沈阳飞机公司研发的一款4.5代重型多用途战机，由殲-11BS戰鬥機發展而來。該機2012年在網站上首次亮相，並被視為是2000年購入的蘇30自產版。2013年加拿大《汉和防务评论》称，首批24架歼-16战斗机已经下线，被目擊在巴丹吉林沙漠基地測試。俄罗斯《观点报》称，沈阳飞机公司已經向记者展示新型歼-16战斗机，飞机已經涂上中国海军航空兵的标准涂装。2017年7月30日，5机密集编队的歼16机群作为歼击机梯队的一部分参加了庆祝中国人民解放军建军90周年阅兵，央视解说词提到该机型“大幅提升了电子战能力”。


== 概述 ==


=== 研製需求 ===
隨著世界軍事科技的發展，中国人民解放军空军上世紀研發的歼轰-7系列戰轟機，在日益複雜的作戰環境中，空战自卫能力差的缺點顯得越來越突出，难以达到要求的現代多用途作战的要求。因此積極從俄羅斯引進Su-30MKK多用途战斗机。J-16是瀋陽飛機公司以J-11BS和Su-30MKK为蓝本研制的四代半雙座多用途战斗机，裝備自動電子掃描相控陣雷達並具備同時攻擊多個目標並識別目標的能力。J-16於2011年10月17日首飛，性能號稱接近F-15E，無論是載彈量還是火控系統性能都大幅提升。从网上流传的照片可以看出，歼-16全部装备了中国生产的“太行”渦扇-10发动机。。


== 基本性能 ==


=== 全面作戰能力 ===
殲-16是從殲-11B系列上發展而來的第4.5代多用途雙座戰機，殲-16最大特點是具備遠距離超視距攻擊能力和強大的對地、對海打擊能力。殲-16戰鬥機和蘇30MKK一樣採用雙座佈局，為強化武器掛載能力，起落架前輪為雙輪。垂尾頂端與蘇30MKK不同，而是類似於原版蘇27，垂尾頂端切尖。翼尖掛架與殲11B不同，印度曾評價與自家的蘇30MKI相比，新的復合材料機身，發動機推力接近AL-31等，可能更勝一籌，只有航電打平。
该型战机武器研发主要依賴使用蘇-30MKK的經驗，對其不足之處加以升級，新型殲-16將用于加强利用中國生產的反舰导弹打击水面舰只的能力。俄罗斯媒体称与歼轰-7相比较歼-16的机体更大，最大载弹量12吨，可以发射鹰击-62和鹰击-83反舰导弹。J-16装备自动电子扫描相控阵雷达，可与多目标作战。空空作戰方面，央視展示透露一次實際驅離行動中，掛載了「霹雳1

### Step 2, Generate a KnowledgeGraphIndex with NebulaGraph as graph_store

Then, we will create a KnowledgeGraphIndex to enable graph-based RAG; see [here](https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.html) for details. Apart from that, we also get a Knowledge Graph up and running for other purposes!

In [5]:
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

(歼-16, 又称, J-16)
(歼-16, 代号, 潜龙)
(歼-16, 是, 中国沈阳飞机公司研发的一款4.5代重型多用途战机)
(歼-16, 由, 殲-11BS戰鬥機發展而來)
(歼-16, 該機2012年在網站上首次亮相)
(歼-16, 並被視為是2000年購入的蘇30自產版)
(歼-16, 2013年加拿大《汉和防务评论》称)
(歼-16, 首批24架歼-16战斗机已经下线)
(歼-16, 被目擊在巴丹吉林沙漠基地測試)
(歼-16, 俄罗斯《观点报》称)
(歼-16, 沈阳飞机公司已經向记者展示新型歼-16战斗机)
(歼-16, 飞机已經涂上中国海军航空兵的标准涂装)
(歼轰-7系列戰轟機, 在日益複雜的作戰環境中, 空战自卫能力差的缺點顯得越來越突出)
(歼轰-7系列戰轟機, 难以达到要求的現代多用途作戰的要求, 因此積極從俄羅斯引進Su-30MKK多用途战斗机)
(J-16, is, fighter jet)
(J-16, is, multirole)
(J-16, is, twin-engine)
(J-16, is, long-range)
(J-16, is, all-weather)
(J-16, is, capable of attacking land and sea targets)
(殲-16戰鬥機, 採用雙座佈局, 強化武器掛載能力)
(殲-16戰鬥機, 為, 蘇30MKK)
(殲-16戰鬥機, 垂尾頂端與蘇30MKK不同, 類似於原版蘇27)
(殲-16戰鬥機, 垂尾頂端, 切尖)
(殲-16戰鬥機, 翼尖掛架與殲11B不同, 印度曾評價與自家的蘇30MKI相比)
(殲-16戰鬥機, 新的復合材料機身, 發動機推力接近AL-31等)
(殲-16戰鬥機, 只有航電打平, 說明)
(殲-16戰鬥機, 武器研发主要依賴使用蘇-30MKK的經驗, 對其不足之處加以升級)
(殲-16戰鬥機, 新型殲-16, 用于加强利用中國生產的反舰导弹打击水面舰只的能力)
(俄罗斯媒体, 称, 歼轰-7)
(歼轰-7, 比较, 歼-16)
(歼-16, 机体, 更大)
(歼-16, 最大载弹量, 12吨)
(歼-16, 可以发射, 鹰击-62)
(歼-16, 可以发射, 鹰击-83)
(歼-16, 装备, 自动电子扫描相控阵雷达)
(歼-16, 可与多
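
The printed lines above are the extracted triplets, mostly in `(subject, predicate, object)` form, though some have only two fields and the last one is cut off. The following sketch, which assumes that printed format and uses a hypothetical `parse_triplet` helper, parses such lines and counts triplets per subject. It also shows that different spellings of the same aircraft (for example `J-16` and `歼-16`) are counted as separate subjects, so they would presumably become separate vertices in the graph.

```python
from collections import Counter
from typing import Optional, Tuple

Triplet = Tuple[str, str, str]


def parse_triplet(line: str) -> Optional[Triplet]:
    """Parse '(subject, predicate, object)'; return None for anything else."""
    line = line.strip()
    if not (line.startswith("(") and line.endswith(")")):
        return None  # truncated or unrelated line
    # Split on the first two commas only, so the object may contain commas.
    parts = [part.strip() for part in line[1:-1].split(",", 2)]
    if len(parts) != 3 or not all(parts):
        return None  # fewer than three fields
    return parts[0], parts[1], parts[2]


# A few lines copied from the output above.
sample = [
    "(J-16, is, fighter jet)",
    "(J-16, is, capable of attacking land and sea targets)",
    "(歼-16, 最大载弹量, 12吨)",
    "(歼-16, 該機2012年在網站上首次亮相)",  # only two fields
    "(歼-16, 可与多",  # truncated
]

triplets = [t for t in map(parse_triplet, sample) if t is not None]
subjects = Counter(subject for subject, _, _ in triplets)
print(len(triplets), dict(subjects))
```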

Now we have a Knowledge Graph about the Shenyang J-16 on the NebulaGraph cluster, under the space named `military`. Let's play with it a little bit.

In [6]:
# install related packages, password is nebula by default
%pip install ipython-ngql networkx pyvis
%load_ext ngql
%ngql --address 10.180.146.77 --port 9669 --user root --password nebula

Note: you may need to restart the kernel to use updated packages.
Connection Pool Created
INFO:nebula3.logger:Get connection to ('10.180.146.77', 9669)


Unnamed: 0,Name
0,llamaindex
1,military


In [7]:
# Query some random Relationships with Cypher
%ngql USE military;
%ngql MATCH ()-[e]->() RETURN e LIMIT 100

INFO:nebula3.logger:Get connection to ('10.180.146.77', 9669)
INFO:nebula3.logger:Get connection to ('10.180.146.77', 9669)


Unnamed: 0,e
0,"(""J-16"")-[:relationship@-6238500775900394739{r..."
1,"(""J-16"")-[:relationship@-4537671700302613492{r..."
2,"(""J-16"")-[:relationship@-1759740491903092871{r..."
3,"(""J-16"")-[:relationship@-1759740491903092871{r..."
4,"(""J-16"")-[:relationship@-1759740491903092871{r..."
...,...
95,"(""歼-16D"")-[:relationship@4762462027576683225{r..."
96,"(""歼-16D"")-[:relationship@4762462027576683225{r..."
97,"(""歼-16D"")-[:relationship@4762462027576683225{r..."
98,"(""歼-16D"")-[:relationship@5796647766990977503{r..."


In [20]:
# draw the result

%ng_draw

<class 'pyvis.network.Network'> |N|=26 |E|=23

## Asking the Knowledge Graph

Finally, let's demo how to query the Knowledge Graph with natural language!

Here, we will leverage the `KnowledgeGraphQueryEngine`, with `NebulaGraphStore` as the `storage_context.graph_store`.

In [8]:
from llama_index.query_engine import KnowledgeGraphQueryEngine

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

In [19]:
response = query_engine.query(
    "Tell me about J-16?",
)
display(Markdown(f"<b>{response}</b>"))

[33;1m[1;3mGraph Store Query:
```
MATCH (e:`entity`)-[:relationship]->(j:`entity`) WHERE e.`entity`.`name` == 'J-16'
RETURN j.`entity`.`name`;
```
[0m[33;1m[1;3mGraph Store Response:
{'j.entity.name': ['2015', "Nancy Pelosi's Taiwan visit", 'Multi Role Fighter Bomber', 'all-weather', 'capable of attacking land and sea targets', 'fighter bomber', 'fighter jet', 'fighter/bomber', 'long-range', 'multirole', 'twin-engine', 'AirForceWorld.com', 'Taiwan Strait', 'China', '2011-2012', '2017']}
[0m[32;1m[1;3mFinal Response: 
The J-16 is a multirole fighter bomber that is all-weather and capable of attacking land and sea targets. It is a twin-engine fighter jet that is long-range and fighter/bomber. It was first seen in 2011-2012 and has been involved in various events such as Nancy Pelosi's Taiwan visit in 2015 and the AirForceWorld.com in 2017.
[0m

<b>
The J-16 is a multirole fighter bomber that is all-weather and capable of attacking land and sea targets. It is a twin-engine fighter jet that is long-range and fighter/bomber. It was first seen in 2011-2012 and has been involved in various events such as Nancy Pelosi's Taiwan visit in 2015 and the AirForceWorld.com in 2017.</b>

In [15]:
graph_query = query_engine.generate_query(
    "Tell me about J-16?",
)

graph_query = graph_query.replace("WHERE", "\n  WHERE").replace("RETURN", "\nRETURN")

display(
    Markdown(
        f"""
```cypher
{graph_query}
```
"""
    )
)


```cypher
MATCH (e:`entity`)-[:relationship]->(j:`entity`) 
  WHERE e.`entity`.`name` == 'J-16'
RETURN j.`entity`.`name`;
```
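
The query text returned by `generate_query` may itself be wrapped in Markdown code fences, as in the verbose `Graph Store Query` output earlier, so wrapping it in another fence for display nests the two. A minimal sketch of stripping the inner fence before display follows. The helper `strip_code_fences` is hypothetical and not part of LlamaIndex.

```python
FENCE = "`" * 3  # three backticks, built this way so this example stays fence-safe


def strip_code_fences(text: str) -> str:
    """Remove one leading and one trailing Markdown fence line, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].strip().startswith(FENCE):
        lines = lines[1:]  # drops the opening fence and any language tag on it
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]  # drops the closing fence
    return "\n".join(lines).strip()


# A generated query wrapped in its own fence, as in the verbose output.
raw = "\n".join(
    [
        FENCE,
        "MATCH (e:`entity`)-[:relationship]->(j:`entity`) WHERE e.`entity`.`name` == 'J-16'",
        "RETURN j.`entity`.`name`;",
        FENCE,
    ]
)
clean = strip_code_fences(raw)
print(clean)
```

Calling this on `graph_query` before the `replace(...)` formatting step would leave a single fenced block in the rendered Markdown.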


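We can see that it helps generate the graph query:

```cypher
MATCH (e:`entity`)-[:relationship]->(j:`entity`) 
  WHERE e.`entity`.`name` == 'J-16'
RETURN j.`entity`.`name`;
```
and synthesizes the answer based on its result:

```json
{'j.entity.name': ['2015', "Nancy Pelosi's Taiwan visit", 'Multi Role Fighter Bomber', 'all-weather', 'capable of attacking land and sea targets', 'fighter bomber', 'fighter jet', 'fighter/bomber', 'long-range', 'multirole', 'twin-engine', 'AirForceWorld.com', 'Taiwan Strait', 'China', '2011-2012', '2017']}
```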
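
The flow described above has three stages: generate a graph query from the question, run it against the graph store, and synthesize an answer from the result. The sketch below mimics that flow without any LLM or database. All three functions are stand-in stubs, not LlamaIndex APIs, and the data is the set of J-16 rows returned by the hand-written query in this notebook.

```python
from typing import List, Tuple

# Rows copied from the J-16 query result shown in this notebook.
TRIPLETS: List[Tuple[str, str, str]] = [
    ("J-16", "entered service in", "2015"),
    ("J-16", "dispatched in response to", "Nancy Pelosi's Taiwan visit"),
    ("J-16", "most frequently used in", "Taiwan Strait"),
    ("J-16", "first flew in", "2011-2012"),
    ("J-16", "officially revealed in", "2017"),
]


def generate_query(entity: str) -> str:
    """Stage 1 (stub): a fixed template standing in for the LLM's text-to-query step."""
    return (
        "MATCH (p:`entity`)-[e:relationship]->(m:`entity`)\n"
        f"  WHERE p.`entity`.`name` == '{entity}'\n"
        "RETURN e.relationship, m.`entity`.`name`;"
    )


def run_query(entity: str) -> List[Tuple[str, str]]:
    """Stage 2 (stub): filter in-memory triplets instead of calling NebulaGraph."""
    return [(rel, obj) for subj, rel, obj in TRIPLETS if subj == entity]


def synthesize(entity: str, rows: List[Tuple[str, str]]) -> str:
    """Stage 3 (stub): join the facts into text, standing in for LLM synthesis."""
    if not rows:
        return f"No facts found for {entity}."
    facts = "; ".join(f"{rel} {obj}" for rel, obj in rows)
    return f"{entity}: {facts}."


entity = "J-16"
print(generate_query(entity))
answer = synthesize(entity, run_query(entity))
print(answer)
```

Note that the stub query returns the relationship text together with the target name. A query that returns only target names gives the synthesis step a bare list of values, which leaves more room for facts to be combined incorrectly.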

Of course, we can still query the graph directly, too! The query engine can then double as a handy bot for learning the graph query language :).

In [16]:
%%ngql 
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'J-16'
RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;

INFO:nebula3.logger:Get connection to ('10.180.146.77', 9669)


Unnamed: 0,p.entity.name,e.relationship,m.entity.name
0,J-16,entered service in,2015
1,J-16,dispatched in response to,Nancy Pelosi's Taiwan visit
2,J-16,most frequently used in,Taiwan Strait
3,J-16,first flew in,2011-2012
4,J-16,officially revealed in,2017


Now change the query to return whole vertices and edges so that the result can be rendered:

In [17]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'J-16'
RETURN p, e, m;

INFO:nebula3.logger:Get connection to ('10.180.146.77', 9669)


Unnamed: 0,p,e,m
0,"(""J-16"" :entity{name: ""J-16""})","(""J-16"")-[:relationship@-6238500775900394739{r...","(""2015"" :entity{name: ""2015""})"
1,"(""J-16"" :entity{name: ""J-16""})","(""J-16"")-[:relationship@-4537671700302613492{r...","(""Nancy Pelosi's Taiwan visit"" :entity{name: ""..."
2,"(""J-16"" :entity{name: ""J-16""})","(""J-16"")-[:relationship@3160896241778998042{re...","(""Taiwan Strait"" :entity{name: ""Taiwan Strait""})"
3,"(""J-16"" :entity{name: ""J-16""})","(""J-16"")-[:relationship@5218579307241849186{re...","(""2011-2012"" :entity{name: ""2011-2012""})"
4,"(""J-16"" :entity{name: ""J-16""})","(""J-16"")-[:relationship@7242383210706219038{re...","(""2017"" :entity{name: ""2017""})"


In [18]:
%ng_draw

<class 'pyvis.network.Network'> |N|=6 |E|=5
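
The `|N|=6 |E|=5` summary printed by `%ng_draw` can be checked by hand against the five rows returned for `J-16`. The sketch below counts distinct vertex names and edges from those rows. The function `graph_size` is a hypothetical helper used only for this illustration.

```python
from typing import Iterable, Tuple


def graph_size(rows: Iterable[Tuple[str, str, str]]) -> Tuple[int, int]:
    """Return (number of distinct vertices, number of edges) for (src, rel, dst) rows."""
    rows = list(rows)
    vertices = {name for src, _, dst in rows for name in (src, dst)}
    return len(vertices), len(rows)


# Rows copied from the tabular query result above.
rows = [
    ("J-16", "entered service in", "2015"),
    ("J-16", "dispatched in response to", "Nancy Pelosi's Taiwan visit"),
    ("J-16", "most frequently used in", "Taiwan Strait"),
    ("J-16", "first flew in", "2011-2012"),
    ("J-16", "officially revealed in", "2017"),
]

n, e = graph_size(rows)
print(f"|N|={n} |E|={e}")  # prints |N|=6 |E|=5
```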

The rendered graph makes the results of this knowledge-fetching query very clear.