# SAP HANA Cloud Knowledge Graph Engine

>[SAP HANA Cloud Knowledge Graph](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-knowledge-graph-guide/sap-hana-cloud-sap-hana-database-knowledge-graph-engine-guide) is a fully integrated knowledge graph solution within the `SAP HANA Cloud` database.
>
>This example demonstrates how to build a QA (Question-Answering) chain that queries [Resource Description Framework (RDF)](https://en.wikipedia.org/wiki/Resource_Description_Framework) data stored in an `SAP HANA Cloud` instance using the `SPARQL` query language, and returns a human-readable response.
>
>[SPARQL](https://en.wikipedia.org/wiki/SPARQL) is the standard query language for querying `RDF` graphs.


## Setup & Installation

**Prerequisites**:

- An SAP HANA Cloud instance with the **triple store** feature enabled. For detailed instructions, see [Enable Triple Store](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-knowledge-graph-guide/enable-triple-store/).
- The `kgdocu_movies` example data loaded into that instance. See [Knowledge Graph Example](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-knowledge-graph-guide/knowledge-graph-example).

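To use the SAP HANA Knowledge Graph Engine and/or Vector Store Engine with LangChain, install the `langchain-hana` package (for example, with `pip install langchain-hana`), then confirm that it imports and check its version: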

In [None]:
import langchain_hana
print(langchain_hana.__version__)

First, create a connection to your SAP HANA Cloud instance.

In [None]:
import os

from dotenv import load_dotenv
from hdbcli import dbapi

# Load environment variables if needed
load_dotenv()

# Establish connection to SAP HANA Cloud
connection = dbapi.connect(
    address=os.environ.get("HANADB_URL"),
    port=os.environ.get("HANADB_PRT"),
    user=os.environ.get("HANADB_USR"),
    password=os.environ.get("HANADB_PWD"),
    autocommit=True,
    sslValidateCertificate=False,
)
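
The connection cell above reads four environment variables and passes them straight to `dbapi.connect`. The following is a minimal sketch, assuming you want to fail early when a variable is missing and to pass the port as an integer. The helper name `read_hana_settings` is hypothetical and is not part of `hdbcli` or `langchain-hana`; only the variable names come from the cell above.

```python
import os


def read_hana_settings(env=None):
    """Collect the connection settings from an environment mapping.

    Hypothetical helper for illustration; only the variable names are
    taken from the connection cell.
    """
    env = os.environ if env is None else env
    names = {
        "address": "HANADB_URL",
        "port": "HANADB_PRT",
        "user": "HANADB_USR",
        "password": "HANADB_PWD",
    }
    # Report every missing or empty variable at once
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    settings = {key: env[var] for key, var in names.items()}
    settings["port"] = int(settings["port"])  # environment values are strings
    return settings


# Example with placeholder values (not real credentials)
example = read_hana_settings(
    {
        "HANADB_URL": "example.hanacloud.ondemand.com",
        "HANADB_PRT": "443",
        "HANADB_USR": "DEMO_USER",
        "HANADB_PWD": "change-me",
    }
)
print(example["address"], example["port"])
```

The resulting dictionary could then be unpacked into the call, for example `dbapi.connect(**read_hana_settings(), autocommit=True)`.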

## Initialize the `HanaRdfGraph`

You need a `HanaRdfGraph` instance that:

1. Loads your ontology schema (in Turtle)  
2. Executes SPARQL queries against your SAP HANA Cloud data graph  

The constructor requires:

- **`connection`**: an active connection created with `hdbcli.dbapi.connect(...)`  
- **`graph_uri`**: the named graph (or `"DEFAULT"`) where your RDF data lives  
- **One of**:  
  1. **`ontology_query`**: a SPARQL CONSTRUCT query that extracts the schema triples  
  2. **`ontology_uri`**: a hosted ontology graph URI  
  3. **`ontology_local_file`** + **`ontology_local_file_format`**: a local Turtle/RDF file  
  4. **`auto_extract_ontology=True`** (not recommended for production; see note)

### `graph_uri` vs. Ontology

- **`graph_uri`**:  
  The named graph in your SAP HANA Cloud instance that contains your instance data (sometimes 100k+ triples).
  If `None` or `"DEFAULT"` is provided, the default graph is used.  
  ➔ More details: [Default Graph and Named Graphs](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-knowledge-graph-guide/default-graph-and-named-graphs)
- **Ontology**: a lean schema (typically ~50-100 triples) describing classes, properties, domains, ranges, labels, comments, and subclass relationships. The ontology guides SPARQL generation and result interpretation.
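
To make the third constructor option concrete, here is a minimal sketch that writes a small Turtle ontology to a local file and passes it to the constructor through `ontology_local_file` and `ontology_local_file_format`. The class names `kg:Actor` and `kg:Film`, the file name, the format value `"turtle"`, and the helpers `write_ontology` and `build_graph` are assumptions for illustration. Only the `kg:` namespace and the `acted_in` and `title` properties are taken from this example's SPARQL query.

```python
import tempfile
from pathlib import Path

# Illustrative ontology; the class names are assumptions, not taken from the data set.
ONTOLOGY_TTL = """\
@prefix kg:   <http://kg.demo.sap.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

kg:Actor a owl:Class ; rdfs:label "Actor" .
kg:Film  a owl:Class ; rdfs:label "Film" .

kg:acted_in a owl:ObjectProperty ;
    rdfs:domain kg:Actor ;
    rdfs:range  kg:Film ;
    rdfs:label  "acted in" .

kg:title a owl:DatatypeProperty ;
    rdfs:domain kg:Film ;
    rdfs:range  xsd:string ;
    rdfs:label  "title" .
"""


def write_ontology(directory):
    """Write the Turtle text to <directory>/movies_ontology.ttl and return the path."""
    path = Path(directory) / "movies_ontology.ttl"
    path.write_text(ONTOLOGY_TTL, encoding="utf-8")
    return path


def build_graph(connection, ontology_path):
    """Create a HanaRdfGraph from a local ontology file (needs a live connection)."""
    from langchain_hana import HanaRdfGraph  # imported lazily; requires langchain-hana

    return HanaRdfGraph(
        connection=connection,
        graph_uri="kgdocu_movies",
        ontology_local_file=str(ontology_path),
        ontology_local_file_format="turtle",  # assumed format name for Turtle files
    )


ontology_path = write_ontology(tempfile.mkdtemp())
print(ontology_path.name)
```

With a live connection, `build_graph(connection, ontology_path)` would return a graph whose schema comes from this file rather than from automatic extraction.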


## Example: Question Answering over a “Movies” Knowledge Graph

Below we’ll:

1. Instantiate the `HanaRdfGraph` pointing at our “movies” data graph

In [None]:
from langchain_hana import HanaRdfGraph

In [None]:
# Set up the Knowledge Graph
graph_uri = "kgdocu_movies"

graph = HanaRdfGraph(
    connection=connection, graph_uri=graph_uri, auto_extract_ontology=True
)

In [None]:
# A basic schema is extracted from the data graph. It guides the LLM
# when generating SPARQL queries.
schema_ttl = graph.get_schema
print(schema_ttl)

In [None]:
import rdflib
from rdflib.tools.rdf2dot import rdf2dot
import io
import networkx as nx
import matplotlib.pyplot as plt
from IPython.display import Markdown
import pandas as pd

In [None]:
# Parse into an RDFLib graph
g = rdflib.Graph()
g.parse(data=graph.get_schema, format="turtle")

# Build a NetworkX graph
G = nx.DiGraph()

# Add one edge per property that declares both a domain and a range.
# Object and datatype properties are handled the same way.
for prop_type in (rdflib.OWL.ObjectProperty, rdflib.OWL.DatatypeProperty):
    for prop in g.subjects(rdflib.RDF.type, prop_type):
        domain = g.value(prop, rdflib.RDFS.domain)
        range_ = g.value(prop, rdflib.RDFS.range)
        label = g.value(prop, rdflib.RDFS.label) or prop.split("/")[-1]
        if domain and range_:
            # Use the last path segment of each URI as the node name
            d_label = domain.split("/")[-1]
            r_label = range_.split("/")[-1]
            # add_edge also creates the two nodes if they do not exist yet
            G.add_edge(d_label, r_label, label=str(label))

# Draw using Matplotlib
# pos = nx.spring_layout(G)
# pos = nx.kamada_kawai_layout(G, weight=None)
pos = nx.circular_layout(G)
# pos = nx.spectral_layout(G)

nx.draw(G, pos, with_labels=True, node_size=2000)
edge_labels = nx.get_edge_attributes(G, 'label')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.title("Ontology Schema Graph")
# plt.tight_layout()
plt.show()

## Executing SPARQL Queries

You can use the `query()` method to execute arbitrary SPARQL queries (`SELECT`, `ASK`, `CONSTRUCT`, etc.) on the data graph.  
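
As a sketch of a non-`SELECT` query, the snippet below builds an `ASK` query that checks whether any resource has a given title. The helper names `sparql_string` and `build_title_check` are hypothetical, and the title is a placeholder. The `kg:title` property and the `kgdocu_movies` graph are the ones used elsewhere in this example. The format in which `query()` returns an `ASK` result is not covered here.

```python
def sparql_string(value):
    """Quote a Python string as a SPARQL string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_title_check(title, graph_uri="kgdocu_movies"):
    """Return an ASK query that tests whether any resource has the given kg:title."""
    return (
        "PREFIX kg: <http://kg.demo.sap.com/>\n"
        "ASK\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"    ?movie kg:title {sparql_string(title)} .\n"
        "}"
    )


ask_query = build_title_check('Some "Placeholder" Title')
print(ask_query)
# With a live graph: result = graph.query(ask_query)
```

Escaping the title before embedding it keeps a quotation mark in user input from terminating the literal early.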


The following query retrieves the 10 movies with the highest number of actors:

In [None]:
query = """
PREFIX kg: <http://kg.demo.sap.com/>
SELECT ?movieTitle (COUNT(?actor) AS ?actorCount)

FROM <kgdocu_movies>
WHERE {
    ?actor kg:acted_in ?movie .
    ?movie kg:title ?movieTitle .
}
GROUP BY ?movieTitle
ORDER BY DESC(?actorCount)
LIMIT 10
"""
# The result is handled as CSV text and loaded into a DataFrame for display
top10 = graph.query(query)
pd.read_csv(io.StringIO(top10))
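
The cell above treats the `SELECT` result as CSV text, which is why it is wrapped in `io.StringIO` before being handed to pandas. A minimal sketch of the same parsing step using only the standard library follows. `parse_csv_result` is a hypothetical helper, and the sample text is placeholder data, not actual query output.

```python
import csv
import io


def parse_csv_result(csv_text):
    """Parse CSV text with a header row into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Placeholder rows with the same column names as the query above
sample = "movieTitle,actorCount\nPlaceholder Film A,12\nPlaceholder Film B,9\n"

rows = parse_csv_result(sample)
for row in rows:
    # CSV values arrive as strings, so convert the count explicitly
    print(row["movieTitle"], int(row["actorCount"]))
```

With a live graph, `parse_csv_result(graph.query(query))` would yield one dictionary per result row, assuming the result is returned as CSV text with a header line.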