# Assessment 2: Building a Knowledge Graph with Semantic Search and Query Answering
Instructions:
- Build a basic RDF knowledge graph using the rdflib library.
- Example: "Alice is a person," "Alice knows Bob," "Bob works at Acme Corp."
- Convert the RDF triples into simple text sentences (e.g., "Alice knows Bob").
- Embed these text sentences using any method you prefer (e.g., sentence-transformers, HuggingFace, or other embedding techniques).
- Store the embeddings in FAISS or ChromaDB.
- Accept a simple user query like: "Who does Alice know?" and return the relevant text from the stored embeddings.

In [1]:
!pip install rdflib sentence-transformers chromadb



In [2]:
# Build basic RDF
from rdflib import Graph, Namespace, RDF

g = Graph()
ex = Namespace("http://example.org/")

g.add((ex.Alice, RDF.type, ex.Person))
g.add((ex.Alice, ex.knows, ex.Bob))
g.add((ex.Bob, ex.worksAt, ex.AcmeCorp))


<Graph identifier=Ncf380227d46f4fec8bdaf4ba29585834 (<class 'rdflib.graph.Graph'>)>

In [3]:
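# Text sentences for the three RDF triples above (written by hand)
sentences = [
    "Alice is a person",       # (ex:Alice, rdf:type, ex:Person)
    "Alice knows Bob",         # (ex:Alice, ex:knows, ex:Bob)
    "Bob works at Acme Corp"   # (ex:Bob, ex:worksAt, ex:AcmeCorp)
]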


In [5]:
# Create sentence embeddings with a pretrained sentence-transformers model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentence_embeddings = model.encode(sentences)  # NumPy array, one row per sentence



In [8]:
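# Store embeddings in ChromaDB
import chromadb

client = chromadb.Client()  # in-memory client; data is lost when the session ends
# get_or_create_collection avoids an error if this cell is run twice
collection = client.get_or_create_collection(name="knowledge_graph1")

# Insert all sentences in one batch; upsert overwrites existing ids on re-runs
collection.upsert(
    ids=[str(i) for i in range(len(sentences))],
    embeddings=sentence_embeddings.tolist(),
    documents=sentences,
)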


In [9]:
# Query
query = "Who does Alice know?"

# Encode the query with the same model used for the stored sentences
query_embedding = model.encode([query])


In [10]:
# Retrieve the stored sentence closest to the query
result = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=1  # return only the single best match
)

print("Answer:", result["documents"][0][0])


Answer: Alice knows Bob
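Conceptually, `collection.query(..., n_results=1)` compares the query vector with every stored vector and returns the closest document; Chroma collections rank by squared L2 distance unless configured otherwise. The brute-force sketch below shows that computation on hypothetical 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions). It is not how Chroma works internally, since Chroma uses an approximate HNSW index, and the helper names `normalize`, `l2_squared`, `cosine` and `nearest` are illustrative.

```python
import math

def normalize(v):
    # Scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_squared(a, b):
    # Squared Euclidean distance (smaller means more similar)
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity (larger means more similar)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec, stored_vecs, documents, n_results=1):
    # Brute force: score every stored vector, return the n closest documents
    ranked = sorted(
        range(len(documents)),
        key=lambda i: l2_squared(query_vec, stored_vecs[i]),
    )
    return [documents[i] for i in ranked[:n_results]]

# Hypothetical toy vectors standing in for model.encode(...) output
documents = ["Alice is a person", "Alice knows Bob", "Bob works at Acme Corp"]
stored = [normalize(v) for v in ([1.0, 0.2, 0.0], [0.6, 1.0, 0.1], [0.0, 0.1, 1.0])]
query_vec = normalize([0.5, 1.0, 0.0])

print(nearest(query_vec, stored, documents))  # ['Alice knows Bob']

# For unit vectors, squared L2 distance = 2 - 2 * cosine, so both rankings agree
for v in stored:
    assert math.isclose(l2_squared(query_vec, v), 2 - 2 * cosine(query_vec, v))
```

Because all-MiniLM-L6-v2 is configured to output unit-length embeddings, ranking by squared L2 distance gives the same order as ranking by cosine similarity, so the default Chroma metric is a reasonable fit for this model.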
