# GraphPromptor Usage Example

This notebook demonstrates the end-to-end workflow of the GraphPromptor framework, from loading a Knowledge Graph to generating an answer using an LLM guided by the graph information.

The workflow follows the "Retrieve → Embed → Reason" methodology, utilizing the `RetrievalModule`, `KnowledgeAdapter`, and `ReasoningModule`.

In [None]:
# Import necessary modules
from lightprof.utils import load_kg_from_triples, get_gemini_tokenizer
from lightprof.retrieval import RetrievalModule
from lightprof.adapter import KnowledgeAdapter, Subgraph # Import Subgraph type alias
from lightprof.reasoning import ReasoningModule

import networkx as nx
from typing import Any  # Only `Any` is used for annotations in this notebook
import torch

## 1. Load Knowledge Graph

Load the Knowledge Graph from a TSV file containing triples (head, relation, tail). The `load_kg_from_triples` function from `lightprof.utils` is used for this purpose.
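
The internals of `load_kg_from_triples` are not shown in this notebook. As a rough illustration of the file format it consumes, the sketch below parses tab-separated triples with the standard library only. The helper names `parse_triples` and `build_adjacency`, and the toy rows, are hypothetical and not part of `lightprof`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse tab-separated (head, relation, tail) rows, skipping malformed ones."""
    triples: List[Triple] = []
    # QUOTE_NONE keeps quote characters inside entity names untouched
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        head, relation, tail = (field.strip() for field in row)
        triples.append((head, relation, tail))
    return triples


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return dict(adjacency)


# Toy data standing in for a real triples file (hypothetical rows)
sample_tsv = io.StringIO(
    "Paris\tcapital_of\tFrance\n"
    "France\tpart_of\tEurope\n"
    "malformed row without tabs\n"
)
sample_triples = parse_triples(sample_tsv)
sample_adjacency = build_adjacency(sample_triples)
print(sample_triples)
print(sample_adjacency)
```

With `networkx`, each parsed triple would typically become a directed edge, for example `graph.add_edge(head, tail, relation=relation)` on an `nx.DiGraph`. Whether `lightprof` stores the relation under that attribute name is an assumption.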

In [None]:
# Define the path to the knowledge graph triples file
kg_filepath: str = 'data/freebase_triples.tsv'

# Load the Knowledge Graph
try:
    kg: nx.DiGraph = load_kg_from_triples(filepath=kg_filepath)
    print(f"Successfully loaded Knowledge Graph from {kg_filepath}")
    print(f"Number of nodes: {kg.number_of_nodes()}")
    print(f"Number of edges: {kg.number_of_edges()}")
except FileNotFoundError:
    print(f"Error: KG file not found at {kg_filepath}. Please ensure the file exists.")
    raise  # Later cells need `kg`, so stop here rather than continue without it
except Exception as e:
    print(f"An error occurred while loading the KG: {e}")
    raise

## 2. Initialize Modules

Initialize the three core modules: `RetrievalModule`, `KnowledgeAdapter`, and `ReasoningModule`.

- The `RetrievalModule` requires the loaded Knowledge Graph plus `hop_predictor`, `entity_linker`, and `path_ranker` components. Placeholders are used for these three when trained models are not available.
- The `KnowledgeAdapter` is initialized with parameters for the text encoder (BERT) and embedding dimensions.
- The `ReasoningModule` is initialized with the LLM model name and a hard prompt template. It requires the `GOOGLE_API_KEY` environment variable to be set in order to use the Gemini model.
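
Because a missing `GOOGLE_API_KEY` would otherwise only surface when the `ReasoningModule` is used, it can help to check for it up front. The sketch below is one way to do that. `require_env` is a hypothetical helper, not part of `lightprof`, and the key value is a dummy used so the example runs without real credentials.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "set it before initializing the ReasoningModule."
        )
    return value


# Demonstration with an explicit mapping instead of the real environment
demo_key = require_env("GOOGLE_API_KEY", {"GOOGLE_API_KEY": "dummy-key"})
print(f"Found a key of length {len(demo_key)}")
```

In the notebook itself the call would be `require_env("GOOGLE_API_KEY")`, which reads from the real environment.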

In [None]:
# Initialize tokenizer (not passed to any of the modules below in this example)
tokenizer: Any = get_gemini_tokenizer()

# Initialize placeholder or dummy components for RetrievalModule
# In a real scenario, these would be trained models
hop_predictor: Any = None  # Replace with your trained hop predictor
entity_linker: Any = None  # Replace with your trained entity linker
path_ranker: Any = None    # Replace with your trained path ranker

# Initialize RetrievalModule
retriever: RetrievalModule = RetrievalModule(
    kg=kg,
    hop_predictor=hop_predictor,
    entity_linker=entity_linker,
    path_ranker=path_ranker
)

# Initialize KnowledgeAdapter
# Parameters like struct_emb_dim and llm_embed_dim should match your training setup
adapter: KnowledgeAdapter = KnowledgeAdapter(
    bert_model_name='bert-base-uncased', # Text encoder model
    struct_emb_dim=128,                 # Structural embedding dimension
    llm_embed_dim=768                   # Target LLM embedding dimension
)

# Initialize ReasoningModule
# Ensure GOOGLE_API_KEY environment variable is set for Gemini models
reasoner: ReasoningModule = ReasoningModule(
    llm_model_name='gemini-2.5-flash-preview-04-17', # LLM model name
    hard_template="Answer the question: {question}\nKnowledge Graph Info: {kg_info}\nAnswer:" # Hard prompt template
)

print("Modules initialized.")
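
The hard template contains two named placeholders, `{question}` and `{kg_info}`. How `ReasoningModule` fills them is not shown here. Assuming it uses Python's standard `str.format`, the substitution would look like the sketch below, where the question and graph text are made-up values.

```python
# Same template string as passed to ReasoningModule above
hard_template = (
    "Answer the question: {question}\n"
    "Knowledge Graph Info: {kg_info}\n"
    "Answer:"
)

# Illustrative values only; the real kg_info comes from the retrieved paths
prompt = hard_template.format(
    question="What is the capital of France?",
    kg_info="(Paris, capital_of, France)",
)
print(prompt)
```

If the template ever needs literal braces, `str.format` requires them to be doubled (`{{` and `}}`).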

## 3. Define Sample Question

Define a sample question that the GraphPromptor will attempt to answer using the Knowledge Graph.

In [None]:
# Define the question
question: str = "Which drugs did Lindsay Lohan abuse?"

print(f"Sample Question: {question}")

## 4. Run Inference Process

Execute the core GraphPromptor workflow:

1.  **Retrieve Paths:** Use the `RetrievalModule` to find relevant paths in the KG based on the question.
2.  **Encode Paths:** Use the `KnowledgeAdapter` to convert the retrieved paths into soft prompt embeddings.
3.  **Generate Answer:** Use the `ReasoningModule` to generate the final answer from the question and the retrieved knowledge. The intended design feeds the soft prompt embeddings to the LLM; the current implementation instead passes the subgraph structure as text.
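
The notebook does not show how `RetrievalModule` works internally. One plausible reading of the component names is that the entity linker picks an anchor entity, the hop predictor picks a maximum path length, and the path ranker scores the candidates. The sketch below covers only the middle part, enumerating candidate paths from an anchor up to a hop limit. `enumerate_paths` and the toy graph are hypothetical and stand in for whatever `lightprof` actually does.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Path = List[Triple]
Adjacency = Dict[str, List[Tuple[str, str]]]


def enumerate_paths(adjacency: Adjacency, anchor: str, max_hops: int) -> List[Path]:
    """Return every cycle-free path of 1..max_hops edges that starts at `anchor`."""
    results: List[Path] = []

    def extend(node: str, path: Path, visited: frozenset) -> None:
        if len(path) == max_hops:
            return  # hop budget exhausted
        for relation, tail in adjacency.get(node, []):
            if tail in visited:
                continue  # do not revisit entities already on this path
            new_path = path + [(node, relation, tail)]
            results.append(new_path)
            extend(tail, new_path, visited | {tail})

    extend(anchor, [], frozenset({anchor}))
    return results


# Toy graph: head entity -> list of (relation, tail) edges
toy_kg: Adjacency = {
    "Paris": [("capital_of", "France"), ("located_in", "Europe")],
    "France": [("part_of", "Europe")],
}
toy_paths = enumerate_paths(toy_kg, anchor="Paris", max_hops=2)
for toy_path in toy_paths:
    print(toy_path)
```

Each result is a list of `(head, relation, tail)` triples, which matches the way the retrieved paths are iterated and printed in this notebook.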

In [None]:
# --- Step 4.1: Retrieve Paths ---
# Use the retriever to find relevant paths in the KG
print("Retrieving paths...")
paths: Subgraph = retriever.retrieve_paths(question=question)

print(f"Retrieved {len(paths)} paths.")
if paths:
    print("Example retrieved path:")
    for triple in paths[0]:
        print(f"  ({triple[0]}, {triple[1]}, {triple[2]})")

# --- Step 4.2: Encode Paths into Soft Prompts ---
# Use the adapter to convert paths into fused embeddings
# Note: The current ReasoningModule implementation uses the subgraph structure string,
# not the actual soft prompt embeddings from the adapter. The adapter call is included
# here to show the intended workflow step, but its output (`soft`) is not directly
# used by the reasoner's `answer` method in its current form.
print("Encoding paths into soft prompts...")
soft: torch.Tensor = adapter(subgraph=paths)

print(f"Generated soft prompt embeddings with shape: {soft.shape}")

# --- Step 4.3: Generate Answer ---
# Use the reasoner to generate the answer
# The reasoner's `answer` method expects the subgraph structure, not the soft embeddings.
# This is a simplification in the current ReasoningModule implementation.
print("Generating answer...")
answer: str = reasoner.answer(question=question, subgraph=paths)

print("Inference process complete.")
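
Since the current `ReasoningModule` consumes the subgraph as a string rather than as embeddings, the paths have to be turned into text at some point. The exact format it uses is not shown here. The sketch below is one simple possibility, with `linearize_paths` and the example paths being hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def linearize_paths(paths: List[List[Triple]]) -> str:
    """Render each path as a ' -> '-joined chain of triples, one path per line."""
    lines = []
    for path in paths:
        lines.append(" -> ".join(f"({h}, {r}, {t})" for h, r, t in path))
    return "\n".join(lines)


# Made-up paths in the same list-of-triples shape as the retrieved paths
example_paths = [
    [("Paris", "capital_of", "France"), ("France", "part_of", "Europe")],
    [("Paris", "located_in", "Europe")],
]
kg_info = linearize_paths(example_paths)
print(kg_info)
```

A string of this kind is what would fill the `{kg_info}` placeholder of the hard prompt template.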

## 5. Print Final Answer

Display the answer generated by the `ReasoningModule`.

In [None]:
# Print the final answer
print("\nFinal Answer:")
print(answer)