# Multi-Abstraction Triplet Extraction

This notebook demonstrates how to extract multiple knowledge graphs from a single LaTeX proof at different levels of abstraction using LLM prompts.

The goal is to generate:
- **High-level abstract graph**: broad proof steps
- **Mid-level graph**: intermediate details
- **Fine-grained graph**: detailed entities and relations

Each graph will be stored in Neo4j with metadata to distinguish abstraction levels.

---

## 1. Setup

- Ensure Neo4j is running (`docker compose up -d`)
- Configure Neo4j connection and LLM API keys if needed
- Import necessary libraries

Add the project root to the Python path so that the `src` package can be imported from the notebook.

In [9]:
%load_ext autoreload
%autoreload 2
import sys
import os


# Add the project root directory to the Python path
sys.path.append(os.path.abspath("../../.."))


In [10]:
from IPython.display import display, Math, Latex
import re
from src.utils.file_utils import read_proof

# Load LaTeX proof
proof_latex = read_proof(
    "../../data/proofs/english/sum_of_first_n_ints/proof2.tex"
)

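# Find the start and end positions of the document body
begin_marker = r"\begin{document}"
start = proof_latex.find(begin_marker)
end = proof_latex.find(r"\end{document}")
# str.find returns -1 for a missing marker, which would silently produce a wrong slice
if start == -1 or end == -1:
    raise ValueError("proof file has no document body")
start += len(begin_marker)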

# Extract the content between \begin{document} and \end{document}
informal_proof = proof_latex[start:end].strip()

# Convert \section, \subsection, and \title to Markdown headings and drop \maketitle
informal_proof = re.sub(r"\\section\{([^}]+)\}", r"## \1", informal_proof)
informal_proof = re.sub(r"\\subsection\{([^}]+)\}", r"### \1", informal_proof)
informal_proof = re.sub(r"\\title\{([^}]+)\}", r"# \1", informal_proof)
informal_proof = re.sub(r"\\maketitle", "", informal_proof)

# Drop the itemize wrappers and turn bold-labelled \item entries into Markdown list items
informal_proof = re.sub(r"\\begin{itemize}", "", informal_proof)
informal_proof = re.sub(r"\\end{itemize}", "", informal_proof)
informal_proof = re.sub(r"\\item\s+\*\*([^:]+):\*\*", r"- **\1:**", informal_proof)

# Replace \begin{center} ... \end{center} with \[ ... \]
informal_proof = re.sub(
    r"\\begin{center}(.*?)\\end{center}", r"\[\1\]", informal_proof, flags=re.DOTALL
)

# Display the LaTeX content
display(Latex(informal_proof))

<IPython.core.display.Latex object>
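To make the effect of these substitutions concrete, here is a small sketch that applies them to a made-up sample string rather than the real proof file. The sample text is an assumption for illustration only. The patterns mirror the cell above, except that the braces are escaped and the replacement for `center` is written as `r"\\[\1\\]"`, which produces the same `\[ ... \]` output.

```python
import re

# Made-up LaTeX fragment, used only to illustrate the conversions.
sample = r"""\section{Proof}
\begin{itemize}
\item **Step 1:** Write the sum forwards.
\end{itemize}
\begin{center}S(n) = 1 + 2 + \dots + n\end{center}"""

# Heading conversion
converted = re.sub(r"\\section\{([^}]+)\}", r"## \1", sample)

# Remove the list wrappers and convert bold-labelled items
converted = re.sub(r"\\begin\{itemize\}", "", converted)
converted = re.sub(r"\\end\{itemize\}", "", converted)
converted = re.sub(r"\\item\s+\*\*([^:]+):\*\*", r"- **\1:**", converted)

# Turn centered blocks into display math
converted = re.sub(
    r"\\begin\{center\}(.*?)\\end\{center\}", r"\\[\1\\]", converted, flags=re.DOTALL
)

print(converted)
```

Only `\item` entries that start with a bold label ending in a colon are converted, so any other `\item` would pass through unchanged.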

## 2. Define Prompts for Different Abstraction Levels

Each abstraction level has its own prompt constant in `src.phase1.prompts`, alongside the default `TRIPLET_EXTRACTION_PROMPT`. Each prompt instructs the LLM to extract triplets at a different granularity.
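The actual prompt texts live in `src.phase1.prompts` and are not reproduced in this notebook. As a rough illustration of the idea, the following sketch builds per-level prompts from a shared base. All names and wording here (`BASE_PROMPT`, `LEVEL_INSTRUCTIONS`, `build_level_prompt`) are hypothetical and are not the project's real prompts.

```python
# Hypothetical sketch: these strings are NOT the contents of src.phase1.prompts.
BASE_PROMPT = (
    "Extract a knowledge graph from the proof below as entities and relations.\n"
    "{level_instruction}\n\n"
    "Proof:\n{proof}\n"
)

# One instruction per abstraction level (illustrative wording only).
LEVEL_INSTRUCTIONS = {
    "high": "Capture only the broad proof steps.",
    "medium": "Capture the main steps and the intermediate results linking them.",
    "low": "Capture every expression, variable, and relation mentioned.",
}


def build_level_prompt(level: str) -> str:
    """Fill in the level instruction but leave {proof} as a placeholder."""
    return BASE_PROMPT.replace("{level_instruction}", LEVEL_INSTRUCTIONS[level])


print(build_level_prompt("high"))
```

Using `str.replace` for the level instruction keeps the `{proof}` placeholder untouched, so the proof text can be filled in later.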

## 3. Extract Triplets Using `extract_triplets()`

In [12]:
from src.phase1.extract_triplets import extract_triplets
from src.phase1.prompts import (
    TRIPLET_EXTRACTION_PROMPT,
    HIGH_LEVEL_TRIPLET_EXTRACTION_PROMPT,
    MID_LEVEL_TRIPLET_EXTRACTION_PROMPT,
    FINE_GRAINED_TRIPLET_EXTRACTION_PROMPT,
)
# Original prompt
ORIGINAL_PROMPT = TRIPLET_EXTRACTION_PROMPT
original_triplets = extract_triplets(informal_proof, ORIGINAL_PROMPT)

# High level
high_level_triplets = extract_triplets(
    informal_proof, HIGH_LEVEL_TRIPLET_EXTRACTION_PROMPT
)

# Mid level
mid_level_triplets = extract_triplets(
    informal_proof, MID_LEVEL_TRIPLET_EXTRACTION_PROMPT
)

# Fine-grained level
fine_grained_triplets = extract_triplets(
    informal_proof, FINE_GRAINED_TRIPLET_EXTRACTION_PROMPT
)

## 4. Store Triplets in Neo4j with Abstraction Level Metadata

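`store_triplets` takes a level label as its second argument (`original`, `high`, `medium`, or `low`). The queries in Section 5 filter on this label through the `graph_abstraction_level` property of relationships.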

In [None]:
from src.utils.neo4j_utils import Neo4JUtils

# Initialize Neo4JUtils
neo4j = Neo4JUtils("bolt://localhost:7687", ("fanavaran", "fanavaran"))

# Clean the database (delete all nodes and relationships)
neo4j.clean_database()

# Example usage (assuming triplet objects with .entities and .relations)
neo4j.store_triplets(original_triplets, "original")
neo4j.store_triplets(high_level_triplets, "high")
neo4j.store_triplets(mid_level_triplets, "medium")
neo4j.store_triplets(fine_grained_triplets, "low")
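The internals of `Neo4JUtils.store_triplets` are not shown in this notebook. The sketch below is one possible way such a helper could tag each relationship with its level, and it is not the project's implementation. The dataclasses are stand-ins whose fields follow the `Triplet` output printed in the next cell. The Cypher text, the `Entity` and `RELATED` labels, and the `relation_params` helper are assumptions. The sketch only builds the query parameters and does not connect to a database.

```python
from dataclasses import dataclass, field


# Stand-ins for the project's classes; field names follow the printed Triplet output.
@dataclass
class Entity:
    id: str
    name: str
    label: str
    type: str


@dataclass
class Relation:
    source: str
    target: str
    type: str
    name: str


@dataclass
class Triplet:
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


# Hypothetical Cypher; the property name follows the queries in Section 5.
MERGE_RELATION = (
    "MERGE (s:Entity {id: $source, graph_abstraction_level: $level}) "
    "MERGE (t:Entity {id: $target, graph_abstraction_level: $level}) "
    "MERGE (s)-[r:RELATED {type: $type, name: $name, "
    "graph_abstraction_level: $level}]->(t)"
)


def relation_params(triplet, level):
    """Build one parameter dict per relation, each tagged with the level."""
    return [
        {"source": r.source, "target": r.target, "type": r.type,
         "name": r.name, "level": level}
        for r in triplet.relations
    ]
```

Including the level in the node key is a deliberate choice in this sketch: entity ids such as `'1'` are reused by every extraction, so without it nodes from different graphs would be merged.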

In [14]:
fine_grained_triplets

Triplet(entities=[Entity(id='1', name='Sum of the first n natural numbers', label='S(n)', type='Mathematical Expression'), Entity(id='2', name='n', label='Natural Number', type='Variable'), Entity(id='3', name='1 + 2 + ... + n', label='Sum Expression', type='Mathematical Expression'), Entity(id='4', name='n + (n-1) + ... + 1', label='Reversed Sum Expression', type='Mathematical Expression'), Entity(id='5', name='(1+n), (2+(n-1)), ..., (n+1)', label='Pairwise Sum Expression', type='Mathematical Expression'), Entity(id='6', name='\x0crac{n(n+1)}{2}', label='Formula for Sum', type='Mathematical Expression')], relations=[Relation(source='1', target='3', type='Defines', name='Definition of S(n)'), Relation(source='1', target='6', type='Equals', name='Sum equals formula'), Relation(source='3', target='4', type='Rearrangement', name='Rearranging the sum'), Relation(source='3', target='5', type='Transformation', name='Pairwise addition'), Relation(source='2', target='1', type='Variable in', na
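Entity 6 in the output above contains `\x0crac{n(n+1)}{2}` where `\frac{n(n+1)}{2}` was intended: the `\f` has been read as a form-feed escape. Where exactly this happens in the pipeline is not visible here. One plausible cause, stated as an assumption, is JSON decoding of LLM output in which the backslash was not doubled. The sketch below reproduces the effect with the standard `json` module.

```python
import json

# "\f" is a valid JSON escape (form feed), so an unescaped \frac loses its backslash-f.
broken = json.loads('"\\frac{n(n+1)}{2}"')
print(repr(broken))   # '\x0crac{n(n+1)}{2}'

# With the backslash doubled in the JSON text, the LaTeX command survives.
intact = json.loads('"\\\\frac{n(n+1)}{2}"')
print(intact)         # \frac{n(n+1)}{2}
```

If this is indeed the cause, asking the model to escape backslashes in its JSON output, or repairing them before parsing, would keep LaTeX commands intact.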

## 5. Visualize and Compare Graphs in Neo4j Browser

Use Cypher queries to filter graphs by abstraction level.

### Show all graphs:
```cypher
MATCH p=()-[r]->() RETURN p
```

### Show only high-level graph:
```cypher
MATCH p=()-[r]->() WHERE r.graph_abstraction_level = 'high' RETURN p
```

### Show only mid-level graph:
```cypher
MATCH p=()-[r]->() WHERE r.graph_abstraction_level = 'medium' RETURN p
```

### Show only fine-grained graph:
```cypher
MATCH p=()-[r]->() WHERE r.graph_abstraction_level = 'low' RETURN p
```
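Alongside the visual comparison in the browser, the extracted graphs can also be compared in memory. The following is a minimal sketch, assuming only that each graph object exposes `.entities` and `.relations` and that each relation has a `.type`. The `summarize` helper and the demo data are hypothetical.

```python
from types import SimpleNamespace


def summarize(graphs):
    """Map each level name to its entity count, relation count, and relation types."""
    return {
        level: {
            "entities": len(g.entities),
            "relations": len(g.relations),
            "relation_types": sorted({r.type for r in g.relations}),
        }
        for level, g in graphs.items()
    }


# Stand-in objects with the same .entities / .relations attributes as the extraction output.
demo = {
    "high": SimpleNamespace(
        entities=["Sum", "Formula"],
        relations=[SimpleNamespace(type="Equals")],
    ),
    "low": SimpleNamespace(
        entities=["S(n)", "n", "Sum Expression"],
        relations=[SimpleNamespace(type="Defines"), SimpleNamespace(type="Variable in")],
    ),
}

print(summarize(demo))
```

In the notebook, the same function could be called on a dictionary such as `{"high": high_level_triplets, "low": fine_grained_triplets}`.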

---

## 6. Sequential Step-Based Triplet Extraction by Detail Level

In this section, we generate multiple knowledge graphs by instructing the LLM to explain the proof sequentially at different levels of detail:

- **high**: very detailed sequential steps
- **medium**: moderate detail
- **low**: only major steps, ignoring fine details

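This approach provides alternative graph views based on proof step granularity. Note that the labels here describe the amount of detail, so `high` is the most detailed graph, whereas in Section 4 `high` marks the most abstract one. The `sequential_` prefix used when storing keeps the two sets apart.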

In [15]:
from src.phase1.extract_triplets import extract_triplets

detail_levels = ["high", "medium", "low"]
sequential_triplets = {}

for level in detail_levels:
    prompt_template = """
Given the following LaTeX proof, your task is to explain the proof sequentially at the **{level}** level of detail.

The three qualitative levels are:
- **high**: very detailed sequential steps, include all minor details and sub-steps.
- **medium**: moderate detail, include key steps but omit minor details.
- **low**: only major steps, ignore or summarize fine details.

Please generate the explanation at the **{level}** level accordingly.

For each step, provide a one-word summary to be used as the node label.

Then, extract triplets in the form <Source Entity, Relationship, Target Entity> that connect these summarized steps.

Proof:
{{proof}}
"""
    # Pre-format the prompt template with the `level` variable
    pre_formatted_prompt = prompt_template.format(level=level)
    # Pass the pre-formatted prompt and the `proof` parameter to the extract_triplets function
    triplets = extract_triplets(informal_proof, pre_formatted_prompt)

    # Store the triplets in the dictionary
    sequential_triplets[level] = triplets

    # Store in Neo4j with metadata
    neo4j.store_triplets(triplets, f"sequential_{level}")
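The loop above relies on two-stage string formatting: `{level}` is filled in first, while the doubled braces in `{{proof}}` collapse to a single `{proof}` placeholder for later. The sketch below shows this on a shortened template. The second stage is written as a plain `str.format` call, which is an assumption about what `extract_triplets` does internally.

```python
template = "Explain the proof at the **{level}** level.\n\nProof:\n{{proof}}\n"

# Stage 1: fill in the level; the doubled braces collapse to a single {proof} placeholder.
stage1 = template.format(level="medium")
print(stage1)

# Stage 2 (assumed to happen inside extract_triplets): fill in the proof text.
# Braces inside the *argument* are not interpreted, so LaTeX such as \frac{a}{b} is safe here.
stage2 = stage1.format(proof=r"S(n) = \frac{n(n+1)}{2}")
print(stage2)
```

Only braces in the template itself are interpreted by `str.format`, which is why the proof text is passed as an argument rather than pasted into the template before formatting.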

In [16]:
sequential_triplets

{'high': Triplet(entities=[Entity(id='1', name='Sum of first n natural numbers', label='Sum', type='Concept'), Entity(id='2', name='Reversed sum of first n natural numbers', label='Reversed Sum', type='Concept'), Entity(id='3', name='Pairing of elements', label='Pairing', type='Concept'), Entity(id='4', name='Resulting sums', label='Result', type='Concept')], relations=[Relation(source='1', target='2', type='is', name='Same sum in different order'), Relation(source='1', target='3', type='forms', name='Pairing elements'), Relation(source='3', target='4', type='results in', name='Resulting sums')]),
 'medium': Triplet(entities=[Entity(id='1', name='Sum of first n natural numbers', label='Sum', type='Concept'), Entity(id='2', name='Reversed sum of first n natural numbers', label='Reversed Sum', type='Concept'), Entity(id='3', name='Pairing elements of sums', label='Pairing', type='Process'), Entity(id='4', name='Resulting pairs', label='Result', type='Outcome')], relations=[Relation(sourc