# Course Pattern Extraction and Application

This notebook implements a two-phase approach for:
1. Extracting graph-based patterns from a course about mathematical induction
2. Using those patterns to construct knowledge graphs for specific proof examples

---

## Phase 1: Course Pattern Extraction

First, we'll extract the graph-based pattern from the mathematical induction course.

In [5]:
%load_ext autoreload
%autoreload 2
import sys
import os

# Add the project root directory to the Python path
sys.path.append(os.path.abspath("../../.."))


In [7]:
from IPython.display import display, Latex
import re
from src.utils.file_utils import read_proof
from src.phase1.extract_triplets import extract_triplets
from src.utils.neo4j_utils import Neo4JUtils

# Load the course content
course_latex = read_proof("../../data/courses/addition/course.tex")

# # Extract content between \begin{document} and \end{document}
# start = course_latex.find(r"\begin{document}") + len(r"\begin{document}")
# end = course_latex.find(r"\end{document}")
# proof = course_latex[start:end].strip()

# # Convert LaTeX to markdown-like format
# proof = re.sub(r"\\section\{([^}]+)\}", r"## \1", proof)
# proof = re.sub(r"\\subsection\{([^}]+)\}", r"### \1", proof)
# proof = re.sub(r"\\title\{([^}]+)\}", r"# \1", proof)
# proof = re.sub(r"\\maketitle", "", proof)
# proof = re.sub(r"\\begin{itemize}", "", proof)
# proof = re.sub(r"\\end{itemize}", "", proof)
# proof = re.sub(r"\\item\s+\*\*([^:]+):\*\*", r"- **\1:**", proof)

# # Display the course content
# display(Latex(proof))

### Extract Course Pattern

We'll use a specialized prompt to extract the pattern of mathematical induction from the course content.

In [None]:
SYSTEM_PROMPT = """You are an expert in mathematical proof/reasoning analysis, specializing in extracting structured knowledge graphs from mathematical texts. Your task is to identify the key detailed steps in a mathematical proof and the relationships in mathematical content, and to represent them as knowledge graph triplets."""

COURSE_PATTERN_PROMPT = """
Given the following mathematics course content in LaTeX format, extract the key steps of the mathematical proof as fine-grained, detailed steps, and structure these steps into knowledge graph triplets that capture the pattern of the explanatory chain (reasoning) for this course.

Focus on identifying:
1. The steps of mathematical proof in informal language
2. The relationships between these proof steps as a chain or sequence
3. The typical structure and flow of mathematical proofs of this type
4. Key steps and their relationships
5. The final triplets of the knowledge graph should have one or more start nodes/entities and one or more end nodes/entities, each of which is a step in the proof. Make sure the final graph is a single connected component and that the start and end nodes/entities are labeled.

Extract triplets in the form <Source Step/Entity, Relationship, Target Step/Entity> that represent this pattern.

Course Content:
```
{proof}
```
"""

# Extract the course pattern
course_pattern = extract_triplets(
    proof=course_latex,
    custom_prompt=COURSE_PATTERN_PROMPT,
    system_message=SYSTEM_PROMPT,
)
print(course_pattern)

# Store the pattern in Neo4j
neo4j = Neo4JUtils("bolt://localhost:7687", ("neo4j", "password"))
neo4j.clean_database()
neo4j.store_triplets(course_pattern, "course_pattern")

[SystemMessage(content='You are an expert in mathematical proof analysis, specializing in extracting structured knowledge graph from mathematical texts. Your task is to identify key steps in a mathematical proof and relationships in mathematical content and represent them as knowledge graph triplets.', additional_kwargs={}, response_metadata={}), HumanMessage(content='\nGiven the following mathematic course content in LaTeX format, extract the key steps of the mathematical proof in fine grainded detailed steps and structure these steps to form the knowledge graph triplets as the pattern of explanatory chain (reasoning) for this course.\n\nFocus on identifying:\n1. The steps of mathematical proof in informal language\n2. The relationships between these proof steps as a chain or sequence\n3. The typical structure and flow of mathematical proofs of this type\n4. Key steps and their relationships\n5. The final triplets of the knowledge graph should have a single or multiple start and singl
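
The prompt asks for a single connected component with labeled start and end nodes, but nothing in the cell above verifies that the model complied. The following is a minimal sketch of such a check. It assumes entities expose `id`, `start`, and `end`, and relations expose `source` and `target`, as the printed output in Phase 2 suggests; the `Entity` and `Relation` dataclasses and the `check_pattern` helper are hypothetical stand-ins, not the project's own classes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Entity:  # hypothetical stand-in for the project's entity type
    id: str
    name: str
    start: bool = False
    end: bool = False


@dataclass
class Relation:  # hypothetical stand-in for the project's relation type
    source: str
    target: str
    type: str


def check_pattern(entities, relations):
    """Return a list of problems; an empty list means the pattern looks well formed."""
    problems = []
    ids = {e.id for e in entities}
    if not any(e.start for e in entities):
        problems.append("no start node")
    if not any(e.end for e in entities):
        problems.append("no end node")

    # Treat edges as undirected: only weak connectivity matters here.
    neighbours = defaultdict(set)
    for r in relations:
        if r.source not in ids or r.target not in ids:
            problems.append(f"relation {r.source}->{r.target} references an unknown entity")
            continue
        neighbours[r.source].add(r.target)
        neighbours[r.target].add(r.source)

    # Breadth-first search from an arbitrary node.
    if ids:
        first = next(iter(ids))
        seen = {first}
        queue = deque([first])
        while queue:
            node = queue.popleft()
            for other in neighbours[node] - seen:
                seen.add(other)
                queue.append(other)
        if seen != ids:
            problems.append(f"{len(ids - seen)} node(s) unreachable from the rest of the graph")
    return problems


# Illustrative data only, loosely modeled on the entity names in the Phase 2 output.
entities = [
    Entity("1", "Base Case", start=True),
    Entity("2", "Recursive Case"),
    Entity("3", "Addition Definition", end=True),
]
relations = [Relation("1", "3", "grounds"), Relation("2", "3", "grounds")]
print(check_pattern(entities, relations))
```

Running a check like this before `store_triplets` would catch a fragmented graph before it reaches Neo4j.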

## Phase 2: Proof Pattern Application

Now we'll use the extracted pattern to construct knowledge graphs for specific proof examples.

In [4]:
# Load a proof example
proof_latex = read_proof("../../data/courses/addition/example3.tex")

# Extract content between \begin{document} and \end{document}
start = proof_latex.find(r"\begin{document}") + len(r"\begin{document}")
end = proof_latex.find(r"\end{document}")
proof = proof_latex[start:end].strip()

# Convert LaTeX to markdown-like format
proof = re.sub(r"\\section\{([^}]+)\}", r"## \1", proof)
proof = re.sub(r"\\subsection\{([^}]+)\}", r"### \1", proof)

# Display the proof
display(Latex(proof))

<IPython.core.display.Latex object>
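
The slicing above relies on `str.find`, which returns `-1` when a marker is missing, so a file without `\begin{document}` would be sliced at the wrong offsets without raising an error. Below is a minimal sketch of the same extraction and heading conversion wrapped in a function that falls back to the whole input. `latex_to_markdown` is a hypothetical helper name, not part of `src.utils`, and the sample text is made up for illustration.

```python
import re

BEGIN = r"\begin{document}"
END = r"\end{document}"


def latex_to_markdown(latex: str) -> str:
    """Extract the document body and turn section headings into markdown headings."""
    start = latex.find(BEGIN)
    end = latex.find(END)
    if start != -1 and end != -1 and start < end:
        body = latex[start + len(BEGIN):end]
    else:
        # Markers missing or out of order: fall back to the whole input.
        body = latex
    body = re.sub(r"\\section\{([^}]+)\}", r"## \1", body)
    body = re.sub(r"\\subsection\{([^}]+)\}", r"### \1", body)
    return body.strip()


# Made-up sample input, not one of the project's data files.
sample = r"""\documentclass{article}
\begin{document}
\section{Claim}
For all $n$, $n + 0 = n$.
\subsection{Base case}
$0 + 0 = 0$.
\end{document}"""
print(latex_to_markdown(sample))
```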

In [14]:
PROOF_PATTERN_APPLICATION_PROMPT = """
Given the following mathematical proof and the given pattern of mathematical proof extracted from the course, construct a knowledge graph that follows the given pattern.

The pattern components are:
{course_pattern}

For the given proof, extract triplets in the form <Source Entity, Relationship, Target Entity> that:
1. Follow the structure of mathematical proof pattern
2. Map to the steps identified in the course pattern
3. Capture the specific details and relationships in this proof which may be different from the course pattern

Proof:
{{proof}}
"""


# Format the prompt with course_pattern
formatted_prompt = PROOF_PATTERN_APPLICATION_PROMPT.format(
    course_pattern=course_pattern
)

# Pass the proof text and the formatted prompt to extract_triplets
proof_triplets = extract_triplets(proof=proof, custom_prompt=formatted_prompt)

# Clean the database to remove any existing graphs
neo4j.clean_database()

# Store the course pattern graph
neo4j.store_triplets(course_pattern, "course_pattern")

# Store the proof graph as a separate graph
neo4j.store_triplets(proof_triplets, "proof_example")


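# Apply the course-pattern prompt to the example proof.
# Use a separate name so the stored course pattern is not overwritten.
example_pattern = extract_triplets(
    proof, custom_prompt=COURSE_PATTERN_PROMPT, system_message=SYSTEM_PROMPT
)
print(example_pattern)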

# Display visualization queries
print(neo4j.get_visualization_queries())

[SystemMessage(content='You are a helpful assistant that extracts entities and relations from mathematical proofs.', additional_kwargs={}, response_metadata={}), HumanMessage(content="\nGiven the following mathematical proof and the given pattern of mathematical proof extracted from the course, construct a knowledge graph that follows the given pattern.\n\nThe pattern components are:\nentities=[Entity(id='1', name='Base Case', label='Base Case of Addition', type='axiom', start=True, end=False), Entity(id='2', name='Recursive Case', label='Recursive Case of Addition', type='axiom', start=False, end=False), Entity(id='3', name='Addition Definition', label='Definition of Addition', type='definition', start=False, end=True)] relations=[Relation(source='1', target='3', type='grounds', name='Base Case grounds the definition of addition'), Relation(source='2', target='3', type='grounds', name='Recursive Case grounds the definition of addition'), Relation(source='1', target='2', type='explains
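
The doubled braces in `{{proof}}` matter. The first `str.format` call fills in `{course_pattern}` and collapses `{{proof}}` to a literal `{proof}`, which leaves a placeholder for a later substitution. The sketch below shows the mechanism with a shortened, hypothetical template. That `extract_triplets` performs the second substitution internally is an assumption, based on how `COURSE_PATTERN_PROMPT` uses `{proof}`.

```python
# Shortened, hypothetical template; not the notebook's full prompt.
template = "Pattern:\n{course_pattern}\n\nProof:\n{{proof}}\n"

# Stage 1: fill in the pattern. The escaped braces collapse to a literal {proof}.
stage1 = template.format(course_pattern="<Base Case, grounds, Definition>")
print(stage1)

# Stage 2: fill in the proof text (assumed to happen inside extract_triplets).
stage2 = stage1.format(proof="0 + 0 = 0 by the base case.")
print(stage2)
```

If the pattern text itself contains braces, which is common in LaTeX, a second `format` call would misread them as placeholders. Substituting with `str.replace("{proof}", ...)` avoids that.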

## Visualization and Analysis

You can visualize and analyze the graphs in Neo4j Browser using these queries:

### View Course Pattern:
```cypher
MATCH p=()-[r]->() WHERE r.graph_type = 'course_pattern' RETURN p
```

### View Proof Graph:
```cypher
MATCH p=()-[r]->() WHERE r.graph_type = 'proof_example' RETURN p
```

### Compare Pattern and Proof:
```cypher
MATCH p=()-[r]->() RETURN p
```
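
The same `graph_type` filter can be run from Python. This is a rough sketch, assuming the official `neo4j` driver, whose `Session.run` accepts a query string plus keyword parameters and whose result offers `single()`. `count_edges` is a hypothetical helper, and the stub session with made-up counts only stands in for a live connection so the snippet runs without a database.

```python
COUNT_QUERY = (
    "MATCH ()-[r]->() "
    "WHERE r.graph_type = $graph_type "
    "RETURN count(r) AS n"
)


def count_edges(session, graph_type: str) -> int:
    """Count relationships tagged with the given graph_type."""
    result = session.run(COUNT_QUERY, graph_type=graph_type)
    return result.single()["n"]


class _StubResult:
    """Mimics the one-row result of the count query."""

    def __init__(self, n):
        self._n = n

    def single(self):
        return {"n": self._n}


class _StubSession:
    """Stand-in for a driver session; returns canned, made-up counts."""

    def __init__(self, counts):
        self._counts = counts

    def run(self, query, **params):
        return _StubResult(self._counts.get(params["graph_type"], 0))


session = _StubSession({"course_pattern": 3, "proof_example": 5})
for graph_type in ("course_pattern", "proof_example"):
    print(graph_type, count_edges(session, graph_type))
```

With a live database, the session would instead come from `neo4j.GraphDatabase.driver(uri, auth=(user, password)).session()`. Passing `graph_type` as a query parameter rather than formatting it into the string avoids quoting problems.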