In [1]:
# First, let's import the necessary libraries and set up our environment
%load_ext autoreload
%autoreload 2
import sys
import os

# Add the project root directory to the Python path
sys.path.append(os.path.abspath("../../.."))

from IPython.display import display, Latex
import re
from src.utils.file_utils import read_proof
from src.phase1.extract_triplets import extract_calculation_graph
from src.utils.neo4j_utils import Neo4JUtils

# Load the course content for addition and multiple addition
addition_course_latex = read_proof("../../data/courses/addition/course_2.tex")
multiple_addition_course_latex = read_proof("../../data/courses/multiple_addition/course.tex")

# Hierarchical Pattern Extraction for Multiple Addition

In this notebook, we'll extract knowledge graph patterns for mathematical operations in a hierarchical manner:

1. First, we'll extract the pattern for "Addition by Recursion" from the addition course
2. Then, we'll use this pattern to inform the extraction of the "Multiple Addition" pattern
3. This approach will create a hierarchical knowledge representation where multiple addition builds upon binary addition

This hierarchical approach reflects the natural relationship between mathematical operations, where multiple addition can be defined in terms of repeated binary addition.
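
The relationship can be sketched in a few lines of Python. This is an illustration only, not part of the extraction pipeline: `add_recursive` follows the decomposition a + b = (a + (b - 1)) + 1 for non-negative integers, with an assumed base case a + 0 = a, and `multiple_add` folds a list of terms from the left using that binary operation. Both function names are hypothetical.

```python
from functools import reduce


def add_recursive(a: int, b: int) -> int:
    """Binary addition by recursion on b (assumes b >= 0)."""
    if b < 0:
        raise ValueError("b must be non-negative")
    if b == 0:  # assumed base case: a + 0 = a
        return a
    return add_recursive(a, b - 1) + 1  # a + b = (a + (b - 1)) + 1


def multiple_add(terms: list[int]) -> int:
    """Multiple addition as repeated binary addition, grouped from the left."""
    if not terms:
        raise ValueError("need at least one term")
    return reduce(add_recursive, terms)


print(multiple_add([2, 3, 4, 5]))  # ((2 + 3) + 4) + 5 = 14
```

Using `reduce` keeps the hierarchy visible: the multi-term operation never adds anything itself, it only delegates to the binary one.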

In [2]:
# Define the system prompt for our LLM
SYSTEM_PROMPT = """You are an expert in mathematical calculation analysis, specializing in extracting structured knowledge graph from mathematical calculation texts. Your task is to identify quantom/detailed progress/procedural steps in a mathematical calculation process and represent them as a fine-grained knowledge graph with explicit step-by-step reasoning."""

# Define the prompt for extracting the addition course pattern
ADDITION_COURSE_PATTERN_PROMPT = """
Given the following mathematical course content in LaTeX format, extract a VERY DETAILED step-by-step explanatory chain that represents the calculation process. Create a knowledge graph with fine-grained steps that shows exactly how calculations proceed from start to finish.

Focus on identifying:
1. Every individual calculation step, no matter how small (e.g., "Add 2 to both sides", "Apply distributive property", etc.)
2. The precise mathematical operations performed at each step
3. The exact sequence of operations, with clear predecessor-successor relationships
4. Intermediate results at each calculation stage
5. The mathematical justification for each step (e.g., "By the associative property", "By substituting value from step 3")

IMPORTANT: If you see examples in the course content, do not extract them as separate graphs. The examples are only included to help you understand the calculation process better. Focus only on extracting the general calculation pattern/process.

The final knowledge graph MUST:
1. Have clearly marked START node(s) (the initial example statement)
2. Have clearly marked END node(s) (the final result)
3. Include ALL intermediate calculation steps with no gaps in reasoning
4. Form a single connected component with a clear directional flow
5. Use relationship types that precisely describe the mathematical operation performed (e.g., "applies_distributive_property", "substitutes_value", "simplifies_expression")

Extract a calculation graph with steps and transitions that represent this detailed calculation process.

Note:
- Steps should be specified with short mathematical expressions.
- Transitions should explain all the minor steps of the reasoning of the calculation.

Course Content:
```
{course_latex}
```
"""

# Extract the addition course pattern
addition_course_pattern = extract_calculation_graph(
    custom_prompt=ADDITION_COURSE_PATTERN_PROMPT.format(
        course_latex=addition_course_latex
    ),
    system_message=SYSTEM_PROMPT,
)
print("Addition Course Pattern:")
print(addition_course_pattern)

[SystemMessage(content='You are an expert in mathematical calculation analysis, specializing in extracting structured knowledge graph from mathematical calculation texts. Your task is to identify quantom/detailed progress/procedural steps in a mathematical calculation process and represent them as a fine-grained knowledge graph with explicit step-by-step reasoning.', additional_kwargs={}, response_metadata={}), HumanMessage(content='\nGiven the following mathematical course content in LaTeX format, extract a VERY DETAILED step-by-step explanatory chain that represents the calculation process. Create a knowledge graph with fine-grained steps that shows exactly how calculations proceed from start to finish.\n\nFocus on identifying:\n1. Every individual calculation step, no matter how small (e.g., "Add 2 to both sides", "Apply distributive property", etc.)\n2. The precise mathematical operations performed at each step\n3. The exact sequence of operations, with clear predecessor-successor 

## Hierarchical Pattern Extraction for Multiple Addition

Now that we have extracted the pattern for addition by recursion, we'll use it to inform the extraction of the multiple addition pattern. 

The key insight is that multiple addition can be defined as repeated binary addition, so the multiple addition pattern should incorporate the binary addition pattern as a sub-component. This creates a hierarchical relationship between the two operations.

In [3]:
# Define the prompt for extracting the multiple addition course pattern using the addition pattern
MULTIPLE_ADDITION_COURSE_PATTERN_PROMPT = """
Given the following mathematical course content in LaTeX format and a previously extracted addition pattern, extract a VERY DETAILED step-by-step explanatory chain for multiple addition. Create a knowledge graph with fine-grained steps that shows exactly how calculations proceed from start to finish.

The previously extracted addition pattern is:
```
{addition_pattern}
```

IMPORTANT: Multiple addition builds upon binary addition. Your extracted knowledge graph MUST maintain this hierarchical relationship by:
1. Incorporating the binary addition pattern as a sub-component of the multiple addition process
2. Showing how multiple addition operations decompose into binary addition operations
3. Creating explicit connections between multiple addition steps and their corresponding binary addition steps
4. Preserving the hierarchical structure where multiple addition is defined in terms of binary addition

Focus on identifying:
1. Every individual calculation step in the multiple addition process
2. How multiple addition decomposes into simpler binary addition problems
3. How parentheses are used to group terms for binary addition
4. The precise sequence of operations with clear predecessor-successor relationships
5. Intermediate results at each calculation stage

The final knowledge graph MUST:
1. Have clearly marked START node(s) (the initial multiple addition problem)
2. Have clearly marked END node(s) (the final result)
3. Include ALL intermediate calculation steps with no gaps in reasoning
4. Form a single connected component with a clear directional flow
5. Use relationship types that precisely describe the mathematical operations
6. Show the hierarchical relationship between multiple addition and binary addition

Extract a calculation graph with steps and transitions that represent this detailed calculation process.

Note:
- Steps should be specified with short mathematical expressions.
- Transitions should explain all the minor steps of the reasoning, including how multiple addition steps decompose into binary addition steps.

Course Content:
```
{course_latex}
```
"""

# Extract the multiple addition course pattern using the addition pattern
multiple_addition_course_pattern = extract_calculation_graph(
    custom_prompt=MULTIPLE_ADDITION_COURSE_PATTERN_PROMPT.format(
        addition_pattern=addition_course_pattern,
        course_latex=multiple_addition_course_latex
    ),
    system_message=SYSTEM_PROMPT,
)
print("Multiple Addition Course Pattern:")
print(multiple_addition_course_pattern)

[SystemMessage(content='You are an expert in mathematical calculation analysis, specializing in extracting structured knowledge graph from mathematical calculation texts. Your task is to identify quantom/detailed progress/procedural steps in a mathematical calculation process and represent them as a fine-grained knowledge graph with explicit step-by-step reasoning.', additional_kwargs={}, response_metadata={}), HumanMessage(content='\nGiven the following mathematical course content in LaTeX format and a previously extracted addition pattern, extract a VERY DETAILED step-by-step explanatory chain for multiple addition. Create a knowledge graph with fine-grained steps that shows exactly how calculations proceed from start to finish.\n\nThe previously extracted addition pattern is:\n```\nsteps=[MathStep(id=\'step1\', expression=\'a + b\', operation=\'initial_expression\', is_start=True, is_end=False), MathStep(id=\'step2\', expression=\'(a + (b-1)) + 1\', operation=\'decompose_b\', is_sta

## Storing Patterns in Neo4j

Now we'll store both patterns in Neo4j to visualize the hierarchical relationship between binary addition and multiple addition. This will allow us to see how multiple addition operations build upon binary addition operations.
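
Before writing to the database, it can be worth checking that a graph meets the structural requirements stated in the prompts (marked START and END nodes, one connected component). The sketch below uses stand-in dataclasses: the `MathStep` fields mirror the printed output above, while `Transition`, its `source`/`target`/`relation` fields, and `check_graph` are assumptions, since the real schema in `extract_triplets` is not shown here.

```python
from dataclasses import dataclass


@dataclass
class MathStep:
    id: str
    expression: str
    operation: str
    is_start: bool = False
    is_end: bool = False


@dataclass
class Transition:  # hypothetical edge type
    source: str
    target: str
    relation: str


def check_graph(steps, transitions):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    ids = {s.id for s in steps}
    if not any(s.is_start for s in steps):
        problems.append("no START node")
    if not any(s.is_end for s in steps):
        problems.append("no END node")
    # Build an undirected adjacency map to test weak connectivity.
    neighbours = {i: set() for i in ids}
    for t in transitions:
        if t.source not in ids or t.target not in ids:
            problems.append(f"dangling transition {t.source}->{t.target}")
            continue
        neighbours[t.source].add(t.target)
        neighbours[t.target].add(t.source)
    if ids:
        seen, stack = set(), [next(iter(ids))]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
        if seen != ids:
            problems.append("graph is not a single connected component")
    return problems


steps = [
    MathStep("step1", "a + b", "initial_expression", is_start=True),
    MathStep("step2", "(a + (b-1)) + 1", "decompose_b", is_end=True),
]
print(check_graph(steps, [Transition("step1", "step2", "decomposes")]))  # []
```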

In [4]:
# Store the patterns in Neo4j
neo4j = Neo4JUtils("bolt://localhost:7687", ("neo4j", "password"))
neo4j.clean_database()

# Function to sanitize LaTeX commands in step expressions
def sanitize_latex(text):
    """Strip or replace LaTeX commands so that expressions read as plain text."""
    # Replace LaTeX commands with plain-text alternatives
    sanitized = text.replace("\\underbrace", "")
    sanitized = sanitized.replace("\\text", "")
    sanitized = sanitized.replace("\\cdots", "...")
    sanitized = sanitized.replace("\\times", "×")
    # Remove braced LaTeX subscripts and superscripts (non-nested groups only)
    sanitized = re.sub(r"_\{.*?\}", "", sanitized)
    sanitized = re.sub(r"\^\{.*?\}", "", sanitized)
    return sanitized

# Sanitize expressions in multiple addition pattern
for step in multiple_addition_course_pattern.steps:
    step.expression = sanitize_latex(step.expression)

# Store the addition pattern
neo4j.store_calculation_graph(addition_course_pattern, "addition_course")

# Store the multiple addition pattern
neo4j.store_calculation_graph(multiple_addition_course_pattern, "multiple_addition_course")

print("Patterns stored in Neo4j database.")

Patterns stored in Neo4j database.
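
One caveat with the regex-based stripping in `sanitize_latex`: a non-greedy `.*?` stops at the first closing brace, so a nested group such as `_{n \text{ times}}` is only partly removed. A minimal sketch of a brace-matching alternative follows; `strip_scripts` is a hypothetical helper, not part of the project.

```python
def strip_scripts(text: str) -> str:
    """Remove `_{...}` and `^{...}` groups, honouring nested braces."""
    out = []
    i = 0
    while i < len(text):
        if text[i] in "_^" and i + 1 < len(text) and text[i + 1] == "{":
            depth = 0
            j = i + 1
            while j < len(text):
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            # Skip past the matching closing brace; an unbalanced group
            # consumes the rest of the string.
            i = j + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_scripts(r"x_{n \text{ times}} + y^{2}"))  # x + y
```

Unbraced scripts such as `a_1` are left untouched, which matches the behavior of the regex version.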


## Testing the Hierarchical Pattern

Now let's test our hierarchical pattern by applying it to a specific multiple addition problem. We'll see how the pattern breaks down the problem into a sequence of binary additions.
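
For intuition, the sequence of binary additions can be traced by hand first. The sketch below is plain Python, independent of the LLM pipeline; `binary_steps` is a hypothetical helper, and left-to-right grouping is an assumption about the course's convention.

```python
def binary_steps(terms):
    """Yield (left, right, result) for each binary addition, grouped from the left."""
    total = terms[0]
    for term in terms[1:]:
        yield total, term, total + term
        total += term


for left, right, result in binary_steps([2, 3, 4, 5]):
    print(f"{left} + {right} = {result}")
# 2 + 3 = 5
# 5 + 4 = 9
# 9 + 5 = 14
```

A solution graph that follows the pattern should contain these three binary additions as sub-chains, each expanded further by the binary addition pattern.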

In [5]:
# Example: define a test problem and apply the hierarchical pattern
test_problem = "2 + 3 + 4 + 5"

# Define a prompt to apply the hierarchical pattern to the test problem
TEST_PROBLEM_PROMPT = """
Given the following multiple addition problem and the hierarchical patterns for binary addition and multiple addition, 
create a detailed step-by-step solution that follows these patterns.

Binary Addition Pattern:
```
{addition_pattern}
```

Multiple Addition Pattern:
```
{multiple_addition_pattern}
```

Problem: {problem}

Provide a detailed solution that:
1. First applies the multiple addition pattern to break down the problem into binary additions
2. Then applies the binary addition pattern for each binary addition step
3. Shows all intermediate steps and calculations
4. Maintains the hierarchical relationship between multiple addition and binary addition

Extract a calculation graph with steps and transitions that represent this solution.
"""

# Apply the hierarchical pattern to the test problem
test_solution = extract_calculation_graph(
    custom_prompt=TEST_PROBLEM_PROMPT.format(
        addition_pattern=addition_course_pattern,
        multiple_addition_pattern=multiple_addition_course_pattern,
        problem=test_problem,
    ),
    system_message=SYSTEM_PROMPT,
)

print("Solution for test problem:")
print(test_solution)

# Store the test solution in Neo4j
neo4j.store_calculation_graph(test_solution, "test_solution")
print("Test solution stored in Neo4j database.")

[SystemMessage(content='You are an expert in mathematical calculation analysis, specializing in extracting structured knowledge graph from mathematical calculation texts. Your task is to identify quantom/detailed progress/procedural steps in a mathematical calculation process and represent them as a fine-grained knowledge graph with explicit step-by-step reasoning.', additional_kwargs={}, response_metadata={}), HumanMessage(content='\nGiven the following multiple addition problem and the hierarchical patterns for binary addition and multiple addition, \ncreate a detailed step-by-step solution that follows these patterns.\n\nBinary Addition Pattern:\n```\nsteps=[MathStep(id=\'step1\', expression=\'a + b\', operation=\'initial_expression\', is_start=True, is_end=False), MathStep(id=\'step2\', expression=\'(a + (b-1)) + 1\', operation=\'decompose_b\', is_start=False, is_end=False), MathStep(id=\'step3\', expression=\'((a + (b-2)) + 1) + 1\', operation=\'decompose_b\', is_start=False, is_e

## Visualization Queries

Here are some Cypher queries you can run in the Neo4j browser to visualize the hierarchical patterns:

1. View the binary addition pattern:
```cypher
MATCH (n)-[r]->(m) 
WHERE n.graph_type = "addition_course" AND m.graph_type = "addition_course"
RETURN n, r, m
```

2. View the multiple addition pattern:
```cypher
MATCH (n)-[r]->(m) 
WHERE n.graph_type = "multiple_addition_course" AND m.graph_type = "multiple_addition_course"
RETURN n, r, m
```

3. View the test solution:
```cypher
MATCH (n)-[r]->(m) 
WHERE n.graph_type = "test_solution" AND m.graph_type = "test_solution"
RETURN n, r, m
```
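
The same queries can also be run from Python. The sketch below assumes the official `neo4j` Python driver is installed and reuses the connection details from the storage cell; `pattern_query` and `fetch_edges` are hypothetical helpers. Passing `graph_type` as a query parameter avoids building Cypher by string concatenation.

```python
def pattern_query(graph_type: str):
    """Build a parameterized Cypher query for one stored pattern."""
    query = (
        "MATCH (n)-[r]->(m) "
        "WHERE n.graph_type = $graph_type AND m.graph_type = $graph_type "
        "RETURN n, r, m"
    )
    return query, {"graph_type": graph_type}


def fetch_edges(driver, graph_type: str):
    """Run the query through a neo4j driver and return the records as a list."""
    query, params = pattern_query(graph_type)
    with driver.session() as session:
        return list(session.run(query, params))


# Usage (requires a running database; not executed here):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# records = fetch_edges(driver, "addition_course")
# driver.close()
```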