# Cypher Query Generation from Logic Patterns

This notebook demonstrates how to generate Neo4j Cypher queries from classified logic reasoning patterns.

## Workflow

1. **Classify Pattern** → Use `logic_reasoning.py` to identify pattern type
2. **Generate Cypher** → Use `cypher_generator.py` to create executable query
3. **Execute Query** → Run against Neo4j knowledge graph

## Pattern Coverage

Supports all 19 logic patterns with proper Cypher syntax for:
- Simple paths and chains
- Intersections and unions
- Negative constraints (NOT clauses)
- Complex multi-hop patterns

## Setup and Imports

In [None]:
# Import both modules
from logic_reasoning import predict_logic_pattern
from cypher_generator import (
    generate_cypher_query,
    batch_generate_cypher,
    validate_cypher_query,
    format_cypher_query
)

import json

def display_example(title, instance, show_formatted=True):
    """Helper function to display examples nicely"""
    print(f"\n{'='*80}")
    print(f"{title}")
    print(f"{'='*80}\n")
    
    # Show instance structure
    print("Instance Structure:")
    print(json.dumps(instance, indent=2))
    
    # Generate query
    query = generate_cypher_query(instance)
    
    if query:
        print(f"\nGenerated Cypher Query:")
        print("-" * 80)
        if show_formatted:
            print(format_cypher_query(query))
        else:
            print(query)
        print("-" * 80)
        print(f"\nValid: {validate_cypher_query(query)}")
    else:
        print("\nNo query generated (unsupported pattern)")
    
    print(f"\n{'='*80}")

## Example 1: Complete Workflow - From Pattern Classification to Cypher

This example shows the complete pipeline: classify the pattern, then generate the Cypher query.

In [None]:
# Step 1: Create an instance
instance = {
    'level': 3,
    'atoms': [
        {'n1': 'Gene', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES_WITH'},
        {'n1': 'Protein', 'n2': 'Disease', 'pol': 'POS', 'rela': 'TARGETS'}
    ],
    'logic': ['AND']
}

# Step 2: Classify the pattern
pattern = predict_logic_pattern(instance)
instance['predict_logic'] = pattern

print(f"Classified Pattern: {pattern}")
print(f"Description: Intersection - Find diseases targeted by both genes AND proteins\n")

# Step 3: Generate Cypher query
display_example("Complete Workflow: Pattern → Cypher", instance)

## Level 1 & 2: Basic Patterns

### Pattern 0p: Single Node Query

In [None]:
instance_0p = {
    'predict_logic': '0p',
    'instance': ['Gene'],
    'atoms': []
}

display_example("0p: Find all Genes", instance_0p)

### Pattern 1p: Single Edge Query

In [None]:
instance_1p = {
    'predict_logic': '1p',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}
    ],
    'retrieval': 'Disease'
}

display_example("1p: Diseases caused by genes", instance_1p)

## Level 3: Two-Atom Patterns

### Pattern 2p: Chain (Two-hop Path)

In [None]:
instance_2p = {
    'predict_logic': '2p',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Protein', 'pol': 'POS', 'rela': 'ENCODES'},
        {'n1': 'Protein', 'n2': 'Disease', 'pol': 'POS', 'rela': 'REGULATES'}
    ],
    'retrieval': 'Disease'
}

print("Query: Find diseases regulated by proteins encoded by genes")
print("Structure: Gene → Protein → Disease\n")
display_example("2p: Chain Pattern", instance_2p)

### Pattern 2i: Intersection (AND with Positive Polarity)

In [None]:
instance_2i = {
    'predict_logic': '2i',
    'atoms': [
        {'n1': 'Gene_BRCA1', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES_WITH'},
        {'n1': 'Gene_TP53', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES_WITH'}
    ]
}

print("Query: Find diseases associated with BOTH BRCA1 AND TP53")
print("Structure: Gene_BRCA1 ↘")
print("                         Disease")
print("           Gene_TP53   ↗\n")
display_example("2i: Intersection Pattern", instance_2i)

### Pattern 2u: Union (OR with Positive Polarity)

In [None]:
instance_2u = {
    'predict_logic': '2u',
    'atoms': [
        {'n1': 'Drug_Aspirin', 'n2': 'Disease', 'pol': 'POS', 'rela': 'TREATS'},
        {'n1': 'Drug_Ibuprofen', 'n2': 'Disease', 'pol': 'POS', 'rela': 'TREATS'}
    ]
}

print("Query: Find diseases treated by Aspirin OR Ibuprofen")
print("Note: UNION combines results from both paths\n")
display_example("2u: Union Pattern", instance_2u)

### Pattern 2ni: Negated Intersection

In [None]:
instance_2ni = {
    'predict_logic': '2ni',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'NEG', 'rela': 'CAUSES'}
    ]
}

print("Query: Find diseases caused by Gene_A but NOT Gene_B")
print("Note: WHERE NOT clause excludes the negative constraint\n")
display_example("2ni: Negated Intersection", instance_2ni)

### Pattern 2in: Intersection with Both Atoms Negated

In [None]:
instance_2in = {
    'predict_logic': '2in',
    'atoms': [
        {'n1': 'Gene_X', 'n2': 'Pathway', 'pol': 'NEG', 'rela': 'INHIBITS'},
        {'n1': 'Gene_Y', 'n2': 'Pathway', 'pol': 'NEG', 'rela': 'INHIBITS'}
    ]
}

print("Query: Find pathways NOT inhibited by Gene_X AND NOT inhibited by Gene_Y")
print("Note: Both constraints are negative\n")
display_example("2in: Both Negative Intersection", instance_2in)

### Pattern 2nu: Union with Both Atoms Negated

In [None]:
instance_2nu = {
    'predict_logic': '2nu',
    'atoms': [
        {'n1': 'Drug_A', 'n2': 'Disease', 'pol': 'NEG', 'rela': 'CONTRAINDICATED'},
        {'n1': 'Drug_B', 'n2': 'Disease', 'pol': 'NEG', 'rela': 'CONTRAINDICATED'}
    ]
}

print("Query: Diseases NOT contraindicated with Drug_A OR NOT contraindicated with Drug_B\n")
display_example("2nu: Both Negative Union", instance_2nu)

## Level 4: Three-Atom Patterns

### Pattern 3i: Three-way Intersection

In [None]:
instance_3i = {
    'predict_logic': '3i',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES'},
        {'n1': 'Gene_C', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATES'}
    ]
}

print("Query: Find diseases associated with Gene_A AND Gene_B AND Gene_C")
print("Structure: All three genes point to the same disease\n")
display_example("3i: Three-way Intersection", instance_3i)

### Pattern 3u: Three-way Union

In [None]:
instance_3u = {
    'predict_logic': '3u',
    'atoms': [
        {'n1': 'Drug_Aspirin', 'n2': 'Symptom', 'pol': 'POS', 'rela': 'RELIEVES'},
        {'n1': 'Drug_Tylenol', 'n2': 'Symptom', 'pol': 'POS', 'rela': 'RELIEVES'},
        {'n1': 'Drug_Advil', 'n2': 'Symptom', 'pol': 'POS', 'rela': 'RELIEVES'}
    ]
}

print("Query: Symptoms relieved by Aspirin OR Tylenol OR Advil")
print("Note: Three separate UNION clauses combine all results\n")
display_example("3u: Three-way Union", instance_3u)

### Pattern 3p: Three-hop Chain

In [None]:
instance_3p = {
    'predict_logic': '3p',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Protein', 'pol': 'POS', 'rela': 'ENCODES'},
        {'n1': 'Protein', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'PARTICIPATES_IN'},
        {'n1': 'Pathway', 'n2': 'Disease', 'pol': 'POS', 'rela': 'DYSREGULATED_IN'}
    ]
}

print("Query: Find diseases with dysregulated pathways involving proteins encoded by genes")
print("Structure: Gene → Protein → Pathway → Disease\n")
display_example("3p: Three-hop Chain", instance_3p)

### Pattern ip: Intersection-Projection

In [None]:
instance_ip = {
    'predict_logic': 'ip',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Disease', 'n2': 'Drug', 'pol': 'POS', 'rela': 'TREATED_BY'}
    ]
}

print("Query: Find drugs that treat diseases caused by both Gene_A AND Gene_B")
print("Structure: Gene_A ↘")
print("                    Disease → Drug")
print("           Gene_B ↗\n")
display_example("ip: Intersection then Projection", instance_ip)

### Pattern inp: Intersection-Negation-Projection

In [None]:
instance_inp = {
    'predict_logic': 'inp',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'NEG', 'rela': 'CAUSES'},
        {'n1': 'Disease', 'n2': 'Symptom', 'pol': 'POS', 'rela': 'PRESENTS_WITH'}
    ]
}

print("Query: Symptoms of diseases caused by Gene_A but NOT Gene_B")
print("Note: Combines positive match with negative constraint\n")
display_example("inp: Intersection-Negation-Projection", instance_inp)

### Pattern pi: Projection-Intersection

In [None]:
instance_pi = {
    'predict_logic': 'pi',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'PARTICIPATES_IN'},
        {'n1': 'Pathway', 'n2': 'Disease', 'pol': 'POS', 'rela': 'IMPLICATED_IN'},
        {'n1': 'Protein', 'n2': 'Disease', 'pol': 'POS', 'rela': 'BIOMARKER_FOR'}
    ]
}

print("Query: Diseases implicated in pathways (via genes) AND with protein biomarkers")
print("Structure: Gene → Pathway → Disease")
print("                            ↑")
print("                         Protein\n")
display_example("pi: Projection then Intersection", instance_pi)

### Pattern pni: Projection-Negation-Intersection

In [None]:
instance_pni = {
    'predict_logic': 'pni',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'ACTIVATES'},
        {'n1': 'Pathway', 'n2': 'Disease', 'pol': 'POS', 'rela': 'LEADS_TO'},
        {'n1': 'Drug', 'n2': 'Disease', 'pol': 'NEG', 'rela': 'CONTRAINDICATED_FOR'}
    ]
}

print("Query: Diseases from gene-activated pathways, NOT contraindicated with certain drugs")
print("Note: Chain followed by negative constraint\n")
display_example("pni: Projection-Negation-Intersection", instance_pni)

### Pattern up: Union-Projection

In [None]:
instance_up = {
    'predict_logic': 'up',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'REGULATES'},
        {'n1': 'Gene_B', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'REGULATES'},
        {'n1': 'Pathway', 'n2': 'Disease', 'pol': 'POS', 'rela': 'IMPLICATED_IN'}
    ]
}

print("Query: Diseases implicated in pathways regulated by Gene_A OR Gene_B")
print("Note: CALL subquery with UNION combines genes, then projects to disease\n")
display_example("up: Union-Projection", instance_up)

### Pattern pu: Projection-Union

In [None]:
instance_pu = {
    'predict_logic': 'pu',
    'atoms': [
        {'n1': 'Gene', 'n2': 'Protein', 'pol': 'POS', 'rela': 'ENCODES'},
        {'n1': 'Protein', 'n2': 'Disease', 'pol': 'POS', 'rela': 'ASSOCIATED_WITH'},
        {'n1': 'Drug', 'n2': 'Disease', 'pol': 'POS', 'rela': 'TREATS'}
    ]
}

print("Query: Diseases associated with gene-encoded proteins OR treated by drugs")
print("Note: Chain path OR direct connection\n")
display_example("pu: Projection-Union", instance_pu)

### Pattern iu: Intersection-Union

In [None]:
instance_iu = {
    'predict_logic': 'iu',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_C', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}
    ]
}

print("Query: Diseases caused by (Gene_A AND Gene_B) OR Gene_C")
print("Note: First two genes intersect, then union with third\n")
display_example("iu: Intersection-Union", instance_iu)

### Pattern ui: Union-Intersection

In [None]:
instance_ui = {
    'predict_logic': 'ui',
    'atoms': [
        {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
        {'n1': 'Gene_C', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}
    ]
}

print("Query: Diseases caused by (Gene_A OR Gene_B) AND Gene_C")
print("Note: CALL subquery unions first two genes, then intersects with third\n")
display_example("ui: Union-Intersection", instance_ui)

## Batch Processing Example

Generate Cypher queries for a list of instances in one call. Each returned instance carries its query under the `predict_cql` key.

In [None]:
# Create a batch of instances with different patterns
batch_instances = [
    {
        'predict_logic': '1p',
        'atoms': [{'n1': 'Gene', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}],
        'retrieval': 'Disease'
    },
    {
        'predict_logic': '2i',
        'atoms': [
            {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
            {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}
        ]
    },
    {
        'predict_logic': '3i',
        'atoms': [
            {'n1': 'Gene_A', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
            {'n1': 'Gene_B', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'},
            {'n1': 'Gene_C', 'n2': 'Disease', 'pol': 'POS', 'rela': 'CAUSES'}
        ]
    },
    {
        'predict_logic': 'pi',
        'atoms': [
            {'n1': 'Gene', 'n2': 'Pathway', 'pol': 'POS', 'rela': 'ACTIVATES'},
            {'n1': 'Pathway', 'n2': 'Disease', 'pol': 'POS', 'rela': 'LEADS_TO'},
            {'n1': 'Protein', 'n2': 'Disease', 'pol': 'POS', 'rela': 'MARKER_FOR'}
        ]
    }
]

# Generate Cypher for all instances
results = batch_generate_cypher(batch_instances)

# Display results
print("\n" + "="*80)
print("BATCH PROCESSING RESULTS")
print("="*80 + "\n")

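for i, instance in enumerate(results, 1):
    print(f"Instance {i}: Pattern {instance['predict_logic']}")
    print("-" * 80)
    query = instance.get('predict_cql')
    # Guard against unsupported patterns, as display_example does
    print(format_cypher_query(query) if query else "No query generated (unsupported pattern)")
    print("\n")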

## Query Validation

Validate generated Cypher queries before execution.

In [None]:
# Test various query validations
test_queries = [
    ("MATCH (n:Gene) RETURN n.name", "Valid simple query"),
    ("MATCH (n1:Gene)-[r:CAUSES]->(n2:Disease) RETURN n2.name", "Valid relationship query"),
    ("SELECT * FROM genes", "Invalid (SQL, not Cypher)"),
    ("", "Invalid (empty)"),
    (None, "Invalid (None)")
]

print("Query Validation Tests:")
print("=" * 80)

for query, description in test_queries:
    is_valid = validate_cypher_query(query)
    status = "✓ VALID" if is_valid else "✗ INVALID"
    print(f"\n{status}: {description}")
    if query:
        preview = query if len(query) <= 60 else query[:60] + "..."
        print(f"Query: {preview}")

## Integration Example: Full Pipeline

Complete example showing classification → Cypher generation → validation.

In [None]:
def process_query(atoms, logic, level, description=""):
    """Complete pipeline for processing a query"""
    print(f"\n{'='*80}")
    print(f"Processing: {description}")
    print(f"{'='*80}\n")
    
    # Build the instance from the raw inputs
    instance = {
        'level': level,
        'atoms': atoms,
        'logic': logic
    }
    
    # Step 1: Classify the pattern
    pattern = predict_logic_pattern(instance)
    instance['predict_logic'] = pattern
    print(f"Step 1 - Pattern Classification: {pattern}")
    
    # Step 2: Generate Cypher (None means the pattern is unsupported)
    query = generate_cypher_query(instance)
    instance['predict_cql'] = query
    print("\nStep 2 - Cypher Generation:")
    print("-" * 80)
    print(format_cypher_query(query) if query else "No query generated (unsupported pattern)")
    print("-" * 80)
    
    # Step 3: Validate
    is_valid = validate_cypher_query(query)
    print(f"\nStep 3 - Validation: {'✓ VALID' if is_valid else '✗ INVALID'}")
    
    return instance

# Example 1: Biomedical Query
result1 = process_query(
    atoms=[
        {'n1': 'Gene_BRCA1', 'n2': 'Protein', 'pol': 'POS', 'rela': 'ENCODES'},
        {'n1': 'Protein', 'n2': 'Cancer', 'pol': 'POS', 'rela': 'BIOMARKER_FOR'}
    ],
    logic=['AND'],
    level=3,
    description="Find cancers with biomarkers from BRCA1-encoded proteins"
)

# Example 2: Drug Discovery Query
result2 = process_query(
    atoms=[
        {'n1': 'Target', 'n2': 'Disease', 'pol': 'POS', 'rela': 'IMPLICATED_IN'},
        {'n1': 'Drug', 'n2': 'Disease', 'pol': 'POS', 'rela': 'TREATS'},
        {'n1': 'Disease', 'n2': 'Symptom', 'pol': 'POS', 'rela': 'PRESENTS_WITH'}
    ],
    logic=['AND', 'AND'],
    level=4,
    description="Find symptoms of diseases with known targets AND treatments"
)

## Summary

This notebook demonstrated:

1. **Complete Workflow**: Pattern classification → Cypher generation → Validation
2. **All 19 Patterns**: Examples for each logic pattern type
3. **Batch Processing**: Generating queries for many instances in one call
4. **Query Validation**: Checking generated queries before execution
5. **Real-world Examples**: Biomedical and drug discovery use cases

### Key Features

- **Automatic Pattern Detection**: Analyzes instance structure
- **Cypher Generation**: Produces executable Neo4j queries
- **Negative Constraints**: Proper handling of NOT clauses
- **Complex Patterns**: Support for multi-hop chains, unions, and intersections

### Next Steps

1. Connect to Neo4j database
2. Execute generated queries
3. Analyze results
4. Integrate into larger reasoning pipelines
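
A minimal sketch of the first two steps uses the official Neo4j Python driver (5.x). Its `Driver.execute_query()` returns records, a summary, and keys. The helper name, the placeholder URI, and the credentials are assumptions. The driver is passed in as a parameter so the helper can be exercised without a running database.

```python
def run_generated_query(driver, instance, database=None):
    """Execute instance['predict_cql'] and return each row as a dict.

    `driver` is assumed to be a neo4j.Driver from the 5.x Python driver.
    """
    query = instance.get('predict_cql')
    if not query:
        raise ValueError("instance has no generated Cypher query")
    records, _summary, _keys = driver.execute_query(query, database_=database)
    return [record.data() for record in records]


# Example use against a live server (placeholder URI and credentials):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
#     rows = run_generated_query(driver, result1)
```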