# QIAGEN BKB Text2Cypher - Getting Started Guide

This notebook provides a practical introduction to using the QIAGEN BKB Text2Cypher Agent.

## What You'll Learn
- How to connect to Neo4j / AuraDB
- How to execute natural language queries
- How predefined templates differ from text2cypher
- How to run drug repurposing, target identification, and indication expansion queries
- How to work with results in different formats

## 📋 Prerequisites

Before running this notebook:
1. ✅ Install dependencies: `pip install -r requirements.txt`
2. ✅ Configure `.env` file with Neo4j and OpenAI credentials
3. ✅ Test connection: `python test_aura_connection.py`
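
Before initializing the agent, it can help to confirm that the credentials from step 2 are visible to the Python process. The following is a minimal sketch of such a check. The variable names (`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `OPENAI_API_KEY`) are assumptions, not confirmed by this project, so substitute the keys your own `.env` file uses.

```python
import os

# Assumed variable names; adjust to match the keys in your own .env file.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All expected settings are present.")
```

This only inspects the current process environment. If the values live only in `.env`, that file has to be loaded first by whatever mechanism the project uses.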

## 🚀 Step 1: Import and Initialize

In [None]:
# Add parent directory to path
import sys
sys.path.append('..')

# Import the main agent
from src.main import BKBQueryAgent
import json

print("✓ Imports successful!")

In [None]:
# Initialize the agent
agent = BKBQueryAgent()
print("✓ BKB Query Agent initialized!")
print("\nReady to query QIAGEN BKB!")

## 💊 Step 2: Your First Query - Drug Discovery

Let's start with a simple question about drug targets.

In [None]:
# Ask a simple question
question = "What drugs target EGFR?"

result = agent.query(question)

# Display the answer
print(f"Question: {question}\n")
print(f"Query Type: {result['query_type']}")
print(f"Intent: {result['intent']}")
print(f"Results Found: {result['result_count']}\n")
print(f"Answer:\n{result['answer']}")

### Understanding the Response

- **Query Type**: `template` means the agent answered with a predefined, optimized query rather than generating Cypher for this question
- **Intent**: The classified category of your question
- **Result Count**: Number of results found
- **Answer**: Natural language synthesis of the results
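
The four fields above can be collected into a small reusable summary. This is a sketch under the assumption that a result is a plain dictionary with the keys shown in Step 2. It uses `.get()` with fallbacks so that a missing key does not raise. The sample dictionary, including the intent name, is hypothetical.

```python
def summarize_result(result):
    """Build a short multi-line summary from a result dictionary."""
    lines = [
        f"Query Type: {result.get('query_type', 'unknown')}",
        f"Intent: {result.get('intent', 'unknown')}",
        f"Results Found: {result.get('result_count', 0)}",
        f"Answer: {result.get('answer', '')}",
    ]
    return "\n".join(lines)


# Hypothetical result, shaped like the fields described above.
sample = {
    "query_type": "template",
    "intent": "drug_discovery",  # made-up intent name for illustration
    "result_count": 2,
    "answer": "Two drugs were found.",
}
print(summarize_result(sample))
```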

## 🔬 Step 3: See the Generated Cypher Query

In [None]:
# View the actual Cypher query that was executed
print("Generated Cypher Query:")
print("=" * 60)
print(result['cypher_query'])
print("=" * 60)

## 📊 Step 4: Explore Raw Results

In [None]:
# Look at the first 3 raw results
print("Raw Results (first 3):")
print(json.dumps(result['results'][:3], indent=2))
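
Raw rows are easier to work with once you know which fields they carry. The sketch below counts how often each field name appears, assuming `result['results']` is a list of dictionaries (as the later steps in this notebook treat it). The sample rows and field names are hypothetical; real names depend on what the executed Cypher query returns.

```python
from collections import Counter


def field_counts(rows):
    """Count how many result rows contain each field name."""
    counts = Counter()
    for row in rows:
        counts.update(row.keys())
    return counts


# Hypothetical rows; real field names depend on the Cypher RETURN clause.
rows = [
    {"drug_name": "Drug A", "target": "EGFR"},
    {"drug_name": "Drug B", "target": "EGFR", "phase": 3},
]
for name, count in field_counts(rows).most_common():
    print(f"{name}: present in {count} of {len(rows)} rows")
```

A field that appears in only some rows is a hint to read it with `.get()` and a default value.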

## 🔄 Step 5: Drug Repurposing Query

Now let's try a more complex drug repurposing query.

In [None]:
# Drug repurposing query
question = "Find drugs similar to Imatinib with at least 3 shared targets"

result = agent.query(question)

print(f"📋 Results: {result['result_count']} similar drugs found\n")
print(result['answer'])

# Show which template was used
if 'template_name' in result:
    print(f"\n✓ Used predefined template: {result['template_name']}")

## 🎯 Step 6: Target Identification

Find genes and targets associated with diseases.

In [None]:
# Find genes associated with a disease
result = agent.query("What genes are associated with breast cancer?")

print(f"Found {result['result_count']} genes\n")
print(result['answer'])

# Show top 5 genes
if result['results']:
    print("\nTop 5 Genes:")
    for i, gene_data in enumerate(result['results'][:5], 1):
        gene_symbol = gene_data.get('gene_symbol', 'N/A')
        gene_name = gene_data.get('gene_name', 'N/A')
        print(f"{i}. {gene_symbol} - {gene_name}")

## 🔍 Step 7: Biomarker Discovery

In [None]:
# Find biomarkers for a disease
result = agent.query("What are biomarkers for lung cancer?")

print(result['answer'])

if result['result_count'] > 0:
    print(f"\nDiscovered {result['result_count']} biomarkers")

## 🆕 Step 8: Indication Expansion

Find new therapeutic uses for existing drugs.

In [None]:
# Find new indications for an existing drug
result = agent.query("What new diseases could Metformin treat?")

print(f"Query Type: {result['query_type']}")
print(f"Potential new indications: {result['result_count']}\n")
print(result['answer'])

## 🤖 Step 9: Text2Cypher for Complex Queries

For queries that don't match predefined templates, the system automatically generates custom Cypher.

In [None]:
# Complex query that will use text2cypher
result = agent.query(
    "Find proteins that interact with TP53 and are involved in apoptosis",
    force_text2cypher=True  # Force text2cypher mode
)

print(f"Query Type: {result['query_type']}")
print(f"\nGenerated Cypher:\n{result.get('cypher_query', 'N/A')}")
print(f"\nAnswer:\n{result['answer']}")
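
One way to combine the two modes is to try the default routing first and retry with `force_text2cypher=True` only when nothing comes back. The sketch below illustrates that pattern and is not part of the agent itself. `query_with_fallback` is a hypothetical helper, `StubAgent` stands in for `BKBQueryAgent` so the example runs without a database, and the `"text2cypher"` value for `query_type` is an assumption.

```python
def query_with_fallback(agent, question):
    """Run the default query; retry with text2cypher if it returns no rows."""
    result = agent.query(question)
    if result.get("result_count", 0) == 0:
        result = agent.query(question, force_text2cypher=True)
    return result


class StubAgent:
    """Stand-in for BKBQueryAgent so the sketch runs without a database."""

    def query(self, question, force_text2cypher=False):
        if force_text2cypher:
            # Assumed query_type value for generated Cypher.
            return {"query_type": "text2cypher", "result_count": 1, "answer": "stub"}
        return {"query_type": "template", "result_count": 0, "answer": ""}


result = query_with_fallback(StubAgent(), "Find proteins that interact with TP53")
print(result["query_type"])
```

The retry costs a second round trip, so it is best reserved for questions where an empty answer is unexpected.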

## 📦 Step 10: Batch Processing

Process multiple questions at once.

In [None]:
# Batch of questions
questions = [
    "What drugs treat Parkinson's disease?",
    "Find genes in the MAPK pathway",
    "Which drugs inhibit VEGF?"
]

print("Processing batch queries...\n")
results = agent.batch_query(questions)

# Display results
for i, (q, r) in enumerate(zip(questions, results), 1):
    print(f"\n{'='*60}")
    print(f"Query {i}: {q}")
    print(f"Results: {r['result_count']}")
    answer = r['answer']
    # Truncate long answers to 150 characters for display
    preview = f"{answer[:150]}..." if len(answer) > 150 else answer
    print(f"Answer: {preview}")
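
If you want to keep batch output for later rather than only print it, the results can be condensed into one record per question. This is a minimal sketch, assuming `batch_query` returns one result dictionary per question in the same order (the loop above relies on the same assumption). `preview` and `batch_summary` are hypothetical helper names, and the sample data is made up.

```python
def preview(text, limit=150):
    """Shorten text to `limit` characters, marking any truncation with '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


def batch_summary(questions, results, limit=150):
    """Pair each question with its result count and a shortened answer."""
    return [
        {
            "question": q,
            "result_count": r.get("result_count", 0),
            "answer_preview": preview(r.get("answer", ""), limit),
        }
        for q, r in zip(questions, results)
    ]


# Hypothetical inputs standing in for real batch output.
sample_questions = ["Which drugs inhibit VEGF?"]
sample_results = [{"result_count": 4, "answer": "Several inhibitors were found."}]
for row in batch_summary(sample_questions, sample_results):
    print(row)
```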

## 💡 Step 11: Get Template Suggestions

See which templates match your question before executing.

In [None]:
# Get suggestions for a question
suggestions = agent.get_suggestions("Find drugs for cancer treatment")

print(f"Found {len(suggestions)} matching template(s):\n")

for i, sug in enumerate(suggestions, 1):
    print(f"{i}. {sug['name']}")
    print(f"   Description: {sug['description']}")
    print(f"   Intent: {sug['intent']}")
    print(f"   Example: {sug['example_question']}\n")

## 📄 Step 12: Different Output Formats

Get results in different formats: natural language, JSON, or table.

In [None]:
question = "Find top 5 drugs that target BCR-ABL"

# Natural language (default)
result_natural = agent.query(question, format="natural")
print("Natural Language Output:")
print("=" * 60)
print(result_natural['answer'])
print()

In [None]:
# JSON format
result_json = agent.query(question, format="json")
print("JSON Output (first 2 results):")
print("=" * 60)
print(json.dumps(result_json['results'][:2], indent=2))
print()

In [None]:
# Table format
result_table = agent.query(question, format="table")
print("Table Output:")
print("=" * 60)
print(result_table.get('formatted_results', 'No table data'))
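
When `formatted_results` is missing, a table can still be built from the raw rows. The sketch below is one possible fallback and does not describe how the agent formats tables. It assumes the rows are dictionaries, and `render_table` is a hypothetical helper written only for this example.

```python
def render_table(rows, columns=None):
    """Render a list of dictionaries as a fixed-width text table."""
    if not rows:
        return "No table data"
    # Default to the first row's keys, in their original order.
    columns = list(columns or rows[0].keys())
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    # Each column is as wide as its longest value or its header.
    widths = [
        max(len(col), *(len(line[i]) for line in cells))
        for i, col in enumerate(columns)
    ]

    def fmt(values):
        return " | ".join(v.ljust(w) for v, w in zip(values, widths))

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(columns), separator, *(fmt(line) for line in cells)])


# Placeholder rows; real rows come from result['results'].
print(render_table([{"drug": "Drug A", "targets": 3}, {"drug": "Drug B", "targets": 12}]))
```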

## 🎓 Step 13: Understanding Query Routing

Let's peek under the hood to see how intent classification works.

In [None]:
from src.agents import get_intent_classifier

classifier = get_intent_classifier()

# Test different queries
test_queries = [
    "What drugs target BRAF?",
    "Find similar compounds to Aspirin",
    "Could Imatinib be used for new indications?",
    "What are biomarkers for prostate cancer?",
]

print("Intent Classification Results:")
print("=" * 60)

for query in test_queries:
    intent = classifier.classify(query)
    matches = classifier.find_matching_templates(query)
    
    print(f"\nQuery: {query}")
    print(f"Intent: {intent}")
    print(f"Matching Templates: {len(matches)}")
    print(f"Route: {'Predefined Template' if matches else 'Text2Cypher'}")
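
The routing rule printed in the last line can be written as a standalone function. This sketch only restates the behavior shown in this notebook: a matching template wins unless text2cypher is forced. `choose_route` is a hypothetical name, the template names are made up, and the returned strings are assumed labels, not confirmed values from the agent.

```python
def choose_route(matching_templates, force_text2cypher=False):
    """Return 'template' when a template matches, otherwise 'text2cypher'."""
    if force_text2cypher or not matching_templates:
        return "text2cypher"
    return "template"


# Hypothetical template names used purely for illustration.
print(choose_route(["drugs_for_target"]))
print(choose_route([]))
print(choose_route(["drugs_for_target"], force_text2cypher=True))
```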

## 🔧 Step 14: Explore Available Templates

In [None]:
from src.templates import get_all_templates

templates = get_all_templates()

print(f"Total Available Templates: {len(templates)}\n")

# Group by intent
by_intent = {}
for t in templates:
    # setdefault creates the list the first time an intent is seen
    by_intent.setdefault(t.intent, []).append(t)

# Display by category
for intent, tmps in by_intent.items():
    print(f"\n📁 {intent.upper().replace('_', ' ')} ({len(tmps)} templates):")
    for t in tmps:
        print(f"   • {t.name}")
        print(f"     → {t.description}")

## 🎯 Practice Exercises

Try these queries on your own:

In [None]:
# Exercise 1: Drug Repurposing
# Find drugs that could be repurposed for Alzheimer's disease
result = agent.query("Find drugs that could be repurposed for Alzheimer's disease")
print(result['answer'])

In [None]:
# Exercise 2: Pathway Analysis  
# Find druggable targets in a specific pathway
result = agent.query("Find druggable targets in the PI3K/AKT pathway")
print(result['answer'])

In [None]:
# Exercise 3: Novel Target Discovery
# Find undrugged targets for a disease
result = agent.query("Find novel undrugged targets for diabetes")
print(result['answer'])

In [None]:
# Exercise 4: Your own query!
# Write your own question here
your_question = ""  # Add your question here

if your_question:
    result = agent.query(your_question)
    print(f"Query Type: {result['query_type']}")
    print(f"Results: {result['result_count']}\n")
    print(result['answer'])
else:
    print("Add your question to the 'your_question' variable above!")

## 📚 Summary

You've learned:

✅ How to initialize the BKB Query Agent  
✅ Execute natural language queries  
✅ Understand predefined templates vs text2cypher  
✅ Drug repurposing workflows  
✅ Target identification and biomarker discovery  
✅ Indication expansion  
✅ Batch processing  
✅ Different output formats  
✅ Intent classification and routing  

## 🚀 Next Steps

1. Explore the interactive CLI: `python -m src.main`
2. Check out `example_queries.ipynb` for more advanced examples
3. Read the full documentation in `README.md`
4. Add your own custom templates in `src/templates/`

## 💬 Need Help?

- Find templates that match a question: use `agent.get_suggestions(your_question)`
- Force text2cypher: Add `force_text2cypher=True` to any query
- Debug mode: Set `LOG_LEVEL=DEBUG` in `.env`
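
For context on the debug setting, the sketch below shows one common way a `LOG_LEVEL` value is turned into a Python logging level. Whether this project reads the variable in exactly this way is an assumption, and `resolve_log_level` is a hypothetical helper.

```python
import logging
import os


def resolve_log_level(name, default=logging.INFO):
    """Map a level name such as 'DEBUG' to its numeric logging level."""
    level = getattr(logging, str(name).strip().upper(), None)
    # Unknown names fall back to the default instead of raising.
    return level if isinstance(level, int) else default


level = resolve_log_level(os.getenv("LOG_LEVEL", "INFO"))
logging.basicConfig(level=level)
logging.getLogger(__name__).debug("Debug logging is enabled.")
```

With `LOG_LEVEL=DEBUG` set in the environment, the final line emits a message; at the default `INFO` level it stays silent.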

Happy querying! 🎉