# Advanced JanusGraph Queries

**File**: notebooks/03_advanced_queries.ipynb  
**Created**: 2026-01-28T11:11:00.123  
**Author**: David LECONTE, IBM WorldWide | Data & AI

---

## Advanced Query Patterns

This notebook demonstrates:
1. Complex traversals
2. Aggregations and analytics
3. Pattern matching
4. Performance optimization

**Prerequisites**: The JanusGraph stack is running and the sample data is loaded.

In [None]:
# Setup
import nest_asyncio
nest_asyncio.apply()

from gremlin_python.driver import client
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

GREMLIN_URL = 'ws://janusgraph-server:8182/gremlin'
gc = client.Client(GREMLIN_URL, 'g')

print("✅ Connected")

## 1. Multi-Hop Traversals

In [None]:
# Find people connected through 2 hops
query = """
g.V().has('person', 'name', 'Alice Johnson').as('start')
  .out('knows').out('knows')
  .where(neq('start'))
  .dedup()
  .values('name')
"""

result = gc.submit(query).all().result()
print("2-hop connections:")
for name in result:
    print(f"  - {name}")
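
The two-hop query above hard-codes both the starting name and the number of hops. A minimal sketch of a more reusable form, assuming the same `person`/`knows` schema: `k_hop_query` is a hypothetical helper (not part of the notebook) that builds the script with `repeat().times()` and returns the name as a binding, which `gc.submit` accepts as its second argument.

```python
def k_hop_query(name, hops=2):
    """Return (script, bindings) for people reached by exactly `hops` outgoing 'knows' steps."""
    if isinstance(hops, bool) or not isinstance(hops, int) or hops < 1:
        raise ValueError("hops must be a positive integer")
    script = (
        # start_name is supplied as a binding rather than pasted into the script
        "g.V().has('person', 'name', start_name).as('start')"
        f".repeat(out('knows')).times({hops})"
        ".where(neq('start')).dedup().values('name')"
    )
    return script, {"start_name": name}


script, bindings = k_hop_query("Alice Johnson", hops=2)
# In the notebook: names = gc.submit(script, bindings).all().result()
```

Passing the name as a binding avoids quoting problems when a name contains an apostrophe; the hop count is validated and embedded directly because it is part of the traversal structure.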

## 2. Shortest Path

In [None]:
# Shortest path between two people (first simple path found; both() follows edges of any label in either direction)
query = """
g.V().has('person', 'name', 'Alice Johnson')
  .repeat(both().simplePath())
  .until(has('person', 'name', 'Eve Davis'))
  .path()
  .by('name')
  .limit(1)
"""

path = gc.submit(query).all().result()
print("Shortest path Alice -> Eve:")
print(" -> ".join(path[0]) if path else "No path found")
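
The Gremlin query relies on `repeat()` expanding the frontier roughly level by level, so the first path that reaches the target is expected to be a shortest one. The same idea can be sketched in plain Python as a breadth-first search. The adjacency dict below is toy data: the edges and the names other than Alice Johnson and Eve Davis are invented for illustration and are not taken from the loaded graph.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the first (shortest) path found, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:  # plays the role of simplePath()
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Toy undirected graph (hypothetical edges).
toy = {
    "Alice Johnson": ["Bob Smith", "Carol White"],
    "Bob Smith": ["Alice Johnson", "Eve Davis"],
    "Carol White": ["Alice Johnson"],
    "Eve Davis": ["Bob Smith"],
}
found = shortest_path(toy, "Alice Johnson", "Eve Davis")
print(" -> ".join(found) if found else "No path found")
```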

## 3. Centrality Metrics

In [None]:
# Calculate degree centrality
query = """
g.V().hasLabel('person')
  .project('name', 'connections')
  .by('name')
  .by(bothE('knows').count())
"""

result = gc.submit(query).all().result()
centrality = [{'Person': r['name'], 'Connections': r['connections']} for r in result]
df = pd.DataFrame(centrality).sort_values('Connections', ascending=False)
print("Degree Centrality:")
display(df)
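
Raw connection counts are hard to compare across graphs of different sizes. A common convention is to divide each degree by `n - 1`, the largest degree a vertex can have in a simple graph. The sketch below assumes rows shaped like the query result (`name`, `connections`); `normalized_degree` and the sample rows are hypothetical. Because `bothE('knows')` counts edges, reciprocal `knows` edges would be counted twice and could push a value above 1.

```python
def normalized_degree(rows):
    """Divide each degree by (n - 1), the maximum degree in a simple graph of n vertices."""
    n = len(rows)
    if n < 2:
        return {r["name"]: 0.0 for r in rows}
    return {r["name"]: r["connections"] / (n - 1) for r in rows}


# Made-up rows in the same shape as the query result.
sample = [
    {"name": "A", "connections": 2},
    {"name": "B", "connections": 1},
    {"name": "C", "connections": 1},
]
print(normalized_degree(sample))  # {'A': 1.0, 'B': 0.5, 'C': 0.5}
```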

## 4. Pattern Matching

In [None]:
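# Find triangles (directed 3-person cycles); each cycle is returned once per
# starting vertex, so the count below includes rotations of the same triangle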
query = """
g.V().hasLabel('person').as('a')
  .out('knows').as('b')
  .out('knows').as('c')
  .out('knows').where(eq('a'))
  .select('a', 'b', 'c')
  .by('name')
  .dedup()
"""

triangles = gc.submit(query).all().result()
print(f"Found {len(triangles)} triangles:")
for t in triangles[:5]:  # Show first 5
    print(f"  {t['a']} -> {t['b']} -> {t['c']} -> {t['a']}")
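
The `dedup()` step only removes identical `(a, b, c)` rows, so one cycle can show up as `(a, b, c)`, `(b, c, a)` and `(c, a, b)`. A minimal sketch of collapsing those on the client side, assuming names are unique per person; `unique_triangles` is a hypothetical helper and the sample rows are made up. It treats a triangle as an unordered set of three people, so it also merges the two directions of a cycle.

```python
def unique_triangles(rows):
    """Keep one row per unordered set of three distinct people."""
    seen = set()
    unique = []
    for row in rows:
        key = frozenset((row["a"], row["b"], row["c"]))
        if len(key) == 3 and key not in seen:  # skip degenerate rows with repeats
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows: three rotations of one cycle plus one different triangle.
rows = [
    {"a": "P1", "b": "P2", "c": "P3"},
    {"a": "P2", "b": "P3", "c": "P1"},
    {"a": "P3", "b": "P1", "c": "P2"},
    {"a": "P1", "b": "P2", "c": "P4"},
]
print(len(unique_triangles(rows)))  # 2
```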

## 5. Aggregations

In [None]:
# Complex aggregation
query = """
g.V().hasLabel('company')
  .project('company', 'employees', 'products')
  .by('name')
  .by(in('worksFor').count())
  .by(out('created').count())
"""

result = gc.submit(query).all().result()
df = pd.DataFrame(result)
print("Company Statistics:")
display(df)

## 6. Recommendation Engine

In [None]:
# Recommend products based on coworkers
query = """
g.V().has('person', 'name', 'David Brown').as('person')
  .out('worksFor')
  .in('worksFor')
  .where(neq('person'))
  .out('uses')
  .groupCount()
  .by('name')
  .order(local).by(values, desc)
"""

recs = gc.submit(query).all().result()[0]
print("Product recommendations for David:")
for product, count in list(recs.items())[:3]:
    print(f"  - {product} (used by {count} coworkers)")
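
The query above ranks every product used by coworkers, including ones the person may already use. One way to handle that is a small client-side filter, sketched below under the assumption that the set of already-used product names has been fetched separately; `top_recommendations` and the product names are hypothetical.

```python
def top_recommendations(counts, already_used, k=3):
    """Rank products by count (ties broken by name) and drop ones already in use."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, count) for name, count in ranked if name not in already_used][:k]


# Made-up counts in the same shape as the groupCount() result.
counts = {"Product A": 3, "Product B": 5, "Product C": 3, "Product D": 1}
print(top_recommendations(counts, already_used={"Product B"}, k=2))
# [('Product A', 3), ('Product C', 3)]
```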

## 7. Performance Optimization

In [None]:
import time

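# Compare an index-backed lookup with a label scan.
# Note: the two queries count different sets of vertices, so this is a rough
# illustration of the cost difference, not a like-for-like benchmark.
def time_query(query):
    start = time.perf_counter()  # monotonic, high-resolution timer
    gc.submit(query).all().result()
    return time.perf_counter() - start

# Indexed query (assumes a composite index on name)
indexed = "g.V().has('person', 'name', 'Alice Johnson').count()"
t1 = time_query(indexed)

# Full scan
scan = "g.V().hasLabel('person').count()"
t2 = time_query(scan)

print(f"Indexed query: {t1:.4f}s")
print(f"Full scan: {t2:.4f}s")
if t1 > 0:  # guard against division by zero on very fast runs
    print(f"Speedup: {t2/t1:.2f}x")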
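
A single timed run is sensitive to connection warm-up and server-side caching. A minimal sketch of a steadier measurement, assuming it is acceptable to run each query several times: `benchmark` is a hypothetical helper that takes any zero-argument callable, discards warm-up runs, and reports the median.

```python
import statistics
import time


def benchmark(run, repeats=5, warmup=1):
    """Call `run` a few times and return the median wall-clock duration in seconds."""
    for _ in range(warmup):
        run()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# In the notebook: benchmark(lambda: gc.submit(indexed).all().result())
calls = []
elapsed = benchmark(lambda: calls.append(1), repeats=3, warmup=1)
print(f"{len(calls)} calls, median {elapsed:.6f}s")
```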

## 8. Batch Operations

In [None]:
# Bulk update properties
query = """
g.V().hasLabel('person')
  .property('analyzed', true)
  .count()
"""

count = gc.submit(query).all().result()[0]
print(f"Updated {count} vertices")
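
The update above touches every `person` vertex in one request, which may become heavy on a large graph. One possible approach is to fetch the vertex ids first and update them in fixed-size batches. The sketch below only shows the batching logic; `chunked` is a hypothetical helper, and the commented `gc.submit` call with an `ids` binding is an assumption about how the batches might be sent.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(items), size):
        yield items[i:i + size]


vertex_ids = list(range(1, 8))  # stand-in ids; real ids would come from g.V().id()
for batch in chunked(vertex_ids, 3):
    # In the notebook (assumed usage):
    # gc.submit("g.V(ids).property('analyzed', true).count()", {"ids": batch}).all().result()
    print(batch)
```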

## Best Practices

1. **Use Indexes**: Add composite indexes for frequent queries
2. **Limit Results**: Use `.limit()` to prevent memory issues
3. **Batch Writes**: Use transactions for bulk operations
4. **Profile Queries**: Use `.profile()` to analyze performance
5. **Cache Results**: Store computed metrics for repeated use
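
As an illustration of the last point, a computed metric can be kept in a small time-limited cache so that repeated cells do not re-run the same traversal. This is a minimal sketch, not a production cache: `QueryCache` is a hypothetical class, and `submit` can be any callable that takes a query string and returns its result.

```python
import time


class QueryCache:
    """Memoize query results for `ttl_seconds`."""

    def __init__(self, submit, ttl_seconds=60.0, clock=time.monotonic):
        self._submit = submit
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query -> (expiry time, result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: reuse the stored result
        result = self._submit(query)
        self._entries[query] = (now + self._ttl, result)
        return result


# In the notebook: cache = QueryCache(lambda q: gc.submit(q).all().result())
```

Cached values go stale after writes such as the bulk update in section 8, so a short TTL is the safer default.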

---

**Signature**: David LECONTE, IBM WorldWide | Data & AI