# Advanced JanusGraph Queries

**File**: notebooks/03_advanced_queries.ipynb  
**Created**: 2026-01-28T11:11:00.123  
**Author**: David LECONTE, IBM WorldWide | Data & AI

---

## Advanced Query Patterns

This notebook demonstrates:
1. Complex traversals
2. Aggregations and analytics
3. Pattern matching
4. Performance optimization

**Prerequisites**: The JanusGraph stack is running and the sample data is loaded.

In [1]:
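# Setup
import os

import nest_asyncio
nest_asyncio.apply()  # let the driver's event loop run inside Jupyter's own loop

from gremlin_python.driver import client
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

GREMLIN_URL = os.getenv('GREMLIN_URL', 'ws://localhost:18182/gremlin')
gc = client.Client(GREMLIN_URL, 'g')

# Run a trivial query so a wrong URL or a stopped server fails here, not in a later cell
gc.submit('g.V().limit(1).count()').all().result()
print("✅ Connected")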

✅ Connected


## 1. Multi-Hop Traversals

In [2]:
try:
    # Find 2-hop connections through accounts
    # Person → owns_account → Account ← owns_account ← Person (shared accounts)
    query = """
    g.V().hasLabel('person').has('full_name').limit(1).as('start')
      .out('owns_account')
      .in('owns_account')
      .where(neq('start'))
      .dedup()
      .values('full_name')
      .limit(10)
    """

    result = gc.submit(query).all().result()
    print('2-hop connections (shared account holders):')
    if result:
        for name in result:
            print(f'  - {name}')
    else:
        print('  No shared account holders found')
except Exception as e:
    print(f'⚠️ Skipped: {e}')


2-hop connections:
  - Eve Davis
  - David Brown
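
The traversal above picks its starting person with `limit(1)`, so the result depends on whichever vertex the server returns first. A minimal sketch of the same 2-hop pattern for a named person follows, passing the name as a script binding instead of formatting it into the query string. The function name `shared_account_holders` and the binding names `person_name` and `max_results` are hypothetical; the client is passed in as a parameter so the function can be run against `gc` from the setup cell or against a stub.

```python
def shared_account_holders(gremlin_client, full_name, limit=10):
    """Return names of people who co-own at least one account with `full_name`.

    `gremlin_client` is assumed to behave like gremlin_python's client.Client:
    submit(query, bindings) returns an object whose .all().result() is a list.
    """
    query = """
    g.V().has('person', 'full_name', person_name).as('start')
      .out('owns_account')
      .in('owns_account')
      .where(neq('start'))
      .dedup()
      .values('full_name')
      .limit(max_results)
    """
    # Bindings keep user-supplied values out of the script text itself.
    bindings = {'person_name': full_name, 'max_results': limit}
    return gremlin_client.submit(query, bindings).all().result()
```

In the notebook this would be called as `shared_account_holders(gc, 'Alice Johnson')`, assuming a person with that `full_name` exists in the loaded data.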


## 2. Shortest Path

In [3]:
try:
    # Shortest path from one person to any other person, following edges of any label in either direction
    query = """
    g.V().hasLabel('person').has('full_name').limit(1).as('start')
      .repeat(both().simplePath())
      .until(hasLabel('person').where(neq('start')).has('full_name'))
      .path()
      .by(coalesce(values('full_name'), values('account_type'), label()))
      .limit(1)
    """

    paths = gc.submit(query).all().result()
    if paths:
        print(f'Shortest path ({len(paths[0]) - 1} hops):')
        print(' → '.join(str(x) for x in paths[0]))
    else:
        print('No path found between persons')
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Shortest path Alice -> Eve:
Alice Johnson -> David Brown -> Eve Davis
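
The `repeat(...).until(...).limit(1)` pattern amounts to a breadth-first search that stops at the first match. The sketch below shows that idea in plain Python on a small in-memory graph. It is an illustration of the concept, not of how JanusGraph evaluates the traversal, and `toy_graph` is made-up data. Note that a path with three vertices has two hops.

```python
from collections import deque

def bfs_shortest_path(adjacency, start, is_target):
    """Return a shortest path (list of vertices) from `start` to the first
    other vertex for which is_target(vertex) is true, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor in visited:
                continue  # never revisit a vertex, so paths stay simple
            new_path = path + [neighbor]
            if is_target(neighbor):
                return new_path
            visited.add(neighbor)
            queue.append(new_path)
    return None

# Hypothetical toy graph: two persons linked through one shared account.
# Edges are listed in both directions, mirroring both() in the traversal.
toy_graph = {
    'person:A': ['account:1'],
    'account:1': ['person:A', 'person:B'],
    'person:B': ['account:1'],
}

path = bfs_shortest_path(toy_graph, 'person:A', lambda v: v.startswith('person:'))
hops = len(path) - 1  # hops are edges, so one fewer than the vertices on the path
```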


## 3. Centrality Metrics

In [4]:
try:
    # Calculate degree centrality
    query = """
    g.V().hasLabel('person')
      .project('name', 'connections')
      .by(coalesce(values('full_name'), values('first_name'), constant('N/A')))
      .by(bothE().count())
      .order().by(select('connections'), desc)
      .limit(15)
    """

    result = gc.submit(query).all().result()
    centrality = [{'Person': r.get('name', 'N/A'), 'Connections': r.get('connections', 0)} for r in result]
    df = pd.DataFrame(centrality)
    print('Degree Centrality (Top 15):')
    display(df)
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Degree Centrality:


| | Person | Connections |
|---|---|---|
| 3 | Alice Johnson | 3 |
| 4 | David Brown | 3 |
| 0 | Eve Davis | 2 |
| 1 | Carol Williams | 2 |
| 2 | Bob Smith | 2 |
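
`bothE().count()` returns a raw degree. To compare scores across graphs of different sizes, degree is often divided by `n - 1`, the maximum degree a vertex can have in a simple graph with `n` vertices. A minimal sketch of that normalization on an edge list follows. The function name and the sample edges are hypothetical, and `n` here counts only vertices that appear in at least one edge.

```python
from collections import Counter

def degree_centrality(edges):
    """Map each vertex to degree / (n - 1), where n is the number of
    distinct vertices that appear in `edges` (an iterable of (u, v) pairs)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    scale = 1 / (n - 1) if n > 1 else 0.0
    return {vertex: count * scale for vertex, count in degree.items()}

# Hypothetical edge list: vertex 'a' is connected to both 'b' and 'c'.
scores = degree_centrality([('a', 'b'), ('a', 'c')])
```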


## 4. Pattern Matching

In [5]:
try:
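    # Find triangles: three distinct persons where each consecutive pair co-owns an account.
    # owns_account runs person → account, so each person-to-person step goes out, then in.
    query = """
    g.V().hasLabel('person').as('a')
      .out('owns_account').in('owns_account').where(neq('a')).as('b')
      .out('owns_account').in('owns_account').where(neq('a')).where(neq('b')).as('c')
      .out('owns_account').in('owns_account').where(eq('a'))
      .select('a', 'b', 'c')
      .by('full_name')
      .dedup()
    """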

    triangles = gc.submit(query).all().result()
    print(f"Found {len(triangles)} triangles:")
    for t in triangles[:5]:  # Show first 5
        print(f"  {t['a']} -> {t['b']} -> {t['c']} -> {t['a']}")
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Found 0 triangles:
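
In a graph where `owns_account` links persons to accounts, one way to read a "person triangle" is three people in which every pair shares at least one account. The sketch below computes that on a plain dictionary, which can help when checking what a traversal should return on a small sample. The function name and the sample ownership data are hypothetical. Each triangle is reported once, as a sorted tuple.

```python
from itertools import combinations

def person_triangles(ownership):
    """ownership maps person -> set of account ids.
    Return sorted 3-tuples of persons in which every pair shares an account."""
    # Pairs of persons that co-own at least one account.
    linked = {
        frozenset(pair)
        for pair in combinations(ownership, 2)
        if ownership[pair[0]] & ownership[pair[1]]
    }
    # A triangle needs all three of its pairs to be linked.
    return [
        trio
        for trio in combinations(sorted(ownership), 3)
        if all(frozenset(pair) in linked for pair in combinations(trio, 2))
    ]

# Hypothetical sample: A-B share account 1, B-C share 2, A-C share 3; D is isolated.
sample = {'A': {1, 3}, 'B': {1, 2}, 'C': {2, 3}, 'D': {4}}
triangles_found = person_triangles(sample)
```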


## 5. Aggregations

In [6]:
try:
    # Account statistics
    query = """
    g.V().hasLabel('account')
      .project('account_type', 'owners', 'transactions_out')
      .by(coalesce(values('account_type'), constant('unknown')))
      .by(inE('owns_account').count())
      .by(outE('made_transaction').count())
    """

    result = gc.submit(query).all().result()
    df = pd.DataFrame(result)
    # Aggregate by account type
    summary = df.groupby('account_type').agg(
        count=('owners', 'size'),
        avg_owners=('owners', 'mean'),
        total_txns=('transactions_out', 'sum')
    ).reset_index()
    print('Account Statistics by Type:')
    display(summary)
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Company Statistics:


| | company | employees | products |
|---|---|---|---|
| 0 | DataStax | 3 | 1 |
| 1 | Acme Corp | 1 | 1 |
| 2 | TechStart | 1 | 1 |


## 6. Recommendation Engine

In [7]:
try:
    # Summarize the first person's transaction activity: count the vertices reached
    # via made_transaction from their accounts, grouped by vertex label
    query = """
    g.V().hasLabel('person').has('full_name').limit(1).as('p')
      .out('owns_account')
      .out('made_transaction')
      .groupCount()
      .by(label())
    """

    result = gc.submit(query).all().result()
    print('Transaction activity for first person:')
    if result and result[0]:
        for k, v in result[0].items():
            print(f'  {k}: {v} transactions')
    else:
        print('  No transactions found')
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Product recommendations for David:
  - JanusGraph (used by 2 coworkers)


## 7. Performance Optimization

In [8]:
try:
    import time

    def time_query(query):
        # perf_counter() is monotonic and has finer resolution than time.time()
        start = time.perf_counter()
        gc.submit(query).all().result()
        return time.perf_counter() - start

    # Property-based lookup
    t1 = time_query("g.V().hasLabel('person').has('risk_level', 'high').count()")

    # Full label scan
    t2 = time_query("g.V().hasLabel('person').count()")

    print(f'Property filter query: {t1:.4f}s')
    print(f'Label scan query: {t2:.4f}s')
    if t2 > 0:  # avoid ZeroDivisionError when the timer reports 0
        print(f'Ratio: {t1/t2:.2f}x')
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Indexed query: 0.0120s
Full scan: 0.0167s
Speedup: 1.40x
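
A single timed run mixes query cost with connection warm-up, caching, and network jitter, so two one-off measurements are hard to compare. A minimal sketch of a steadier measurement follows: discard a warm-up run, then take the median of several repeats. The helper name `benchmark` and its defaults are hypothetical. It times any zero-argument callable, so it does not depend on the Gremlin driver.

```python
import statistics
import time

def benchmark(run, repeats=5, warmup=1):
    """Return the median wall-clock seconds of `repeats` calls to run(),
    after `warmup` untimed calls."""
    for _ in range(warmup):
        run()  # untimed: lets caches and connections settle
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

In the notebook it could wrap a query like this: `benchmark(lambda: gc.submit("g.V().hasLabel('person').count()").all().result())`.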


## 8. Batch Operations

In [9]:
try:
    # Bulk update properties
    query = """
    g.V().hasLabel('person')
      .property('analyzed', true)
      .count()
    """

    count = gc.submit(query).all().result()[0]
    print(f"Updated {count} vertices")
except Exception as e:
    print(f'⚠️ Skipped: {e}')


Updated 5 vertices
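
The update above touches every `person` vertex in one request, which is fine for five vertices but can become a very large transaction on a bigger graph. A minimal sketch of splitting the work into fixed-size batches follows. The names `chunked` and `mark_analyzed` and the default batch size are hypothetical. The sketch assumes the vertex ids were collected beforehand (for example with `g.V().hasLabel('person').id()`) and that they can be passed back to the server as a binding unchanged.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` elements from `items`."""
    if size < 1:
        raise ValueError('size must be at least 1')
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def mark_analyzed(gremlin_client, vertex_ids, batch_size=500):
    """Set analyzed=true on the given vertices, one request per batch.

    `gremlin_client` is assumed to behave like gremlin_python's client.Client.
    Returns the total number of vertices the server reports as updated.
    """
    query = "g.V().hasId(within(ids)).property('analyzed', true).count()"
    updated = 0
    for batch in chunked(vertex_ids, batch_size):
        updated += gremlin_client.submit(query, {'ids': batch}).all().result()[0]
    return updated
```

Smaller batches keep each request short at the cost of more round trips, so the right size depends on the deployment.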


## Best Practices

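1. **Use Indexes**: Add composite indexes on the property keys that frequent equality lookups filter on, such as `full_name` or `risk_level`
2. **Limit Results**: Use `.limit()` so a broad traversal cannot return an unbounded result set to the client
3. **Batch Writes**: Group bulk updates into bounded transactions instead of one very large traversal or one request per vertex
4. **Profile Queries**: Append `.profile()` to a traversal to see per-step timings and traverser counts
5. **Cache Results**: Store computed metrics, such as centrality scores, for repeated use instead of recomputing them on every request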
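
The caching advice can be sketched as a small time-to-live cache keyed by query text. This is a hypothetical helper, not part of gremlin_python: `QueryCache`, its `ttl_seconds` default, and the injected `run_query` callable are all assumptions made for illustration. Results are reused until they are older than the TTL, then fetched again.

```python
import time

class QueryCache:
    """Cache query results in memory for `ttl_seconds`, keyed by query string."""

    def __init__(self, run_query, ttl_seconds=300, clock=time.monotonic):
        self._run_query = run_query  # callable: query string -> result
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # query -> (stored_at, result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: skip the round trip
        result = self._run_query(query)
        self._entries[query] = (now, result)
        return result
```

In the notebook, `run_query` could be `lambda q: gc.submit(q).all().result()`. A cache like this suits read-mostly metrics, since writes made during the TTL window are not reflected until the entry expires.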


