# Getting Started with SSIS Northwind Graph in Memgraph

This tutorial walks through the SSIS Northwind graph that metazcode generated and stored in Memgraph.

## What We'll Learn
- Connect to Memgraph and verify the SSIS Northwind data
- Understand the graph structure (nodes and relationships)
- Run basic Cypher queries to explore the graph
- Explore the analytics-ready features (materialized views, metadata)

## Prerequisites
- SSIS Northwind data already analyzed and stored in Memgraph
- Memgraph database running (`docker-compose up -d`)
- Python environment with required packages

## Step 1: Install Required Packages

In [None]:
# Install required packages if not already installed
# (the mgclient module is published on PyPI as pymgclient)
!pip install pymgclient pandas matplotlib seaborn jupyter

## Step 2: Connect to Memgraph

In [None]:
import mgclient
import pandas as pd
import json
from datetime import datetime
#mzcode-memgraph
# Connect to Memgraph
def connect_to_memgraph():
    """Connect to Memgraph database."""
    try:
        connection = mgclient.connect(
            host='localhost',
            port=7687,
            username='',
            password=''
        )
        print("✅ Connected to Memgraph successfully!")
        return connection
    except Exception as e:
        print(f"❌ Failed to connect to Memgraph: {e}")
        print("Make sure Memgraph is running: docker-compose up -d")
        return None

# Create connection
mg = connect_to_memgraph()
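
The connection above hardcodes `localhost:7687`. If Memgraph runs elsewhere (for example, under a different hostname inside Docker), the settings could be read from the environment instead. The sketch below is one way to do that; the variable names `MEMGRAPH_HOST`, `MEMGRAPH_PORT`, `MEMGRAPH_USER`, and `MEMGRAPH_PASSWORD` are hypothetical and not defined by metazcode or Memgraph.

```python
import os

def memgraph_settings(env=None):
    """Build keyword arguments for mgclient.connect from environment variables.

    The MEMGRAPH_* names are hypothetical, chosen for this sketch only.
    Defaults match the values used in the connection cell above.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MEMGRAPH_HOST", "localhost"),
        "port": int(env.get("MEMGRAPH_PORT", "7687")),
        "username": env.get("MEMGRAPH_USER", ""),
        "password": env.get("MEMGRAPH_PASSWORD", ""),
    }

# Usage in the notebook (requires a running Memgraph):
# mg = mgclient.connect(**memgraph_settings())
```

With no variables set, this returns the same host, port, and empty credentials as the cell above, so the default behavior is unchanged.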

## Step 3: Helper Functions

In [None]:
def execute_query(query, description=None):
    """Execute a Cypher query and return results as DataFrame."""
    if description:
        print(f"\n🔍 {description}")
        print(f"Query: {query}")
        print("-" * 50)
    
    try:
        cursor = mg.cursor()
        cursor.execute(query)
        results = cursor.fetchall()
        
        if results:
            # Get column names
            columns = [desc.name for desc in cursor.description] if cursor.description else ['result']
            # Create DataFrame
            df = pd.DataFrame(results, columns=columns)
            print(f"Found {len(df)} results:")
            return df
        else:
            print("No results found.")
            return pd.DataFrame()
            
    except Exception as e:
        print(f"❌ Query failed: {e}")
        return pd.DataFrame()

def pretty_print_json(data, max_length=500):
    """Pretty print JSON data with length limit."""
    if isinstance(data, str):
        try:
            data = json.loads(data)
        except ValueError:
            # Not valid JSON; print the raw string instead
            pass
    
    # default=str keeps non-JSON values (e.g. datetimes) printable
    json_str = json.dumps(data, indent=2, default=str)
    if len(json_str) > max_length:
        json_str = json_str[:max_length] + "\n...truncated..."
    print(json_str)

## Step 4: Verify SSIS Northwind Data

In [None]:
# Check if we have data in the database
overview_df = execute_query(
    "MATCH (n) RETURN count(n) as total_nodes",
    "Checking total nodes in database"
)
display(overview_df)

In [None]:
# Check for edges/relationships
edges_df = execute_query(
    "MATCH ()-[r]->() RETURN count(r) as total_edges",
    "Checking total edges in database"
)
display(edges_df)

In [None]:
# Check node types to confirm we have SSIS data
node_types_df = execute_query(
    "MATCH (n) RETURN n.node_type as node_type, count(n) as count ORDER BY count DESC",
    "Analyzing node types in the SSIS Northwind graph"
)
display(node_types_df)
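
To confirm at a glance that the graph contains the node types this notebook queries later, the result can be compared against an expected set. The following is a minimal sketch: the helper name `missing_node_types` is made up for illustration, and the expected set lists only the `node_type` values used in this notebook's queries, not a full metazcode schema.

```python
# node_type values queried elsewhere in this notebook (not an exhaustive schema)
EXPECTED_NODE_TYPES = {
    "pipeline", "operation", "table", "graph_metadata", "materialized_view",
}

def missing_node_types(found, expected=EXPECTED_NODE_TYPES):
    """Return the expected node types that do not appear in `found`, sorted by name."""
    present = {t for t in found if t is not None}
    return sorted(set(expected) - present)

# Usage in the notebook, once node_types_df is populated:
# print(missing_node_types(node_types_df["node_type"]))
```

An empty list means every expected type is present. Missing `graph_metadata` or `materialized_view` entries suggest that the analytics-ready optimization was not applied.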

## Step 5: Check Analytics-Ready Features

Let's see if the analytics-ready optimization was applied to our graph.

In [None]:
# Check for graph metadata (analytics-ready indicator)
metadata_df = execute_query(
    "MATCH (m:Node {node_type: 'graph_metadata'}) RETURN m.name, m.id, m.properties",
    "Checking for analytics-ready metadata"
)

if not metadata_df.empty:
    print("\n📊 Graph Metadata Found:")
    for _, row in metadata_df.iterrows():
        print(f"Name: {row['m.name']}")
        print(f"ID: {row['m.id']}")
        print("Properties:")
        pretty_print_json(row['m.properties'])
else:
    print("❌ No analytics-ready metadata found. Graph may not be optimized.")

display(metadata_df)

In [None]:
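# Check for materialized views
# v.properties may be a JSON string or a map, so record_count is read in Python
views_df = execute_query(
    "MATCH (v:Node {node_type: 'materialized_view'}) RETURN v.name, v.id, v.properties as properties",
    "Checking for materialized views (analytics-ready features)"
)

def get_record_count(props):
    """Return record_count from a JSON string or map, or None if unavailable."""
    if isinstance(props, str):
        try:
            props = json.loads(props)
        except ValueError:
            return None
    return props.get('record_count') if isinstance(props, dict) else None

if not views_df.empty:
    views_df['record_count'] = views_df['properties'].apply(get_record_count)
    print(f"\n🎉 Found {len(views_df)} materialized views! This graph is analytics-ready.")
    print("\nAvailable views:")
    for _, row in views_df.iterrows():
        print(f"  • {row['v.name']} ({row['record_count']} records)")
else:
    print("❌ No materialized views found. Analytics-ready optimization may not have been applied.")

display(views_df)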

## Step 6: Basic Graph Exploration

In [None]:
# Look at SSIS packages (pipelines)
pipelines_df = execute_query(
    "MATCH (p:Node {node_type: 'pipeline'}) RETURN p.name as package_name, p.id as package_id LIMIT 10",
    "Exploring SSIS packages in Northwind"
)
display(pipelines_df)

In [None]:
# Look at operations within packages
operations_df = execute_query(
    "MATCH (op:Node {node_type: 'operation'}) RETURN op.name as operation_name, op.id as operation_id LIMIT 10",
    "Exploring SSIS operations in Northwind"
)
display(operations_df)

In [None]:
# Look at tables/data assets
tables_df = execute_query(
    "MATCH (t:Node {node_type: 'table'}) RETURN t.name as table_name, t.id as table_id LIMIT 10",
    "Exploring tables/data assets in Northwind"
)
display(tables_df)
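
The three cells above differ only in the `node_type` value. As a sketch, the lookup could be written once with a Cypher query parameter instead of editing the query string each time. The function name `fetch_by_node_type` is hypothetical; it takes the connection explicitly and assumes the cursor accepts a parameter dictionary as the second argument to `execute`.

```python
def fetch_by_node_type(connection, node_type, limit=10):
    """Return (name, id) rows for nodes with the given node_type.

    node_type is passed as the $node_type query parameter; limit is
    cast to int and placed in the query text.
    """
    query = (
        "MATCH (n:Node {node_type: $node_type}) "
        f"RETURN n.name AS name, n.id AS id LIMIT {int(limit)}"
    )
    cursor = connection.cursor()
    cursor.execute(query, {"node_type": node_type})
    return cursor.fetchall()

# Usage in the notebook (requires the `mg` connection from Step 2):
# for node_type in ("pipeline", "operation", "table"):
#     print(node_type, fetch_by_node_type(mg, node_type, limit=5))
```

Passing the value as a parameter avoids quoting problems if a `node_type` ever contains a quote character.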

## Step 7: Understanding Relationships

In [None]:
# Check what types of relationships exist
relationships_df = execute_query(
    "MATCH ()-[r]->() RETURN type(r) as relationship_type, count(r) as count ORDER BY count DESC",
    "Analyzing relationship types in the graph"
)
display(relationships_df)

In [None]:
# Look at some actual relationships
sample_relationships_df = execute_query(
    """MATCH (a)-[r]->(b) 
       RETURN a.name as source, type(r) as relationship, b.name as target, 
              a.node_type as source_type, b.node_type as target_type 
       LIMIT 15""",
    "Sample relationships in the SSIS Northwind graph"
)
display(sample_relationships_df)
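
The sample rows above can be condensed into a rough picture of the graph's structure by counting which node types each relationship type connects. The sketch below assumes rows in the column order returned by the sample query; the helper name `relationship_patterns` is made up for illustration. Because the query uses `LIMIT 15`, the counts cover only that sample.

```python
from collections import Counter

def relationship_patterns(rows):
    """Count (source_type, relationship, target_type) patterns, most frequent first.

    Each row is (source, relationship, target, source_type, target_type),
    matching the column order of the sample query above.
    """
    counts = Counter((row[3], row[1], row[4]) for row in rows)
    return counts.most_common()

# Usage in the notebook:
# rows = sample_relationships_df.itertuples(index=False)
# for (src, rel, dst), n in relationship_patterns(rows):
#     print(f"({src})-[{rel}]->({dst}): {n}")
```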

## Step 8: Summary and Next Steps

Let's create a summary of what we discovered about the SSIS Northwind graph.

In [None]:
print("📊 SSIS Northwind Graph Summary")
print("=" * 40)

# Run each count query once; fall back to 0 if it fails or returns no rows
def get_count(query):
    df = execute_query(query)
    return 0 if df.empty else int(df.iloc[0]['count'])

# Get totals
total_nodes = get_count("MATCH (n) RETURN count(n) as count")
total_edges = get_count("MATCH ()-[r]->() RETURN count(r) as count")

print(f"Total Nodes: {total_nodes}")
print(f"Total Edges: {total_edges}")

# Check for analytics features
has_metadata = get_count("MATCH (m:Node {node_type: 'graph_metadata'}) RETURN count(m) as count") > 0
view_count = get_count("MATCH (v:Node {node_type: 'materialized_view'}) RETURN count(v) as count")

print(f"\n🚀 Analytics-Ready Features:")
print(f"Graph Metadata: {'✅ Yes' if has_metadata else '❌ No'}")
print(f"Materialized Views: {view_count}")

print(f"\n📚 What's Next:")
print("• Open notebook 02_exploring_ssis_structure.ipynb to dive deeper into SSIS components")
print("• Open notebook 03_analytics_ready_features.ipynb to explore materialized views")
print("• Open notebook 04_advanced_queries.ipynb for complex analysis patterns")
print("• Open notebook 05_migration_analysis.ipynb to see practical migration use cases")

## Troubleshooting

If you're not seeing the expected data:

1. **Make sure Memgraph is running:**  
   ```
   docker ps | grep memgraph
   ```

2. **Verify SSIS Northwind was analyzed with Memgraph backend:**  
   ```
   cd /path/to/metazcode
   METAZCODE_DB_BACKEND=memgraph python -m metazcode full --path data/ssis/ssis_northwind
   ```

3. **Check connection settings:** Make sure the notebook connects to the right host and port (Step 2 uses `localhost:7687`)

4. **If the node count is 0:** The database is probably empty; re-run the metazcode analysis
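
For items 1 and 3, a quick way to tell "nothing is listening" apart from other connection problems is to test the TCP port directly from Python. This is a minimal sketch using only the standard library; the helper name `port_is_open` is made up, and the defaults assume the `localhost:7687` settings from Step 2. An open port only shows that something accepts connections there, not that it is Memgraph.

```python
import socket

def port_is_open(host="localhost", port=7687, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False

# Usage in the notebook:
# print("Port reachable" if port_is_open() else "Nothing is listening on localhost:7687")
```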