# PEDP Climate & Environmental Data Initiatives Network Map v2.0

This notebook creates an interactive network visualization of 19 climate and environmental data initiatives and their relationships.

**New in v2.0:**
- **19 nodes** (added 3 PEDP members: OEDP, EPIC, EDGI)
- **3 relationship types** with directional arrows:
  - 🟣 Purple: "is a member of" (member → parent)
  - 🟢 Green: "funds" (funder → recipient)
  - 🔵 Blue: "coordinates action with" (bidirectional)
- **Thicker edges** (2-3px) for better visibility
- **Directed graph** showing organizational hierarchy

**Features:**
- Interactive node exploration with drag, zoom, and pan
- Color-coded by initiative category
- Node size proportional to degree centrality
- Centrality metrics to identify key hubs
- Rich tooltips with organization details
- Styled edges showing relationship types


## Section 1: Setup

In [1]:
import networkx as nx
import pandas as pd
from pyvis.network import Network
import warnings
warnings.filterwarnings('ignore')

# Edge styling by relationship type (visualization config)
EDGE_STYLES = {
    'is a member of': {'color': '#8e44ad', 'width': 2.5, 'arrows': 'to'},
    'funds': {'color': '#27ae60', 'width': 3, 'arrows': 'to'},
    'coordinates action with': {'color': '#3498db', 'width': 2, 'arrows': 'to;from'}
}

print("✓ Libraries imported successfully")
print(f"✓ NetworkX version: {nx.__version__}")
print(f"✓ Pandas version: {pd.__version__}")

✓ Libraries imported successfully
✓ NetworkX version: 3.5
✓ Pandas version: 2.2.2


## Section 2: Load Data

In [2]:
# Load node and edge data
nodes_df = pd.read_csv('../data/processed/nodes.csv')
edges_df = pd.read_csv('../data/processed/edges.csv')
positions_df = pd.read_csv('../data/processed/node_positions.csv')

# Load color config and create mapping
colors_df = pd.read_csv('../data/processed/colors.csv')
color_map = dict(zip(colors_df['name'], colors_df['hex']))

# Map color names to hex codes
nodes_df['hex_color'] = nodes_df['color'].map(color_map)

# Create positions mapping for quick lookup
positions_map = {row['id']: {'x': row['x'], 'y': row['y'], 'fixed': row['fixed']}
                 for _, row in positions_df.iterrows()}

# Display summary
print(f"Nodes: {len(nodes_df)}, Edges: {len(edges_df)}, Positions: {len(positions_df)}")
print(f"\nNode data shape: {nodes_df.shape}")
print(f"Edge data shape: {edges_df.shape}")

print("\n=== Color Palette ===")
for name, hex_code in color_map.items():
    print(f"{name:10s} → {hex_code}")

print("\n=== Sample Nodes ===")
display(nodes_df[['id', 'name', 'category', 'color']].head())

print("\n=== Sample Edges ===")
display(edges_df.head(10))

print("\n=== Category Distribution ===")
print(nodes_df['category'].value_counts())

Nodes: 78, Edges: 55, Positions: 78

Node data shape: (78, 11)
Edge data shape: (55, 3)

=== Color Palette ===
red        → #e74c3c
green      → #2ecc71
blue       → #3498db
orange     → #f39c12
purple     → #9b59b6
teal       → #1abc9c

=== Sample Nodes ===


| | id | name | category | color |
|---|---|---|---|---|
| 0 | AGU | American Geophysical Union | Data Coordination/Standards | red |
| 1 | CDAN | Climate-Ocean Data Action Network | Data Coordination/Standards | red |
| 2 | DataFoundation | Data Foundation - Climate Data Collaborative &... | Data Coordination/Standards | red |
| 3 | GRQD | Group on Reference Quality Datasets | Data Coordination/Standards | red |
| 4 | KCF | Keeling Curve Foundation | Data Preservation/Archiving | blue |



=== Sample Edges ===


| | source | target | relationship_type |
|---|---|---|---|
| 0 | ImpactProject | PEDP | is a member of |
| 1 | OEDP | PEDP | is a member of |
| 2 | EPIC | PEDP | is a member of |
| 3 | EDGI | PEDP | is a member of |
| 4 | DataFoundation | Cornerstone | funds |
| 5 | DataFoundation | GRQD | funds |
| 6 | DataFoundation | CDAN | funds |
| 7 | DataFoundation | KCF | funds |
| 8 | DataFoundation | ImpactProject | funds |
| 9 | PEDP | AGU | coordinates action with |



=== Category Distribution ===
category
Funder                         36
Capacity Building/Support      14
Data Coordination/Standards    12
Data Preservation/Archiving     7
Research/Academic               3
Government/Agency               3
Data Collection/Monitoring      1
Communication/Access            1
Advocacy/Community Focus        1
Name: count, dtype: int64
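
The loading cell assumes that every edge endpoint appears in `nodes.csv`, that every `color` name has an entry in `colors.csv`, and that node ids are unique. A minimal sketch of a consistency check along those lines follows; `find_data_problems` is a hypothetical helper and the toy tables are invented rows, not the project data.

```python
import pandas as pd

def find_data_problems(nodes, edges, color_map):
    """Return lists of inconsistencies between the node, edge and color tables."""
    known_ids = set(nodes['id'])
    endpoints = set(edges['source']) | set(edges['target'])
    used_colors = set(nodes['color'].dropna())
    duplicated = nodes.loc[nodes['id'].duplicated(), 'id'].unique()
    return {
        # Edge endpoints with no matching row in the node table
        'unknown_edge_endpoints': sorted(endpoints - known_ids),
        # Color names that would map to NaN in hex_color
        'unmapped_colors': sorted(used_colors - set(color_map)),
        # Ids that appear more than once in the node table
        'duplicate_ids': sorted(duplicated),
    }

# Toy tables (hypothetical rows for illustration only)
nodes = pd.DataFrame({'id': ['A', 'B', 'B'], 'color': ['red', 'pink', 'red']})
edges = pd.DataFrame({'source': ['A', 'C'], 'target': ['B', 'A']})
problems = find_data_problems(nodes, edges, {'red': '#e74c3c'})
print(problems)
```

Running a check like this right after loading would surface an unmapped color or a dangling edge before it turns into a `KeyError` or a missing node color later in the notebook.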


## Section 3: Build Network

In [3]:
# Create directed graph
G = nx.DiGraph()

# Add nodes with attributes
for idx, row in nodes_df.iterrows():
    G.add_node(
        row['id'],
        name=row['name'],
        organization=row['organization'],
        category=row['category'],
        description=row['description'],
        status=row['status'],
        timeline=row['timeline']
    )

# Add edges with relationship type attribute
for idx, row in edges_df.iterrows():
    G.add_edge(
        row['source'],
        row['target'],
        relationship_type=row['relationship_type']
    )

# Network statistics
print("=== Network Statistics ===")
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.3f}")
print(f"Graph type: {'Directed' if nx.is_directed(G) else 'Undirected'}")

# Show relationship type distribution
print("\n=== Relationship Type Distribution ===")
print(edges_df['relationship_type'].value_counts())

# Convert to undirected for connectivity check
G_undirected = G.to_undirected()
print(f"\nConnected: {nx.is_connected(G_undirected)}")

if not nx.is_connected(G_undirected):
    print(f"\nNumber of connected components: {nx.number_connected_components(G_undirected)}")
    components = list(nx.connected_components(G_undirected))
    print("Component sizes:", [len(c) for c in components])

# Degree distribution (on undirected version for comparability)
degrees = dict(G_undirected.degree())
print(f"\nAverage degree: {sum(degrees.values()) / len(degrees):.2f}")
print(f"Max degree: {max(degrees.values())}")
print(f"Min degree: {min(degrees.values())}")


=== Network Statistics ===
Nodes: 78
Edges: 55
Density: 0.009
Graph type: Directed

=== Relationship Type Distribution ===
relationship_type
coordinates action with    33
funds                      18
is a member of              4
Name: count, dtype: int64

Connected: False

Number of connected components: 47
Component sizes: [32, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Average degree: 1.41
Max degree: 27
Min degree: 0
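
The density printed here (0.009) comes from the directed graph, while the Section 6 summary calls `nx.density` on the undirected copy and reports 1.83%. A short sketch of the arithmetic behind both figures, using only the node and edge counts printed above; the helper names are illustrative and not part of NetworkX, and the undirected figure assumes no pair of nodes is linked in both directions.

```python
def directed_density(n_nodes, n_edges):
    """Density of a simple directed graph: m / (n * (n - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))

def undirected_density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

def average_degree(n_nodes, n_edges):
    """Mean undirected degree: each edge contributes to two nodes."""
    return 2 * n_edges / n_nodes

n, m = 78, 55  # counts reported in the output above
print(f"{directed_density(n, m):.3f}")    # 0.009
print(f"{undirected_density(n, m):.2%}")  # 1.83%
print(f"{average_degree(n, m):.2f}")      # 1.41
```

The two densities differ by exactly a factor of two, so they describe the same sparsity; only the denominator convention changes.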


## Section 4: Calculate Centrality Metrics

In [4]:
# Convert directed graph to undirected for centrality calculations
G_undirected = G.to_undirected()

# Create a filtered graph for node sizing (exclude funder network edges)
# This keeps PEDP as the visual focus and prevents funder cluster from dominating
G_filtered = nx.DiGraph()
G_filtered.add_nodes_from(G.nodes(data=True))

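# Add only "meaningful" edges (exclude "Interested in solving the problem").
# None of the three relationship types in the current edges.csv matches this
# label, so the filter currently keeps every edge.
for source, target, data in G.edges(data=True):
    if data['relationship_type'] != 'Interested in solving the problem':
        G_filtered.add_edge(source, target, **data)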

G_filtered_undirected = G_filtered.to_undirected()

# Calculate centrality metrics on FILTERED graph for node sizing
# This ensures PEDP remains the largest, most central node
degree_centrality_for_sizing = nx.degree_centrality(G_filtered_undirected)

# Calculate other centrality metrics on full graph for statistics
degree_centrality = nx.degree_centrality(G_undirected)
betweenness_centrality = nx.betweenness_centrality(G_undirected)
closeness_centrality = nx.closeness_centrality(G_undirected)

# Create summary DataFrame
centrality_data = []
for node in G.nodes():
    node_name = nodes_df[nodes_df['id'] == node]['name'].values[0]
    centrality_data.append({
        'ID': node,
        'Node': node_name,
        'In-degree': G.in_degree(node),
        'Out-degree': G.out_degree(node),
        'Connections': G_undirected.degree(node),
        'Meaningful_Connections': G_filtered_undirected.degree(node),  # For visual sizing
        'Degree': degree_centrality[node],
        'Betweenness': betweenness_centrality[node],
        'Closeness': closeness_centrality[node]
    })

centrality_df = pd.DataFrame(centrality_data).sort_values('Degree', ascending=False)

print("=== Centrality Metrics ===")
print("\nTop 10 Most Connected Initiatives (by Degree Centrality):")
display(centrality_df.head(10))

print("\nTop 10 by Meaningful Connections (excluding funder network edges):")
display(centrality_df.sort_values('Meaningful_Connections', ascending=False)[['Node', 'Meaningful_Connections', 'Connections']].head(10))

print("\nTop 5 Bridge Nodes (by Betweenness Centrality):")
display(centrality_df.sort_values('Betweenness', ascending=False)[['Node', 'Connections', 'Betweenness']].head())

print("\nTop 5 Information Spreaders (by Closeness Centrality):")
display(centrality_df.sort_values('Closeness', ascending=False)[['Node', 'Connections', 'Closeness']].head())

=== Centrality Metrics ===

Top 10 Most Connected Initiatives (by Degree Centrality):


| | ID | Node | In-degree | Out-degree | Connections | Meaningful_Connections | Degree | Betweenness | Closeness |
|---|---|---|---|---|---|---|---|---|---|
| 15 | PEDP | Public Environmental Data Partners | 17 | 10 | 27 | 27 | 0.350649 | 0.135105 | 0.356586 |
| 2 | DataFoundation | Data Foundation - Climate Data Collaborative &... | 1 | 8 | 9 | 9 | 0.116883 | 0.021411 | 0.235481 |
| 5 | NASEM | NASEM - Earth Observations & Data Stewardship ... | 3 | 5 | 8 | 8 | 0.103896 | 0.01342 | 0.231121 |
| 12 | CODE | CODE - Center for Open Data Enterprise | 4 | 2 | 6 | 6 | 0.077922 | 0.002147 | 0.218956 |
| 0 | AGU | American Geophysical Union | 5 | 1 | 6 | 6 | 0.077922 | 0.003868 | 0.222866 |
| 3 | GRQD | Group on Reference Quality Datasets | 5 | 0 | 5 | 5 | 0.064935 | 0.001025 | 0.164217 |
| 9 | NYCE | New York Climate Exchange | 2 | 2 | 4 | 4 | 0.051948 | 0.000797 | 0.215181 |
| 10 | DRP | The Data Rescue Project | 2 | 2 | 4 | 4 | 0.051948 | 0.000342 | 0.201299 |
| 4 | KCF | Keeling Curve Foundation | 2 | 1 | 3 | 3 | 0.038961 | 0.0 | 0.157981 |
| 17 | EPIC | Environmental Policy Innovation Center | 0 | 3 | 3 | 3 | 0.038961 | 0.0 | 0.208009 |



Top 10 by Meaningful Connections (excluding funder network edges):


| | Node | Meaningful_Connections | Connections |
|---|---|---|---|
| 15 | Public Environmental Data Partners | 27 | 27 |
| 2 | Data Foundation - Climate Data Collaborative &... | 9 | 9 |
| 5 | NASEM - Earth Observations & Data Stewardship ... | 8 | 8 |
| 12 | CODE - Center for Open Data Enterprise | 6 | 6 |
| 0 | American Geophysical Union | 6 | 6 |
| 3 | Group on Reference Quality Datasets | 5 | 5 |
| 9 | New York Climate Exchange | 4 | 4 |
| 10 | The Data Rescue Project | 4 | 4 |
| 18 | Environmental Data Governance Initiative | 3 | 3 |
| 8 | Environmental & Health Data & Analysis Trust | 3 | 3 |



Top 5 Bridge Nodes (by Betweenness Centrality):


| | Node | Connections | Betweenness |
|---|---|---|---|
| 15 | Public Environmental Data Partners | 27 | 0.135105 |
| 2 | Data Foundation - Climate Data Collaborative &... | 9 | 0.021411 |
| 5 | NASEM - Earth Observations & Data Stewardship ... | 8 | 0.01342 |
| 0 | American Geophysical Union | 6 | 0.003868 |
| 12 | CODE - Center for Open Data Enterprise | 6 | 0.002147 |



Top 5 Information Spreaders (by Closeness Centrality):


| | Node | Connections | Closeness |
|---|---|---|---|
| 15 | Public Environmental Data Partners | 27 | 0.356586 |
| 2 | Data Foundation - Climate Data Collaborative &... | 9 | 0.235481 |
| 5 | NASEM - Earth Observations & Data Stewardship ... | 8 | 0.231121 |
| 0 | American Geophysical Union | 6 | 0.222866 |
| 12 | CODE - Center for Open Data Enterprise | 6 | 0.218956 |
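
The `Degree` column is the undirected degree divided by the number of other nodes (n - 1), and Section 5 turns that value into a pixel size with `15 + centrality * 200`. A small sketch of that arithmetic for the figures in the tables above; the function names are illustrative, and the `base` and `scale` defaults simply mirror the constants used in the visualization cell.

```python
def degree_centrality(degree, n_nodes):
    """Normalized degree: the share of the other n - 1 nodes a node touches."""
    return degree / (n_nodes - 1)

def node_size(degree, n_nodes, base=15, scale=200):
    """Pixel size applied to non-funder nodes in the visualization cell."""
    return base + degree_centrality(degree, n_nodes) * scale

n = 78  # nodes in the network
print(round(degree_centrality(27, n), 6))  # PEDP: 0.350649
print(round(node_size(27, n), 2))          # PEDP: 85.13
print(round(node_size(9, n), 2))           # Data Foundation: 38.38
print(node_size(0, n))                     # isolated non-funder: 15.0
```

One consequence of these constants is that an isolated non-funder (size 15) is drawn slightly smaller than a funder, which always gets the fixed size of 20.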


## Section 5: Create Interactive Visualization

In [5]:
# Initialize PyVis network with directed mode enabled
net = Network(
    height='800px',
    width='100%',
    bgcolor='#ffffff',
    font_color='#333333',
    notebook=True,
    directed=True
)

# Configure physics - SPATIAL LAYOUT: maintain position-based clustering
# Lower gravity/central_gravity allows initial positions to dominate
# Stronger spring strength keeps connected nodes together despite spatial bias
net.barnes_hut(
    gravity=-3000,          # REDUCED: Less repulsion to maintain clusters
    central_gravity=0.1,    # REDUCED: Allow positions to dominate over centering
    spring_length=150,      # REDUCED: Tighter edge connections
    spring_strength=0.01,   # INCREASED: Stronger edge pull (10x stronger)
    damping=0.2,            # INCREASED: Faster settling
    overlap=0
)

# Add nodes with styling from CSV + color config + POSITIONS
for node in G.nodes():
    node_data = nodes_df[nodes_df['id'] == node].iloc[0]
    
    # Use hex color from mapped config
    color = node_data['hex_color']
    
    # SIZE STRATEGY:
    # - Funders: fixed small size (20px) - uniform sizing
    # - Non-funders: sized by meaningful connections (excludes funder network edges)
    if node_data['category'] == 'Funder':
        size = 20  # Fixed small size for all funders
    else:
        # Size by FILTERED degree centrality for non-funders
        size = 15 + (degree_centrality_for_sizing[node] * 200)
    
    # Build connection-focused tooltip
    tooltip_lines = [node_data['name'], f"Category: {node_data['category']}", ""]
    
    # Check if node is isolated (no edges)
    node_degree = G_undirected.degree(node)
    
    if node_degree == 0:
        # Isolated node - add contextual status
        if node_data['category'] == 'Funder':
            tooltip_lines.append("Interested in working in this space")
        else:
            tooltip_lines.append("Actively working in this space")
    else:
        # Connected node - organize connections by type and direction
        connections = {
            'member_of': [],
            'has_members': [],
            'funds': [],
            'funded_by': [],
            'coordinates': []
        }
        
        # Process outgoing edges (this node → others)
        for _, target, edge_data in G.out_edges(node, data=True):
            rel_type = edge_data['relationship_type']
            target_name = G.nodes[target]['name']
            
            if rel_type == "is a member of":
                connections['member_of'].append(target_name)
            elif rel_type == "funds":
                connections['funds'].append(target_name)
            elif rel_type == "coordinates action with":
                connections['coordinates'].append(target_name)
        
        # Process incoming edges (others → this node)
        for source, _, edge_data in G.in_edges(node, data=True):
            rel_type = edge_data['relationship_type']
            source_name = G.nodes[source]['name']
            
            if rel_type == "is a member of":
                connections['has_members'].append(source_name)
            elif rel_type == "funds":
                connections['funded_by'].append(source_name)
            elif rel_type == "coordinates action with":
                # Deduplicate bidirectional edges
                if source_name not in connections['coordinates']:
                    connections['coordinates'].append(source_name)
        
        # Build tooltip sections in logical order; the trailing blank line
        # left by the last section is removed by rstrip() below
        sections = [
            ('member_of', "Member of:"),
            ('has_members', "Has members:"),
            ('funds', "Funds:"),
            ('funded_by', "Funded by:"),
            ('coordinates', "Coordinates with:"),
        ]
        for key, heading in sections:
            if connections[key]:
                tooltip_lines.append(heading)
                tooltip_lines.extend(f"• {org}" for org in sorted(connections[key]))
                tooltip_lines.append("")
    
    # Join lines and remove trailing whitespace
    title = "\n".join(tooltip_lines).rstrip()
    
    # Get position from positions_map
    pos = positions_map[node]
    
    # Add node to network WITH POSITION
    net.add_node(
        node,
        label=node_data['name'],
        title=title,
        color=color,
        size=size,
        borderWidth=2,
        borderWidthSelected=4,
        x=pos['x'],           # NEW: Initial x position
        y=pos['y'],           # NEW: Initial y position
        fixed=pos['fixed']    # NEW: Whether to lock position
    )

# Add styled edges based on relationship type
for edge in G.edges(data=True):
    rel_type = edge[2]['relationship_type']
    style = EDGE_STYLES[rel_type]
    
    net.add_edge(
        edge[0],
        edge[1],
        color=style['color'],
        width=style['width'],
        arrows=style['arrows'],
        title=rel_type,
        smooth={'type': 'continuous'},
        arrowStrikethrough=False
    )

# Show in notebook
print("Generating interactive visualization with spatial layout...")
print("\n💡 Spatial Layout Strategy:")
print("   • Isolated funders: Bottom right corner (no edges)")
print("   • Isolated non-funders: Top left corner (no edges)")
print("   • Connected nodes: Centered with category bias")
print("   • Physics maintains spatial clusters while respecting real edges\n")
net.show('network_preview.html')

# Save to outputs directory
net.save_graph('../outputs/network_map.html')
print("\n✓ Interactive visualization saved to: outputs/network_map.html")
print("\n📊 Open the HTML file in your browser to explore the network!")

Generating interactive visualization with spatial layout...

💡 Spatial Layout Strategy:
   • Isolated funders: Bottom right corner (no edges)
   • Isolated non-funders: Top left corner (no edges)
   • Connected nodes: Centered with category bias
   • Physics maintains spatial clusters while respecting real edges

network_preview.html

✓ Interactive visualization saved to: outputs/network_map.html

📊 Open the HTML file in your browser to explore the network!
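
The tooltip logic in the cell above groups each node's neighbours by relationship type and by edge direction, and Section 7 repeats the same logic. One way it could be factored into a reusable function is sketched below; `group_connections` is a hypothetical helper rather than part of the notebook, and the toy graph uses invented names.

```python
import networkx as nx

# Which tooltip bucket an edge falls into, depending on its direction
OUTGOING = {'is a member of': 'member_of', 'funds': 'funds',
            'coordinates action with': 'coordinates'}
INCOMING = {'is a member of': 'has_members', 'funds': 'funded_by',
            'coordinates action with': 'coordinates'}

def group_connections(G, node):
    """Group a node's neighbours by relationship type and direction."""
    groups = {key: [] for key in
              ('member_of', 'has_members', 'funds', 'funded_by', 'coordinates')}
    for _, target, data in G.out_edges(node, data=True):
        key = OUTGOING.get(data['relationship_type'])
        if key:
            groups[key].append(G.nodes[target]['name'])
    for source, _, data in G.in_edges(node, data=True):
        key = INCOMING.get(data['relationship_type'])
        name = G.nodes[source]['name']
        # A mutual "coordinates" pair would otherwise be listed twice
        if key and not (key == 'coordinates' and name in groups[key]):
            groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}

# Toy graph (hypothetical organizations)
G_toy = nx.DiGraph()
for node_id, name in [('hub', 'Hub'), ('m1', 'Member One'),
                      ('f1', 'Funder One'), ('p1', 'Partner One')]:
    G_toy.add_node(node_id, name=name)
G_toy.add_edge('m1', 'hub', relationship_type='is a member of')
G_toy.add_edge('f1', 'hub', relationship_type='funds')
G_toy.add_edge('hub', 'p1', relationship_type='coordinates action with')
G_toy.add_edge('p1', 'hub', relationship_type='coordinates action with')

groups = group_connections(G_toy, 'hub')
print(groups)
```

With a helper of this shape, both the current and the hypothetical visualization could build their tooltips from the same function, passing in whichever graph holds the real edges.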


## Section 6: Network Summary

In [6]:
# Get top node
top_node_id = centrality_df.iloc[0]['ID']
top_node_name = centrality_df.iloc[0]['Node']
top_node_connections = G_undirected.degree(top_node_id)

print("="*60)
print("PEDP CLIMATE & ENVIRONMENTAL DATA INITIATIVES")
print("Network Summary v2.0")
print("="*60)
print(f"\n📊 Total Initiatives: {len(nodes_df)}")
print(f"🔗 Total Relationships: {len(edges_df)}")
print(f"📈 Network Density: {nx.density(G_undirected):.2%}")
print(f"⭐ Most Connected: {top_node_name} ({top_node_connections} connections)")

print("\n=== Top 5 Key Hubs ===")
for idx, row in centrality_df.head(5).iterrows():
    print(f"{row['Node']:50s} {row['Connections']:2d} connections")

print("\n=== Relationship Types ===")
rel_counts = edges_df['relationship_type'].value_counts()
for rel_type, count in rel_counts.items():
    style = EDGE_STYLES[rel_type]
    print(f"  {rel_type:30s} {count:3d} edges ({style['color']})")

print("\n=== Edge Legend ===")
print("🟣 Purple arrows: 'is a member of' (member → parent org)")
print("🟢 Green arrows: 'funds' (funder → recipient)")
print("🔵 Blue bidirectional: 'coordinates action with' (mutual)")

print("\n=== PEDP Members ===")
membership_edges = edges_df[edges_df['relationship_type'] == 'is a member of']
for idx, row in membership_edges.iterrows():
    member_name = nodes_df[nodes_df['id'] == row['source']]['name'].values[0]
    print(f"  • {member_name}")

print("\n=== Category Distribution ===")
for category, count in nodes_df['category'].value_counts().items():
    print(f"  {category:40s} {count:2d} initiatives")

print("\n=== Timeline Distribution ===")
for timeline, count in nodes_df['timeline'].value_counts().items():
    print(f"  {timeline:40s} {count:2d} initiatives")

print("\n" + "="*60)
print("💡 Insights:")
print("   - PEDP serves as the central hub with 4 member organizations")
print("   - Data Foundation provides funding to 5 key initiatives")
print("   - Strong bidirectional coordination across the ecosystem")
print("   - Mix of established organizations and emerging initiatives")
print("="*60)


PEDP CLIMATE & ENVIRONMENTAL DATA INITIATIVES
Network Summary v2.0

📊 Total Initiatives: 78
🔗 Total Relationships: 55
📈 Network Density: 1.83%
⭐ Most Connected: Public Environmental Data Partners (27 connections)

=== Top 5 Key Hubs ===
Public Environmental Data Partners                 27 connections
Data Foundation - Climate Data Collaborative & GHG Coalition  9 connections
NASEM - Earth Observations & Data Stewardship Workshop  8 connections
CODE - Center for Open Data Enterprise              6 connections
American Geophysical Union                          6 connections

=== Relationship Types ===
  coordinates action with         33 edges (#3498db)
  funds                           18 edges (#27ae60)
  is a member of                   4 edges (#8e44ad)

=== Edge Legend ===
🟣 Purple arrows: 'is a member of' (member → parent org)
🟢 Green arrows: 'funds' (funder → recipient)
🔵 Blue bidirectional: 'coordinates action with' (mutual)

=== PEDP Members ===
  • T
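
The insight lines at the end of the summary cell are hard-coded strings, so they can drift from the data. A hedged sketch of how such counts could be derived from the edge table instead is shown here, using only the ten sample rows displayed in Section 2 (so the totals describe that sample, not the full 55-edge file); `count_edges` is a hypothetical helper.

```python
import pandas as pd

# The ten sample edges displayed in Section 2
sample_edges = pd.DataFrame(
    [
        ('ImpactProject', 'PEDP', 'is a member of'),
        ('OEDP', 'PEDP', 'is a member of'),
        ('EPIC', 'PEDP', 'is a member of'),
        ('EDGI', 'PEDP', 'is a member of'),
        ('DataFoundation', 'Cornerstone', 'funds'),
        ('DataFoundation', 'GRQD', 'funds'),
        ('DataFoundation', 'CDAN', 'funds'),
        ('DataFoundation', 'KCF', 'funds'),
        ('DataFoundation', 'ImpactProject', 'funds'),
        ('PEDP', 'AGU', 'coordinates action with'),
    ],
    columns=['source', 'target', 'relationship_type'],
)

def count_edges(edges, relationship_type, source=None, target=None):
    """Count edges of one type, optionally restricted to a source or target id."""
    mask = edges['relationship_type'] == relationship_type
    if source is not None:
        mask = mask & (edges['source'] == source)
    if target is not None:
        mask = mask & (edges['target'] == target)
    return int(mask.sum())

pedp_members = count_edges(sample_edges, 'is a member of', target='PEDP')
funded_by_data_foundation = count_edges(sample_edges, 'funds', source='DataFoundation')
print(f"   - PEDP has {pedp_members} member organizations")
print(f"   - Data Foundation funds {funded_by_data_foundation} initiatives")
```

On the sample rows this prints 4 members and 5 funded initiatives, matching the hard-coded insight text.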

## Visualization Guide

**Using the Interactive Map:**
- **Drag** nodes to reposition them
- **Hover** over nodes to see detailed information
- **Zoom** with mouse wheel or trackpad
- **Pan** by clicking and dragging on empty space
- Click the **physics button** (⚙️) to toggle the force simulation on/off

**Node Features:**
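- **Color** indicates category, as assigned by the `color` column in `nodes.csv` (see the legend below)
- **Size** reflects degree centrality for non-funders (more connections = larger node); funders use a fixed size of 20
- **Position** starts from the coordinates in `node_positions.csv` and is then adjusted by the force-directed physics, unless a node is marked as fixed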

**Legend:**
- 🔵 Data Collection/Monitoring
- 🟢 Data Preservation/Archiving
- 🔴 Data Coordination/Standards
- 🟠 Capacity Building/Support
- 🟣 Communication/Access
- 🟢 Advocacy/Community Focus

## Section 7: Hypothetical Future State Visualization

⚠️ **HYPOTHETICAL SCENARIO ANALYSIS** ⚠️

This section creates a hypothetical future state showing how isolated organizations could become connected through 2 intermediary hubs.

**What's different:**
- Adds 2 hypothetical intermediary nodes (HYP-HUB1, HYP-HUB2)
- Adds 50 grey, dashed hypothetical connections
- Light grey background to distinguish from current network
- All existing nodes/edges preserved with same physics

In [7]:
# Load hypothetical data additions
print("Loading hypothetical data...")
nodes_hyp = pd.read_csv('../data/processed/nodes_hypothetical.csv')
edges_hyp = pd.read_csv('../data/processed/edges_hypothetical.csv')
positions_hyp = pd.read_csv('../data/processed/node_positions_hypothetical.csv')

# Combine with current data
nodes_combined = pd.concat([nodes_df, nodes_hyp], ignore_index=True)
edges_combined = pd.concat([edges_df, edges_hyp], ignore_index=True)
positions_combined = pd.concat([positions_df, positions_hyp], ignore_index=True)

# Map colors for new nodes
nodes_combined['hex_color'] = nodes_combined['color'].map(color_map)

# Update positions map
positions_map_combined = {row['id']: {'x': row['x'], 'y': row['y'], 'fixed': row['fixed']}
                         for _, row in positions_combined.iterrows()}

# Add hypothetical edge style
EDGE_STYLES_COMBINED = EDGE_STYLES.copy()
EDGE_STYLES_COMBINED['hypothetical connection'] = {
    'color': '#999999', 'width': 1.5, 'arrows': 'to', 'dashes': True
}

print(f"Combined: {len(nodes_combined)} nodes, {len(edges_combined)} edges")
print(f"Added: {len(nodes_hyp)} nodes, {len(edges_hyp)} edges")

Loading hypothetical data...
Combined: 80 nodes, 105 edges
Added: 2 nodes, 50 edges


In [8]:
# Build combined graph
G_combined = nx.DiGraph()

for idx, row in nodes_combined.iterrows():
    G_combined.add_node(
        row['id'],
        name=row['name'],
        organization=row['organization'],
        category=row['category'],
        description=row['description'],
        status=row['status'],
        timeline=row['timeline']
    )

for idx, row in edges_combined.iterrows():
    G_combined.add_edge(row['source'], row['target'], relationship_type=row['relationship_type'])

G_combined_undirected = G_combined.to_undirected()

# Calculate centrality on REAL edges only (for node sizing)
G_real_only = nx.DiGraph()
G_real_only.add_nodes_from(G_combined.nodes(data=True))
for source, target, data in G_combined.edges(data=True):
    if data['relationship_type'] != 'hypothetical connection':
        G_real_only.add_edge(source, target, **data)

G_real_filtered = nx.DiGraph()
G_real_filtered.add_nodes_from(G_real_only.nodes(data=True))
for source, target, data in G_real_only.edges(data=True):
    if data['relationship_type'] != 'Interested in solving the problem':
        G_real_filtered.add_edge(source, target, **data)

G_real_filtered_undirected = G_real_filtered.to_undirected()
degree_centrality_combined = nx.degree_centrality(G_real_filtered_undirected)

print(f"Combined network: {G_combined.number_of_nodes()} nodes, {G_combined.number_of_edges()} edges")
print(f"Real edges only: {G_real_only.number_of_edges()} edges")
print(f"Hypothetical edges: {G_combined.number_of_edges() - G_real_only.number_of_edges()} edges")

Combined network: 80 nodes, 105 edges
Real edges only: 55 edges
Hypothetical edges: 50 edges
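
The cell above builds two graphs with the same pattern: copy every node, then re-add only the edges whose `relationship_type` is not excluded. A minimal sketch of that pattern as a single function follows; `without_relationships` is a hypothetical name, and the toy graph is invented for illustration.

```python
import networkx as nx

def without_relationships(G, excluded):
    """Copy G, keeping every node but dropping edges whose type is in `excluded`."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))
    H.add_edges_from(
        (u, v, data) for u, v, data in G.edges(data=True)
        if data.get('relationship_type') not in excluded
    )
    return H

# Toy graph: one real edge, one hypothetical edge, one isolated node
G_toy = nx.DiGraph()
G_toy.add_nodes_from(['A', 'B', 'C', 'HYP-HUB1'])
G_toy.add_edge('A', 'B', relationship_type='funds')
G_toy.add_edge('HYP-HUB1', 'C', relationship_type='hypothetical connection')

G_real = without_relationships(G_toy, {'hypothetical connection'})
print(G_real.number_of_nodes(), G_real.number_of_edges())  # 4 1
```

Keeping every node, including isolated ones, matters here: `nx.degree_centrality` normalizes by the node count, so dropping isolated nodes would change the values used for node sizing.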


In [9]:
# Create hypothetical visualization (SAME code as Section 5, different data)
net_hyp = Network(
    height='800px',
    width='100%',
    bgcolor='#f8f8f8',  # Light grey background
    font_color='#333333',
    notebook=True,
    directed=True
)

# EXACT SAME physics as current network
net_hyp.barnes_hut(
    gravity=-3000,
    central_gravity=0.1,
    spring_length=150,
    spring_strength=0.01,
    damping=0.2,
    overlap=0
)

# Add nodes (SAME code as Section 5)
for node in G_combined.nodes():
    node_data = nodes_combined[nodes_combined['id'] == node].iloc[0]
    is_hypothetical = node.startswith('HYP-')
    
    color = node_data['hex_color']
    
    # SIZE: Funders=20px, others by centrality (from REAL edges only)
    if node_data['category'] == 'Funder':
        size = 20
    else:
        if not is_hypothetical:
            size = 15 + (degree_centrality_combined.get(node, 0) * 200)
        else:
            size = 30  # Fixed for hypothetical nodes
    
    # Build tooltip
    tooltip_lines = [node_data['name'], f"Category: {node_data['category']}", ""]
    
    if is_hypothetical:
        tooltip_lines.append("⚠️ HYPOTHETICAL ORGANIZATION (NOT REAL)")
        tooltip_lines.append("")
        tooltip_lines.append(node_data['description'])
    else:
        # Use REAL graph for connection info. In-degree plus out-degree is 0
        # exactly when the node is isolated, so no undirected copy is needed here.
        node_degree = G_real_only.degree(node) if node in G_real_only else 0
        
        if node_degree == 0:
            if node_data['category'] == 'Funder':
                tooltip_lines.append("Interested in working in this space")
            else:
                tooltip_lines.append("Actively working in this space")
        else:
            connections = {
                'member_of': [], 'has_members': [],
                'funds': [], 'funded_by': [], 'coordinates': []
            }
            
            for _, target, edge_data in G_real_only.out_edges(node, data=True):
                rel_type = edge_data['relationship_type']
                target_name = G_real_only.nodes[target]['name']
                if rel_type == "is a member of":
                    connections['member_of'].append(target_name)
                elif rel_type == "funds":
                    connections['funds'].append(target_name)
                elif rel_type == "coordinates action with":
                    connections['coordinates'].append(target_name)
            
            for source, _, edge_data in G_real_only.in_edges(node, data=True):
                rel_type = edge_data['relationship_type']
                source_name = G_real_only.nodes[source]['name']
                if rel_type == "is a member of":
                    connections['has_members'].append(source_name)
                elif rel_type == "funds":
                    connections['funded_by'].append(source_name)
                elif rel_type == "coordinates action with":
                    if source_name not in connections['coordinates']:
                        connections['coordinates'].append(source_name)
            
            # Build tooltip sections in logical order; the trailing blank
            # line left by the last section is removed by rstrip() below
            sections = [
                ('member_of', "Member of:"),
                ('has_members', "Has members:"),
                ('funds', "Funds:"),
                ('funded_by', "Funded by:"),
                ('coordinates', "Coordinates with:"),
            ]
            for key, heading in sections:
                if connections[key]:
                    tooltip_lines.append(heading)
                    tooltip_lines.extend(f"• {org}" for org in sorted(connections[key]))
                    tooltip_lines.append("")
    
    title = "\n".join(tooltip_lines).rstrip()
    pos = positions_map_combined[node]
    
    # Add node with different styling for hypothetical
    if is_hypothetical:
        net_hyp.add_node(
            node,
            label=f"[HYPOTHETICAL]\n{node_data['name']}",
            title=title,
            color=color,
            size=size,
            borderWidth=3,
            shape='box',
            font={'color': '#666666'},
            x=pos['x'],
            y=pos['y'],
            fixed=pos['fixed']
        )
    else:
        net_hyp.add_node(
            node,
            label=node_data['name'],
            title=title,
            color=color,
            size=size,
            borderWidth=2,
            borderWidthSelected=4,
            x=pos['x'],
            y=pos['y'],
            fixed=pos['fixed']
        )

# Add edges
for edge in G_combined.edges(data=True):
    rel_type = edge[2]['relationship_type']
    style = EDGE_STYLES_COMBINED[rel_type]
    
    edge_config = {
        'color': style['color'],
        'width': style['width'],
        'arrows': style['arrows'],
        'title': rel_type,
        'smooth': {'type': 'continuous'},
        'arrowStrikethrough': False
    }
    
    if style.get('dashes'):
        edge_config['dashes'] = True
    
    net_hyp.add_edge(edge[0], edge[1], **edge_config)

print("Generating hypothetical visualization...")
net_hyp.show('network_preview_hypothetical.html')
net_hyp.save_graph('../outputs/network_map_hypothetical.html')
print("\n✓ Hypothetical visualization saved to: outputs/network_map_hypothetical.html")
print("\n⚠️  Remember: This shows a HYPOTHETICAL future state, not current network")
print("   Grey dashed edges = potential future partnerships")

Generating hypothetical visualization...
network_preview_hypothetical.html



✓ Hypothetical visualization saved to: outputs/network_map_hypothetical.html

⚠️  Remember: This shows a HYPOTHETICAL future state, not current network
   Grey dashed edges = potential future partnerships
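
The point of the hypothetical hubs is to connect organizations that are currently isolated; Section 3 reported 47 connected components for the real network. A small sketch of how the effect could be quantified, by counting connected components with and without the hypothetical edges, is given below on a toy graph with invented node ids (the real figures would come from `G_real_only` and `G_combined`).

```python
import networkx as nx

# Toy combined graph: one real edge plus a hypothetical hub
G_toy = nx.DiGraph()
G_toy.add_nodes_from(['A', 'B', 'C', 'D', 'HYP-HUB'])
G_toy.add_edge('A', 'B', relationship_type='funds')
for target in ['A', 'C', 'D']:
    G_toy.add_edge('HYP-HUB', target, relationship_type='hypothetical connection')

# Current state: real nodes and real edges only, treated as undirected
real = nx.Graph()
real.add_nodes_from(n for n in G_toy if not n.startswith('HYP-'))
real.add_edges_from(
    (u, v) for u, v, data in G_toy.edges(data=True)
    if data['relationship_type'] != 'hypothetical connection'
)

components_before = nx.number_connected_components(real)
components_after = nx.number_connected_components(G_toy.to_undirected())
print(components_before, components_after)  # 3 1
```

In the toy example the hub merges three components into one; the same comparison on the notebook's graphs would show how many of the isolated organizations the two hypothetical hubs would bring into the main network.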
