# PEDP Climate & Environmental Data Initiatives Network Map v2.0

This notebook creates an interactive network visualization of 19 climate and environmental data initiatives and their relationships.

**New in v2.0:**
- **19 nodes** (added 3 PEDP members: OEDP, EPIC, EDGI)
- **3 relationship types** with directional arrows:
  - 🟣 Purple: "is a member of" (member → parent)
  - 🟢 Green: "funds" (funder → recipient)
  - 🔵 Blue: "coordinates action with" (bidirectional)
- **Thicker edges** (2-3px) for better visibility
- **Directed graph** showing organizational hierarchy

**Features:**
- Interactive node exploration with drag, zoom, and pan
- Color-coded by initiative category
- Node size proportional to degree centrality
- Centrality metrics to identify key hubs
- Rich tooltips with organization details
- Styled edges showing relationship types


## Section 1: Setup

In [1]:
import networkx as nx
import pandas as pd
from pyvis.network import Network
import warnings
warnings.filterwarnings('ignore')

# Edge styling by relationship type (visualization config)
EDGE_STYLES = {
    'is a member of': {'color': '#8e44ad', 'width': 2.5, 'arrows': 'to'},
    'funds': {'color': '#27ae60', 'width': 3, 'arrows': 'to'},
    'coordinates action with': {'color': '#3498db', 'width': 2, 'arrows': 'to;from'}
}

print("✓ Libraries imported successfully")
print(f"✓ NetworkX version: {nx.__version__}")
print(f"✓ Pandas version: {pd.__version__}")


✓ Libraries imported successfully
✓ NetworkX version: 3.5
✓ Pandas version: 2.2.2


## Section 2: Load Data

In [2]:
# Load node and edge data
nodes_df = pd.read_csv('../data/processed/nodes.csv')
edges_df = pd.read_csv('../data/processed/edges.csv')

# Load color config and create mapping
colors_df = pd.read_csv('../data/processed/colors.csv')
color_map = dict(zip(colors_df['name'], colors_df['hex']))

# Map color names to hex codes
nodes_df['hex_color'] = nodes_df['color'].map(color_map)

# Display summary
print(f"Nodes: {len(nodes_df)}, Edges: {len(edges_df)}")
print(f"\nNode data shape: {nodes_df.shape}")
print(f"Edge data shape: {edges_df.shape}")

print("\n=== Color Palette ===")
for name, hex_code in color_map.items():
    print(f"{name:10s} → {hex_code}")

print("\n=== Sample Nodes ===")
display(nodes_df[['id', 'name', 'category', 'color']].head())

print("\n=== Sample Edges ===")
display(edges_df.head(10))

print("\n=== Category Distribution ===")
print(nodes_df['category'].value_counts())


Nodes: 86, Edges: 55

Node data shape: (86, 11)
Edge data shape: (55, 3)

=== Color Palette ===
red        → #e74c3c
green      → #2ecc71
blue       → #3498db
orange     → #f39c12
purple     → #9b59b6
teal       → #1abc9c

=== Sample Nodes ===


|   | id | name | category | color |
|---|---|---|---|---|
| 0 | AGU | American Geophysical Union | Data Coordination/Standards | red |
| 1 | CDAN | Climate-Ocean Data Action Network | Data Coordination/Standards | red |
| 2 | DataFoundation | Data Foundation - Climate Data Collaborative &... | Data Coordination/Standards | red |
| 3 | GRQD | Group on Reference Quality Datasets | Data Coordination/Standards | red |
| 4 | KCF | Keeling Curve Foundation | Data Preservation/Archiving | blue |



=== Sample Edges ===


|   | source | target | relationship_type |
|---|---|---|---|
| 0 | ImpactProject | PEDP | is a member of |
| 1 | OEDP | PEDP | is a member of |
| 2 | EPIC | PEDP | is a member of |
| 3 | EDGI | PEDP | is a member of |
| 4 | DataFoundation | Cornerstone | funds |
| 5 | DataFoundation | GRQD | funds |
| 6 | DataFoundation | CDAN | funds |
| 7 | DataFoundation | KCF | funds |
| 8 | DataFoundation | ImpactProject | funds |
| 9 | PEDP | AGU | coordinates action with |



=== Category Distribution ===
category
Funder                         36
Capacity Building/Support      21
Data Coordination/Standards    12
Data Preservation/Archiving     7
Government/Agency               4
Research/Academic               3
Data Collection/Monitoring      1
Communication/Access            1
Advocacy/Community Focus        1
Name: count, dtype: int64
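
Section 3 keys graph nodes by `id`, so repeated `id` values in `nodes.csv` would collapse into a single node, and an edge endpoint that is absent from the node table would create a node with no attributes. The sketch below shows one way to check both conditions before building the graph. It assumes the column names used above (`id`, `source`, `target`); the helper `check_tables` and the sample frames are hypothetical and stand in for the real CSV data.

```python
import pandas as pd

def check_tables(nodes_df, edges_df):
    """Report repeated node ids and edge endpoints missing from the node table."""
    # ids that appear more than once in the node table
    duplicated_ids = sorted(nodes_df.loc[nodes_df['id'].duplicated(), 'id'].unique())
    # edge endpoints that have no matching row in the node table
    known_ids = set(nodes_df['id'])
    endpoints = set(edges_df['source']) | set(edges_df['target'])
    dangling = sorted(endpoints - known_ids)
    return duplicated_ids, dangling

# Hypothetical tables for illustration only
sample_nodes = pd.DataFrame({'id': ['PEDP', 'AGU', 'AGU', 'KCF']})
sample_edges = pd.DataFrame({
    'source': ['PEDP', 'EDGI'],
    'target': ['AGU', 'PEDP'],
})

duplicated_ids, dangling = check_tables(sample_nodes, sample_edges)
print("Repeated ids:", duplicated_ids)
print("Endpoints missing from nodes:", dangling)
```

Running the same check on the real `nodes_df` and `edges_df` may help explain any gap between the row count reported here and the node count reported when the graph is built.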


## Section 3: Build Network

In [3]:
# Create directed graph
G = nx.DiGraph()

# Add nodes with attributes
for idx, row in nodes_df.iterrows():
    G.add_node(
        row['id'],
        name=row['name'],
        organization=row['organization'],
        category=row['category'],
        description=row['description'],
        status=row['status'],
        timeline=row['timeline']
    )

# Add edges with relationship type attribute
for idx, row in edges_df.iterrows():
    G.add_edge(
        row['source'],
        row['target'],
        relationship_type=row['relationship_type']
    )

# Network statistics
print("=== Network Statistics ===")
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.3f}")
print(f"Graph type: {'Directed' if nx.is_directed(G) else 'Undirected'}")

# Show relationship type distribution
print("\n=== Relationship Type Distribution ===")
print(edges_df['relationship_type'].value_counts())

# Convert to undirected for connectivity check
G_undirected = G.to_undirected()
print(f"\nConnected: {nx.is_connected(G_undirected)}")

if not nx.is_connected(G_undirected):
    print(f"\nNumber of connected components: {nx.number_connected_components(G_undirected)}")
    components = list(nx.connected_components(G_undirected))
    print("Component sizes:", [len(c) for c in components])

# Degree distribution (on undirected version for comparability)
degrees = dict(G_undirected.degree())
print(f"\nAverage degree: {sum(degrees.values()) / len(degrees):.2f}")
print(f"Max degree: {max(degrees.values())}")
print(f"Min degree: {min(degrees.values())}")


=== Network Statistics ===
Nodes: 83
Edges: 55
Density: 0.008
Graph type: Directed

=== Relationship Type Distribution ===
relationship_type
coordinates action with    33
funds                      18
is a member of              4
Name: count, dtype: int64

Connected: False

Number of connected components: 52
Component sizes: [32, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Average degree: 1.33
Max degree: 27
Min degree: 0
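
The component sizes above show one component of 32 nodes and 51 single-node components, so most nodes have no edges at all. The sketch below lists isolated nodes and extracts the connected core, which can be useful when centrality or layout should focus on the linked part of the network. It uses a toy graph; the node names `FunderA` and `FunderB` are hypothetical.

```python
import networkx as nx

# Hypothetical toy graph: two connected nodes plus two with no edges
G = nx.DiGraph()
G.add_nodes_from(['PEDP', 'AGU', 'FunderA', 'FunderB'])
G.add_edge('PEDP', 'AGU', relationship_type='coordinates action with')

# nx.isolates yields nodes with neither incoming nor outgoing edges
isolated = sorted(nx.isolates(G))
print(f"{len(isolated)} isolated nodes: {isolated}")

# Largest weakly connected component, i.e. connectivity ignoring edge direction
largest = max(nx.weakly_connected_components(G), key=len)
core = G.subgraph(largest).copy()
print(f"Core subgraph: {core.number_of_nodes()} nodes, {core.number_of_edges()} edges")
```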


## Section 4: Calculate Centrality Metrics

In [4]:
# Convert directed graph to undirected for centrality calculations
G_undirected = G.to_undirected()

# Calculate centrality metrics on undirected version for comparability
degree_centrality = nx.degree_centrality(G_undirected)
betweenness_centrality = nx.betweenness_centrality(G_undirected)
closeness_centrality = nx.closeness_centrality(G_undirected)

# Create summary DataFrame
centrality_data = []
for node in G.nodes():
    node_name = nodes_df[nodes_df['id'] == node]['name'].values[0]
    centrality_data.append({
        'ID': node,
        'Node': node_name,
        'In-degree': G.in_degree(node),
        'Out-degree': G.out_degree(node),
        'Connections': G_undirected.degree(node),
        'Degree': degree_centrality[node],
        'Betweenness': betweenness_centrality[node],
        'Closeness': closeness_centrality[node]
    })

centrality_df = pd.DataFrame(centrality_data).sort_values('Degree', ascending=False)

print("=== Centrality Metrics ===")
print("\nTop 10 Most Connected Initiatives (by Degree Centrality):")
display(centrality_df.head(10))

print("\nTop 5 Bridge Nodes (by Betweenness Centrality):")
display(centrality_df.sort_values('Betweenness', ascending=False)[['Node', 'Connections', 'Betweenness']].head())

print("\nTop 5 Information Spreaders (by Closeness Centrality):")
display(centrality_df.sort_values('Closeness', ascending=False)[['Node', 'Connections', 'Closeness']].head())


=== Centrality Metrics ===

Top 10 Most Connected Initiatives (by Degree Centrality):


|   | ID | Node | In-degree | Out-degree | Connections | Degree | Betweenness | Closeness |
|---|---|---|---|---|---|---|---|---|
| 15 | PEDP | Public Environmental Data Partners | 17 | 10 | 27 | 0.329268 | 0.119035 | 0.334843 |
| 2 | DataFoundation | Data Foundation - Climate Data Collaborative &... | 1 | 8 | 9 | 0.109756 | 0.018865 | 0.221123 |
| 5 | NASEM | NASEM - Earth Observations & Data Stewardship ... | 3 | 5 | 8 | 0.097561 | 0.011824 | 0.217028 |
| 12 | CODE | CODE - Center for Open Data Enterprise | 4 | 2 | 6 | 0.073171 | 0.001892 | 0.205605 |
| 0 | AGU | American Geophysical Union | 5 | 1 | 6 | 0.073171 | 0.003408 | 0.209277 |
| 3 | GRQD | Group on Reference Quality Datasets | 5 | 0 | 5 | 0.060976 | 0.000903 | 0.154204 |
| 9 | NYCE | New York Climate Exchange | 2 | 2 | 4 | 0.04878 | 0.000703 | 0.202061 |
| 10 | DRP | The Data Rescue Project | 2 | 2 | 4 | 0.04878 | 0.000301 | 0.189024 |
| 4 | KCF | Keeling Curve Foundation | 2 | 1 | 3 | 0.036585 | 0.0 | 0.148348 |
| 17 | EPIC | Environmental Policy Innovation Center | 0 | 3 | 3 | 0.036585 | 0.0 | 0.195325 |



Top 5 Bridge Nodes (by Betweenness Centrality):


|   | Node | Connections | Betweenness |
|---|---|---|---|
| 15 | Public Environmental Data Partners | 27 | 0.119035 |
| 2 | Data Foundation - Climate Data Collaborative &... | 9 | 0.018865 |
| 5 | NASEM - Earth Observations & Data Stewardship ... | 8 | 0.011824 |
| 0 | American Geophysical Union | 6 | 0.003408 |
| 12 | CODE - Center for Open Data Enterprise | 6 | 0.001892 |



Top 5 Information Spreaders (by Closeness Centrality):


|   | Node | Connections | Closeness |
|---|---|---|---|
| 15 | Public Environmental Data Partners | 27 | 0.334843 |
| 2 | Data Foundation - Climate Data Collaborative &... | 9 | 0.221123 |
| 5 | NASEM - Earth Observations & Data Stewardship ... | 8 | 0.217028 |
| 0 | American Geophysical Union | 6 | 0.209277 |
| 12 | CODE - Center for Open Data Enterprise | 6 | 0.205605 |
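
The `Degree` column is easier to read once the normalization is explicit: `nx.degree_centrality` divides each node's degree by n - 1, where n is the number of nodes in the graph. With the 83 graph nodes reported in Section 3, PEDP's 27 connections give 27 / 82 ≈ 0.329268, which matches the table. The sketch below reproduces the calculation by hand on a small star graph (an illustrative toy, not the notebook's data).

```python
import networkx as nx

# Star graph: node 0 is linked to nodes 1..4 (5 nodes in total)
star = nx.star_graph(4)

n = star.number_of_nodes()
centrality = nx.degree_centrality(star)

# Recompute by hand: degree / (n - 1)
manual = {node: degree / (n - 1) for node, degree in star.degree()}

print(centrality[0], manual[0])
print(centrality[1], manual[1])

# Same arithmetic for the PEDP row: 27 connections among 83 graph nodes
pedp_degree_centrality = 27 / (83 - 1)
print(round(pedp_degree_centrality, 6))
```

Because the denominator counts every node, including those with no edges, the large number of isolated nodes pulls all degree centrality values down.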


## Section 5: Create Interactive Visualization

In [5]:
# Initialize PyVis network with directed mode enabled
net = Network(
    height='800px',
    width='100%',
    bgcolor='#ffffff',
    font_color='#333333',
    notebook=True,
    directed=True
)

# Configure physics for better layout
net.barnes_hut(
    gravity=-8000,
    central_gravity=0.3,
    spring_length=200,
    spring_strength=0.001,
    damping=0.09,
    overlap=0
)

# Add nodes with styling from CSV + color config
for node in G.nodes():
    node_data = nodes_df[nodes_df['id'] == node].iloc[0]
    
    # Use hex color from mapped config
    color = node_data['hex_color']
    
    # Size by degree centrality: 15px base plus 200 × centrality (not capped)
    size = 15 + (degree_centrality[node] * 200)
    
    # Create simple, clean tooltip (no complex HTML styling)
    title = (
        f"{node_data['name']}\n\n"
        f"Organization: {node_data['organization']}\n"
        f"Category: {node_data['category']}\n"
        f"Status: {node_data['status']}\n"
        f"Timeline: {node_data['timeline']}\n"
        f"Connections: {G_undirected.degree(node)}\n\n"
        f"{node_data['description']}"
    )
    
    # Add node to network
    net.add_node(
        node,
        label=node_data['name'],
        title=title,
        color=color,
        size=size,
        borderWidth=2,
        borderWidthSelected=4
    )

# Add styled edges based on relationship type
for edge in G.edges(data=True):
    rel_type = edge[2]['relationship_type']
    style = EDGE_STYLES[rel_type]
    
    net.add_edge(
        edge[0],
        edge[1],
        color=style['color'],
        width=style['width'],
        arrows=style['arrows'],
        title=rel_type,
        smooth={'type': 'continuous'},
        arrowStrikethrough=False
    )

# Show in notebook
print("Generating interactive visualization...")
net.show('network_preview.html')

# Save to outputs directory
net.save_graph('../outputs/network_map.html')
print("\n✓ Interactive visualization saved to: outputs/network_map.html")
print("\n📊 Open the HTML file in your browser to explore the network!")


Generating interactive visualization...
network_preview.html

✓ Interactive visualization saved to: outputs/network_map.html

📊 Open the HTML file in your browser to explore the network!
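
The edge loop in Section 5 indexes `EDGE_STYLES[rel_type]` directly, so a relationship type in `edges.csv` that has no entry in the dictionary would raise a `KeyError` partway through rendering. A minimal sketch of a check that could run before the loop follows. The helper `unknown_relationship_types` and the sample rows (including the misspelled type) are hypothetical; only the three dictionary keys come from the notebook.

```python
import pandas as pd

# Keys copied from the notebook's EDGE_STYLES; only the colors are kept here
EDGE_STYLES = {
    'is a member of': {'color': '#8e44ad'},
    'funds': {'color': '#27ae60'},
    'coordinates action with': {'color': '#3498db'},
}

def unknown_relationship_types(edges_df, styles):
    """Return the sorted relationship types in edges_df that have no style entry."""
    found = set(edges_df['relationship_type'].dropna().unique())
    return sorted(found - set(styles))

# Hypothetical edge table with one misspelled type, for illustration only
sample_edges = pd.DataFrame({
    'source': ['OEDP', 'DataFoundation', 'PEDP'],
    'target': ['PEDP', 'GRQD', 'AGU'],
    'relationship_type': ['is a member of', 'funds', 'coordinates with'],
})

missing = unknown_relationship_types(sample_edges, EDGE_STYLES)
print(missing)
```

An empty list means every edge can be styled; anything else points at rows to correct in the CSV or styles to add.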


## Section 6: Network Summary

In [6]:
# Get top node
top_node_id = centrality_df.iloc[0]['ID']
top_node_name = centrality_df.iloc[0]['Node']
top_node_connections = G_undirected.degree(top_node_id)

print("="*60)
print("PEDP CLIMATE & ENVIRONMENTAL DATA INITIATIVES")
print("Network Summary v2.0")
print("="*60)
print(f"\n📊 Total Initiatives: {len(nodes_df)}")
print(f"🔗 Total Relationships: {len(edges_df)}")
print(f"📈 Network Density: {nx.density(G_undirected):.2%}")
print(f"⭐ Most Connected: {top_node_name} ({top_node_connections} connections)")

print("\n=== Top 5 Key Hubs ===")
for idx, row in centrality_df.head(5).iterrows():
    print(f"{row['Node']:50s} {row['Connections']:2d} connections")

print("\n=== Relationship Types ===")
rel_counts = edges_df['relationship_type'].value_counts()
for rel_type, count in rel_counts.items():
    style = EDGE_STYLES[rel_type]
    print(f"  {rel_type:30s} {count:3d} edges ({style['color']})")

print("\n=== Edge Legend ===")
print("🟣 Purple arrows: 'is a member of' (member → parent org)")
print("🟢 Green arrows: 'funds' (funder → recipient)")
print("🔵 Blue bidirectional: 'coordinates action with' (mutual)")

print("\n=== PEDP Members ===")
membership_edges = edges_df[edges_df['relationship_type'] == 'is a member of']
for idx, row in membership_edges.iterrows():
    member_name = nodes_df[nodes_df['id'] == row['source']]['name'].values[0]
    print(f"  • {member_name}")

print("\n=== Category Distribution ===")
for category, count in nodes_df['category'].value_counts().items():
    print(f"  {category:40s} {count:2d} initiatives")

print("\n=== Timeline Distribution ===")
for timeline, count in nodes_df['timeline'].value_counts().items():
    print(f"  {timeline:40s} {count:2d} initiatives")

print("\n" + "="*60)
print("💡 Insights:")
print("   - PEDP serves as the central hub with 4 member organizations")
print("   - Data Foundation provides funding to 5 key initiatives")
print("   - Strong bidirectional coordination across the ecosystem")
print("   - Mix of established organizations and emerging initiatives")
print("="*60)


PEDP CLIMATE & ENVIRONMENTAL DATA INITIATIVES
Network Summary v2.0

📊 Total Initiatives: 86
🔗 Total Relationships: 55
📈 Network Density: 1.62%
⭐ Most Connected: Public Environmental Data Partners (27 connections)

=== Top 5 Key Hubs ===
Public Environmental Data Partners                 27 connections
Data Foundation - Climate Data Collaborative & GHG Coalition  9 connections
NASEM - Earth Observations & Data Stewardship Workshop  8 connections
CODE - Center for Open Data Enterprise              6 connections
American Geophysical Union                          6 connections

=== Relationship Types ===
  coordinates action with         33 edges (#3498db)
  funds                           18 edges (#27ae60)
  is a member of                   4 edges (#8e44ad)

=== Edge Legend ===
🟣 Purple arrows: 'is a member of' (member → parent org)
🟢 Green arrows: 'funds' (funder → recipient)
🔵 Blue bidirectional: 'coordinates action with' (mutual)

=== PEDP Members ===
  • T

## Visualization Guide

**Using the Interactive Map:**
- **Drag** nodes to reposition them
- **Hover** over nodes to see detailed information
- **Zoom** with mouse wheel or trackpad
- **Pan** by clicking and dragging on empty space
- Click the **physics button** (⚙️) to toggle the force simulation on/off

**Node Features:**
- **Color** indicates category (see legend below)
- **Size** reflects degree centrality (more connections = larger node)
- **Position** determined by force-directed algorithm (connected nodes cluster together)
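
The sizing rule in Section 5 is 15 px plus 200 times the node's degree centrality, so PEDP at 0.329268 comes out near 81 px. If a bounded range such as 15-60 px is wanted, the value can be clamped. The sketch below is one way to do that; the function `node_size` and its parameter names are hypothetical and not part of the notebook.

```python
def node_size(degree_centrality, base=15.0, scale=200.0, max_size=60.0):
    """Map a degree centrality in [0, 1] to a pixel size capped at max_size."""
    return min(base + degree_centrality * scale, max_size)

print(node_size(0.0))        # a node with no connections gets the base size
print(node_size(0.109756))   # Data Foundation's centrality from Section 4
print(node_size(0.329268))   # PEDP's centrality from Section 4, hits the cap
```

Clamping keeps the largest hub from dominating the canvas, at the cost of making every node above the cap look the same size.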

**Legend:**
- 🔵 Data Collection/Monitoring
- 🟢 Data Preservation/Archiving
- 🔴 Data Coordination/Standards
- 🟠 Capacity Building/Support
- 🟣 Communication/Access
- 🟢 Advocacy/Community Focus