# Chicago Community Area Network Analysis
## Souptik's Independent Network and Metrics Analysis

This notebook:
1. Loads and aggregates data to 9 Community Areas (CAs)
2. Constructs multiple network graph versions with different thresholds
3. Computes centrality measures (degree, eigenvector, closeness)
4. Runs community detection (Louvain + hierarchical clustering)
5. Calculates clustering coefficients (local, average, and global) for each CA-level network

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
from community import community_louvain
import warnings
warnings.filterwarnings('ignore')

# Set plotting style with WHITE background
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
plt.rcParams['figure.facecolor'] = 'white'
plt.rcParams['axes.facecolor'] = 'white'

print("Libraries loaded successfully!")

## 1. Data Loading and Aggregation to 9 Community Areas

In [None]:
# Load the aggregated data
# Adjust path as needed
df = pd.read_csv('data_agg/agg_data.csv')

print(f"Raw data shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst few rows:")
df.head()

In [None]:
# Define the 9 Community Areas
target_cas = [
    'Englewood',
    'West_Englewood', 
    'Irving_Park',
    'JeffersonPark',
    'Lakeview',
    'LincolnPark',
    'NearNorthSide',
    'Portage_Park',
    'South_Lawndale'
]

print(f"Target Community Areas: {target_cas}")
print(f"Number of CAs: {len(target_cas)}")

In [None]:
# Aggregate to Community Area level
# This assumes the data has a community area identifier column
# Adjust column names as needed based on actual data structure

# Identify numeric columns for aggregation
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()

# Identify the CA identifier column (adjust as needed)
ca_col = None
for col in df.columns:
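    # Match 'ca' only as a whole underscore-separated token so that columns
    # such as 'education' or 'location' are not picked up by accident
    if 'community' in col.lower() or 'area' in col.lower() or 'ca' in col.lower().split('_'):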
        ca_col = col
        break

if ca_col is None:
    # If no CA column found, check if data is already at CA level
    print("Checking data structure...")
    print(df.columns.tolist())
else:
    print(f"Community Area column identified: {ca_col}")

# Alternative: aggregate by parsing from the raw ONSA data
# This will load individual CA files and aggregate them
ca_data = {}

In [None]:
# Load data for each Community Area from individual files
import os
from pathlib import Path

data_dir = Path('data_agg/ONSA_Data')

# Demographic/socioeconomic variables to extract
variables = [
    'total_population',
    'median_income',
    'poverty',
    'unemployment',
    'bachelors',
    'graduate',
    'white',
    'black',
    'hispanic',
    'broadband',
    'owner_occupied',
    'renter_occupied',
    'snap'
]

ca_aggregated = {}

for ca in target_cas:
    ca_dir = data_dir / ca
    if ca_dir.exists():
        ca_data = {}
        
        for var in variables:
            # Find the file for this variable
            pattern = f"{ca}_{var}_2018_2023.csv"
            file_path = ca_dir / pattern
            
            if file_path.exists():
                temp_df = pd.read_csv(file_path)
                # Get most recent year (2023) or aggregate across years
                # Assuming structure has year columns or rows
                
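                # Strategy 1: Take mean across available years
                # NOTE: this averages every numeric column in the file, so any
                # numeric year or ID column would be included in the mean
                numeric_data = temp_df.select_dtypes(include=[np.number])
                if len(numeric_data.columns) > 0:
                    ca_data[var] = numeric_data.mean().mean()  # Overall mean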
        
        ca_aggregated[ca] = ca_data
        print(f"Loaded data for {ca}: {len(ca_data)} variables")
    else:
        print(f"Warning: Directory not found for {ca}")

# Convert to DataFrame
ca_df = pd.DataFrame(ca_aggregated).T
ca_df.index.name = 'Community_Area'
ca_df = ca_df.reset_index()

print(f"\nAggregated CA data shape: {ca_df.shape}")
ca_df.head()

In [None]:
# Handle missing values and normalize data
ca_df_clean = ca_df.copy()

# Fill missing values with median
numeric_cols = ca_df_clean.select_dtypes(include=[np.number]).columns
for col in numeric_cols:
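    # Assign back instead of using inplace=True on a column selection,
    # which is deprecated and may silently do nothing under copy-on-write
    ca_df_clean[col] = ca_df_clean[col].fillna(ca_df_clean[col].median())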

# Standardize the features for similarity calculations
scaler = StandardScaler()
ca_features = ca_df_clean[numeric_cols].values
ca_features_scaled = scaler.fit_transform(ca_features)
ca_features_df = pd.DataFrame(ca_features_scaled, 
                               columns=numeric_cols,
                               index=ca_df_clean['Community_Area'])

print("Data normalized and ready for network construction")
print(f"Feature matrix shape: {ca_features_df.shape}")

In [None]:
# Save the aggregated CA-level data
ca_df.to_csv('CA_Aggregated_Data.csv', index=False)
ca_features_df.to_csv('CA_Normalized_Features.csv')
print("Saved: CA_Aggregated_Data.csv and CA_Normalized_Features.csv")

## 2. Network Graph Construction (Multiple Versions)

We'll create networks using:
1. Cosine similarity with different thresholds
2. Euclidean distance (inverted) with different thresholds
3. Top-k nearest neighbors
4. Correlation-based similarity
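
As a rough illustration of the four constructions listed above, the following sketch builds each similarity matrix for three made-up areas using NumPy only. The feature values, the labels `A`, `B`, `C`, and the helper names `threshold_edges` and `top_k_edges` are assumptions for illustration and are not part of the notebook's pipeline.

```python
import numpy as np

# Hypothetical standardized feature rows for three areas (illustrative values only)
X = np.array([
    [1.0, 0.5, -0.5],   # area A
    [0.9, 0.6, -0.4],   # area B
    [-1.0, -0.5, 0.8],  # area C
])
names = ["A", "B", "C"]

# Cosine similarity: dot product of L2-normalized rows, range [-1, 1]
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
cos_sim = unit @ unit.T

# Euclidean distance mapped to a similarity in (0, 1] via 1 / (1 + d)
diff = X[:, None, :] - X[None, :, :]
euc_sim = 1.0 / (1.0 + np.sqrt((diff ** 2).sum(axis=-1)))

# Pearson correlation between rows, range [-1, 1]
corr_sim = np.corrcoef(X)


def threshold_edges(sim, labels, threshold):
    """Return (u, v, weight) for each unordered pair with sim >= threshold."""
    n = len(labels)
    return [(labels[i], labels[j], float(sim[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]


def top_k_edges(sim, labels, k):
    """Return the undirected edge set linking each node to its k most similar nodes."""
    edges = set()
    for i in range(len(labels)):
        row = sim[i].copy()
        row[i] = -np.inf  # exclude self
        for j in np.argsort(row)[-k:]:
            edges.add(tuple(sorted((labels[i], labels[int(j)]))))
    return edges


cos_edges = threshold_edges(cos_sim, names, 0.5)
knn_edges = top_k_edges(cos_sim, names, 1)
print(cos_edges)
print(sorted(knn_edges))
```

In this toy case the threshold graph keeps only the A-B edge, while the top-1 graph also links C to B even though their similarity is negative. Thresholding can leave nodes isolated; top-k always gives every node at least k neighbors.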

In [None]:
# Calculate similarity matrices

# 1. Cosine similarity
cosine_sim = cosine_similarity(ca_features_scaled)
cosine_sim_df = pd.DataFrame(cosine_sim, 
                              index=ca_df_clean['Community_Area'],
                              columns=ca_df_clean['Community_Area'])

# 2. Euclidean distance (convert to similarity: smaller distance = higher similarity)
euclidean_dist = euclidean_distances(ca_features_scaled)
# Convert to similarity: similarity = 1 / (1 + distance)
euclidean_sim = 1 / (1 + euclidean_dist)
euclidean_sim_df = pd.DataFrame(euclidean_sim,
                                 index=ca_df_clean['Community_Area'],
                                 columns=ca_df_clean['Community_Area'])

# 3. Correlation-based similarity
correlation_sim = np.corrcoef(ca_features_scaled)
correlation_sim_df = pd.DataFrame(correlation_sim,
                                   index=ca_df_clean['Community_Area'],
                                   columns=ca_df_clean['Community_Area'])

print("Similarity matrices computed:")
print(f"Cosine similarity range: [{cosine_sim.min():.3f}, {cosine_sim.max():.3f}]")
print(f"Euclidean similarity range: [{euclidean_sim.min():.3f}, {euclidean_sim.max():.3f}]")
print(f"Correlation similarity range: [{correlation_sim.min():.3f}, {correlation_sim.max():.3f}]")

In [None]:
# Function to create network from similarity matrix
def create_network_from_similarity(sim_matrix, nodes, threshold=0.5, method='threshold'):
    """
    Create network from similarity matrix
    
    Parameters:
    - sim_matrix: similarity matrix (numpy array)
    - nodes: list of node names
    - threshold: similarity threshold for edge creation
    - method: 'threshold' or 'top_k'
    """
    G = nx.Graph()
    G.add_nodes_from(nodes)
    
    n = len(nodes)
    
    if method == 'threshold':
        # Add edge if similarity exceeds threshold
        for i in range(n):
            for j in range(i+1, n):
                if sim_matrix[i, j] >= threshold:
                    G.add_edge(nodes[i], nodes[j], weight=sim_matrix[i, j])
    
    elif method == 'top_k':
        # Add edges to top k most similar nodes for each node
        k = int(threshold)  # In this case, threshold is actually k
        for i in range(n):
            # Get top k similar nodes (excluding self)
            similarities = sim_matrix[i, :].copy()
            similarities[i] = -np.inf  # Exclude self
            top_k_indices = np.argsort(similarities)[-k:]
            
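            for j in top_k_indices:
                # nx.Graph stores each undirected edge once, so re-adding is harmless.
                # Filtering on i < j would drop neighbors that are not mutual.
                G.add_edge(nodes[i], nodes[j], weight=sim_matrix[i, j])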
    
    return G

print("Network creation function defined")

In [None]:
# Create multiple network versions
networks = {}
nodes = ca_df_clean['Community_Area'].tolist()

# Version 1: Cosine similarity with threshold 0.7
networks['cosine_0.7'] = create_network_from_similarity(
    cosine_sim, nodes, threshold=0.7, method='threshold'
)

# Version 2: Cosine similarity with threshold 0.5
networks['cosine_0.5'] = create_network_from_similarity(
    cosine_sim, nodes, threshold=0.5, method='threshold'
)

# Version 3: Cosine similarity with threshold 0.3
networks['cosine_0.3'] = create_network_from_similarity(
    cosine_sim, nodes, threshold=0.3, method='threshold'
)

# Version 4: Euclidean similarity with threshold 0.6
networks['euclidean_0.6'] = create_network_from_similarity(
    euclidean_sim, nodes, threshold=0.6, method='threshold'
)

# Version 5: Top-3 nearest neighbors
networks['top_3'] = create_network_from_similarity(
    cosine_sim, nodes, threshold=3, method='top_k'
)

# Version 6: Top-4 nearest neighbors
networks['top_4'] = create_network_from_similarity(
    cosine_sim, nodes, threshold=4, method='top_k'
)

# Version 7: Correlation-based with threshold 0.5
networks['correlation_0.5'] = create_network_from_similarity(
    correlation_sim, nodes, threshold=0.5, method='threshold'
)

print("\nNetwork Statistics:")
print("="*60)
for name, G in networks.items():
    print(f"{name:20s} | Nodes: {G.number_of_nodes():2d} | Edges: {G.number_of_edges():2d} | Density: {nx.density(G):.3f}")

In [None]:
# Visualize networks
fig, axes = plt.subplots(3, 3, figsize=(18, 18))
fig.patch.set_facecolor('white')
axes = axes.flatten()

for idx, (name, G) in enumerate(networks.items()):
    if idx < 9:
        ax = axes[idx]
        ax.set_facecolor('white')
        pos = nx.spring_layout(G, seed=42, k=1.5)
        
        # Draw network with better colors for white background
        nx.draw_networkx_nodes(G, pos, node_size=800, 
                               node_color='#3498db',  # Nice blue
                               edgecolors='#2c3e50',  # Dark border
                               linewidths=2,
                               alpha=0.9, ax=ax)
        nx.draw_networkx_edges(G, pos, width=2, alpha=0.4,
                               edge_color='#7f8c8d', ax=ax)  # Gray edges
        nx.draw_networkx_labels(G, pos, font_size=8,
                                font_weight='bold',
                                font_color='#2c3e50', ax=ax)
        
        ax.set_title(f"{name}\n({G.number_of_edges()} edges)", 
                    fontsize=11, fontweight='bold', color='#2c3e50')
        ax.axis('off')

# Remove unused subplots
for idx in range(len(networks), 9):
    axes[idx].axis('off')
    axes[idx].set_facecolor('white')

plt.tight_layout()
plt.savefig('Network_Graphs_All_Versions.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

print("Network visualizations saved: Network_Graphs_All_Versions.png")

## 3. Centrality Measures

Computing three key centrality measures for each network:
- **Degree Centrality**: Fraction of the other nodes that a node is directly connected to
- **Eigenvector Centrality**: Importance based on connections to important nodes
- **Closeness Centrality**: Reciprocal of the average shortest-path distance to the other reachable nodes
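
To make the three definitions concrete, here is a small sketch that computes them by hand on a three-node path graph. The graph and the names `adj`, `nodes_toy`, and `bfs_distances` are hypothetical, and the closeness formula as written assumes a connected graph.

```python
from collections import deque

import numpy as np

# Toy path graph a - b - c, stored as an adjacency dict (illustrative only)
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
nodes_toy = list(adj)
n = len(nodes_toy)

# Degree centrality: degree divided by the maximum possible degree (n - 1)
degree_c = {v: len(adj[v]) / (n - 1) for v in nodes_toy}


def bfs_distances(source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Closeness centrality: (n - 1) divided by the sum of shortest-path distances
closeness_c = {v: (n - 1) / sum(bfs_distances(v).values()) for v in nodes_toy}

# Eigenvector centrality: leading eigenvector of the adjacency matrix (unit norm)
A = np.array([[1.0 if u in adj[v] else 0.0 for u in nodes_toy] for v in nodes_toy])
eigvals, eigvecs = np.linalg.eigh(A)
leading = np.abs(eigvecs[:, np.argmax(eigvals)])
eigen_c = dict(zip(nodes_toy, leading))

print(degree_c)
print(closeness_c)
print(eigen_c)
```

The middle node `b` scores highest on all three measures, which is the expected pattern for a path graph.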

In [None]:
# Calculate centrality measures for all networks
centrality_results = []

for name, G in networks.items():
    # Degree centrality
    degree_cent = nx.degree_centrality(G)
    
    # Eigenvector centrality (fall back to zeros if NetworkX cannot compute it,
    # e.g. power iteration fails to converge or the graph is degenerate)
    try:
        eigen_cent = nx.eigenvector_centrality(G, max_iter=1000)
    except nx.NetworkXException:
        eigen_cent = {node: 0 for node in G.nodes()}
    
    # Closeness centrality
    try:
        closeness_cent = nx.closeness_centrality(G)
    except nx.NetworkXException:
        closeness_cent = {node: 0 for node in G.nodes()}
    
    # Combine into dataframe
    for node in G.nodes():
        centrality_results.append({
            'Network': name,
            'Community_Area': node,
            'Degree_Centrality': degree_cent[node],
            'Eigenvector_Centrality': eigen_cent[node],
            'Closeness_Centrality': closeness_cent[node]
        })

centrality_df = pd.DataFrame(centrality_results)
centrality_df.to_csv('Centrality_Measures_All_Networks.csv', index=False)

print("Centrality measures computed and saved: Centrality_Measures_All_Networks.csv")
print(f"\nSample results:")
centrality_df.head(10)

In [None]:
# Summary statistics for centrality measures
centrality_summary = centrality_df.groupby('Network')[[
    'Degree_Centrality', 'Eigenvector_Centrality', 'Closeness_Centrality'
]].agg(['mean', 'std', 'min', 'max'])

print("\nCentrality Summary Statistics:")
print(centrality_summary)

centrality_summary.to_csv('Centrality_Summary_Statistics.csv')

In [None]:
# Visualize centrality measures for selected network (e.g., cosine_0.5)
selected_network = 'cosine_0.5'
selected_data = centrality_df[centrality_df['Network'] == selected_network]

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

measures = ['Degree_Centrality', 'Eigenvector_Centrality', 'Closeness_Centrality']
titles = ['Degree Centrality', 'Eigenvector Centrality', 'Closeness Centrality']

for ax, measure, title in zip(axes, measures, titles):
    selected_data_sorted = selected_data.sort_values(measure, ascending=True)
    ax.barh(selected_data_sorted['Community_Area'], selected_data_sorted[measure])
    ax.set_xlabel(title)
    ax.set_title(f"{title}\n(Network: {selected_network})")
    ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig(f'Centrality_Visualization_{selected_network}.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Centrality visualization saved for {selected_network}")

## 4. Community Detection

Running two community detection algorithms:
1. **Louvain Algorithm**: Modularity-based community detection
2. **Hierarchical Clustering**: Agglomerative clustering on the cosine distance matrix (1 - cosine similarity)
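
Louvain searches for the partition that maximizes modularity. As a minimal sketch of what that score measures, the snippet below evaluates the unweighted modularity formula for two candidate partitions of a made-up graph. The graph, the partitions, and this `modularity` helper are illustrative assumptions; the notebook itself relies on `community_louvain` for the actual computation.

```python
# Toy graph: two triangles joined by a single bridge edge (illustrative only)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}


def modularity(edge_list, part):
    """Unweighted modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ]."""
    m = len(edge_list)
    internal = {}    # L_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edge_list:
        cu, cv = part[u], part[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


q_split = modularity(edges, partition)                    # one community per triangle
q_single = modularity(edges, {v: 0 for v in partition})   # everything in one community
print(f"two communities: {q_split:.4f}, one community: {q_single:.4f}")
```

Splitting along the bridge scores higher than lumping all nodes together, which is the kind of structure a modularity-based method is designed to find.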

In [None]:
# Louvain community detection
louvain_results = []

for name, G in networks.items():
    if G.number_of_edges() > 0:
        # Create a copy with positive weights only
        G_positive = G.copy()
        
        # Check for negative weights
        has_negative = False
        for u, v, data in G_positive.edges(data=True):
            if 'weight' in data and data['weight'] < 0:
                has_negative = True
                break
        
        if has_negative:
            # Shift all weights to be positive
            weights = [data['weight'] for u, v, data in G_positive.edges(data=True) if 'weight' in data]
            min_weight = min(weights)
            
            for u, v, data in G_positive.edges(data=True):
                if 'weight' in data:
                    data['weight'] = data['weight'] - min_weight + 0.01
        
        try:
            # Fixed seed so the saved partition is reproducible across runs
            communities = community_louvain.best_partition(G_positive, random_state=42)
            modularity = community_louvain.modularity(communities, G_positive)
            
            for node, community_id in communities.items():
                louvain_results.append({
                    'Network': name,
                    'Community_Area': node,
                    'Louvain_Community': community_id,
                    'Modularity': modularity
                })
        except Exception as e:
            print(f"Warning: Louvain failed for {name}: {e}")
            for node in G.nodes():
                louvain_results.append({
                    'Network': name,
                    'Community_Area': node,
                    'Louvain_Community': 0,
                    'Modularity': 0
                })

louvain_df = pd.DataFrame(louvain_results)
louvain_df.to_csv('Louvain_Communities_All_Networks.csv', index=False)

print("Louvain community detection completed")
print(f"Results saved: Louvain_Communities_All_Networks.csv")
print("\nSample results:")
louvain_df.head(10)

In [None]:
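# Hierarchical clustering
# Using average linkage on the cosine distance matrix
# (Ward linkage is only well defined for Euclidean distances)

# Convert cosine similarity to distance
cosine_dist = 1 - cosine_sim
# Symmetrize and clip so floating-point noise cannot break squareform's checks
cosine_dist = np.clip((cosine_dist + cosine_dist.T) / 2, 0, None)
np.fill_diagonal(cosine_dist, 0)  # Ensure diagonal is 0

# Perform hierarchical clustering (linkage and fcluster are imported in the first cell)

# Convert to condensed distance matrix for linkage
condensed_dist = squareform(cosine_dist)
linkage_matrix = linkage(condensed_dist, method='average')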

# Cut tree at different numbers of clusters
hierarchical_results = []

for n_clusters in [2, 3, 4, 5]:
    clusters = fcluster(linkage_matrix, n_clusters, criterion='maxclust')
    
    for idx, node in enumerate(nodes):
        hierarchical_results.append({
            'Community_Area': node,
            'N_Clusters': n_clusters,
            'Hierarchical_Cluster': clusters[idx]
        })

hierarchical_df = pd.DataFrame(hierarchical_results)
hierarchical_df.to_csv('Hierarchical_Clusters.csv', index=False)

print("Hierarchical clustering completed")
print(f"Results saved: Hierarchical_Clusters.csv")
print(f"\nSample results:")
hierarchical_df.head(10)

In [None]:
# Visualize dendrogram
plt.figure(figsize=(12, 6))
dendrogram(linkage_matrix, labels=nodes, leaf_font_size=10)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Community Area')
plt.ylabel('Distance')
plt.tight_layout()
plt.savefig('Hierarchical_Clustering_Dendrogram.png', dpi=150, bbox_inches='tight')
plt.show()

print("Dendrogram saved: Hierarchical_Clustering_Dendrogram.png")

In [None]:
# Visualize Louvain communities on network (selected network)
selected_network = 'cosine_0.5'
G = networks[selected_network]
# Fixed seed so the figure shows the same partition on every run
communities = community_louvain.best_partition(G, random_state=42)

# Create color map
community_colors = [communities[node] for node in G.nodes()]

plt.figure(figsize=(12, 10))
pos = nx.spring_layout(G, seed=42, k=2)

nx.draw_networkx_nodes(G, pos, node_color=community_colors, 
                       node_size=1000, cmap='Set3', alpha=0.9)
nx.draw_networkx_edges(G, pos, width=2, alpha=0.3)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')

plt.title(f'Louvain Communities - {selected_network}\nModularity: {community_louvain.modularity(communities, G):.3f}',
          fontsize=14)
plt.axis('off')
plt.tight_layout()
plt.savefig(f'Louvain_Communities_{selected_network}.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Louvain community visualization saved for {selected_network}")

## 5. Clustering Coefficients (Block-Group Level Analysis)

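Computing local, average, and global (transitivity) clustering coefficients for each CA-level network. Within-CA variation would require block-group level data, which the code in this section does not load.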
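
The three quantities can be sketched by hand on a tiny graph. The example below uses a made-up triangle with one pendant node; the graph and the helper `local_clustering` are assumptions for illustration only.

```python
from itertools import combinations

# Toy graph: triangle a-b-c with a pendant node d attached to c (illustrative only)
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def closed_pairs(v):
    """Number of neighbor pairs of v that are directly connected to each other."""
    return sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])


def local_clustering(v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    return 2 * closed_pairs(v) / (k * (k - 1))


# Local coefficient per node, and their unweighted mean
local = {v: local_clustering(v) for v in adj}
average = sum(local.values()) / len(local)

# Transitivity: closed neighbor pairs over all neighbor pairs, summed over nodes
closed = sum(closed_pairs(v) for v in adj)
triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
transitivity = closed / triples

print(local)
print(f"average: {average:.4f}, transitivity: {transitivity:.4f}")
```

Average clustering weights every node equally, whereas transitivity weights nodes by their number of neighbor pairs, so the two can differ on the same graph.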

In [None]:
# Calculate clustering coefficients for each network
clustering_results = []

for name, G in networks.items():
    if G.number_of_edges() > 0:
        # Global clustering coefficient
        global_clustering = nx.transitivity(G)
        
        # Average clustering coefficient
        avg_clustering = nx.average_clustering(G)
        
        # Local clustering coefficient for each node
        local_clustering = nx.clustering(G)
        
        for node, local_coeff in local_clustering.items():
            clustering_results.append({
                'Network': name,
                'Community_Area': node,
                'Local_Clustering': local_coeff,
                'Global_Clustering': global_clustering,
                'Average_Clustering': avg_clustering
            })

clustering_df = pd.DataFrame(clustering_results)
clustering_df.to_csv('Clustering_Coefficients_All_Networks.csv', index=False)

print("Clustering coefficients computed")
print(f"Results saved: Clustering_Coefficients_All_Networks.csv")
print(f"\nSample results:")
clustering_df.head(10)

In [None]:
# Summary of clustering coefficients by network
clustering_summary = clustering_df.groupby('Network').agg({
    'Local_Clustering': ['mean', 'std', 'min', 'max'],
    'Global_Clustering': 'first',
    'Average_Clustering': 'first'
})

print("\nClustering Coefficient Summary:")
print(clustering_summary)

clustering_summary.to_csv('Clustering_Summary.csv')
print("\nSummary saved: Clustering_Summary.csv")

In [None]:
# Visualize clustering coefficients
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Local clustering by CA for selected network
selected_data = clustering_df[clustering_df['Network'] == selected_network].copy()
selected_data = selected_data.sort_values('Local_Clustering', ascending=True)

axes[0].barh(selected_data['Community_Area'], selected_data['Local_Clustering'])
axes[0].set_xlabel('Local Clustering Coefficient')
axes[0].set_title(f'Local Clustering Coefficients\n(Network: {selected_network})')
axes[0].grid(axis='x', alpha=0.3)

# Plot 2: Average clustering across all networks
network_clustering = clustering_df.groupby('Network')['Average_Clustering'].first().sort_values()
axes[1].barh(network_clustering.index, network_clustering.values)
axes[1].set_xlabel('Average Clustering Coefficient')
axes[1].set_title('Average Clustering by Network')
axes[1].grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig('Clustering_Coefficients_Visualization.png', dpi=150, bbox_inches='tight')
plt.show()

print("Clustering coefficient visualization saved")

## 6. Export Network Graphs (GraphML Format)

In [None]:
# Save all networks in GraphML format for external analysis
import os
os.makedirs('network_graphs', exist_ok=True)

for name, G in networks.items():
    # Add node attributes
    for node in G.nodes():
        # Add centrality as node attributes if available
        node_centrality = centrality_df[
            (centrality_df['Network'] == name) & 
            (centrality_df['Community_Area'] == node)
        ]
        
        if not node_centrality.empty:
            G.nodes[node]['degree_centrality'] = float(node_centrality['Degree_Centrality'].values[0])
            G.nodes[node]['eigenvector_centrality'] = float(node_centrality['Eigenvector_Centrality'].values[0])
            G.nodes[node]['closeness_centrality'] = float(node_centrality['Closeness_Centrality'].values[0])
    
    # Save as GraphML
    nx.write_graphml(G, f'network_graphs/{name}.graphml')

print("All networks saved in GraphML format in 'network_graphs/' directory")

## 7. Comprehensive Summary Report

In [None]:
# Generate comprehensive summary
summary_report = []

for name, G in networks.items():
    # Basic network stats
    n_nodes = G.number_of_nodes()
    n_edges = G.number_of_edges()
    density = nx.density(G)
    
    # Centrality stats
    cent_data = centrality_df[centrality_df['Network'] == name]
    avg_degree = cent_data['Degree_Centrality'].mean() if not cent_data.empty else 0
    avg_eigen = cent_data['Eigenvector_Centrality'].mean() if not cent_data.empty else 0
    avg_close = cent_data['Closeness_Centrality'].mean() if not cent_data.empty else 0
    
    # Clustering stats
    clust_data = clustering_df[clustering_df['Network'] == name]
    global_clust = clust_data['Global_Clustering'].iloc[0] if not clust_data.empty else 0
    avg_clust = clust_data['Average_Clustering'].iloc[0] if not clust_data.empty else 0
    
    # Louvain stats
    louv_data = louvain_df[louvain_df['Network'] == name]
    n_communities = louv_data['Louvain_Community'].nunique() if not louv_data.empty else 0
    modularity = louv_data['Modularity'].iloc[0] if not louv_data.empty else 0
    
    summary_report.append({
        'Network': name,
        'Nodes': n_nodes,
        'Edges': n_edges,
        'Density': density,
        'Avg_Degree_Centrality': avg_degree,
        'Avg_Eigenvector_Centrality': avg_eigen,
        'Avg_Closeness_Centrality': avg_close,
        'Global_Clustering': global_clust,
        'Average_Clustering': avg_clust,
        'N_Communities': n_communities,
        'Modularity': modularity
    })

summary_df = pd.DataFrame(summary_report)
summary_df.to_csv('Network_Summary_Report.csv', index=False)

print("\n" + "="*80)
print("COMPREHENSIVE NETWORK SUMMARY REPORT")
print("="*80)
print(summary_df.to_string(index=False))
print("\nReport saved: Network_Summary_Report.csv")

## 8. Create Deliverable ZIP File

In [None]:
import zipfile
from pathlib import Path

# Create zip file with all outputs
zip_filename = 'Souptik_Network_and_Metrics.zip'

with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add CSV files
    csv_files = [
        'CA_Aggregated_Data.csv',
        'CA_Normalized_Features.csv',
        'Centrality_Measures_All_Networks.csv',
        'Centrality_Summary_Statistics.csv',
        'Louvain_Communities_All_Networks.csv',
        'Hierarchical_Clusters.csv',
        'Clustering_Coefficients_All_Networks.csv',
        'Clustering_Summary.csv',
        'Network_Summary_Report.csv'
    ]
    
    for csv_file in csv_files:
        if Path(csv_file).exists():
            zipf.write(csv_file)
    
    # Add visualization files
    viz_files = [
        'Network_Graphs_All_Versions.png',
        f'Centrality_Visualization_{selected_network}.png',
        f'Louvain_Communities_{selected_network}.png',
        'Hierarchical_Clustering_Dendrogram.png',
        'Clustering_Coefficients_Visualization.png'
    ]
    
    for viz_file in viz_files:
        if Path(viz_file).exists():
            zipf.write(viz_file)
    
    # Add GraphML network files
    if Path('network_graphs').exists():
        for graphml_file in Path('network_graphs').glob('*.graphml'):
            zipf.write(graphml_file)

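# Count what was actually packaged instead of assuming every file exists
with zipfile.ZipFile(zip_filename) as zipf:
    packaged = zipf.namelist()

print(f"\n✅ Deliverable created: {zip_filename}")
print("\nContents:")
print(f"  - {sum(p.endswith('.csv') for p in packaged)} CSV files with metrics and results")
print(f"  - {sum(p.endswith('.png') for p in packaged)} visualization PNG files")
print(f"  - {sum(p.endswith('.graphml') for p in packaged)} GraphML network graph files")
print(f"\nTotal file size: {Path(zip_filename).stat().st_size / 1024:.1f} KB")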

## Analysis Complete!

### Deliverables Summary:

**1. Data Files:**
- CA_Aggregated_Data.csv: Original aggregated data for 9 CAs
- CA_Normalized_Features.csv: Standardized features used for network construction

**2. Network Metrics:**
- Centrality_Measures_All_Networks.csv: Degree, eigenvector, and closeness centrality
- Clustering_Coefficients_All_Networks.csv: Local and global clustering coefficients
- Network_Summary_Report.csv: Comprehensive summary of all networks

**3. Community Detection:**
- Louvain_Communities_All_Networks.csv: Louvain algorithm results
- Hierarchical_Clusters.csv: Hierarchical clustering results

**4. Visualizations:**
- Network graphs for all 7 versions
- Centrality measure comparisons
- Community structure visualizations
- Hierarchical clustering dendrogram

**5. Network Graphs:**
- 7 GraphML files (one for each network version) ready for import into Gephi, Cytoscape, etc.

All files are packaged in: **Souptik_Network_and_Metrics.zip**