# Network Exploration Analysis

This notebook explores and analyzes citation networks using the analytics service of the Academic Citation Platform.

## Overview

- **Network Structure Analysis**: Basic network properties and statistics
- **Centrality Analysis**: Identification of influential papers and nodes
- **Community Detection**: Discovery of research clusters and communities
- **Visualization**: Matplotlib charts of network metrics, community sizes, and centrality scores

## Requirements

This notebook requires the Academic Citation Platform with advanced analytics components.
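
Before running the cells below, it can help to confirm that the third-party packages this notebook imports are installed. The helper below is a minimal sketch using only the standard library; `missing_modules` is a hypothetical name and not part of the platform.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or the name is malformed
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Third-party packages imported by the first code cell of this notebook
required = ["pandas", "numpy", "matplotlib", "seaborn", "networkx"]
print("Missing packages:", missing_modules(required) or "none")
```

The check only locates modules; it does not verify versions or import the platform's own `src` package.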

In [None]:
# Import required libraries
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Add project root to path (assumes the notebook is run from a directory directly under the project root)
project_root = os.path.dirname(os.getcwd())
if project_root not in sys.path:
    sys.path.append(project_root)

# Import analytics components
from src.services.analytics_service import get_analytics_service
from src.analytics.export_engine import ExportConfiguration

print("✅ Libraries imported successfully")
print(f"📊 Analysis started at: {datetime.now()}")

In [None]:
# Initialize analytics service
analytics = get_analytics_service()

# Check system health
health = analytics.get_system_health()
print("🏥 System Health Check:")
print(f"   Overall Status: {health['overall_health']['status']}")
print(f"   ML Service: {'✅' if health['service_info']['ml_service_available'] else '❌'}")
print(f"   Database: {'✅' if health['service_info']['database_available'] else '❌'}")
print(f"   Active Tasks: {health['service_info']['active_tasks']}")
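
The health report can also be checked programmatically so that later cells are not run against a missing backend. The sketch below assumes only the two boolean flags read in the cell above; `unavailable_services` is a hypothetical helper, and the example report is hand-built for illustration.

```python
def unavailable_services(health):
    """List the service flags in a health report that are not reported as available."""
    info = health.get("service_info", {})
    flags = ("ml_service_available", "database_available")
    # A missing flag is treated as unavailable
    return [flag for flag in flags if not info.get(flag, False)]


# Hand-built report using the same keys as the cell above (illustrative values only)
example_health = {
    "service_info": {
        "ml_service_available": True,
        "database_available": False,
        "active_tasks": 0,
    }
}

problems = unavailable_services(example_health)
if problems:
    print("Unavailable:", ", ".join(problems))
```

In the notebook, passing the real `health` dictionary instead of `example_health` gives a list that can be used to stop early.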

## Network Analysis Configuration

Configure the network analysis parameters:

In [None]:
# Analysis configuration
ANALYSIS_CONFIG = {
    'max_papers': 1000,  # Maximum number of papers to analyze
    'include_communities': True,  # Perform community detection
    'include_centrality': True,   # Calculate centrality metrics
    'export_results': True        # Export results to file
}

print("⚙️ Analysis Configuration:")
for key, value in ANALYSIS_CONFIG.items():
    print(f"   {key}: {value}")

## Step 1: Network Structure Analysis

Run the network analysis and print the basic graph properties:

In [None]:
# Perform network analysis
print("🔍 Starting network analysis...")
network_results = analytics.analyze_citation_network(
    max_papers=ANALYSIS_CONFIG['max_papers'],
    include_communities=ANALYSIS_CONFIG['include_communities'],
    include_centrality=ANALYSIS_CONFIG['include_centrality']
)

if 'error' in network_results:
    print(f"❌ Analysis failed: {network_results['error']}")
else:
    print("✅ Network analysis completed successfully!")
    
    # Display basic network information
    graph_info = network_results['graph_info']
    print(f"\n📊 Network Overview:")
    print(f"   Nodes (Papers): {graph_info['num_nodes']:,}")
    print(f"   Edges (Citations): {graph_info['num_edges']:,}")
    print(f"   Directed Graph: {graph_info['is_directed']}")

In [None]:
# Display network metrics
if 'network_metrics' in network_results:
    metrics = network_results['network_metrics']
    
    print("\n🏗️ Network Structure Metrics:")
    print(f"   Density: {metrics['density']:.6f}")
    print(f"   Average Degree: {metrics['average_degree']:.2f}")
    print(f"   Clustering Coefficient: {metrics['clustering_coefficient']:.4f}")
    print(f"   Connected Components: {metrics['num_components']}")
    print(f"   Largest Component Size: {metrics['largest_component_size']:,}")
    
    if metrics['diameter'] is not None:
        print(f"   Network Diameter: {metrics['diameter']}")
    if metrics['average_path_length'] is not None:
        print(f"   Average Path Length: {metrics['average_path_length']:.2f}")
    
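    # Modularity may be unavailable (for example when community detection is disabled)
    if metrics.get('modularity') is not None:
        print(f"   Modularity: {metrics['modularity']:.4f}")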
    if metrics['assortativity'] is not None:
        print(f"   Assortativity: {metrics['assortativity']:.4f}")
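
To make the density and average degree figures easier to interpret, the sketch below computes them by hand for a tiny directed citation graph. It uses the usual textbook definitions (density = m / (n(n-1)) for a directed graph, average total degree = 2m / n); whether the platform uses exactly these definitions is an assumption, and `basic_graph_metrics` is a hypothetical name.

```python
def basic_graph_metrics(edges):
    """Compute simple metrics for a directed graph given as (citing, cited) pairs.

    Assumes the edge list has no duplicate edges and no self-loops.
    """
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    # A directed graph on n nodes can hold at most n * (n - 1) edges
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Each edge adds one out-degree and one in-degree, so total degree is 2m
    average_degree = 2 * m / n if n else 0.0
    return {
        "num_nodes": n,
        "num_edges": m,
        "density": density,
        "average_degree": average_degree,
    }


# Toy graph: A cites B and C, B cites C
toy_edges = [("A", "B"), ("A", "C"), ("B", "C")]
print(basic_graph_metrics(toy_edges))
```

Because the denominator grows quadratically with the number of papers, density in a large citation network is typically very small, which is why the cell above prints it with six decimal places.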

## Step 2: Centrality Analysis

Analyze the most influential papers in the network:

In [None]:
# Display influential papers
if 'influential_papers' in network_results:
    influential = network_results['influential_papers']
    
    print("\n🎯 Most Influential Papers:")
    
    # Top papers by PageRank
    print("\n   Top 10 by PageRank:")
    for i, paper_id in enumerate(influential['pagerank'][:10], 1):
        print(f"   {i:2d}. {paper_id}")
    
    # Top papers by Degree Centrality
    print("\n   Top 10 by Degree Centrality:")
    for i, paper_id in enumerate(influential['degree_centrality'][:10], 1):
        print(f"   {i:2d}. {paper_id}")
    
    # Top papers by Betweenness Centrality
    print("\n   Top 10 by Betweenness Centrality:")
    for i, paper_id in enumerate(influential['betweenness_centrality'][:10], 1):
        print(f"   {i:2d}. {paper_id}")
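
For readers unfamiliar with PageRank, the following is a minimal pure-Python sketch of the standard power-iteration formulation on a toy citation graph. It is illustrative only: the damping factor of 0.85 and the uniform redistribution of rank from papers with no outgoing citations are common conventions, and nothing here is taken from the platform's implementation.

```python
def pagerank(edges, damping=0.85, max_iterations=100, tolerance=1e-10):
    """Compute PageRank by power iteration for (citing, cited) edge pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


# Toy graph: A, B and D all cite C; D also cites A
toy_edges = [("A", "C"), ("B", "C"), ("D", "C"), ("D", "A")]
scores = pagerank(toy_edges)
for paper_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{paper_id}: {score:.4f}")
```

Unlike a plain citation count, PageRank weights each incoming citation by the rank of the citing paper, which is why the PageRank and degree centrality lists above can differ.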

## Step 3: Community Detection

Analyze community structure in the network:

In [None]:
# Display community information
if 'communities' in network_results and network_results['communities']:
    communities = network_results['communities']
    community_analysis = network_results.get('community_analysis', {})
    
    print(f"\n🏘️ Community Detection Results:")
    print(f"   Total Communities Detected: {len(communities)}")
    
    if community_analysis:
        print(f"   Coverage: {community_analysis['coverage']:.2%}")
        print(f"   Modularity Score: {community_analysis['modularity']:.4f}")
        print(f"   Average Community Size: {community_analysis['average_community_size']:.1f}")
        print(f"   Largest Community: {community_analysis['largest_community_size']} papers")
        print(f"   Smallest Community: {community_analysis['smallest_community_size']} papers")
    
    # Display top 10 largest communities
    print("\n   Largest Communities:")
    sorted_communities = sorted(communities, key=lambda x: x['size'], reverse=True)
    for i, community in enumerate(sorted_communities[:10], 1):
        print(f"   {i:2d}. Community {community['community_id']}: {community['size']} papers")
        print(f"       Internal edges: {community['internal_edges']}, "
              f"External edges: {community['external_edges']}, "
              f"Conductance: {community['conductance']:.4f}")
else:
    print("\n🏘️ No community detection results available")
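
The conductance value printed for each community relates its internal and external edge counts. The sketch below shows one common definition, cut size divided by the smaller of the two volumes (sum of degrees) on either side of the cut, treating edges as undirected. That the platform uses this exact definition is an assumption, and `community_conductance` is a hypothetical name.

```python
def community_conductance(edges, community):
    """Return (internal_edges, external_edges, conductance) for one community.

    Edges are treated as undirected pairs; community is an iterable of node IDs.
    """
    members = set(community)
    internal = external = outside = 0
    for u, v in edges:
        endpoints_inside = (u in members) + (v in members)
        if endpoints_inside == 2:
            internal += 1
        elif endpoints_inside == 1:
            external += 1
        else:
            outside += 1
    # Volume = sum of degrees; an internal edge contributes 2, a cut edge 1
    volume_inside = 2 * internal + external
    volume_outside = 2 * outside + external
    smaller_volume = min(volume_inside, volume_outside)
    conductance = external / smaller_volume if smaller_volume else 0.0
    return internal, external, conductance


# Two triangles joined by a single edge (c, d)
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
print(community_conductance(toy_edges, {"a", "b", "c"}))
```

Lower conductance means a community is better separated from the rest of the network; here the triangle has three internal edges and one external edge, giving 1/7.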

## Step 4: Data Visualization

Create visualizations of the analysis results:

In [None]:
# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Citation Network Analysis Results', fontsize=16, fontweight='bold')

# Plot 1: Network Metrics Bar Chart
if 'network_metrics' in network_results:
    metrics = network_results['network_metrics']
    
    metric_names = ['Density (×1000)', 'Avg Degree', 'Clustering', 'Modularity']
    metric_values = [
        metrics['density'] * 1000,  # Scale for visibility
        metrics['average_degree'],
        metrics['clustering_coefficient'],
        metrics['modularity']
    ]
    
    axes[0, 0].bar(metric_names, metric_values, color=['blue', 'green', 'orange', 'red'])
    axes[0, 0].set_title('Network Metrics')
    axes[0, 0].set_ylabel('Value')
    axes[0, 0].tick_params(axis='x', rotation=45)

# Plot 2: Community Size Distribution
if 'communities' in network_results and network_results['communities']:
    community_sizes = [c['size'] for c in network_results['communities']]
    
    axes[0, 1].hist(community_sizes, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
    axes[0, 1].set_title('Community Size Distribution')
    axes[0, 1].set_xlabel('Community Size')
    axes[0, 1].set_ylabel('Frequency')

# Plot 3: Top Papers by PageRank
if 'centrality_metrics' in network_results:
    centrality = network_results['centrality_metrics']
    
    # Get top 10 papers by PageRank
    sorted_papers = sorted(centrality.items(), 
                         key=lambda x: x[1]['pagerank'], 
                         reverse=True)[:10]
    
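    # Label bars with the actual paper IDs so the chart matches the lists printed in Step 2
    paper_names = [str(paper_id) for paper_id, _ in sorted_papers]
    pagerank_values = [scores['pagerank'] for _, scores in sorted_papers]
    
    axes[1, 0].barh(paper_names, pagerank_values, color='lightcoral')
    axes[1, 0].invert_yaxis()  # Put the highest-ranked paper at the top
    axes[1, 0].set_title('Top 10 Papers by PageRank')
    axes[1, 0].set_xlabel('PageRank Score')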

# Plot 4: Centrality Correlation
if 'centrality_metrics' in network_results:
    centrality = network_results['centrality_metrics']
    
    # Use a distinct loop variable to avoid confusion with the `metrics` dict from Plot 1
    pagerank_scores = [scores['pagerank'] for scores in centrality.values()]
    degree_scores = [scores['degree_centrality'] for scores in centrality.values()]
    
    axes[1, 1].scatter(degree_scores, pagerank_scores, alpha=0.6, color='purple')
    axes[1, 1].set_title('PageRank vs Degree Centrality')
    axes[1, 1].set_xlabel('Degree Centrality')
    axes[1, 1].set_ylabel('PageRank')

plt.tight_layout()
plt.show()

print("\n📊 Visualizations created successfully!")
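
The PageRank versus degree centrality scatter plot can be summarized with a single number. The sketch below computes a Spearman rank correlation in pure Python from a dictionary shaped like `network_results['centrality_metrics']` (paper ID mapped to a dict of scores). The helper names and the example scores are illustrative assumptions, not platform output.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    if n == 0:
        return float("nan")
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return covariance / (spread_x * spread_y)


# Illustrative scores in the same shape as centrality_metrics
example_centrality = {
    "P1": {"pagerank": 0.40, "degree_centrality": 0.9},
    "P2": {"pagerank": 0.30, "degree_centrality": 0.5},
    "P3": {"pagerank": 0.20, "degree_centrality": 0.6},
    "P4": {"pagerank": 0.10, "degree_centrality": 0.1},
}
pagerank_values = [scores["pagerank"] for scores in example_centrality.values()]
degree_values = [scores["degree_centrality"] for scores in example_centrality.values()]
print(f"Spearman correlation: {spearman(degree_values, pagerank_values):.3f}")
```

A value close to 1 would indicate that the two measures rank papers almost identically; a lower value points to papers whose influence comes from who cites them rather than how many do.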

## Step 5: Export Results

Export the analysis results as an HTML report and as raw JSON:

In [None]:
# Export results if configured
if ANALYSIS_CONFIG['export_results'] and 'error' not in network_results:
    print("\n💾 Exporting analysis results...")
    
    # Export configuration
    export_config = ExportConfiguration(
        format='html',
        include_visualizations=True,
        include_raw_data=True,
        metadata={
            'analysis_type': 'network_exploration',
            'notebook': '01_network_exploration.ipynb',
            'max_papers': ANALYSIS_CONFIG['max_papers']
        }
    )
    
    # Export network analysis results
    if 'network_metrics' in network_results:
        network_metrics = network_results['network_metrics']
        centrality_metrics = network_results.get('centrality_metrics', {})
        communities = network_results.get('communities', [])
        
        export_result = analytics.export_engine.export_network_analysis(
            network_metrics=network_metrics,
            centrality_metrics=centrality_metrics,
            communities=communities,
            config=export_config
        )
        
        if export_result.success:
            print(f"✅ Network analysis exported to: {export_result.file_path}")
            print(f"   File size: {export_result.file_size:,} bytes")
            print(f"   Export time: {export_result.export_time:.2f} seconds")
        else:
            print(f"❌ Export failed: {export_result.error_message}")
    
    # Also export as JSON for programmatic use.
    # Note: _export_json is a private method of the export engine and may change between versions.
    json_result = analytics.export_engine._export_json(
        network_results, 
        'network_exploration', 
        datetime.now()
    )
    
    if json_result.success:
        print(f"✅ Raw data exported to: {json_result.file_path}")
else:
    print("\n💾 Export skipped (disabled in configuration or analysis failed)")
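
If relying on the export engine's private JSON method is undesirable, the results dictionary can also be written with the standard `json` module. This is a minimal sketch that assumes the results contain only JSON-compatible containers plus values that can sensibly be stringified; `dump_results_as_json` and the output file name are hypothetical.

```python
import json
from pathlib import Path


def dump_results_as_json(results, path):
    """Write a results dictionary to path as JSON and return the file size in bytes."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        # default=str turns values json cannot encode (e.g. datetime) into strings
        json.dump(results, handle, indent=2, default=str)
    return path.stat().st_size


# Illustrative stand-in for network_results
example_results = {
    "graph_info": {"num_nodes": 3, "num_edges": 3, "is_directed": True},
}
size = dump_results_as_json(example_results, "network_exploration_example.json")
print(f"Wrote {size:,} bytes")
```

The file written this way is plain JSON without the metadata that `ExportConfiguration` attaches, so it complements rather than replaces the HTML export.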

## Summary

This notebook ran a network exploration analysis in five steps:

1. **Network Structure Analysis** - Basic properties and connectivity patterns
2. **Centrality Analysis** - Identification of influential papers
3. **Community Detection** - Discovery of research clusters
4. **Visualization** - Multiple charts and graphs
5. **Export** - Results saved as an HTML report and as raw JSON

### Key Insights

The analysis reveals the structural properties of the citation network and identifies:
- Most influential papers based on various centrality measures
- Community structure showing clusters of related research
- Network topology and connectivity patterns

### Next Steps

- Run `02_citation_analysis.ipynb` for temporal analysis
- Use `03_performance_benchmarks.ipynb` for performance evaluation
- Explore specific communities or influential papers in detail
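
One simple starting point for exploring influential papers in detail is to look for papers that appear in the top lists of every centrality measure. The sketch below assumes a dictionary shaped like `network_results['influential_papers']` (measure name mapped to a ranked list of paper IDs); `consensus_papers` and the example IDs are hypothetical.

```python
def consensus_papers(influential, top_k=10):
    """Return paper IDs present in the top_k of every ranking.

    The result keeps the order of the first ranking in the dictionary.
    """
    rankings = [list(ids)[:top_k] for ids in influential.values()]
    if not rankings:
        return []
    shared = set(rankings[0]).intersection(*rankings[1:])
    return [paper_id for paper_id in rankings[0] if paper_id in shared]


# Illustrative rankings with made-up paper IDs
example_influential = {
    "pagerank": ["P3", "P1", "P7", "P2"],
    "degree_centrality": ["P1", "P3", "P4", "P2"],
    "betweenness_centrality": ["P2", "P3", "P9", "P5"],
}
print(consensus_papers(example_influential, top_k=4))
```

Papers that rank highly on only one measure are also worth a look, since betweenness in particular tends to highlight papers that bridge otherwise separate communities.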

In [None]:
# Analysis summary
print("\n" + "="*60)
print("🎓 NETWORK EXPLORATION ANALYSIS COMPLETE")
print("="*60)

if 'error' not in network_results:
    graph_info = network_results['graph_info']
    print(f"\n📊 Network analyzed: {graph_info['num_nodes']:,} nodes, {graph_info['num_edges']:,} edges")
    
    # Use .get() so a missing or empty community list does not raise
    if network_results.get('communities'):
        print(f"🏘️ Communities detected: {len(network_results['communities'])}")
    
    if 'centrality_metrics' in network_results:
        print("🎯 Centrality calculated for all nodes")
    
    print(f"\n✅ Analysis completed successfully at {datetime.now()}")
else:
    print(f"\n❌ Analysis failed: {network_results['error']}")

print("\n📝 Check exported files for detailed results")
print("🔗 Continue with temporal analysis in the next notebook")