# Citation Analysis & Temporal Patterns

This notebook analyzes how citation activity changes over time: how citations accumulate, long-term trends, seasonal patterns, and bursts.

## Overview

- **Citation Growth Analysis**: How citations accumulate over time for different papers
- **Trend Detection**: Identifying increasing, decreasing, or stable citation trends
- **Seasonal Patterns**: Monthly and yearly citation patterns
- **Burst Detection**: Identifying sudden spikes in citation activity
- **Impact Analysis**: Long-term citation impact and half-life calculations

## Requirements

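This notebook requires citation data with timestamps. Step 1 generates synthetic papers and citations for demonstration; substitute real timestamped citation records to analyze actual data.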
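The temporal analyses below operate on citation counts grouped into time buckets. As a minimal sketch, independent of the project's `Paper` and `Citation` classes and assuming plain `(target_paper_id, citation_date)` tuples as input, pandas can bin timestamped citations into monthly counts per paper:

```python
from datetime import datetime

import pandas as pd

# Assumed input shape: one (target_paper_id, citation_date) pair per citation
records = [
    ("paper_a", datetime(2020, 1, 15)),
    ("paper_a", datetime(2020, 1, 20)),
    ("paper_a", datetime(2020, 3, 2)),
    ("paper_b", datetime(2020, 2, 10)),
]

df = pd.DataFrame(records, columns=["target_paper_id", "citation_date"])

# Count citations per paper per calendar month
monthly_counts = (
    df.groupby(["target_paper_id", df["citation_date"].dt.to_period("M")])
    .size()
    .rename("citations")
)
print(monthly_counts)
```

Months with no citations do not appear in this result; a seasonal or burst analysis on real data would need to reindex the series to fill those months with zeros.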

In [None]:
# Import required libraries
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
project_root = os.path.dirname(os.getcwd())
if project_root not in sys.path:
    sys.path.append(project_root)

# Import analytics components
from src.services.analytics_service import get_analytics_service
from src.models.paper import Paper
from src.models.citation import Citation
from src.analytics.export_engine import ExportConfiguration

print("✅ Libraries imported successfully")
print(f"📅 Analysis started at: {datetime.now()}")

In [None]:
# Initialize analytics service
analytics = get_analytics_service()

print("🔬 Analytics service initialized")
print("📊 Temporal analysis capabilities loaded")

## Configuration

Set up the analysis parameters:

In [None]:
# Analysis configuration
TEMPORAL_CONFIG = {
    'num_papers': 100,           # Number of papers to analyze
    'min_citations': 5,          # Minimum citations per paper
    'include_trends': True,      # Perform trend analysis
    'include_growth': True,      # Analyze citation growth
    'include_seasonality': True, # Analyze seasonal patterns
    'export_results': True       # Export results to file
}

print("⚙️ Temporal Analysis Configuration:")
for key, value in TEMPORAL_CONFIG.items():
    print(f"   {key}: {value}")

## Step 1: Generate Sample Data

Create sample papers and citations for demonstration:

In [None]:
# Generate sample papers and citations for demonstration
import random
from datetime import datetime

random.seed(42)  # fixed seed so reruns produce the same sample data

print("📝 Generating sample data for temporal analysis...")

# Create sample papers
sample_papers = []
base_year = 2015

for i in range(TEMPORAL_CONFIG['num_papers']):
    pub_year = base_year + random.randint(0, 8)  # 2015-2023 inclusive
    paper = Paper(
        paper_id=f"sample_paper_{i:03d}",
        title=f"Research Paper {i}: Advanced Analysis Methods",
        abstract=f"This paper presents advanced methods for research area {i % 10}.",
        publication_year=pub_year,
        authors=[f"Author_{i}_A", f"Author_{i}_B"],
        venue=f"Conference_{i % 5}"
    )
    sample_papers.append(paper)

# Create sample citations
sample_citations = []
citation_dates = []  # kept separately to report the actual date range
citation_id = 0
now = datetime.now()

for paper in sample_papers:
    # Draw between min_citations and 50 candidate citations. Candidates dated
    # in the future are dropped below, so recent papers may end up with fewer
    # than min_citations.
    num_citations = random.randint(TEMPORAL_CONFIG['min_citations'], 50)
    paper_pub_year = paper.publication_year
    
    for _ in range(num_citations):
        # Cite in the publication year or up to 8 years later
        citation_year = paper_pub_year + random.randint(0, 8)
        citation_month = random.randint(1, 12)
        citation_day = random.randint(1, 28)  # day 28 exists in every month
        
        citation_date = datetime(citation_year, citation_month, citation_day)
        
        # Don't create citations in the future
        if citation_date > now:
            continue
        
        citation = Citation(
            citation_id=f"citation_{citation_id}",
            source_paper_id=f"citing_paper_{citation_id}",
            target_paper_id=paper.paper_id,
            citation_date=citation_date
        )
        sample_citations.append(citation)
        citation_dates.append(citation_date)
        citation_id += 1

print(f"✅ Generated {len(sample_papers)} papers and {len(sample_citations)} citations")
if citation_dates:
    print(f"📊 Citation date range: {min(citation_dates):%Y-%m-%d} to {max(citation_dates):%Y-%m-%d}")
print(f"📈 Average citations per paper: {len(sample_citations) / len(sample_papers):.1f}")

## Step 2: Temporal Analysis

Perform comprehensive temporal analysis:

In [None]:
# Perform temporal analysis
print("⏱️ Starting temporal analysis...")

temporal_results = analytics.analyze_temporal_patterns(
    papers=sample_papers,
    citations=sample_citations,
    include_trends=TEMPORAL_CONFIG['include_trends'],
    include_growth=TEMPORAL_CONFIG['include_growth'],
    include_seasonality=TEMPORAL_CONFIG['include_seasonality']
)

if 'error' in temporal_results:
    print(f"❌ Temporal analysis failed: {temporal_results['error']}")
else:
    print("✅ Temporal analysis completed successfully!")
    
    data_info = temporal_results['data_info']
    print(f"\n📊 Analysis Overview:")
    print(f"   Papers analyzed: {data_info['num_papers']}")
    print(f"   Citations analyzed: {data_info['num_citations']}")

## Step 3: Citation Growth Analysis

Analyze how citations grow over time for different papers:

In [None]:
# Display citation growth results
if 'growth_metrics' in temporal_results:
    growth_metrics = temporal_results['growth_metrics']
    
    print(f"\n📈 Citation Growth Analysis:")
    print(f"   Papers with growth data: {len(growth_metrics)}")
    
    # Calculate summary statistics
    total_citations = [gm['total_citations'] for gm in growth_metrics]
    impact_factors = [gm['impact_factor'] for gm in growth_metrics]
    years_to_peak = [gm['years_to_peak'] for gm in growth_metrics if gm['years_to_peak'] is not None]
    
    print(f"\n   Citation Statistics:")
    print(f"     Total citations: {sum(total_citations):,}")
    print(f"     Average citations per paper: {np.mean(total_citations):.1f}")
    print(f"     Median citations per paper: {np.median(total_citations):.1f}")
    print(f"     Max citations: {max(total_citations)}")
    
    print(f"\n   Impact Factor Statistics (citations/year):")
    print(f"     Average impact factor: {np.mean(impact_factors):.3f}")
    print(f"     Median impact factor: {np.median(impact_factors):.3f}")
    
    if years_to_peak:
        print(f"\n   Time to Peak Statistics:")
        print(f"     Average years to peak: {np.mean(years_to_peak):.1f}")
        print(f"     Median years to peak: {np.median(years_to_peak):.1f}")
    
    # Show top performing papers
    print(f"\n   🏆 Top 10 Papers by Total Citations:")
    sorted_papers = sorted(growth_metrics, key=lambda x: x['total_citations'], reverse=True)[:10]
    
    for i, paper_metrics in enumerate(sorted_papers, 1):
        print(f"   {i:2d}. {paper_metrics['paper_id']}: {paper_metrics['total_citations']} citations "
              f"(Impact: {paper_metrics['impact_factor']:.3f})")
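`analyze_temporal_patterns` computes these metrics internally, and this notebook does not show its exact definitions. A minimal sketch under stated assumptions: `impact_factor` is total citations divided by the number of years the paper has been citable (consistent with the "Citations/Year" axis label in Step 6), `years_to_peak` is the gap between publication and the year with the most citations, and the hypothetical `citation_half_life` is the number of years after publication by which at least half of the paper's citations have arrived. The function name `sketch_growth_metrics` is illustrative, chosen so it does not shadow the notebook's `growth_metrics` list.

```python
from collections import Counter


def sketch_growth_metrics(publication_year, citation_years, current_year):
    """Hypothetical per-paper growth metrics; every definition here is an assumption."""
    citations_per_year = Counter(citation_years)
    total = sum(citations_per_year.values())

    # Years the paper has been citable, counting the publication year itself
    years_active = max(current_year - publication_year + 1, 1)
    impact_factor = total / years_active

    years_to_peak = None
    half_life = None
    if total:
        # Year with the most citations; the earliest year wins ties
        peak_year = min(citations_per_year, key=lambda y: (-citations_per_year[y], y))
        years_to_peak = peak_year - publication_year

        # Half-life: years after publication until half the citations have arrived
        running = 0
        for year in sorted(citations_per_year):
            running += citations_per_year[year]
            if running * 2 >= total:
                half_life = year - publication_year
                break

    return {
        "total_citations": total,
        "citations_per_year": dict(citations_per_year),
        "impact_factor": impact_factor,
        "years_to_peak": years_to_peak,
        "citation_half_life": half_life,
    }


example_metrics = sketch_growth_metrics(
    2018, [2018, 2019, 2019, 2019, 2020, 2021], current_year=2023
)
print(example_metrics)
```

Counting the publication year in `years_active` avoids dividing by zero for papers published in the current year; a different convention would shift impact factors for young papers.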

## Step 4: Trend Analysis

Analyze overall citation trends:

In [None]:
# Display trend analysis results
if 'trend_analysis' in temporal_results:
    trend = temporal_results['trend_analysis']
    
    print(f"\n📊 Overall Citation Trend Analysis:")
    print(f"   Trend Direction: {trend['trend_direction'].upper()}")
    print(f"   Trend Strength: {trend['trend_strength']:.4f} (0=weak, 1=strong)")
    print(f"   Annual Growth Rate: {trend['growth_rate']:.2%}")
    
    # Interpret the results
    if trend['trend_direction'] == 'increasing':
        print(f"\n   📈 Interpretation: Citations are growing over time")
        if trend['growth_rate'] > 0.1:
            print(f"      Strong growth rate of {trend['growth_rate']:.1%} annually")
        else:
            print(f"      Moderate growth rate of {trend['growth_rate']:.1%} annually")
    elif trend['trend_direction'] == 'decreasing':
        print(f"\n   📉 Interpretation: Citations are declining over time")
    else:
        print(f"\n   ➡️ Interpretation: Citation patterns are relatively stable")
    
    strength_desc = "strong" if trend['trend_strength'] > 0.7 else "moderate" if trend['trend_strength'] > 0.3 else "weak"
    print(f"      Trend strength is {strength_desc} ({trend['trend_strength']:.3f})")
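The trend fields also come from the analytics service. One plausible way to derive them, offered as a hedged sketch rather than the service's actual method: fit a least-squares line to annual citation totals with `numpy.polyfit`, report R² as the 0-1 trend strength, express the growth rate as slope divided by the mean annual count, and call the trend stable when that rate lies within an assumed ±2% tolerance. The name `sketch_trend` and the tolerance are illustrative.

```python
import numpy as np


def sketch_trend(years, counts, stable_tolerance=0.02):
    """Hypothetical linear-trend summary of annual citation counts."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(counts, dtype=float)

    # Least-squares line: y ≈ slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept

    # R² (coefficient of determination) as a 0-1 strength score
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    strength = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    # Slope relative to the average annual count, as a fractional yearly change
    growth_rate = slope / y.mean() if y.mean() > 0 else 0.0

    if growth_rate > stable_tolerance:
        direction = "increasing"
    elif growth_rate < -stable_tolerance:
        direction = "decreasing"
    else:
        direction = "stable"

    return {"trend_direction": direction,
            "trend_strength": float(strength),
            "growth_rate": float(growth_rate)}


example_trend = sketch_trend([2019, 2020, 2021, 2022], [10, 20, 30, 40])
print(example_trend)
```

This growth rate is linear (slope over mean), not compound; a compound annual rate would instead come from fitting a line to the logarithm of the counts.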

## Step 5: Seasonal Analysis and Burst Detection

Analyze seasonal patterns and detect bursts in citation activity:

In [None]:
# Display seasonal analysis results
if 'seasonal_analysis' in temporal_results:
    seasonal = temporal_results['seasonal_analysis']
    
    if 'error' not in seasonal:
        print(f"\n🌟 Seasonal Pattern Analysis:")
        print(f"   Seasonal Variation Coefficient: {seasonal['seasonal_variation_coefficient']:.4f}")
        print(f"   Strong Seasonality Detected: {'Yes' if seasonal['has_strong_seasonality'] else 'No'}")
        
        if seasonal['peak_month']:
            month_names = ['', 'January', 'February', 'March', 'April', 'May', 'June',
                          'July', 'August', 'September', 'October', 'November', 'December']
            peak_month_name = month_names[seasonal['peak_month']]
            print(f"   Peak Citation Month: {peak_month_name} (Month {seasonal['peak_month']})")
        
        if seasonal['has_strong_seasonality']:
            print(f"\n   📊 Strong seasonal patterns detected!")
            print(f"      This suggests citation activity varies significantly by month.")
        else:
            print(f"\n   📊 Citation activity appears relatively consistent throughout the year.")
    else:
        print(f"\n🌟 Seasonal analysis: {seasonal['error']}")

# Display burst detection results
if 'citation_bursts' in temporal_results:
    bursts = temporal_results['citation_bursts']
    
    print(f"\n🚀 Citation Burst Detection:")
    print(f"   Detected Bursts: {len(bursts)}")
    
    if bursts:
        print(f"\n   Top Citation Bursts:")
        sorted_bursts = sorted(bursts, key=lambda x: x['intensity'], reverse=True)[:5]
        
        for i, burst in enumerate(sorted_bursts, 1):
            start_date = burst['start_time'].strftime('%Y-%m-%d')
            end_date = burst['end_time'].strftime('%Y-%m-%d')
            print(f"   {i}. {start_date} to {end_date}")
            print(f"      Intensity: {burst['intensity']:.2f}x baseline")
            print(f"      Peak Citations: {burst['peak_value']:.0f}")
    else:
        print(f"   No significant citation bursts detected in the data.")
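The seasonal and burst fields are computed by the analytics service, whose method is not shown here. As a simplified, assumed version: the seasonal variation coefficient can be read as the coefficient of variation (standard deviation divided by mean) of citation totals per calendar month, and a basic burst detector can flag runs of consecutive periods whose count exceeds a multiple of the median. The 0.2 seasonality threshold, the 2.0 burst factor, and the floor of 1 on the baseline are illustrative assumptions, and this sketch returns period indices rather than timestamps.

```python
import numpy as np


def sketch_seasonality(month_numbers, strong_threshold=0.2):
    """Coefficient of variation of citation totals per calendar month (1-12)."""
    months = np.asarray(month_numbers, dtype=int)
    totals = np.bincount(months, minlength=13)[1:]  # drop unused index 0
    mean = totals.mean()
    cv = totals.std() / mean if mean > 0 else 0.0
    peak_month = int(np.argmax(totals)) + 1 if totals.any() else None
    return {"seasonal_variation_coefficient": float(cv),
            "has_strong_seasonality": bool(cv > strong_threshold),
            "peak_month": peak_month}


def _burst_record(counts, start, end, baseline):
    peak = float(counts[start:end + 1].max())
    return {"start": start, "end": end, "peak_value": peak,
            "intensity": peak / baseline}


def sketch_bursts(counts, factor=2.0):
    """Flag runs of consecutive periods whose count exceeds factor x baseline."""
    counts = np.asarray(counts, dtype=float)
    if counts.size == 0:
        return []
    # Median as baseline; the floor of 1 avoids a zero baseline in sparse data
    baseline = max(float(np.median(counts)), 1.0)
    above = counts > factor * baseline

    found, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i  # a high-activity run begins
        elif not flag and start is not None:
            found.append(_burst_record(counts, start, i - 1, baseline))
            start = None
    if start is not None:  # run lasts until the final period
        found.append(_burst_record(counts, start, len(counts) - 1, baseline))
    return found


print(sketch_seasonality([1] * 10 + list(range(1, 13))))
print(sketch_bursts([3, 4, 3, 12, 15, 4, 3, 5]))
```

Using the median rather than the mean as the baseline keeps the bursts themselves from inflating the threshold they are compared against.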

## Step 6: Visualizations

Create comprehensive visualizations of temporal patterns:

In [None]:
# Create temporal analysis visualizations
plt.style.use('default')
sns.set_palette("viridis")

# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Citation Temporal Analysis Results', fontsize=16, fontweight='bold')

# Plot 1: Citation Distribution by Year
if 'growth_metrics' in temporal_results:
    growth_metrics = temporal_results['growth_metrics']
    
    # Count citations by publication year
    pub_years = [gm['publication_year'] for gm in growth_metrics]
    total_citations = [gm['total_citations'] for gm in growth_metrics]
    
    # Group by publication year
    year_citations = {}
    for year, citations in zip(pub_years, total_citations):
        if year not in year_citations:
            year_citations[year] = 0
        year_citations[year] += citations
    
    years = sorted(year_citations.keys())
    citations = [year_citations[year] for year in years]
    
    axes[0, 0].bar(years, citations, color='skyblue', alpha=0.7)
    axes[0, 0].set_title('Total Citations by Publication Year')
    axes[0, 0].set_xlabel('Publication Year')
    axes[0, 0].set_ylabel('Total Citations')
    axes[0, 0].tick_params(axis='x', rotation=45)

# Plot 2: Impact Factor Distribution
if 'growth_metrics' in temporal_results:
    impact_factors = [gm['impact_factor'] for gm in growth_metrics if gm['impact_factor'] > 0]
    
    axes[0, 1].hist(impact_factors, bins=20, alpha=0.7, color='lightgreen', edgecolor='black')
    axes[0, 1].set_title('Impact Factor Distribution')
    axes[0, 1].set_xlabel('Impact Factor (Citations/Year)')
    axes[0, 1].set_ylabel('Number of Papers')
    axes[0, 1].axvline(np.mean(impact_factors), color='red', linestyle='--', 
                      label=f'Mean: {np.mean(impact_factors):.3f}')
    axes[0, 1].legend()

# Plot 3: Years to Peak Distribution
if 'growth_metrics' in temporal_results:
    years_to_peak = [gm['years_to_peak'] for gm in growth_metrics if gm['years_to_peak'] is not None]
    
    if years_to_peak:
        axes[1, 0].hist(years_to_peak, bins=15, alpha=0.7, color='orange', edgecolor='black')
        axes[1, 0].set_title('Years to Peak Citation Distribution')
        axes[1, 0].set_xlabel('Years to Peak')
        axes[1, 0].set_ylabel('Number of Papers')
        axes[1, 0].axvline(np.mean(years_to_peak), color='red', linestyle='--',
                          label=f'Mean: {np.mean(years_to_peak):.1f} years')
        axes[1, 0].legend()

# Plot 4: Citation Timeline (Sample)
# Create a timeline showing citation accumulation for top papers
if 'growth_metrics' in temporal_results:
    # Get top 5 papers by total citations
    top_papers = sorted(growth_metrics, key=lambda x: x['total_citations'], reverse=True)[:5]
    
    for paper_metrics in top_papers:
        citations_by_year = paper_metrics['citations_per_year']
        years = sorted(citations_by_year.keys())
        # Running total of citations received up to and including each year
        cumulative_citations = np.cumsum([citations_by_year[year] for year in years])
        
        # Label each line with its paper ID so it matches the Top 10 list in Step 3
        axes[1, 1].plot(years, cumulative_citations, marker='o',
                        label=paper_metrics['paper_id'], linewidth=2, markersize=4)
    
    axes[1, 1].set_title('Citation Accumulation Over Time (Top 5 Papers)')
    axes[1, 1].set_xlabel('Year')
    axes[1, 1].set_ylabel('Cumulative Citations')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Temporal visualizations created successfully!")
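The accumulation plot above uses calendar years on the x-axis, so papers published in different years start at different points. Aligning each curve by years since publication makes papers of different ages easier to compare. A minimal sketch, assuming `citations_per_year` maps integer calendar years to counts as in the plotting cell; `cumulative_by_age` is a hypothetical helper:

```python
import numpy as np


def cumulative_by_age(publication_year, citations_per_year):
    """Cumulative citations indexed by years since publication (0, 1, 2, ...)."""
    if not citations_per_year:
        return np.array([], dtype=int), np.array([], dtype=int)
    last_year = max(citations_per_year)
    ages = list(range(last_year - publication_year + 1))
    # Years without citations count as 0 so the curve has no gaps
    counts = [citations_per_year.get(publication_year + age, 0) for age in ages]
    return np.array(ages, dtype=int), np.cumsum(counts)


ages, cumulative = cumulative_by_age(2019, {2019: 2, 2021: 5, 2022: 1})
print(list(zip(ages.tolist(), cumulative.tolist())))
```

Plotting `ages` against `cumulative` for each top paper gives curves that all start at publication, so differences in slope reflect citation speed rather than publication date.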

## Step 7: Export Results

Export the temporal analysis results:

In [None]:
# Export temporal analysis results
if TEMPORAL_CONFIG['export_results'] and 'error' not in temporal_results:
    print("\n💾 Exporting temporal analysis results...")
    
    # Export configuration
    export_config = ExportConfiguration(
        format='html',
        include_visualizations=True,
        include_raw_data=True,
        metadata={
            'analysis_type': 'temporal_analysis',
            'notebook': '02_citation_analysis.ipynb',
            'num_papers': len(sample_papers),
            'num_citations': len(sample_citations)
        }
    )
    
    # Export temporal analysis
    if 'growth_metrics' in temporal_results and 'trend_analysis' in temporal_results:
        export_result = analytics.export_engine.export_temporal_analysis(
            growth_metrics=temporal_results['growth_metrics'],
            trend_analysis=temporal_results['trend_analysis'],
            config=export_config
        )
        
        if export_result.success:
            print(f"✅ Temporal analysis exported to: {export_result.file_path}")
            print(f"   File size: {export_result.file_size:,} bytes")
            print(f"   Export time: {export_result.export_time:.2f} seconds")
        else:
            print(f"❌ Export failed: {export_result.error_message}")
    
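    # Export raw data as JSON.
    # Note: `_export_json` is a private (underscore-prefixed) helper of the
    # export engine, so its signature may change between versions.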
    json_result = analytics.export_engine._export_json(
        temporal_results,
        'citation_temporal_analysis',
        datetime.now()
    )
    
    if json_result.success:
        print(f"✅ Raw data exported to: {json_result.file_path}")
else:
    print("\n💾 Export skipped (disabled in configuration or analysis failed)")
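Because `_export_json` is internal to the export engine, a standard-library fallback can be handy. `temporal_results` holds datetime-like values (the burst `start_time` and `end_time` fields are formatted with `strftime` above), which `json` cannot encode on its own, so `json.dumps` needs a `default` hook. A minimal sketch; `results_to_json`, `_json_default`, and the filename in the usage comment are hypothetical:

```python
import json
from datetime import date, datetime


def _json_default(value):
    """Fallback serializer for values json cannot encode natively."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if hasattr(value, "tolist"):  # numpy scalars and arrays
        return value.tolist()
    return str(value)  # last resort: readable string form


def results_to_json(results):
    # `default` is applied to values only; dict keys must already be
    # str, int, float, bool, or None.
    return json.dumps(results, default=_json_default, indent=2)


# Hypothetical usage in the notebook:
# with open("citation_temporal_analysis.json", "w", encoding="utf-8") as fh:
#     fh.write(results_to_json(temporal_results))
```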

## Summary

This notebook ran a temporal analysis of citation patterns covering:

1. **Citation Growth Analysis** - How citations accumulate over time
2. **Trend Detection** - Overall citation trends and growth rates
3. **Seasonal Analysis** - Monthly and yearly patterns
4. **Burst Detection** - Sudden spikes in citation activity
5. **Impact Analysis** - Long-term citation impact metrics

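### Key Findings

The temporal analysis reports:
- Citation growth patterns and impact factors (citations per year) for individual papers
- Overall trends in citation activity over time
- Seasonal variations in citation activity
- Citation bursts and peak periods

Because Step 1 draws citation years and months uniformly at random, these results illustrate the workflow rather than real citation behavior; run the notebook on real citation data before drawing conclusions.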

### Applications

These insights can be used for:
- Predicting future citation patterns
- Identifying high-impact research areas
- Understanding research lifecycle patterns
- Timing publication and citation strategies

### Next Steps

- Run `03_performance_benchmarks.ipynb` for performance analysis
- Combine with network analysis for comprehensive insights
- Explore specific time periods or research areas in detail

In [None]:
# Analysis summary
print("\n" + "="*60)
print("📅 TEMPORAL CITATION ANALYSIS COMPLETE")
print("="*60)

if 'error' not in temporal_results:
    data_info = temporal_results['data_info']
    print(f"\n📊 Data analyzed: {data_info['num_papers']} papers, {data_info['num_citations']} citations")
    
    if 'trend_analysis' in temporal_results:
        trend = temporal_results['trend_analysis']
        print(f"📈 Overall trend: {trend['trend_direction']} ({trend['growth_rate']:.2%} annually)")
    
    if 'growth_metrics' in temporal_results:
        growth_metrics = temporal_results['growth_metrics']
        impact_factors = [gm['impact_factor'] for gm in growth_metrics]
        print(f"🎯 Average impact factor: {np.mean(impact_factors):.3f} citations/year")
    
    if 'citation_bursts' in temporal_results:
        bursts = temporal_results['citation_bursts']
        print(f"🚀 Citation bursts detected: {len(bursts)}")
    
    print(f"\n✅ Analysis completed successfully at {datetime.now()}")
else:
    print(f"\n❌ Analysis failed: {temporal_results['error']}")

print("\n📝 Check exported files for detailed results")
print("⚡ Continue with performance analysis in the next notebook")