# 🦠 InfoDemics - Misinformation Spread Visualization

## Interactive SIR Model Simulation for Network-Based Misinformation Spread

This notebook implements an interactive visualization of misinformation spread using the **SIR (Susceptible-Infected-Recovered)** epidemiological model on Twitter network data.
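
As a compact restatement of the update rule used by the simulation code in section 6 (not an additional model): in each time step, a susceptible node with $k$ infected neighbours becomes infected with probability $p_{\text{inf}}(k)$, and each infected node recovers independently with probability $\gamma$.

$$
p_{\text{inf}}(k) = 1 - (1 - \beta)^{k}, \qquad P(I \to R) = \gamma
$$

For example, with $\beta = 0.3$ and $k = 2$ infected neighbours, $p_{\text{inf}} = 1 - 0.7^{2} = 0.51$.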

### Features:
- 🕸️ Interactive network visualization
- 📊 SIR model simulation
- 🚫 Super-spreader intervention analysis
- 📈 Dynamic charts and analytics

## 1. Install Required Packages

Run this cell first to install all dependencies:

In [None]:
# Install required packages
!pip install pandas numpy networkx pyvis plotly ipywidgets matplotlib seaborn

# pip reports failures in its own output; this line only marks the end of the cell
print("✅ Install step finished.")

## 2. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import networkx as nx
from pyvis.network import Network
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, HTML, IFrame
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print(f"📦 Pandas version: {pd.__version__}")
print(f"📦 NetworkX version: {nx.__version__}")
print(f"📦 NumPy version: {np.__version__}")

## 3. Load and Preprocess Data

In [None]:
# Load CSV files
print("Loading data...")
nodes_df = pd.read_csv('nodes.csv')
edges_df = pd.read_csv('edges.csv')

# Rename columns to match expected schema
nodes_df = nodes_df.rename(columns={
    'followers': 'followers_count',
    'friends': 'degree'
})

# Classify nodes based on label
def classify_label(label):
    if 'Non_Conspiracy' in str(label):
        return 'Non-Conspiracy'
    elif 'Conspiracy' in str(label):
        return 'Conspiracy'
    else:
        return 'Non-Conspiracy'  # Treat "Other" as Susceptible

nodes_df['category'] = nodes_df['label'].apply(classify_label)

# Calculate actual degree from edges: count how often each node appears
# as a source or a target, in a single pass over the edge list
endpoint_counts = pd.concat([edges_df['source'], edges_df['target']]).value_counts()
nodes_df['actual_degree'] = nodes_df['id'].map(endpoint_counts).fillna(0).astype(int)

# Display summary statistics
print("\n✅ Data loaded successfully!")
print("\n📊 Dataset Summary:")
print(f"   Total Nodes: {len(nodes_df)}")
print(f"   Total Edges: {len(edges_df)}")
print(f"   Conspiracy Nodes: {len(nodes_df[nodes_df['category'] == 'Conspiracy'])}")
print(f"   Non-Conspiracy Nodes: {len(nodes_df[nodes_df['category'] == 'Non-Conspiracy'])}")

# Display first few rows
print("\n📋 Sample Node Data:")
display(nodes_df.head())

print("\n📋 Sample Edge Data:")
display(edges_df.head())
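
The order of the substring checks in `classify_label` matters, because the string `Non_Conspiracy` itself contains `Conspiracy`. The sketch below illustrates this with made-up label strings (the actual values in `nodes.csv` are not shown in this notebook, so `'5G_Conspiracy'`, `'Non_Conspiracy_5G'` and `'Other'` are assumptions). `classify_wrong_order` is a hypothetical counter-example, not part of the notebook.

```python
def classify_label(label):
    """Same rule as the notebook: check the longer substring first."""
    text = str(label)
    if 'Non_Conspiracy' in text:
        return 'Non-Conspiracy'
    if 'Conspiracy' in text:
        return 'Conspiracy'
    return 'Non-Conspiracy'  # anything else is treated as susceptible


def classify_wrong_order(label):
    """Hypothetical counter-example: testing 'Conspiracy' first is a bug."""
    text = str(label)
    if 'Conspiracy' in text:
        return 'Conspiracy'  # also matches 'Non_Conspiracy...'
    return 'Non-Conspiracy'


# Made-up labels, for illustration only
sample_labels = ['5G_Conspiracy', 'Non_Conspiracy_5G', 'Other', None]
correct = [classify_label(x) for x in sample_labels]
wrong = [classify_wrong_order(x) for x in sample_labels]
print(correct)
print(wrong)
```

With the checks reversed, the second label would be counted as a conspiracy node and would start the simulation infected.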

## 4. Network Analysis

In [None]:
# Create NetworkX graph
G = nx.Graph()

# Add nodes with attributes
for _, node in nodes_df.iterrows():
    G.add_node(
        node['id'],
        label=node['category'],
        followers_count=node['followers_count'],
        degree=node['actual_degree'],
        original_label=node['label']
    )

# Add edges
for _, edge in edges_df.iterrows():
    if G.has_node(edge['source']) and G.has_node(edge['target']):
        G.add_edge(edge['source'], edge['target'])

# Network statistics
print("🕸️ Network Statistics:")
print(f"   Nodes: {G.number_of_nodes()}")
print(f"   Edges: {G.number_of_edges()}")
print(f"   Density: {nx.density(G):.4f}")
print(f"   Is Connected: {nx.is_connected(G)}")

if not nx.is_connected(G):
    components = list(nx.connected_components(G))
    print(f"   Number of Components: {len(components)}")
    print(f"   Largest Component Size: {len(max(components, key=len))}")

# Degree distribution
degrees = [G.degree(n) for n in G.nodes()]
print(f"\n📊 Degree Statistics:")
print(f"   Average Degree: {np.mean(degrees):.2f}")
print(f"   Max Degree: {np.max(degrees)}")
print(f"   Min Degree: {np.min(degrees)}")

# Top influencers
top_influencers = sorted(nodes_df.itertuples(), key=lambda x: x.actual_degree, reverse=True)[:5]
print(f"\n👑 Top 5 Influencers (by degree):")
for i, node in enumerate(top_influencers, 1):
    print(f"   {i}. Node {node.id}: {node.actual_degree} connections, {node.followers_count} followers, {node.category}")
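
To make the degree and "top influencer" calculations concrete, here is a minimal standard-library sketch on a made-up edge list. It is an illustration of the idea, not the notebook's pandas/NetworkX code; the node ids and the tie-breaking rule are assumptions.

```python
from collections import Counter
import heapq

# Hypothetical undirected edge list (source, target); node ids are made up
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]

# One pass over the edges: every edge adds 1 to the degree of both endpoints
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Rank nodes by degree; ties are broken in favour of the smaller node id
top_two = heapq.nlargest(2, degree.items(), key=lambda item: (item[1], -item[0]))

print(dict(degree))
print(top_two)
```

Node 1 touches three edges and ranks first; nodes 2, 3 and 4 tie on degree 2, so the tie-break decides which of them is listed second.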

## 5. Initial Network Visualization

In [None]:
def create_network_visualization(graph, states=None, title="Network Graph"):
    """
    Create interactive network visualization using PyVis
    """
    net = Network(height='700px', width='100%', bgcolor='#ffffff', font_color='black', notebook=True)
    
    # Configure physics for better layout
    net.barnes_hut(gravity=-8000, central_gravity=0.3, spring_length=200, spring_strength=0.001)
    
    # Add nodes
    for node in graph.nodes():
        node_data = graph.nodes[node]
        
        # Determine color based on state or original label
        if states:
            state = states.get(node, 'S')
            if state == 'I':
                color = '#FF4B4B'  # Red for Infected
            elif state == 'R':
                color = '#4CAF50'  # Green for Recovered
            else:
                color = '#1E88E5'  # Blue for Susceptible
        else:
            color = '#FF4B4B' if node_data['label'] == 'Conspiracy' else '#1E88E5'
        
        # Size based on followers
        size = 10 + (node_data['followers_count'] / 2)
        
        # Create title (tooltip)
        title_text = f"ID: {node}<br>Category: {node_data['label']}<br>Followers: {node_data['followers_count']}<br>Degree: {node_data['degree']}"
        
        net.add_node(
            node,
            label=str(node),
            color=color,
            size=size,
            title=title_text
        )
    
    # Add edges
    for edge in graph.edges():
        net.add_edge(edge[0], edge[1], color='#cccccc')
    
    # Save and display
    net.show('network_graph.html')
    return IFrame('network_graph.html', width='100%', height='720px')

# Display initial network
print("🎨 Color Legend:")
print("   🔴 Red: Conspiracy/Infected nodes")
print("   🔵 Blue: Non-Conspiracy/Susceptible nodes")
print("   🟢 Green: Recovered nodes (after simulation)")
print("\nNode size is proportional to follower count.\n")

create_network_visualization(G, title="Initial Network State")
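
The visualization sizes each node as `10 + followers_count / 2`, which grows without bound for accounts with many followers. One possible alternative, sketched here under stated assumptions, is logarithmic scaling with an upper limit. The helper `scaled_node_size` and its parameters `base_size`, `max_size` and `max_followers` are hypothetical and do not appear in the notebook.

```python
import math


def scaled_node_size(followers_count, base_size=10.0, max_size=60.0,
                     max_followers=1_000_000):
    """Map a follower count to a size in [base_size, max_size] on a log scale."""
    followers = max(0, followers_count)
    # log1p keeps an account with zero followers at exactly base_size
    fraction = math.log1p(followers) / math.log1p(max_followers)
    return base_size + (max_size - base_size) * min(1.0, fraction)


for count in (0, 100, 10_000, 1_000_000, 50_000_000):
    print(count, round(scaled_node_size(count), 1))
```

Whether this is preferable depends on the follower distribution in the data; for small counts the original linear rule may already be adequate.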

## 6. SIR Model Implementation

In [None]:
def initialize_sir_states(graph, initial_infected_pct):
    """
    Initialize SIR states for all nodes
    """
    nodes = list(graph.nodes())
    
    # Conspiracy nodes always start infected, even if the requested
    # percentage is lower than their share of the network
    conspiracy_nodes = {n for n in nodes if graph.nodes[n]['label'] == 'Conspiracy'}
    
    # Calculate additional random infections if needed
    total_infected = int(len(nodes) * initial_infected_pct / 100)
    additional_infected = max(0, total_infected - len(conspiracy_nodes))
    
    # Pick the extra infections at random from the Non-Conspiracy nodes.
    # Sampling indices keeps the original node ids (and their types) intact.
    susceptible_nodes = [n for n in nodes if graph.nodes[n]['label'] == 'Non-Conspiracy']
    n_extra = min(additional_infected, len(susceptible_nodes))
    if n_extra > 0:
        chosen = np.random.choice(len(susceptible_nodes), size=n_extra, replace=False)
        random_infected = {susceptible_nodes[i] for i in chosen}
    else:
        random_infected = set()
    
    # Set initial states (set membership is O(1) per node)
    initially_infected = conspiracy_nodes | random_infected
    states = {node: ('I' if node in initially_infected else 'S') for node in nodes}
    
    return states


def run_sir_simulation(graph, beta, gamma, initial_infected_pct, time_steps=50):
    """
    Run SIR model simulation
    
    Parameters:
    - beta: Infection rate (0-1)
    - gamma: Recovery rate (0-1)
    - initial_infected_pct: Initial percentage of infected nodes
    - time_steps: Number of simulation steps
    
    Returns:
    - sir_history: Dictionary with S, I, R counts over time
    - final_states: Final state of each node
    """
    states = initialize_sir_states(graph, initial_infected_pct)
    
    # Track counts over time
    sir_history = {
        'time': [],
        'S': [],
        'I': [],
        'R': []
    }
    
    for t in range(time_steps):
        # Count current states
        s_count = sum(1 for s in states.values() if s == 'S')
        i_count = sum(1 for s in states.values() if s == 'I')
        r_count = sum(1 for s in states.values() if s == 'R')
        
        sir_history['time'].append(t)
        sir_history['S'].append(s_count)
        sir_history['I'].append(i_count)
        sir_history['R'].append(r_count)
        
        # Create new states for next iteration
        new_states = states.copy()
        
        # Process infections (S -> I)
        for node in graph.nodes():
            if states[node] == 'S':
                # Check infected neighbors
                infected_neighbors = [n for n in graph.neighbors(node) if states[n] == 'I']
                if infected_neighbors:
                    # Probability of infection
                    infection_prob = 1 - (1 - beta) ** len(infected_neighbors)
                    if np.random.random() < infection_prob:
                        new_states[node] = 'I'
        
        # Process recoveries (I -> R)
        for node in graph.nodes():
            if states[node] == 'I':
                if np.random.random() < gamma:
                    new_states[node] = 'R'
        
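        states = new_states
    
    # Record the counts after the last update so that the final history
    # entry matches the node states returned below
    final_values = list(states.values())
    sir_history['time'].append(time_steps)
    sir_history['S'].append(final_values.count('S'))
    sir_history['I'].append(final_values.count('I'))
    sir_history['R'].append(final_values.count('R'))
    
    return sir_history, states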

print("✅ SIR model functions defined successfully!")
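
The notebook draws from NumPy's global random generator, so repeated runs with the same parameters generally differ. The sketch below shows the same update rule on a made-up five-node path graph using only the standard library and a private, seeded generator, so that a run can be repeated exactly. The names `sir_step`, `run_toy_sir` and `rng_seed` are hypothetical and are not part of the notebook.

```python
import random


def sir_step(adjacency, states, beta, gamma, rng):
    """One synchronous SIR update: read the old states, write a new dict."""
    new_states = dict(states)
    for node, state in states.items():
        if state == 'S':
            k = sum(1 for nb in adjacency[node] if states[nb] == 'I')
            if k and rng.random() < 1 - (1 - beta) ** k:
                new_states[node] = 'I'
        elif state == 'I' and rng.random() < gamma:
            new_states[node] = 'R'
    return new_states


def count_states(states):
    """Return the (S, I, R) counts for a state dict."""
    values = list(states.values())
    return values.count('S'), values.count('I'), values.count('R')


def run_toy_sir(adjacency, seeds, beta, gamma, steps, rng_seed):
    """Return (S, I, R) counts for the initial state and after every step."""
    rng = random.Random(rng_seed)  # private generator, so results are repeatable
    states = {n: ('I' if n in seeds else 'S') for n in adjacency}
    history = [count_states(states)]
    for _ in range(steps):
        states = sir_step(adjacency, states, beta, gamma, rng)
        history.append(count_states(states))
    return history


# Hypothetical 5-node path graph: 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
run_a = run_toy_sir(path, seeds={0}, beta=0.3, gamma=0.1, steps=20, rng_seed=42)
run_b = run_toy_sir(path, seeds={0}, beta=0.3, gamma=0.1, steps=20, rng_seed=42)
print(run_a[0], run_a[-1])
print("identical runs:", run_a == run_b)
```

Two properties are worth checking in any implementation of this model: the three counts always sum to the number of nodes, and the recovered count never decreases.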

## 7. Interactive Simulation Controls

Use the sliders below to configure the simulation parameters:

In [None]:
# Create interactive widgets
beta_slider = widgets.FloatSlider(
    value=0.3,
    min=0.0,
    max=1.0,
    step=0.05,
    description='β (Infection):',
    style={'description_width': 'initial'},
    continuous_update=False
)

gamma_slider = widgets.FloatSlider(
    value=0.1,
    min=0.0,
    max=1.0,
    step=0.05,
    description='γ (Recovery):',
    style={'description_width': 'initial'},
    continuous_update=False
)

initial_infected_slider = widgets.IntSlider(
    value=int((len(nodes_df[nodes_df['category'] == 'Conspiracy']) / len(nodes_df)) * 100),
    min=0,
    max=100,
    step=5,
    description='Initial Infected %:',
    style={'description_width': 'initial'},
    continuous_update=False
)

remove_superspreaders_checkbox = widgets.Checkbox(
    value=False,
    description='Ban Top 1% Influencers',
    style={'description_width': 'initial'}
)

run_button = widgets.Button(
    description='▶️ Run Simulation',
    button_style='success',
    tooltip='Click to run the SIR simulation',
    icon='play'
)

output_area = widgets.Output()

# Display controls
print("🎛️ Simulation Controls:")
display(widgets.VBox([
    widgets.HTML("<h3>📊 SIR Parameters</h3>"),
    beta_slider,
    gamma_slider,
    initial_infected_slider,
    widgets.HTML("<h3>🚫 Intervention Strategy</h3>"),
    remove_superspreaders_checkbox,
    widgets.HTML("<br>"),
    run_button,
    output_area
]))

## 8. Run Simulation and Visualize Results

In [None]:
def plot_sir_curves(sir_history):
    """
    Create SIR curves using Plotly
    """
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=sir_history['time'],
        y=sir_history['S'],
        mode='lines+markers',
        name='Susceptible',
        line=dict(color='#1E88E5', width=3),
        marker=dict(size=6)
    ))
    
    fig.add_trace(go.Scatter(
        x=sir_history['time'],
        y=sir_history['I'],
        mode='lines+markers',
        name='Infected',
        line=dict(color='#FF4B4B', width=3),
        marker=dict(size=6)
    ))
    
    fig.add_trace(go.Scatter(
        x=sir_history['time'],
        y=sir_history['R'],
        mode='lines+markers',
        name='Recovered',
        line=dict(color='#4CAF50', width=3),
        marker=dict(size=6)
    ))
    
    fig.update_layout(
        title='SIR Model: Misinformation Spread Over Time',
        xaxis_title='Time Steps',
        yaxis_title='Number of Nodes',
        hovermode='x unified',
        height=500,
        template='plotly_white',
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1
        )
    )
    
    return fig


def on_run_button_clicked(b):
    """
    Handle run button click
    """
    with output_area:
        output_area.clear_output()
        
        print("🚀 Running simulation...\n")
        
        # Get parameters
        beta = beta_slider.value
        gamma = gamma_slider.value
        initial_infected_pct = initial_infected_slider.value
        remove_superspreaders = remove_superspreaders_checkbox.value
        
        # Create graph (with or without super-spreaders)
        graph = G.copy()
        
        if remove_superspreaders:
            threshold = np.percentile([graph.degree(n) for n in graph.nodes()], 99)
            nodes_to_remove = [n for n in graph.nodes() if graph.degree(n) > threshold]
            graph.remove_nodes_from(nodes_to_remove)
            print(f"🚫 Removed {len(nodes_to_remove)} super-spreaders (top 1%)\n")
        
        # Run simulation
        sir_history, final_states = run_sir_simulation(
            graph, beta, gamma, initial_infected_pct
        )
        
        # Display results
        print(f"✅ Simulation completed!\n")
        print(f"📊 Parameters:")
        print(f"   β (Infection Rate): {beta}")
        print(f"   γ (Recovery Rate): {gamma}")
        print(f"   Initial Infected: {initial_infected_pct}%")
        print(f"   Super-spreaders Removed: {remove_superspreaders}\n")
        
        # Final statistics
        final_s = sir_history['S'][-1]
        final_i = sir_history['I'][-1]
        final_r = sir_history['R'][-1]
        
        print(f"📈 Final Statistics:")
        print(f"   Susceptible: {final_s}")
        print(f"   Infected: {final_i}")
        print(f"   Recovered: {final_r}")
        
        # Peak infection
        peak_infected = max(sir_history['I'])
        peak_time = sir_history['I'].index(peak_infected)
        print(f"   🔥 Peak Infection: {peak_infected} nodes at time step {peak_time}\n")
        
        # Plot SIR curves
        fig = plot_sir_curves(sir_history)
        fig.show()
        
        # Show final network state
        print("\n🕸️ Final Network State:")
        display(create_network_visualization(graph, final_states, "Final Network State"))

# Attach event handler
run_button.on_click(on_run_button_clicked)

print("✅ Simulation ready! Click the '▶️ Run Simulation' button above to start.")
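
The "Ban Top 1% Influencers" option removes nodes whose degree is strictly greater than the 99th percentile of all degrees. The sketch below reproduces that selection on made-up degree values, using a small pure-Python percentile with the same linear interpolation that `np.percentile` applies by default. `percentile_linear` is a hypothetical helper written for this illustration.

```python
def percentile_linear(values, q):
    """q-th percentile with linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


# Hypothetical degrees for eight nodes
degrees = {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3, 'f': 3, 'g': 4, 'h': 10}
cutoff = percentile_linear(degrees.values(), 99)
banned = [node for node, deg in degrees.items() if deg > cutoff]
print(round(cutoff, 2), banned)

# When the highest degree is shared, the cutoff equals it and nothing passes '>'
tied = {'a': 1, 'b': 5, 'c': 5}
tied_cutoff = percentile_linear(tied.values(), 99)
tied_banned = [node for node, deg in tied.items() if deg > tied_cutoff]
print(tied_cutoff, tied_banned)
```

The second case shows a limitation of the strict comparison: if several nodes share the maximum degree, the intervention may remove no nodes at all.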

## 9. Comparative Analysis (Optional)

Compare different intervention strategies:

In [None]:
def compare_interventions(beta=0.3, gamma=0.1, initial_infected_pct=50):
    """
    Compare baseline vs super-spreader intervention
    """
    print("🔬 Running comparative analysis...\n")
    
    # Baseline simulation
    print("1️⃣ Running baseline simulation (no intervention)...")
    baseline_history, _ = run_sir_simulation(G, beta, gamma, initial_infected_pct)
    
    # Intervention simulation
    print("2️⃣ Running intervention simulation (ban top 1%)...")
    G_intervention = G.copy()
    threshold = np.percentile([G_intervention.degree(n) for n in G_intervention.nodes()], 99)
    nodes_to_remove = [n for n in G_intervention.nodes() if G_intervention.degree(n) > threshold]
    G_intervention.remove_nodes_from(nodes_to_remove)
    intervention_history, _ = run_sir_simulation(G_intervention, beta, gamma, initial_infected_pct)
    
    # Create comparison plot
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Baseline (No Intervention)', 'With Super-Spreader Ban')
    )
    
    # Baseline
    fig.add_trace(
        go.Scatter(x=baseline_history['time'], y=baseline_history['S'], 
                   name='S', line=dict(color='#1E88E5'), showlegend=True),
        row=1, col=1
    )
    fig.add_trace(
        go.Scatter(x=baseline_history['time'], y=baseline_history['I'], 
                   name='I', line=dict(color='#FF4B4B'), showlegend=True),
        row=1, col=1
    )
    fig.add_trace(
        go.Scatter(x=baseline_history['time'], y=baseline_history['R'], 
                   name='R', line=dict(color='#4CAF50'), showlegend=True),
        row=1, col=1
    )
    
    # Intervention
    fig.add_trace(
        go.Scatter(x=intervention_history['time'], y=intervention_history['S'], 
                   name='S', line=dict(color='#1E88E5'), showlegend=False),
        row=1, col=2
    )
    fig.add_trace(
        go.Scatter(x=intervention_history['time'], y=intervention_history['I'], 
                   name='I', line=dict(color='#FF4B4B'), showlegend=False),
        row=1, col=2
    )
    fig.add_trace(
        go.Scatter(x=intervention_history['time'], y=intervention_history['R'], 
                   name='R', line=dict(color='#4CAF50'), showlegend=False),
        row=1, col=2
    )
    
    fig.update_xaxes(title_text="Time Steps", row=1, col=1)
    fig.update_xaxes(title_text="Time Steps", row=1, col=2)
    fig.update_yaxes(title_text="Number of Nodes", row=1, col=1)
    fig.update_yaxes(title_text="Number of Nodes", row=1, col=2)
    
    fig.update_layout(
        height=500,
        title_text="Comparative Analysis: Baseline vs Intervention",
        template='plotly_white'
    )
    
    fig.show()
    
    # Statistics
    baseline_peak = max(baseline_history['I'])
    intervention_peak = max(intervention_history['I'])
    # Guard against a zero baseline peak (e.g. no initially infected nodes)
    reduction = ((baseline_peak - intervention_peak) / baseline_peak) * 100 if baseline_peak else 0.0
    
    print("\n📊 Comparison Results:")
    print(f"   Baseline Peak Infection: {baseline_peak} nodes")
    print(f"   Intervention Peak Infection: {intervention_peak} nodes")
    print(f"   Reduction: {reduction:.1f}%")
    print("\n✅ Analysis complete!")

# Example: Run comparison
# Uncomment the line below to run the comparison
# compare_interventions(beta=0.3, gamma=0.1, initial_infected_pct=50)
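
Because each simulation is stochastic, a single baseline run compared with a single intervention run can be dominated by chance. One way to reduce that noise, sketched here with the standard library on a made-up star graph, is to average the peak over many independently seeded runs. The helpers `peak_infected` and `mean_peak`, the graph, and the run count of 200 are all assumptions for illustration.

```python
import random
from statistics import mean


def peak_infected(adjacency, seeds, beta, gamma, steps, rng):
    """Run one toy SIR simulation and return the largest infected count seen."""
    states = {n: ('I' if n in seeds else 'S') for n in adjacency}
    peak = sum(1 for s in states.values() if s == 'I')
    for _ in range(steps):
        new_states = dict(states)
        for node, state in states.items():
            if state == 'S':
                k = sum(1 for nb in adjacency[node] if states[nb] == 'I')
                if k and rng.random() < 1 - (1 - beta) ** k:
                    new_states[node] = 'I'
            elif state == 'I' and rng.random() < gamma:
                new_states[node] = 'R'
        states = new_states
        peak = max(peak, sum(1 for s in states.values() if s == 'I'))
    return peak


def mean_peak(adjacency, seeds, n_runs=200, beta=0.3, gamma=0.1, steps=50):
    """Average the peak over n_runs independent, individually seeded runs."""
    peaks = [
        peak_infected(adjacency, seeds, beta, gamma, steps, random.Random(run))
        for run in range(n_runs)
    ]
    return mean(peaks)


# Hypothetical star graph: hub 0 connected to leaves 1..8
star = {0: list(range(1, 9)), **{leaf: [0] for leaf in range(1, 9)}}
# The same graph with the hub removed: the leaves become isolated
no_hub = {leaf: [] for leaf in range(1, 9)}

baseline = mean_peak(star, seeds={1})
intervention = mean_peak(no_hub, seeds={1})
print(f"baseline mean peak: {baseline:.2f}, after hub removal: {intervention:.2f}")
```

In this extreme example, removing the hub leaves no edges, so the infection cannot spread and the peak stays at the single seed node. On a real network the effect is a matter of degree, which is why averaging over runs helps.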

## 10. Export Results (Optional)

In [None]:
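# Example: Export simulation results to CSV
# Uncomment and modify as needed

# def export_results(sir_history, filename='sir_results.csv'):
#     df = pd.DataFrame(sir_history)
#     df.to_csv(filename, index=False)
#     print(f"✅ Results exported to {filename}")

# sir_history is not a global variable; obtain it from a simulation run first
# sir_history, _ = run_sir_simulation(G, beta=0.3, gamma=0.1, initial_infected_pct=50)
# export_results(sir_history)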
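
For reference, the exported file has one row per time step with the columns `time`, `S`, `I` and `R`. The sketch below shows that layout using only the standard library and an in-memory buffer. The function `history_to_csv` and the example numbers are made up for illustration; the notebook's own export uses pandas.

```python
import csv
import io


def history_to_csv(sir_history, stream):
    """Write a dict of equal-length lists as CSV rows, one row per time step."""
    columns = ['time', 'S', 'I', 'R']
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerow(columns)
    # zip(*...) turns the four column lists into per-step rows
    writer.writerows(zip(*(sir_history[name] for name in columns)))


# Made-up history with three time steps
example_history = {'time': [0, 1, 2], 'S': [9, 7, 6], 'I': [1, 3, 3], 'R': [0, 0, 1]}
buffer = io.StringIO()
history_to_csv(example_history, buffer)
print(buffer.getvalue())
```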

---

## 📚 Summary

This notebook provides:
1. ✅ Data loading and preprocessing
2. ✅ Network analysis and statistics
3. ✅ Interactive network visualization
4. ✅ SIR model simulation
5. ✅ Super-spreader intervention analysis
6. ✅ Comparative analysis tools

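### 🎯 Key Findings:
- The network has **161 nodes** and **266 edges**
- The SIR model gives a simple stochastic simulation of misinformation spread on this network
- Removing the top 1% of influencers (by degree) reduces peak infection in the comparative analysis, though individual runs vary because the simulation is stochastic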
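
As a quick sanity check on the node and edge counts listed in the Key Findings, the average degree and density can be worked out by hand. This assumes the 266 edges are distinct, undirected, and free of self-loops, which the notebook does not state explicitly.

```python
# Counts reported in the Key Findings
n_nodes = 161
n_edges = 266

# Simple undirected graph: each edge contributes to the degree of two nodes
average_degree = 2 * n_edges / n_nodes
# Density = existing edges / possible edges, with n(n-1)/2 possible edges
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print(f"average degree: {average_degree:.2f}")
print(f"density: {density:.4f}")
```

Under that assumption, these values should agree with the "Average Degree" and "Density" lines printed in section 4.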

### 📖 References:
- SIR Model: Kermack-McKendrick (1927)
- Network Science: Newman, M. E. J. (2018)

---

**Built with ❤️ for understanding misinformation spread through network science**