<a id='introduction'></a>
# <p style="padding:15px;background-color:#fff798;margin:10px 0;color:#435672;font-family:'Arial',sans-serif;text-align:center;border-radius:15px 50px;overflow:hidden;font-weight:600">🇪🇺🏛️ European Citizens' Initiative: Commission Response</p>

<div align="center">
  <img src="LOGO CE_RGB_MUTE_POS.svg" alt="EU Commission Logo" height="200" style="display:inline-block; margin:10px;">
  <img src="1_2021_1-1.jpg" alt="ECI Material" height="200" style="display:inline-block; margin:10px;">
</div>

<p style="text-align:center;">
  <i>Source: European Citizens' Initiative | European Commission (CC BY 4.0)</i>
</p>

This notebook examines how the Commission responded to [European Citizens' Initiative proposals](https://commission.europa.eu/get-involved/engage-eu-policymaking/european-citizens-initiative_en) that met the signature thresholds between 2012 and 2025. Once an ECI collects 1 million signatures from at least 7 member states, the Commission must provide a formal response within 6 months explaining whether it will propose new legislation. This dataset tracks 11 out of 16 ECIs that met both signature criteria (1M+ signatures and the 7-country threshold), analyzing Commission response types, implementation timelines, parliamentary engagement, and follow-up actions.
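
The threshold rule above can be made concrete with a short sketch. It is illustrative only: the function names, the example figures, and the assumption that the six-month clock starts on a submission date are hypothetical and are not taken from the dataset.

```python
import calendar
from datetime import date

# Thresholds as stated in the paragraph above.
MIN_SIGNATURES = 1_000_000
MIN_MEMBER_STATES = 7
RESPONSE_WINDOW_MONTHS = 6


def meets_thresholds(signatures, member_states_over_threshold):
    """Return True when both signature criteria are satisfied."""
    return (signatures >= MIN_SIGNATURES
            and member_states_over_threshold >= MIN_MEMBER_STATES)


def response_deadline(submission_date):
    """Add the response window to a submission date.

    Assumption: the clock starts on the submission date. The day of month
    is clamped when the target month is shorter (e.g. 31 Aug -> 29 Feb).
    """
    month_index = submission_date.month - 1 + RESPONSE_WINDOW_MONTHS
    year = submission_date.year + month_index // 12
    month = month_index % 12 + 1
    day = min(submission_date.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Hypothetical figures, not rows from the dataset.
print(meets_thresholds(1_200_000, 9))        # True
print(meets_thresholds(1_200_000, 5))        # False
print(response_deadline(date(2024, 1, 15)))  # 2024-07-15
```

The notebook's own `successful_eci` flag, built in the data-cleaning cell, applies the same two comparisons to the real columns.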

This analysis focuses exclusively on what happens after ECIs meet signature requirements, building upon the previous [**🇪🇺✍️ European Citizens' Initiative: Signature Collection**]() study which examined all 121 registered ECIs. It does not cover the registration approval process itself, including which proposed ECIs were refused registration or how to prepare a successful registration application ([more about this](https://citizens-initiative.europa.eu/how-it-works_en)). However, meeting signature thresholds does not guarantee legislative action, a reality known as the ["successful but failed" paradox](https://thegoodlobby.eu/when-failure-succeeds-and-success-fails-a-reality-check-on-the-european-citizens-initiative/).

Success in this analysis is measured by Commission outcome categories (Law Active, Law Promised, Rejected, etc.) and implementation status, not by whether proposals were substantively "correct" or how individual organizers interpret their outcomes.

NOTE:<br>
> If you're interested in understanding how laws are passed at the EU level and how much power each institution holds, watch this [10-minute explanation](https://www.youtube.com/watch?v=cotxhOkux18).

<a id='table-of-contents'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">🧭 Table of contents</p>

[🌟 **Introduction**](#introduction)

[❓ **Questions to Ask:**](#question-1)
- [1. Success Patterns](#question-1)
- [2. Temporal Patterns](#question-2)
- [3. Parliament Actions](#question-3)
- [4. Funding Patterns](#question-4)
- [5. Geographic Strategies](#question-5)
- [6. Organizational](#question-6)
- [7. Content Features](#question-7)
- [8. Commission Engagement](#question-8)
- [9. Response Mechanisms](#question-9)
- [10. Key Findings](#question-10)


<a id='setup'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">⚙️ Setup: Import Libraries and Load Data</p>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
import warnings
warnings.filterwarnings('ignore')

# Set defaults
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)

# Load the main datasets
data_folder = "../data/2025-09-18_16-33-57"

df_initiatives = pd.read_csv(f'{data_folder}/eci_initiatives_2025-11-04_11-59-38.csv')
df_merger = pd.read_csv(f'{data_folder}/eci_merger_responses_and_followup_2026-01-13_13-52-31.csv')

print(f"✓ Initiatives file: {df_initiatives.shape[0]} ECIs")
print(f"✓ Merger file: {df_merger.shape[0]} Commission responses")
print(f"\nColumns: {len(df_initiatives.columns)} initiative columns")
print(f"        {len(df_merger.columns)} response columns")

✓ Initiatives file: 121 ECIs
✓ Merger file: 11 Commission responses

Columns: 26 initiative columns
        36 response columns


<a id='data-cleaning'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">🧹 Data Cleaning and Feature Engineering</p>

In [2]:
# ==============================================================================
# DATA CLEANING AND FEATURE ENGINEERING
# ==============================================================================

import json
import re
from datetime import datetime

import numpy as np
import pandas as pd
from dateutil import parser

# ==============================================================================
# COMMON UTILITY FUNCTIONS (Reusable across entire notebook)
# ==============================================================================

# ------------------------------------------------------------------------------
# Numeric Parsing Functions
# ------------------------------------------------------------------------------

def parse_numeric_with_separators(value):
    """Convert numeric strings with commas/separators to float."""
    if pd.isna(value):
        return np.nan
    if isinstance(value, str):
        return float(value.replace(',', ''))
    return float(value)

def parse_numeric_or_zero(value):
    """Convert numeric strings to float, return 0 for missing values."""
    if pd.isna(value):
        return 0
    if isinstance(value, str):
        return float(value.replace(',', ''))
    return float(value)

# ------------------------------------------------------------------------------
# Date Parsing Functions
# ------------------------------------------------------------------------------

def parse_date(date_str):
    """Convert date string to datetime object (supports DD/MM/YYYY and ISO)."""
    if pd.isna(date_str) or date_str == '':
        return pd.NaT
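    try:
        # First try the strict DD/MM/YYYY format used in most source columns
        return pd.to_datetime(date_str, format='%d/%m/%Y')
    except Exception:
        try:
            # Fall back to pandas' general parser (ISO etc.); unparseable values become NaT
            return pd.to_datetime(date_str, errors='coerce')
        except Exception:
            return pd.NaT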

def parse_dates_in_dataframe(df, date_columns):
    """Parse multiple date columns in a dataframe."""
    df = df.copy()
    for col in date_columns:
        if col in df.columns:
            df[col] = df[col].apply(parse_date)
    return df

def format_dates_for_display(df, date_columns):
    """
    Convert datetime columns to formatted strings for display.
    Creates new columns with '_formatted' suffix.
    """
    df = df.copy()
    for col in date_columns:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], errors='coerce')
            df[f'{col}_formatted'] = df[col].dt.strftime('%d %b %Y').fillna('Unknown')
    return df

# ------------------------------------------------------------------------------
# Duration Calculation Functions
# ------------------------------------------------------------------------------

def format_duration(start_date, end_date):
    """Format duration between two dates as 'X years Y months Z days'."""
    if pd.isna(start_date) or pd.isna(end_date):
        return 'Unknown'
    
    delta = (end_date - start_date).days
    
    if delta < 0:
        return 'Invalid date range'
    
    years = delta // 365
    remaining = delta % 365
    months = remaining // 30
    days = remaining % 30
    
    parts = []
    if years > 0:
        parts.append(f"{years} year{'s' if years > 1 else ''}")
    if months > 0:
        parts.append(f"{months} month{'s' if months > 1 else ''}")
    if days > 0 or len(parts) == 0:
        parts.append(f"{days} day{'s' if days != 1 else ''}")
    
    return ' '.join(parts)

# ------------------------------------------------------------------------------
# Text Formatting Functions
# ------------------------------------------------------------------------------

def format_objective(objective, words_per_line=11):
    """Format objective text with line breaks and bullet preservation for hover tooltips."""
    if pd.isna(objective):
        return "No objective provided"
    
    obj_str = str(objective)
    bullet_sections = obj_str.split('•')
    formatted_sections = []
    
    for i, section in enumerate(bullet_sections):
        section = section.strip()
        if not section:
            continue
            
        if i > 0:
            section = '• ' + section
        
        words = section.split()
        lines = [' '.join(words[j:j+words_per_line]) for j in range(0, len(words), words_per_line)]
        formatted_sections.append('<br>'.join(lines))
    
    return '<br>'.join(formatted_sections)

# ------------------------------------------------------------------------------
# JSON Parsing Functions
# ------------------------------------------------------------------------------

def safe_json_load(x):
    """Safely parse JSON strings."""
    try:
        return json.loads(x) if pd.notna(x) else None
    except (json.JSONDecodeError, TypeError):
        return None

# ------------------------------------------------------------------------------
# Updating `status` for `laws_actions` from historical to current one
# ------------------------------------------------------------------------------

def extract_effective_date_text(text):
    """
    Extracts the effective date and its corresponding status label from text.
    Returns: (datetime_object, status_string)
             status_string is either 'law_active' or 'in_vacatio_legis'
    """
    if not isinstance(text, str):
        return None, None
    
    # Clean up text for easier parsing
    text_clean = text.replace('\xa0', ' ').strip()
    
    # 1. Regex definitions
    
    # Force: "entered into force on 18 August 2024"
    force_pattern = r"(?:enter(?:ed)?|coming|came).{0,10}(?:into force).{0,10}(?:on|in)\s+((?:\d{1,2}\s+)?[A-Za-z]{3,10}\s+\d{4}|\d{4}-\d{2}-\d{2})"
    
    # Apply: "applies from...", "applicable on..."
    apply_pattern = r"(?<!will )(?:appl(?:y|ies|ied|icable|ication)|transpos(?:ed|ition)|edition).{0,20}(?:on|from|in)\s+((?:\d{1,2}\s+)?[A-Za-z]{3,10}\s+\d{4}|\d{4}-\d{2}-\d{2})"
    
    # Special Case: "became applicable immediately"
    immediate_pattern = r"(?:bec(?:ome|ame|omes)|is|are)\s+applicable\s+immediately"
    
    def parse_date(d_str):
        try:
            # Clean up "on " or "in " prefixes if captured
            d_str = re.sub(r'^(on|in)\s+', '', d_str.strip(), flags=re.IGNORECASE)
            return parser.parse(d_str)
        except (ValueError, OverflowError, TypeError):
            return None

    valid_dates = []

    # 2. Check for "Applicable Immediately" Case
    # If explicitly stated as immediately applicable, the Force date IS 'law_active'
    if re.search(immediate_pattern, text_clean, re.IGNORECASE):
        force_match = re.search(force_pattern, text_clean, re.IGNORECASE)
        if force_match:
            dt = parse_date(force_match.group(1))
            if dt:
                return dt, 'law_active'

    # 3. Standard Search
    
    # Find Application Dates -> Map to 'law_active'
    for match in re.finditer(apply_pattern, text_clean, re.IGNORECASE):
        dt = parse_date(match.group(1))
        if dt:
            valid_dates.append({'date': dt, 'status': 'law_active'})

    # Find Entry into Force Dates -> Map to 'in_vacatio_legis'
    for match in re.finditer(force_pattern, text_clean, re.IGNORECASE):
        dt = parse_date(match.group(1))
        if dt:
            valid_dates.append({'date': dt, 'status': 'in_vacatio_legis'})

    if not valid_dates:
        return None, None
        
    # Priority: law_active > in_vacatio_legis
    active_dates = [d for d in valid_dates if d['status'] == 'law_active']
    if active_dates:
        active_dates.sort(key=lambda x: x['date'], reverse=True)
        return active_dates[0]['date'], 'law_active'
    
    # Fallback to force date (in_vacatio_legis)
    valid_dates.sort(key=lambda x: x['date'], reverse=True)
    return valid_dates[0]['date'], 'in_vacatio_legis'

def update_law_status(json_str, current_date_str="2026-01-13"):
    if pd.isna(json_str) or json_str == "":
        return json_str
    
    try:
        actions = json.loads(json_str)
    except (json.JSONDecodeError, TypeError):
        return json_str
        
    current_date = datetime.strptime(current_date_str, "%Y-%m-%d")
    updated_actions = []
    
    for action in actions:
        desc = action.get('description', '')
        
        # Get date AND the direct status label
        text_date, status_label = extract_effective_date_text(desc)
        
        if text_date:
            # Check if date is in the past relative to analysis date
            if text_date <= current_date:
                action['status'] = status_label
            else:
                # If date is future, it's always in_vacatio_legis regardless of type
                action['status'] = 'in_vacatio_legis'
            
            # FIX: Update the date field to reflect the new effective date found
            action['date'] = text_date.strftime('%Y-%m-%d')
                
        updated_actions.append(action)
        
    return json.dumps(updated_actions)


# ==============================================================================
# DATA CLEANING: Apply to Raw Data
# ==============================================================================

# ------------------------------------------------------------------------------
# Signature Data Parsing
# ------------------------------------------------------------------------------

df_initiatives['signatures_numeric'] = df_initiatives['signatures_collected'].apply(parse_numeric_with_separators)
df_initiatives['signatures_threshold_met_numeric'] = pd.to_numeric(
    df_initiatives['signatures_threshold_met'], errors='coerce'
)

# ------------------------------------------------------------------------------
# Funding Data Parsing
# ------------------------------------------------------------------------------

df_initiatives['funding_numeric'] = df_initiatives['funding_total'].apply(parse_numeric_or_zero)

# ------------------------------------------------------------------------------
# Date Parsing
# ------------------------------------------------------------------------------

date_cols = ['timeline_registered', 'timeline_collection_start_date', 
             'timeline_collection_closed', 'timeline_verification_start',
             'timeline_verification_end', 'timeline_response_commission_date']

df_initiatives = parse_dates_in_dataframe(df_initiatives, date_cols)

# Extract year from registration
df_initiatives['registration_year'] = df_initiatives['timeline_registered'].dt.year

# ------------------------------------------------------------------------------
# Duration Calculations (Basic)
# ------------------------------------------------------------------------------

df_initiatives['collection_days'] = (
    df_initiatives['timeline_collection_closed'] - 
    df_initiatives['timeline_collection_start_date']
).dt.days

df_initiatives['verification_days'] = (
    df_initiatives['timeline_verification_end'] - 
    df_initiatives['timeline_verification_start']
).dt.days

df_initiatives['time_to_response_days'] = (
    df_initiatives['timeline_response_commission_date'] - 
    df_initiatives['timeline_registered']
).dt.days

# ------------------------------------------------------------------------------
# JSON Fields Parsing
# ------------------------------------------------------------------------------

df_initiatives['organizer_data'] = df_initiatives['organizer_representative'].apply(safe_json_load)

# ==============================================================================
# FEATURE ENGINEERING: Create Analysis Features
# ==============================================================================

# ------------------------------------------------------------------------------
# Success Metrics (Core Analysis Features)
# ------------------------------------------------------------------------------

df_initiatives['reached_signatures'] = df_initiatives['signatures_numeric'] >= 1000000
df_initiatives['met_country_threshold'] = df_initiatives['signatures_threshold_met_numeric'] >= 7
df_initiatives['successful_eci'] = (
    df_initiatives['reached_signatures'] & 
    df_initiatives['met_country_threshold']
)

success_outcomes = ['Commission Response', 'Answered initiative', 'Valid initiative']
df_initiatives['is_successful'] = df_initiatives['final_outcome'].isin(success_outcomes).astype(int)

# ------------------------------------------------------------------------------
# Signature Volume Categories
# ------------------------------------------------------------------------------

df_initiatives['signature_category'] = pd.cut(
    df_initiatives['signatures_numeric'],
    bins=[0, 100000, 500000, 1000000, float('inf')],
    labels=['<100k', '100k-500k', '500k-1M', '>1M'],
    include_lowest=True
)

# ------------------------------------------------------------------------------
# Renaming 'Law Approved' to 'Law Passed' for better readability
# ------------------------------------------------------------------------------

df_merger['final_outcome_status'] = df_merger['final_outcome_status'].replace('Law Approved', 'Law Passed')

# ------------------------------------------------------------------------------
# Updating `laws actions` to current status
# ------------------------------------------------------------------------------

df_merger['laws_actions'] = df_merger['laws_actions'].apply(update_law_status)

# ------------------------------------------------------------------------------
#  `laws_actions` - Fix False Positive Status Values
# ------------------------------------------------------------------------------
# Corrects specific cases where "adopted" incorrectly refers to a policy 
# document (Vision for Agriculture and Food) rather than the legislation itself.
# These entries describe future legislative proposals ("will present proposals")
# and should be marked as "planned" instead of "adopted".
# ------------------------------------------------------------------------------


def fix_vision_status(json_str):
    """
    Corrects the status of specific 'Vision for Agriculture' entries where
    'adopted' refers to the Vision document, not the legislation.
    """
    if pd.isna(json_str) or json_str == "":
        return json_str
    
    try:
        actions = json.loads(json_str)
    except (json.JSONDecodeError, TypeError):
        return json_str
        
    modified = False
    # Unique substring identifying the specific false positive cases
    target_phrase = "Vision for Agriculture and Food adopted on 19 February 2025"
    
    for action in actions:
        # Use .get so entries missing either key are skipped instead of raising KeyError
        description = action.get('description', '')
        status = action.get('status')
        
        # Check if this is the specific false positive case
        if target_phrase in description and status == 'adopted':
            # "will present proposals" implies the legislation is in planning stage
            action['status'] = 'planned'
            modified = True
                
    if modified:
        return json.dumps(actions)
    return json_str

df_merger['laws_actions'] = df_merger['laws_actions'].apply(fix_vision_status)

# ------------------------------------------------------------------------------
# Updating `final status` based on the `laws actions`
# ------------------------------------------------------------------------------

# Define Hierarchy Ranks (Higher number = "more advanced" status)
OUTCOME_HIERARCHY = {
    'Law Active': 50,
    'Law Passed': 40,
    'Law Proposed': 30,
    'Law Promised': 20,
    'Action Plan Created': 20, # Equivalent to Promised
    'Being Studied': 10,
    'Rejected - Already Covered': 0,
    'Rejected - Alternative Actions': 0,
    'Rejected': 0,
    'Withdrawn': 0
}

# Map CSV 'status' values to the Outcome Hierarchy
# Tuple format: (New Label, Rank)
ACTION_STATUS_MAPPING = {
    'law_active': ('Law Active', 50),
    'in_vacatio_legis': ('Law Passed', 40),
    'adopted': ('Law Passed', 40), # Maps 'adopted' acts to Law Passed
    'proposed': ('Law Promised', 20), # Maps 'proposed'/consultations to Promised
    'planned': ('Law Promised', 20),
    'withdrawn': ('Withdrawn', 0)
}

def update_outcome_status_upwards(row):
    """
    Updates final_outcome_status only if the derived status from laws_actions
    is strictly higher in the hierarchy than the current status.
    """

    current_status = row['final_outcome_status']
    current_rank = OUTCOME_HIERARCHY.get(current_status, -1)
    
    actions_json = row['laws_actions']
    
    # Parse JSON
    if pd.isna(actions_json) or actions_json == '':
        return current_status
    try:
        actions = json.loads(actions_json)
    except (json.JSONDecodeError, TypeError):
        return current_status
        
    if not actions:
        return current_status

    # Find the highest rank achieved by any action in the list
    max_action_rank = -1
    best_new_status = current_status
    
    for action in actions:
        # Tolerate actions with a missing or null status
        act_status = str(action.get('status') or '').lower()
        
        if act_status in ACTION_STATUS_MAPPING:
            candidate_status, candidate_rank = ACTION_STATUS_MAPPING[act_status]
            
            # Keep the highest ranked action
            if candidate_rank > max_action_rank:
                max_action_rank = candidate_rank
                best_new_status = candidate_status
    
    # Update only if the found action status is higher than the current outcome
    if max_action_rank > current_rank:
        return best_new_status
    
    return current_status

# Apply the function to the merged dataframe
df_merger['final_outcome_status'] = df_merger.apply(update_outcome_status_upwards, axis=1)

# ==============================================================================
# CREATE MASTER MERGED DATAFRAME (Once, used by all plots)
# ==============================================================================

# Merge df_merger with df_initiatives to create comprehensive dataset
merged_data = df_merger.merge(
    df_initiatives, 
    on='registration_number', 
    how='left',
    suffixes=('_response', '_initiative')
)


print("✅ Data cleaning complete!")
print(f"   - Initiatives: {len(df_initiatives)} records")
print(f"   - Commission Responses: {len(df_merger)} records")
print(f"   - Merged dataset: {len(merged_data)} records")
print(f"   - Signatures converted: {df_initiatives['signatures_numeric'].notna().sum()} records")
print(f"   - Successful ECIs: {df_initiatives['successful_eci'].sum()} out of {len(df_initiatives)}")

✅ Data cleaning complete!
   - Initiatives: 121 records
   - Commission Responses: 11 records
   - Merged dataset: 11 records
   - Signatures converted: 60 records
   - Successful ECIs: 16 out of 121


<a id='question-1'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">1. What are the success patterns and outcome distributions for ECIs?</p>

Analyzes the final outcomes of European Citizens' Initiatives that reached the Commission response stage, categorizing them by the **highest legislative status achieved** to understand what happens after 1 million signatures are collected.<br>

This question examines whether collecting signatures translates into concrete legal changes, policy commitments, or rejections. A "successful" classification means **some legislative action occurred**, not that the initiative achieved its full vision. Consequently, initiatives resulting **only in non-legislative actions** (such as working groups or communications) are **not treated as "success"** in this specific analysis. Furthermore, the **interpretation of "success" can vary significantly**: organizers may view partial implementation as a failure if core demands are unmet, while institutions may view the same outcome as a policy victory.<br>

**NOTE:**<br>
> - The **outcome ranking** (`Law Active`, `Law Passed`, etc.) represents the **highest legislative milestone reached**, not comprehensive success.
> - A status like `Law Active` means **at least one law** related to the ECI became operational, but **does not guarantee all ECI objectives were met**.
> - The classification **prioritizes legislative actions** over non-legislative measures (like international negotiations, policy frameworks, or stakeholder consultations).
> - **Partial implementation is common**: The Commission may adopt some ECI demands while rejecting others, leading to outcomes that organizers may not consider successful.

**Example: Ban Glyphosate ECI (2017/000002)**<br>
This initiative had [three objectives](https://citizens-initiative.europa.eu/initiatives/details/2017/000002/ban-glyphosate-and-protect-people-and-environment-toxic-pesticides_en):
1. **Ban glyphosate-based herbicides** → ❌ **Rejected**: the Commission stated there were "neither scientific nor legal grounds to justify a ban"
2. **Ensure transparent, publicly-commissioned studies** → ✅ **Success**: [Regulation 2019/1381](https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019R1381) became applicable on March 27, 2021, strengthening transparency in EU food safety assessments
3. **Set mandatory pesticide reduction targets** → 🟡 **Partial**: The Farm to Fork Strategy (May 2020) set targets of a 50% reduction in chemical pesticide use and risk by 2030. However, these remain **policy commitments** rather than legally binding law.


**Organizers' Response**: Despite the passed law, some organizers were [dissatisfied](https://www.pan-europe.info/press-releases/2017/12/commission-rejects-demands-stopglyphosate-citizens%E2%80%99-initiative), stating the Commission *"proposed action that could fulfil one aspect of one of the three demands"* while rejecting the primary objective of banning glyphosate and ignoring mandatory pesticide reduction targets.
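
The "highest legislative milestone" rule behind this classification can be sketched in a few lines. This is a minimal illustration, not the notebook's actual implementation: the rank values reuse a subset of the hierarchy defined in the data-cleaning cell, while the function name and the per-objective labels are hypothetical stand-ins for the example above.

```python
# Subset of the outcome hierarchy; a higher rank means a more advanced milestone.
MILESTONE_RANK = {
    "Law Active": 50,
    "Law Passed": 40,
    "Law Promised": 20,
    "Rejected": 0,
}


def highest_milestone(objective_outcomes):
    """Return the most advanced outcome label among an ECI's objectives.

    Unknown labels rank below every known one; an empty list yields None.
    """
    if not objective_outcomes:
        return None
    return max(objective_outcomes, key=lambda label: MILESTONE_RANK.get(label, -1))


# Hypothetical labels loosely mirroring the three objectives in the example:
# one rejected, one backed by an active law, one left as a policy commitment.
glyphosate_like = ["Rejected", "Law Active", "Law Promised"]
print(highest_milestone(glyphosate_like))  # Law Active
```

Taking the maximum is what lets a single active law classify the whole initiative as `Law Active`, even when its primary demand was rejected.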

In [3]:
# ==============================================================================
# PIE CHART: Outcome Distribution
# ==============================================================================


import plotly.graph_objects as go
import json


# Helper function to check if ECI has 'Collection closed' status in timeline
def has_collection_closed(timeline_json):
    """Check if 'Collection closed' appears in timeline."""
    if pd.isna(timeline_json):
        return False
    try:
        timeline = json.loads(timeline_json) if isinstance(timeline_json, str) else timeline_json
        if isinstance(timeline, list):
            return any('Collection closed' in str(event.get('step', '')) for event in timeline)
    except (json.JSONDecodeError, TypeError, AttributeError):
        return False
    return False


# Find ECIs that met thresholds AND have 'Collection closed' in their timeline
successful_closed = df_initiatives[
    (df_initiatives['successful_eci'] == True) & 
    (df_initiatives['timeline'].apply(has_collection_closed))
]['registration_number'].tolist()

# Get ECIs that have actual Commission responses (in df_merger)
responded_ecis = df_merger['registration_number'].tolist()

# ECIs with 'Collection closed' status but no Commission response yet = waiting
waiting_ecis = [eci for eci in successful_closed if eci not in responded_ecis]
waiting_count = len(waiting_ecis)


# Use merged_data ONLY for ECIs with Commission responses
plot_data = merged_data[['final_outcome_status', 'title']].copy()


# Prepare data for responded ECIs
outcome_counts = plot_data['final_outcome_status'].value_counts().reset_index()
outcome_counts.columns = ['Outcome', 'Count']


# Add "Waiting for Response" row for closed collections without response
if waiting_count > 0:
    waiting_row = pd.DataFrame({
        'Outcome': ['Waiting for Response'],
        'Count': [waiting_count]
    })
    outcome_counts = pd.concat([outcome_counts, waiting_row], ignore_index=True)


# Calculate total (responded + waiting)
total_count = len(df_merger) + waiting_count
outcome_counts['Percentage'] = (outcome_counts['Count'] / total_count * 100).round(1)


# Define outcome ranking for color assignment (higher = better)
outcome_ranking = {
    'Law Active': 5,
    'Law Passed': 4,
    'Law Promised': 3,
    'Action Plan Created': 2,
    'Being Studied': 1,
    'Waiting for Response': 0,
    'Rejected - Alternative Actions': -1,
    'Rejected - Already Covered': -2,
}


# Add ranking to dataframe
outcome_counts['Rank'] = outcome_counts['Outcome'].map(outcome_ranking)


# Assign a fixed color to each outcome (green=good, yellow=neutral, red=bad)
OUTCOME_COLORS = {
    'Law Active': 'rgb(60, 163, 113)',
    'Law Passed': 'rgb(102, 187, 106)',
    'Law Promised': 'rgb(156, 204, 101)',
    'Action Plan Created': 'rgb(255, 193, 7)',
    'Being Studied': 'rgb(255, 152, 0)',
    'Rejected - Alternative Actions': 'rgb(244, 67, 54)',
    'Rejected - Already Covered': 'rgb(183, 28, 28)',
    'Waiting for Response': 'rgb(158, 158, 158)',
}
DEFAULT_OUTCOME_COLOR = 'rgb(117, 117, 117)'


def get_outcome_color(outcome):
    """Return the chart color for an outcome, or a neutral gray if unknown."""
    return OUTCOME_COLORS.get(outcome, DEFAULT_OUTCOME_COLOR)


outcome_counts['Color'] = outcome_counts['Outcome'].apply(get_outcome_color)


# Prepare ECI lists for hover
def prepare_eci_list_for_hover(ecis, max_items=15):
    """Prepare ECI title list for hover tooltips with truncation."""
    if not ecis:
        return "No ECIs"
    elif len(ecis) <= max_items:
        return '<br>'.join(f"• {title}" for title in ecis)
    else:
        text = '<br>'.join(f"• {title}" for title in ecis[:max_items])
        text += f"<br><i>... (and {len(ecis) - max_items} more)</i>"
        return text


eci_lists = []
for outcome in outcome_counts['Outcome']:
    if outcome == 'Waiting for Response':
        waiting_titles = df_initiatives[
            df_initiatives['registration_number'].isin(waiting_ecis)
        ]['title'].tolist()
        eci_lists.append(prepare_eci_list_for_hover(waiting_titles))
    else:
        ecis = plot_data[plot_data['final_outcome_status'] == outcome]['title'].tolist()
        eci_lists.append(prepare_eci_list_for_hover(ecis))


outcome_counts['ECI_List'] = eci_lists


# Sort by rank (best outcomes first)
outcome_counts = outcome_counts.sort_values('Rank', ascending=False)


# Create pie chart
fig = go.Figure(go.Pie(
    labels=outcome_counts['Outcome'],
    values=outcome_counts['Count'],
    hole=0.1,
    marker=dict(colors=outcome_counts['Color'].tolist()),
    customdata=outcome_counts['ECI_List'],
    hovertemplate='<b>%{label}</b><br>' +
                  'Count: %{value}<br>' +
                  'Percentage: %{percent}<br><br>' +
                  '<b>ECIs:</b><br>%{customdata}' +
                  '<extra></extra>',
    textinfo='percent+label',
    textposition='inside',
    textfont=dict(size=11, color='white', family='Arial Black'),
    sort=False
))


fig.update_layout(
    title=f'<b>Commission Response Outcomes ({len(df_merger)} responded, {waiting_count} waiting)</b>',
    height=600,
    showlegend=True,
    legend=dict(
        font=dict(size=12),
        orientation='v',
        yanchor='middle',
        y=0.5,
        xanchor='left',
        x=1.02
    )
)


fig.show()


from IPython.display import HTML, display


# Write down all ECIs
outcome_counts_display = outcome_counts[['ECI_List', 'Outcome']].copy()
pd.set_option('display.max_colwidth', None)
display(HTML(outcome_counts_display.to_html(escape=False, index=False)))


ECI_List,Outcome
"• Water and sanitation are a human right! Water is a public good, not a commodity! • Ban glyphosate and protect people and the environment from toxic pesticides • Save bees and farmers ! Towards a bee-friendly agriculture for a healthy environment • Stop Finning – Stop the trade",Law Active
• Fur Free Europe,Law Passed
• End the Cage Age,Law Promised
• SAVE CRUELTY FREE COSMETICS - COMMIT TO A EUROPE WITHOUT ANIMAL TESTING,Action Plan Created
"• Stop Extremism • Ban on conversion practices in the European Union • My Voice, My Choice: For Safe And Accessible Abortion • Stop Destroying Videogames",Waiting for Response
• Stop vivisection,Rejected - Alternative Actions
• One of us • Minority SafePack – one million signatures for diversity in Europe • Cohesion policy for the equality of the regions and sustainability of the regional cultures,Rejected - Already Covered


For an ECI organizer, collecting 1 million signatures is just the **"entry ticket"**. The real challenge begins afterward. Here is how the legislative machine processes organizers' demands, illustrated by actual ECI cases from the data.

1. **(Law Active, Law Passed)** - Legislative Success:<br>
   The law is adopted. The legislative process is finished. 
   The organizer has secured a tangible legal act.<br>
   Example: ['Right2Water'](https://www.greeneuropeanjournal.eu/the-success-story-of-the-right2water-european-citizens-initiative/) protected public water services from privatization and influenced EU drinking water standards.
   
2. **(Law Promised, Action Plan Created, Being Studied)** - Political Commitment:<br>
   The Commission has promised action or is investigating, but NO legal act exists yet.<br>
   Risky phase: Promises can be delayed or dropped (e.g., ['End the Cage Age'](https://animalequality.org.uk/news/2025/03/25/in-2021-the-eu-promised-to-ban-cages-but-years-later-nothing-has-changed/)).
   
3. **(Rejected - All Types)** - Proposal Declined:<br>
   The Commission refuses to propose NEW legislation.<br>
   They may offer non-legislative alternatives (funding, better enforcement of old laws),<br>
   but the core legislative demand is denied.

**NOTE:**<br>
> - "Law Active" means the EU has passed the law; however, each Member State must still transpose it into its own legal system. For example, Poland [missed the 2023 deadline](https://notesfrompoland.com/2023/08/30/poland-has-not-implemented-eu-water-quality-directive/) to implement the Drinking Water Directive (linked to the Right2Water ECI).
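
The three tiers above can be sketched as a small lookup. This is a minimal illustration, not part of the notebook's pipeline: `OUTCOME_TIERS` and `outcome_tier` are hypothetical names, and the grouping simply mirrors the numbered list, using the outcome labels from the pie chart.

```python
# Hypothetical helper: groups the notebook's outcome labels into the three tiers listed above.
OUTCOME_TIERS = {
    'Law Active': 'Legislative Success',
    'Law Passed': 'Legislative Success',
    'Law Promised': 'Political Commitment',
    'Action Plan Created': 'Political Commitment',
    'Being Studied': 'Political Commitment',
}


def outcome_tier(status: str) -> str:
    """Map a final_outcome_status label to one of the three tiers."""
    if status.startswith('Rejected'):
        return 'Proposal Declined'  # covers every "Rejected - ..." variant
    # Labels outside the list (e.g. 'Waiting for Response') are left unclassified
    return OUTCOME_TIERS.get(status, 'Unclassified')


print(outcome_tier('Law Promised'))                # Political Commitment
print(outcome_tier('Rejected - Already Covered'))  # Proposal Declined
```

This grouping follows the prose: "Law Promised" counts as a commitment rather than a success, because no legal act exists yet.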

In [4]:
# ==============================================================================
# SCATTER PLOT: Signatures vs Outcome
# ==============================================================================

import plotly.express as px

# Use merged_data, add plot-specific formatting
plot_data = merged_data.copy()

# Get the commission response date (try multiple columns)
if 'official_communication_adoption_date' in plot_data.columns:
    plot_data['response_date'] = pd.to_datetime(plot_data['official_communication_adoption_date'], errors='coerce')
elif 'commission_submission_date' in plot_data.columns:
    plot_data['response_date'] = pd.to_datetime(plot_data['commission_submission_date'], errors='coerce')
else:
    plot_data['response_date'] = pd.NaT

# Add formatted dates for hover
plot_data = format_dates_for_display(
    plot_data, 
    ['timeline_registered', 'timeline_collection_closed', 'response_date']
)

# Add formatted objective
plot_data['objective_display'] = plot_data['objective'].apply(format_objective)

# Add wait time
plot_data['wait_time'] = plot_data.apply(
    lambda row: format_duration(row['timeline_collection_closed'], row['response_date']), 
    axis=1
)

# Define outcome ranking and colors (same as pie chart)
outcome_ranking = {
    'Law Active': 5,
    'Law Passed': 4,
    'Law Promised': 3,
    'Action Plan Created': 2,
    'Being Studied': 1,
    'Rejected - Alternative Actions': -1,
    'Rejected - Already Covered': -2,
}

outcome_colors = {
    'Law Active': 'rgb(60, 163, 113)',
    'Law Passed': 'rgb(102, 187, 106)',
    'Law Promised': 'rgb(156, 204, 101)',
    'Action Plan Created': 'rgb(255, 193, 7)',
    'Being Studied': 'rgb(255, 152, 0)',
    'Rejected - Alternative Actions': 'rgb(244, 67, 54)',
    'Rejected - Already Covered': 'rgb(183, 28, 28)',
}

# Add ranking for sorting
plot_data['outcome_rank'] = plot_data['final_outcome_status'].map(outcome_ranking)

# Apply power scaling for bubble sizes
plot_data['size_scaled'] = plot_data['signatures_numeric'] ** 4.3

# Sort outcomes by rank (best to worst) for legend ordering
outcome_order = sorted(
    plot_data['final_outcome_status'].unique(), 
    key=lambda x: outcome_ranking.get(x, 0), 
    reverse=True
)

# Create categorical x-axis positions
plot_data['outcome_position'] = plot_data['final_outcome_status'].map(
    {outcome: i for i, outcome in enumerate(outcome_order)}
)

# Create scatter plot
fig = px.scatter(
    plot_data.sort_values('outcome_rank', ascending=False),
    x='outcome_position',
    y='signatures_numeric',
    size='size_scaled',
    color='final_outcome_status',
    custom_data=['title', 'timeline_registered_formatted', 'timeline_collection_closed_formatted', 
                 'response_date_formatted', 'wait_time', 'final_outcome_status', 
                 'signatures_numeric', 'objective_display'],
    title=f'<b>Signatures vs Commission Response Outcome (Responses: {len(plot_data)})</b>',
    labels={
        'outcome_position': 'Outcome Category',
        'signatures_numeric': 'Total Signatures Collected',
        'final_outcome_status': 'Outcome'
    },
    category_orders={'final_outcome_status': outcome_order},
    color_discrete_map=outcome_colors,
    size_max=60
)

# Custom hover template with full timeline + formatted objectives
fig.update_traces(
    hovertemplate=(
        '<b>%{customdata[0]}</b><br><br>'
        '<b>Outcome:</b> %{customdata[5]}<br>'
        '<b>Signatures:</b> %{customdata[6]:,.0f}<br><br>'
        '<b>Registration:</b> %{customdata[1]}<br>'
        '<b>Signatures Ended:</b> %{customdata[2]}<br>'
        '<b>Response Date:</b> %{customdata[3]}<br>'
        '<b>Response Wait Time:</b> %{customdata[4]}<br><br>'
        '<b>Objective:</b><br>%{customdata[7]}<br>'
        '<extra></extra>'
    )
)

# Update x-axis to show outcome labels
fig.update_xaxes(
    tickmode='array',
    tickvals=list(range(len(outcome_order))),
    ticktext=outcome_order,
    tickangle=45,
    title=dict(text='Outcome Category', font=dict(size=14))
)

# Format y-axis with comma separators
fig.update_yaxes(
    title=dict(text='Total Signatures Collected', font=dict(size=14)),
    tickformat=','
)

fig.update_layout(
    height=650,
    showlegend=True,
    legend=dict(
        title=dict(text='<b>Outcome</b>', font=dict(size=12)),
        font=dict(size=11),
        orientation='v',
        yanchor='top',
        y=1,
        xanchor='left',
        x=1.02
    ),
    hovermode='closest'
)

fig.show()


### The Signature Paradox: Why More Signatures Don't Guarantee Success

Examples of rejected ECIs:

In [5]:
# ==============================================================================
# TABLE: Rejected ECIs with Full Reasons
# ==============================================================================


# Filter merged_data for rejected outcomes
REJECTION_OUTCOMES = ['Rejected - Already Covered', 'Rejected - Alternative Actions']
plot_data = merged_data[merged_data['final_outcome_status'].isin(REJECTION_OUTCOMES)].copy()


# Select and format columns
plot_data = plot_data[[
    'registration_number',
    'title',
    'signatures_numeric',
    'final_outcome_status',
    'commission_rejection_reason'
]].sort_values('signatures_numeric', ascending=False)


# Rename columns for display
plot_data.columns = [
    'Registration Number',
    'ECI Title',
    'Signatures',
    'Rejection Type',
    'Rejection Reason'
]


# Format signatures with commas
plot_data['Total Signatures'] = plot_data['Signatures'].apply(
    lambda x: f'{x:,.0f}' if pd.notna(x) else 'N/A'
)


# Configure pandas to show full text
pd.set_option('display.max_colwidth', None)


# Display styled table
display(plot_data[[
    'Registration Number', 
    'ECI Title', 
    'Total Signatures',
    'Rejection Type', 
    'Rejection Reason'
]].style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap'
}).hide(axis='index'))

pd.reset_option('display.max_colwidth')


print(f"\n📊 Total Rejected ECIs: {len(plot_data)}")
print(f"   - Rejected - Already Covered: {len(plot_data[plot_data['Rejection Type'] == 'Rejected - Already Covered'])}")
print(f"   - Rejected - Alternative Actions: {len(plot_data[plot_data['Rejection Type'] == 'Rejected - Alternative Actions'])}")

Registration Number,ECI Title,Total Signatures,Rejection Type,Rejection Reason
2012/000005,One of us,1721626,Rejected - Already Covered,The Commission decided not to make a legislative proposal.
2019/000007,Cohesion policy for the equality of the regions and sustainability of the regional cultures,1269351,Rejected - Already Covered,"The Commission carefully analysed the citizens' proposals and concluded that while some proposals fall outside of EU competence, as they would interfere with the existing constitutional setup of the concerned Member States, others are already covered under the current Cohesion policy thanks to its robust safeguards promoting inclusion and equal treatment of minorities, as well as the respect for cultural and linguistic diversity."
2012/000007,Stop vivisection,1173130,Rejected - Alternative Actions,"While the Commission does share the conviction that animal testing should be phased out in Europe, its approach for achieving that objective differs from the one proposed in this Citizens' Initiative. The Commission considers that the Directive on the protection of animals used for scientific purposes (Directive 2010/63/EU), which the Initiative seeks to repeal, is the right legislation to achieve the underlying objectives of the Initiative. It sets full replacement of animals as its ultimate goal as soon as it is scientifically possible, and provides a legally binding stepwise approach as non-animal alternatives become available. Therefore, no repeal of that legislation was proposed."
2017/000004,Minority SafePack – one million signatures for diversity in Europe,1123422,Rejected - Already Covered,The Commission decided not to make a legislative proposal.



📊 Total Rejected ECIs: 4
   - Rejected - Already Covered: 3
   - Rejected - Alternative Actions: 1


Example of one of the successful ECIs:

In [6]:
# ==============================================================================
# TABLE: Water and Sanitation ECI Data
# ==============================================================================

# Filter merged_data for the specific title
target_title = 'Water and sanitation are a human right!  Water is a public good, not a commodity!'
# Exact match on the title; .strip() guards against leading/trailing whitespace in the data.
plot_data = merged_data[merged_data['title'].str.strip() == target_title.strip()].copy()

# Select columns
plot_data = plot_data[[
    'registration_number',
    'title',
    'signatures_numeric',
    'commission_answer_text'
]]

# Rename columns for display
plot_data.columns = [
    'Registration Number',
    'ECI Title',
    'Signatures Numeric', # Temporary column for formatting
    'commission_answer_text'
]

# Format signatures with commas and create final 'Total Signatures' column
plot_data['Total Signatures'] = plot_data['Signatures Numeric'].apply(
    lambda x: f'{x:,.0f}' if pd.notna(x) else 'N/A'
)

# Final column order for display
final_columns = [
    'Registration Number',
    'ECI Title',
    'Total Signatures',
    'commission_answer_text'
]

# Configure pandas to show full text
pd.set_option('display.max_colwidth', None)

# Display styled table
display(plot_data[final_columns].style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap'
}).hide(axis='index'))

# Reset pandas display option
pd.reset_option('display.max_colwidth')


Registration Number,ECI Title,Total Signatures,commission_answer_text
2012/000003,"Water and sanitation are a human right! Water is a public good, not a commodity!",1659543,"The Commission committed, in particular, to taking the following actions: reinforcing implementation of EU water quality legislation, building on the commitments presented in the 7th Environment Action Programme (EAP) and the Water Blueprint; launching an EU-wide public consultation on the Drinking Water Directive, notably in view of improving access to quality water in the EU; improving transparency for urban wastewater and drinking water data management and explore the idea of benchmarking water quality; bringing about a more structured dialogue between stakeholders on transparency in the water sector; cooperating with existing initiatives to provide a wider set of benchmarks for water services; stimulating innovative approaches for development assistance (e.g. support to partnerships between water operators and to public-public partnerships); promoting sharing of best practices between Member States (e.g. on solidarity instruments) and identifying new opportunities for cooperation; advocating universal access to safe drinking water and sanitation as a priority area for Sustainable Development Goals."


In [7]:
# ==============================================================================
# COMPARISON: Signature Paradox - Extreme Cases
# ==============================================================================

# Use merged_data to find extreme cases
plot_data = merged_data[['registration_number', 'signatures_numeric', 'title', 'final_outcome_status']].copy()

# Find the specific ECIs
highest_sig_eci = plot_data.loc[plot_data['signatures_numeric'].idxmax()]
lowest_sig_eci = plot_data.loc[plot_data['signatures_numeric'].idxmin()]

# Define success and rejection categories for validation
SUCCESS_OUTCOMES = ['Law Active', 'Law Passed', 'Law Promised']
REJECTION_OUTCOMES = ['Rejected - Already Covered', 'Rejected - Alternative Actions']
NEUTRAL_OUTCOMES = ['Being Studied', 'Action Plan Created']

# VALIDATION: Check if our assumption holds
highest_outcome = highest_sig_eci['final_outcome_status']
lowest_outcome = lowest_sig_eci['final_outcome_status']

# Validate highest signature ECI (should be rejected or neutral, NOT successful)
if highest_outcome in SUCCESS_OUTCOMES:
    raise ValueError(
        f"❌ VALIDATION ERROR: Highest signature ECI ({highest_sig_eci['signatures_numeric']:,.0f}) "
        f"has SUCCESS outcome '{highest_outcome}'.\n"
        f"This contradicts the 'signature paradox' narrative. Review your interpretation."
    )

# Validate lowest signature ECI (should be successful or neutral, NOT rejected)
if lowest_outcome in REJECTION_OUTCOMES:
    raise ValueError(
        f"❌ VALIDATION ERROR: Lowest signature ECI ({lowest_sig_eci['signatures_numeric']:,.0f}) "
        f"has REJECTION outcome '{lowest_outcome}'.\n"
        f"This contradicts the 'signature paradox' narrative. Review your interpretation."
    )

# Determine appropriate emojis based on outcome
def get_outcome_emoji(outcome):
    """Return appropriate emoji based on outcome category."""
    if outcome in SUCCESS_OUTCOMES:
        return "✅"
    elif outcome in REJECTION_OUTCOMES:
        return "❌"
    elif outcome in NEUTRAL_OUTCOMES:
        return "⏳"
    else:
        raise ValueError(
            f"❌ UNKNOWN OUTCOME ERROR: Encountered unexpected outcome '{outcome}'.\n"
            f"Known outcomes are:\n"
            f"  - Success: {SUCCESS_OUTCOMES}\n"
            f"  - Rejection: {REJECTION_OUTCOMES}\n"
            f"  - Neutral: {NEUTRAL_OUTCOMES}\n"
            f"Please add this outcome to the appropriate category or investigate if it's a data quality issue."
        )

# Key findings that demonstrate signature count paradox
signature_paradox_data = {
    'Finding': [
        '🏆 Highest signature count',
        '📉 Lowest signature count',
    ],
    
    'ECI Title': [
        highest_sig_eci['title'],
        lowest_sig_eci['title'],
    ],
    
    'Signatures': [
        f"{highest_sig_eci['signatures_numeric']:,.0f}",
        f"{lowest_sig_eci['signatures_numeric']:,.0f}",
    ],
    
    'Outcome': [
        f"{get_outcome_emoji(highest_outcome)} {highest_outcome}",
        f"{get_outcome_emoji(lowest_outcome)} {lowest_outcome}",
    ],
    
    'Explanation': [
        'Most signatures → ' + ('Still rejected' if highest_outcome in REJECTION_OUTCOMES else 'Outcome: ' + highest_outcome),
        'Least signatures → ' + ('Full success' if lowest_outcome == 'Law Active' else 'Outcome: ' + lowest_outcome),
    ]
}

df_paradox = pd.DataFrame(signature_paradox_data)

# Display
divider_line = "=" * 120
print(divider_line)
print("📊 SIGNATURE PARADOX: The Two Most Extreme Cases")
print(divider_line)
display(df_paradox.style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap'
}).hide(axis='index'))


# ==============================================================================
# STATISTICAL SUMMARY: Law Active vs Rejected Comparison
# ==============================================================================

# Calculate stats for Law Active
law_active_stats = plot_data[plot_data['final_outcome_status'] == 'Law Active']['signatures_numeric']
law_active_count = len(law_active_stats)
law_active_min = law_active_stats.min()
law_active_mean = law_active_stats.mean()
law_active_max = law_active_stats.max()

# Calculate stats for Rejected - Already Covered
rejected_stats = plot_data[plot_data['final_outcome_status'] == 'Rejected - Already Covered']['signatures_numeric']
rejected_count = len(rejected_stats)
rejected_min = rejected_stats.min()
rejected_mean = rejected_stats.mean()
rejected_max = rejected_stats.max()

# Calculate differences
diff_min = rejected_min - law_active_min
diff_mean = rejected_mean - law_active_mean
diff_max = rejected_max - law_active_max

# Determine insights
count_comparison = 'Same' if law_active_count == rejected_count else f'{rejected_count} vs {law_active_count}'
min_insight = f'+{diff_min:,.0f} more for rejected' if diff_min > 0 else f'{abs(diff_min):,.0f} less for rejected'
mean_insight = f'+{diff_mean:,.0f} more for rejected' if diff_mean > 0 else f'{abs(diff_mean):,.0f} less for rejected'
max_insight = f'+{diff_max:,.0f} more for rejected' if diff_max > 0 else f'{abs(diff_max):,.0f} less for rejected'

summary_comparison = pd.DataFrame({
    'Outcome Category': [
        'Law Active (Best)', 
        'Rejected - Already Covered (Worst)', 
        'Difference'
    ],
    'Count': [
        law_active_count,
        rejected_count,
        count_comparison
    ],
    'Min Signatures': [
        f'{law_active_min:,.0f}',
        f'{rejected_min:,.0f}',
        min_insight
    ],
    'Mean Signatures': [
        f'{law_active_mean:,.0f}',
        f'{rejected_mean:,.0f}',
        mean_insight
    ],
    'Max Signatures': [
        f'{law_active_max:,.0f}',
        f'{rejected_max:,.0f}',
        max_insight
    ],
    'Key Insight': [
        'Success with LEAST signatures' if law_active_mean < rejected_mean else 'Success with MORE signatures',
        'Rejection with MOST signatures' if rejected_mean > law_active_mean else 'Rejection with LESS signatures', 
        'Rejected have consistently MORE' if diff_mean > 0 else 'Law Active have consistently MORE'
    ]
})

print("\n" + divider_line)
print("📈 STATISTICAL SUMMARY: Law Active vs Rejected")
print(divider_line)
display(summary_comparison.style.hide(axis='index'))

print("\n💡 KEY TAKEAWAY FOR ORGANIZERS:")
key_takeaway = """
   Beyond 1 million signatures, collecting more doesn't significantly improve success rates.
   
   The Commission evaluates each ECI based on policy alignment, legal feasibility, 
   and current EU priorities. Since Commissioners are appointed rather than directly 
   elected, they respond to broader EU institutional dynamics rather than signature 
   counts alone. 
"""
print(key_takeaway)


📊 SIGNATURE PARADOX: The Two Most Extreme Cases


Finding,ECI Title,Signatures,Outcome,Explanation
🏆 Highest signature count,One of us,1721626,❌ Rejected - Already Covered,Most signatures → Still rejected
📉 Lowest signature count,Save bees and farmers ! Towards a bee-friendly agriculture for a healthy environment,1054973,✅ Law Active,Least signatures → Full success



📈 STATISTICAL SUMMARY: Law Active vs Rejected


Outcome Category,Count,Min Signatures,Mean Signatures,Max Signatures,Key Insight
Law Active (Best),4,1054973,1226344,1659543,Success with LEAST signatures
Rejected - Already Covered (Worst),3,1123422,1371466,1721626,Rejection with MOST signatures
Difference,3 vs 4,"+68,449 more for rejected","+145,122 more for rejected","+62,083 more for rejected",Rejected have consistently MORE



💡 KEY TAKEAWAY FOR ORGANIZERS:

   Beyond 1 million signatures, collecting more doesn't significantly improve success rates.
   
   The Commission evaluates each ECI based on policy alignment, legal feasibility, 
   and current EU priorities. Since Commissioners are appointed rather than directly 
   elected, they respond to broader EU institutional dynamics rather than signature 
   counts alone. 
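
As a quick check on the summary table above, the sketch below recomputes the "Difference" row from the printed min, mean, and max values and adds a relative difference. The percentages are not part of the original output, and the variable names are illustrative. With only 3 rejected and 4 "Law Active" initiatives, the figures are descriptive rather than statistically conclusive.

```python
# Values copied from the summary table above (Law Active vs Rejected - Already Covered).
law_active = {'min': 1_054_973, 'mean': 1_226_344, 'max': 1_659_543}
rejected = {'min': 1_123_422, 'mean': 1_371_466, 'max': 1_721_626}

# Absolute difference per statistic: rejected minus Law Active
diffs = {stat: rejected[stat] - law_active[stat] for stat in ('min', 'mean', 'max')}

for stat, diff in diffs.items():
    pct = 100 * diff / law_active[stat]  # difference relative to the Law Active value
    print(f"{stat:>4}: {diff:+,} signatures ({pct:+.1f}%)")
```

The absolute differences match the table (+68,449, +145,122, +62,083), which confirms that the rejected group sits higher on all three statistics in this small sample.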



<a id='question-2'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">2. What are the key temporal patterns from submission through implementation?</p>

In [8]:
import plotly.graph_objects as go
import numpy as np
import pandas as pd
import json
import textwrap



# ==============================================================================
# 1. SETUP & DEFINITIONS
# ==============================================================================


CURRENT_DATE = pd.Timestamp.now()
STATUS_REJECTED = ['Rejected - Already Covered', 'Rejected - Alternative Actions', 'Rejected', 'Withdrawn']


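def extract_collection_closed_date(timeline_json):
    """Return the 'Collection closed' date from a timeline (JSON string or list), else NaT."""
    try:
        timeline = json.loads(timeline_json) if isinstance(timeline_json, str) else timeline_json
    except json.JSONDecodeError:
        return pd.NaT
    if not isinstance(timeline, list):  # None, NaN, or unexpected structure
        return pd.NaT
    for step in timeline:
        if isinstance(step, dict) and step.get('step') == 'Collection closed':
            return pd.to_datetime(step.get('date'), format='%d/%m/%Y', errors='coerce')
    return pd.NaT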

def days_to_readable(total_days):
    if pd.isna(total_days):
        return "N/A"
    
    total_days = int(total_days)
    if total_days == 0:
        return "0 days"
    
    is_negative = total_days < 0
    abs_days = abs(total_days)
    
    years = abs_days // 365
    remaining_days = abs_days % 365
    months = remaining_days // 30
    days = remaining_days % 30
    
    parts = []
    if years > 0:
        parts.append(f"{years} year{'s' if years != 1 else ''}")
    if months > 0:
        parts.append(f"{months} month{'s' if months != 1 else ''}")
    if days > 0 or len(parts) == 0:
        parts.append(f"{days} day{'s' if days != 1 else ''}")
    
    readable_str = ' '.join(parts)
    
    if is_negative:
        return f"- {readable_str}" 
    return readable_str

def format_hover_text(text, width=50):
    """
    Formats text for hover tooltips:
    1. Normalizes whitespace (tabs, newlines -> single space).
    2. Splits long text into multiple lines (word wrap).
    3. Joins lines with <br> instead of newlines for HTML rendering.
    """
    if not isinstance(text, str):
        return str(text)
    
    # Normalize whitespace first so runs of spaces do not count toward the line width
    normalized_text = ' '.join(text.split())
    # Wrap, then join the lines with <br> for Plotly's HTML tooltips
    return "<br>".join(textwrap.wrap(normalized_text, width=width))
    
# ==============================================================================
# 2. DATA PREPARATION (PIE CHART LOGIC)
# ==============================================================================


# GROUP 1: "Still Waiting for Response"
successful_ecis = df_initiatives[df_initiatives['successful_eci'] == True]['registration_number'].tolist()
responded_ecis = df_merger['registration_number'].tolist()
waiting_ecis = [eci for eci in successful_ecis if eci not in responded_ecis]


# Ensure 'objective' is selected here
df_waiting = df_initiatives[df_initiatives['registration_number'].isin(waiting_ecis)].copy()
df_waiting = df_waiting[['registration_number', 'title', 'objective', 'timeline']].copy() # Keep objective
df_waiting['date_collection_closed'] = df_waiting['timeline'].apply(extract_collection_closed_date)
df_waiting['plot_duration'] = (CURRENT_DATE - df_waiting['date_collection_closed']).dt.days
df_waiting['group_label'] = 'Still Waiting for Response'


# GROUP 2 & 3: Responded ECIs
# Ensure 'objective' is selected in the merge
df_responded = df_merger.merge(
    df_initiatives[['registration_number', 'title', 'objective', 'timeline']], 
    on='registration_number', 
    how='left',
    suffixes=('', '_init')
)


df_responded['date_collection_closed'] = df_responded['timeline'].apply(extract_collection_closed_date)
df_responded['response_date'] = pd.to_datetime(df_responded['official_communication_adoption_date'], errors='coerce')
df_responded['plot_duration'] = (df_responded['response_date'] - df_responded['date_collection_closed']).dt.days


mask_rejected = df_responded['final_outcome_status'].isin(STATUS_REJECTED)
df_responded.loc[mask_rejected, 'group_label'] = 'Rejected'
df_responded.loc[~mask_rejected, 'group_label'] = 'Commitment from Commission'


# Combine
plot_df = pd.concat([
    df_waiting[['registration_number', 'title', 'objective', 'plot_duration', 'group_label']],
    df_responded[['registration_number', 'title', 'objective', 'plot_duration', 'group_label']]
], ignore_index=True)


plot_df = plot_df[plot_df['plot_duration'].notna()].copy()


# Apply formatting
plot_df['readable_duration'] = plot_df['plot_duration'].apply(days_to_readable)
plot_df['objective_formatted'] = plot_df['objective'].apply(lambda x: format_hover_text(x))

# ==============================================================================
# 3. CREATE DOT PLOT
# ==============================================================================

fig = go.Figure()

categories = ['Commitment from Commission', 'Still Waiting for Response', 'Rejected']
colors = {'Commitment from Commission': '#66bb6a', 'Still Waiting for Response': '#9e9e9e', 'Rejected': '#ef5350'}

median_legend_added = False  # Track if we've added the legend entry

for cat in categories:
    subset = plot_df[plot_df['group_label'] == cat]
    if subset.empty: continue
    
    # Median Line
    median_val = subset['plot_duration'].median()
    if not np.isnan(median_val):
        fig.add_trace(go.Scatter(
            x=[cat, cat], y=[median_val, median_val],
            mode='markers',
            marker=dict(symbol='line-ew', size=60, color='black', line=dict(width=3)),
            hoverinfo='skip', 
            showlegend=not median_legend_added,  # Only show legend for first median
            name='Median',
            legendgroup='median'
        ))
        median_legend_added = True

    # Points with Objective in Tooltip
    custom_data_stack = np.stack((
        subset['readable_duration'].to_numpy(), 
        subset['objective_formatted'].to_numpy()
    ), axis=-1)

    fig.add_trace(go.Scatter(
        x=[cat] * len(subset),
        y=subset['plot_duration'],
        mode='markers',
        name=cat,
        text=subset['title'],
        customdata=custom_data_stack,
        hovertemplate=(
            '<b>%{text}</b><br>'
            'Waited: %{customdata[0]}<br>'
            '<br><b>Objective:</b><br>%{customdata[1]}'
            '<extra></extra>'
        ),
        marker=dict(color=colors[cat], size=12, line=dict(color='white', width=1), opacity=0.8)
    ))

# Layout with Years on Y-Axis
max_days = plot_df['plot_duration'].max()
max_years = int(max_days // 365) + 2

fig.update_layout(
    title='<b>Time from Collection End to Commission Response (or Current Wait)</b>',
    yaxis=dict(
        title='Duration',
        zeroline=True,
        gridcolor='rgb(243, 243, 243)',
        tickmode='array',
        tickvals=[i * 365 for i in range(max_years)],
        ticktext=[f"{i} yr" for i in range(max_years)]
    ),
    xaxis=dict(title='Outcome Group'),
    showlegend=True,
    plot_bgcolor='white',
    height=500,
    width=1000
)

fig.show()


**NOTE:**
> - [**Total timeline from collection end**](https://citizens-initiative.europa.eu/how-it-works_en#Step-4-Get-statements-of-support-verified):<br>Up to 3 months (organizers submit for verification) + 3 months (authorities verify) + 3 months (organizers submit the initiative) + 6 months (Commission responds) = **maximum 15 months** from when collection closes
> - [**Stop Extremism**](https://citizens-initiative.europa.eu/initiatives/details/2017/000007_en):<br>Actual waits can far exceed this maximum: this ECI has been waiting more than **7 years** for a Commission response, and the wait is still ongoing.
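
The 15-month ceiling in the note above can be turned into a rough deadline check. The sketch below is illustrative only: `STEP_LIMITS`, `add_months`, and `latest_response_date` are hypothetical names, the example date is made up, and the calculation assumes every step uses its full allowance counted from the day collection closes. In practice each period starts when the previous step completes, so this is an upper bound.

```python
import calendar
from datetime import date

# Maximum duration of each step in months, as listed in the note above.
STEP_LIMITS = {
    'organizers submit for verification': 3,
    'authorities verify': 3,
    'organizers submit the initiative': 3,
    'Commission responds': 6,
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_response_date(collection_closed: date) -> date:
    """Latest expected response date if every step takes its full allowance."""
    return add_months(collection_closed, sum(STEP_LIMITS.values()))


# Hypothetical collection end date, for illustration only.
print(latest_response_date(date(2020, 1, 15)))  # 2021-04-15
```

Comparing this date with the actual response date (or with today's date for pending initiatives) gives a simple way to flag ECIs that have exceeded the formal maximum.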

In [9]:
import plotly.graph_objects as go
import numpy as np
from scipy import stats
import pandas as pd
import json
import re
import textwrap


# ==============================================================================
# 1. HELPER FUNCTIONS FOR DATE EXTRACTION
# ==============================================================================

def find_earlier_force_date(description, active_date_obj):
    """
    Scans description for 'entered into force' or 'adopted' dates earlier than the active date.
    Returns the earliest such date found, or None.
    """
    if not isinstance(description, str) or pd.isna(active_date_obj):
        return None
        
    # Regex to catch "entered into force on DD Month YYYY" or "adopted on Month YYYY"
    # Captures: "12 January 2021", "June 2020", "28 October 2015"
    date_pattern = r"(?:entered into force|adopted)(?:\s+\w+){0,5}?(?: on| in)?\s+(\d{1,2}\s+[A-Za-z]+\s+\d{4}|[A-Za-z]+\s+\d{4})"
    
    matches = re.finditer(date_pattern, description, re.IGNORECASE)
    
    earliest_date = None
    
    for match in matches:
        date_text = match.group(1)
        try:
            found_date = pd.to_datetime(date_text, errors='coerce')
            # Check if valid and strictly earlier than the active date (buffer of 30 days to avoid near-matches)
            if pd.notna(found_date) and found_date < (active_date_obj - pd.Timedelta(days=30)):
                if earliest_date is None or found_date < earliest_date:
                    earliest_date = found_date
        except (ValueError, TypeError):
            continue
            
    return earliest_date

def extract_timeline_dates(timeline_json):
    if pd.isna(timeline_json):
        return {}
    try:
        timeline = json.loads(timeline_json) if isinstance(timeline_json, str) else timeline_json
        dates = {}
        for step in timeline:
            step_name = step.get('step', '')
            date_str = step.get('date', '')
            if date_str:
                dates[step_name] = pd.to_datetime(date_str, format='%d/%m/%Y', errors='coerce')
        return dates
    except (ValueError, TypeError, AttributeError):
        return {}

def extract_action_dates(actions_json, enrich_vacatio=False):
    """
    Extract all dates from laws_actions or policies_actions.
    If enrich_vacatio is True, splits 'law_active' into 'in_vacatio_legis' if earlier dates are found.
    """
    if pd.isna(actions_json):
        return []
    try:
        actions = json.loads(actions_json) if isinstance(actions_json, str) else actions_json
        dates = []
        for action in actions:
            if 'date' in action and action['date']:
                date_obj = pd.to_datetime(action['date'], errors='coerce')
                
                if pd.notna(date_obj):
                    current_status = action.get('status', 'Unknown')
                    
                    # 1. Add the original action
                    dates.append({
                        'date': date_obj,
                        'type': action.get('type', 'Unknown'),
                        'status': current_status
                    })
                    
                    # 2. Enrichment Logic: If law_active, check for earlier "vacatio" start date
                    if enrich_vacatio and current_status == 'law_active':
                        earlier_date = find_earlier_force_date(action.get('description', ''), date_obj)
                        if earlier_date:
                            dates.append({
                                'date': earlier_date,
                                'type': action.get('type', 'Unknown') + " (Adopted/Force)",
                                'status': 'in_vacatio_legis' # Treat as "Law Passed" phase
                            })
                            
        return dates
    except Exception:
        return []

def extract_followup_dates(followup_json):
    """Extract all dates from followup_events_with_dates"""
    if pd.isna(followup_json):
        return []
    try:
        followup = json.loads(followup_json) if isinstance(followup_json, str) else followup_json
        all_dates = []
        for event in followup:
            if 'dates' in event and event['dates']:
                for date_str in event['dates']:
                    if date_str:
                        date_obj = pd.to_datetime(date_str, errors='coerce')
                        if pd.notna(date_obj):
                            all_dates.append(date_obj)
        return all_dates
    except Exception:
        return []

# ==============================================================================
# 2. DATA PROCESSING
# ==============================================================================

df = merged_data.copy()

# Process timeline
df['timeline_dates'] = df['timeline'].apply(extract_timeline_dates)

# Extract ALL timeline dates
df['date_collection_start'] = df['timeline_dates'].apply(lambda x: x.get('Collection start date'))
df['date_collection_closed'] = df['timeline_dates'].apply(lambda x: x.get('Collection closed'))
df['date_valid_initiative'] = df['timeline_dates'].apply(lambda x: x.get('Valid initiative'))

# Calculate ALL stage durations
df['stage_collection'] = (df['date_collection_closed'] - df['date_collection_start']).dt.days
df['stage_verification'] = (df['date_valid_initiative'] - df['date_collection_closed']).dt.days
df['stage_response_wait'] = (df['timeline_response_commission_date'] - df['date_valid_initiative']).dt.days

# Extract law and policy action dates (WITH ENRICHMENT for laws)
df['laws_action_dates'] = df['laws_actions'].apply(lambda x: extract_action_dates(x, enrich_vacatio=True))
df['policies_action_dates'] = df['policies_actions'].apply(lambda x: extract_action_dates(x, enrich_vacatio=False))

# Function to get earliest "Law Passed" date (status 'in_vacatio_legis' or 'adopted'; no fallback to 'proposed')
def get_law_approved_date(action_dates):
    if not action_dates:
        return None
    # Include 'in_vacatio_legis' which we just enriched, and 'adopted'
    approved_dates = [a['date'] for a in action_dates if a['status'] in ['in_vacatio_legis', 'adopted']]
    return min(approved_dates) if approved_dates else None

# Function to get earliest law active date (law_active status)
def get_law_active_date(action_dates):
    if not action_dates:
        return None
    active_dates = [a['date'] for a in action_dates if a['status'] == 'law_active']
    return min(active_dates) if active_dates else None

# Function to get earliest policy action date
def get_earliest_policy_date(action_dates):
    if not action_dates:
        return None
    dates = [a['date'] for a in action_dates]
    return min(dates) if dates else None

# Calculate dates (earliest per ECI)
df['law_approved_date'] = df['laws_action_dates'].apply(get_law_approved_date)
df['law_active_date'] = df['laws_action_dates'].apply(get_law_active_date)
df['policy_action_date'] = df['policies_action_dates'].apply(get_earliest_policy_date)

# Calculate durations from Response to each milestone
df['stage_to_law_approved'] = (df['law_approved_date'] - df['timeline_response_commission_date']).dt.days
df['stage_to_law_active'] = (df['law_active_date'] - df['timeline_response_commission_date']).dt.days
df['stage_to_policy_action'] = (df['policy_action_date'] - df['timeline_response_commission_date']).dt.days

# Calculate statistics
def calculate_stats(series):
    series_clean = series.dropna()
    if len(series_clean) == 0:
        return {'mode': 0, 'min': 0, 'median': 0, 'max': 0, 'count': 0}
    
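    # scipy returns the smallest value when several values tie for most frequent,
    # so if every duration is unique the "mode" equals the minimum.
    mode_result = stats.mode(series_clean, keepdims=True)
    mode_val = mode_result.mode[0] if len(mode_result.mode) > 0 else series_clean.median()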
    
    return {
        'mode': float(mode_val),
        'min': float(series_clean.min()),
        'median': float(series_clean.median()),
        'max': float(series_clean.max()),
        'count': len(series_clean)
    }

stats_collection = calculate_stats(df['stage_collection'])
stats_verification = calculate_stats(df['stage_verification'])
stats_response_wait = calculate_stats(df['stage_response_wait'])
stats_law_approved = calculate_stats(df['stage_to_law_approved'])
stats_law_active = calculate_stats(df['stage_to_law_active'])
stats_policy_actions = calculate_stats(df['stage_to_policy_action'])

eci_list_law_approved = df[df['stage_to_law_approved'].notna()]['title'].tolist()
eci_list_law_active = df[df['stage_to_law_active'].notna()]['title'].tolist()
eci_list_policy = df[df['stage_to_policy_action'].notna()]['title'].tolist()

# ==============================================================================
# 3. PLOTTING
# ==============================================================================

def format_hover(stage_name, stats_dict, eci_list=None):
    hover = (
        f'<b>{stage_name}</b><br>'
        f'<b>most cases:</b> {days_to_readable(stats_dict["mode"])}<br>'
        '<br>'
        f'<b>min:</b> {days_to_readable(stats_dict["min"])}<br>'
        f'<b>median:</b> {days_to_readable(stats_dict["median"])}<br>'
        f'<b>max:</b> {days_to_readable(stats_dict["max"])}<br>'
        f'<br>'
        f'<b>n:</b> {stats_dict["count"]} ECIs'
    )
    
    if eci_list is not None and len(eci_list) > 0:
        hover += '<br><br><b>ECIs:</b><br>'
        for i, eci in enumerate(eci_list, 1):
            eci_short = eci if len(eci) <= 60 else eci[:57] + '...'
            hover += f'{i}. {eci_short}<br>'
    
    return hover

fig = go.Figure()

# Sequential stages
stages_sequential = ['Collection', 'Verification', 'Response Wait']
durations_sequential = [
    stats_collection['mode'],
    stats_verification['mode'],
    stats_response_wait['mode']
]
cumulative_seq = np.cumsum([0] + durations_sequential)

colors_seq = ['rgb(66, 165, 245)', 'rgb(100, 181, 246)', 'rgb(158, 158, 158)']

for i, (stage, duration) in enumerate(zip(stages_sequential, durations_sequential)):
    hover_map = {
        'Collection': format_hover('Collection Signatures', stats_collection),
        'Verification': format_hover('Verification', stats_verification),
        'Response Wait': format_hover('Response Wait', stats_response_wait)
    }
    
    fig.add_trace(go.Bar(
        x=[stage],
        y=[duration],
        base=[cumulative_seq[i]],
        marker_color=colors_seq[i],
        text=[f"{int(duration)} days"],
        textposition='outside',
        hovertext=hover_map[stage],
        hoverinfo='text',
        showlegend=False
    ))

# Base for branching
base_cumulative = stats_collection['mode'] + stats_verification['mode'] + stats_response_wait['mode']

# Branching stages (Law Passed vs Active vs Policy)
stages_branch = ['Law Passed', 'Law Active', 'Other Non-Legislative Actions']
durations_branch = [
    stats_law_approved['mode'],
    stats_law_active['mode'],
    stats_policy_actions['mode']
]
hover_branch = [
    format_hover('Law Passed (since Response)', stats_law_approved, eci_list_law_approved),
    format_hover('Law Active (since Response)', stats_law_active, eci_list_law_active),
    format_hover('Other Non-Legislative Actions (since Response)', stats_policy_actions, eci_list_policy)
]
colors_branch = ['rgb(102, 187, 106)', 'rgb(60, 163, 113)', 'rgb(255, 152, 0)']

for stage, duration, hover, color in zip(stages_branch, durations_branch, hover_branch, colors_branch):
    fig.add_trace(go.Bar(
        x=[stage],
        y=[duration],
        base=[base_cumulative],
        marker_color=color,
        text=[f"{int(duration)} days" if not pd.isna(duration) and duration > 0 else "N/A"],
        textposition='outside',
        hovertext=hover,
        hoverinfo='text',
        showlegend=False
    ))

fig.update_layout(
    title='<b>Time to First Commission Action: From Signatures to Legislative Proposal</b>',
    xaxis=dict(
        title='Process Stage',
        categoryorder='array',
        categoryarray=['Collection', 'Verification', 'Response Wait', 'Law Passed', 'Law Active', 'Other Non-Legislative Actions']
    ),
    yaxis=dict(
        title='Duration (Days)',
        tickformat=','
    ),
    height=600,
    barmode='overlay',
    plot_bgcolor='rgb(240, 242, 246)',
    annotations=[
        dict(
            x=4.5,
            y=base_cumulative - 30,
            text="Since Response ",
            showarrow=True,
            arrowhead=2,
            arrowsize=1,
            arrowwidth=2,
            arrowcolor="gray",
            ax=-80,
            ay=0,
            font=dict(size=10, color="gray")
        )
    ]
)

fig.show()

**NOTE**
> This chart shows the time until the **first** law is formally adopted ("Law Passed") or becomes binding ("Law Active"). 
> 
> *   **Transition Period:** The gap between the `Law Passed` and `Law Active` stages is intentional. It provides a necessary time buffer (often called *vacatio legis*) for companies, citizens, and public institutions to learn the new regulations and prepare their systems for implementation before the rules become legally binding.
> *   **Data Limitations:** Official EU communications may sometimes omit specific legal terminology like "applicable from" or "entered into force," focusing instead on the adoption event. This inconsistency in reporting can affect the precision of the "Law Active" dates in this analysis.
> *   **Long-term Process:** Formal adoption is rarely the finish line. Often **additional laws** are needed to fully achieve the ECI's goals, and the battle shifts to **implementation**. Campaigning for complementary legislation remains crucial, as a single legal act rarely solves the entire issue.
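
The transition gap described in the note can be illustrated with a minimal sketch. It assumes a simplified version of the notebook's date pattern and uses only the standard library; the description text, the active date, and the helper name `transition_gap_days` are hypothetical and chosen for illustration, not taken from the dataset.

```python
import re
from datetime import datetime

# Hypothetical description text, written for illustration only (not from the dataset).
description = "The regulation was adopted on 5 March 2020 and applies from 1 January 2021."
active_date = datetime(2021, 1, 1)  # assumed 'law_active' date for this example

# Simplified pattern: "adopted on" / "entered into force on" followed by "D Month YYYY".
pattern = r"(?:entered into force|adopted)\s+on\s+(\d{1,2}\s+[A-Za-z]+\s+\d{4})"

def transition_gap_days(text, active):
    """Return days between the earliest matched date and `active`, or None."""
    found = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        try:
            found.append(datetime.strptime(m.group(1), "%d %B %Y"))
        except ValueError:
            continue  # matched the shape but is not a real date
    earlier = [d for d in found if d < active]
    return (active - min(earlier)).days if earlier else None

gap = transition_gap_days(description, active_date)
no_gap = transition_gap_days("The Commission adopted a communication.", active_date)
print(gap, no_gap)
```

When the description carries no recognisable date phrase, the helper returns `None`, which mirrors the data limitation above: the "Law Passed" milestone can only be recovered when the source text states it explicitly.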

In [10]:
# ==============================================================================
# 1. EXTRACT DATA FOR PLOTTING
# ==============================================================================

def parse_action_dates_for_plot(row):
    """
    Extracts all actions with their dates, types, and categories for a single ECI.
    Returns a list of dictionaries: 
    {'date': ..., 'type': ..., 'category': ..., 'desc': ..., 'marker': ..., 'color': ...}
    """
    actions_list = []
    
    # 1. LAW ACTIONS (Category: Legislative Action)
    # ------------------------------------------------------------------
    if isinstance(row.get('laws_actions'), str):
        try:
            laws = json.loads(row['laws_actions'])
            for action in laws:
                if action.get('date'):
                    # Use 'type' from JSON if available (e.g. "Directive", "Regulation"), 
                    # otherwise generic "Legislative Act"
                    act_type = action.get('type', 'Legislative Act')
                    
                    actions_list.append({
                        'date': action['date'],
                        'description': action.get('description', 'Law Action'),
                        'action_type': act_type, # Specific type for hover
                        'category': 'Legislative Action (Laws)',
                        'marker': 'circle',
                        'color': '#2ca02c' # Green
                    })
        except Exception: pass

    # 2. POLICY ACTIONS (Category: High-Engagement vs Other)
    # ------------------------------------------------------------------
    high_engagement_types = [
        "Impact Assessment and Consultation",
        "Stakeholder Dialogue",
        "Scientific Activity",
        "International Cooperation",
        "Monitoring and Enforcement"
    ]
    
    if isinstance(row.get('policies_actions'), str):
        try:
            policies = json.loads(row['policies_actions'])
            for action in policies:
                if action.get('date'):
                    act_type = action.get('type', 'Other Policy')
                    
                    if act_type in high_engagement_types:
                        category = 'High-Engagement Policy'
                        color = '#ff7f0e' # Orange
                        marker = 'diamond'
                    else:
                        category = 'Other Policy Action'
                        color = '#1f77b4' # Blue
                        marker = 'x'
                        
                    actions_list.append({
                        'date': action['date'],
                        'description': action.get('description', act_type),
                        'action_type': act_type, # Specific type for hover
                        'category': category,
                        'marker': marker,
                        'color': color
                    })
        except Exception: pass

    # 3. FOLLOW-UP EVENTS (Category: Other Follow-up)
    # ------------------------------------------------------------------
    if isinstance(row.get('followup_events_with_dates'), str):
        try:
            followups = json.loads(row['followup_events_with_dates'])
            for event in followups:
                for date_str in event.get('dates', []):
                    if date_str:
                        full_desc = event.get('action', 'Follow-up Event')
                        
                        actions_list.append({
                            'date': date_str,
                            'description': full_desc,
                            'action_type': '', # Empty string implies no specific type label needed
                            'category': 'Other Follow-up',
                            'marker': 'x',
                            'color': '#1f77b4' # Blue
                        })
        except Exception: pass
        
    return actions_list

# Prepare the data list
plot_data_list = []

# Iterate through the merged_data DataFrame
for index, row in merged_data.iterrows():
    response_date = row.get('timeline_response_commission_date')
    if pd.isna(response_date):
        continue
        
    eci_title = row['title']
    
    # Extract all actions
    actions = parse_action_dates_for_plot(row)
    
    for act in actions:
        act_date = pd.to_datetime(act['date'], errors='coerce')
        if pd.isna(act_date):
            continue  # skip actions whose date cannot be parsed
        days_since = (act_date - response_date).days
        
        # Format days as YMD string for hover
        days_formatted = days_to_readable(days_since)
        
        # Apply formatting to description for tooltip
        formatted_desc = format_hover_text(act['description'], width=60)
        
        plot_data_list.append({
            'ECI': eci_title,
            'Days Since Response': days_since,
            'Days Formatted': days_formatted, # For hover
            'Date': act_date.strftime('%Y-%m-%d'),
            'Category': act['category'],
            'Action Type': act['action_type'], # The new field
            'Description': formatted_desc, 
            'Color': act['color'],
            'Marker': act['marker']
        })

# Create DataFrame for Plotly
df_plot = pd.DataFrame(plot_data_list)

# ==============================================================================
# 2. CREATE THE DOT PLOT
# ==============================================================================

if not df_plot.empty:
    fig = go.Figure()

    # Drawing order: Other (bottom) -> High-Engagement -> Laws (top)
    # Each tuple: (category, color, marker symbol, legend label)
    categories = [
        ('Other Follow-up', '#1f77b4', 'x', 'Administrative Follow-up'), # Combined category
        ('High-Engagement Policy', '#ff7f0e', 'diamond', 'Stakeholder Oversight & Consultation'),
        ('Legislative Action (Laws)', '#2ca02c', 'circle', 'Legislative Acts')
    ]

    for cat_name, color, symbol, legend_name in categories:
        if cat_name == 'Other Follow-up':
            subset = df_plot[df_plot['Category'].isin(['Other Follow-up', 'Other Policy Action'])]
        else:
            subset = df_plot[df_plot['Category'] == cat_name]
        
        if not subset.empty:
            # Prepare Custom Data array
            # Col 0: Date
            # Col 1: Description
            # Col 2: Action Type
            # Col 3: Days Formatted
            # Col 4: Legend Name (Category)
            
            # Create a column for the legend name to pass to customdata
            legend_col = [legend_name] * len(subset)
            
            custom_data = np.stack((
                subset['Date'], 
                subset['Description'], 
                subset['Action Type'],
                subset['Days Formatted'],
                legend_col
            ), axis=-1)

            # Define hover template conditionally based on category
            if cat_name == 'Other Follow-up':
                hover_temp = (
                    '<i>%{customdata[4]}</i><br>'
                    '<br>'
                    '<b>%{y}</b><br>'
                    'Date: %{customdata[0]}<br>'
                    'Time since response: %{customdata[3]}<br><br>'
                    '<b>Action:</b><br>%{customdata[1]}'
                    '<extra></extra>'
                )
            else:
                hover_temp = (
                    '<i>%{customdata[4]}</i><br>'
                    '<br>'
                    '<b>%{y}</b><br>'
                    'Date: %{customdata[0]}<br>'
                    'Time since response: %{customdata[3]}<br><br>'
                    '<b>Type:</b> %{customdata[2]}<br>'
                    '<b>Action:</b><br>%{customdata[1]}'
                    '<extra></extra>'
                )

            fig.add_trace(go.Scatter(
                x=subset['Days Since Response'],
                y=subset['ECI'],
                mode='markers',
                name=legend_name,
                marker=dict(
                    color=color,
                    symbol=symbol,
                    size=10,
                    line=dict(width=1, color='DarkSlateGrey') if symbol != 'x' else dict(width=2)
                ),
                customdata=custom_data,
                hovertemplate=hover_temp
            ))

    fig.add_vline(x=0, line_width=2, line_dash="dash", line_color="black", annotation_text="Commission Response", annotation_position="top right")

    # Calculate X-axis ticks for Years
    if not df_plot.empty:
        max_days = df_plot['Days Since Response'].max()
        min_days = df_plot['Days Since Response'].min()
        
        # Create ticks for every year from min to max
        # Start from roughly the first year before response if needed, up to max
        start_year = int(np.floor(min_days / 365.25)) if min_days < 0 else 0
        end_year = int(max_days / 365.25) + 1
        
        tick_vals = [y * 365.25 for y in range(start_year, end_year + 1)]
        tick_text = [str(y) for y in range(start_year, end_year + 1)]

        fig.update_layout(
            title='<b>The Long Tail of Impact: Timeline of Actions After Commission Response</b>',
            xaxis=dict(
                title='Years After Initial Response',
                zeroline=False,
                gridcolor='lightgrey',
                tickmode='array',
                tickvals=tick_vals,
                ticktext=tick_text
            ),
            yaxis=dict(
                title='',
                type='category',
                dtick=1
            ),
            height=max(600, len(merged_data) * 50),
            plot_bgcolor='white',
            legend=dict(
                orientation="h",
                yanchor="bottom",
                y=1.02,
                xanchor="right",
                x=1
            ),
            margin=dict(l=10, r=10, t=80, b=50)
        )

    fig.show()

**NOTE:**<br>
> - **Commission's Selection**: All events shown, past or future, are actions that the European Commission itself identified as relevant follow-up steps to the ECI.
> - **Multiple Dates**: A single event description often appears several times. This is **not an error**; each occurrence marks a different milestone (e.g., start date, interim report, final deadline) of the same initiative or action.<br><br> Example:<br>
> Proposal for the Nature Restoration Law: following the **agreement of the European Parliament** on the text (on **27 February 2024**), the Council of the EU **adopted** the regulation on **17 June 2024**. It **entered into force** on **18 August 2024** (20 days after its publication in the Official Journal of the EU) and **became applicable immediately**.
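
To make the "multiple dates" point concrete, here is a small sketch that turns the three Nature Restoration Law dates quoted in the note above into separate timeline points. It measures offsets from the first milestone rather than from the Commission response, because the response date is not quoted in the note; the labels and variable names are illustrative assumptions.

```python
from datetime import date

# Milestone dates quoted in the note above for the Nature Restoration Law.
milestones = [
    ("Parliament agreement", date(2024, 2, 27)),
    ("Council adoption", date(2024, 6, 17)),
    ("Entry into force", date(2024, 8, 18)),
]

# One event description, several dated points: each date becomes its own marker.
first_date = milestones[0][1]
offsets = {label: (d - first_date).days for label, d in milestones}

for label, days in offsets.items():
    print(f"{label}: +{days} days")
```

In the dot plot the same description would therefore appear three times on the ECI's row, once per date.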


<a id='question-3'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">3. How do optional Parliament actions correlate with Commission decisions?</p>

In [24]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import json


# Work on a copy of the already-merged dataset (timeline, Parliament, and Commission response columns)
df_plot = merged_data.copy()


# Function to extract collection closed date
def extract_collection_closed(timeline_json):
    if pd.isna(timeline_json):
        return pd.NaT
    try:
        timeline = json.loads(timeline_json) if isinstance(timeline_json, str) else timeline_json
        for step in timeline:
            if step.get('step') == 'Collection closed':
                return pd.to_datetime(step.get('date'), format='%d/%m/%Y', errors='coerce')
        return pd.NaT
    except Exception:
        return pd.NaT


# Function to extract valid initiative date
def extract_valid_initiative(timeline_json):
    if pd.isna(timeline_json):
        return pd.NaT
    try:
        timeline = json.loads(timeline_json) if isinstance(timeline_json, str) else timeline_json
        for step in timeline:
            if step.get('step') == 'Valid initiative':
                return pd.to_datetime(step.get('date'), format='%d/%m/%Y', errors='coerce')
        return pd.NaT
    except Exception:
        return pd.NaT


# Function to truncate text
def truncate_text(text, max_length=300):
    if pd.isna(text) or text == '':
        return 'No answer text available'
    text = str(text).strip()
    if len(text) > max_length:
        return text[:max_length] + '...'
    return text


# Convert date columns
df_plot['collection_closed'] = df_plot['timeline'].apply(extract_collection_closed)
df_plot['valid_initiative'] = df_plot['timeline'].apply(extract_valid_initiative)
df_plot['commission_submission_date'] = pd.to_datetime(df_plot['commission_submission_date'], errors='coerce')
df_plot['parliament_hearing_date'] = pd.to_datetime(df_plot['parliament_hearing_date'], errors='coerce')
df_plot['plenary_debate_date'] = pd.to_datetime(df_plot['plenary_debate_date'], errors='coerce')
df_plot['official_communication_adoption_date'] = pd.to_datetime(df_plot['official_communication_adoption_date'], errors='coerce')


# Calculate days from collection ended
df_plot['days_to_verification'] = (df_plot['valid_initiative'] - df_plot['collection_closed']).dt.days
df_plot['days_to_submission'] = (df_plot['commission_submission_date'] - df_plot['collection_closed']).dt.days
df_plot['days_to_hearing'] = (df_plot['parliament_hearing_date'] - df_plot['collection_closed']).dt.days
df_plot['days_to_plenary'] = (df_plot['plenary_debate_date'] - df_plot['collection_closed']).dt.days
df_plot['days_to_decision'] = (df_plot['official_communication_adoption_date'] - df_plot['collection_closed']).dt.days


# Calculate the duration for verification to submission
df_plot['verification_to_submission_duration'] = df_plot['days_to_submission'] - df_plot['days_to_verification']


# Check if all verification to submission durations are 0
all_verification_to_submission_zero = (
    df_plot['verification_to_submission_duration'].dropna() == 0
).all() if len(df_plot['verification_to_submission_duration'].dropna()) > 0 else True


# Truncate long titles
def truncate_title(title, max_length=50):
    if pd.isna(title):
        return "Unknown"
    if len(title) > max_length:
        return title[:max_length-3] + "..."
    return title


df_plot['title_short'] = df_plot['initiative_title'].apply(lambda x: truncate_title(x, 55))


# Truncate commission answer
df_plot['commission_answer_truncated'] = df_plot['commission_answer_text'].apply(lambda x: format_hover_text(x)).apply(lambda x: truncate_text(x, 900))


# Classify outcomes
def classify_outcome(outcome_status):
    if pd.isna(outcome_status):
        return 'Unknown'
    if 'Rejected' in outcome_status:
        return 'Rejected'
    elif outcome_status in ['Law Active', 'Law Passed', 'Law Promised', 'Action Plan Created']:
        return 'Legislative Action Taken'
    else:
        return 'Other'

df_plot['outcome_category'] = df_plot['final_outcome_status'].apply(classify_outcome)


# Sort by decision date
df_plot = df_plot.sort_values('days_to_decision', ascending=True).reset_index(drop=True)


# Create figure
fig = go.Figure()


# Define colors using Viridis scale
# Viridis goes from Dark Purple (0) to Yellow (1)
# We want: Darkest (earliest) -> Brightest (latest)
# So we map our 5 periods to the Viridis scale

viridis_colors = px.colors.sequential.Viridis

# We have 5 periods, so we can pick 5 equidistant colors or manually select them for good contrast
# Since Viridis is perceptually uniform, equidistant indices work well
# There are typically 10 colors in the default Viridis list
colors = {
    'verification': viridis_colors[0],  # Darkest (Purple)
    'submission': viridis_colors[2],    # Medium-Dark (Blueish)
    'hearing': viridis_colors[4],       # Medium (Teal/Green)
    'plenary': viridis_colors[7],       # Medium-Light (Light Green)
    'decision': viridis_colors[9]       # Brightest (Yellow)
}


# Track which legend groups have been shown
legend_shown = set()


# Add stacked bars for periods
for idx, row in df_plot.iterrows():
    y_pos = idx
    
    # Period 1: Collection to Verification
    if pd.notna(row['days_to_verification']):
        show_legend = 'verification' not in legend_shown
        if show_legend:
            legend_shown.add('verification')
        
        fig.add_trace(go.Bar(
            x=[row['days_to_verification']],
            y=[row['title_short']],
            orientation='h',
            marker=dict(color=colors['verification']),
            name='Collection → Verification',
            legendgroup='verification',
            showlegend=show_legend,
            legendrank=1,
            hovertemplate=f"<b>{row['initiative_title']}</b><br>Verification: {row['days_to_verification']:.0f} days<extra></extra>"
        ))
    
    # Period 2: Verification to Submission
    if pd.notna(row['days_to_submission']) and pd.notna(row['days_to_verification']):
        duration = row['days_to_submission'] - row['days_to_verification']
        
        # Only show in legend if not all durations are zero
        show_legend = ('submission' not in legend_shown) and (not all_verification_to_submission_zero)
        if show_legend:
            legend_shown.add('submission')
        
        fig.add_trace(go.Bar(
            x=[duration],
            y=[row['title_short']],
            orientation='h',
            marker=dict(color=colors['submission']),
            name='Verification → Submission',
            legendgroup='submission',
            showlegend=show_legend,
            legendrank=2,
            hovertemplate=f"<b>{row['initiative_title']}</b><br>To Submission: {duration:.0f} days<extra></extra>"
        ))
    
    # Period 3: Submission to Hearing
    if pd.notna(row['days_to_hearing']) and pd.notna(row['days_to_submission']):
        duration = row['days_to_hearing'] - row['days_to_submission']
        show_legend = 'hearing' not in legend_shown
        if show_legend:
            legend_shown.add('hearing')
        
        fig.add_trace(go.Bar(
            x=[duration],
            y=[row['title_short']],
            orientation='h',
            marker=dict(color=colors['hearing']),
            name='Submission → Hearing',
            legendgroup='hearing',
            showlegend=show_legend,
            legendrank=3,
            hovertemplate=f"<b>{row['initiative_title']}</b><br>To Hearing: {duration:.0f} days<extra></extra>"
        ))
    
    # Period 4: Hearing to Plenary (if exists)
    if pd.notna(row['days_to_plenary']) and pd.notna(row['days_to_hearing']):
        duration = row['days_to_plenary'] - row['days_to_hearing']
        show_legend = 'plenary' not in legend_shown
        if show_legend:
            legend_shown.add('plenary')
        
        fig.add_trace(go.Bar(
            x=[duration],
            y=[row['title_short']],
            orientation='h',
            marker=dict(color=colors['plenary']),
            name='Hearing → Plenary',
            legendgroup='plenary',
            showlegend=show_legend,
            legendrank=4,
            hovertemplate=f"<b>{row['initiative_title']}</b><br>To Plenary: {duration:.0f} days<extra></extra>"
        ))
    
    # Period 5: Last event to Decision
    if pd.notna(row['days_to_decision']):
        # Find the last milestone before decision
        last_milestone = 0
        if pd.notna(row['days_to_plenary']):
            last_milestone = row['days_to_plenary']
        elif pd.notna(row['days_to_hearing']):
            last_milestone = row['days_to_hearing']
        elif pd.notna(row['days_to_submission']):
            last_milestone = row['days_to_submission']
        elif pd.notna(row['days_to_verification']):
            last_milestone = row['days_to_verification']
        
        duration = row['days_to_decision'] - last_milestone
        show_legend = 'decision' not in legend_shown
        if show_legend:
            legend_shown.add('decision')
        
        fig.add_trace(go.Bar(
            x=[duration],
            y=[row['title_short']],
            orientation='h',
            marker=dict(color=colors['decision']),
            name='Last Event → Decision',
            legendgroup='decision',
            showlegend=show_legend,
            legendrank=5,
            hovertemplate=f"<b>{row['initiative_title']}</b><br>To Decision: {duration:.0f} days<extra></extra>"
        ))


# Add outcome markers at the end of bars
outcome_markers = {
    'Legislative Action Taken': [],
    'Rejected': []
}

for idx, row in df_plot.iterrows():
    if pd.notna(row['days_to_decision']) and row['outcome_category'] in outcome_markers:
        outcome_markers[row['outcome_category']].append({
            'x': row['days_to_decision'],
            'y': row['title_short'],
            'title': row['initiative_title'],
            'outcome': row['final_outcome_status'],
            'answer': row['commission_answer_truncated'],
            'days': row['days_to_decision']
        })

# Add markers for Legislative Action Taken
if outcome_markers['Legislative Action Taken']:
    data = outcome_markers['Legislative Action Taken']
    fig.add_trace(go.Scatter(
        x=[d['x'] for d in data],
        y=[d['y'] for d in data],
        mode='markers',
        marker=dict(
            symbol='circle',
            size=14,
            color='rgb(46, 125, 50)',
            line=dict(width=2, color='white')
        ),
        name='✓ Legislative Action Taken',
        legendgroup='outcome_positive',
        customdata=[[d['title'], d['outcome'], d['answer'], d['days']] for d in data],
        hovertemplate=(
            '<b>%{customdata[0]}</b><br>' +
            'Outcome: %{customdata[1]}<br>' +
            'Days since collection closed: %{customdata[3]:.0f}<br>' +
            '<br>' +
            '<i>Commission Answer:</i><br>' +
            '%{customdata[2]}<extra></extra>'
        )
    ))

# Add markers for Rejected
if outcome_markers['Rejected']:
    data = outcome_markers['Rejected']
    fig.add_trace(go.Scatter(
        x=[d['x'] for d in data],
        y=[d['y'] for d in data],
        mode='markers',
        marker=dict(
            symbol='x',
            size=14,
            color='rgb(198, 40, 40)',
            line=dict(width=2, color='white')
        ),
        name='✗ Rejected',
        legendgroup='outcome_negative',
        customdata=[[d['title'], d['outcome'], d['answer'], d['days']] for d in data],
        hovertemplate=(
            '<b>%{customdata[0]}</b><br>' +
            'Outcome: %{customdata[1]}<br>' +
            'Days since collection closed: %{customdata[3]:.0f}<br>' +
            '<br>' +
            '<i>Commission Answer:</i><br>' +
            '%{customdata[2]}<extra></extra>'
        )
    ))


# Update layout
fig.update_layout(
    title='<b>ECI Timeline: From Collection End to Commission Decision</b>',
    xaxis=dict(
        title='Days from Collection End',
        gridcolor='rgb(230, 230, 230)',
        zeroline=True
    ),
    yaxis=dict(
        title='',
        tickfont=dict(size=10)
    ),
    barmode='stack',
    height=600,
    width=1200,
    plot_bgcolor='white',
    hovermode='closest',
    legend=dict(
        orientation='v',
        yanchor='top',
        y=1,
        xanchor='left',
        x=1.02,
        font=dict(size=9)
    ),
    margin=dict(l=300)
)


fig.show()


**NOTE: UNDERSTANDING THE ECI TIMELINE STEPS**<br>
> - **Collection → Verification**:<br>After the 12-month collection period closes, organizers have 3 months to submit statements of support to national authorities for verification.<br><br>
> - **Verification → Submission**:<br>Once verification is complete (national authorities have up to 3 months), organizers have 3 months to submit the successful initiative to the European Commission.<br><br>
> - **Submission → Hearing**:<br>The European Parliament must organize a public hearing within 3 months of submission, giving organizers the opportunity to present their initiative.<br>In the [End the Cage Age strategy video](https://www.animalwelfareintergroup.eu/calendar/european-citizens-initiative-end-cage-age), the organizers describe their plan for the hearing: make sure supportive politicians get speaking time, and present strong factual and scientific evidence so that Parliament later votes to back the initiative.<br><br>
> - **Hearing → Plenary**:<br>Parliament holds a plenary debate on the initiative.<br>Since 1 January 2020, a plenary debate has been [**mandatory**](https://eur-lex.europa.eu/eli/reg/2019/788/oj/eng) for all successful ECIs. Organizers have no formal speaking role in the plenary (unlike the hearing), but it is a critical political moment: Parliament decides whether to adopt a **resolution** supporting the initiative, which increases pressure on the Commission to act.<br><br>
> - **Last Event → Decision**:<br>The Commission adopts a formal Communication setting out its legal and political conclusions, the action it intends to take (or not take), and its reasoning.

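The "Last Event → Decision" segment in the chart is the gap between the Commission decision and the latest milestone recorded before it. The block below is a minimal sketch of that calculation as a standalone helper, assuming the same cumulative `days_to_*` columns used in the plotting code. The helper names and the sample numbers are illustrative and are not taken from the dataset.

```python
import math

# Milestones in reverse chronological order, mirroring the if/elif chain in the plotting code.
MILESTONE_COLUMNS = [
    "days_to_plenary",
    "days_to_hearing",
    "days_to_submission",
    "days_to_verification",
]


def is_missing(value):
    """True for None or NaN (the two ways a missing day count may appear)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def days_from_last_event_to_decision(row):
    """Days between the latest recorded milestone and the Commission decision.

    `row` is any mapping with a .get method (a dict or a pandas Series).
    Returns None when no decision date is recorded.
    """
    decision = row.get("days_to_decision")
    if is_missing(decision):
        return None
    last_milestone = 0  # fall back to the end of collection (day 0)
    for column in MILESTONE_COLUMNS:
        value = row.get(column)
        if not is_missing(value):
            last_milestone = value
            break
    return decision - last_milestone


# Illustrative values only (not from the dataset): no hearing or plenary recorded.
example_row = {
    "days_to_verification": 90,
    "days_to_submission": 150,
    "days_to_hearing": float("nan"),
    "days_to_plenary": None,
    "days_to_decision": 400,
}
print(days_from_last_event_to_decision(example_row))  # 400 - 150 = 250
```

Because the helper only relies on `.get`, it could be applied row by row with `df_plot.apply(days_from_last_event_to_decision, axis=1)` to check the bar lengths independently of the figure.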


<a id='question-4'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">4. What funding patterns distinguish successful ECIs from unsuccessful ones?</p>

**Visualizations:**

- Grouped bar: Average funding (successful vs unsuccessful)
- Histogram: Number of sponsors distribution with success overlay
- Comparison bar: Private vs organizational sponsor success rates
- Stacked bar: Funding thresholds (€50k, €100k+) vs outcomes

In [13]:
# Q4 Analysis Code
# Convert funding_total to numeric before aggregating; unparseable values become NaN
if 'funding_total' in df_initiatives.columns:
    df_initiatives['funding_total'] = pd.to_numeric(
        df_initiatives['funding_total'], errors='coerce')

# Funding by success status
funding_by_success = df_initiatives.groupby('is_successful')['funding_total'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std'),
    ('max', 'max')
]).round(2)
funding_by_success.index = ['Unsuccessful', 'Successful']

print("Funding Statistics by Success:")
print(funding_by_success)

# Funding threshold analysis
df_initiatives['funding_level'] = pd.cut(df_initiatives['funding_total'],
    bins=[-1, 0, 50000, 100000, float('inf')],
    labels=['None', '€0-50k', '€50k-100k', '€100k+'])

funding_threshold = df_initiatives.groupby('funding_level')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)

print("\nSuccess Rate by Funding Level:")
print(funding_threshold)

# Sponsor analysis if available
if 'funding_by' in df_initiatives.columns:
    # Parse funding sources (JSON)
    def count_sponsors(x):
        # Number of sponsor entries; 0 for missing or unparseable cells
        try:
            return len(json.loads(x)) if pd.notna(x) else 0
        except (TypeError, ValueError):
            return 0
    
    df_initiatives['sponsors_count'] = df_initiatives['funding_by'].apply(count_sponsors)
    
    sponsor_by_success = df_initiatives.groupby('is_successful')['sponsors_count'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median')
    ]).round(1)
    sponsor_by_success.index = ['Unsuccessful', 'Successful']
    
    print("\nSponsor Count by Success:")
    print(sponsor_by_success)

Funding Statistics by Success:
              count   mean  median  std    max
Unsuccessful      1  500.0   500.0  NaN  500.0
Successful        0    NaN     NaN  NaN    NaN

Success Rate by Funding Level:
               count  successes  success_rate_%
funding_level                                  
None               0          0             NaN
€0-50k             1          0             0.0
€50k-100k          0          0             NaN
€100k+             0          0             NaN

Sponsor Count by Success:
              count  mean  median
Unsuccessful    110   2.0     0.0
Successful       11  21.9     8.0

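Only one `funding_total` value survives `pd.to_numeric(..., errors='coerce')` in the output above, so either most values are missing or they are stored in a form `to_numeric` cannot parse. The block below is a hedged sketch of a cleaning step for the second case. It assumes, without confirmation from the source, that raw values are strings with a currency symbol and comma thousands separators such as `€1,500.00`. `parse_funding` and the sample values are hypothetical.

```python
import pandas as pd


def parse_funding(raw: pd.Series) -> pd.Series:
    """Convert currency-like strings to numbers; unparseable cells become NaN."""
    cleaned = (
        raw.astype(str)
        .str.replace(r"[^\d.,-]", "", regex=True)  # drop currency symbols, letters, spaces
        .str.replace(",", "", regex=False)  # ASSUMPTION: comma is a thousands separator
    )
    return pd.to_numeric(cleaned, errors="coerce")


# Hypothetical raw values for illustration.
raw_funding = pd.Series(["€1,500.00", "500", None, "n/a"])
parsed = parse_funding(raw_funding)
print(parsed)
print(f"Parsed {parsed.notna().sum()} of {raw_funding.notna().sum()} non-missing values")
```

Comparing the parsed count with the non-missing count before aggregating makes silent data loss visible. If the real data uses a comma as the decimal separator, the second `replace` would need to change accordingly.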

<a id='question-5'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">5. Which geographic strategies correlate with ECI success?</p>

**Visualizations:**

- Network/heatmap: Most frequent country combinations
- Comparison bar: Success rates when Germany/France meet thresholds
- Scatter: Number of countries vs outcome
- Box plot: Geographic diversity index by outcome

In [14]:
# Q5 Analysis Code
# Countries meeting threshold analysis
if 'signatures_collected_by_country' in df_initiatives.columns:
    def count_countries_met_threshold(sig_by_country_json):
        try:
            if pd.isna(sig_by_country_json):
                return 0
            data = json.loads(sig_by_country_json)
            countries_met = sum(1 for country, stats in data.items() 
                              if float(stats.get('percentage', 0)) >= 100)
            return countries_met
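        except (TypeError, ValueError, AttributeError):
            # Parse failures are counted as 0, which is indistinguishable from
            # "no country met the threshold"
            return 0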
    
    df_initiatives['countries_threshold_met'] = df_initiatives['signatures_collected_by_country'].apply(
        count_countries_met_threshold)
    
    # Convert to numeric to ensure proper aggregation
    df_initiatives['countries_threshold_met'] = pd.to_numeric(
        df_initiatives['countries_threshold_met'], errors='coerce').fillna(0).astype(int)
    
    # Countries by success
    country_by_success = df_initiatives.groupby('is_successful')['countries_threshold_met'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median'),
        ('std', 'std')
    ]).round(1)
    country_by_success.index = ['Unsuccessful', 'Successful']
    
    print("Countries Meeting Threshold by Success:")
    print(country_by_success)
    
    # Success rate by country threshold distribution
    df_initiatives['country_category'] = pd.cut(
        df_initiatives['countries_threshold_met'],
        bins=[-0.1, 3, 7, 12, 30],
        labels=['0-3', '4-7', '8-12', '13+'],
        include_lowest=True)
    
    country_success = df_initiatives.groupby('country_category', observed=False, dropna=False)['is_successful'].agg([
        ('count', 'count'),
        ('successes', 'sum'),
        ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
    ]).round(1)
    
    print("\nSuccess Rate by Country Diversity:")
    print(country_success)

Countries Meeting Threshold by Success:
              count  mean  median  std
Unsuccessful    110   0.0     0.0  0.0
Successful       11   0.0     0.0  0.0

Success Rate by Country Diversity:
                  count  successes  success_rate_%
country_category                                  
0-3                 121         11             9.1
4-7                   0          0             0.0
8-12                  0          0             0.0
13+                   0          0             0.0

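The output above reports zero countries meeting the threshold even for successful ECIs, although a successful ECI must by definition reach the threshold in at least 7 member states. One likely explanation is that the parser falls into its `except` branch for every row. The block below is a minimal sketch of a stricter parser that returns `None` instead of `0` when a cell cannot be read, so that failures can be counted. The JSON shape (`{"DE": {"percentage": ...}}`) follows the access pattern in the cell above. The tolerance for a trailing `%` and the function name are assumptions.

```python
import json
import math


def count_countries_met(raw, threshold=100.0):
    """Number of countries at or above `threshold`, or None if the cell is unreadable."""
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    met = 0
    for country, stats in data.items():
        try:
            # ASSUMPTION: the percentage may be a number or a string such as "120.5%"
            pct = float(str(stats["percentage"]).rstrip("%"))
        except (KeyError, TypeError, ValueError):
            return None
        if pct >= threshold:
            met += 1
    return met


# Hypothetical cells for illustration.
good_cell = '{"DE": {"percentage": "120.5%"}, "FR": {"percentage": 99.9}, "IT": {"percentage": "100"}}'
bad_shape = '{"DE": 1200}'
print(count_countries_met(good_cell))   # DE and IT reach the threshold
print(count_countries_met(bad_shape))   # unreadable shape
print(count_countries_met("not json"))  # invalid JSON
```

Applying this to `signatures_collected_by_country` and counting the `None` results would show whether the all-zero table reflects the data or the parser.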

<a id='question-6'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">6. What organizational characteristics optimize ECI success?</p>

**Visualizations:**

- Histogram: Team size distribution with success overlay
- Bar: Multiple vs single representative success rates
- Comparison: Multi-country vs single-country teams
- Box plot: Optimal team size by outcome

In [15]:
# Q6 Analysis Code
# Parse organizer structure
def parse_organizer_count(org_json):
    try:
        if pd.isna(org_json):
            return 0
        data = json.loads(org_json)
        return data.get('number_of_people', 0)
    except (TypeError, ValueError, AttributeError):
        # Unparseable cells are counted as 0 organizers
        return 0

df_initiatives['organizer_count'] = df_initiatives['organizer_representative'].apply(parse_organizer_count)

# Convert to numeric and handle NaN
df_initiatives['organizer_count'] = pd.to_numeric(
    df_initiatives['organizer_count'], errors='coerce').fillna(0).astype(int)

# Team size by success
team_by_success = df_initiatives.groupby('is_successful')['organizer_count'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std')
]).round(1)
team_by_success.index = ['Unsuccessful', 'Successful']

print("Organizer Count by Success:")
print(team_by_success)

# Optimal team size
df_initiatives['team_category'] = pd.cut(df_initiatives['organizer_count'],
    bins=[-0.1, 2, 5, 10, float('inf')],
    labels=['1-2', '3-5', '6-10', '10+'],
    include_lowest=True)

team_success = df_initiatives.groupby('team_category', observed=False, dropna=False)['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
]).round(1)

print("\nSuccess Rate by Team Size:")
print(team_success)

# Multi-country organizers
def has_international_team(org_json):
    try:
        if pd.isna(org_json):
            return False
        data = json.loads(org_json)
        countries = data.get('countries_of_residence', {})
        return len(countries) > 1
    except (TypeError, ValueError, AttributeError):
        # Unparseable cells are treated as single-country teams
        return False

df_initiatives['is_international_team'] = df_initiatives['organizer_representative'].apply(has_international_team)

intl_success = df_initiatives.groupby('is_international_team')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
]).round(1)

# Label the boolean index; handle the case where only one group is present
if len(intl_success) == 2:
    intl_success.index = ['Single Country', 'Multi-Country']
elif len(intl_success) == 1:
    # Check which group exists
    if not intl_success.index[0]:
        intl_success.index = ['Single Country']
    else:
        intl_success.index = ['Multi-Country']

print("\nSuccess Rate by Team Internationality:")
print(intl_success)

Organizer Count by Success:
              count  mean  median  std
Unsuccessful    110   1.0     1.0  0.1
Successful       11   1.0     1.0  0.0

Success Rate by Team Size:
               count  successes  success_rate_%
team_category                                  
1-2              121         11             9.1
3-5                0          0             0.0
6-10               0          0             0.0
10+                0          0             0.0

Success Rate by Team Internationality:
                count  successes  success_rate_%
Single Country    121         11             9.1


<a id='question-7'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">7. How do content features affect ECI outcomes?</p>

**Visualizations:**

- Bar: Annexes present vs absent success rates
- Box plot: Number of languages vs outcome
- Stacked bar: Existing legislation vs new frameworks
- Pie: Amendment vs new law requests

In [16]:
# Q7 Analysis Code
# Annexes analysis
df_initiatives['has_annex'] = df_initiatives['annex'].notna().astype(int)

annex_success = df_initiatives.groupby('has_annex')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)
annex_success.index = ['No Annex', 'Has Annex']

print("Success Rate by Annex Presence:")
print(annex_success)

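# Language availability
import ast

def count_languages(lang_str):
    # Accepts a Python-style list literal ("['en', 'fr']") or a comma-separated string
    try:
        if isinstance(lang_str, str) and lang_str.startswith('['):
            # literal_eval parses the literal without executing arbitrary code (unlike eval)
            return len(ast.literal_eval(lang_str))
        elif isinstance(lang_str, str):
            return len(lang_str.split(','))
        return 0
    except (ValueError, SyntaxError, TypeError):
        return 0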

df_initiatives['language_count'] = df_initiatives['languages_available'].apply(count_languages)

lang_success = df_initiatives.groupby('is_successful')['language_count'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std')
]).round(1)
lang_success.index = ['Unsuccessful', 'Successful']

print("\nLanguage Count by Success:")
print(lang_success)

# Legislative target (existing vs new)
import re

def classify_legislation_target(objective, title):
    if pd.isna(objective):
        return 'Unknown'
    text = f"{title} {objective}".lower()
    
    # Check for explicit directive/regulation references
    if re.search(r'directive \d{4}/\d{1,3}', text):
        return 'Existing'
    if re.search(r'regulation \(eu\)', text):
        return 'Existing'
    if any(verb in text for verb in ['abrogate', 'amend', 'repeal']):
        return 'Existing'
    if any(verb in text for verb in ['propose legislation', 'establish', 'create']):
        return 'New'
    
    return 'Unclear'

df_initiatives['leg_target'] = df_initiatives.apply(
    lambda x: classify_legislation_target(x['objective'], x['title']), axis=1)

target_success = df_initiatives.groupby('leg_target')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)

print("\nSuccess Rate by Legislative Target:")
print(target_success)

Success Rate by Annex Presence:
           count  successes  success_rate_%
No Annex      74          9            12.2
Has Annex     47          2             4.3

Language Count by Success:
              count  mean  median  std
Unsuccessful    110  24.0    24.0  0.0
Successful       11  24.0    24.0  0.0

Success Rate by Legislative Target:
            count  successes  success_rate_%
leg_target                                  
Existing       12          2            16.7
New            25          3            12.0
Unclear        84          6             7.1

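The annex comparison above rests on small counts (9 of 74 versus 2 of 47), so the gap between 12.2% and 4.3% carries considerable uncertainty. The block below is a descriptive sketch that puts a 95% Wilson score interval around each rate, using only the counts printed in the output. The interval is an addition for illustration and is not part of the original analysis.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts copied from the "Success Rate by Annex Presence" output.
no_annex_low, no_annex_high = wilson_interval(9, 74)
has_annex_low, has_annex_high = wilson_interval(2, 47)

print(f"No Annex:  {9 / 74:.1%} (95% interval {no_annex_low:.1%} to {no_annex_high:.1%})")
print(f"Has Annex: {2 / 47:.1%} (95% interval {has_annex_low:.1%} to {has_annex_high:.1%})")
```

With these counts the two intervals overlap, so the difference in success rates is best read as a tentative pattern, not as evidence that annexes reduce success.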

<a id='question-8'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">8. What Commission engagement patterns predict implementation success?</p>

**Visualizations:**

- Grouped bar: Commission official roles met vs outcomes
- Comparison bar: Deadline presence vs law implementation rate
- Multi-panel: Follow-up activity profile by outcome
- Stacked bar: Roadmaps/workshops vs implementation

In [17]:
# Q8 Analysis Code
# Commission engagement from merger file
if 'commission_officials_met' in df_merger.columns:
    has_officials = df_merger['commission_officials_met'].notna().sum()
    print(f"ECIs with Commission officials met: {has_officials}/{len(df_merger)}")

# Deadlines analysis
if 'commission_deadlines' in df_merger.columns:
    df_merger['has_deadline'] = df_merger['commission_deadlines'].notna().astype(int)
    
    deadline_by_outcome = df_merger.groupby('has_deadline')['final_outcome_status'].value_counts()
    print("\nOutcomes by Deadline Presence:")
    print(deadline_by_outcome)

# Follow-up actions
if 'has_followup_section' in df_merger.columns:
    followup_summary = pd.DataFrame({
        'Has Roadmap': [df_merger['has_roadmap'].sum()],
        'Has Workshop': [df_merger['has_workshop'].sum()],
        'Has Partnership': [df_merger['has_partnership_programs'].sum()],
        'Total Actions': [len(df_merger)]  # number of ECIs in df_merger, not a sum of actions
    })
    
    print("\nFollow-up Actions Summary:")
    print(followup_summary)
    
    # Actions by outcome
    actions_by_outcome = df_merger.groupby('final_outcome_status').agg({
        'has_roadmap': 'sum',
        'has_workshop': 'sum',
        'has_partnership_programs': 'sum'
    })
    
    print("\nFollow-up Actions by Outcome:")
    print(actions_by_outcome)

ECIs with Commission officials met: 11/11

Outcomes by Deadline Presence:
has_deadline  final_outcome_status          
0             Rejected - Already Covered        3
              Law Active                        2
              Action Plan Created               1
              Rejected - Alternative Actions    1
1             Law Active                        2
              Law Passed                        1
              Law Promised                      1
Name: count, dtype: int64

Follow-up Actions Summary:
   Has Roadmap  Has Workshop  Has Partnership  Total Actions
0            1             3                4             11

Follow-up Actions by Outcome:
                                has_roadmap  has_workshop  \
final_outcome_status                                        
Action Plan Created                       1             1   
Law Active                                0             1   
Law Passed                                0             0   
Law Promised       

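The "Outcomes by Deadline Presence" output lists the counts as a long series, which makes the two groups hard to compare. The block below is a small sketch that lays the same counts out as a contingency table with `pd.crosstab`. The demo frame is rebuilt from the counts printed above and is not re-derived from `df_merger`. On the real data the call would take `df_merger['has_deadline']` and `df_merger['final_outcome_status']` directly.

```python
import pandas as pd

# Counts copied from the printed output (has_deadline, final_outcome_status).
records = (
    [(0, "Rejected - Already Covered")] * 3
    + [(0, "Law Active")] * 2
    + [(0, "Action Plan Created")]
    + [(0, "Rejected - Alternative Actions")]
    + [(1, "Law Active")] * 2
    + [(1, "Law Passed")]
    + [(1, "Law Promised")]
)
demo = pd.DataFrame(records, columns=["has_deadline", "final_outcome_status"])

# Rows: deadline absent (0) or present (1); columns: outcome; margins add totals.
deadline_table = pd.crosstab(
    demo["has_deadline"],
    demo["final_outcome_status"],
    margins=True,
    margins_name="Total",
)
print(deadline_table)
```

In this layout it is easy to see that all four ECIs with a Commission deadline ended in a law that is active, passed, or promised, while the four rejections all fall in the group without a deadline. With only 11 initiatives this is a description of the sample, not a general rule.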
<a id='question-9'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">9. What commission response mechanisms characterize different outcomes?</p>

**Visualizations:**

- Box plot: Number of referenced legislation pieces by outcome
- Bar: Impact assessment presence vs outcomes
- Stacked bar: Stakeholder dialogue frequency
- Small multiples: Court cases presence and outcomes

In [18]:
# Q9 Analysis Code
# Referenced legislation
if 'referenced_legislation_by_name' in df_merger.columns:
    def count_referenced_legislation(ref_json):
        # Sum the entries across all categories; 0 for missing or unparseable cells
        try:
            data = json.loads(ref_json)
            return sum(len(items) for items in data.values()
                       if isinstance(items, (list, dict)))
        except (TypeError, ValueError, AttributeError):
            return 0
    
    df_merger['referenced_leg_count'] = df_merger['referenced_legislation_by_name'].apply(count_referenced_legislation)
    
    leg_by_outcome = df_merger.groupby('final_outcome_status')['referenced_leg_count'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median')
    ]).round(1)
    
    print("Referenced Legislation by Outcome:")
    print(leg_by_outcome)

# Impact assessments and stakeholder dialogue
if 'policies_actions' in df_merger.columns:
    def has_impact_assessment(actions_json):
        # True if any listed action mentions an impact assessment
        try:
            data = json.loads(actions_json)
            return any('impact assessment' in str(action).lower() for action in data)
        except (TypeError, ValueError):
            return False
    
    df_merger['has_assessment'] = df_merger['policies_actions'].apply(has_impact_assessment)
    
    assessment_by_outcome = df_merger.groupby('final_outcome_status')['has_assessment'].agg([
        ('count', 'count'),
        ('assessments', 'sum')
    ])
    
    print("\nImpact Assessments by Outcome:")
    print(assessment_by_outcome)

# Court cases
if 'court_cases_referenced' in df_merger.columns:
    has_court = df_merger['court_cases_referenced'].notna().sum()
    print(f"\nECIs with court cases referenced: {has_court}/{len(df_merger)}")

Referenced Legislation by Outcome:
                                count  mean  median
final_outcome_status                               
Action Plan Created                 1   3.0     3.0
Law Active                          4   2.2     2.5
Law Passed                          1   2.0     2.0
Law Promised                        1   0.0     0.0
Rejected - Already Covered          3   2.0     2.0
Rejected - Alternative Actions      1   1.0     1.0

Impact Assessments by Outcome:
                                count  assessments
final_outcome_status                              
Action Plan Created                 1            0
Law Active                          4            2
Law Passed                          1            1
Law Promised                        1            1
Rejected - Already Covered          3            0
Rejected - Alternative Actions      1            0

ECIs with court cases referenced: 2/11


<a id='question-1'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">10. What Are the Key Findings?</p>

**Summary Analysis (Descriptive Only):**

- Simple comparisons: means, medians, distributions
- Correlation matrices: relationships between key variables
- Visual profiles: radar charts, parallel coordinates
- Key factors: distinguish successful from unsuccessful ECIs
- ⚠️ NO PREDICTIONS: Purely exploratory analysis

In [19]:
# # Q10 Analysis Code
# # Overall success rate
# overall_success_rate = (df_initiatives['is_successful'].sum() / len(df_initiatives)) * 100
# print(f"Overall ECI Success Rate: {overall_success_rate:.1f}%")

# # Comparative profile
# profile_comparison = df_initiatives.groupby('is_successful').agg({
#     'signatures_collected': ['mean', 'median'],
#     'collection_days': ['mean', 'median'],
#     'countries_threshold_met': 'mean',
#     'organizer_count': 'mean',
#     'language_count': 'mean',
#     'funding_total': ['mean', 'median']
# }).round(1)

# profile_comparison.index = ['Unsuccessful', 'Successful']

# print("\nSuccessful vs Unsuccessful ECI Profile Comparison:")
# print(profile_comparison)

# # Key distinguishing factors
# numeric_cols = df_initiatives.select_dtypes(include=[np.number]).columns.tolist()
# # Filter columns with enough non-null data
# numeric_cols = [col for col in numeric_cols if df_initiatives[col].notna().sum() > 10]

# if numeric_cols and 'is_successful' in df_initiatives.columns:
#     try:
#         # Compute correlation matrix
#         corr_data = df_initiatives[numeric_cols + ['is_successful']].copy()
#         # Drop columns that are all NaN or have no variance
#         corr_data = corr_data.loc[:, corr_data.notna().sum() > 10]
        
#         correlation_matrix = corr_data.corr()
        
#         if 'is_successful' in correlation_matrix.columns:
#             correlation = correlation_matrix['is_successful'].drop('is_successful').sort_values(ascending=False)
#             print("\nTop Factors Correlated with Success:")
#             print(correlation.head(10))
#     except Exception as e:
#         print(f"\nCorrelation analysis skipped: {e}")

# # Outcome distribution in successful ECIs
# if 'df_merger' in locals() and len(df_merger) > 0:
#     print("\nCommission Outcome Distribution (for successful ECIs):")
#     print(df_merger['final_outcome_status'].value_counts())
    
#     # Implementation rate
#     impl_rate = df_merger['law_implementation_date'].notna().sum() / len(df_merger) * 100
#     print(f"\nLaw Implementation Rate: {impl_rate:.1f}%")