<a id='introduction'></a>
# <p style="padding:15px;background-color:#fff798;margin:10px 0;color:#435672;font-family:'Arial',sans-serif;text-align:center;border-radius:15px 50px;overflow:hidden;font-weight:600">üá™üá∫üèõÔ∏è European Citizens' Initiative: Commission Response</p>

<div align="center">
  <img src="LOGO CE_RGB_MUTE_POS.svg" alt="EU Commission Logo" height="200" style="display:inline-block; margin:10px;">
  <img src="1_2021_1-1.jpg" alt="ECI Material" height="200" style="display:inline-block; margin:10px;">
</div>

<p style="text-align:center;">
  <i>Source: European Citizens' Initiative | European Commission (CC BY 4.0)</i>
</p>

Examines Commission responses to [European Citizens' Initiative proposals](https://commission.europa.eu/get-involved/engage-eu-policymaking/european-citizens-initiative_en) that successfully met signature thresholds between 2012 and 2025. Once an ECI collects 1 million signatures from at least 7 member states, the Commission must provide a formal response within 6 months explaining whether it will propose new legislation. This dataset tracks 11 out of 16 ECIs that successfully met both signature criteria (1M+ signatures and 7-country thresholds)‚Äîanalyzing Commission response types, implementation timelines, parliamentary engagement, and follow-up actions.

This analysis focuses exclusively on what happens after ECIs meet signature requirements, building upon the previous [**üá™üá∫‚úçÔ∏è European Citizens' Initiative: Signature Collection**]() study which examined all 121 registered ECIs. It does not cover the registration approval process itself, including which proposed ECIs were refused registration or how to prepare a successful registration application‚Äî[more about this](https://citizens-initiative.europa.eu/how-it-works_en).

Success in this analysis is measured by Commission outcome categories (Law Active, Law Promised, Rejected, etc.) and implementation status, not by whether proposals were substantively "correct" or how individual organizers interpret their outcomes.

<a id='table-of-contents'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">üß≠ Table of content</p>

[üåü **Introduction**](#introduction)

[‚ùì **Questions to Ask:**](#question-1)
- [1. Success Patterns](#question-1)
- [2. Temporal Patterns](#question-2)
- [3. Parliament Actions](#question-3)
- [4. Funding Patterns](#question-4)
- [5. Geographic Strategies](#question-5)
- [6. Organizational](#question-6)
- [7. Content Features](#question-7)
- [8. Commission Engagement](#question-8)
- [9. Response Mechanisms](#question-9)
- [10. Key Findings](#question-10)


<a id='setup'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">‚öôÔ∏è Setup: Import Libraries and Load Data</p>

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
import warnings
warnings.filterwarnings('ignore')

# Set defaults
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)

# Load main datasets
# Load the dataset
data_folder = "../data/2025-09-18_16-33-57"

df_initiatives = pd.read_csv(f'{data_folder}/eci_initiatives_2025-11-04_11-59-38.csv')
df_merger = pd.read_csv(f'{data_folder}/eci_merger_responses_and_followup_2025-12-15_15-33-12.csv')

print(f"‚úì Initiatives file: {df_initiatives.shape[0]} ECIs")
print(f"‚úì Merger file: {df_merger.shape[0]} Commission responses")
print(f"\nColumns: {len(df_initiatives.columns)} initiative columns")
print(f"        {len(df_merger.columns)} response columns")

‚úì Initiatives file: 121 ECIs
‚úì Merger file: 11 Commission responses

Columns: 26 initiative columns
        36 response columns


<a id='data-cleaning'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">üßπ Data Cleaning and Feature Engineering</p>

In [None]:
# Convert date columns
date_cols = ['timeline_registered', 'timeline_collection_start_date', 
             'timeline_collection_closed', 'timeline_verification_start',
             'timeline_verification_end', 'timeline_response_commission_date']

for col in date_cols:
    if col in df_initiatives.columns:
        df_initiatives[col] = pd.to_datetime(df_initiatives[col], errors='coerce')

# Parse JSON fields if needed
def safe_json_load(x):
    try:
        return json.loads(x) if pd.notna(x) else None
    except:
        return None

# Example: Parse organizer data
if 'organizer_representative' in df_initiatives.columns:
    df_initiatives['organizer_data'] = df_initiatives['organizer_representative'].apply(safe_json_load)

# Create derived features
# Collection duration
df_initiatives['collection_days'] = (df_initiatives['timeline_collection_closed'] - 
                                     df_initiatives['timeline_collection_start_date']).dt.days

# Define success outcomes
success_outcomes = ['Commission Response', 'Answered initiative', 'Valid initiative']
df_initiatives['is_successful'] = df_initiatives['final_outcome'].isin(success_outcomes).astype(int)

# Success rate by year
df_initiatives['registration_year'] = df_initiatives['timeline_registered'].dt.year

# Convert signatures_collected to numeric (FIX HERE)
if 'signatures_collected' in df_initiatives.columns:
    df_initiatives['signatures_collected'] = pd.to_numeric(
        df_initiatives['signatures_collected'], errors='coerce')

# Signature threshold met (convert if string)
if 'signatures_threshold_met' in df_initiatives.columns:
    if df_initiatives['signatures_threshold_met'].dtype == 'object':
        df_initiatives['signatures_threshold_met'] = pd.to_numeric(
            df_initiatives['signatures_threshold_met'], errors='coerce')

# Create signature volume categories (AFTER conversion to numeric)
df_initiatives['signature_category'] = pd.cut(
    df_initiatives['signatures_collected'],
    bins=[0, 100000, 500000, 1000000, float('inf')],
    labels=['<100k', '100k-500k', '500k-1M', '>1M'],
    include_lowest=True)

‚úì Feature engineering complete


<a id='question-1'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">1. What are the success patterns and outcome distributions for ECIs?</p>

<div class="analysis-components">
    <strong>Visualizations:</strong>
    <ul>
        <li>Pie chart: Outcome distribution (Law Active, Rejected, etc.)</li>
        <li>Scatter plot: Total signatures vs outcome type</li>
        <li>Grouped bar: Success rate by signature volume</li>
        <li>Data tables: Outcome counts, signature statistics</li>
    </ul>
</div>

In [7]:
# Q1 Analysis Code
# Outcome distribution
outcome_dist = df_initiatives['final_outcome'].value_counts()
print("Outcome Distribution:")
print(outcome_dist)
print(f"\nSuccess Rate: {df_initiatives['is_successful'].mean()*100:.1f}%")

# Signature statistics by outcome
sig_by_outcome = df_initiatives.groupby('final_outcome')['signatures_collected'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('max', 'max')
]).round(0).sort_values('count', ascending=False)

print("\nSignature Statistics by Outcome:")
print(sig_by_outcome)

# Success rate by signature category
if 'signature_category' in df_initiatives.columns:
    cat_success = df_initiatives.groupby('signature_category')['is_successful'].agg([
        ('count', 'count'),
        ('successes', 'sum'),
        ('success_rate_%', lambda x: (x.sum()/len(x))*100)
    ]).round(1)
    print("\nSuccess Rate by Signature Volume:")
    print(cat_success)

Outcome Distribution:
final_outcome
Unsuccessful Collection    71
Withdrawn                  27
Commission Response        11
Name: count, dtype: int64

Success Rate: 9.1%

Signature Statistics by Outcome:
                         count   mean  median    max
final_outcome                                       
Unsuccessful Collection      1  254.0   254.0  254.0
Withdrawn                    1  291.0   291.0  291.0
Commission Response          0    NaN     NaN    NaN

Success Rate by Signature Volume:
                    count  successes  success_rate_%
signature_category                                  
<100k                   2          0             0.0
100k-500k               0          0             NaN
500k-1M                 0          0             NaN
>1M                     0          0             NaN


<a id='question-2'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">2. What are the key temporal patterns from submission through implementation?</p>

**Visualizations:**

- Timeline waterfall: Collection ‚Üí Verification ‚Üí Response ‚Üí Implementation
- Box plot: Response time distribution (successful vs rejected)
- Scatter: Verification duration vs outcome correlation
- Data tables: Median times by outcome, milestone durations

In [8]:
# Q2 Analysis Code
# Calculate timing metrics
df_initiatives['verification_days'] = (df_initiatives['timeline_verification_end'] - 
                                       df_initiatives['timeline_verification_start']).dt.days

df_initiatives['response_days'] = (df_initiatives['timeline_response_commission_date'] - 
                                   df_initiatives['timeline_collection_closed']).dt.days

# Response time by outcome (merger file)
if 'commission_submission_date' in df_merger.columns and 'law_implementation_date' in df_merger.columns:
    df_merger['commission_submission_date'] = pd.to_datetime(df_merger['commission_submission_date'], errors='coerce')
    df_merger['law_implementation_date'] = pd.to_datetime(df_merger['law_implementation_date'], errors='coerce')
    df_merger['response_lag_days'] = (df_merger['law_implementation_date'] - 
                                      df_merger['commission_submission_date']).dt.days
    
    print("Commission Response Lag Statistics:")
    print(df_merger[['registration_number', 'response_lag_days']].dropna().describe())

# Collection duration comparison
collection_by_success = df_initiatives.groupby('is_successful')['collection_days'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std')
]).round(1)
collection_by_success.index = ['Unsuccessful', 'Successful']

print("\nCollection Duration (Days) by Success:")
print(collection_by_success)

Commission Response Lag Statistics:
       response_lag_days
count           3.000000
mean         1808.000000
std          1473.196864
min           681.000000
25%           974.500000
50%          1268.000000
75%          2371.500000
max          3475.000000

Collection Duration (Days) by Success:
              count   mean  median    std
Unsuccessful     31  388.5   365.0  127.3
Successful        5  325.2   365.0  267.5


<a id='question-3'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">3. How do optional Parliament actions correlate with Commission decisions?</p>

**Visualizations:**

- Histogram: Time from submission to Parliament hearing
- Histogram: Time from hearing to Commission decision
- Histogram: Time from plenary debate to Commission decision
- Stacked bar: Parliament engagement presence vs outcomes

In [9]:
# Q3 Analysis Code
# Parliament engagement from merger file
if 'parliament_hearing_date' in df_merger.columns:
    df_merger['parliament_hearing_date'] = pd.to_datetime(df_merger['parliament_hearing_date'], errors='coerce')
    df_merger['commission_submission_date'] = pd.to_datetime(df_merger['commission_submission_date'], errors='coerce')
    
    df_merger['hearing_to_submission_days'] = (df_merger['parliament_hearing_date'] - 
                                              df_merger['commission_submission_date']).dt.days
    
    print("Days from Submission to Parliament Hearing:")
    print(df_merger['hearing_to_submission_days'].describe())

# Plenary debate timing
if 'plenary_debate_date' in df_merger.columns:
    df_merger['plenary_debate_date'] = pd.to_datetime(df_merger['plenary_debate_date'], errors='coerce')
    df_merger['plenary_to_outcome_days'] = (df_merger['final_outcome_status'].notna().astype(int))
    
    # Check which ECIs have plenary debates
    has_plenary = df_merger['plenary_debate_date'].notna().sum()
    has_hearing = df_merger['parliament_hearing_date'].notna().sum()
    
    print(f"\nParliament Engagement:")
    print(f"  Hearings: {has_hearing}/{len(df_merger)} ECIs")
    print(f"  Plenary debates: {has_plenary}/{len(df_merger)} ECIs")
    
    # Compare outcomes by plenary presence
    plenary_by_outcome = df_merger.groupby(df_merger['plenary_debate_date'].notna())['final_outcome_status'].value_counts()
    print("\nOutcomes by Plenary Debate Presence:")
    print(plenary_by_outcome)

Days from Submission to Parliament Hearing:
count     11.000000
mean     111.181818
std       71.019460
min       41.000000
25%       64.000000
50%      109.000000
75%      119.000000
max      279.000000
Name: hearing_to_submission_days, dtype: float64

Parliament Engagement:
  Hearings: 11/11 ECIs
  Plenary debates: 7/11 ECIs

Outcomes by Plenary Debate Presence:
plenary_debate_date  final_outcome_status          
False                Law Active                        2
                     Rejected - Already Covered        1
                     Rejected - Alternative Actions    1
True                 Rejected - Already Covered        2
                     Action Plan Created               1
                     Being Studied                     1
                     Law Active                        1
                     Law Approved                      1
                     Law Promised                      1
Name: count, dtype: int64


<a id='question-4'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">4. What funding patterns distinguish successful ECIs from unsuccessful ones?</p>

**Visualizations:**

- Grouped bar: Average funding (successful vs unsuccessful)
- Histogram: Number of sponsors distribution with success overlay
- Comparison bar: Private vs organizational sponsor success rates
- Stacked bar: Funding thresholds (‚Ç¨50k, ‚Ç¨100k+) vs outcomes

In [12]:
# Q4 Analysis Code
# Convert funding_total to numeric first (FIX HERE)
if 'funding_total' in df_initiatives.columns:
    df_initiatives['funding_total'] = pd.to_numeric(
        df_initiatives['funding_total'], errors='coerce')

# Funding by success status
funding_by_success = df_initiatives.groupby('is_successful')['funding_total'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std'),
    ('max', 'max')
]).round(2)
funding_by_success.index = ['Unsuccessful', 'Successful']

print("Funding Statistics by Success:")
print(funding_by_success)

# Funding threshold analysis
df_initiatives['funding_level'] = pd.cut(df_initiatives['funding_total'],
    bins=[-1, 0, 50000, 100000, float('inf')],
    labels=['None', '‚Ç¨0-50k', '‚Ç¨50k-100k', '‚Ç¨100k+'])

funding_threshold = df_initiatives.groupby('funding_level')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)

print("\nSuccess Rate by Funding Level:")
print(funding_threshold)

# Sponsor analysis if available
if 'funding_by' in df_initiatives.columns:
    # Parse funding sources (JSON)
    def count_sponsors(x):
        try:
            return len(json.loads(x)) if pd.notna(x) else 0
        except:
            return 0
    
    df_initiatives['sponsors_count'] = df_initiatives['funding_by'].apply(count_sponsors)
    
    sponsor_by_success = df_initiatives.groupby('is_successful')['sponsors_count'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median')
    ]).round(1)
    sponsor_by_success.index = ['Unsuccessful', 'Successful']
    
    print("\nSponsor Count by Success:")
    print(sponsor_by_success)

Funding Statistics by Success:
              count   mean  median  std    max
Unsuccessful      1  500.0   500.0  NaN  500.0
Successful        0    NaN     NaN  NaN    NaN

Success Rate by Funding Level:
               count  successes  success_rate_%
funding_level                                  
None               0          0             NaN
‚Ç¨0-50k             1          0             0.0
‚Ç¨50k-100k          0          0             NaN
‚Ç¨100k+             0          0             NaN

Sponsor Count by Success:
              count  mean  median
Unsuccessful    110   2.0     0.0
Successful       11  21.9     8.0


<a id='question-5'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">5. Which geographic strategies correlate with ECI success?</p>

**Visualizations:**

- Network/heatmap: Most frequent country combinations
- Comparison bar: Success rates when Germany/France meet thresholds
- Scatter: Number of countries vs outcome
- Box plot: Geographic diversity index by outcome

In [16]:
# Q5 Analysis Code
# Countries meeting threshold analysis
if 'signatures_collected_by_country' in df_initiatives.columns:
    def count_countries_met_threshold(sig_by_country_json):
        try:
            if pd.isna(sig_by_country_json):
                return 0
            data = json.loads(sig_by_country_json)
            countries_met = sum(1 for country, stats in data.items() 
                              if float(stats.get('percentage', 0)) >= 100)
            return countries_met
        except:
            return 0
    
    df_initiatives['countries_threshold_met'] = df_initiatives['signatures_collected_by_country'].apply(
        count_countries_met_threshold)
    
    # Convert to numeric to ensure proper aggregation
    df_initiatives['countries_threshold_met'] = pd.to_numeric(
        df_initiatives['countries_threshold_met'], errors='coerce').fillna(0).astype(int)
    
    # Countries by success
    country_by_success = df_initiatives.groupby('is_successful')['countries_threshold_met'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median'),
        ('std', 'std')
    ]).round(1)
    country_by_success.index = ['Unsuccessful', 'Successful']
    
    print("Countries Meeting Threshold by Success:")
    print(country_by_success)
    
    # Success rate by country threshold distribution
    df_initiatives['country_category'] = pd.cut(
        df_initiatives['countries_threshold_met'],
        bins=[-0.1, 3, 7, 12, 30],
        labels=['0-3', '4-7', '8-12', '13+'],
        include_lowest=True)
    
    country_success = df_initiatives.groupby('country_category', observed=False, dropna=False)['is_successful'].agg([
        ('count', 'count'),
        ('successes', 'sum'),
        ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
    ]).round(1)
    
    print("\nSuccess Rate by Country Diversity:")
    print(country_success)

Countries Meeting Threshold by Success:
              count  mean  median  std
Unsuccessful    110   0.0     0.0  0.0
Successful       11   0.0     0.0  0.0

Success Rate by Country Diversity:
                  count  successes  success_rate_%
country_category                                  
0-3                 121         11             9.1
4-7                   0          0             0.0
8-12                  0          0             0.0
13+                   0          0             0.0


<a id='question-6'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">6. What organizational characteristics optimize ECI success?</p>

**Visualizations:**

- Histogram: Team size distribution with success overlay
- Bar: Multiple vs single representative success rates
- Comparison: Multi-country vs single-country teams
- Box plot: Optimal team size by outcome

In [19]:
# Q6 Analysis Code
# Parse organizer structure
def parse_organizer_count(org_json):
    try:
        if pd.isna(org_json):
            return 0
        data = json.loads(org_json)
        return data.get('number_of_people', 0)
    except:
        return 0

df_initiatives['organizer_count'] = df_initiatives['organizer_representative'].apply(parse_organizer_count)

# Convert to numeric and handle NaN
df_initiatives['organizer_count'] = pd.to_numeric(
    df_initiatives['organizer_count'], errors='coerce').fillna(0).astype(int)

# Team size by success
team_by_success = df_initiatives.groupby('is_successful')['organizer_count'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std')
]).round(1)
team_by_success.index = ['Unsuccessful', 'Successful']

print("Organizer Count by Success:")
print(team_by_success)

# Optimal team size
df_initiatives['team_category'] = pd.cut(df_initiatives['organizer_count'],
    bins=[-0.1, 2, 5, 10, float('inf')],
    labels=['1-2', '3-5', '6-10', '10+'],
    include_lowest=True)

team_success = df_initiatives.groupby('team_category', observed=False, dropna=False)['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
]).round(1)

print("\nSuccess Rate by Team Size:")
print(team_success)

# Multi-country organizers
def has_international_team(org_json):
    try:
        if pd.isna(org_json):
            return False
        data = json.loads(org_json)
        countries = data.get('countries_of_residence', {})
        return len(countries) > 1
    except:
        return False

df_initiatives['is_international_team'] = df_initiatives['organizer_representative'].apply(has_international_team)

intl_success = df_initiatives.groupby('is_international_team')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100 if len(x) > 0 else 0)
]).round(1)

# FIX: Only rename index if we have both groups
if len(intl_success) == 2:
    intl_success.index = ['Single Country', 'Multi-Country']
elif len(intl_success) == 1:
    # Check which group exists
    if intl_success.index[0] == False:
        intl_success.index = ['Single Country']
    else:
        intl_success.index = ['Multi-Country']

print("\nSuccess Rate by Team Internationality:")
print(intl_success)

Organizer Count by Success:
              count  mean  median  std
Unsuccessful    110   1.0     1.0  0.1
Successful       11   1.0     1.0  0.0

Success Rate by Team Size:
               count  successes  success_rate_%
team_category                                  
1-2              121         11             9.1
3-5                0          0             0.0
6-10               0          0             0.0
10+                0          0             0.0

Success Rate by Team Internationality:
                count  successes  success_rate_%
Single Country    121         11             9.1


<a id='question-7'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">7. How do content features affect ECI outcomes?</p>

**Visualizations:**

- Bar: Annexes present vs absent success rates
- Box plot: Number of languages vs outcome
- Stacked bar: Existing legislation vs new frameworks
- Pie: Amendment vs new law requests

In [20]:
# Q7 Analysis Code
# Annexes analysis
df_initiatives['has_annex'] = df_initiatives['annex'].notna().astype(int)

annex_success = df_initiatives.groupby('has_annex')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)
annex_success.index = ['No Annex', 'Has Annex']

print("Success Rate by Annex Presence:")
print(annex_success)

# Language availability
def count_languages(lang_str):
    try:
        if isinstance(lang_str, str) and lang_str.startswith('['):
            return len(eval(lang_str))
        elif isinstance(lang_str, str):
            return len(lang_str.split(','))
        return 0
    except:
        return 0

df_initiatives['language_count'] = df_initiatives['languages_available'].apply(count_languages)

lang_success = df_initiatives.groupby('is_successful')['language_count'].agg([
    ('count', 'count'),
    ('mean', 'mean'),
    ('median', 'median'),
    ('std', 'std')
]).round(1)
lang_success.index = ['Unsuccessful', 'Successful']

print("\nLanguage Count by Success:")
print(lang_success)

# Legislative target (existing vs new)
import re

def classify_legislation_target(objective, title):
    if pd.isna(objective):
        return 'Unknown'
    text = f"{title} {objective}".lower()
    
    # Check for explicit directive/regulation references
    if re.search(r'directive \d{4}/\d{1,3}', text):
        return 'Existing'
    if re.search(r'regulation \(eu\)', text):
        return 'Existing'
    if any(verb in text for verb in ['abrogate', 'amend', 'repeal']):
        return 'Existing'
    if any(verb in text for verb in ['propose legislation', 'establish', 'create']):
        return 'New'
    
    return 'Unclear'

df_initiatives['leg_target'] = df_initiatives.apply(
    lambda x: classify_legislation_target(x['objective'], x['title']), axis=1)

target_success = df_initiatives.groupby('leg_target')['is_successful'].agg([
    ('count', 'count'),
    ('successes', 'sum'),
    ('success_rate_%', lambda x: (x.sum()/len(x))*100)
]).round(1)

print("\nSuccess Rate by Legislative Target:")
print(target_success)

Success Rate by Annex Presence:
           count  successes  success_rate_%
No Annex      74          9            12.2
Has Annex     47          2             4.3

Language Count by Success:
              count  mean  median  std
Unsuccessful    110  24.0    24.0  0.0
Successful       11  24.0    24.0  0.0

Success Rate by Legislative Target:
            count  successes  success_rate_%
leg_target                                  
Existing       12          2            16.7
New            25          3            12.0
Unclear        84          6             7.1


<a id='question-8'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">8. What Commission engagement patterns predict implementation success?</p>

**Visualizations:**

- Grouped bar: Commission official roles met vs outcomes
- Comparison bar: Deadline presence vs law implementation rate
- Multi-panel: Follow-up activity profile by outcome
- Stacked bar: Roadmaps/workshops vs implementation

In [21]:
# Q8 Analysis Code
# Commission engagement from merger file
if 'commission_officials_met' in df_merger.columns:
    has_officials = df_merger['commission_officials_met'].notna().sum()
    print(f"ECIs with Commission officials met: {has_officials}/{len(df_merger)}")

# Deadlines analysis
if 'commission_deadlines' in df_merger.columns:
    df_merger['has_deadline'] = df_merger['commission_deadlines'].notna().astype(int)
    
    deadline_by_outcome = df_merger.groupby('has_deadline')['final_outcome_status'].value_counts()
    print("\nOutcomes by Deadline Presence:")
    print(deadline_by_outcome)

# Follow-up actions
if 'has_followup_section' in df_merger.columns:
    followup_summary = pd.DataFrame({
        'Has Roadmap': [df_merger['has_roadmap'].sum()],
        'Has Workshop': [df_merger['has_workshop'].sum()],
        'Has Partnership': [df_merger['has_partnership_programs'].sum()],
        'Total Actions': [len(df_merger)]
    })
    
    print("\nFollow-up Actions Summary:")
    print(followup_summary)
    
    # Actions by outcome
    actions_by_outcome = df_merger.groupby('final_outcome_status').agg({
        'has_roadmap': 'sum',
        'has_workshop': 'sum',
        'has_partnership_programs': 'sum'
    })
    
    print("\nFollow-up Actions by Outcome:")
    print(actions_by_outcome)

ECIs with Commission officials met: 11/11

Outcomes by Deadline Presence:
has_deadline  final_outcome_status          
0             Rejected - Already Covered        3
              Law Active                        2
              Action Plan Created               1
              Rejected - Alternative Actions    1
1             Being Studied                     1
              Law Active                        1
              Law Approved                      1
              Law Promised                      1
Name: count, dtype: int64

Follow-up Actions Summary:
   Has Roadmap  Has Workshop  Has Partnership  Total Actions
0            1             3                4             11

Follow-up Actions by Outcome:
                                has_roadmap  has_workshop  \
final_outcome_status                                        
Action Plan Created                       1             1   
Being Studied                             0             0   
Law Active                    

<a id='question-9'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">9. What commission response mechanisms characterize different outcomes?</p>

**Visualizations:**

- Box plot: Number of referenced legislation pieces by outcome
- Bar: Impact assessment presence vs outcomes
- Stacked bar: Stakeholder dialogue frequency
- Small multiples: Court cases presence and outcomes

In [22]:
# Q9 Analysis Code
# Referenced legislation
if 'referenced_legislation_by_name' in df_merger.columns:
    def count_referenced_legislation(ref_json):
        try:
            data = json.loads(ref_json)
            count = 0
            for category, items in data.items():
                if isinstance(items, list):
                    count += len(items)
                elif isinstance(items, dict):
                    count += len(items)
            return count
        except:
            return 0
    
    df_merger['referenced_leg_count'] = df_merger['referenced_legislation_by_name'].apply(count_referenced_legislation)
    
    leg_by_outcome = df_merger.groupby('final_outcome_status')['referenced_leg_count'].agg([
        ('count', 'count'),
        ('mean', 'mean'),
        ('median', 'median')
    ]).round(1)
    
    print("Referenced Legislation by Outcome:")
    print(leg_by_outcome)

# Impact assessments and stakeholder dialogue
if 'policies_actions' in df_merger.columns:
    def has_impact_assessment(actions_json):
        try:
            data = json.loads(actions_json)
            return any('impact assessment' in str(action).lower() for action in data)
        except:
            return False
    
    df_merger['has_assessment'] = df_merger['policies_actions'].apply(has_impact_assessment)
    
    assessment_by_outcome = df_merger.groupby('final_outcome_status')['has_assessment'].agg([
        ('count', 'count'),
        ('assessments', 'sum')
    ])
    
    print("\nImpact Assessments by Outcome:")
    print(assessment_by_outcome)

# Court cases
if 'court_cases_referenced' in df_merger.columns:
    has_court = df_merger['court_cases_referenced'].notna().sum()
    print(f"\nECIs with court cases referenced: {has_court}/{len(df_merger)}")

Referenced Legislation by Outcome:
                                count  mean  median
final_outcome_status                               
Action Plan Created                 1   3.0     3.0
Being Studied                       1   1.0     1.0
Law Active                          3   2.7     3.0
Law Approved                        1   2.0     2.0
Law Promised                        1   0.0     0.0
Rejected - Already Covered          3   2.0     2.0
Rejected - Alternative Actions      1   1.0     1.0

Impact Assessments by Outcome:
                                count  assessments
final_outcome_status                              
Action Plan Created                 1            0
Being Studied                       1            1
Law Active                          3            1
Law Approved                        1            1
Law Promised                        1            1
Rejected - Already Covered          3            0
Rejected - Alternative Actions      1            0

ECIs 

<a id='question-1'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">10. What Are the Key Findings?</p>

**Summary Analysis (Descriptive Only):**

- Simple comparisons: means, medians, distributions
- Correlation matrices: relationships between key variables
- Visual profiles: radar charts, parallel coordinates
- Key factors: distinguish successful from unsuccessful ECIs
- ‚ö†Ô∏è NO PREDICTIONS: Purely exploratory analysis

In [24]:
# Q10 Analysis Code
# Overall success rate
overall_success_rate = (df_initiatives['is_successful'].sum() / len(df_initiatives)) * 100
print(f"Overall ECI Success Rate: {overall_success_rate:.1f}%")

# Comparative profile
profile_comparison = df_initiatives.groupby('is_successful').agg({
    'signatures_collected': ['mean', 'median'],
    'collection_days': ['mean', 'median'],
    'countries_threshold_met': 'mean',
    'organizer_count': 'mean',
    'language_count': 'mean',
    'funding_total': ['mean', 'median']
}).round(1)

profile_comparison.index = ['Unsuccessful', 'Successful']

print("\nSuccessful vs Unsuccessful ECI Profile Comparison:")
print(profile_comparison)

# Key distinguishing factors - FIX HERE
numeric_cols = df_initiatives.select_dtypes(include=[np.number]).columns.tolist()
# Filter columns with enough non-null data
numeric_cols = [col for col in numeric_cols if df_initiatives[col].notna().sum() > 10]

if numeric_cols and 'is_successful' in df_initiatives.columns:
    try:
        # Compute correlation matrix
        corr_data = df_initiatives[numeric_cols + ['is_successful']].copy()
        # Drop columns that are all NaN or have no variance
        corr_data = corr_data.loc[:, corr_data.notna().sum() > 10]
        
        correlation_matrix = corr_data.corr()
        
        if 'is_successful' in correlation_matrix.columns:
            correlation = correlation_matrix['is_successful'].drop('is_successful').sort_values(ascending=False)
            print("\nTop Factors Correlated with Success:")
            print(correlation.head(10))
    except Exception as e:
        print(f"\nCorrelation analysis skipped: {e}")

# Outcome distribution in successful ECIs
if 'df_merger' in locals() and len(df_merger) > 0:
    print("\nCommission Outcome Distribution (for successful ECIs):")
    print(df_merger['final_outcome_status'].value_counts())
    
    # Implementation rate
    impl_rate = df_merger['law_implementation_date'].notna().sum() / len(df_merger) * 100
    print(f"\nLaw Implementation Rate: {impl_rate:.1f}%")

Overall ECI Success Rate: 9.1%

Successful vs Unsuccessful ECI Profile Comparison:
             signatures_collected        collection_days         \
                             mean median            mean median   
Unsuccessful                272.5  272.5           388.5  365.0   
Successful                    NaN    NaN           325.2  365.0   

             countries_threshold_met organizer_count language_count  \
                                mean            mean           mean   
Unsuccessful                     0.0             1.0           24.0   
Successful                       0.0             1.0           24.0   

             funding_total         
                      mean median  
Unsuccessful         500.0  500.0  
Successful             NaN    NaN  

Correlation analysis skipped: DataFrame.sort_values() missing 1 required positional argument: 'by'

Commission Outcome Distribution (for successful ECIs):
final_outcome_status
Law Active                        3
Reject

<a id='question-1'></a>
## <p style="padding:10px;background-color:#fff798;margin:0;color:#435672;font-family:newtimesroman;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">üíæ Export Analysis Results to CSV</p>

In [26]:
# Export all Q1-Q10 results
def export_all_results(df_initiatives, df_merger, output_dir='eda_data_output'):
    import os
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    
    # Q1 Results
    df_initiatives['final_outcome'].value_counts().to_csv(f'{output_dir}/Q1_outcome_distribution.csv')
    
    # Q2 Temporal
    temp_data = df_initiatives.groupby('is_successful')['collection_days'].agg([
        ('Count', 'count'),
        ('Mean_Days', 'mean'),
        ('Median_Days', 'median')
    ])
    temp_data.index = ['Unsuccessful', 'Successful']
    temp_data.to_csv(f'{output_dir}/Q2_temporal_patterns.csv')
    
    # Q3 Parliament (from merger)
    if 'parliament_hearing_date' in df_merger.columns:
        parliament_data = df_merger.groupby(df_merger['parliament_hearing_date'].notna())['final_outcome_status'].value_counts()
        parliament_data.to_csv(f'{output_dir}/Q3_parliament_engagement.csv')
    
    # Q4 Funding
    funding_data = df_initiatives.groupby('is_successful')['funding_total'].agg([
        ('Count', 'count'),
        ('Mean_EUR', 'mean'),
        ('Median_EUR', 'median'),
        ('Max_EUR', 'max')
    ])
    funding_data.index = ['Unsuccessful', 'Successful']
    funding_data.to_csv(f'{output_dir}/Q4_funding_patterns.csv')
    
    # Q5 Geographic
    if 'countries_threshold_met' in df_initiatives.columns:
        geo_data = df_initiatives.groupby('is_successful')['countries_threshold_met'].agg([
            ('Count', 'count'),
            ('Mean_Countries', 'mean'),
            ('Median_Countries', 'median')
        ])
        geo_data.index = ['Unsuccessful', 'Successful']
        geo_data.to_csv(f'{output_dir}/Q5_geographic_patterns.csv')
    
    # Q6 Organizational
    if 'organizer_count' in df_initiatives.columns:
        org_data = df_initiatives.groupby('is_successful')['organizer_count'].agg([
            ('Count', 'count'),
            ('Mean_Organizers', 'mean'),
            ('Median_Organizers', 'median')
        ])
        org_data.index = ['Unsuccessful', 'Successful']
        org_data.to_csv(f'{output_dir}/Q6_organizational_characteristics.csv')
    
    # Q7 Content
    if 'leg_target' in df_initiatives.columns:
        content_data = df_initiatives.groupby('leg_target')['is_successful'].agg([
            ('Count', 'count'),
            ('Successes', 'sum'),
            ('Success_Rate_%', lambda x: (x.sum()/len(x))*100)
        ]).round(1)
        content_data.to_csv(f'{output_dir}/Q7_content_features.csv')
    
    # Q8 Commission Engagement (from merger)
    if 'has_deadline' in df_merger.columns:
        commission_data = df_merger.groupby('has_deadline')['final_outcome_status'].value_counts()
        commission_data.to_csv(f'{output_dir}/Q8_commission_engagement.csv')
    
    # Q9 Response Mechanisms
    if 'referenced_leg_count' in df_merger.columns:
        response_data = df_merger.groupby('final_outcome_status')['referenced_leg_count'].agg([
            ('Count', 'count'),
            ('Mean_References', 'mean')
        ]).round(1)
        response_data.to_csv(f'{output_dir}/Q9_response_mechanisms.csv')
    
    # Q10 Summary
    summary_data = df_initiatives.groupby('is_successful').agg({
        'signatures_collected': 'mean',
        'collection_days': 'mean',
        'funding_total': 'mean'
    }).round(2)
    summary_data.index = ['Unsuccessful', 'Successful']
    summary_data.to_csv(f'{output_dir}/Q10_key_findings_summary.csv')
    
    print(f"‚úì All results exported to '{output_dir}/' directory")
    print(f"‚úì Files created: Q1_outcome_distribution.csv through Q10_key_findings_summary.csv")

# Run export function
export_all_results(df_initiatives, df_merger)

‚úì All results exported to 'eda_data_output/' directory
‚úì Files created: Q1_outcome_distribution.csv through Q10_key_findings_summary.csv
