This is the **eighth script to run** in the workflow.  

# Build EU Cabinets Dataset from ParlGov (2000–Today)

This script merges ParlGov cabinet, election, and party data to create a dataset of EU governments (excluding the UK) from 2000 onward.  

**Steps:**  
1. Load election, cabinet, and party files.  
2. Merge datasets to combine cabinet, election, and party info.  
3. Keep cabinets from 2000+ in EU member states.  
4. Select key variables (party names, ideology, seats, vote share).  
5. Save as `final_datagov.csv`.  
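
The merge steps above can be sanity-checked on a small scale. The sketch below uses invented toy tables, not real ParlGov rows, and assumes that the pair `election_id`/`party_id` identifies one row on the election side. It uses the `validate` and `indicator` options of `pd.merge` to surface duplicate keys and unmatched cabinet rows.

```python
import pandas as pd

# Toy stand-ins for view_cabinet and view_election (all values are invented).
cabinets = pd.DataFrame({
    "cabinet_id": [1, 1, 2],
    "election_id": [10, 10, 11],
    "party_id": [100, 101, 100],
    "cabinet_party": [1, 0, 1],
})
elections = pd.DataFrame({
    "election_id": [10, 10],
    "party_id": [100, 101],
    "vote_share": [35.0, 20.0],
})

# validate="many_to_one" raises an error if (election_id, party_id) is not
# unique on the right side; indicator=True adds a "_merge" column that shows
# whether each left row found a match.
merged = pd.merge(
    cabinets,
    elections,
    how="left",
    on=["election_id", "party_id"],
    validate="many_to_one",
    indicator=True,
)

# Cabinet rows without election information keep NaN in the election columns.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows,", len(unmatched), "without a matching election")
```

With a left join, unmatched cabinet rows are kept rather than dropped, so counting them before filtering shows how much election information is missing.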

In [None]:
import pandas as pd

# === Step 1: Load input datasets ===
# Source: ParlGov 2024 (elections, cabinets, parties).
# These files will be merged to build a dataset of EU cabinets.
view_election = pd.read_csv('insert/your/path/ParlGov Data/2024/view_election.csv')
view_cabinet = pd.read_csv('insert/your/path/ParlGov Data/2024/view_cabinet.csv')
view_party = pd.read_csv('insert/your/path/ParlGov Data/2024/view_party.csv')

# === Step 2: Merge cabinets with elections ===
# Match on election_id and party_id so each cabinet entry 
# includes information about the corresponding election.
cabinet_with_election = pd.merge(
    view_cabinet,
    view_election,
    how="left",
    on=["election_id", "party_id"],
    suffixes=("", "_election")
)

# === Step 3: Merge with party-level information ===
# Add party metadata (names, ideology, etc.) using party_id and country_id.
full_merged = pd.merge(
    cabinet_with_election,
    view_party,
    how="left",
    on=["party_id", "country_id"],
    suffixes=("", "_party")
)

# === Step 4: Filter by time (2000–today) ===
# Convert start_date to datetime and keep only cabinets starting in 2000 or later.
full_merged["start_date"] = pd.to_datetime(full_merged["start_date"], errors="coerce")
filtered = full_merged[full_merged["start_date"].dt.year >= 2000]

# === Step 5: Filter by geography (EU only) ===
# Keep only EU member states; the UK is excluded.
eu_countries = [
    "AUT", "BEL", "BGR", "HRV", "CYP", "CZE", "DNK", "EST", "FIN", "FRA",
    "DEU", "GRC", "HUN", "IRL", "ITA", "LVA", "LTU", "LUX", "MLT", "NLD",
    "POL", "PRT", "ROU", "SVK", "SVN", "ESP", "SWE"
]
filtered = filtered[filtered["country_name_short"].isin(eu_countries)]

# === Step 6: Select relevant variables ===
# Keep identifiers, party/cabinet names, PM flag, election info,
# and ideological indicators (left_right, state_market, etc.).
final = filtered[[
    "country_name_short", "country_name",
    "cabinet_name", "start_date",
    "party_name_short", "party_name_english", "party_name", "party_name_ascii",
    "cabinet_party", "prime_minister",
    "vote_share", "seats", "election_id", "cabinet_id",
    "left_right", "state_market", "liberty_authority", "eu_anti_pro", "cmp"
]]

# === Step 7: Save final dataset ===
# Export the cleaned dataset to CSV for further analysis.
final.to_csv('insert/your/path/ParlGov Data/2024/final_datagov.csv', index=False)

This is the **ninth script to run** in the workflow.  

# Extract EU Parties’ Climate and Ideology Data from MPDS 2025a

This script processes the Manifesto Project Dataset (2025a) to build an EU-only dataset on party identity, climate salience, and ideological positions.  

**Steps:**  
1. Load the raw MPDS 2025a file.  
2. Keep/rename variables on party metadata, climate issues, and ideology.  
3. Filter to EU member states.  
4. Save as `manifesto_final.csv`.  
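
As a minimal sketch of steps 2 and 3, the snippet below selects and renames columns through a dictionary and then filters by country. The toy frame, the party codes, and the `rename_map` name are invented for illustration; the explicit check for missing columns is an optional safeguard, not part of the original script.

```python
import pandas as pd

# Toy stand-in for the raw MPDS file (values and party codes are invented).
raw = pd.DataFrame({
    "countryname": ["Austria", "Norway", "Spain"],
    "party": [1, 2, 3],
    "per501": [12.5, 8.0, 3.1],
    "unused": [0, 0, 0],
})

# Hypothetical, shortened version of the rename dictionary.
rename_map = {
    "countryname": "country",
    "party": "party_id",
    "per501": "env_protection",
}

# Fail early with a readable message if an expected column is absent.
missing = [col for col in rename_map if col not in raw.columns]
if missing:
    raise KeyError(f"Columns not found in the input file: {missing}")

# Keep only the mapped columns, then rename them in one pass.
reduced = raw[list(rename_map)].rename(columns=rename_map)

# Keep only rows whose country appears in the (shortened) EU list.
eu_only = reduced[reduced["country"].isin(["Austria", "Spain"])]
print(eu_only)
```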

In [None]:
import pandas as pd

# === Step 1: Load input dataset ===
# Source: Manifesto Project Dataset (MPDS) 2025a.
# Contains coded party manifestos across countries and elections.
df = pd.read_csv('insert/your/path/Manifesto Data/MPDataset_MPDS2025a.csv')

# === Step 2: Select and rename relevant variables ===
# Keep only variables needed for Green Deal analysis:
# - Party metadata (country, ID, family, election date)
# - Environmental/climate salience (proportion of manifesto text)
# - Ideological indicators (left-right, economy, welfare, peace)
columns_to_keep = {
    # Identification and metadata
    "countryname": "country",               
    "party": "party_id",                    
    "partyname": "party_name",              
    "parfam": "party_family",               
    "edate": "election_date",               

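    # Environmental & climate salience variables
    # NOTE: the target names below are this project's own labels. In the MPDS
    # codebook, per410 is "Economic Growth: Positive", per416 is "Anti-Growth
    # Economy: Positive", per416_2 is "Sustainability: Positive", and per502 is
    # "Culture: Positive"; confirm these are the intended categories.
    "per501": "env_protection",
    "per410": "nuclear_energy",
    "per416": "energy_climate",
    "per416_2": "renewables",
    "per502": "sustainable_dev",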

    # Summary ideological indicators
    "rile": "left_right",                   
    "planeco": "planned_economy",           
    "markeco": "market_economy",            
    "welfare": "welfare_state",             
    "intpeace": "international_peace"       
}

df_reduced = df[list(columns_to_keep.keys())].rename(columns=columns_to_keep)

# === Step 3: Filter by geography (EU only) ===
# Keep only current EU member states (27 countries, excluding UK).
eu_countries = [
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czech Republic", "Denmark",
    "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland",
    "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands",
    "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden"
]
filtered = df_reduced[df_reduced["country"].isin(eu_countries)]

# === Step 4: Save final dataset ===
# Export the cleaned EU-only dataset to CSV for further analysis.
filtered.to_csv('insert/your/path/Manifesto Data/manifesto_final.csv', index=False)


This is the **tenth script to run** in the workflow.  

# Merge EU Cabinets with Closest Manifesto Data

This script links EU government cabinets (ParlGov) with the closest party manifesto (MPDS 2025a) in time, creating a combined dataset for analysis of party positions and government participation.  

**Steps:**  
1. Load cleaned cabinet (ParlGov) and manifesto (MPDS) datasets.  
2. Convert cabinet start dates and manifesto election dates to datetime.  
3. Rename manifesto election date for clarity (`cmp_election_date`).  
4. For each cabinet-party, find the closest manifesto in time.  
5. Build a merged dataset combining cabinet and manifesto variables.  
6. Save the full merged dataset (`merged_gov_manifesto.csv`).  
7. Filter and save only parties that were cabinet members (`merged_gov_manifesto_cabinet_only.csv`).  
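
The closest-in-time matching in step 4 can also be expressed without an explicit loop. The sketch below is an alternative formulation, not the script's own method: it uses `pd.merge_asof` with `direction="nearest"` on invented toy data. It assumes both date columns are already datetimes with no missing values and that the party keys have the same dtype on both sides; rows that violate this would need to be dropped or handled first, and tie-breaking may differ from a row-by-row search.

```python
import pandas as pd

# Toy stand-ins for the cabinet and manifesto tables (values are invented).
govs = pd.DataFrame({
    "cabinet_name": ["Gov A", "Gov B", "Gov C"],
    "cmp": [1, 1, 2],
    "start_date": pd.to_datetime(["2001-03-01", "2010-06-01", "2005-01-01"]),
})
manifestos = pd.DataFrame({
    "party_id": [1, 1, 2],
    "cmp_election_date": pd.to_datetime(["2000-11-01", "2009-09-01", "2006-05-01"]),
    "env_protection": [1.0, 2.0, 3.0],
})

# merge_asof requires both frames to be sorted by their date key.
# left_by/right_by restrict candidates to manifestos of the same party;
# direction="nearest" picks the manifesto with the smallest time gap.
matched = pd.merge_asof(
    govs.sort_values("start_date"),
    manifestos.sort_values("cmp_election_date"),
    left_on="start_date",
    right_on="cmp_election_date",
    left_by="cmp",
    right_by="party_id",
    direction="nearest",
)
print(matched[["cabinet_name", "start_date", "cmp_election_date", "env_protection"]])
```

The result is ordered by `start_date`, so it may need re-sorting to match the original row order.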

In [None]:
import pandas as pd

# === Step 1: Load input datasets ===
# Load cleaned government (ParlGov) and manifesto (MPDS) datasets.
df_datagov = pd.read_csv('insert/your/path/ParlGov Data/2024/final_datagov.csv')
df_manifesto = pd.read_csv('insert/your/path/Manifesto Data/manifesto_final.csv')

# === Step 2: Ensure dates are in datetime format ===
# Convert start_date (cabinets) and election_date (manifestos) to proper datetime objects.
df_datagov["start_date"] = pd.to_datetime(df_datagov["start_date"], errors="coerce")
df_manifesto["election_date"] = pd.to_datetime(df_manifesto["election_date"], errors="coerce")

# === Step 3: Rename columns for clarity ===
# Avoid confusion during merging by renaming manifesto election_date.
df_manifesto = df_manifesto.rename(columns={"election_date": "cmp_election_date"})

# === Step 4: Match closest manifesto to each cabinet start date ===
# For each cabinet-party combination:
# - Find all manifestos of the same party
# - Compute time difference between cabinet start and manifesto election
# - Select the manifesto closest in time
merged_rows = []

for _, row in df_datagov.iterrows():
    party_id = row["cmp"]  
    gov_date = row["start_date"]
    
    # Filter manifesto entries for the same party
    party_manifestos = df_manifesto[df_manifesto["party_id"] == party_id].copy()
    
    if not party_manifestos.empty:
        # Compute absolute time difference
        party_manifestos["date_diff"] = (party_manifestos["cmp_election_date"] - gov_date).abs()
        
        # Select closest manifesto
        closest = party_manifestos.loc[party_manifestos["date_diff"].idxmin()]
        
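        # Merge government and manifesto info.
        # Prefix every manifesto column with "cmp_" so that names shared with
        # ParlGov (e.g. left_right, party_name) are kept rather than silently dropped.
        merged_row = row.to_dict()
        for key, val in closest.items():
            merged_row[key if key.startswith("cmp_") else f"cmp_{key}"] = val
        merged_rows.append(merged_row)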
    else:
        # If no manifesto is found, keep only government info
        merged_rows.append(row.to_dict())

# === Step 5: Convert results into DataFrame ===
df_merged_closest = pd.DataFrame(merged_rows)

# === Step 6: Save full merged dataset ===
df_merged_closest.to_csv(
    'insert/your/path/merged_gov_manifesto.csv', 
    index=False
)

# === Step 7: Keep only actual cabinet parties ===
# Filter to parties that were cabinet members (cabinet_party == 1).
df_merged_closest_cabinet = df_merged_closest[df_merged_closest["cabinet_party"] == 1]

# Save dataset of cabinet-only parties.
df_merged_closest_cabinet.to_csv(
    'insert/your/path/merged_gov_manifesto_cabinet_only.csv', 
    index=False
)


This is the **eleventh script to run** in the workflow.  

# Append New Governments to Merged Cabinet–Manifesto Dataset

This script updates the merged cabinet–manifesto dataset by appending additional government records and reordering the data.  

**Steps:**  
1. Load existing merged dataset and new governments file.  
2. Concatenate them into a single DataFrame.  
3. Convert `start_date` to datetime.  
4. Sort by country and start date.  
5. Reset index for a clean sequence.  
6. Save final dataset as `final_merged_gov_manifesto.csv`.  
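
A small sketch of the append-and-sort steps, using invented rows. It assumes the new governments file may lack some manifesto columns, in which case `pd.concat` fills them with `NaN` for the appended rows.

```python
import pandas as pd

# Toy stand-ins for the existing dataset and the new governments file.
existing = pd.DataFrame({
    "country_name_short": ["ITA", "AUT"],
    "cabinet_name": ["Cabinet A", "Cabinet B"],
    "start_date": ["2008-05-08", "2007-01-11"],
    "cmp_env_protection": [4.2, 6.1],
})
new_rows = pd.DataFrame({
    "country_name_short": ["AUT"],
    "cabinet_name": ["Cabinet C"],
    "start_date": ["2020-01-07"],
})

# Columns are aligned by name; columns absent from new_rows become NaN.
combined = pd.concat([existing, new_rows], ignore_index=True)

# Parse dates, then sort by country and chronologically within country.
combined["start_date"] = pd.to_datetime(combined["start_date"], errors="coerce")
combined = (
    combined
    .sort_values(by=["country_name_short", "start_date"])
    .reset_index(drop=True)
)
print(combined)
```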

In [None]:
import pandas as pd

# === Step 1: Load input datasets ===
# - merged_gov_manifesto_cabinet_only.csv: cabinets + matched manifestos
# - new_govs.csv: additional government entries to append
df_govs = pd.read_csv('insert/your/path/merged_gov_manifesto_cabinet_only.csv')
df_new_govs = pd.read_csv('insert/your/path/new_govs.csv')

# === Step 2: Merge vertically (append) ===
# Concatenate the two datasets into a single DataFrame.
merged_df = pd.concat([df_govs, df_new_govs], ignore_index=True)

# === Step 3: Convert dates ===
# Ensure start_date is properly parsed as datetime.
merged_df['start_date'] = pd.to_datetime(merged_df['start_date'], errors='coerce')

# === Step 4: Sort observations ===
# Order by country (alphabetical) and start_date (chronological).
merged_df = merged_df.sort_values(by=['country_name_short', 'start_date'], ascending=[True, True])

# === Step 5: Reset index ===
# Clean up index after sorting.
merged_df = merged_df.reset_index(drop=True)

# === Step 6: Save final dataset ===
# Export combined dataset with appended governments.
merged_df.to_csv('insert/your/path/final_merged_gov_manifesto.csv', index=False)

This is the **twelfth script to run** in the workflow.  

# Aggregate Cabinets with Weighted Party Positions

This script aggregates government-level characteristics by weighting party attributes (from manifestos) according to electoral strength.  

**Steps:**  
1. Load merged cabinet–manifesto dataset.  
2. Keep and rename relevant identifiers, cabinet info, and ideological/policy variables.  
3. Define weights (vote share or seats) and list dimensions to aggregate.  
4. Aggregate each cabinet:  
   - Coalition size, technical cabinet flag  
   - Total seats and vote share  
   - Weighted averages of policy/ideological dimensions  
5. Merge back full country names.  
6. Sort by country and start date.  
7. Save final aggregated dataset (`aggregated_governments_with_weighted_avg.csv`).  
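
The weighted average in step 4 can be illustrated in isolation. This is a minimal sketch under stated assumptions: `weighted_mean` is a hypothetical helper, the numbers are invented, and parties lacking either the value or the weight are excluded as a pair so that values and weights stay aligned.

```python
import numpy as np
import pandas as pd

def weighted_mean(values, weights):
    """Weighted mean over rows where both value and weight are present."""
    pairs = pd.DataFrame({"value": values, "weight": weights}).dropna()
    if pairs.empty or pairs["weight"].sum() == 0:
        return np.nan
    return float(np.average(pairs["value"], weights=pairs["weight"]))

# Toy cabinet with three parties; the third has no left-right score.
cabinet = pd.DataFrame({
    "Left-Right": [3.0, 7.0, np.nan],
    "Vote Share": [30.0, 10.0, 5.0],
})

# (3.0 * 30 + 7.0 * 10) / (30 + 10); the third party is skipped entirely.
position = weighted_mean(cabinet["Left-Right"], cabinet["Vote Share"])
print(position)
```

Dropping incomplete pairs together matters: removing missing values and missing weights separately could pair one party's score with another party's vote share.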

In [None]:
import pandas as pd
import numpy as np

# === Step 1: Load input dataset ===
# Source: merged cabinet–manifesto dataset (final_merged_gov_manifesto.csv).
df_gov = pd.read_csv('insert/your/path/final_merged_gov_manifesto.csv')

# === Step 2: Select and rename relevant variables ===
# Keep identifiers, cabinet/party details, and policy/ideological dimensions.
columns_to_keep = {
    'country_name_short': 'Country Code', 
    'country_name': 'Country',
    'cabinet_name': 'Cabinet',
    'start_date': 'Start Date',
    'party_name_short': 'Party Code',
    'party_name_ascii': 'Party Name',
    'vote_share': 'Vote Share',
    'seats': 'Seats',
    'left_right': 'Left-Right',
    'state_market': 'State-Market',
    'liberty_authority': 'Liberty-Authority',
    'eu_anti_pro': 'EU Anti-Pro',
    'cmp_env_protection': 'Environment Protection',
    'cmp_nuclear_energy': 'Nuclear Energy',
    'cmp_energy_climate': 'Climate Energy',
    'cmp_renewables': 'Renewables',
    'cmp_sustainable_dev': 'Sustainable Development',
    'cmp_welfare_state': 'Welfare State',
    'cmp_international_peace': 'International Peace'
}
df_govs = df_gov[list(columns_to_keep.keys())].rename(columns=columns_to_keep)

# === Step 3: Define weighting and policy dimensions ===
# Aggregation is weighted by vote share (can be changed to seats if needed).
weight_column = 'Vote Share'
columns_to_aggregate = [
    'Left-Right', 'State-Market', 'Liberty-Authority', 'EU Anti-Pro',
    'Environment Protection', 'Nuclear Energy', 'Climate Energy',
    'Renewables', 'Sustainable Development', 'Welfare State', 'International Peace'
]

# === Step 4: Define aggregation function per cabinet ===
# For each cabinet:
# - Store identifiers
# - Compute coalition size and technical cabinet flag
# - Compute total seats and vote share
# - Compute weighted average of ideological/policy dimensions
def aggregate_government(group):
    result = {}
    result['Country Code'] = group['Country Code'].iloc[0]
    result['Cabinet'] = group['Cabinet'].iloc[0]
    result['Start Date'] = group['Start Date'].iloc[0]
    result['Coalition Size'] = group['Party Code'].nunique()
    result['Technical Cabinet'] = not group[weight_column].notna().any()
    result['Total Seats'] = group['Seats'].sum(min_count=1)
    result['Total Vote Share'] = group['Vote Share'].sum(min_count=1)

    for col in columns_to_aggregate:
        # Use only parties that have both a value and a weight, so that each
        # value stays paired with its own party's weight.
        valid = group[[col, weight_column]].dropna()

        if valid.empty or valid[weight_column].sum() == 0:
            result[col] = np.nan
        else:
            result[col] = np.average(valid[col], weights=valid[weight_column])

    return pd.Series(result)

# === Step 5: Apply aggregation per government ===
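# Iterating over the groups (instead of groupby.apply) keeps the grouping
# columns available inside aggregate_government across pandas versions.
aggregated_df = pd.DataFrame([
    aggregate_government(group)
    for _, group in df_govs.groupby(['Country Code', 'Cabinet', 'Start Date'])
])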

# === Step 6: Merge country names ===
# Add back the full country name next to country codes.
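# Keep one name per code so the merge below cannot duplicate cabinets.
country_map = df_govs[['Country Code', 'Country']].dropna().drop_duplicates(subset='Country Code')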
aggregated_df = aggregated_df.merge(country_map, on='Country Code', how='left')
cols = aggregated_df.columns.tolist()
cols.insert(cols.index('Country Code') + 1, cols.pop(cols.index('Country')))
aggregated_df = aggregated_df[cols]

# === Step 7: Clean and sort dataset ===
aggregated_df['Start Date'] = pd.to_datetime(aggregated_df['Start Date'], errors='coerce')
aggregated_df = aggregated_df.sort_values(by=['Country Code', 'Start Date'], ascending=[True, True]).reset_index(drop=True)

# === Step 8: Save final dataset ===
aggregated_df.to_csv('insert/your/path/aggregated_governments_with_weighted_avg.csv', index=False)