## How we got the data!
We used two data sources: "europarl.europa.eu" and "howtheyvote.eu/api". This Jupyter notebook outlines how each CSV file was created.
Note: the code shows the CSV files being saved to "~/Desktop". This was only to make it easier to check the values inside each CSV file. Three CSV files were made.


### mep_ref.csv
Uses the European Parliament's open API to get every MEP's name, ID, country of representation, and the EU group they are part of. The code calls the GET /meps endpoint, iterates through every record in the JSON response (each record is identified by its "id"), and stores the "mep_id", "name", "country", and "european_group". The cell below writes the result to "meps_ref.csv".

In [3]:
from flask import Flask, jsonify
import requests
import pandas as pd


app = Flask(__name__)

API_URL = "https://data.europarl.europa.eu/api/v2/meps"
HEADERS = {
    "Accept": "application/json"
}

@app.route("/meps")
def getMEPs():
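    try:
        resp = requests.get(API_URL, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()

        meps_raw = data.get("data", [])
        mep_records = []

        for mep in meps_raw:
            # Skip anything that is not a JSON object
            if not isinstance(mep, dict):
                continue

            # The MEP ID is the last path segment of the "id" field
            mep_id_raw = mep.get("id", "")
            mep_id = mep_id_raw.split("/")[-1]
            name = mep.get("label", "")
            # NOTE: the "country" field name is an assumption. The original cell
            # never defined this variable, so check it against the API response.
            country_of_representation = mep.get("country", "")
            # "or {}" guards against an explicit null politicalGroup
            european_group = (mep.get("politicalGroup") or {}).get("label", "")

            if mep_id and name:
                mep_records.append({
                    "mep_id": mep_id,
                    "name": name,
                    "country": country_of_representation,
                    "european_group": european_group
                })

        meps_df = pd.DataFrame(mep_records)
        meps_df.to_csv("meps_ref.csv", index=False)

        return jsonify(mep_records)
    except (requests.RequestException, ValueError) as e:
        # Report request or JSON-decoding failures instead of crashing the route
        return jsonify({"error": str(e)}), 502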
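
The row-parsing step can be tried without calling the API. This is a minimal sketch, assuming each record has an "id" whose last path segment is the MEP ID and a "label" holding the name, as in the cell above. The function name `parse_meps` and the sample payload are made up for illustration, and the country and group fields are left out because their exact field names are not confirmed here.

```python
def parse_meps(payload):
    """Return a list of {"mep_id", "name"} dicts from an API-style payload."""
    records = []
    for mep in payload.get("data", []):
        if not isinstance(mep, dict):
            continue  # skip malformed rows
        # Keep only the last path segment of the "id" field
        mep_id = str(mep.get("id", "")).split("/")[-1]
        name = mep.get("label", "")
        if mep_id and name:
            records.append({"mep_id": mep_id, "name": name})
    return records


# Hypothetical sample payload, not real API output
sample = {"data": [
    {"id": "person/1", "label": "Jane Doe"},
    "not-a-dict",
    {"id": "", "label": "No Id"},
]}
print(parse_meps(sample))
```

Only the first sample record survives: the string is skipped by the type check and the record with an empty "id" is dropped by the final filter.
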

### votes_ref.csv

Uses 'howtheyvote.eu/api' to get every "vote_id" from the past 5 years, along with the date of the vote and its title. This file is used to build the next CSV file.

In [3]:
import requests
import pandas as pd
from datetime import datetime, timedelta

base_url = "https://howtheyvote.eu/api/votes/search"
cutoff_date = datetime.now() - timedelta(days=5 * 365)

vote_list = []
page = 1
page_size = 1000  #overrides the 20 default 

while True:
    params = {
        "page": page,
        "page_size": page_size,
        "sort_by": "timestamp",
        "sort_order": "desc"
    }

    response = requests.get(base_url, params=params)
    data = response.json()

    results = data.get("results", [])
    if not results:
        break

    reached_cutoff = False
    for vote in results:
        timestamp_str = vote.get("timestamp", "")[:10]
        vote_date = datetime.strptime(timestamp_str, "%Y-%m-%d")
        # Results are sorted newest first, so everything after this is older
        if vote_date < cutoff_date:
            reached_cutoff = True
            break

        vote_list.append({
            "vote_id": vote["id"],
            "timestamp": timestamp_str,
            "display_title": vote.get("display_title", "")
        })

    # If we reached a vote older than 5 years, stop fetching more pages
    if reached_cutoff:
        break

    page += 1

# --- Save to CSV ---
df_votes = pd.DataFrame(vote_list)
df_votes.to_csv("~/Desktop/votes_ref.csv", index=False)
print(f"Saved {len(df_votes)} vote IDs to ~/Desktop/votes_ref.csv")

🔁 Fetching page 1...
🔁 Fetching page 2...
🔁 Fetching page 3...
🔁 Fetching page 4...
🔁 Fetching page 5...
🔁 Fetching page 6...
🔁 Fetching page 7...
🔁 Fetching page 8...
🔁 Fetching page 9...
🔁 Fetching page 10...
⛔ Reached older than 5 years — stopping.
✅ Saved 1844 vote IDs to votes_past_5_years.csv
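
The paging logic relies on the results being sorted newest first, so the loop can stop at the first vote older than the cutoff. The sketch below isolates that idea so it can be checked without network access. The function `collect_recent`, its `fetch_page` parameter, and the in-memory `fake_pages` are hypothetical stand-ins for the API call, not part of the original notebook.

```python
from datetime import datetime


def collect_recent(fetch_page, cutoff):
    """Collect votes newer than `cutoff` from pages sorted newest first."""
    votes = []
    page = 1
    while True:
        results = fetch_page(page)
        if not results:
            break  # no more pages
        for vote in results:
            day = vote["timestamp"][:10]
            if datetime.strptime(day, "%Y-%m-%d") < cutoff:
                return votes  # every later vote is older, so stop here
            votes.append({"vote_id": vote["id"], "timestamp": day})
        page += 1
    return votes


# Hypothetical in-memory pages standing in for the API
fake_pages = {
    1: [{"id": 3, "timestamp": "2024-05-01T10:00:00"},
        {"id": 2, "timestamp": "2023-01-15T09:30:00"}],
    2: [{"id": 1, "timestamp": "2018-03-02T12:00:00"}],
}
recent = collect_recent(lambda p: fake_pages.get(p, []), datetime(2020, 1, 1))
print([v["vote_id"] for v in recent])
```

Passing the page fetcher in as a function keeps the stopping rule separate from the HTTP request, which makes the rule easy to test on its own.
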


### master_votes.csv

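Uses the "https://howtheyvote.eu/api/votes/{vote_id}.csv" endpoint to download how each MEP voted for every vote ID collected in the previous step, then combines the results into one CSV file. The cell below saves the combined data as "mep_votes_combined.csv".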

In [9]:
import requests
import pandas as pd
from io import StringIO

# Load vote IDs
df_votes = pd.read_csv("~/Desktop/votes_ref.csv")  # file written by the votes_ref.csv cell
all_dfs = []

for i, vote_id in enumerate(df_votes['vote_id']):
    print(f"[{i+1}/{len(df_votes)}] Fetching vote ID: {vote_id}")

    url = f"https://howtheyvote.eu/api/votes/{vote_id}.csv"
    try:
        response = requests.get(url)
        if response.status_code == 200:
            vote_df = pd.read_csv(StringIO(response.text))
            vote_df["vote_id"] = vote_id
            all_dfs.append(vote_df)
        else:
            print(f"Failed to fetch vote {vote_id} (status code: {response.status_code})")
    except Exception as e:
        print(f"Error fetching vote {vote_id}: {e}")

    # Save a checkpoint every 100 votes (skipped if nothing has been fetched yet)
    if (i + 1) % 100 == 0 and all_dfs:
        checkpoint_df = pd.concat(all_dfs, ignore_index=True)
        checkpoint_df.to_csv("~/Desktop/mep_votes_checkpoint.csv", index=False)
        print(f"Checkpoint saved after {i + 1} votes.")

# Final output (pd.concat raises ValueError on an empty list, so guard it)
if all_dfs:
    combined_df = pd.concat(all_dfs, ignore_index=True)
    combined_df.to_csv("~/Desktop/mep_votes_combined.csv", index=False)
    print("Saved full dataset to ~/Desktop/mep_votes_combined.csv")
else:
    print("No votes were fetched; nothing saved.")

📥 Downloading CSV for vote ID 176731...
📥 Downloading CSV for vote ID 176681...
📥 Downloading CSV for vote ID 176688...
📥 Downloading CSV for vote ID 176844...
📥 Downloading CSV for vote ID 176873...
📥 Downloading CSV for vote ID 176281...
📥 Downloading CSV for vote ID 176089...
📥 Downloading CSV for vote ID 175579...
📥 Downloading CSV for vote ID 175594...
📥 Downloading CSV for vote ID 175398...
📥 Downloading CSV for vote ID 175667...
📥 Downloading CSV for vote ID 175610...
📥 Downloading CSV for vote ID 175399...
📥 Downloading CSV for vote ID 176343...
📥 Downloading CSV for vote ID 176309...
📥 Downloading CSV for vote ID 175252...
📥 Downloading CSV for vote ID 175563...
📥 Downloading CSV for vote ID 175396...
📥 Downloading CSV for vote ID 175535...
📥 Downloading CSV for vote ID 175231...
✅ Saved 14380 MEP votes to ~/Desktop/mep_votes_combined.csv
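
The core of this step is tagging every downloaded row with the vote it belongs to before stacking the rows together. Here is a small sketch of that idea using only the standard library's `csv` module instead of pandas, so it runs without extra installs. The function `combine_vote_csvs` and the sample CSV bodies are hypothetical; in particular, the column names "member_id" and "position" are assumptions and may not match the real files.

```python
import csv
from io import StringIO


def combine_vote_csvs(csv_by_vote_id):
    """Parse each vote's CSV text and tag every row with its vote_id."""
    rows = []
    for vote_id, text in csv_by_vote_id.items():
        for row in csv.DictReader(StringIO(text)):
            row["vote_id"] = vote_id  # remember which vote the row came from
            rows.append(row)
    return rows


# Hypothetical CSV bodies; the real column names may differ
samples = {
    101: "member_id,position\n1,FOR\n2,AGAINST\n",
    102: "member_id,position\n1,ABSTENTION\n",
}
combined = combine_vote_csvs(samples)
print(len(combined), combined[0])
```

Without the added "vote_id" column, rows from different votes would be indistinguishable once combined.
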


In [1]:
import pandas as pd

# Load the full MEP list written by the first cell.
# The original path, "meps_list.csv", raised FileNotFoundError.
df = pd.read_csv("meps_ref.csv")

# Get all unique MEP IDs
mep_ids = df["mep_id"].dropna().astype(int).unique()

# Generate headshot URLs
headshots = []
for mep_id in mep_ids:
    headshots.append({
        "mep_id": mep_id,
        "photo_url": f"https://www.europarl.europa.eu/mepphoto/{mep_id}.jpg"
    })

# Save to Desktop
output_path = "~/Desktop/mep_headshots.csv"
pd.DataFrame(headshots).to_csv(output_path, index=False)

print(f"✅ Saved {len(headshots)} headshots to: {output_path}")