# Joe Garcia
# 12/14/25
# Professor Ledon
# Data 620 - Web Analytics



# Week 4 Assignment: Network Centrality Plan (2016 Election)
## Centrality measures + predicting outcomes for nodes  
## Dataset choice: FEC OpenFEC API (campaign finance) – 2016 cycle
  


This notebook outlines a proposed approach for analyzing network centrality using publicly available campaign finance data from the 2016 United States federal election. The objective of this assignment is to identify a suitable network dataset, define a meaningful graph structure, and describe how node-level centrality measures, specifically degree centrality, could be used to predict outcomes for individual nodes. The dataset is sourced from the Federal Election Commission’s OpenFEC API and is well suited for network analysis because it captures contribution relationships between political committees and candidates and also provides categorical node attributes such as party affiliation. In accordance with the assignment requirements, this notebook serves as a planning and conceptual template; no data are loaded and no models are estimated at this stage.

## 1) Data source (web/API)

### OpenFEC API (Federal Election Commission)
We will use the **OpenFEC API** to retrieve *candidates, committees,* and *itemized receipts* and then build a network from contribution relationships.

- Official docs: https://api.open.fec.gov/developers/  
- Key idea: recipients are typically **committees** (candidates receive funds through their committees).

### Why this fits the assignment
- **Network data:** contribution links (donor committee → recipient committee/candidate committee)
- **Categorical variable per node:** candidate **party** (Democrat/Republican/Other), plus optional committee type
- **2016 election constraint:** we use `two_year_transaction_period = 2016` (the 2015–2016 cycle)


## 2) Network definition

### Graph type (bipartite)
We’ll build a bipartite graph with:

- Node set A: committees (PACs, party committees, etc.)
- Node set B: candidates (or candidate committees, depending on modeling choice)

### Edges
- An edge exists when committee A contributes to candidate B (usually via the candidate’s authorized committee).
- Optional edge weight = total dollars contributed (sum) over the cycle.

### Node attributes
**Candidates:**
- `party` (categorical)
- `office` (e.g., President)
- `incumbent_challenge` (optional, if available)

**Committees:**
- committee type/designation (optional categorical variables)
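
To make the structure above concrete, here is a minimal sketch in plain Python (no NetworkX required) of how the two node sets, their attributes, and the weighted edges could be represented. All IDs, party codes, committee types, and dollar amounts are made-up placeholders, and `is_bipartite_edge` is a hypothetical helper rather than part of any library.

```python
# Toy node tables; every ID and value here is a made-up placeholder.
candidates = {
    "CAND_A": {"party": "DEM", "office": "P"},
    "CAND_B": {"party": "REP", "office": "P"},
}
committees = {
    "CMTE_1": {"committee_type": "PAC"},
    "CMTE_2": {"committee_type": "party"},
}

# Weighted edges: (committee, candidate) -> total dollars over the cycle.
edges = {
    ("CMTE_1", "CAND_A"): 5000.0,
    ("CMTE_1", "CAND_B"): 2500.0,
    ("CMTE_2", "CAND_B"): 10000.0,
}

def is_bipartite_edge(edge):
    """True when the edge runs from node set A (committees) to node set B (candidates)."""
    donor, recipient = edge
    return donor in committees and recipient in candidates

# Every edge must connect the two node sets, never two nodes of the same set.
all_valid = all(is_bipartite_edge(e) for e in edges)

# Unweighted view: the set of distinct donor committees per candidate.
donors_by_candidate = {c: set() for c in candidates}
for donor, recipient in edges:
    donors_by_candidate[recipient].add(donor)
```

The size of each set in `donors_by_candidate` is the raw degree that the later centrality comparison relies on; the dollar weights are carried along but do not affect it.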


## 3) High-level loading plan (2016 cycle)

This plan does not need to be run today, but it could be executed later:

1. **Set cycle:** `two_year_transaction_period = 2016`  
2. **Pull candidate list** for the 2016 cycle (filter to office = President if you want a smaller dataset).  
3. **Pull each candidate’s authorized committees** (or a candidate-to-committee mapping).
4. **Pull contributions** that flow into those candidate committees for 2016 using `/schedules/schedule_a/`.
5. **Aggregate** contributions by `(donor_committee_id, candidate_id)`:
   - `weight = sum(contribution_receipt_amount)`  
   - `edge_exists = 1 if any contributions`
6. Build the graph in **NetworkX**:
   - compute **degree centrality** for candidates
7. Compare **degree centrality by party**.
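
Steps 3 to 5 can be sketched offline with a few toy records. This is a minimal sketch under stated assumptions: the receipt field `contributor_id` (standing in for the donor committee ID) is assumed rather than confirmed against the API schema, `aggregate_edges` is a hypothetical helper, and all IDs and amounts are invented.

```python
from collections import defaultdict

# Step 3 (toy): candidate -> authorized committee IDs; all IDs are placeholders.
candidate_committees = {
    "CAND_A": ["CMTE_A1"],
    "CAND_B": ["CMTE_B1", "CMTE_B2"],
}

# Step 4 (toy): Schedule A-style receipts. "contributor_id" is an ASSUMED field name.
receipts = [
    {"committee_id": "CMTE_A1", "contributor_id": "PAC_1", "contribution_receipt_amount": 1000.0},
    {"committee_id": "CMTE_A1", "contributor_id": "PAC_1", "contribution_receipt_amount": 500.0},
    {"committee_id": "CMTE_B1", "contributor_id": "PAC_1", "contribution_receipt_amount": 2000.0},
    {"committee_id": "CMTE_B2", "contributor_id": "PAC_2", "contribution_receipt_amount": 750.0},
]

def aggregate_edges(receipts, candidate_committees):
    """Step 5: sum receipts into one weighted edge per (donor_committee_id, candidate_id)."""
    # Invert the mapping so each recipient committee points at its candidate.
    committee_to_candidate = {
        cmte: cand for cand, cmtes in candidate_committees.items() for cmte in cmtes
    }
    weights = defaultdict(float)
    for rec in receipts:
        cand = committee_to_candidate.get(rec["committee_id"])
        donor = rec.get("contributor_id")
        if cand is None or donor is None:
            continue  # skip receipts that cannot be tied to a candidate or a donor
        weights[(donor, cand)] += rec["contribution_receipt_amount"]
    return dict(weights)

edge_weights = aggregate_edges(receipts, candidate_committees)
```

Each key in `edge_weights` corresponds to `edge_exists = 1`, and its value is the optional dollar weight.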


In [3]:
import os
import time
import requests
import pandas as pd
import networkx as nx

BASE_URL = "https://api.open.fec.gov/v1"
API_KEY = os.getenv("OPENFEC_API_KEY", "DEMO_KEY")  # set env var OPENFEC_API_KEY for a personal key
CYCLE = 2016

session = requests.Session()
session.params = {"api_key": API_KEY}

def get_paginated(url, params=None, max_pages=5, per_page=100, sleep_s=0.2):
    """Fetch up to max_pages pages from an endpoint that uses page-number pagination.

    NOTE: /schedules/schedule_a/ uses keyset pagination (a different pattern).
    """
    params = params or {}
    out = []
    for page in range(1, max_pages + 1):
        r = session.get(url, params={**params, "page": page, "per_page": per_page})
        r.raise_for_status()
        results = r.json().get("results", [])
        out.extend(results)
        # A short page means there is nothing left to fetch.
        if len(results) < per_page:
            break
        time.sleep(sleep_s)
    return out


In [4]:

candidates_df = pd.DataFrame(columns=[
    "candidate_id", "name", "party", "office", "state", "district"
])
candidates_df.head()


Unnamed: 0,candidate_id,name,party,office,state,district


In [5]:

candidate_committees = {}  # placeholder: candidate_id -> list of authorized committee IDs


## 4) Contributions data (Schedule A)

For itemized receipts (including contributions), OpenFEC provides **Schedule A**:
- Endpoint: `/schedules/schedule_a/`
- Important: this endpoint uses **keyset pagination** (not simple page numbers).
- You can filter by `two_year_transaction_period = 2016` and by committee IDs.
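
Keyset pagination can be illustrated without touching the network. In the sketch below, `fake_fetch` is a hypothetical stand-in for the API call, the rows are invented, and the cursor key `last_index` is used only for illustration; the real response supplies its own cursor keys under `pagination.last_indexes`.

```python
# Toy "server-side" data, sorted by a unique index; values are invented.
ROWS = [{"sub_id": i, "amount": 10.0 * i} for i in range(1, 8)]

def fake_fetch(params):
    """Hypothetical stand-in for a Schedule A request: returns rows after the cursor."""
    after = params.get("last_index", 0)
    batch = [r for r in ROWS if r["sub_id"] > after][: params["per_page"]]
    # Hand back a cursor describing the last row served, or None when exhausted.
    cursor = {"last_index": batch[-1]["sub_id"]} if batch else None
    return {"results": batch, "pagination": {"last_indexes": cursor}}

def fetch_all(per_page=3):
    """Follow the cursor until the stand-in endpoint returns no more rows."""
    params = {"per_page": per_page}
    results = []
    while True:
        payload = fake_fetch(params)
        batch = payload["results"]
        if not batch:
            break
        results.extend(batch)
        cursor = payload["pagination"]["last_indexes"]
        if not cursor:
            break
        # The cursor is merged into the next request instead of a page number.
        params.update(cursor)
    return results

all_rows = fetch_all()
```

The design point is that each request says "give me rows after this one" rather than "give me page k", so the result set stays consistent even when the underlying table is very large.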


In [8]:

def get_schedule_a_for_committee(committee_id, cycle=CYCLE, max_records=5000, sleep_s=0.2):
    url = f"{BASE_URL}/schedules/schedule_a/"
    params = {
        "committee_id": committee_id,
        "two_year_transaction_period": cycle,
        "per_page": 100,
        "sort": "-contribution_receipt_date"
    }
    results = []
    last_indexes = None

    while len(results) < max_records:
        if last_indexes:
            params.update(last_indexes)
        r = session.get(url, params=params)
        r.raise_for_status()
        payload = r.json()
        batch = payload.get("results", [])
        if not batch:
            break
        results.extend(batch)

        # keyset pagination cursor
        pagination = payload.get("pagination", {})
        last_indexes = pagination.get("last_indexes", None)
        if not last_indexes:
            break
        time.sleep(sleep_s)

    return results[:max_records]  # trim any overshoot from the final batch



In [10]:
edges_df = pd.DataFrame(columns=[
    "donor_committee_id", "candidate_id", "amount_sum"
])

B = nx.Graph()

for _, row in candidates_df.iterrows():
    B.add_node(row["candidate_id"], node_type="candidate", party=row["party"], office=row["office"])

for _, e in edges_df.iterrows():
    donor = e["donor_committee_id"]
    cand = e["candidate_id"]
    B.add_node(donor, node_type="committee")
    B.add_edge(donor, cand, weight=float(e["amount_sum"]))  # weight optional

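cand_nodes = [n for n, d in B.nodes(data=True) if d.get("node_type") == "candidate"]
# degree_centrality ignores edge weights and divides each node's degree by (n - 1),
# where n counts every node in B (candidates and committees together).
degree_centrality = nx.degree_centrality(B)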
cand_degree = pd.DataFrame({
    "candidate_id": cand_nodes,
    "degree_centrality": [degree_centrality[n] for n in cand_nodes],
})



candidates_df["candidate_id"] = candidates_df["candidate_id"].astype(str).str.strip()
cand_degree["candidate_id"] = cand_degree["candidate_id"].astype(str).str.strip()

candidates_df = candidates_df[candidates_df["candidate_id"].ne("nan")]
cand_degree = cand_degree[cand_degree["candidate_id"].ne("nan")]

cand_degree = cand_degree.merge(candidates_df[["candidate_id", "name", "party"]], on="candidate_id", how="left")
cand_degree.head()


Unnamed: 0,degree_centrality,candidate_id,name,party


## 5) Hypothetical prediction outcome (degree centrality × categorical group)

For the 2016 election network, a realistic node-level outcome to predict for each candidate is win vs. lose (binary) or vote share (continuous). Degree centrality is defined here as the number of unique committees that contributed to a candidate during the 2016 cycle. We hypothesize that candidates with higher degree centrality are more likely to win, because a larger number of distinct contributing committees indicates broader fundraising support rather than dependence on a few sources. Because each candidate has a categorical attribute (party: Democrat/Republican/Other), we also expect the strength of this relationship to differ by party, since party fundraising structures can be organized differently (e.g., one party may have wider, more distributed committee support while another may be more centralized). The planned comparison is to first compare degree centrality distributions for Democrats vs. Republicans, and then fit a simple model such as `Win ~ degree + party + degree:party` to test whether degree centrality predicts winning overall and whether that effect changes across party groups.


### Outcome to predict (node-level)
A realistic node outcome for the 2016 election is:
- **Win vs. Lose** (binary), or
- **Vote share** (continuous), for candidates.

### Hypothesis using degree centrality across party groups
If we define degree centrality for a candidate as the number of unique contributing committees connected to them, then:

- Candidates with higher degree centrality may be more likely to win because they have broader fundraising connectivity.
- The relationship may differ across the categorical variable party (Dem/Rep/Other), meaning degree centrality could be more or less predictive depending on the structure of each party’s fundraising network.
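
The raw count in this definition and the normalized score that a graph library reports differ only by a scaling factor. The sketch below uses plain Python on an invented toy network. The two normalizations shown (dividing by n - 1 over all nodes, or by the number of committees in the opposite node set) are, as we understand them, the conventions behind NetworkX's `degree_centrality` and `bipartite.degree_centrality`; that correspondence should be checked against the library documentation before relying on it.

```python
# Invented toy edges: (donor committee, candidate).
edges = [
    ("PAC_1", "CAND_A"), ("PAC_2", "CAND_A"), ("PAC_3", "CAND_A"),
    ("PAC_1", "CAND_B"),
]
candidates = {"CAND_A", "CAND_B", "CAND_C"}  # CAND_C received nothing
committees = {donor for donor, _ in edges}

# Raw degree: number of distinct committees connected to each candidate.
neighbors = {c: set() for c in candidates}
for donor, cand in edges:
    neighbors[cand].add(donor)
degree = {c: len(d) for c, d in neighbors.items()}

n_nodes = len(candidates) + len(committees)

# Whole-graph normalization: degree / (n - 1).
dc_graph = {c: k / (n_nodes - 1) for c, k in degree.items()}

# Bipartite normalization: degree / (size of the opposite node set).
dc_bipartite = {c: k / len(committees) for c, k in degree.items()}

# Both rescalings preserve the candidate ranking given by raw degree.
rank_raw = sorted(candidates, key=lambda c: degree[c], reverse=True)
rank_norm = sorted(candidates, key=lambda c: dc_graph[c], reverse=True)
```

Because the scaling factor is the same for every candidate within one graph, the choice of normalization changes the units of a model coefficient but not the ordering of candidates.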

### Planned comparison (no need to run now)
1. Compare degree centrality distributions for **Dem vs Rep** candidates.
2. Fit a simple model later:
   - `Win ~ degree + party + degree:party`
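
To show what the formula `Win ~ degree + party + degree:party` asks a model to estimate, the sketch below builds its design matrix by hand on invented data. It only illustrates the encoding: the party codes, the choice of Democrats as the reference group, and the helper name `design_row` are assumptions, and no model is fitted.

```python
from statistics import median

# Invented candidate records: (degree, party, won).
rows = [
    (12, "DEM", 1), (3, "DEM", 0), (7, "DEM", 1),
    (9, "REP", 1), (2, "REP", 0), (5, "REP", 0),
]

# Step 1: compare the degree distributions by party (here, via medians).
degree_by_party = {}
for degree, party, _ in rows:
    degree_by_party.setdefault(party, []).append(degree)
median_degree = {p: median(v) for p, v in degree_by_party.items()}

# Step 2: encode Win ~ degree + party + degree:party with DEM as the reference level.
def design_row(degree, party):
    """Return [intercept, degree, is_rep, degree * is_rep] for one candidate."""
    is_rep = 1 if party == "REP" else 0
    return [1, degree, is_rep, degree * is_rep]

X = [design_row(d, p) for d, p, _ in rows]  # predictors, one row per candidate
y = [won for _, _, won in rows]             # binary outcome: 1 = win, 0 = lose
```

In a logistic regression on this matrix, the coefficient on the last column would measure how much the degree effect for Republicans differs from the effect for Democrats. Including an "Other" party level would add one more indicator column and one more interaction column.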
   



In summary, this notebook presents a structured plan for constructing and analyzing a campaign finance network based on the 2016 United States election cycle, with an emphasis on comparing degree centrality across candidates with different party affiliations. By defining a clear network structure, identifying relevant node-level categorical variables, and outlining a hypothetical prediction task, this approach demonstrates how centrality measures can be used to study meaningful real-world outcomes such as electoral success. Although no data are analyzed at this stage, the framework established here provides a solid foundation for future implementation, enabling empirical evaluation of how fundraising connectivity and network position relate to candidate outcomes in political networks.