# Texas General Election Data Extractor (2018-2024) - v2

This notebook extracts Texas election and demographic data from **reliable alternative sources**:

## Primary Data Sources
1. **OpenElections (GitHub)** - Pre-processed county/precinct-level CSV files
2. **Texas Capitol Data Portal** - Official VTD-level election data
3. **MIT Election Data Lab (GitHub)** - County-level returns for federal races
4. **Harvard Dataverse** - Research-ready election datasets
5. **U.S. Census CPS** - Voter demographic data

## Coverage
- Election results: 2018, 2020, 2022, 2024 general elections
- Voter demographics by age, race, sex, education

## Setup and Dependencies

In [1]:
# Install required packages
!pip install pandas requests beautifulsoup4 lxml openpyxl xlrd --quiet

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import os
import io
import re
import zipfile
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Create output directories
OUTPUT_DIR = 'texas_election_data'
RAW_DIR = os.path.join(OUTPUT_DIR, 'raw')
CLEAN_DIR = os.path.join(OUTPUT_DIR, 'clean')
os.makedirs(RAW_DIR, exist_ok=True)
os.makedirs(CLEAN_DIR, exist_ok=True)

# Request headers
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}

print(f"Output directory: {os.path.abspath(OUTPUT_DIR)}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

  from pandas.core import (


Output directory: C:\Users\spdal\Documents\Election Analysis\texas_election_data
Timestamp: 2026-01-14 11:44:49


---
## 1. OpenElections - GitHub Raw CSV Files

OpenElections provides pre-processed, standardized election results at county and precinct levels.

**Repository**: https://github.com/openelections/openelections-data-tx

**Coverage**: 2000-2018 general elections complete, 2020+ in progress

In [3]:
# OpenElections GitHub raw file URLs for Texas
# These are direct links to CSV files in the repository

GITHUB_RAW_BASE = "https://raw.githubusercontent.com/openelections/openelections-data-tx/master"

# Known statewide result files (county-level aggregates)
OPENELECTIONS_FILES = {
    # 2018 General Election
    '2018_general': f"{GITHUB_RAW_BASE}/2018/20181106__tx__general.csv",
    '2018_general_precinct': f"{GITHUB_RAW_BASE}/2018/20181106__tx__general__precinct.csv",
    
    # 2016 General Election  
    '2016_general': f"{GITHUB_RAW_BASE}/2016/20161108__tx__general.csv",
    '2016_general_precinct': f"{GITHUB_RAW_BASE}/2016/20161108__tx__general__precinct.csv",
    
    # 2020 General Election (may be in progress)
    '2020_general': f"{GITHUB_RAW_BASE}/2020/20201103__tx__general.csv",
    
    # 2014 General Election
    '2014_general': f"{GITHUB_RAW_BASE}/2014/20141104__tx__general.csv",
}

def download_openelections_file(name, url):
    """Download a CSV file from OpenElections GitHub."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=60)
        response.raise_for_status()
        
        # Save raw file
        filename = f"openelections_{name}.csv"
        filepath = os.path.join(RAW_DIR, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        
        # Parse as DataFrame
        df = pd.read_csv(io.BytesIO(response.content), low_memory=False)
        print(f"‚úì {name}: {len(df):,} rows, {len(df.columns)} columns")
        return df, filepath
        
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 404:
            print(f"‚úó {name}: File not found (may not exist yet)")
        else:
            print(f"‚úó {name}: HTTP error {e.response.status_code}")
        return None, None
    except Exception as e:
        print(f"‚úó {name}: {e}")
        return None, None

# Download OpenElections data
print("Downloading OpenElections data...")
print("=" * 50)
openelections_data = {}

for name, url in OPENELECTIONS_FILES.items():
    df, path = download_openelections_file(name, url)
    if df is not None:
        openelections_data[name] = df

Downloading OpenElections data...
‚úó 2018_general: File not found (may not exist yet)
‚úì 2018_general_precinct: 463,336 rows, 13 columns
‚úó 2016_general: File not found (may not exist yet)
‚úì 2016_general_precinct: 218,644 rows, 9 columns
‚úó 2020_general: File not found (may not exist yet)
‚úì 2014_general: 1,402 rows, 7 columns


In [4]:
# Preview OpenElections data structure
if openelections_data:
    sample_key = list(openelections_data.keys())[0]
    df = openelections_data[sample_key]
    print(f"Sample: {sample_key}")
    print(f"\nColumns: {list(df.columns)}")
    print(f"\nSample rows:")
    display(df.head(10))

Sample: 2018_general_precinct

Columns: ['county', 'precinct', 'office', 'district', 'candidate', 'party', 'votes', 'absentee', 'election_day', 'early_voting', 'mail', 'provisional', 'limited']

Sample rows:


Unnamed: 0,county,precinct,office,district,candidate,party,votes,absentee,election_day,early_voting,mail,provisional,limited
Childress,101,Attorney General,,Justin Nelson,DEM,45.0,,23.0,22,,,,
Childress,101,Attorney General,,Ken Paxton,REP,335.0,,141.0,194,,,,
Childress,101,Attorney General,,Michael Ray Harris,LIB,6.0,,3.0,3,,,,
Childress,201,Attorney General,,Justin Nelson,DEM,49.0,,19.0,30,,,,
Childress,201,Attorney General,,Ken Paxton,REP,405.0,,174.0,231,,,,
Childress,201,Attorney General,,Michael Ray Harris,LIB,8.0,,2.0,6,,,,
Childress,301,Attorney General,,Justin Nelson,DEM,89.0,,46.0,43,,,,
Childress,301,Attorney General,,Ken Paxton,REP,313.0,,133.0,180,,,,
Childress,301,Attorney General,,Michael Ray Harris,LIB,8.0,,1.0,7,,,,
Childress,401,Attorney General,,Justin Nelson,DEM,62.0,,26.0,36,,,,


---
## 2. Texas Capitol Data Portal - Official VTD Data

The Texas Legislative Council provides official election data at the Voter Tabulation District (VTD) level.

**Portal**: https://data.capitol.texas.gov/topic/elections

**Coverage**: 2018-2024 elections with VTD-level granularity

In [5]:
# Texas Capitol Data Portal - CKAN API
CAPITOL_BASE = "https://data.capitol.texas.gov"
CAPITOL_API = f"{CAPITOL_BASE}/api/3/action"

def get_capitol_datasets(search_term="general"):
    """Search for election datasets on Texas Capitol Data Portal."""
    try:
        url = f"{CAPITOL_API}/package_search"
        params = {
            'q': search_term,
            'fq': 'groups:elections',
            'rows': 50
        }
        response = requests.get(url, params=params, headers=HEADERS, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        if data.get('success'):
            return data['result']['results']
        return []
    except Exception as e:
        print(f"Error searching Capitol Data Portal: {e}")
        return []

def get_dataset_resources(dataset_name):
    """Get downloadable resources for a specific dataset."""
    try:
        url = f"{CAPITOL_API}/package_show"
        params = {'id': dataset_name}
        response = requests.get(url, params=params, headers=HEADERS, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        if data.get('success'):
            return data['result'].get('resources', [])
        return []
    except Exception as e:
        print(f"Error getting dataset resources: {e}")
        return []

# Search for general election datasets
print("Searching Texas Capitol Data Portal...")
print("=" * 50)

datasets = get_capitol_datasets("general")
print(f"Found {len(datasets)} election datasets\n")

# Filter for relevant general elections
target_elections = ['2024_general', '2022_general', '2020_general', '2018_general']
capitol_datasets = {}

for ds in datasets:
    name = ds.get('name', '')
    title = ds.get('title', '')
    
    # Check if it's a general election we want
    for target in target_elections:
        year = target.split('_')[0]
        if year in name and 'general' in name.lower():
            capitol_datasets[name] = {
                'title': title,
                'name': name,
                'resources': ds.get('resources', [])
            }
            print(f"‚úì {name}: {title}")
            break

Searching Texas Capitol Data Portal...
Found 50 election datasets

‚úì 2024_general: 2024 General
‚úì 2022_general: 2022 General
‚úì 2020_general: 2020 General
‚úì 2020_city_general_2: 2020 City General 2


In [6]:
# Download comprehensive election datasets (ZIP files with all elections)
def download_capitol_resource(resource_url, filename):
    """Download a resource from Capitol Data Portal."""
    try:
        response = requests.get(resource_url, headers=HEADERS, timeout=120, stream=True)
        response.raise_for_status()
        
        filepath = os.path.join(RAW_DIR, filename)
        with open(filepath, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        
        print(f"‚úì Downloaded: {filename}")
        return filepath
    except Exception as e:
        print(f"‚úó Failed to download {filename}: {e}")
        return None

# Try to get the comprehensive datasets
comprehensive_resources = get_dataset_resources('comprehensive-election-datasets-compressed-format')

print("\nAvailable comprehensive datasets:")
for res in comprehensive_resources:
    name = res.get('name', 'Unknown')
    url = res.get('url', '')
    format_type = res.get('format', '')
    print(f"  - {name} ({format_type})")
    print(f"    URL: {url[:80]}..." if len(url) > 80 else f"    URL: {url}")


Available comprehensive datasets:
  - 2024 General VTDs Election Data (CSV)
    URL: https://data.capitol.texas.gov/dataset/35b16aee-0bb0-4866-b1ec-859f1f044241/reso...
  - 2024 Primary VTDs Election Data.zip (CSV)
    URL: https://data.capitol.texas.gov/dataset/35b16aee-0bb0-4866-b1ec-859f1f044241/reso...
  - 2022 General VTDs Election Data.zip (CSV)
    URL: https://data.capitol.texas.gov/dataset/35b16aee-0bb0-4866-b1ec-859f1f044241/reso...
  - 2022 Primary VTDs Election Data.zip (CSV)
    URL: https://data.capitol.texas.gov/dataset/35b16aee-0bb0-4866-b1ec-859f1f044241/reso...
  - 2020 General VTDs Election Data (2020).zip (CSV)
    URL: https://data.capitol.texas.gov/dataset/35b16aee-0bb0-4866-b1ec-859f1f044241/reso...


---
## 3. MIT Election Data Lab - GitHub Repository

MIT MEDSL maintains cleaned, standardized election data on GitHub.

**Repositories**:
- https://github.com/MEDSL/2024-elections-official
- https://github.com/MEDSL/2022-elections-official
- https://github.com/MEDSL/2020-elections-official
- https://github.com/MEDSL/2018-elections-official

In [7]:
# MIT Election Data Lab GitHub raw files
MIT_GITHUB_BASE = "https://raw.githubusercontent.com/MEDSL"

# Known data file locations (these may change as repos are updated)
MIT_DATA_FILES = {
    # County-level presidential returns
    '2020_president_county': f"{MIT_GITHUB_BASE}/2020-elections-official/main/PRESIDENT/president_county.csv",
    '2020_senate_county': f"{MIT_GITHUB_BASE}/2020-elections-official/main/SENATE/senate_county.csv",
    
    # 2024 data (structure may vary)
    '2024_president_county': f"{MIT_GITHUB_BASE}/2024-elections-official/main/PRESIDENT/president_county.csv",
    
    # 2022 data
    '2022_senate_county': f"{MIT_GITHUB_BASE}/2022-elections-official/main/SENATE/senate_county.csv",
    '2022_governor_county': f"{MIT_GITHUB_BASE}/2022-elections-official/main/GOVERNOR/governor_county.csv",
}

def download_mit_file(name, url):
    """Download a CSV file from MIT MEDSL GitHub."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=60)
        response.raise_for_status()
        
        # Save raw file
        filename = f"mit_{name}.csv"
        filepath = os.path.join(RAW_DIR, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        
        # Parse and filter to Texas
        df = pd.read_csv(io.BytesIO(response.content), low_memory=False)
        
        # Filter to Texas
        if 'state' in df.columns:
            df_tx = df[df['state'].str.upper() == 'TEXAS'].copy()
        elif 'state_po' in df.columns:
            df_tx = df[df['state_po'] == 'TX'].copy()
        else:
            df_tx = df  # Return all if no state column
        
        print(f"‚úì {name}: {len(df_tx):,} Texas rows (from {len(df):,} total)")
        return df_tx, filepath
        
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 404:
            print(f"‚úó {name}: File not found (repo structure may differ)")
        else:
            print(f"‚úó {name}: HTTP error {e.response.status_code}")
        return None, None
    except Exception as e:
        print(f"‚úó {name}: {e}")
        return None, None

# Download MIT data
print("Downloading MIT Election Data Lab files...")
print("=" * 50)
mit_data = {}

for name, url in MIT_DATA_FILES.items():
    df, path = download_mit_file(name, url)
    if df is not None:
        mit_data[name] = df

Downloading MIT Election Data Lab files...
‚úó 2020_president_county: File not found (repo structure may differ)
‚úó 2020_senate_county: File not found (repo structure may differ)
‚úó 2024_president_county: File not found (repo structure may differ)
‚úó 2022_senate_county: File not found (repo structure may differ)
‚úó 2022_governor_county: File not found (repo structure may differ)


---
## 4. Harvard Dataverse - Direct Downloads

Harvard Dataverse hosts research-ready election datasets with DOI identifiers.

In [8]:
# Harvard Dataverse API endpoints for election data
# These use persistent DOI identifiers

DATAVERSE_FILES = {
    # County Presidential Returns 2000-2020
    'president_county_2000_2020': 'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/VOQCHQ/HEIJCQ',
    
    # U.S. Senate County Returns
    'senate_county': 'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/PEJ5QU/ESFZKF',
    
    # U.S. House District Returns
    'house_district': 'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/IG0UN2/ZDGAJZ',
}

def download_dataverse_file(name, url):
    """Download a file from Harvard Dataverse."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=120, allow_redirects=True)
        response.raise_for_status()
        
        # Determine file type from content
        content_type = response.headers.get('Content-Type', '')
        if 'tab-separated' in content_type or 'tsv' in content_type:
            ext = 'tsv'
            sep = '\t'
        else:
            ext = 'csv'
            sep = ','
        
        # Save raw file
        filename = f"dataverse_{name}.{ext}"
        filepath = os.path.join(RAW_DIR, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        
        # Parse as DataFrame
        try:
            df = pd.read_csv(io.BytesIO(response.content), sep=sep, low_memory=False)
        except:
            # Try tab-separated if comma fails
            df = pd.read_csv(io.BytesIO(response.content), sep='\t', low_memory=False)
        
        # Filter to Texas
        if 'state' in df.columns:
            df_tx = df[df['state'].str.upper() == 'TEXAS'].copy()
        elif 'state_po' in df.columns:
            df_tx = df[df['state_po'] == 'TX'].copy()
        else:
            df_tx = df
        
        print(f"‚úì {name}: {len(df_tx):,} Texas rows")
        return df_tx, filepath
        
    except Exception as e:
        print(f"‚úó {name}: {e}")
        return None, None

# Download Dataverse files
print("Downloading Harvard Dataverse files...")
print("=" * 50)
dataverse_data = {}

for name, url in DATAVERSE_FILES.items():
    df, path = download_dataverse_file(name, url)
    if df is not None:
        dataverse_data[name] = df

Downloading Harvard Dataverse files...
‚úì president_county_2000_2020: 4,064 Texas rows
‚úó senate_county: 404 Client Error: Not Found for url: https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/PEJ5QU/ESFZKF
‚úó house_district: 404 Client Error: Not Found for url: https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/IG0UN2/ZDGAJZ


---
## 5. Census CPS Voting Supplement - Demographics

U.S. Census Bureau Current Population Survey provides voter demographics by state.

In [9]:
# Census CPS Voting Supplement Data URLs
# Table 4a = Race/Sex, Table 4b = Age

CENSUS_URLS = {
    # 2024 data (November 2024 election)
    '2024_race_sex': 'https://www2.census.gov/programs-surveys/cps/tables/p20/590/table04a.xlsx',
    '2024_age': 'https://www2.census.gov/programs-surveys/cps/tables/p20/590/table04b.xlsx',
    
    # 2022 data (November 2022 election)
    '2022_race_sex': 'https://www2.census.gov/programs-surveys/cps/tables/p20/586/table04a.xlsx',
    '2022_age': 'https://www2.census.gov/programs-surveys/cps/tables/p20/586/table04b.xlsx',
    
    # 2020 data (November 2020 election)
    '2020_race_sex': 'https://www2.census.gov/programs-surveys/cps/tables/p20/585/table04a.xlsx',
    '2020_age': 'https://www2.census.gov/programs-surveys/cps/tables/p20/585/table04b.xlsx',
    
    # 2018 data (November 2018 election)
    '2018_race_sex': 'https://www2.census.gov/programs-surveys/cps/tables/p20/583/table04a.xlsx',
    '2018_age': 'https://www2.census.gov/programs-surveys/cps/tables/p20/583/table04b.xlsx',
}

def download_census_file(name, url):
    """Download Census CPS voting supplement Excel file."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=60)
        response.raise_for_status()
        
        # Save raw file
        filename = f"census_{name}.xlsx"
        filepath = os.path.join(RAW_DIR, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        
        # Try to parse (Census tables have complex headers)
        try:
            df = pd.read_excel(io.BytesIO(response.content), header=[3, 4])
        except:
            df = pd.read_excel(io.BytesIO(response.content))
        
        print(f"‚úì {name}: {len(df)} rows")
        return df, filepath
        
    except Exception as e:
        print(f"‚úó {name}: {e}")
        return None, None

# Download Census data
print("Downloading Census CPS Voting Supplement...")
print("=" * 50)
census_data = {}

for name, url in CENSUS_URLS.items():
    df, path = download_census_file(name, url)
    if df is not None:
        census_data[name] = df

Downloading Census CPS Voting Supplement...
‚úó 2024_race_sex: HTTPSConnectionPool(host='www2.census.gov', port=443): Read timed out. (read timeout=60)
‚úó 2024_age: HTTPSConnectionPool(host='www2.census.gov', port=443): Read timed out. (read timeout=60)
‚úó 2022_race_sex: HTTPSConnectionPool(host='www2.census.gov', port=443): Read timed out. (read timeout=60)
‚úó 2022_age: HTTPSConnectionPool(host='www2.census.gov', port=443): Read timed out. (read timeout=60)
‚úó 2020_race_sex: Pandas requires version '3.1.0' or newer of 'openpyxl' (version '3.0.10' currently installed).
‚úó 2020_age: Pandas requires version '3.1.0' or newer of 'openpyxl' (version '3.0.10' currently installed).
‚úó 2018_race_sex: Pandas requires version '3.1.0' or newer of 'openpyxl' (version '3.0.10' currently installed).
‚úó 2018_age: HTTPSConnectionPool(host='www2.census.gov', port=443): Read timed out. (read timeout=60)


In [10]:
# Extract Texas rows from Census data
def extract_texas_census(df, name):
    """Extract Texas rows from Census CPS data."""
    for col in df.columns:
        col_str = str(col).lower()
        col_values = df[col].astype(str).str.lower()
        if col_values.str.contains('texas').any():
            tx_mask = col_values.str.contains('texas')
            return df[tx_mask].copy()
    return df  # Return full if Texas not found

texas_census = {}
print("\nExtracting Texas from Census data:")
for name, df in census_data.items():
    tx_df = extract_texas_census(df, name)
    texas_census[name] = tx_df
    print(f"  {name}: {len(tx_df)} rows")


Extracting Texas from Census data:


---
## 6. Export Clean Data for Modeling

In [11]:
def safe_export(df, name, directory):
    """Export DataFrame to CSV with safe naming."""
    safe_name = re.sub(r'[^\w\s-]', '', name).strip().replace(' ', '_')
    filepath = os.path.join(directory, f"{safe_name}.csv")
    df.to_csv(filepath, index=False)
    return filepath

exported = []

print("Exporting clean CSVs...")
print("=" * 50)

# Export OpenElections data
for name, df in openelections_data.items():
    path = safe_export(df, f"openelections_{name}", CLEAN_DIR)
    exported.append(path)
    print(f"‚úì {os.path.basename(path)}")

# Export MIT data
for name, df in mit_data.items():
    path = safe_export(df, f"mit_{name}_texas", CLEAN_DIR)
    exported.append(path)
    print(f"‚úì {os.path.basename(path)}")

# Export Dataverse data
for name, df in dataverse_data.items():
    path = safe_export(df, f"dataverse_{name}_texas", CLEAN_DIR)
    exported.append(path)
    print(f"‚úì {os.path.basename(path)}")

# Export Census data
for name, df in texas_census.items():
    path = safe_export(df, f"census_{name}_texas", CLEAN_DIR)
    exported.append(path)
    print(f"‚úì {os.path.basename(path)}")

print(f"\n‚úì Exported {len(exported)} files to {CLEAN_DIR}")

Exporting clean CSVs...
‚úì openelections_2018_general_precinct.csv
‚úì openelections_2016_general_precinct.csv
‚úì openelections_2014_general.csv
‚úì dataverse_president_county_2000_2020_texas.csv

‚úì Exported 4 files to texas_election_data\clean


---
## 7. Summary and Data Preview

In [12]:
print("=" * 60)
print("DOWNLOAD SUMMARY")
print("=" * 60)

print(f"\nüìÅ OpenElections: {len(openelections_data)} datasets")
for name, df in openelections_data.items():
    print(f"   - {name}: {len(df):,} rows")

print(f"\nüìÅ MIT Election Lab: {len(mit_data)} datasets")
for name, df in mit_data.items():
    print(f"   - {name}: {len(df):,} rows")

print(f"\nüìÅ Harvard Dataverse: {len(dataverse_data)} datasets")
for name, df in dataverse_data.items():
    print(f"   - {name}: {len(df):,} rows")

print(f"\nüìÅ Census Demographics: {len(texas_census)} datasets")
for name, df in texas_census.items():
    print(f"   - {name}: {len(df):,} rows")

print(f"\n" + "=" * 60)
print(f"All files saved to: {os.path.abspath(OUTPUT_DIR)}")
print(f"  - Raw files: {RAW_DIR}")
print(f"  - Clean CSVs: {CLEAN_DIR}")

DOWNLOAD SUMMARY

üìÅ OpenElections: 3 datasets
   - 2018_general_precinct: 463,336 rows
   - 2016_general_precinct: 218,644 rows
   - 2014_general: 1,402 rows

üìÅ MIT Election Lab: 0 datasets

üìÅ Harvard Dataverse: 1 datasets
   - president_county_2000_2020: 4,064 rows

üìÅ Census Demographics: 0 datasets

All files saved to: C:\Users\spdal\Documents\Election Analysis\texas_election_data
  - Raw files: texas_election_data\raw
  - Clean CSVs: texas_election_data\clean


In [13]:
# Preview a sample dataset
all_data = {**openelections_data, **mit_data, **dataverse_data}

if all_data:
    sample_key = list(all_data.keys())[0]
    df = all_data[sample_key]
    
    print(f"Preview: {sample_key}")
    print(f"Shape: {df.shape}")
    print(f"\nColumns: {list(df.columns)}")
    print(f"\nFirst 10 rows:")
    display(df.head(10))

Preview: 2018_general_precinct
Shape: (463336, 13)

Columns: ['county', 'precinct', 'office', 'district', 'candidate', 'party', 'votes', 'absentee', 'election_day', 'early_voting', 'mail', 'provisional', 'limited']

First 10 rows:


Unnamed: 0,county,precinct,office,district,candidate,party,votes,absentee,election_day,early_voting,mail,provisional,limited
Childress,101,Attorney General,,Justin Nelson,DEM,45.0,,23.0,22,,,,
Childress,101,Attorney General,,Ken Paxton,REP,335.0,,141.0,194,,,,
Childress,101,Attorney General,,Michael Ray Harris,LIB,6.0,,3.0,3,,,,
Childress,201,Attorney General,,Justin Nelson,DEM,49.0,,19.0,30,,,,
Childress,201,Attorney General,,Ken Paxton,REP,405.0,,174.0,231,,,,
Childress,201,Attorney General,,Michael Ray Harris,LIB,8.0,,2.0,6,,,,
Childress,301,Attorney General,,Justin Nelson,DEM,89.0,,46.0,43,,,,
Childress,301,Attorney General,,Ken Paxton,REP,313.0,,133.0,180,,,,
Childress,301,Attorney General,,Michael Ray Harris,LIB,8.0,,1.0,7,,,,
Childress,401,Attorney General,,Justin Nelson,DEM,62.0,,26.0,36,,,,


---
## 8. Manual Download Links (Backup)

If automated downloads fail, use these direct links:

In [14]:
backup_links = """
===============================================================================
BACKUP MANUAL DOWNLOAD LINKS
===============================================================================

OPENELECTIONS (GitHub - Best for county/precinct CSVs)
------------------------------------------------------
‚Ä¢ Texas Data Repo: https://github.com/openelections/openelections-data-tx
‚Ä¢ Browse files: https://github.com/openelections/openelections-data-tx/tree/master
‚Ä¢ 2018 General: https://github.com/openelections/openelections-data-tx/tree/master/2018
‚Ä¢ 2020 General: https://github.com/openelections/openelections-data-tx/tree/master/2020

TEXAS CAPITOL DATA PORTAL (Official VTD-level data)
---------------------------------------------------
‚Ä¢ Elections Topic: https://data.capitol.texas.gov/topic/elections
‚Ä¢ 2024 General: https://data.capitol.texas.gov/dataset/2024_general
‚Ä¢ 2022 General: https://data.capitol.texas.gov/dataset/2022_general
‚Ä¢ Comprehensive ZIP: https://data.capitol.texas.gov/dataset/comprehensive-election-datasets-compressed-format

MIT ELECTION DATA LAB (GitHub)
------------------------------
‚Ä¢ Main Page: https://electionlab.mit.edu/data
‚Ä¢ 2024 Repo: https://github.com/MEDSL/2024-elections-official
‚Ä¢ 2022 Repo: https://github.com/MEDSL/2022-elections-official
‚Ä¢ 2020 Repo: https://github.com/MEDSL/2020-elections-official
‚Ä¢ 2018 Repo: https://github.com/MEDSL/2018-elections-official

HARVARD DATAVERSE (Research datasets)
-------------------------------------
‚Ä¢ MEDSL Collection: https://dataverse.harvard.edu/dataverse/medsl
‚Ä¢ Presidential County: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
‚Ä¢ Senate County: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PEJ5QU
‚Ä¢ VEST Election Science: https://dataverse.harvard.edu/dataverse/electionscience

REDISTRICTING DATA HUB (Precinct + boundaries)
----------------------------------------------
‚Ä¢ Texas Data: https://redistrictingdatahub.org/state/texas/
‚Ä¢ About Data: https://redistrictingdatahub.org/data/about-our-data/

U.S. CENSUS - VOTING DEMOGRAPHICS
---------------------------------
‚Ä¢ 2024: https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-590.html
‚Ä¢ 2022: https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-586.html
‚Ä¢ 2020: https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-585.html
‚Ä¢ 2018: https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-583.html

TEXAS SECRETARY OF STATE (Official but harder to download)
----------------------------------------------------------
‚Ä¢ Historical Results: https://www.sos.state.tx.us/elections/historical/index.shtml
‚Ä¢ Voter Registration: https://www.sos.state.tx.us/elections/historical/vrinfo.shtml
"""

print(backup_links)


BACKUP MANUAL DOWNLOAD LINKS

OPENELECTIONS (GitHub - Best for county/precinct CSVs)
------------------------------------------------------
‚Ä¢ Texas Data Repo: https://github.com/openelections/openelections-data-tx
‚Ä¢ Browse files: https://github.com/openelections/openelections-data-tx/tree/master
‚Ä¢ 2018 General: https://github.com/openelections/openelections-data-tx/tree/master/2018
‚Ä¢ 2020 General: https://github.com/openelections/openelections-data-tx/tree/master/2020

TEXAS CAPITOL DATA PORTAL (Official VTD-level data)
---------------------------------------------------
‚Ä¢ Elections Topic: https://data.capitol.texas.gov/topic/elections
‚Ä¢ 2024 General: https://data.capitol.texas.gov/dataset/2024_general
‚Ä¢ 2022 General: https://data.capitol.texas.gov/dataset/2022_general
‚Ä¢ Comprehensive ZIP: https://data.capitol.texas.gov/dataset/comprehensive-election-datasets-compressed-format

MIT ELECTION DATA LAB (GitHub)
------------------------------
‚Ä¢ Main Page: https://electio

---
## Data Dictionary

In [15]:
data_dict = """
===============================================================================
DATA DICTIONARY
===============================================================================

OPENELECTIONS FORMAT
--------------------
‚Ä¢ county: Texas county name
‚Ä¢ precinct: Precinct identifier (if precinct-level)
‚Ä¢ office: Race (President, U.S. Senate, Governor, etc.)
‚Ä¢ district: District number (for House races)
‚Ä¢ party: Political party (DEM, REP, LIB, GRN, etc.)
‚Ä¢ candidate: Candidate name
‚Ä¢ votes: Total votes received
‚Ä¢ early_voting: Early votes (if available)
‚Ä¢ election_day: Election day votes (if available)
‚Ä¢ mail: Mail-in votes (if available)

MIT/DATAVERSE FORMAT
--------------------
‚Ä¢ year: Election year
‚Ä¢ state: State name
‚Ä¢ state_po: State postal code (TX)
‚Ä¢ state_fips: State FIPS code (48)
‚Ä¢ county_name: County name
‚Ä¢ county_fips: County FIPS code
‚Ä¢ office: Race type
‚Ä¢ candidate: Candidate name
‚Ä¢ party_simplified: Simplified party (DEMOCRAT, REPUBLICAN, LIBERTARIAN, OTHER)
‚Ä¢ candidatevotes: Votes for candidate
‚Ä¢ totalvotes: Total votes in race
‚Ä¢ mode: Voting mode (TOTAL, ELECTION DAY, EARLY, MAIL, PROVISIONAL)

CENSUS CPS FORMAT
-----------------
‚Ä¢ State/Geography identifier
‚Ä¢ Total citizen population 18+
‚Ä¢ Total registered
‚Ä¢ Percent registered
‚Ä¢ Total voted
‚Ä¢ Percent voted (of citizen pop)
‚Ä¢ Breakdowns by: Age, Race/ethnicity, Sex, Education

TEXAS VTD FORMAT (Capitol Data Portal)
--------------------------------------
‚Ä¢ VTD_ID: Voter Tabulation District identifier
‚Ä¢ County: County name
‚Ä¢ Election-specific vote columns by candidate/party
"""

print(data_dict)


DATA DICTIONARY

OPENELECTIONS FORMAT
--------------------
‚Ä¢ county: Texas county name
‚Ä¢ precinct: Precinct identifier (if precinct-level)
‚Ä¢ office: Race (President, U.S. Senate, Governor, etc.)
‚Ä¢ district: District number (for House races)
‚Ä¢ party: Political party (DEM, REP, LIB, GRN, etc.)
‚Ä¢ candidate: Candidate name
‚Ä¢ votes: Total votes received
‚Ä¢ early_voting: Early votes (if available)
‚Ä¢ election_day: Election day votes (if available)
‚Ä¢ mail: Mail-in votes (if available)

MIT/DATAVERSE FORMAT
--------------------
‚Ä¢ year: Election year
‚Ä¢ state: State name
‚Ä¢ state_po: State postal code (TX)
‚Ä¢ state_fips: State FIPS code (48)
‚Ä¢ county_name: County name
‚Ä¢ county_fips: County FIPS code
‚Ä¢ office: Race type
‚Ä¢ candidate: Candidate name
‚Ä¢ party_simplified: Simplified party (DEMOCRAT, REPUBLICAN, LIBERTARIAN, OTHER)
‚Ä¢ candidatevotes: Votes for candidate
‚Ä¢ totalvotes: Total votes in race
‚Ä¢ mode: Voting mode (TOTAL, ELECTION DAY, EARLY, MAIL, PROVI

---
## Next Steps

1. **Verify downloads**: Check `texas_election_data/clean/` for usable CSVs
2. **Merge datasets**: Join election results with demographics by year
3. **Feature engineering**: Create turnout rates, vote shares, demographic ratios
4. **Model training**: Use clean CSVs as model input

### Suggested merges:
- Election results (by county/year) + Census demographics (by state/year)
- Presidential returns + Senate returns (same election)
- Multi-year trends (2018 ‚Üí 2020 ‚Üí 2022 ‚Üí 2024)