# Data Fetching: FJC and Congress.gov API

This notebook fetches data from our two primary sources and performs minimal initial processing:

1. Federal Judicial Center (FJC) CSV and Excel files
2. Congress.gov API judicial nomination data

According to the project architecture, this notebook will:
1. Download or use cached data from the FJC and Congress.gov API
2. Perform minimal transformations to convert to dataframes
3. Save the resulting dataframes to `data/raw` for further processing by downstream notebooks
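Steps 1 and 3 follow a "use the cache if present, otherwise fetch and save" pattern: the nominations and nominees cells below check for an existing CSV in `data/raw` before calling the API. The following is a minimal standard-library sketch of that pattern. `load_or_fetch` and the `fetch` callable are hypothetical names, not part of `nomination_predictor`, and the real cells use pandas rather than the `csv` module.

```python
import csv
from pathlib import Path
from typing import Callable

Row = dict[str, str]


def load_or_fetch(cache_path: Path, fetch: Callable[[], list[Row]]) -> list[Row]:
    """Return rows from cache_path if it exists; otherwise call fetch() and cache the result.

    Hypothetical helper illustrating the cache-or-fetch pattern used in this notebook.
    """
    if cache_path.exists():
        with cache_path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    rows = fetch()
    if rows:  # only write a cache file when there is something to cache
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

Note that a CSV round trip turns every value into a string, so a cached load is not type-identical to a fresh fetch unless the reader re-parses dates and numbers (as the notebook does with `parse_dates`).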

## Setup

In [None]:
import os
import sys
from pathlib import Path

import pandas as pd
from loguru import logger

# Add the project root to the path so we can import our modules
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

from nomination_predictor.congress_api import CongressAPIClient

# Setup logging
logger.remove()  # Remove default handler
logger.add(sys.stderr, format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level}</level> | <cyan>{function}</cyan> - <level>{message}</level>", level="INFO")

## 1. Federal Judicial Center (FJC) Data

The FJC data is our canonical source for judicial seat timelines, judge demographics, and nomination failures.

### Check if FJC data exists or download if needed

In [None]:
# Check if required FJC data files exist and download any missing ones
from nomination_predictor.config import EXTERNAL_DATA_DIR
from nomination_predictor.fjc_data import (REQUIRED_FJC_FILES,
                                           ensure_fjc_data_files,
                                           load_fjc_data)

# Check for missing files and download them if needed
downloaded, failed = ensure_fjc_data_files()

# Report status
if downloaded:
    print(f"✓ Downloaded {len(downloaded)} previously missing files: {', '.join(downloaded)}")
if failed:
    print(f"❌ Failed to download {len(failed)} files: {', '.join(failed)}")
    
# Also report on which files are present
present_files = [f for f in REQUIRED_FJC_FILES if (EXTERNAL_DATA_DIR / f).exists()]
if len(present_files) == len(REQUIRED_FJC_FILES):
    print(f"✓ All required FJC data files are available in {EXTERNAL_DATA_DIR}")
else:
    missing = set(REQUIRED_FJC_FILES) - set(present_files)
    print(f"⚠️ Still missing {len(missing)} required files: {', '.join(missing)}")

[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mensure_fjc_data_files[0m - [1mEnsuring FJC data files are available[0m


✓ All required FJC data files are available in /home/wsl2ubuntuuser/nomination_predictor/data/external
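`ensure_fjc_data_files()` returns the lists of files it downloaded and files it failed to download. Its implementation isn't shown in this notebook; the sketch below illustrates one way such a check could work, assuming a `download(name, dest)` callable that raises on failure. `ensure_files` and `download` are hypothetical names, and the real function is called without arguments.

```python
from pathlib import Path
from typing import Callable, Iterable


def ensure_files(
    data_dir: Path,
    required: Iterable[str],
    download: Callable[[str, Path], None],
) -> tuple[list[str], list[str]]:
    """Download any required file missing from data_dir.

    Returns (downloaded, failed), mirroring the pair unpacked from
    ensure_fjc_data_files() above. Hypothetical sketch.
    """
    downloaded: list[str] = []
    failed: list[str] = []
    data_dir.mkdir(parents=True, exist_ok=True)
    for name in required:
        dest = data_dir / name
        if dest.exists():
            continue  # already present, nothing to do
        try:
            download(name, dest)
            downloaded.append(name)
        except Exception:  # record the failure and keep going with the other files
            failed.append(name)
    return downloaded, failed
```

Continuing past individual failures (rather than raising on the first one) matches how the cell above reports both lists before checking which files are still missing.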


### Load FJC Data

In [None]:
# Load all FJC data files (with auto-download enabled by default)
fjc_data = load_fjc_data()

# Report the number of records in each DataFrame
print("Loaded FJC data files:")
for key, df in fjc_data.items():
    print(f"- {key}: {len(df)} records")

# Store references to commonly used DataFrames for easier access
judges_df = fjc_data.get('judges')
demographics_df = fjc_data.get('demographics')
education_df = fjc_data.get('education')
federal_judicial_service_df = fjc_data.get('federal_judicial_service')
other_nominations_recess_df = fjc_data.get('other_nominations_recess')
other_federal_judicial_service_df = fjc_data.get('other_federal_judicial_service')
professional_career_df = fjc_data.get('professional_career')

# Create a dictionary of all FJC dataframes for easy iteration
all_dataframes = {
    'judges': judges_df,
    'demographics': demographics_df,
    'education': education_df,
    'federal_judicial_service': federal_judicial_service_df,
    'other_nominations_recess': other_nominations_recess_df,
    'other_federal_judicial_service': other_federal_judicial_service_df,
    'professional_career': professional_career_df
}

[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_data[0m - [1mLoading FJC data files[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mensure_fjc_data_files[0m - [1mEnsuring FJC data files are available[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_csv[0m - [1mLoading FJC data file: demographics.csv[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_data[0m - [1mLoaded demographics data with 4022 records[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_csv[0m - [1mLoading FJC data file: education.csv[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_data[0m - [1mLoaded education data with 8040 records[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_csv[0m - [1mLoading FJC data file: federal-judicial-service.csv[0m
[32m2025-07-12 16:46:41[0m | [1mINFO[0m | [36mload_fjc_data[0m - [1mLoaded federal_judicial_service data with 4720 records[0m
[32m2025-07-12 16:46:41[0m | [1m

Loaded FJC data files:
- demographics: 4022 records
- education: 8040 records
- federal_judicial_service: 4720 records
- judges: 4022 records
- other_nominations_recess: 828 records
- other_federal_judicial_service: 611 records
- professional_career: 19003 records


### Build a "seat timeline" inferred from FJC records of when judges were in service

In [None]:
from nomination_predictor.dataset import build_and_validate_seat_timeline

try:
    seat_timeline_df = build_and_validate_seat_timeline(federal_judicial_service_df)
    print(f"✅ Successfully built seat timeline with {len(seat_timeline_df):,} records")
    all_dataframes['seat_timeline'] = seat_timeline_df
except Exception as e:
    print(f"❌ Error: {e}")
    raise

2025-07-12 16:46:42 | INFO | build_seat_timeline - Building seat timeline table
2025-07-12 16:47:17 | INFO | build_and_validate_seat_timeline - Successfully built seat timeline with 4,720 records


✅ Successfully built seat timeline with 4,720 records
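The internals of `build_and_validate_seat_timeline` are not shown here. As a rough illustration of the idea only, the sketch below assumes each service record carries a seat identifier plus commission and termination dates (the `seat_id`, `commission`, and `termination` field names are hypothetical), orders each seat's occupants by commission date, and measures the vacancy between one judge leaving and the next being commissioned. It ignores real-world complications such as senior status (which itself opens a vacancy) and reassignments between courts.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby, zip_longest
from typing import Optional


@dataclass
class ServiceRecord:
    seat_id: str                 # hypothetical seat identifier
    judge: str
    commission: date             # date the judge took the seat
    termination: Optional[date]  # None while the judge is still serving


@dataclass
class Vacancy:
    seat_id: str
    start: date                  # predecessor's termination date
    end: Optional[date]          # successor's commission date; None if still vacant
    days: Optional[int]          # vacancy length in days; None if still vacant


def seat_vacancies(records: list[ServiceRecord]) -> list[Vacancy]:
    """Derive vacancy intervals for each seat from consecutive service records."""
    vacancies: list[Vacancy] = []
    ordered = sorted(records, key=lambda r: (r.seat_id, r.commission))
    for seat_id, group in groupby(ordered, key=lambda r: r.seat_id):
        occupants = list(group)
        # Pair each occupant with the next one; the last occupant pairs with None
        for prev, nxt in zip_longest(occupants, occupants[1:]):
            if prev.termination is None:
                continue  # prev still holds the seat, so no vacancy yet
            end = nxt.commission if nxt is not None else None
            days = (end - prev.termination).days if end is not None else None
            vacancies.append(Vacancy(seat_id, prev.termination, end, days))
    return vacancies


records = [
    ServiceRecord("S1", "Judge A", date(2000, 1, 10), date(2010, 6, 1)),
    ServiceRecord("S1", "Judge B", date(2011, 6, 1), None),
]
for v in seat_vacancies(records):
    print(v.seat_id, v.start, v.end, v.days)
# → S1 2010-06-01 2011-06-01 365
```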


## 2. Congress.gov API Data

The Congress.gov API provides detailed information about judicial nominations, including:
- Nomination date
- Nominee information
- Confirmation status and date
- Committee actions

### Setup API Access

In [None]:
# Check if API key is available
api_key = os.environ.get("CONGRESS_API_KEY")
if not api_key:
    print("❌ Error: CONGRESS_API_KEY environment variable not set")
    print("Please set the CONGRESS_API_KEY environment variable to your Congress.gov API key")
    print("You can request an API key at: https://api.congress.gov/sign-up/")
else:
    print("✓ Congress API key found in environment variables")
    # Initialize the API client
    congress_client = CongressAPIClient(api_key)
    print("✓ Congress API client initialized")

✓ Congress API key found in environment variables
✓ Congress API client initialized
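If `CONGRESS_API_KEY` is missing, the cell above only prints a message, so later cells fail with a `NameError` on `congress_client`. A fail-fast alternative is sketched below; `require_env` is a hypothetical helper, not part of the project.

```python
import os


def require_env(name: str, hint: str = "") -> str:
    """Return the value of environment variable `name`, or raise with a helpful message."""
    value = os.environ.get(name, "").strip()
    if not value:
        message = f"{name} environment variable not set"
        if hint:
            message += f". {hint}"
        raise RuntimeError(message)
    return value


# Usage in the notebook (not executed here):
# api_key = require_env("CONGRESS_API_KEY", "Request a key at https://api.congress.gov/sign-up/")
# congress_client = CongressAPIClient(api_key)
```

Raising stops "Run All" at the cell that actually has the problem instead of several cells later.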


### Fetch Judicial Nominations from Recent Congresses
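Congress numbers map to two-year terms: the 1st Congress began in 1789, so the nth Congress starts in 1789 + 2(n - 1), which gives 2023 for the 118th and matches the comment in the cell below. The sketch also includes an ordinal-suffix helper, because log messages built as `f"{congress}th Congress"` would read "101th" or "92th" if `OLDEST_CONGRESS_TERM_TO_GET` is lowered. Both functions are hypothetical helpers, not part of the project.

```python
def congress_years(n: int) -> tuple[int, int]:
    """Return the (start, end) calendar years of the nth Congress."""
    if n < 1:
        raise ValueError("Congress numbers start at 1")
    start = 1789 + 2 * (n - 1)
    return start, start + 2


def ordinal(n: int) -> str:
    """Format n with its English ordinal suffix: 1st, 2nd, 3rd, 11th, 101st, 118th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th, ... and 111th, 112th, 113th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


for n in (116, 117, 118):
    start, end = congress_years(n)
    print(f"{ordinal(n)} Congress: {start}-{end}")
# → 116th Congress: 2019-2021
#   117th Congress: 2021-2023
#   118th Congress: 2023-2025
```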

In [None]:
# Fetch judicial nominations from recent congresses
# Congress numbering: 116th (2019-2021), 117th (2021-2023), 118th (2023-2025)
from nomination_predictor.config import RAW_DATA_DIR

MOST_RECENT_CONGRESS_TERM_TO_GET = 118
OLDEST_CONGRESS_TERM_TO_GET = 118  # development setting; lower it (e.g. to 90) to fetch earlier congresses

# Define cache file path for nominations
nominations_cache_file = os.path.join(RAW_DATA_DIR, "nominations.csv")
congresses = range(MOST_RECENT_CONGRESS_TERM_TO_GET, OLDEST_CONGRESS_TERM_TO_GET-1, -1)

# Check if we have cached data
if os.path.exists(nominations_cache_file):
    logger.info(f"Found cached nominations data at {nominations_cache_file}")
    nominations_df = pd.read_csv(nominations_cache_file, parse_dates=['receiveddate', 'authoritydate'])
    logger.info(f"Loaded {len(nominations_df)} nominations from cache")
else:
    # If no cache, fetch from API
    all_nominations = []
    
    for congress in congresses:
        try:
            logger.info(f"Fetching judicial nominations for the {congress}th Congress...")
            # auto_paginate=False fetches only the first page (250 nominations, per the log below):
            # faster for development, but incomplete. Set it to True to retrieve every nomination.
            nominations = congress_client.get_judicial_nominations(congress, auto_paginate=False)
            logger.info(f"  ✓ Retrieved {len(nominations)} judicial nominations")
            all_nominations.extend(nominations)
        except Exception as e:
            logger.error(f"  ❌ Error fetching nominations for {congress}th Congress: {str(e)}")
    
    # Convert to DataFrame
    nominations_df = pd.DataFrame(all_nominations)
    logger.info(f"\nTotal nominations retrieved: {len(nominations_df)}")
    

2025-07-12 16:46:47 | INFO | <module> - Fetching judicial nominations for the 118th Congress...
2025-07-12 16:46:47 | INFO | get_judicial_nominations - Fetching judicial nominations for Congress 118
2025-07-12 16:46:47 | INFO | get_nominations - Fetching nominations for 118th Congress with auto-pagination option set to False
2025-07-12 16:46:47 | INFO | get_nominations - Fetching page 1 for 118th Congress nominations
2025-07-12 16:46:53 | INFO | get_nominations - Retrieved 250 nominations from page 1
2025-07-12 16:46:53 | INFO | get_nominations - Total nominations retrieved after pagination: 250
2025-07-12 16:46:53 | INFO | get_judicial_nominations - Found 250 civilian nominations in Congress 118
2025-07-12 16:46:53 | INFO | get_judicial_nominations - Found 80 judicial nominations based on summary data
2025-07-12 16:46:53 | INFO | get_judicial_nominations - Judicial nomination 1: 2012 - James Graham Lake, of the District of Columbia, to be an Associate Judge of the Superior Court of the District of Columbia for a term of fifteen years, vice Jennifer M. Anderson, retired.
2025-07-12 16:46:53 | INFO | get_judicial_nominations - Judicial nomination 2: 2013 - Nicholas George Miranda, of the District of Col
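The log above shows that with `auto_paginate=False` only the first page of 250 civilian nominations was retrieved, of which 80 were judicial. How `CongressAPIClient` paginates internally isn't shown here; the sketch below illustrates generic offset/limit pagination against a hypothetical `fetch_page(offset, limit)` callable that stands in for one API request, stopping when a page comes back shorter than the limit.

```python
from typing import Callable, Optional


def fetch_all(
    fetch_page: Callable[[int, int], list[dict]],
    limit: int = 250,
    max_pages: Optional[int] = None,
) -> list[dict]:
    """Collect items from an offset/limit paged source.

    fetch_page(offset, limit) is a hypothetical stand-in for one API request.
    max_pages=1 mimics auto_paginate=False (first page only).
    """
    items: list[dict] = []
    offset = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(offset, limit)
        items.extend(page)
        pages += 1
        if len(page) < limit:
            break  # a short or empty page means there are no more results
        offset += limit
    return items
```

When the total is an exact multiple of `limit`, this costs one extra (empty) request; APIs that report a total count or a next-page link let a client stop earlier.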

In [None]:
# Preview the nominations
print(nominations_df.head())
all_dataframes['nominations'] = nominations_df

                                             actions authoritydate citation  \
0  {'count': 6, 'url': 'https://api.congress.gov/...    2025-05-12   PN2012   
1  {'count': 6, 'url': 'https://api.congress.gov/...    2025-05-12   PN2013   
2  {'count': 6, 'url': 'https://api.congress.gov/...    2025-03-28    PN814   
3  {'count': 11, 'url': 'https://api.congress.gov...    2025-03-28    PN771   
4  {'count': 12, 'url': 'https://api.congress.gov...    2025-03-28    PN769   

                                          committees  congress  \
0  {'count': 1, 'url': 'https://api.congress.gov/...       118   
1  {'count': 1, 'url': 'https://api.congress.gov/...       118   
2  {'count': 1, 'url': 'https://api.congress.gov/...       118   
3  {'count': 1, 'url': 'https://api.congress.gov/...       118   
4  {'count': 1, 'url': 'https://api.congress.gov/...       118   

                                         description  \
0  James Graham Lake, of the District of Columbia...   
1  Nicholas Geor
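Columns such as `actions` and `committees` hold Python dicts. `DataFrame.to_csv` writes them as their string representation, so when the cached `nominations.csv` is read back with `pd.read_csv` those cells are plain strings. If downstream code needs the dicts, `ast.literal_eval` can parse them back. The helper below is a hypothetical sketch using only the standard library; the URL in the example is a placeholder.

```python
import ast
from typing import Any


def parse_nested_cell(value: Any) -> Any:
    """Turn a stringified dict/list (as written by to_csv) back into a Python object.

    Non-string values (e.g. NaN) and strings that are not Python literals are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    if not (text.startswith("{") or text.startswith("[")):
        return value
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return value


# Simulate the CSV round trip: to_csv stores str(cell) for dict-valued cells
original = {"count": 6, "url": "https://example.invalid/actions"}  # placeholder URL
assert parse_nested_cell(str(original)) == original

# In the notebook (not executed here):
# nominations_df["actions"] = nominations_df["actions"].map(parse_nested_cell)
```

`ast.literal_eval` only evaluates literals, so unlike `eval` it cannot execute arbitrary code found in a CSV cell.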

### Fetch nominee records for the retrieved nominations

In [None]:
nominees_cache_file = os.path.join(RAW_DATA_DIR, "nominees.csv")

# Check if we have cached data
if os.path.exists(nominees_cache_file):
    print(f"Found cached nominees data at {nominees_cache_file}")
    nominees_df = pd.read_csv(nominees_cache_file)
    print(f"Loaded {len(nominees_df)} nominee records from cache")
elif 'nominee_url' not in nominations_df.columns:
    print("⚠️ No nominee_url column found in nominations_df")
    nominees_df = pd.DataFrame()  # keep later cells runnable with an empty result
else:
    print(f"Fetching nominee data for {len(nominations_df)} nominations...")

    # Filter out records without nominee_url
    valid_nominations = nominations_df[~nominations_df['nominee_url'].isna()]
    print(f"Found {len(valid_nominations)} nominations with valid nominee_url")

    # Fetch nominee data for all nominations
    nominees_data = congress_client.get_all_nominees_data(valid_nominations)

    # Convert to DataFrame
    nominees_df = pd.DataFrame(nominees_data)
    print(f"\nTotal nominees retrieved: {len(nominees_df)}")

2025-07-12 16:47:35 | INFO | get_all_nominees_data - Fetching nominee data for 80 nominations
2025-07-12 16:47:35 | INFO | get_all_nominees_data - Processing nominee 1/80: PN2012
2025-07-12 16:47:35 | INFO | get_nominee_data_from_url - Fetching nominee data from URL: https://api.congress.gov/v3/nomination/118/2012/1?format=json

Fetching nominee data for 80 nominations...
Found 80 nominations with valid nominee_url

2025-07-12 16:47:35 | INFO | get_all_nominees_data - Added nominee data for PN2012
2025-07-12 16:47:35 | INFO | get_all_nominees_data - Processing nominee 2/80: PN2013
2025-07-12 16:47:35 | INFO | get_nominee_data_from_url - Fetching nominee data from URL: https://api.congress.gov/v3/nomination/118/2013/1?format=json
2025-07-12 16:47:36 | INFO | get_all_nominees_data - Added nominee data for PN2013
2025-07-12 16:47:36 | INFO | get_all_nominees_data - Processing nominee 3/80: PN814
2025-07-12 16:47:36 | INFO | get_nominee_data_from_url - Fetching nominee data from URL: https://api.congress.gov/v3/nomination/118/814/1?format=json
2025-07-12 16:47:36 | INFO | get_all_nominees_data - Added nominee data for PN814


Total nominees retrieved: 80
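The log and the preview below show a consistent scheme: nominee records are fetched from `https://api.congress.gov/v3/nomination/{congress}/{number}/{ordinal}?format=json`, the citation is `PN{number}`, and `nominee_id` is `{congress}-{number}-{ordinal}`. The sketch below parses a nominee URL into those parts and rebuilds the identifiers. The function and class names are hypothetical, and the scheme is inferred from this notebook's output rather than from the client's source.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class NomineeKey(NamedTuple):
    congress: int
    number: int
    ordinal: int

    @property
    def citation(self) -> str:
        return f"PN{self.number}"

    @property
    def nominee_id(self) -> str:
        return f"{self.congress}-{self.number}-{self.ordinal}"


def parse_nominee_url(url: str) -> NomineeKey:
    """Extract (congress, number, ordinal) from a .../nomination/{c}/{n}/{o} URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]  # query string is ignored
    idx = parts.index("nomination")
    congress, number, ordinal = (int(x) for x in parts[idx + 1 : idx + 4])
    return NomineeKey(congress, number, ordinal)


key = parse_nominee_url("https://api.congress.gov/v3/nomination/118/2012/1?format=json")
print(key.citation, key.nominee_id)
# → PN2012 118-2012-1
```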


In [None]:
# Preview the nominees
print(nominees_df.head())
all_dataframes['nominees'] = nominees_df

  firstName lastName middleName ordinal state congress number  \
0     James     Lake     Graham       1    DC      118   2012   
1  Nicholas  Miranda     George       1    DC      118   2013   
2      Lisa     Wang         W.       1    DC      118    814   
3   Brandon     Long         S.       1    LA      118    771   
4     Jerry  Edwards        NaN       1    LA      118    769   

                                         nominee_url citation  nominee_id  \
0  https://api.congress.gov/v3/nomination/118/201...   PN2012  118-2012-1   
1  https://api.congress.gov/v3/nomination/118/201...   PN2013  118-2013-1   
2  https://api.congress.gov/v3/nomination/118/814...    PN814   118-814-1   
3  https://api.congress.gov/v3/nomination/118/771...    PN771   118-771-1   
4  https://api.congress.gov/v3/nomination/118/769...    PN769   118-769-1   

                data_source              retrieval_date suffix  
0  congress.gov_api_nominee  2025-07-12T16:47:35.972746    NaN  
1  congress.gov_

In [None]:
## TODO: determine whether safe to move this to other notebook, or if other code already depends on it happening this early
## Normalize column names, leaving data values as-is
#nominees_df.columns = [col.casefold().replace(' ', '_') for col in nominees_df.columns]
#print("\nNominees DataFrame columns:")
#for col in sorted(nominees_df.columns):
#     print(f"- {col}: {nominees_df[col].nunique()} unique values")

## 3. Check whether "nid" and "citation" are unique, for use as FJC and Congress.gov indexes respectively

In [None]:
# Check for uniqueness in ID fields before saving to the raw data folder
from IPython.display import display  # built into Jupyter; imported explicitly for clarity

from nomination_predictor.dataset import validate_dataframe_ids

print("Checking ID uniqueness in dataframes before saving...")

uniqueness_results = validate_dataframe_ids({name: df for name, df in all_dataframes.items() if name != 'seat_timeline'}) # seat timeline is generated by us, not FJC, and nids are not unique in it by design

# Check if any dataframes have duplicate IDs
problematic_dfs = [name for name, result in uniqueness_results.items() 
                   if not result.get('is_unique', True)]

if problematic_dfs:
    logger.warning(f"⚠️ Found non-unique IDs in: {', '.join(problematic_dfs)}")
    for df_name in problematic_dfs:
        result = uniqueness_results[df_name]
        logger.warning(f"\nDuplicates in {df_name}:")
        display(result['duplicate_rows'])
else:
    print("✓ All ID fields are unique across all dataframes.")

2025-07-12 16:58:19 | INFO | validate_dataframe_ids - Checking 'nid' uniqueness for dataframe 'judges'
2025-07-12 16:58:19 | INFO | check_id_uniqueness - All nid values are unique
2025-07-12 16:58:19 | INFO | validate_dataframe_ids - Checking 'nid' uniqueness for dataframe 'demographics'
2025-07-12 16:58:19 | INFO | check_id_uniqueness - All nid values are unique
2025-07-12 16:58:19 | INFO | validate_dataframe_ids - Checking 'nid' uniqueness for dataframe 'education'
2025-07-12 16:58:19 | INFO | validate_dataframe_ids - Checking 'nid' uniqueness for dataframe 'federal_judicial_service'
2025-07-12 16:58:19 | INFO | validate_dataframe_ids - Checking 'nid' uniqueness for dataframe 'other_nominations_recess'

Checking ID uniqueness in dataframes before saving...


Unnamed: 0,nid,sequence,judge_name,school,degree,degree_year
0,13761857,1,"Abelson, Adam Ben",Princeton University,B.A.,2005
1,13761857,2,"Abelson, Adam Ben",New York University School of Law,J.D.,2010
2,1393931,1,"Abrams, Ronnie",Cornell University,B.A.,1990
3,1393931,2,"Abrams, Ronnie",Yale Law School,J.D.,1993
5,13651551,1,"Abudu, Nancy Gbana",Columbia University,B.A.,1996
...,...,...,...,...,...,...
8035,1390291,2,"Zloch, William J.",Notre Dame Law School,J.D.,1974
8036,1390301,1,"Zobel, Rya Weickert",Radcliffe College,A.B.,1953
8037,1390301,2,"Zobel, Rya Weickert",Harvard Law School,LL.B.,1956
8038,1392366,1,"Zouhary, Jack",Dartmouth College,B.A.,1973


Duplicates in federal_judicial_service:[0m


Unnamed: 0,nid,sequence,judge_name,court_type,court_name,appointment_title,appointing_president,party_of_appointing_president,reappointing_president,party_of_reappointing_president,...,ayes/nays,confirmation_date,commission_date,"service_as_chief_judge,_begin","service_as_chief_judge,_end","2nd_service_as_chief_judge,_begin","2nd_service_as_chief_judge,_end",senior_status_date,termination,termination_date
4,1376981,1,"Acheson, Marcus Wilson",U.S. District Court,U.S. District Court for the Western District o...,Judge,Rutherford B. Hayes,Republican,,,...,,1880-01-14,1880-01-14,,,,,,Appointment to Another Judicial Position,1891-02-09
5,1376981,2,"Acheson, Marcus Wilson",U.S. Circuit Court (1869-1911),U.S. Circuit Courts for the Third Circuit,Judge,Benjamin Harrison,Republican,,,...,,1891-02-03,1891-02-03,,,,,,Death,1906-06-21
6,1376981,3,"Acheson, Marcus Wilson",U.S. Court of Appeals,U.S. Court of Appeals for the Third Circuit,Judge,None (assignment),None (assignment),,,...,,,1891-06-16,,,,,,Death,1906-06-21
9,1376996,1,"Ackerman, James Waldo",U.S. District Court,U.S. District Court for the Southern District ...,Judge,Gerald Ford,Republican,,,...,,1976-07-02,1976-07-02,,,,,,Reassignment,1979-03-31
10,1376996,2,"Ackerman, James Waldo",U.S. District Court,U.S. District Court for the Central District o...,Judge,None (reassignment),None (reassignment),,,...,,,1979-03-31,1982.0,1984.0,,,,Death,1984-11-23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4688,1390201,1,"Yankwich, Leon Rene",U.S. District Court,U.S. District Court for the Southern District ...,Judge,Franklin D. Roosevelt,Democratic,,,...,,1935-08-23,1935-08-24,1951.0,1959.0,,,1964-04-28,Reassignment,1966-09-18
4689,1390201,2,"Yankwich, Leon Rene",U.S. District Court,U.S. District Court for the Central District o...,Judge,None (reassignment),None (reassignment),,,...,,,1966-09-18,,,,,,Death,1975-02-09
4695,1390221,1,"Young, George Cressler",U.S. District Court,U.S. District Court for the Northern District ...,Judge,John F. Kennedy,Democratic,,,...,,1961-09-14,1961-09-18,,,,,,Reassignment,1966-09-17
4696,1390221,2,"Young, George Cressler",U.S. District Court,U.S. District Court for the Southern District ...,Judge,John F. Kennedy,Democratic,,,...,,1961-09-14,1961-09-18,,,,,,Reassignment,1966-09-17


Duplicates in other_nominations_recess:[0m


Unnamed: 0,nid,sequence,judge_name,other_nominations/recess_appointments
7,1377131,1,"Allred, James V.",Received recess appointment to U.S. District C...
8,1377131,2,"Allred, James V.",Nominated to U.S. Court of Appeals for the Fif...
13,1390306,1,"Andrews, Maurice Neil",Nominated to U.S. District Court for the North...
14,1390306,2,"Andrews, Maurice Neil",Nominated to U.S. District Court for the North...
17,6385001,1,"Arias-Marxuach, Raúl Manuel",Nominated to U.S. District Court for the Distr...
...,...,...,...,...
811,1389986,2,"Withey, Solomon Lewis",Nominated to U.S. Circuit Courts for the Sixth...
822,1393301,1,"Wynn, James Andrew, Jr.",Nominated to U.S. Court of Appeals for the Fou...
823,1393301,2,"Wynn, James Andrew, Jr.",Nominated to U.S. Court of Appeals for the Fou...
826,6839686,1,"Younge, John Milton",Nominated to U.S. District Court for the Easte...


Duplicates in other_federal_judicial_service:[0m


Unnamed: 0,nid,sequence,judge_name,type,other_federal_judicial_service,unnamed:_5,unnamed:_6,unnamed:_7,unnamed:_8,unnamed:_9,...,unnamed:_21,unnamed:_22,unnamed:_23,unnamed:_24,unnamed:_25,unnamed:_26,unnamed:_27,unnamed:_28,unnamed:_29,unnamed:_30
6,1377121,1,"Alley, Wayne Edward",Military Courts,"Military Judge, U.S. Army, Saigon, Republic of...",,,,,,...,,,,,,,,,,
7,1377121,2,"Alley, Wayne Edward",Military Courts,"Military Judge, U.S. Army, Schofield Barracks,...",,,,,,...,,,,,,,,,,
8,1377121,3,"Alley, Wayne Edward",Military Courts,"Appellate Military Judge, U.S. Army Court of M...",,,,,,...,,,,,,,,,,
9,1377121,4,"Alley, Wayne Edward",Military Courts,"Chief Military Trial Judge, U.S. Army, 1975",,,,,,...,,,,,,,,,,
12,1377216,1,"Anderson, Robert Palmer",U.S. Commissioner,"U.S. Commissioner, U.S. District Court for the...",,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
539,1388436,2,"Sullivan, Emmet G.",District of Columbia Courts,"Judge, District of Columbia Court of Appeals, ...",,,,,,...,,,,,,,,,,
572,1391301,1,"Walton, Reggie B.",District of Columbia Courts,"Judge, Superior Court of the District of Colum...",,,,,,...,,,,,,,,,,
573,1391301,2,"Walton, Reggie B.",Foreign Intelligence Surveillance Court,"Judge, Foreign Intelligence Surveillance Court...",,,,,,...,,,,,,,,,,
590,1389836,1,"Williams, Glen Morgan",U.S. Commissioner,"U.S. Commissioner, U.S. District Court for the...",,,,,,...,,,,,,,,,,


Duplicates in professional_career:[0m


Unnamed: 0,nid,sequence,judge_name,professional_career
0,13761857,1,"Abelson, Adam Ben","Law clerk, Hon. Catherine C. Blake, U.S. Distr..."
1,13761857,2,"Abelson, Adam Ben","Law clerk, Hon. Andre M. Davis, U.S. Court of ..."
2,13761857,3,"Abelson, Adam Ben","Private practice, Washington, D.C., 2012-2014"
3,13761857,4,"Abelson, Adam Ben","Private practice, Baltimore, Maryland, 2014-2023"
4,1393931,1,"Abrams, Ronnie","Law clerk, Hon. Thomas P. Griesa, U.S. Distric..."
...,...,...,...,...
18998,1390301,2,"Zobel, Rya Weickert","Private practice, Boston, Massachusetts, 1967-..."
18999,1390301,3,"Zobel, Rya Weickert","Director, Federal Judicial Center, 1995-1999"
19000,1392366,1,"Zouhary, Jack","Private practice, Toledo, Ohio, 1976-1999, 200..."
19001,1392366,2,"Zouhary, Jack","Senior vice president and general counsel, S.E..."
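The duplicates flagged above come from one-to-many tables: a judge can have several degrees, service records, or career entries, so `nid` repeats by design in `education`, `federal_judicial_service`, `other_nominations_recess`, `other_federal_judicial_service`, and `professional_career`. In the rows shown, each repeat carries a different `sequence`, which suggests `(nid, sequence)` as the natural key for those tables. The standard-library sketch below checks uniqueness for an arbitrary key; `find_duplicate_keys` is a hypothetical helper, and the sample rows are taken from the `education` output above.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def find_duplicate_keys(rows: Iterable[Mapping], key: Sequence[str]) -> dict[tuple, int]:
    """Return {key_tuple: count} for every key value that appears more than once."""
    counts = Counter(tuple(row[col] for col in key) for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


education_sample = [
    {"nid": 13761857, "sequence": 1, "degree": "B.A."},
    {"nid": 13761857, "sequence": 2, "degree": "J.D."},
    {"nid": 1393931, "sequence": 1, "degree": "B.A."},
    {"nid": 1393931, "sequence": 2, "degree": "J.D."},
]

print(find_duplicate_keys(education_sample, ["nid"]))
# → {(13761857,): 2, (1393931,): 2}
print(find_duplicate_keys(education_sample, ["nid", "sequence"]))
# → {}
```

With pandas, the equivalent check on a full table is `df.duplicated(subset=["nid", "sequence"], keep=False).any()`.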


## 4. Save Data to Raw Directory

Save the datasets to the raw data directory for use by downstream notebooks.

In [None]:
# Save data to the raw data directory
from datetime import datetime

from nomination_predictor.config import RAW_DATA_DIR

# Create the raw data directory if it doesn't exist
RAW_DATA_DIR.mkdir(parents=True, exist_ok=True)

# Date stamp used in the manifest filename
timestamp = datetime.now().strftime("%Y%m%d")

print(f"Saving dataframes to {RAW_DATA_DIR}...")
saved_files = []

# Save all dataframes from the all_dataframes collection
for name, df in all_dataframes.items():
    if df is not None and not df.empty:
        try:
            # Create filename
            output_file = RAW_DATA_DIR / f"{name}.csv"
            
            # Save to CSV
            df.to_csv(output_file, index=False)
            saved_files.append(f"{name}.csv")
            print(f"  ✓ Saved {len(df):,} records to {output_file}")
        except Exception as e:
            print(f"  ✗ Error saving {name}: {str(e)}")

# Print summary
if saved_files:
    print(f"\n✅ Successfully saved {len(saved_files)} dataframes to {RAW_DATA_DIR}")
else:
    print("\n⚠️ No dataframes were saved - check if all_dataframes is populated correctly")

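# Create a manifest file to track what was saved and when.
# chr(10) is a newline: backslashes are not allowed inside f-string expressions before Python 3.12.
manifest_content = f"""# Raw Data Manifest (FJC and Congress.gov)
Processed on: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
Note: FJC column names are normalized (lowercase with underscores); Congress.gov columns keep the names produced by CongressAPIClient. Data values remain unchanged.
Files saved:
{chr(10).join(['- ' + file for file in saved_files])}
"""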

with open(RAW_DATA_DIR / f"fjc_data_manifest_{timestamp}.txt", "w") as f:
    f.write(manifest_content)

print(f"✓ Created manifest: fjc_data_manifest_{timestamp}.txt")

Saving dataframes to /home/wsl2ubuntuuser/nomination_predictor/data/raw...


  ✓ Saved 4,022 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/judges.csv
  ✓ Saved 4,022 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/demographics.csv
  ✓ Saved 8,040 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/education.csv
  ✓ Saved 4,720 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/federal_judicial_service.csv
  ✓ Saved 828 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/other_nominations_recess.csv
  ✓ Saved 611 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/other_federal_judicial_service.csv
  ✓ Saved 19,003 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/professional_career.csv
  ✓ Saved 4,720 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/seat_timeline.csv
  ✓ Saved 80 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/nominations.csv
  ✓ Saved 80 records to /home/wsl2ubuntuuser/nomination_predictor/data/raw/nominees.csv

✅ Successfully saved 1
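The manifest is plain text built from the list of saved files. The sketch below factors that step into a pure function, which makes it easy to test and avoids the `chr(10)` workaround by joining lines outside the f-string. `build_manifest` is a hypothetical name, and the title and note are parameters so callers can keep whatever wording they prefer.

```python
from datetime import datetime


def build_manifest(
    saved_files: list[str],
    processed_at: datetime,
    note: str,
    title: str = "Raw Data Manifest",
) -> str:
    """Render a plain-text manifest listing the files written to the raw data directory."""
    lines = [
        f"# {title}",
        f"Processed on: {processed_at:%Y-%m-%d %H:%M:%S}",
        f"Note: {note}",
        "Files saved:",
    ]
    lines.extend(f"- {name}" for name in saved_files)
    return "\n".join(lines) + "\n"


# In the notebook (not executed here):
# manifest_path = RAW_DATA_DIR / f"fjc_data_manifest_{timestamp}.txt"
# manifest_path.write_text(build_manifest(saved_files, datetime.now(), "Data values remain unchanged."))
```

Passing `processed_at` in, rather than calling `datetime.now()` inside, keeps the output deterministic for a given input.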

In [None]:
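# Save Congress API retrieved nominations and nominees to their cache files.
# These paths are RAW_DATA_DIR/nominations.csv and RAW_DATA_DIR/nominees.csv, the same files the
# previous cell writes from all_dataframes, so this only matters if those entries were never added there.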
if nominations_df is not None and not nominations_df.empty:
    # Ensure directory exists
    os.makedirs(os.path.dirname(nominations_cache_file), exist_ok=True)
    print(f"Saving nominations to cache file: {nominations_cache_file}")
    nominations_df.to_csv(nominations_cache_file, index=False)
    print(f"✓ Saved {len(nominations_df)} nominations to cache")
    
if nominees_df is not None and not nominees_df.empty:
    # Ensure directory exists
    os.makedirs(os.path.dirname(nominees_cache_file), exist_ok=True)
    print(f"Saving nominees to cache file: {nominees_cache_file}")
    nominees_df.to_csv(nominees_cache_file, index=False)
    print(f"✓ Saved {len(nominees_df)} nominees to cache")

Saving nominations to cache file: /home/wsl2ubuntuuser/nomination_predictor/data/raw/nominations.csv
✓ Saved 80 nominations to cache
Saving nominees to cache file: /home/wsl2ubuntuuser/nomination_predictor/data/raw/nominees.csv
✓ Saved 80 nominees to cache


## Summary

In this notebook, we have:

1. Loaded Federal Judicial Center (FJC) data, the canonical source for judicial seats and judges
2. Built the seat timeline dataframe
3. Fetched judicial nominations and the corresponding nominee records from the Congress.gov API
4. Checked ID uniqueness across the FJC and Congress.gov dataframes
5. Saved all datasets to the raw data directory for further processing by downstream notebooks

The next notebook (e.g. 1.##-nw-feature-engineering.ipynb) will load these datasets, clean them, and engineer features for modeling.