### 🐶 Load & Inspect Agency Mapping for D.O.G.E.

> The agency-to-regulation mapping is retrieved from the [agencies.json](https://www.ecfr.gov/developers/documentation/api/v1) endpoint of the Electronic Code of Federal Regulations (eCFR).

> The API response carries no explicit timestamp. According to [DOGE's regulations sources](https://doge.gov/regulations), the mappings appear to reflect 2024 data.

> This mapping includes all top-level agencies (sorted by name), along with their respective child agencies, and serves as the foundation for linking CFR titles, chapters, and parts to their governing authorities.


### 🔎 Inspect API Output via CLI

> For a quick command-line inspection of the eCFR `agencies.json` response:  

> ```bash
> curl -s "https://www.ecfr.gov/api/admin/v1/agencies.json" -H "accept: application/json" | jq .
> ```  

> This fetches the agency mapping JSON directly and pretty-prints it with `jq`.  
> Use the output to grep for keywords, spot patterns, or understand the structure before processing it in Python.


```json
{
  "agencies": [
    {
      "name": "Department of Agriculture",
      "slug": "agriculture-department",
      "children": [
        {
          "name": "Agricultural Marketing Service",
          "slug": "agricultural-marketing-service",
          "cfr_references": [
            {
              "title": 7,
              "chapter": "I"
            },
            ...  // child cfr_references
          ]     
        },      
        ...     // siblings
      ],
      "cfr_references": [
        {
          "title": 2,
          "chapter": "IV"
        },
        {
          "title": 5,
          "chapter": "LXXIII"
        },
        ...   // parent cfr_references
      ]
    },
    ...   // agencies
  ]
}
```
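
> The same first look can be taken in Python. The sketch below counts agencies, children, and CFR references; `summarize_agencies` and the hand-written `sample_payload` are illustrative names, not part of the eCFR API or this repo, and the sample only mirrors the abridged response above.

```python
from typing import Any


def summarize_agencies(payload: dict[str, Any]) -> dict[str, int]:
    """Count top-level agencies, child agencies, and CFR references."""
    agencies = payload.get("agencies") or []
    children = [c for a in agencies for c in (a.get("children") or [])]
    parent_refs = sum(len(a.get("cfr_references") or []) for a in agencies)
    child_refs = sum(len(c.get("cfr_references") or []) for c in children)
    return {
        "agencies": len(agencies),
        "children": len(children),
        "parent_cfr_references": parent_refs,
        "child_cfr_references": child_refs,
    }


# Hand-written sample shaped like the abridged response (not real API output)
sample_payload = {
    "agencies": [
        {
            "name": "Department of Agriculture",
            "slug": "agriculture-department",
            "children": [
                {
                    "name": "Agricultural Marketing Service",
                    "slug": "agricultural-marketing-service",
                    "cfr_references": [{"title": 7, "chapter": "I"}],
                }
            ],
            "cfr_references": [
                {"title": 2, "chapter": "IV"},
                {"title": 5, "chapter": "LXXIII"},
            ],
        }
    ]
}

summary = summarize_agencies(sample_payload)
print(summary)
```

> Passing the fetched response instead of `sample_payload` would give the same counts for the live data.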

#### API Structure Overview

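> The root key is `agencies`, a list of agency dictionaries. Each agency has:
> - Metadata (`name`, `short_name`, `display_name`, `sortable_name`, `slug`)  
> - `cfr_references` → a list of `title`/`chapter` entries, used to extract regulation text  
> - `children` → a list (possibly empty) of child agencies with the same structure, but no nested children of their own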

> Nested hierarchy to process:  
> `agencies` → `children` (if any) → `cfr_references`  

> 🔥🔥 Flattening `agencies` + `children` + their `cfr_references` builds the dataset for downstream analysis.
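
> The two-level flattening relies on children having no children of their own. A minimal sketch for checking that assumption before flattening follows; `children_depth` and `agencies_nested_too_deep` are hypothetical helpers, and the payloads are made up for illustration.

```python
from typing import Any


def children_depth(agency: dict[str, Any]) -> int:
    """Return how many levels of `children` sit below this agency (0 = none)."""
    children = agency.get("children") or []
    if not children:
        return 0
    return 1 + max(children_depth(child) for child in children)


def agencies_nested_too_deep(payload: dict[str, Any], max_depth: int = 1) -> list[str]:
    """Return names of top-level agencies whose nesting exceeds max_depth."""
    return [
        agency.get("name")
        for agency in payload.get("agencies") or []
        if children_depth(agency) > max_depth
    ]


# Made-up payloads: one matches the expected two-level shape, one does not
two_level = {"agencies": [{"name": "Parent A", "children": [{"name": "Child A1"}]}]}
three_level = {
    "agencies": [
        {
            "name": "Parent B",
            "children": [{"name": "Child B1", "children": [{"name": "Grandchild"}]}],
        }
    ]
}

ok_result = agencies_nested_too_deep(two_level)
bad_result = agencies_nested_too_deep(three_level)
print(ok_result, bad_result)
```

> An empty result means a parent-plus-children loop visits every CFR reference; any name in the list would signal that deeper levels are being skipped.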


In [11]:
from doge_data_challenge.helpers.env_setup import setup_env
from doge_data_challenge.helpers.print_helpers import shorten_path

# Initialize environment config
try:
    paths, config = setup_env()
except FileNotFoundError as e:
    print(f"Error: {e}")
    print("Ensure .env file exists in the project root with required configuration")
    raise

# Access paths/config
#snapshot_date        = config["SNAPSHOT_DATE"]       # e.g. 2025-03-27
agency_metadata_path = paths["AGENCY_METADATA_PATH"]  # e.g. ~/repo/doge-data-challenge/agency_metadata/{SNAPSHOT_DATE}

# Define output file path
filename_json = agency_metadata_path / "agencies_snapshot.json"

# Print config for verification
#print(f"Snapshot Date: {snapshot_date}")
#print(f"Agency Metadata Path: {shorten_path(agency_metadata_path)}")
print(f"Agency Metadata Filename: {shorten_path(filename_json)}")

# Debug: Print sys.path (requires `import sys`)
#print("\nSystem Paths:")
#for p in sys.path:
#    print(shorten_path(p))


Agency Metadata Filename: ~/repo/doge-data-challenge/agency_metadata/2025-03-27/agencies_snapshot.json


In [12]:
import os
import sys
import json
import requests
from pathlib import Path
from datetime import datetime


# API endpoint for agency metadata
url = "https://www.ecfr.gov/api/admin/v1/agencies.json"


# Fetch agency metadata
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
    data = response.json()
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error fetching agency metadata: {e}\n")
    raise
except requests.exceptions.ConnectionError as e:
    print(f"Connection Error fetching agency metadata: {e}\n")
    raise
except requests.exceptions.Timeout as e:
    print(f"Timeout Error fetching agency metadata: {e}\n")
    raise
except requests.exceptions.RequestException as e:
    print(f"Error fetching agency metadata: {e}\n")
    raise


# Save JSON to file
try:
    with open(filename_json, "w") as f:
        json.dump(data, f, indent=2)
    print(f"Saved snapshot of agency JSON to {shorten_path(filename_json)}\n")
except OSError as e:
    print(f"Error saving JSON to {shorten_path(filename_json)}: {e}\n")
    raise

# Print the JSON for inspection (comment out to suppress the long output)
print(json.dumps(data, indent=2))

Saved snapshot of agency JSON to ~/repo/doge-data-challenge/agency_metadata/2025-03-27/agencies_snapshot.json

{
  "agencies": [
    {
      "name": "Administrative Conference of the United States",
      "short_name": "ACUS",
      "display_name": "Administrative Conference of the United States",
      "sortable_name": "Administrative Conference of the United States",
      "slug": "administrative-conference-of-the-united-states",
      "children": [],
      "cfr_references": [
        {
          "title": 1,
          "chapter": "III"
        }
      ]
    },
    {
      "name": "Advisory Council on Historic Preservation",
      "short_name": "ACHP",
      "display_name": "Advisory Council on Historic Preservation",
      "sortable_name": "Advisory Council on Historic Preservation",
      "slug": "advisory-council-on-historic-preservation",
      "children": [],
      "cfr_references": [
        {
          "title": 36,
          "chapter": "VIII"
        }
      ]
    },
    {
     

In [13]:
flattened_rows = []


def make_row(parent, ref):
    """Build one flat row: parent agency metadata plus one CFR reference."""
    return {
        "name": parent.get("name"),
        "short_name": parent.get("short_name"),
        "slug": parent.get("slug"),
        "title": ref.get("title"),
        "subtitle": ref.get("subtitle"),
        "chapter": ref.get("chapter"),
        "subchapter": ref.get("subchapter"),
        "part": ref.get("part"),
    }


for agency in data["agencies"]:
    # `or []` covers both a missing key and an explicit null,
    # so the loops below always iterate over a list
    parent_cfr_refs = agency.get("cfr_references") or []
    children = agency.get("children") or []

    # CFR references attached directly to the parent agency
    for ref in parent_cfr_refs:
        flattened_rows.append(make_row(agency, ref))

    # CFR references of child agencies, attributed to the parent agency
    for child in children:
        for ref in child.get("cfr_references") or []:
            flattened_rows.append(make_row(agency, ref))
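
> The loop above attributes every child reference to its parent, so the child's own name is dropped. If that detail matters downstream, one possible variant keeps it in an extra column. This is only a sketch: `iter_rows`, the `child_name` column, and `demo_payload` are assumptions for illustration, not part of the notebook's dataset.

```python
from typing import Any, Iterator

REF_FIELDS = ("title", "subtitle", "chapter", "subchapter", "part")


def iter_rows(payload: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per CFR reference, for parents and their children."""
    for agency in payload.get("agencies") or []:
        base = {
            "name": agency.get("name"),
            "short_name": agency.get("short_name"),
            "slug": agency.get("slug"),
        }
        # References attached directly to the parent: no child name
        for ref in agency.get("cfr_references") or []:
            yield {**base, "child_name": None, **{f: ref.get(f) for f in REF_FIELDS}}
        # References attached to a child: keep the child's name next to the parent's
        for child in agency.get("children") or []:
            for ref in child.get("cfr_references") or []:
                yield {
                    **base,
                    "child_name": child.get("name"),
                    **{f: ref.get(f) for f in REF_FIELDS},
                }


# Hand-written sample shaped like the API response (not real API output)
demo_payload = {
    "agencies": [
        {
            "name": "Department of Agriculture",
            "slug": "agriculture-department",
            "children": [
                {
                    "name": "Agricultural Marketing Service",
                    "cfr_references": [{"title": 7, "chapter": "I"}],
                }
            ],
            "cfr_references": [
                {"title": 2, "chapter": "IV"},
                {"title": 5, "chapter": "LXXIII"},
            ],
        }
    ]
}

demo_rows = list(iter_rows(demo_payload))
for row in demo_rows:
    print(row["name"], row["child_name"], row["title"], row["chapter"])
```

> Parent rows come first, followed by child rows, which matches the order used in the cell above.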


In [14]:
import pandas as pd


agencies_df = pd.DataFrame(flattened_rows)

# Report the total number of flattened rows
print(f"Total CFR references across all agencies and children: {len(agencies_df)}")

filename_csv = agency_metadata_path / "agencies_snapshot.csv"

# Save the DataFrame to a CSV file
agencies_df.to_csv(filename_csv, index=False)
print("Saved data frame to", shorten_path(filename_csv))

Total CFR references across all agencies and children: 487
Saved data frame to ~/repo/doge-data-challenge/agency_metadata/2025-03-27/agencies_snapshot.csv
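
> Because child references are recorded under the parent's name, the same agency and CFR location could in principle appear more than once in the flattened rows. The sketch below counts such repeats with the standard library only; `find_duplicate_rows`, `KEY_FIELDS`, and `sample_rows` are illustrative names, and whether the real snapshot contains duplicates is not established here.

```python
from collections import Counter
from typing import Any

# Fields that together identify one agency-to-CFR mapping (an assumption)
KEY_FIELDS = ("slug", "title", "subtitle", "chapter", "subchapter", "part")


def find_duplicate_rows(rows: list[dict[str, Any]]) -> dict[tuple, int]:
    """Return each key that occurs more than once, with its occurrence count."""
    counts = Counter(tuple(row.get(field) for field in KEY_FIELDS) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: the first two share the same slug, title, and chapter
sample_rows = [
    {"slug": "agriculture-department", "title": 7, "chapter": "I"},
    {"slug": "agriculture-department", "title": 7, "chapter": "I"},
    {"slug": "agriculture-department", "title": 2, "chapter": "IV"},
]

duplicates = find_duplicate_rows(sample_rows)
print(duplicates)
```

> Running it on `flattened_rows` before building the DataFrame would show whether the total row count includes repeated mappings.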


In [15]:
agencies_df  # display the full DataFrame (same result as head(len(agencies_df)))

| | name | short_name | slug | title | subtitle | chapter | subchapter | part |
|---|---|---|---|---|---|---|---|---|
| 0 | Administrative Conference of the United States | ACUS | administrative-conference-of-the-united-states | 1 | | III | | |
| 1 | Advisory Council on Historic Preservation | ACHP | advisory-council-on-historic-preservation | 36 | | VIII | | |
| 2 | Special Inspector General for Afghanistan Reco... | SIGAR | special-inspector-general-for-afghanistan-reco... | 5 | | LXXXIII | | |
| 3 | African Development Foundation | USADF | african-development-foundation | 22 | | XV | | |
| 4 | African Development Foundation | USADF | african-development-foundation | 48 | | 57 | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 482 | Department of Veterans Affairs | VA | veterans-affairs-department | 38 | | I | | |
| 483 | Department of Veterans Affairs | VA | veterans-affairs-department | 48 | | 8 | | |
| 484 | Office of Vice President of the United States | | office-of-vice-president-of-the-united-states | 32 | | XXVIII | | |
| 485 | Water Resources Council | | water-resources-council | 18 | | VI | | |


In [7]:
import sys
print(shorten_path(sys.executable))

~/Library/Caches/pypoetry/virtualenvs/doge-data-challenge-t_Z9FBnC-py3.10/bin/python
