### 🐶 Load & Inspect Agency Mapping for D.O.G.E.

> The agency-to-regulation mapping is retrieved from the [agencies.json](https://www.ecfr.gov/developers/documentation/api/v1) endpoint of the Electronic Code of Federal Regulations (eCFR).

> The API response carries no explicit timestamp; based on [DOGE's regulations sources](https://doge.gov/regulations), the mappings appear to reflect 2024 data.

> This mapping includes all top-level agencies (sorted by name), along with their respective child agencies, and serves as the foundation for linking CFR titles, chapters, and parts to their governing authorities.
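A quick sanity check of this description, as a minimal sketch: the helper `summarize_hierarchy` (hypothetical, not part of the eCFR API) counts parent and child agencies and reports whether the top-level names come back in case-insensitive alphabetical order. It assumes the response shape shown in the sample below (an `agencies` list whose items may carry a `children` list).

```python
from typing import Any


def summarize_hierarchy(data: dict[str, Any]) -> dict[str, Any]:
    """Count parent/child agencies and check top-level name ordering.

    Hypothetical helper; assumes the agencies.json shape described above.
    """
    agencies = data.get("agencies") or []
    names = [a.get("name", "") for a in agencies]
    child_count = sum(len(a.get("children") or []) for a in agencies)
    return {
        "parents": len(agencies),
        "children": child_count,
        # Case-insensitive comparison; the API's exact sort key is an assumption here.
        "sorted_by_name": names == sorted(names, key=str.casefold),
    }


sample = {
    "agencies": [
        {"name": "Department of Agriculture",
         "children": [{"name": "Agricultural Marketing Service"}]},
        {"name": "Department of Commerce", "children": []},
    ]
}
print(summarize_hierarchy(sample))
# {'parents': 2, 'children': 1, 'sorted_by_name': True}
```

On the live response you would call `summarize_hierarchy(data)` after the fetch cell; a `False` would only mean the API orders names differently from Python's case-folded sort, not that anything is missing.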


### 🔎 Inspect API Output via CLI

> For a quick command-line inspection of the eCFR `agencies.json` response:  

> ```bash
> curl -X GET "https://www.ecfr.gov/api/admin/v1/agencies.json" -H "accept: application/json" | jq .
> ```  

> This fetches the agency mapping JSON directly and pretty-prints it with `jq`.  
> Use it to grep for keywords, spot patterns, or understand the structure before processing the data in Python.


```json
{
  "agencies": [
    {
      "name": "Department of Agriculture",
      "slug": "agriculture-department",
      "children": [
        {
          "name": "Agricultural Marketing Service",
          "slug": "agricultural-marketing-service",
          "cfr_references": [
            {
              "title": 7,
              "chapter": "I"
            },
            ...  // child cfr_references
          ]     
        },      
        ...     // siblings
      ],
      "cfr_references": [
        {
          "title": 2,
          "chapter": "IV"
        },
        {
          "title": 5,
          "chapter": "LXXIII"
        },
        ...   // parent cfr_references
      ]
    },
    ...   // agencies
  ]
}
```
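If you prefer to stay in Python, a rough equivalent of this `jq` exploration is to collect every key seen at each level of the hierarchy. The name `collect_keys` is hypothetical; it works on any already-parsed response dict, such as `data` from the fetch cell further down. The sample below reuses values from the excerpt above.

```python
from typing import Any


def collect_keys(data: dict[str, Any]) -> dict[str, set[str]]:
    """Gather the distinct keys used by agencies, children, and cfr_references."""
    levels: dict[str, set[str]] = {"agency": set(), "child": set(), "cfr_reference": set()}
    for agency in data.get("agencies") or []:
        levels["agency"].update(agency)  # iterating a dict yields its keys
        for ref in agency.get("cfr_references") or []:
            levels["cfr_reference"].update(ref)
        for child in agency.get("children") or []:
            levels["child"].update(child)
            for ref in child.get("cfr_references") or []:
                levels["cfr_reference"].update(ref)
    return levels


sample = {
    "agencies": [{
        "name": "Department of Agriculture",
        "slug": "agriculture-department",
        "children": [{
            "name": "Agricultural Marketing Service",
            "slug": "agricultural-marketing-service",
            "cfr_references": [{"title": 7, "chapter": "I"}],
        }],
        "cfr_references": [{"title": 2, "chapter": "IV"}],
    }]
}
for level, keys in collect_keys(sample).items():
    print(level, sorted(keys))
# agency ['cfr_references', 'children', 'name', 'slug']
# child ['cfr_references', 'name', 'slug']
# cfr_reference ['chapter', 'title']
```

Run against the full response, this surfaces optional keys (for example `short_name`) that the trimmed excerpt omits.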

#### API Structure Overview

> The root key is `agencies`, a list of agency dictionaries. Each agency has:
> - Metadata (`name`, `slug`, etc.)  
> - `cfr_references` → the CFR title and chapter (sometimes also subtitle, subchapter, or part) the agency governs; used later to locate the regulation text  
> - Optional `children` → same structure, without further nesting
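The same structure can be written down as type hints, which documents what the flattening code expects. This is a sketch, not an official schema: the `short_name`, `subtitle`, `subchapter`, and `part` fields are taken from the keys the flattening cell reads, their types (other than `title` and `chapter`, visible in the sample above) are assumptions, and `total=False` marks every key as possibly absent.

```python
from typing import TypedDict


class CfrReference(TypedDict, total=False):
    title: int        # integer in the sample above
    subtitle: str     # assumed type
    chapter: str      # Roman-numeral string in the sample above
    subchapter: str   # assumed type
    part: str         # assumed type


class Agency(TypedDict, total=False):
    name: str
    short_name: str
    slug: str
    cfr_references: list[CfrReference]
    # Children reuse the Agency shape but, per the overview, have no children of their own.
    children: list["Agency"]


class AgenciesResponse(TypedDict):
    agencies: list[Agency]


example: AgenciesResponse = {
    "agencies": [{
        "name": "Department of Agriculture",
        "slug": "agriculture-department",
        "cfr_references": [{"title": 2, "chapter": "IV"}],
    }]
}
print(sorted(Agency.__optional_keys__))
# ['cfr_references', 'children', 'name', 'short_name', 'slug']
```

A type checker such as mypy can then flag misspelled keys; at runtime these are still plain dicts, so nothing is validated.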

> Nested hierarchy to process:  
> `agencies` → `children` (if any) → `cfr_references`  

> 🔥🔥 Flattening `agencies` + `children` + their `cfr_references` into one row per CFR reference builds the dataset for downstream analysis.
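To make the traversal order concrete, here is a minimal sketch of walking `agencies` → `children` → `cfr_references` with a generator. The name `iter_cfr_refs` is hypothetical, and unlike the notebook's flattening cell it also yields the child agency name (or `None` for the parent's own references), in case that provenance is wanted later.

```python
from typing import Any, Iterator, Optional


def iter_cfr_refs(data: dict[str, Any]) -> Iterator[tuple[str, Optional[str], dict[str, Any]]]:
    """Yield (parent_name, child_name, cfr_reference) for every reference in the hierarchy."""
    for agency in data.get("agencies") or []:
        parent = agency.get("name")
        # The parent's own references come first, tagged with child_name=None.
        for ref in agency.get("cfr_references") or []:
            yield parent, None, ref
        # Then each child's references, still attributed to the same parent.
        for child in agency.get("children") or []:
            for ref in child.get("cfr_references") or []:
                yield parent, child.get("name"), ref


sample = {
    "agencies": [{
        "name": "Department of Agriculture",
        "children": [{"name": "Agricultural Marketing Service",
                      "cfr_references": [{"title": 7, "chapter": "I"}]}],
        "cfr_references": [{"title": 2, "chapter": "IV"}, {"title": 5, "chapter": "LXXIII"}],
    }]
}
for parent, child, ref in iter_cfr_refs(sample):
    print(parent, "|", child, "|", ref)
# Department of Agriculture | None | {'title': 2, 'chapter': 'IV'}
# Department of Agriculture | None | {'title': 5, 'chapter': 'LXXIII'}
# Department of Agriculture | Agricultural Marketing Service | {'title': 7, 'chapter': 'I'}
```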


In [14]:
from datetime import datetime
import json
import os

import requests

# API endpoint for agency metadata
url = "https://www.ecfr.gov/api/admin/v1/agencies.json"
response = requests.get(url, headers={"accept": "application/json"}, timeout=30)
response.raise_for_status()  # stop on HTTP errors (e.g. 404) instead of parsing an error body
data = response.json()
# Optional: pretty-print the JSON
# print(json.dumps(data, indent=2))

# Format today's date
today_str = datetime.today().strftime("%Y-%m-%d")

# Define archive directory and ensure it exists
archive_dir = "../archive"
os.makedirs(archive_dir, exist_ok=True)

# Define full path with date-stamped filename
filename = os.path.join(archive_dir, f"agencies_snapshot_{today_str}.json")

# Save the raw response as a dated snapshot
with open(filename, "w") as f:
    json.dump(data, f, indent=2)
print(f"Saved snapshot of agency json to {filename}")

Saved snapshot of agency json to ../archive/agencies_snapshot_2025-04-05.json
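Because each run writes a date-stamped file, later cells could be rerun offline from the most recent snapshot instead of calling the API again. A minimal sketch, assuming the `agencies_snapshot_YYYY-MM-DD.json` naming used above; `load_latest_snapshot` is a hypothetical helper. ISO dates sort lexicographically in chronological order, so the last filename in sorted order is the newest.

```python
import glob
import json
import os
from typing import Any


def load_latest_snapshot(archive_dir: str = "../archive") -> dict[str, Any]:
    """Load the newest agencies_snapshot_YYYY-MM-DD.json from archive_dir."""
    pattern = os.path.join(archive_dir, "agencies_snapshot_*.json")
    paths = sorted(glob.glob(pattern))  # YYYY-MM-DD sorts chronologically as text
    if not paths:
        raise FileNotFoundError(f"No snapshots match {pattern}")
    with open(paths[-1]) as f:
        return json.load(f)
```

Usage would be `data = load_latest_snapshot()` in place of the `requests.get(...)` call when working offline.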


Next Steps:

- Build the data pipeline
- Ingest the data and flatten the structure (fold each child agency's CFR references into its parent's rows)
- Load the result into a DataFrame
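For the DataFrame step, one small design choice worth considering: passing an explicit column list to `pd.DataFrame` keeps the schema stable even if no rows are produced (for example, after an empty fetch). A sketch; `COLUMNS` and `rows_to_frame` are hypothetical names, and the column list simply mirrors the keys built in the flattening cell below.

```python
import pandas as pd

# Mirrors the keys of each dict appended to flattened_rows.
COLUMNS = ["name", "short_name", "slug", "title", "subtitle", "chapter", "subchapter", "part"]


def rows_to_frame(rows: list[dict]) -> pd.DataFrame:
    """Build a DataFrame with a fixed column order, even when rows is empty."""
    return pd.DataFrame(rows, columns=COLUMNS)


empty = rows_to_frame([])
print(empty.shape)  # (0, 8)
one = rows_to_frame([{"name": "Department of Agriculture", "title": 7, "chapter": "I"}])
print(list(one.columns) == COLUMNS)  # True; keys missing from a row become NaN
```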

In [39]:
flattened_rows = []

for agency in data['agencies']:
    parent_name = agency.get('name')
    short_name  = agency.get('short_name')
    slug_name   = agency.get('slug')

    # Use `or []` so both a missing key and an explicit null become an empty list;
    # .get(key, []) alone still returns None when the key is present with a null value
    parent_cfr_refs = agency.get('cfr_references') or []
    children        = agency.get('children') or []

    # Child references are attributed to the parent agency (child names are not kept)
    child_cfr_refs = [ref for child in children
                      for ref in (child.get('cfr_references') or [])]

    # One row per CFR reference: the parent's own first, then its children's
    for ref in parent_cfr_refs + child_cfr_refs:
        flattened_rows.append({"name": parent_name,
                               "short_name": short_name,
                               "slug": slug_name,
                               "title": ref.get('title'),
                               "subtitle": ref.get('subtitle'),
                               "chapter": ref.get('chapter'),
                               "subchapter": ref.get('subchapter'),
                               "part": ref.get('part')
                              })
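A cheap sanity check after the loop: the number of flattened rows should equal the parent references plus the child references. `count_cfr_refs` is a hypothetical helper that counts them independently of the flattening logic.

```python
from typing import Any


def count_cfr_refs(data: dict[str, Any]) -> int:
    """Count parent and child cfr_references without building any rows."""
    total = 0
    for agency in data.get("agencies") or []:
        total += len(agency.get("cfr_references") or [])
        for child in agency.get("children") or []:
            total += len(child.get("cfr_references") or [])
    return total


sample = {
    "agencies": [{
        "name": "Department of Agriculture",
        "children": [{"name": "Agricultural Marketing Service",
                      "cfr_references": [{"title": 7, "chapter": "I"}]}],
        "cfr_references": [{"title": 2, "chapter": "IV"}, {"title": 5, "chapter": "LXXIII"}],
    }]
}
print(count_cfr_refs(sample))  # 3
```

On the live data the check would be `assert count_cfr_refs(data) == len(flattened_rows)`.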


In [44]:
import pandas as pd

agencies_df = pd.DataFrame(flattened_rows)

# Preview the result
# agencies_df.head()
# print(f"Total CFR references across all agencies and children: {len(agencies_df)}")

# Format today's date
today_str = datetime.today().strftime("%Y-%m-%d")

# Define archive directory and ensure it exists
archive_dir = "../archive"
os.makedirs(archive_dir, exist_ok=True)

# Define full path with date-stamped filename
filename = os.path.join(archive_dir, f"flattened_agencies_list_{today_str}.csv")

# Save data frame to a csv file
agencies_df.to_csv(filename, index=False)
print(f"Saved data frame to {filename}")


Saved data frame to ../archive/flattened_agencies_list_2025-04-06.csv
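When reading the CSV back in a later notebook, note that pandas infers column types: a numeric-looking column with blanks (such as `part`, if its values are digits) comes back as `float64`. One option, sketched below, is to pass an explicit `dtype` mapping to `read_csv`. The rows here are made up for illustration, and the round trip uses a temporary file so the example runs standalone.

```python
import os
import tempfile

import pandas as pd

# Illustrative rows only; real values come from the flattened eCFR data.
df = pd.DataFrame([
    {"name": "Department of Agriculture", "title": 7, "chapter": "I", "part": None},
    {"name": "Department of Agriculture", "title": 2, "chapter": "IV", "part": "10"},
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flattened_agencies_list.csv")
    df.to_csv(path, index=False)
    # Default inference: the blank "part" cell makes the whole column float64.
    inferred = pd.read_csv(path)
    # Explicit dtypes: nullable integer title, nullable string part ("10" stays text).
    typed = pd.read_csv(path, dtype={"title": "Int64", "part": "string"})

print(inferred["part"].dtype)  # float64
print(typed.dtypes)
```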
