### 🐶 Load & Inspect Agency Mapping for D.O.G.E.

> The agency-to-regulation mapping is retrieved from the [agencies.json](https://www.ecfr.gov/developers/documentation/api/v1) endpoint of the Electronic Code of Federal Regulations (eCFR).

> The API response carries no explicit timestamp; according to [DOGE's regulations sources](https://doge.gov/regulations), the mapping appears to reflect 2024.

> This mapping includes all top-level agencies (sorted by name), along with their respective child agencies, and serves as the foundation for linking CFR titles, chapters, and parts to their governing authorities.


### 🔎 Inspect API Output via CLI

> For a quick command-line inspection of the eCFR `agencies.json` response:  

> ```bash
> curl -X GET "https://www.ecfr.gov/api/admin/v1/agencies.json" -H "accept: application/json" | jq .
> ```  

> This fetches the agency mapping JSON directly and pretty-prints it using `jq`.  
> Use the output to grep for keywords, spot patterns, and understand the structure before processing it in Python.


```json
{
  "agencies": [
    {
      "name": "Department of Agriculture",
      "slug": "agriculture-department",
      "children": [
        {
          "name": "Agricultural Marketing Service",
          "slug": "agricultural-marketing-service",
          "cfr_references": [
            {
              "title": 7,
              "chapter": "I"
            },
            ...  // child cfr_references
          ]     
        },      
        ...     // siblings
      ],
      "cfr_references": [
        {
          "title": 2,
          "chapter": "IV"
        },
        {
          "title": 5,
          "chapter": "LXXIII"
        },
        ...   // parent cfr_references
      ]
    },
    ...   // agencies
  ]
}
```
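Before flattening, it can help to confirm which keys actually appear at each level. The following is a minimal sketch, assuming the response has the shape shown above; the `sample` payload and the `summarize_agencies` helper are illustrative names, not part of the eCFR API.

```python
from collections import Counter

# Hypothetical, abbreviated payload mirroring the response shape shown above.
sample = {
    "agencies": [
        {
            "name": "Department of Agriculture",
            "slug": "agriculture-department",
            "children": [
                {
                    "name": "Agricultural Marketing Service",
                    "slug": "agricultural-marketing-service",
                    "cfr_references": [{"title": 7, "chapter": "I"}],
                }
            ],
            "cfr_references": [
                {"title": 2, "chapter": "IV"},
                {"title": 5, "chapter": "LXXIII"},
            ],
        }
    ]
}


def summarize_agencies(payload):
    """Count parents, children, and the keys seen at each level."""
    parent_keys, child_keys, ref_keys = Counter(), Counter(), Counter()
    n_children = 0
    for agency in payload.get("agencies", []):
        parent_keys.update(agency.keys())
        for ref in agency.get("cfr_references") or []:
            ref_keys.update(ref.keys())
        for child in agency.get("children") or []:
            n_children += 1
            child_keys.update(child.keys())
            for ref in child.get("cfr_references") or []:
                ref_keys.update(ref.keys())
    return {
        "parents": len(payload.get("agencies", [])),
        "children": n_children,
        "parent_keys": dict(parent_keys),
        "child_keys": dict(child_keys),
        "cfr_reference_keys": dict(ref_keys),
    }


summary = summarize_agencies(sample)
print(summary)
```

Run against the real response (`summarize_agencies(data)`), the key counts would show which fields are optional, since an optional key appears fewer times than there are agencies.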

#### API Structure Observations & Analysis
> The root key is `agencies`, a list of agency dictionaries.

> Each agency contains:
> - Metadata (`name`, `short_name`, `slug`, etc.)
> - A `cfr_references` list — the **critical key** used to extract regulation text based on fields like `title`, `chapter`, `part`, etc.
> - An optional `children` list — each child is structured like the parent but **does not include further children**
> 
> Nested hierarchy to process:
> `agencies` → `children` (if any) → `cfr_references`

> 🔥🔥 This structure must be flattened into a single table for downstream regulation text retrieval and word count analysis.

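The nested hierarchy described above can be flattened roughly as follows. This is a sketch under stated assumptions, not the notebook's final implementation: the `flatten_agencies` helper, the output column names (`parent_name`, `agency_name`, `is_child`, and so on), and the `sample` payload are hypothetical, and it assumes children never nest further, as observed above.

```python
# Hypothetical, abbreviated payload mirroring the observed response shape.
sample = {
    "agencies": [
        {
            "name": "Department of Agriculture",
            "slug": "agriculture-department",
            "children": [
                {
                    "name": "Agricultural Marketing Service",
                    "slug": "agricultural-marketing-service",
                    "cfr_references": [{"title": 7, "chapter": "I"}],
                }
            ],
            "cfr_references": [
                {"title": 2, "chapter": "IV"},
                {"title": 5, "chapter": "LXXIII"},
            ],
        }
    ]
}


def flatten_agencies(payload):
    """Return one row per (agency, CFR reference), tagging children with their parent."""
    rows = []
    for parent in payload.get("agencies", []):
        # Process the parent first (owner=None), then each child with its owner.
        units = [(parent, None)]
        units += [(child, parent) for child in parent.get("children") or []]
        for agency, owner in units:
            top = owner or agency  # a parent agency is its own top-level agency
            for ref in agency.get("cfr_references") or []:
                rows.append({
                    "parent_name": top.get("name"),
                    "parent_slug": top.get("slug"),
                    "agency_name": agency.get("name"),
                    "agency_slug": agency.get("slug"),
                    "is_child": owner is not None,
                    "title": ref.get("title"),
                    "chapter": ref.get("chapter"),
                    "part": ref.get("part"),  # None when the reference has no part
                })
    return rows


rows = flatten_agencies(sample)
for row in rows:
    print(row["parent_name"], "|", row["agency_name"], "|", row["title"], row["chapter"])
```

Keeping both the parent and the agency name on every row means child references can later be grouped under the parent with a single group-by on `parent_slug`.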

In [5]:
from datetime import datetime
import requests
import json
import os

# API endpoint for agency metadata  
url = "https://www.ecfr.gov/api/admin/v1/agencies.json"
response = requests.get(url, timeout=30)
response.raise_for_status()   # raise on 4xx/5xx (e.g. 404 = not found) instead of parsing an error body
#print(response)
data = response.json()
# Pretty print the JSON
#print(json.dumps(data, indent=2))

# Format today's date
today_str = datetime.today().strftime("%Y-%m-%d")

# Define archive directory and ensure it exists
archive_dir = "../archive"
os.makedirs(archive_dir, exist_ok=True)

# Define full path with date-stamped filename
filename = os.path.join(archive_dir, f"agencies_snapshot_{today_str}.json")

# Save to file
with open(filename, "w") as f:
    json.dump(data, f, indent=2)
print(f"Saved snapshot of agency json to {filename}")

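Because the snapshot filenames carry an ISO date, the most recent one can be reloaded later without calling the API again. A minimal sketch, assuming the `agencies_snapshot_YYYY-MM-DD.json` naming used above; `load_latest_snapshot` is a hypothetical helper and the demo writes to a throwaway temporary directory rather than `../archive`.

```python
import glob
import json
import os
import tempfile


def load_latest_snapshot(archive_dir):
    """Load the newest agencies_snapshot_*.json; ISO dates sort lexicographically."""
    pattern = os.path.join(archive_dir, "agencies_snapshot_*.json")
    candidates = sorted(glob.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No snapshot found in {archive_dir}")
    latest = candidates[-1]
    with open(latest, encoding="utf-8") as f:
        return latest, json.load(f)


# Demo with made-up snapshots in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_snapshots = [
    ("2025-01-05", {"agencies": []}),
    ("2025-02-01", {"agencies": [{"name": "Demo"}]}),
]
for day, payload in demo_snapshots:
    path = os.path.join(demo_dir, f"agencies_snapshot_{day}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)

latest_path, latest_data = load_latest_snapshot(demo_dir)
print(os.path.basename(latest_path))
```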

Next Steps:

- Data pipeline
- Ingest the data and flatten its structure (group each child's references under its parent's references)
- Load into data frame

In [33]:
# Parse the agency JSON, keep selected fields, and fold each child's nested references into its parent

flattened_row = []
#data = response.json()
#print(data)
#print(data['agencies'])


In [35]:
import pandas as pd

# Parse JSON
for agency in data['agencies']:
    #print(agency)
    parent_name       = agency.get('name')
    parent_short_name = agency.get('short_name')
    parent_slug       = agency.get('slug')

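    # One row per top-level agency (children and cfr_references are not flattened yet)
    flattened_row.append({'name': parent_name, 'short_name': parent_short_name, 'slug': parent_slug})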

#for i in range(0, len(flattened_row)):
    #print(flattened_row[i])

#df = pd.json_normalize(data)
#df = pd.DataFrame(data)
#df.head(10)
# TODO: save output to CSV

# TODO: extract short_name, display_name, slug, children, cfr_references, title, subtitle, chapter, subchapter, part, etc.

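The "save output to CSV" step noted in the cell above could be handled with the standard library alone. This is a sketch, assuming rows shaped like the dictionaries appended to `flattened_row`; the `write_rows_to_csv` helper, the sample rows, and the output path are illustrative.

```python
import csv
import os
import tempfile

# Illustrative rows shaped like the entries appended to `flattened_row`.
sample_rows = [
    {"name": "Department of Agriculture", "short_name": "USDA"},
    {"name": "Example Agency Without Short Name", "short_name": None},
]


def write_rows_to_csv(rows, path):
    """Write a list of dicts to CSV, using the union of keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # preserve first-seen column order
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
    return fieldnames


csv_path = os.path.join(tempfile.mkdtemp(), "agencies_flat.csv")
columns = write_rows_to_csv(sample_rows, csv_path)

# Read the file back to confirm the round trip.
with open(csv_path, newline="", encoding="utf-8") as f:
    round_trip = list(csv.DictReader(f))
print(columns, len(round_trip))
```

If the rows are loaded into a pandas data frame instead, `DataFrame.to_csv` would serve the same purpose; the version here only avoids the extra dependency.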

Next Step:
- Download the CFR bulk data from [govinfo.gov](https://www.govinfo.gov/bulkdata/CFR)