# Obtaining Data from the data.gouv.fr API

To get real financial data for French municipalities and EPCIs:

1. Go to https://www.data.gouv.fr/fr/dataservices/explore-api-v2-50/
2. Click the "Explore API" button
3. In the interface:
   - Select a dataset, for example "ofgl-base-communes-consolidee"
   - Add filters such as year or siren
   - Click "Execute" to test the query
4. Copy the generated request URL, which will look like:
   ```
   https://data.ofgl.fr/api/explore/v2.1/catalog/datasets/ofgl-base-communes-consolidee/records?where=...
   ```
5. Use this URL in your code to fetch the data
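
The last step can be sketched in Python as follows. This is a minimal example using only the standard library, not a definitive client: the helper names `build_records_url` and `fetch_records` are hypothetical, the `where`, `limit` and `offset` query parameters are assumed from the Explore API v2.1 interface, and the response is assumed to be a JSON object whose records sit under a `results` key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the request URL shown in step 4
BASE_URL = "https://data.ofgl.fr/api/explore/v2.1/catalog/datasets"


def build_records_url(dataset, where=None, limit=100, offset=0):
    """Build a records URL for one dataset (hypothetical helper).

    The parameter names 'where', 'limit' and 'offset' are assumptions
    based on the Explore API v2.1 query interface.
    """
    params = {"limit": limit, "offset": offset}
    if where:
        params["where"] = where
    # urlencode percent-encodes the filter expression
    return f"{BASE_URL}/{dataset}/records?{urlencode(params)}"


def fetch_records(dataset, where=None, limit=100, offset=0, timeout=30):
    """Fetch one page of records (hypothetical helper).

    Assumes the response body is a JSON object with a 'results' list.
    """
    url = build_records_url(dataset, where, limit, offset)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])


# Building the URL does not touch the network
url = build_records_url(
    "ofgl-base-communes-consolidee",
    where='siren="210100533"',
    limit=10,
)
print(url)

# Uncomment to perform the actual request:
# records = fetch_records("ofgl-base-communes-consolidee", where='siren="210100533"', limit=10)
```

Separating URL construction from the request makes it easy to compare the generated URL with the one copied from the "Explore API" interface before fetching anything.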

The API exposes several datasets of detailed financial metrics for French administrative units, including:
- Basic info (name, population, etc.)
- Operating revenues and expenses
- Investment data
- Debt metrics
- Various financial ratios


## Remove duplicates from the JSON file

In [5]:
import json

# Load the JSON file (UTF-8, since commune names contain accented characters)
with open('/Users/marcolorenz/SIA/ofgl-base-communes-new.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Track one entry per siren
unique_entries = {}

# Keep one entry per siren. Note: no dates are compared here; among duplicates,
# the last entry in file order with a non-null 'ptot_n' wins.
for entry in data:
    siren = entry['siren']
    if siren not in unique_entries:
        unique_entries[siren] = entry
    elif entry.get('ptot_n') is not None:
        # Duplicate siren: prefer an entry that carries population data
        unique_entries[siren] = entry

# Convert back to list
deduplicated_data = list(unique_entries.values())

# Save the deduplicated data
output_filename = '/Users/marcolorenz/SIA/populations-ofgl-communes-deduplicated.json'
with open(output_filename, 'w') as f:
    json.dump(deduplicated_data, f, indent=4)

print(f"Original number of entries: {len(data)}")
print(f"Number of entries after deduplication: {len(deduplicated_data)}")
print(f"Removed {len(data) - len(deduplicated_data)} duplicate entries")
print(f"Saved deduplicated data to: {output_filename}")


Original number of entries: 37376
Number of entries after deduplication: 287
Removed 37089 duplicate entries
Saved deduplicated data to: /Users/marcolorenz/SIA/populations-ofgl-communes-deduplicated.json
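
The cell above keeps the last entry in file order that has a non-null `ptot_n`, which is the most recent one only if the file happens to be sorted by year. If the records carry a year field, an explicit comparison is safer. The sketch below assumes a field named `year`, which is a hypothetical name (check the actual field in the dataset), and uses made-up sirens and values.

```python
def keep_latest_per_siren(records, year_key="year"):
    """Keep, for each siren, the record with the highest year.

    'year_key' is a hypothetical field name; adapt it to the dataset.
    """
    latest = {}
    for record in records:
        siren = record["siren"]
        kept = latest.get(siren)
        if kept is None or record[year_key] > kept[year_key]:
            latest[siren] = record
    return list(latest.values())


# Toy data with made-up sirens, deliberately not sorted by year
sample = [
    {"siren": "000000001", "year": 2021, "ptot_n": 100},
    {"siren": "000000001", "year": 2023, "ptot_n": 110},
    {"siren": "000000002", "year": 2022, "ptot_n": 50},
    {"siren": "000000001", "year": 2022, "ptot_n": 105},
]

deduplicated = keep_latest_per_siren(sample)
for record in deduplicated:
    print(record["siren"], record["year"], record["ptot_n"])
```

With this approach the result no longer depends on the order of the entries in the input file.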


## Add reference_sirens to the JSON file

In [8]:
import json
from collections import defaultdict

def load_and_process_populations(file_path):
    # Load the JSON data (UTF-8, since commune names contain accented characters)
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # First deduplicate entries, keeping for each siren the entry with the largest 'ptot'
    unique_communes = {}
    for entry in data:
        siren = entry['siren']
        if siren not in unique_communes or entry['ptot'] > unique_communes[siren]['ptot']:
            unique_communes[siren] = entry
    
    # Convert back to list
    deduped_data = list(unique_communes.values())
    
    # Group communes by region code
    reg_communes = defaultdict(list)
    for entry in deduped_data:
        reg_code = entry['reg_code']
        reg_communes[reg_code].append(entry)
    
    # Process each entry to add reference sirens
    processed_data = []
    for entry in deduped_data:
        reg_code = entry['reg_code']
        current_ptot = entry['ptot']
        current_siren = entry['siren']
        
        # Get all communes in same region except current one
        reg_entries = [
            e for e in reg_communes[reg_code] 
            if e['siren'] != current_siren
        ]
        
        # Sort by population difference
        reg_entries.sort(key=lambda x: abs(x['ptot'] - current_ptot))
        
        # Take the 3 closest ones
        reference_sirens = [
            {
                'siren': e['siren'],
            }
            for e in reg_entries[:3]
        ]
        
        # Create new entry with reference sirens
        processed_entry = {
            'siren': entry['siren'],
            'com_code': entry['com_code'],
            'com_name': entry['com_name'],
            'epci_code': entry['epci_code'],
            'epci_name': entry['epci_name'],
            'dep_code': entry['dep_code'],
            'dep_name': entry['dep_name'],
            'reg_code': entry['reg_code'],
            'reg_name': entry['reg_name'],
            'ptot': entry['ptot'],
            'ptot_n': entry['ptot_n'],
            'reference_sirens': reference_sirens
        }
        processed_data.append(processed_entry)
    
    return processed_data

# Process the file
processed_data = load_and_process_populations('/Users/marcolorenz/SIA/ofgl-base-communes-new.json')

# Save to new file
output_path = 'populations-ofgl-communes-postprocessed.json'
with open(output_path, 'w', encoding='utf-8') as f:
    json.dump(processed_data, f, ensure_ascii=False, indent=2)

# Print example
print(f"\nProcessed {len(processed_data)} communes")
print("\nExample entry:")
print(json.dumps(processed_data[0], indent=2, ensure_ascii=False))


Processed 287 communes

Example entry:
{
  "siren": "210100533",
  "com_code": "01053",
  "com_name": "Bourg-en-Bresse",
  "epci_code": "200071751",
  "epci_name": "CA du Bassin de Bourg-en-Bresse",
  "dep_code": "01",
  "dep_name": "Ain",
  "reg_code": "84",
  "reg_name": "Auvergne-Rhône-Alpes",
  "ptot": 43363,
  "ptot_n": 43363,
  "reference_sirens": [
    {
      "siren": "216900290"
    },
    {
      "siren": "216900340"
    },
    {
      "siren": "212601983"
    }
  ]
}
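
The reference selection can also be illustrated on a small scale. The sketch below uses toy data with made-up sirens and populations, and swaps the full sort for `heapq.nsmallest`, which returns the same three closest entries without sorting every candidate. The function name `closest_by_population` is hypothetical, and all candidates are assumed to belong to the same region.

```python
import heapq


def closest_by_population(target, candidates, k=3):
    """Return the k candidates whose 'ptot' is closest to the target's.

    The target itself is excluded by siren. Ties keep input order,
    matching the behavior of sorted(...)[:k].
    """
    others = (c for c in candidates if c["siren"] != target["siren"])
    return heapq.nsmallest(
        k, others, key=lambda c: abs(c["ptot"] - target["ptot"])
    )


# Toy data: made-up sirens and populations, one region
communes = [
    {"siren": "S1", "ptot": 1000},
    {"siren": "S2", "ptot": 1200},
    {"siren": "S3", "ptot": 5000},
    {"siren": "S4", "ptot": 900},
    {"siren": "S5", "ptot": 1050},
]

references = closest_by_population(communes[0], communes)
reference_sirens = [{"siren": c["siren"]} for c in references]
print(reference_sirens)
# → [{'siren': 'S5'}, {'siren': 'S4'}, {'siren': 'S2'}]
```

For a few hundred communes the difference from sorting is negligible; the variant mostly matters if the same selection is run over the full national dataset.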
