## Provider Organisation Info Loader

This script processes NHS provider location data from a CSV file and populates the `org_info` table in the PostgreSQL database.

### 🔧 How It Works:
- **User Input**: Set `FILE_PATH` to the source CSV and `GEOCODE_SOURCE` to a label recording where the coordinates came from, for traceability.
- **Preprocessing**:
  - Renames the source columns to the `org_info` schema names (for example, `Organisation Code` becomes `org_code`).
  - Keeps only the columns needed for `org_info`.
  - Adds two metadata columns: `org_type` (fixed as 'Provider') and `geocode_source`.
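- **Data Formatting**: Parses `open_date` and `close_date` from `YYYYMMDD` strings into datetimes and converts `latitude` and `longitude` to numeric values. Values that cannot be parsed become `NaT` or `NaN` rather than raising an error.
- **Database Load**: Appends all records to the `org_info` table inside a single transaction, so a failed load is rolled back instead of leaving partial writes.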
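
The coercion behaviour of the Data Formatting step can be sketched on a small sample. The values below are hypothetical and not taken from the real CSV; the point is only to show how `errors='coerce'` turns unparseable entries into missing values, and how one might count them before loading.

```python
import pandas as pd

# Hypothetical sample values, not taken from the real CSV.
sample = pd.DataFrame({
    "open_date": ["20200401", "not-a-date"],
    "latitude": ["51.5074", "n/a"],
})

# Strings matching YYYYMMDD parse; anything else becomes NaT.
sample["open_date"] = pd.to_datetime(
    sample["open_date"], format="%Y%m%d", errors="coerce"
)
# Non-numeric strings become NaN instead of raising.
sample["latitude"] = pd.to_numeric(sample["latitude"], errors="coerce")

# Missing values per column after conversion. In real data this count
# also includes cells that were blank in the CSV to begin with.
coerced_counts = sample.isna().sum().to_dict()
print(coerced_counts)
```

Because coercion is silent, a check like this is one way to notice a date column that arrives in an unexpected format.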

The result is a clean, structured record of provider organisations in `org_info`, including their geographic and administrative metadata.
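
`DataFrame.rename` ignores mapping keys that are not present by default, so a missing or renamed CSV header only surfaces later as a `KeyError` when the columns are selected. A minimal sketch of an up-front header check follows; the helper `missing_columns` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced for illustration, and the check uses the standard-library `csv` module rather than pandas.

```python
import csv
import io

# Source column names that the loader's rename step expects.
EXPECTED_COLUMNS = {
    "Organisation Code", "Name", "National Grouping",
    "High Level Health Geography", "Postcode",
    "Open Date", "Close Date", "Latitude", "Longitude",
}

def missing_columns(csv_file):
    """Return the expected column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return sorted(EXPECTED_COLUMNS - set(header))

# Hypothetical in-memory CSV that lacks the coordinate columns.
demo = io.StringIO(
    "Organisation Code,Name,National Grouping,High Level Health Geography,"
    "Postcode,Open Date,Close Date\n"
)
print(missing_columns(demo))  # → ['Latitude', 'Longitude']
```

Against the real file, the same helper could be called on `open(FILE_PATH, newline="")` before the load runs.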


In [None]:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError

In [None]:
# === INPUT ===
FILE_PATH = '../data/providers/trust_and_foundation_locations.csv'
GEOCODE_SOURCE = 'trust_and_foundation_locations.csv'

# === DB CONNECTION ===
engine = create_engine("postgresql://postgres:<password>@localhost:5432/nhs_dashboard")

# === LOAD DATA ===
df = pd.read_csv(FILE_PATH, dtype=str)

# === RENAME COLUMNS ===
df = df.rename(columns={
    'Organisation Code': 'org_code',
    'Name': 'org_name',
    'National Grouping': 'region_code',
    'High Level Health Geography': 'health_geography_code',
    'Postcode': 'postcode',
    'Open Date': 'open_date',
    'Close Date': 'close_date',
    'Latitude': 'latitude',
    'Longitude': 'longitude'
})

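# === SELECT COLUMNS + ADD METADATA ===
# .copy() gives an independent frame, which avoids a possible
# SettingWithCopyWarning on the column assignments below.
df = df[[
    'org_code', 'org_name', 'region_code', 'health_geography_code',
    'postcode', 'open_date', 'close_date', 'latitude', 'longitude'
]].copy()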
df['org_type'] = 'Provider'
df['geocode_source'] = GEOCODE_SOURCE

# === CONVERT DATES + NUMERIC TYPES ===
df['open_date'] = pd.to_datetime(df['open_date'], errors='coerce', format='%Y%m%d')
df['close_date'] = pd.to_datetime(df['close_date'], errors='coerce', format='%Y%m%d')
df['latitude'] = pd.to_numeric(df['latitude'], errors='coerce')
df['longitude'] = pd.to_numeric(df['longitude'], errors='coerce')

# === LOAD TO DB ===
try:
    with engine.begin() as conn:
        df.to_sql('org_info', conn, if_exists='append', index=False)
    print(f"Loaded {len(df)} provider records into org_info")
except SQLAlchemyError as e:
    print("ERROR loading data into org_info")
    print(str(e))
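
The all-or-nothing behaviour of the transactional load can be illustrated in isolation. The sketch below swaps in the standard-library `sqlite3` module and an in-memory database for SQLAlchemy and PostgreSQL, and the two-column table layout and sample rows are assumptions made for the demonstration, not the real `org_info` schema.

```python
import sqlite3

# In-memory stand-in for the real database; the table layout is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org_info (org_code TEXT NOT NULL, org_name TEXT)")
conn.commit()

# The second row violates the NOT NULL constraint on org_code.
rows = [("AAA", "Example Trust"), (None, "Row missing its code")]

try:
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany("INSERT INTO org_info VALUES (?, ?)", rows)
except sqlite3.IntegrityError as exc:
    print(f"Load rolled back: {exc}")

# The first row was valid, but the failed transaction removed it too.
row_count = conn.execute("SELECT COUNT(*) FROM org_info").fetchone()[0]
print(row_count)  # → 0
```

The loader's `with engine.begin() as conn:` block plays the same role: if the insert raises, nothing from that run is committed.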
