Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/python-validator-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ jobs:
with:
python-version: '3.11'

- name: Install dev dependencies
run: pip install -r requirements-dev.txt

- name: Run validator unit tests (PowerShell)
shell: pwsh
run: powershell -NoProfile -ExecutionPolicy Bypass -File "tests/run_python_tests.ps1"
4 changes: 3 additions & 1 deletion ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,13 +30,15 @@ The three layers can break independently, so we keep them physically separate. T
|------|-----------|
| `build/te_core_schema.sql` | PostgreSQL master schema (legacy entry point) |
| `build/te_seed_data.sql` | Seed data |
| `build/csv/` | Python CSV validator (`validator.py`) + per-engine shell loaders (`loader_*.sh`) |
| `build/csv/` | Python CSV validator (`validator.py`), per-engine shell loaders (`loader_*.sh`), and `samples/` |
| `build/adapters/` | Per-engine deployment adapters (`adapter_postgresql.sh`, `adapter_mariadb.sh`, etc.) |
| `build/schema/` | Engine-specific DDL and seed data |
| `build/environments/` | PostgreSQL per-environment launchers (`env_dev.sql`, `env_test.sql`, etc.) |
| `build/terraform-github-repos/` | GitHub repository management as Infrastructure-as-Code |
| `build/setup.sh` | Interactive multi-database configuration wizard |
| `build/deploy_all.sh` | Multi-engine deployment router |
| `build/csv_loader.sh` | Schema-agnostic CSV ingestion: any CSV → auto-created table |
| `build/csv_utilise.sh` | Companion to the loader: list / describe / peek / export / drop CSV-loaded tables (PostgreSQL) |

### `tests/` — correctness coverage

Expand Down
27 changes: 26 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
.PHONY: test-free test-gate test-evals test-e2e test-all \
lint lint-diff health \
eval-list eval-compare eval-summary select-tests
eval-list eval-compare eval-summary select-tests \
csv-load csv-list csv-demo

# ── Test tiers ────────────────────────────────────────────────────────────────

Expand Down Expand Up @@ -69,3 +70,27 @@ eval-summary:
# Mirrors: bun run eval:select
select-tests:
python3 scripts/select_tests.py

# ── CSV loader / utiliser ─────────────────────────────────────────────────────

# Load any CSV file into the target environment's database.
# Usage: make csv-load FILE=path/to.csv [ENV=dev] [ENGINE=postgresql]
csv-load:
@if [ -z "$(FILE)" ]; then \
echo "Usage: make csv-load FILE=path/to.csv [ENV=dev] [ENGINE=postgresql]"; \
exit 1; \
fi
bash build/csv_loader.sh "$(FILE)" --env $(or $(ENV),dev) $(if $(ENGINE),--engine $(ENGINE),)

# List CSV-loaded tables in the target environment.
# Usage: make csv-list [ENV=dev]
csv-list:
bash build/csv_utilise.sh list --env $(or $(ENV),dev)

# One-shot proof: load the three sample CSVs into dev, then list them.
# Usage: make csv-demo [ENV=dev]
csv-demo:
bash build/csv_loader.sh build/csv/samples/customers.csv --env $(or $(ENV),dev)
bash build/csv_loader.sh build/csv/samples/orders.csv --env $(or $(ENV),dev)
bash build/csv_loader.sh build/csv/samples/inventory.csv --env $(or $(ENV),dev)
bash build/csv_utilise.sh list --env $(or $(ENV),dev)
24 changes: 24 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,30 @@ CSV inputs must have a header row, use comma delimiters, and be UTF-8 encoded wi

Supported loader backends are PostgreSQL, MariaDB/MySQL, SQLite, InfluxDB, Redis, and Teradata. PostgreSQL uses `COPY`, MariaDB/MySQL uses `LOAD DATA LOCAL INFILE`, SQLite uses Python `csv` + `sqlite3`, InfluxDB writes line protocol via the `influx` CLI, Redis writes hashes through `redis-cli`, and Teradata uses BTEQ/FastLoad tooling.

### Load any CSV

The loader is schema-agnostic — drop any CSV file in front of it and a matching table is auto-created in the target environment's database. Every CSV-loaded table is tagged with two marker columns: `_csv_row_id BIGSERIAL PRIMARY KEY` and `_loaded_at TIMESTAMPTZ`. All other columns start as `TEXT`; `ALTER TABLE` afterwards if you need stricter types.

Three sample CSVs ship under `build/csv/samples/` (`customers.csv`, `orders.csv`, `inventory.csv`) — deliberately off-domain from the T&E schema to demonstrate that any shape is accepted.

```bash
# Single-command happy-path proof (loads all three samples into dev, lists them)
make csv-demo

# Load any CSV
make csv-load FILE=path/to/anything.csv # ENV defaults to dev
make csv-load FILE=path/to/anything.csv ENV=test ENGINE=postgresql

# Use loaded data — companion script: build/csv_utilise.sh (PostgreSQL only)
./build/csv_utilise.sh list # all CSV-loaded tables in the env
./build/csv_utilise.sh describe customers # columns + row count
./build/csv_utilise.sh peek orders --limit 5 # first N rows
./build/csv_utilise.sh export inventory dump.csv # round-trip back to CSV
./build/csv_utilise.sh drop customers --yes # remove a CSV-loaded table
```

`csv_utilise.sh` only sees tables that carry the marker columns, so it cannot accidentally touch the rigid te_core_schema tables.

---

## How Parameterisation Works
Expand Down
6 changes: 6 additions & 0 deletions build/csv/samples/customers.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
id,name,email,signup_date
1,Alice Nguyen,alice@example.com,2024-01-15
2,Brian O'Connor,brian@example.com,2024-02-03
3,Chen Wei,chen.wei@example.com,2024-03-22
4,Diana Patel,diana.patel@example.com,2024-05-10
5,Eduardo Silva,eduardo@example.com,2024-07-01
7 changes: 7 additions & 0 deletions build/csv/samples/inventory.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
sku,description,stock,location
SKU-001,Wireless headphones – black,42,Warehouse A
SKU-002,USB-C charger,128,Warehouse B
SKU-003,Notebook A5 hardcover,75,Warehouse A
SKU-004,Mechanical keyboard,18,Warehouse C
SKU-005,Mouse pad – large,200,Warehouse B
SKU-006,Webcam 1080p,33,Warehouse A
9 changes: 9 additions & 0 deletions build/csv/samples/orders.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
order_id,customer_id,product,qty,price
1001,1,"Wireless headphones, black",1,89.95
1002,2,USB-C charger,2,19.50
1003,1,"Notebook, A5 hardcover",3,12.00
1004,3,Mechanical keyboard,1,145.00
1005,4,"Mouse pad, large",1,24.99
1006,2,Webcam 1080p,1,59.00
1007,5,"Cable organiser, 6-pack",1,15.75
1008,3,Desk lamp,1,42.50
259 changes: 259 additions & 0 deletions build/csv_utilise.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,259 @@
#!/usr/bin/env bash
# =============================================================================
# csv_utilise.sh — Utilise CSV-loaded tables
# =============================================================================
# Companion to csv_loader.sh. Lists, describes, peeks at, exports, or drops
# tables that were created by the CSV loader. CSV-loaded tables are
# identified by the marker columns the loader always adds:
# _csv_row_id BIGSERIAL PRIMARY KEY
# _loaded_at TIMESTAMPTZ
#
# Usage:
# ./csv_utilise.sh list [--env ENV] [--engine ENG]
# ./csv_utilise.sh describe <table> [--env ENV] [--engine ENG]
# ./csv_utilise.sh peek <table> [--limit N] [--env ENV] [--engine ENG]
# ./csv_utilise.sh export <table> <out.csv> [--env ENV] [--engine ENG]
# ./csv_utilise.sh drop <table> --yes [--env ENV] [--engine ENG]
#
# ENG defaults to value from config (or postgresql).
# ENV defaults to dev.
#
# Only the postgresql engine is implemented in this script. Other engines
# return a clear "not implemented" message and exit 2.
# =============================================================================

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_LOCAL="${SCRIPT_DIR}/config.local.env"
CONFIG_DEFAULT="${SCRIPT_DIR}/config.env"

GREEN=$'\033[0;32m'; RED=$'\033[0;31m'; YELLOW=$'\033[1;33m'
CYAN=$'\033[0;36m'; BOLD=$'\033[1m'; NC=$'\033[0m'

log() { echo -e "${GREEN}[✓]${NC} $*"; }
warn() { echo -e "${YELLOW}[⚠]${NC} $*"; }
error() { echo -e "${RED}[✗]${NC} $*" >&2; }
info() { echo -e "${CYAN}[i]${NC} $*"; }

usage() {
cat <<EOF

${BOLD}Usage:${NC}
./csv_utilise.sh <command> [args] [--env ENV] [--engine ENG]

${BOLD}Commands:${NC}
list List CSV-loaded tables in the target schema
describe <table> Show columns and row count for a table
peek <table> [--limit N] Show first N rows (default 10)
export <table> <out.csv> Export the table back to CSV
drop <table> --yes Drop a CSV-loaded table (requires --yes)

${BOLD}Options:${NC}
--env <env> dev | test | staging | prod (default: dev)
--engine <engine> postgresql (default: from config)
--limit <N> row limit for 'peek' (default: 10)
--yes confirmation flag for 'drop'
--help, -h show this message

${BOLD}Examples:${NC}
./csv_utilise.sh list --env dev
./csv_utilise.sh describe customers
./csv_utilise.sh peek orders --limit 5
./csv_utilise.sh export inventory /tmp/inventory_dump.csv
./csv_utilise.sh drop orders --yes --env test

EOF
}

# ── Parse arguments ───────────────────────────────────────────────────────────
COMMAND=""
TABLE=""
OUT_FILE=""
TARGET_ENV="dev"
ENGINE_OVERRIDE=""
PEEK_LIMIT="10"
CONFIRM_DROP="false"

if [[ $# -eq 0 ]]; then usage; exit 1; fi

case "$1" in
--help|-h) usage; exit 0 ;;
list|describe|peek|export|drop) COMMAND="$1"; shift ;;
*) error "Unknown command: $1"; usage; exit 1 ;;
esac

# Positional args for some commands
case "$COMMAND" in
describe|peek|drop)
if [[ $# -eq 0 || "${1:0:2}" == "--" ]]; then
error "Command '${COMMAND}' requires a <table> argument."; usage; exit 1
fi
TABLE="$1"; shift
;;
export)
if [[ $# -lt 2 || "${1:0:2}" == "--" || "${2:0:2}" == "--" ]]; then
error "Command 'export' requires <table> and <out.csv> arguments."; usage; exit 1
fi
TABLE="$1"; OUT_FILE="$2"; shift 2
;;
esac

while [[ $# -gt 0 ]]; do
case "$1" in
--env) shift; TARGET_ENV="${1:-}" ;;
--engine) shift; ENGINE_OVERRIDE="${1:-}" ;;
--limit) shift; PEEK_LIMIT="${1:-10}" ;;
--yes) CONFIRM_DROP="true" ;;
--help|-h) usage; exit 0 ;;
*) error "Unknown argument: $1"; usage; exit 1 ;;
esac
shift
done

# ── Sanitise table identifier (before any side effects) ──────────────────────
if [[ -n "$TABLE" ]]; then
if [[ ! "$TABLE" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then
error "Invalid table name: '${TABLE}'. Allowed: letters, digits, underscore; must not start with a digit."
exit 1
fi
fi

# ── Early engine check (avoids loading config for unsupported engines) ───────
if [[ -n "$ENGINE_OVERRIDE" && "$ENGINE_OVERRIDE" != "postgresql" ]]; then
error "csv_utilise.sh: engine '${ENGINE_OVERRIDE}' is not implemented."
error "Only 'postgresql' is supported. Run with --engine postgresql."
exit 2
fi

# ── Load configuration ────────────────────────────────────────────────────────
if [[ -f "$CONFIG_LOCAL" ]]; then
source "$CONFIG_LOCAL"
elif [[ -f "$CONFIG_DEFAULT" ]]; then
source "$CONFIG_DEFAULT"
warn "config.local.env not found — using defaults. Run ./setup.sh to configure."
else
error "No config found. Run ./setup.sh first."
exit 1
fi

DB_ENGINE="${ENGINE_OVERRIDE:-${DB_ENGINE:-postgresql}}"

if [[ "$DB_ENGINE" != "postgresql" ]]; then
error "csv_utilise.sh: engine '${DB_ENGINE}' is not implemented."
error "Only 'postgresql' is supported. Run with --engine postgresql."
exit 2
fi

# ── Resolve PostgreSQL connection details ────────────────────────────────────
E="${TARGET_ENV^^}"
PG_HOST="${PGHOST:-${PG_HOST:-localhost}}"
PG_PORT="${PGPORT:-${PG_PORT:-5432}}"
PG_USER="${PGUSER:-${PG_SUPERUSER:-postgres}}"
DB_NAME="$(eval echo "\$PG_DB_${E}")"
SCHEMA="$(eval echo "\$PG_SCHEMA_${E}")"

if [[ -z "$DB_NAME" || -z "$SCHEMA" ]]; then
error "Could not resolve database / schema for env '${TARGET_ENV}'."
error "Check that PG_DB_${E} and PG_SCHEMA_${E} are set in config.local.env."
exit 1
fi

[[ -n "${PG_SUPERUSER_PASSWORD:-}" ]] && export PGPASSWORD="${PG_SUPERUSER_PASSWORD}"

PSQL=(psql -h "${PG_HOST}" -p "${PG_PORT}" -U "${PG_USER}" -d "${DB_NAME}" -v ON_ERROR_STOP=1)

# ── Reachability probe (gives a friendly error when DB is down) ──────────────
if ! "${PSQL[@]}" -tA -c "SELECT 1" >/dev/null 2>&1; then
error "Cannot reach PostgreSQL at ${PG_HOST}:${PG_PORT} as ${PG_USER}/${DB_NAME}."
error "Check the database is running and config.local.env credentials are correct."
exit 3
fi

# ── Helper: assert table is a CSV-loaded table ───────────────────────────────
assert_csv_table() {
local tbl="$1"
local cnt
cnt=$("${PSQL[@]}" -tA -c "
SELECT COUNT(*)
FROM information_schema.columns
WHERE table_schema = '${SCHEMA}'
AND table_name = '${tbl}'
AND column_name IN ('_csv_row_id','_loaded_at');
")
if [[ "$cnt" != "2" ]]; then
error "Table '${SCHEMA}.${tbl}' is not a CSV-loaded table (missing marker columns)."
error "Use the original loader to create it: ./csv_loader.sh <file>.csv"
exit 1
fi
}

# ── Commands ──────────────────────────────────────────────────────────────────
case "$COMMAND" in

list)
info "CSV-loaded tables in ${DB_NAME}.${SCHEMA}:"
"${PSQL[@]}" -P pager=off -c "
SELECT t.table_name AS table,
pg_size_pretty(pg_total_relation_size(format('%I.%I', t.table_schema, t.table_name)::regclass)) AS size
FROM information_schema.tables t
WHERE t.table_schema = '${SCHEMA}'
AND EXISTS (SELECT 1 FROM information_schema.columns c
WHERE c.table_schema = t.table_schema
AND c.table_name = t.table_name
AND c.column_name = '_csv_row_id')
AND EXISTS (SELECT 1 FROM information_schema.columns c
WHERE c.table_schema = t.table_schema
AND c.table_name = t.table_name
AND c.column_name = '_loaded_at')
ORDER BY t.table_name;
"
;;

describe)
assert_csv_table "$TABLE"
info "Columns of ${SCHEMA}.${TABLE}:"
"${PSQL[@]}" -P pager=off -c "
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = '${SCHEMA}' AND table_name = '${TABLE}'
ORDER BY ordinal_position;
"
ROW_COUNT=$("${PSQL[@]}" -tA -c "SELECT COUNT(*) FROM \"${SCHEMA}\".\"${TABLE}\";")
log "Row count: ${ROW_COUNT}"
;;

peek)
assert_csv_table "$TABLE"
if [[ ! "$PEEK_LIMIT" =~ ^[0-9]+$ ]]; then
error "--limit must be a positive integer (got: '${PEEK_LIMIT}')."
exit 1
fi
info "First ${PEEK_LIMIT} row(s) of ${SCHEMA}.${TABLE}:"
"${PSQL[@]}" -P pager=off -c "SELECT * FROM \"${SCHEMA}\".\"${TABLE}\" ORDER BY _csv_row_id LIMIT ${PEEK_LIMIT};"
;;

export)
assert_csv_table "$TABLE"
# Validate output path is writable
OUT_DIR="$(dirname "$OUT_FILE")"
mkdir -p "$OUT_DIR"
info "Exporting ${SCHEMA}.${TABLE} to ${OUT_FILE}..."
"${PSQL[@]}" -c "\\COPY (SELECT * FROM \"${SCHEMA}\".\"${TABLE}\" ORDER BY _csv_row_id) TO '${OUT_FILE}' WITH (FORMAT CSV, HEADER TRUE)"
log "Export complete: ${OUT_FILE}"
;;

drop)
assert_csv_table "$TABLE"
if [[ "$CONFIRM_DROP" != "true" ]]; then
error "Refusing to drop '${SCHEMA}.${TABLE}' without --yes."
exit 1
fi
warn "Dropping ${SCHEMA}.${TABLE}..."
"${PSQL[@]}" -c "DROP TABLE \"${SCHEMA}\".\"${TABLE}\";"
log "Dropped ${SCHEMA}.${TABLE}"
;;
esac

unset PGPASSWORD
exit 0
5 changes: 3 additions & 2 deletions scripts/test.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,9 @@ try {
Record -Layer 'pytest unit' -Pass $pass -Detail "exit=$LASTEXITCODE"
Write-Host "[layer 1] $(if ($pass) {'PASS'} else {'FAIL'})" -ForegroundColor $(if ($pass) {'Green'} else {'Red'})
} else {
Write-Host "[layer 1] SKIP: pytest not installed (pip install pytest)" -ForegroundColor Yellow
Record -Layer 'pytest unit' -Pass $true -Detail 'skipped: pytest not installed'
Write-Host "[layer 1] FAIL: pytest not installed" -ForegroundColor Red
Write-Host " Run: pip install -r requirements-dev.txt" -ForegroundColor Yellow
Record -Layer 'pytest unit' -Pass $false -Detail 'pytest missing — pip install -r requirements-dev.txt'
}

# --- Layer 2: SQL test suite ---
Expand Down
Loading
Loading