CSV IMPORTER

Broker-driven CSV import system for customer data. The local stack implements the current end-to-end runtime plus API-driven seed tooling, Postgres-backed inspection CLIs, Compose smoke coverage, and focused workflow verification, so local work can start from real, inspectable state instead of reverse-engineering it.

Getting Started

For the current local setup, you only need Docker with Docker Compose.

cp .env.example .env
npm run compose:up

.env.example is the local config contract. Copy it to .env, keep .env out of git, and treat it as the only local secret file for this repo.

If you want the local log navigation stack as well:

npm run compose:up:all

That profile keeps the default application stack unchanged and adds Grafana, Loki, and Alloy for local log exploration.

To stop the stack:

npm run compose:down

The full set of Compose shortcuts:

npm run compose:up: start the non-observability stack
npm run compose:down: stop and remove the non-observability stack
npm run compose:up:all: start the full stack including observability
npm run compose:down:all: stop and remove the full stack including observability
npm run compose:up:observability: start only loki, grafana, and alloy
npm run compose:down:observability: stop and remove only loki, grafana, and alloy

Useful local endpoints:

API service: http://localhost:3000 by default
Grafana: http://localhost:3004 by default when the observability profile is enabled
PostgreSQL: postgresql://csv_importer:csv_importer@localhost:5432/csv_importer by default
Structurizr UI: http://localhost:8080 by default
RabbitMQ management: http://localhost:15672 by default
RabbitMQ broker: amqp://localhost:5672 by default
Dashboard: http://localhost:3005 by default

RabbitMQ uses the credentials from .env. The service containers now stay up behind local /health readiness endpoints that verify the dependencies each service actually owns: import-service, parser-service, and customer-service verify authenticated Postgres and RabbitMQ connectivity, while api-service verifies that import-service is reachable over its internal HTTP boundary. On stack startup, a one-shot flyway container applies the shared Postgres schema from db/migrations before the application services boot.
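
The readiness behaviour described above can be sketched as a function that runs one check per owned dependency and reports each result. This is a minimal sketch, not the services' actual code: checkHealth, DependencyCheck, DependencyResult, and HealthReport are hypothetical names, and the real authenticated Postgres and RabbitMQ handshakes are represented here by injected functions.

```typescript
// Hypothetical shape: a check resolves when the dependency is usable and rejects otherwise.
type DependencyCheck = () => Promise<void>;

interface DependencyResult {
  name: string;
  ok: boolean;
  detail: string;
}

interface HealthReport {
  status: "ok" | "unavailable";
  dependencies: DependencyResult[];
}

// Runs every check, even after one fails, so the report names each broken dependency.
async function checkHealth(checks: Record<string, DependencyCheck>): Promise<HealthReport> {
  const dependencies = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<DependencyResult> => {
      try {
        await check();
        return { name, ok: true, detail: "ok" };
      } catch (error) {
        const detail = error instanceof Error ? error.message : String(error);
        return { name, ok: false, detail };
      }
    }),
  );
  const healthy = dependencies.every((dependency) => dependency.ok);
  return { status: healthy ? "ok" : "unavailable", dependencies };
}
```

A /health handler built this way could, for example, answer 200 when status is "ok" and 503 otherwise; that mapping is an assumption, not something this README specifies.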

Useful database commands:

npm run db:migrate
npm run db:validate
npm run db:info

Useful developer-tooling commands:

npm run seed:import
npm run inspect:jobs -- --limit 10
npm run inspect:recovery -- --job-id <job-id>
npm run inspect:staged-rows -- --job-id <job-id>
npm run inspect:outcomes -- --job-id <job-id>
npm run recover:job -- --job-id <job-id>
npm run test:contracts
npm run test:smoke
npm run test:devtools
npm run test:e2e
npm run test:integration

For npm run test:e2e, set NO_COLOR=1 to disable ANSI colors or E2E_VERBOSE_COMPOSE=1 to stream unfiltered docker compose up output.

For the workflow-oriented script guide, see scripts/workflow/README.md.

Local Configuration

The local stack uses one root .env file shared by Docker Compose and the service containers.

Config is grouped into:

APP_ENV and IMPORT_STORAGE_ROOT
root Postgres host, port, database, user, and password for Flyway plus local admin tooling
service-owned Postgres connection variables for import-service, parser-service, and customer-service
POSTGRES_HOST_PORT, API_SERVICE_HOST_PORT, RABBITMQ_HOST_PORT, RABBITMQ_MANAGEMENT_HOST_PORT, and STRUCTURIZR_PORT for host-published local ports
RabbitMQ host, ports, user, and password
BROKER_RETRY_DELAYS_MS for the shared delayed retry budget and queue TTLs
IMPORT_SERVICE_BASE_URL for the internal api-service to import-service boundary
per-service *_SERVICE_NAME, *_SERVICE_LOG_LEVEL, and *_SERVICE_PORT
GRAFANA_PORT, GRAFANA_ADMIN_USER, and GRAFANA_ADMIN_PASSWORD for the optional local Grafana UI
PARSER_CONSUMER_ENABLED and CUSTOMER_CONSUMER_ENABLED to control which background consumers run locally
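
As an illustration of how BROKER_RETRY_DELAYS_MS could drive the retry queues, the sketch below parses the variable and derives one retry queue per delay plus a DLQ. It assumes the value is a comma-separated list of positive millisecond integers, which this README does not state; parseRetryDelays, retryQueuesFor, and the base queue name in the usage note are hypothetical. Only the .retry.N and .dlq suffixes come from the Current Status section.

```typescript
// Assumption: BROKER_RETRY_DELAYS_MS looks like "1000,5000,30000".
function parseRetryDelays(raw: string | undefined): number[] {
  if (raw === undefined || raw.trim() === "") {
    throw new Error("BROKER_RETRY_DELAYS_MS is required");
  }
  return raw.split(",").map((part) => {
    const text = part.trim();
    // Reject anything that is not a positive whole number of milliseconds.
    if (!/^\d+$/.test(text) || Number(text) <= 0) {
      throw new Error(`BROKER_RETRY_DELAYS_MS has a malformed delay: "${part}"`);
    }
    return Number(text);
  });
}

interface RetryQueue {
  name: string;
  ttlMs: number;
}

// One retry queue per configured delay, numbered from 1, plus a single DLQ.
function retryQueuesFor(
  baseQueue: string,
  delaysMs: number[],
): { retry: RetryQueue[]; dlq: string } {
  return {
    retry: delaysMs.map((ttlMs, index) => ({
      name: `${baseQueue}.retry.${index + 1}`,
      ttlMs,
    })),
    dlq: `${baseQueue}.dlq`,
  };
}
```

With three delays, retryQueuesFor("example.queue", parseRetryDelays("1000,5000,30000")) yields example.queue.retry.1 through example.queue.retry.3 and example.queue.dlq, each retry queue carrying its delay as the TTL.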

The four application services default to the following ports: api-service 3000, import-service 3001, parser-service 3002, and customer-service 3003.

import-service, parser-service, and customer-service derive their Postgres and RabbitMQ connection URLs from those primitive variables at startup so the connection settings cannot drift. api-service derives its internal base URL for import-service the same way. Startup plus /health checks validate those dependencies with real authenticated handshakes rather than only checking open ports.
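
A minimal sketch of deriving connection URLs from primitive variables and failing fast when one is missing or malformed. The variable names (for example IMPORT_SERVICE_POSTGRES_HOST and RABBITMQ_PORT) and the helper names are assumptions made for illustration; check .env.example for the real contract.

```typescript
type Env = Record<string, string | undefined>;

// Returns the value or throws, so a service never starts with partial configuration.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required variable: ${name}`);
  }
  return value;
}

function requirePort(env: Env, name: string): number {
  const raw = requireVar(env, name);
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Malformed port in ${name}: ${raw}`);
  }
  return port;
}

// Hypothetical naming scheme: <PREFIX>_POSTGRES_{HOST,PORT,DB,USER,PASSWORD}.
function buildPostgresUrl(env: Env, prefix: string): string {
  const host = requireVar(env, `${prefix}_POSTGRES_HOST`);
  const port = requirePort(env, `${prefix}_POSTGRES_PORT`);
  const database = encodeURIComponent(requireVar(env, `${prefix}_POSTGRES_DB`));
  // Credentials are percent-encoded so special characters cannot break the URL.
  const user = encodeURIComponent(requireVar(env, `${prefix}_POSTGRES_USER`));
  const password = encodeURIComponent(requireVar(env, `${prefix}_POSTGRES_PASSWORD`));
  return `postgresql://${user}:${password}@${host}:${port}/${database}`;
}

// Hypothetical names: RABBITMQ_{HOST,PORT,USER,PASSWORD}.
function buildAmqpUrl(env: Env): string {
  const host = requireVar(env, "RABBITMQ_HOST");
  const port = requirePort(env, "RABBITMQ_PORT");
  const user = encodeURIComponent(requireVar(env, "RABBITMQ_USER"));
  const password = encodeURIComponent(requireVar(env, "RABBITMQ_PASSWORD"));
  return `amqp://${user}:${password}@${host}:${port}`;
}
```

Building each URL in one place from the same primitives is what keeps the settings from drifting: there is no second, hand-written copy of the URL to fall out of date.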

The shared upload storage root defaults to /data/imports and is backed by a Docker volume in local development.

For a quick import through the current local workflow, use the seed helper:

npm run seed:import

That command uploads csv/happy-path.csv through the real API boundary, waits for terminal state, and prints the accepted job plus terminal status and summary.
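
The "waits for terminal state" step can be pictured as a simple polling loop. This is a sketch under stated assumptions rather than the seed helper's real code: waitForTerminalStatus is a hypothetical name, the status lookup is injected so no API route is assumed, and of the terminal status names used here only dead_lettered appears in this README.

```typescript
// Hypothetical terminal statuses; only dead_lettered is named in this README.
const TERMINAL_STATUSES = new Set(["completed", "failed", "dead_lettered"]);

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
}

// Polls getStatus until it returns a terminal status or the timeout would be exceeded.
async function waitForTerminalStatus(
  getStatus: () => Promise<string>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (TERMINAL_STATUSES.has(status)) {
      return status;
    }
    // Give up before sleeping if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for terminal state; last status: ${status}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In practice getStatus would call the public status read that api-service proxies to import-service; it is left abstract here because the route is not documented in this file.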

Other supported fixtures:

npm run seed:import -- --fixture partial-failure
npm run seed:import -- --fixture parse-failure
npm run seed:import -- --fixture single-row

To inspect what the workflow persisted for one import job:

npm run inspect:jobs -- --job-id <job-id>
npm run inspect:recovery -- --job-id <job-id>
npm run inspect:staged-rows -- --job-id <job-id>
npm run inspect:outcomes -- --job-id <job-id>

If a job is in the dead_lettered state, the operator recovery helpers let you inspect the active dead letters and trigger one job-scoped replay:

npm run inspect:recovery -- --job-id <job-id>
npm run recover:job -- --job-id <job-id>

The compatibility upload helper still exists if you want to post an arbitrary CSV directly:

npm run import:post -- ./path/to/file.csv

If a required variable is missing or malformed, the affected service exits on startup with a clear validation error instead of running with partial configuration.

If you only want to run the architecture workspace:

docker compose up structurizr

Optional Log Navigation

The repo includes an opt-in local observability profile built on:

Grafana Alloy for Docker log collection
Loki for log storage and queries
Grafana for Explore and Logs Drilldown

This profile only ingests logs from the four application services in version 1:

api-service
import-service
parser-service
customer-service

Infrastructure containers such as postgres, rabbitmq, flyway, and structurizr stay out of Loki in this first iteration so the log views stay focused on import workflow debugging.

Typical local workflow:

start the stack with docker compose --profile observability up --build
or use npm run compose:up:all for the full stack and npm run compose:up:observability for the observability services only
open Grafana at http://localhost:${GRAFANA_PORT} (http://localhost:3004 by default)
use Explore or Logs Drilldown with the Loki data source
filter first by the service, level, and event labels
inspect correlation_id and import_job_id from structured metadata or the raw JSON log body

The application services still emit the same structured JSON logs to stdout. Alloy ships those logs to Loki without changing the application log schema.
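
Because the logs are plain JSON lines on stdout, the same filtering can be done without Grafana, for example on captured container output. The sketch below assumes the JSON body carries service, level, event, correlation_id, and import_job_id as top-level fields with those exact names; parseLogLines and forImportJob are hypothetical helpers.

```typescript
// Assumed field names, mirroring the labels and metadata listed above.
interface LogEntry {
  service?: string;
  level?: string;
  event?: string;
  correlation_id?: string;
  import_job_id?: string;
  [key: string]: unknown;
}

// Parses newline-delimited JSON and silently skips blank or non-JSON lines.
function parseLogLines(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const text = line.trim();
    if (text === "") continue;
    try {
      const parsed: unknown = JSON.parse(text);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        entries.push(parsed as LogEntry);
      }
    } catch {
      // Not a JSON line (for example a startup banner); ignore it.
    }
  }
  return entries;
}

// Keeps only the entries that belong to one import job.
function forImportJob(entries: LogEntry[], importJobId: string): LogEntry[] {
  return entries.filter((entry) => entry.import_job_id === importJobId);
}
```

Filtering by import_job_id follows one job across services, while correlation_id can be used the same way to follow a single request.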

For the detailed capture path and the workflow between Alloy, Loki, and Grafana, read docs/architecture/observability-profile.md.

Current Status

The repo includes the current end-to-end runtime plus local seed, inspection, and verification tooling.
The local Docker Compose stack includes Postgres, Flyway, RabbitMQ, the four services, and Structurizr.
An optional Docker Compose observability profile adds Grafana Alloy, Loki, and Grafana for local log navigation without changing the default stack.
The shared Postgres instance is managed by Flyway SQL migrations under db/migrations and is now split into service-owned schemas: import_service, parser_service, customer_service, and operations.
api-service accepts CSV uploads, stores files in shared storage, forwards accepted import creation to import-service over an internal HTTP boundary, and proxies public status, summary, failure, and recovery reads back to import-service.
import-service now owns accepted-job persistence, the import.job.created transactional outbox, synchronous read and recovery APIs, and broker-safe operator replay orchestration.
parser-service stages normalized rows in parser_service.parsed_rows, writes import.job.parse.succeeded plus payload-carrying import.row.process messages into parser_service.outbox_messages, and relays those success-path messages to RabbitMQ in publish order.
customer-service now treats the RabbitMQ row message as the authoritative source of row content, never reads parser-owned tables, and reads or writes only customer_service tables.
import-service, parser-service, and customer-service share the current retry and recovery contract: retryable runtime failures publish delayed retry copies with broker confirms, non-retriable failures reject to service-local DLQs, and successful retried deliveries resolve their recovery rows.
RabbitMQ now uses the business topic exchange csv-importer.v1 plus the internal direct exchange csv-importer.internal.v1, with per-service .retry.1, .retry.2, .retry.3, and .dlq queues derived from BROKER_RETRY_DELAYS_MS.
PostgreSQL now also stores message_recovery_states, dead_letter_messages, operator_recovery_actions, and operator_recovery_action_messages so retry visibility, dead-letter inspection, and replay audit history survive consumer restarts.
Shared JSON message schemas, broker topology helpers, publish/consume validation, the import and parser outbox relays, and the recovery ledger are in place.
npm run seed:import now provides fixture-driven end-to-end imports through the real API path, and npm run import:post remains available as the lower-level upload helper.
npm run inspect:jobs, npm run inspect:recovery, npm run inspect:staged-rows, npm run inspect:outcomes, and npm run recover:job now cover the main local inspection and operator-recovery workflows.
npm run test:smoke now validates the full current schema baseline through Flyway version 15, including the service-owned schemas, and npm run test:devtools verifies the seed plus inspect workflow against an isolated compose stack.
docs/architecture/runtime.md is the current runtime writeup.
npm run test:contracts proves a real import.job.created publish/consume flow through RabbitMQ.
npm run test:e2e exercises the full stack through happy-path, malformed-CSV, partial-row-failure, duplicate-delivery, transient retry recovery, retry exhaustion to DLQ, and permanent business failures that stay on the normal failure path.
npm run test:integration chains the contracts, end-to-end, and developer-tooling verification scripts into a single developer workflow.
Implementation is being driven by the roadmap in docs/architecture/roadmap.md.
The proposed production-oriented follow-on plan is in docs/architecture/roadmap.v2.md.
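
The shared retry and recovery contract in the list above can be sketched as a routing decision made when a consumer hits a runtime failure. This is an illustration under assumptions, not the broker runtime's code: routeFailure, FailureKind, and Routing are hypothetical names, and the delays array stands for the parsed BROKER_RETRY_DELAYS_MS. Permanent business failures are out of scope here because they stay on the normal failure path.

```typescript
type FailureKind = "retryable" | "non_retriable";

type Routing =
  | { action: "retry"; queueSuffix: string; delayMs: number }
  | { action: "dead_letter"; queueSuffix: ".dlq" };

// retriesUsed is the number of delayed retries already spent on this message
// (0 on the first delivery).
function routeFailure(kind: FailureKind, retriesUsed: number, delaysMs: number[]): Routing {
  const delayMs = delaysMs[retriesUsed];
  // Non-retriable failures and exhausted retry budgets both end in the service-local DLQ.
  if (kind === "non_retriable" || delayMs === undefined) {
    return { action: "dead_letter", queueSuffix: ".dlq" };
  }
  // Retry queues are numbered from 1, so the first retry goes to .retry.1.
  return { action: "retry", queueSuffix: `.retry.${retriesUsed + 1}`, delayMs };
}
```

With three configured delays, a retryable failure moves through .retry.1, .retry.2, and .retry.3 before a fourth failure reaches .dlq.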

Repository Guide

Core directories:

db/migrations: Flyway SQL migrations for the shared Postgres instance and service-owned schemas
services/api-service: upload boundary plus public proxy to import-service
services/import-service: import job persistence, state tracking, internal read API, and operator recovery orchestration
services/parser-service: Rust CSV parsing, staging, and parser outbox relay
services/customer-service: payload-driven customer matching and writes
services/shared: shared message contracts, validation, logging, and Node broker runtime

For Node HTTP services, prefer Fastify for new or significantly refactored service boundaries. api-service is the current reference implementation for that pattern; other Node services have not been migrated yet.

scripts/workflow: seed/import helpers and Postgres inspection CLIs
scripts/verification: smoke, contracts, end-to-end, and tooling verification scripts
scripts/lib: shared script helpers
scripts/compose-stack.sh: compose wrapper kept in shell because it is mostly docker compose orchestration
observability: local Grafana Alloy, Loki, and Grafana configuration
docs/architecture: architecture docs and ADRs
docs/architecture/structurizr: Structurizr workspace, generated static diagrams, and local cache

Read these next:

docs/architecture/runtime.md: current runtime flow, retry/DLQ topology, recovery ledger, and public recovery overlay
docs/architecture/observability-profile.md: local observability profile, Alloy log collection flow, and Grafana usage
docs/architecture/design-doc.md: architecture intent, scope, and constraints
docs/architecture/roadmap.md: implementation order and milestone history
docs/architecture/roadmap.v2.md: production-oriented next direction for simplification, checkpointing, recovery, and operator visibility
docs/architecture/open-questions.md: unresolved design decisions
docs/architecture/structurizr/workspace.dsl: C4 model and static architecture views
docs/architecture/adr/0001-use-single-repo-without-nx.md: current repo structure decision
docs/architecture/adr/0002-use-structured-json-logs-and-correlation-ids.md: logging and correlation conventions
docs/architecture/adr/0003-use-flyway-for-shared-postgres-schema-migrations.md: schema migration workflow for the shared Postgres database
docs/architecture/adr/0004-use-a-shared-rabbitmq-topology-and-contract-validation-at-broker-boundaries.md: RabbitMQ topology and contract validation rules
docs/architecture/adr/0005-use-a-transactional-outbox-for-api-job-acceptance.md: original durable import.job.created publication decision before the import boundary redesign
docs/architecture/adr/0007-add-broker-managed-retries-dead-letter-queues-and-a-recovery-ledger.md: delayed retry, DLQ, and recovery-ledger rules that extend ADR 0004
docs/architecture/adr/0008-add-postgres-backed-operator-recovery-inspection-and-replay.md: operator recovery inspection, replay, and superseded dead-letter rules that extend ADR 0007
docs/architecture/adr/0009-adopt-service-owned-schemas-and-payload-carrying-parser-handoffs.md: service-owned schemas, import-service workflow ownership, parser outbox, and payload-carrying import.row.process

Project Overview

The project imports customer records from CSV files into a CRM. Files are accepted by the API boundary, processed asynchronously, parsed into normalized rows, and then applied to customer records with progress and row-level outcomes tracked along the way.

The architecture is intentionally distributed and broker-driven so the project can practice service boundaries, asynchronous workflows, idempotency, and observability instead of optimizing for the simplest possible implementation.

The current runtime now covers:

upload acceptance through the public API
internal acceptance and read ownership in import-service
durable import.job.created publication from the import outbox
parser-side import.job.parse.started publication plus parser-side success-path outbox relay
durable parser staging
payload-carrying row handoff to customer-service
import-service-side row outcome aggregation
public failure inspection through the API proxy
consumer-side retry plus DLQ visibility for runtime failures
job-scoped operator recovery inspection plus replay
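
To make the outbox part of that flow concrete, here is an in-memory sketch of the transactional outbox idea behind import.job.created. It is not the import-service implementation: InMemoryImportStore, relayOutbox, the "accepted" status, and the payload shape are all hypothetical. A real version would write both rows in one Postgres transaction and wait for an asynchronous broker confirm, which is modelled here as a synchronous callback returning true or false.

```typescript
interface OutboxMessage {
  id: number;
  routingKey: string;
  payload: unknown;
  publishedAt: Date | null;
}

class InMemoryImportStore {
  readonly jobs = new Map<string, { status: string }>();
  readonly outbox: OutboxMessage[] = [];
  private nextId = 1;

  // Stands in for one database transaction: the job row and its outbox row
  // are recorded together, so an accepted job always has a pending message.
  acceptJob(jobId: string): void {
    this.jobs.set(jobId, { status: "accepted" });
    this.outbox.push({
      id: this.nextId++,
      routingKey: "import.job.created",
      payload: { jobId },
      publishedAt: null,
    });
  }
}

// Publishes pending rows in insertion order and stops at the first unconfirmed
// publish, so later messages never overtake an earlier one.
function relayOutbox(
  store: InMemoryImportStore,
  publish: (message: OutboxMessage) => boolean,
): number {
  let published = 0;
  for (const message of store.outbox) {
    if (message.publishedAt !== null) continue;
    if (!publish(message)) break;
    message.publishedAt = new Date();
    published += 1;
  }
  return published;
}
```

Because unpublished rows stay pending, a relay pass that fails part-way can simply be run again. A crash between a confirmed publish and marking the row could republish it, which is one reason duplicate delivery is among the cases npm run test:e2e exercises.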

For v1, the system keeps the scope narrow: RabbitMQ handles the async workflow, Postgres stores durable state, the parser runs in Rust, and customer matching is email-only. The detailed domain rules and architectural rationale live in the docs rather than being duplicated here.
