Chudroy/csv-importer

CSV IMPORTER

Broker-driven CSV import system for customer data. The local stack implements the current end-to-end runtime, along with API-driven seed tooling, Postgres-backed inspection CLIs, Compose smoke coverage, and focused workflow verification, so local work can start from real, inspectable state rather than from reverse-engineering the system.

Getting Started

For the current local setup, you only need Docker with Docker Compose. The npm run shortcuts used below additionally require npm on the host.

cp .env.example .env
npm run compose:up

.env.example is the local config contract. Copy it to .env, keep .env out of git, and treat it as the only local secret file for this repo.

If you want the local log navigation stack as well:

npm run compose:up:all

That profile keeps the default application stack unchanged and adds Grafana, Loki, and Alloy for local log exploration.

To stop the stack:

npm run compose:down

Additional Compose shortcuts:

  • npm run compose:up: start the non-observability stack
  • npm run compose:down: stop and remove the non-observability stack
  • npm run compose:up:all: start the full stack including observability
  • npm run compose:down:all: stop and remove the full stack including observability
  • npm run compose:up:observability: start only loki, grafana, and alloy
  • npm run compose:down:observability: stop and remove only loki, grafana, and alloy

Useful local endpoints:

  • API service: http://localhost:3000 by default
  • Grafana: http://localhost:3004 by default when the observability profile is enabled
  • PostgreSQL: postgresql://csv_importer:csv_importer@localhost:5432/csv_importer by default
  • Structurizr UI: http://localhost:8080 by default
  • RabbitMQ management: http://localhost:15672 by default
  • RabbitMQ broker: amqp://localhost:5672 by default
  • Dashboard: http://localhost:3005 by default

RabbitMQ uses the credentials from .env. Each service container exposes a local /health readiness endpoint that verifies the dependencies that service actually owns: import-service, parser-service, and customer-service verify authenticated Postgres and RabbitMQ connectivity, while api-service verifies that import-service is reachable over its internal HTTP boundary. On stack startup, a one-shot flyway container applies the shared Postgres schema from db/migrations before the application services boot.

Useful database commands:

npm run db:migrate
npm run db:validate
npm run db:info

Useful developer-tooling commands:

npm run seed:import
npm run inspect:jobs -- --limit 10
npm run inspect:recovery -- --job-id <job-id>
npm run inspect:staged-rows -- --job-id <job-id>
npm run inspect:outcomes -- --job-id <job-id>
npm run recover:job -- --job-id <job-id>
npm run test:contracts
npm run test:smoke
npm run test:devtools
npm run test:e2e
npm run test:integration

For npm run test:e2e, set NO_COLOR=1 to disable ANSI colors or E2E_VERBOSE_COMPOSE=1 to stream unfiltered docker compose up output.

For the workflow-oriented script guide, see scripts/workflow/README.md.

Local Configuration

The local stack uses one root .env file shared by Docker Compose and the service containers.

Config is grouped into:

  • APP_ENV and IMPORT_STORAGE_ROOT
  • root Postgres host, port, database, user, and password for Flyway plus local admin tooling
  • service-owned Postgres connection variables for import-service, parser-service, and customer-service
  • POSTGRES_HOST_PORT, API_SERVICE_HOST_PORT, RABBITMQ_HOST_PORT, RABBITMQ_MANAGEMENT_HOST_PORT, and STRUCTURIZR_PORT for host-published local ports
  • RabbitMQ host, ports, user, and password
  • BROKER_RETRY_DELAYS_MS for the shared delayed retry budget and queue TTLs
  • IMPORT_SERVICE_BASE_URL for the internal api-service to import-service boundary
  • per-service *_SERVICE_NAME, *_SERVICE_LOG_LEVEL, and *_SERVICE_PORT
  • GRAFANA_PORT, GRAFANA_ADMIN_USER, and GRAFANA_ADMIN_PASSWORD for the optional local Grafana UI
  • PARSER_CONSUMER_ENABLED and CUSTOMER_CONSUMER_ENABLED to control which background consumers run locally

The four application services default to the following ports: api-service 3000, import-service 3001, parser-service 3002, and customer-service 3003.

import-service, parser-service, and customer-service derive their Postgres and RabbitMQ connection URLs from those primitive variables at startup so the connection settings cannot drift. api-service derives its internal base URL for import-service the same way. Startup plus /health checks validate those dependencies with real authenticated handshakes rather than only checking open ports.
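
As a rough illustration of deriving connection URLs from primitive variables, the sketch below builds a Postgres URL and an AMQP URL from host, port, user, and password values. The variable names (a prefix followed by _HOST, _PORT, _USER, _PASSWORD, _DB, and the RABBITMQ_* names) and the function names are assumptions made for this example; the real names are defined in .env.example and the service code.

```typescript
// A minimal sketch, not the services' actual startup code.
// All variable names below are hypothetical; check .env.example for the real ones.
type Env = Record<string, string | undefined>;

// Returns the variable's value or fails fast when it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required variable: ${name}`);
  }
  return value;
}

// Builds a Postgres URL from <prefix>_USER, _PASSWORD, _HOST, _PORT, and _DB.
// Credentials are percent-encoded so characters such as "@" cannot break the URL.
function buildPostgresUrl(env: Env, prefix: string): string {
  const user = encodeURIComponent(requireVar(env, `${prefix}_USER`));
  const password = encodeURIComponent(requireVar(env, `${prefix}_PASSWORD`));
  const host = requireVar(env, `${prefix}_HOST`);
  const port = requireVar(env, `${prefix}_PORT`);
  const database = encodeURIComponent(requireVar(env, `${prefix}_DB`));
  return `postgresql://${user}:${password}@${host}:${port}/${database}`;
}

// Builds an AMQP URL from the (assumed) RABBITMQ_* primitives.
function buildAmqpUrl(env: Env): string {
  const user = encodeURIComponent(requireVar(env, "RABBITMQ_USER"));
  const password = encodeURIComponent(requireVar(env, "RABBITMQ_PASSWORD"));
  const host = requireVar(env, "RABBITMQ_HOST");
  const port = requireVar(env, "RABBITMQ_PORT");
  return `amqp://${user}:${password}@${host}:${port}`;
}
```

Because each URL is assembled in one place from the same primitives, changing a host or password in .env changes every derived URL together, which is the drift protection described above.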

The shared upload storage root defaults to /data/imports and is backed by a Docker volume in local development.

For a quick import through the current local workflow, use the seed helper:

npm run seed:import

That command uploads csv/happy-path.csv through the real API boundary, waits for terminal state, and prints the accepted job plus terminal status and summary.

Other supported fixtures:

npm run seed:import -- --fixture partial-failure
npm run seed:import -- --fixture parse-failure
npm run seed:import -- --fixture single-row

To inspect what the workflow persisted for one import job:

npm run inspect:jobs -- --job-id <job-id>
npm run inspect:recovery -- --job-id <job-id>
npm run inspect:staged-rows -- --job-id <job-id>
npm run inspect:outcomes -- --job-id <job-id>

If a job is in the dead_lettered state, the operator recovery helpers let you inspect the active dead letters and trigger one job-scoped replay:

npm run inspect:recovery -- --job-id <job-id>
npm run recover:job -- --job-id <job-id>

The compatibility upload helper still exists if you want to post an arbitrary CSV directly:

npm run import:post -- ./path/to/file.csv

If a required variable is missing or malformed, the affected service exits on startup with a clear validation error instead of running with partial configuration.
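
The fail-fast behavior described above could look roughly like the following sketch, which checks a list of rules and reports every problem in a single error. The rule shape, the function names, the accepted boolean spellings, and API_SERVICE_PORT as a concrete variable name are assumptions for illustration; IMPORT_STORAGE_ROOT and PARSER_CONSUMER_ENABLED are taken from the config groups listed earlier.

```typescript
// A minimal sketch of startup config validation, not the repo's actual validator.
type Env = Record<string, string | undefined>;
type Rule = { name: string; kind: "string" | "port" | "boolean" };

// Collects one message per missing or malformed variable, in rule order.
function validateEnv(env: Env, rules: Rule[]): string[] {
  const problems: string[] = [];
  for (const rule of rules) {
    const value = env[rule.name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${rule.name} is required`);
      continue;
    }
    if (rule.kind === "port") {
      const port = Number(value);
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        problems.push(`${rule.name} must be a port between 1 and 65535`);
      }
    } else if (rule.kind === "boolean" && value !== "true" && value !== "false") {
      // Assumption: booleans are spelled exactly "true" or "false".
      problems.push(`${rule.name} must be "true" or "false"`);
    }
  }
  return problems;
}

// Throws a single readable error so a service can exit before running
// with partial configuration.
function assertValidEnv(env: Env, rules: Rule[]): void {
  const problems = validateEnv(env, rules);
  if (problems.length > 0) {
    throw new Error(`Invalid configuration:\n- ${problems.join("\n- ")}`);
  }
}

// Example rule set (illustrative only).
const exampleRules: Rule[] = [
  { name: "IMPORT_STORAGE_ROOT", kind: "string" },
  { name: "API_SERVICE_PORT", kind: "port" },
  { name: "PARSER_CONSUMER_ENABLED", kind: "boolean" },
];
```

In a Node service this would typically be called once at startup, for example assertValidEnv(process.env, exampleRules), before any connection is opened.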

If you only want to run the architecture workspace:

docker compose up structurizr

Optional Log Navigation

The repo includes an opt-in local observability profile built on:

  • Grafana Alloy for Docker log collection
  • Loki for log storage and queries
  • Grafana for Explore and Logs Drilldown

This profile only ingests logs from the four application services in version 1:

  • api-service
  • import-service
  • parser-service
  • customer-service

Infrastructure containers such as postgres, rabbitmq, flyway, and structurizr stay out of Loki in this first iteration so the log views stay focused on import workflow debugging.

Typical local workflow:

  • start the stack with docker compose --profile observability up --build
  • or use npm run compose:up:all for the full stack and npm run compose:up:observability for the observability services only
  • open Grafana at http://localhost:${GRAFANA_PORT} (http://localhost:3004 by default)
  • use Explore or Logs Drilldown with the Loki data source
  • filter first by the service, level, and event labels
  • inspect correlation_id and import_job_id from structured metadata or the raw JSON log body

The application services still emit the same structured JSON logs to stdout. Alloy ships those logs to Loki without changing the application log schema.
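
Because the services write one JSON object per log line, the same filtering can be done outside Grafana on captured stdout. The sketch below parses such lines and filters by a field such as import_job_id or correlation_id. It assumes those identifiers appear as top-level keys in the JSON body, which the log schema may or may not match exactly; the function names are hypothetical.

```typescript
// A minimal sketch for filtering captured JSON log lines; field placement is assumed.
interface LogEntry {
  [key: string]: unknown;
}

// Parses newline-delimited JSON, skipping blank lines and anything that is not
// a JSON object (for example, stray plain-text output).
function parseLogLines(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        entries.push(parsed as LogEntry);
      }
    } catch {
      // Not JSON: ignore the line rather than failing the whole scan.
    }
  }
  return entries;
}

// Keeps only the entries whose top-level field equals the given value.
function filterByField(entries: LogEntry[], field: string, value: string): LogEntry[] {
  return entries.filter((entry) => entry[field] === value);
}
```

For example, filterByField(parseLogLines(text), "import_job_id", jobId) narrows a mixed log capture to the lines for one import job.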

For the detailed capture path and the workflow between Alloy, Loki, and Grafana, read docs/architecture/observability-profile.md.

Current Status

  • The repo includes the current end-to-end runtime plus local seed, inspection, and verification tooling.
  • The local Docker Compose stack includes Postgres, Flyway, RabbitMQ, the four services, and Structurizr.
  • An optional Docker Compose observability profile adds Grafana Alloy, Loki, and Grafana for local log navigation without changing the default stack.
  • The shared Postgres instance is managed by Flyway SQL migrations under db/migrations and is now split into service-owned schemas: import_service, parser_service, customer_service, and operations.
  • api-service accepts CSV uploads, stores files in shared storage, forwards accepted import creation to import-service over an internal HTTP boundary, and proxies public status, summary, failure, and recovery reads back to import-service.
  • import-service now owns accepted-job persistence, the import.job.created transactional outbox, synchronous read and recovery APIs, and broker-safe operator replay orchestration.
  • parser-service stages normalized rows in parser_service.parsed_rows, writes import.job.parse.succeeded plus payload-carrying import.row.process messages into parser_service.outbox_messages, and relays those success-path messages to RabbitMQ in publish order.
  • customer-service now treats the RabbitMQ row message as the authoritative source of row content, never reads parser-owned tables, and reads or writes only customer_service tables.
  • import-service, parser-service, and customer-service share the current retry and recovery contract: retryable runtime failures publish delayed retry copies with broker confirms, non-retriable failures reject to service-local DLQs, and successful retried deliveries resolve their recovery rows.
  • RabbitMQ now uses the business topic exchange csv-importer.v1 plus the internal direct exchange csv-importer.internal.v1, with per-service .retry.1, .retry.2, .retry.3, and .dlq queues derived from BROKER_RETRY_DELAYS_MS.
  • PostgreSQL now also stores message_recovery_states, dead_letter_messages, operator_recovery_actions, and operator_recovery_action_messages so retry visibility, dead-letter inspection, and replay audit history survive consumer restarts.
  • Shared JSON message schemas, broker topology helpers, publish/consume validation, the import and parser outbox relays, and the recovery ledger are in place.
  • npm run seed:import now provides fixture-driven end-to-end imports through the real API path, and npm run import:post remains available as the lower-level upload helper.
  • npm run inspect:jobs, npm run inspect:recovery, npm run inspect:staged-rows, npm run inspect:outcomes, and npm run recover:job now cover the main local inspection and operator-recovery workflows.
  • npm run test:smoke now validates the full current schema baseline through Flyway version 15, including the service-owned schemas, and npm run test:devtools verifies the seed plus inspect workflow against an isolated compose stack.
  • docs/architecture/runtime.md is the current runtime writeup.
  • npm run test:contracts proves a real import.job.created publish/consume flow through RabbitMQ.
  • npm run test:e2e exercises the full stack through happy-path, malformed-CSV, partial-row-failure, duplicate-delivery, transient retry recovery, retry exhaustion to DLQ, and permanent business failures that stay on the normal failure path.
  • npm run test:integration chains the contracts, end-to-end, and developer-tooling verification scripts into a single developer workflow.
  • Implementation is being driven by the roadmap in docs/architecture/roadmap.md.
  • The proposed production-oriented follow-on plan is in docs/architecture/roadmap.v2.md.
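
The retry and dead-letter contract in the list above can be sketched as a pure routing decision. This example assumes BROKER_RETRY_DELAYS_MS is a comma-separated list of millisecond delays and that queue names are formed as a per-service base name plus the .retry.N or .dlq suffix; the parsing format, the base name, and the function names are all assumptions, not the shared broker helpers themselves.

```typescript
// A minimal sketch of deriving retry queues and routing a failed delivery.
interface RetryQueue {
  name: string;
  ttlMs: number;
}

type FailureRoute =
  | { kind: "retry"; queue: RetryQueue }
  | { kind: "dead-letter"; queue: string };

// Parses a value such as "1000,5000,30000" (assumed format) into positive integers.
function parseRetryDelays(raw: string): number[] {
  const delays = raw.split(",").map((part) => Number(part.trim()));
  if (delays.some((delay) => !Number.isInteger(delay) || delay <= 0)) {
    throw new Error(`Invalid retry delays: ${raw}`);
  }
  return delays;
}

// One retry queue per configured delay: <base>.retry.1, <base>.retry.2, ...
function deriveRetryQueues(queueBase: string, delays: number[]): RetryQueue[] {
  return delays.map((ttlMs, index) => ({
    name: `${queueBase}.retry.${index + 1}`,
    ttlMs,
  }));
}

// Retryable failures move to the next retry queue until the budget is spent;
// non-retriable failures and exhausted retries go to the service's DLQ.
function routeFailure(
  queueBase: string,
  delays: number[],
  retriesUsed: number,
  retryable: boolean,
): FailureRoute {
  const queues = deriveRetryQueues(queueBase, delays);
  const next = retryable ? queues[retriesUsed] : undefined;
  if (next !== undefined) {
    return { kind: "retry", queue: next };
  }
  return { kind: "dead-letter", queue: `${queueBase}.dlq` };
}
```

Keeping the decision free of broker calls makes it easy to unit test; the actual publish with broker confirms and the recovery-ledger writes would sit around a function like this.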

Repository Guide

Core directories:

  • db/migrations: Flyway SQL migrations for the shared Postgres instance and service-owned schemas
  • services/api-service: upload boundary plus public proxy to import-service
  • services/import-service: import job persistence, state tracking, internal read API, and operator recovery orchestration
  • services/parser-service: Rust CSV parsing, staging, and parser outbox relay
  • services/customer-service: payload-driven customer matching and writes
  • services/shared: shared message contracts, validation, logging, and Node broker runtime

For Node HTTP services, prefer Fastify for new or significantly refactored service boundaries. api-service is the current reference implementation for that pattern; other Node services have not been migrated yet.

  • scripts/workflow: seed/import helpers and Postgres inspection CLIs
  • scripts/verification: smoke, contracts, end-to-end, and tooling verification scripts
  • scripts/lib: shared script helpers
  • scripts/compose-stack.sh: compose wrapper kept in shell because it is mostly docker compose orchestration
  • observability: local Grafana Alloy, Loki, and Grafana configuration
  • docs/architecture: architecture docs and ADRs
  • docs/architecture/structurizr: Structurizr workspace, generated static diagrams, and local cache

Project Overview

The project imports customer records from CSV files into a CRM. Files are accepted by the API boundary, processed asynchronously, parsed into normalized rows, and then applied to customer records with progress and row-level outcomes tracked along the way.

The architecture is intentionally distributed and broker-driven so the project can practice service boundaries, asynchronous workflows, idempotency, and observability instead of optimizing for the simplest possible implementation.

The current runtime now covers:

  • upload acceptance through the public API
  • internal acceptance and read ownership in import-service
  • durable import.job.created publication from the import outbox
  • parser-side import.job.parse.started publication plus parser-side success-path outbox relay
  • durable parser staging
  • payload-carrying row handoff to customer-service
  • import-service-side row outcome aggregation
  • public failure inspection through the API proxy
  • consumer-side retry plus DLQ visibility for runtime failures
  • job-scoped operator recovery inspection plus replay
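
The outbox relay mentioned above follows a common pattern: messages are written to an outbox table in the same transaction as the state change, then a relay publishes them in order and marks them as sent. The in-memory sketch below shows only the ordering and stop-on-failure logic. The row shape, the publish callback, and the function name are hypothetical; a real relay would read from Postgres and wait for broker confirms asynchronously.

```typescript
// A minimal in-memory sketch of an ordered outbox relay (not the services' code).
interface OutboxMessage {
  id: number; // assumed to increase in insertion order
  routingKey: string;
  payload: string;
  publishedAt: number | null; // null until the message has been relayed
}

// Publishes pending messages in id order and marks each one after a successful
// publish. Stops at the first failure so later messages never overtake it.
// Returns the number of messages relayed in this pass.
function relayOutbox(
  outbox: OutboxMessage[],
  publish: (message: OutboxMessage) => boolean,
  now: number,
): number {
  const pending = outbox
    .filter((message) => message.publishedAt === null)
    .sort((a, b) => a.id - b.id);
  let relayed = 0;
  for (const message of pending) {
    if (!publish(message)) {
      break; // leave this and all later messages pending for the next pass
    }
    message.publishedAt = now;
    relayed += 1;
  }
  return relayed;
}
```

Stopping at the first failed publish is what preserves publish order across passes: the failed message stays pending and is retried before anything that was written after it.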

For v1, the system keeps the scope narrow: RabbitMQ handles the async workflow, Postgres stores durable state, the parser runs in Rust, and customer matching is email-only. The detailed domain rules and architectural rationale live in the docs rather than being duplicated here.
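
To make "email-only" matching concrete, the sketch below decides whether a row should create or update a customer based solely on a normalized email. The normalization rules (trim and lowercase), the loose validity check, the reject outcome, and all names are assumptions for illustration; the actual domain rules are documented in the architecture docs.

```typescript
// A minimal sketch of email-only matching, under assumed normalization rules.
interface CustomerRow {
  email: string;
  name: string;
}

type MatchResult =
  | { action: "create" | "update"; email: string }
  | { action: "reject"; reason: string };

// Trims and lowercases the address. Returns null unless it has exactly one "@"
// with text on both sides (a deliberately loose check).
function normalizeEmail(raw: string): string | null {
  const email = raw.trim().toLowerCase();
  const at = email.indexOf("@");
  if (at <= 0 || at !== email.lastIndexOf("@") || at === email.length - 1) {
    return null;
  }
  return email;
}

// Matches on the normalized email only: a known email means update,
// an unknown one means create. No other field takes part in matching.
function matchCustomer(row: CustomerRow, existingEmails: Set<string>): MatchResult {
  const email = normalizeEmail(row.email);
  if (email === null) {
    return { action: "reject", reason: "invalid email" };
  }
  return { action: existingEmails.has(email) ? "update" : "create", email };
}
```

Here existingEmails stands in for a lookup against customer-service's own tables; the name field is carried along but never used to decide the match.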
