Lightweight batch pipeline for financial OHLCV data. The pipeline reads input market data, computes rolling statistics, generates trading signals, and writes structured JSON metrics for observability.
- Loads YAML configuration (`seed`, `window`, `version`)
- Loads CSV input and validates required columns
- Computes rolling mean over `close`
- Generates binary signal (`1` when `close > rolling_mean`, else `0`)
- Writes metrics JSON to file and prints the same JSON to stdout
- Logs pipeline lifecycle events and errors to file + console
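The rolling-mean signal described above can be sketched in pure Python. This is illustrative only: the repo's `signal_generator.py` presumably works on a DataFrame column, and the warm-up behavior before the window fills is an assumption here.

```python
from collections import deque

def rolling_mean_signal(close, window):
    """Return (means, signals): the rolling mean over `close` and a
    binary signal that is 1 when close > rolling_mean, else 0.
    The mean is None (and the signal 0) until the window is full."""
    buf = deque(maxlen=window)
    means, signals = [], []
    for price in close:
        buf.append(price)
        if len(buf) < window:
            means.append(None)
            signals.append(0)
        else:
            mean = sum(buf) / window
            means.append(mean)
            signals.append(1 if price > mean else 0)
    return means, signals
```

For a steadily rising series such as `[1, 2, 3, 4, 5]` with `window=3`, every fully-windowed point closes above its rolling mean, so the signal is `1` from index 2 onward.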
- `run.py`: CLI entry point and orchestration
- `config_loader.py`: YAML config parsing and validation
- `data_loader.py`: CSV loading and schema validation
- `signal_generator.py`: rolling mean and signal generation
- `metrics_writer.py`: success/error metric emission
- `logger.py`: shared logger setup
- `generate_data.py`: deterministic synthetic OHLCV data generator
- `tests/`: unit and property-based tests
Use Python 3.9+.
```bash
python -m venv .venv
# Windows PowerShell
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python generate_data.py
```

This creates `data.csv` with 10,000 deterministic OHLCV rows.
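For illustration, a seeded generator of this kind might look like the following. The random-walk price model and exact columns are assumptions, not the actual contents of `generate_data.py`; the point is that a fixed seed makes the output byte-for-byte reproducible.

```python
import csv
import random

def generate_ohlcv(path, rows=10_000, seed=42):
    """Write `rows` deterministic OHLCV rows to a CSV file.
    Sketch only: a simple random walk seeded for reproducibility."""
    rng = random.Random(seed)  # fixed seed -> identical file every run
    price = 100.0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["open", "high", "low", "close", "volume"])
        for _ in range(rows):
            o = price
            c = o * (1 + rng.uniform(-0.01, 0.01))        # small move
            h = max(o, c) * (1 + rng.uniform(0, 0.005))   # wick above
            l = min(o, c) * (1 - rng.uniform(0, 0.005))   # wick below
            v = rng.randint(1_000, 50_000)
            writer.writerow([round(o, 4), round(h, 4),
                             round(l, 4), round(c, 4), v])
            price = c
```

Running it twice with the same seed produces identical files, which is the property the pipeline's determinism relies on.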
Example `config.yaml`:

```yaml
seed: 42
window: 20
version: "1.0.0"
```

Required keys:

- `seed` (int): random seed for deterministic behavior
- `window` (int): rolling window for moving average
- `version` (str): pipeline/report version tag
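Validation of these keys can be sketched as follows. This assumes `config_loader.py` receives an already-parsed dict (e.g. from PyYAML's `yaml.safe_load`) and only illustrates the presence/type checks; the real loader's error messages and API may differ.

```python
REQUIRED_KEYS = {"seed": int, "window": int, "version": str}

def validate_config(cfg):
    """Validate an already-parsed config dict; raises ValueError on a
    missing key, a wrong type, or a non-positive window."""
    for key, expected in REQUIRED_KEYS.items():
        if key not in cfg:
            raise ValueError(f"Missing required config key: {key}")
        value = cfg[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(
                f"Config key {key!r} must be {expected.__name__}")
    if cfg["window"] < 1:
        raise ValueError("window must be >= 1")
    return cfg
```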
```bash
python run.py \
  --input data.csv \
  --config config.yaml \
  --output metrics.json \
  --log-file pipeline.log
```

Arguments:

- `--input`: path to source CSV
- `--config`: path to YAML config
- `--output`: path for metrics JSON output
- `--log-file`: path for pipeline log file
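An `argparse` sketch of this CLI is below. The flag names match the documented arguments; the help text and the choice to make every flag required are assumptions about `run.py`.

```python
import argparse

def parse_args(argv=None):
    """Build the documented CLI (sketch; run.py may add defaults)."""
    parser = argparse.ArgumentParser(description="OHLCV batch pipeline")
    parser.add_argument("--input", required=True,
                        help="path to source CSV")
    parser.add_argument("--config", required=True,
                        help="path to YAML config")
    parser.add_argument("--output", required=True,
                        help="path for metrics JSON output")
    parser.add_argument("--log-file", required=True,
                        help="path for pipeline log file")
    return parser.parse_args(argv)
```

Note that `argparse` maps `--log-file` to the attribute `args.log_file`.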
Exit codes:

- `0`: successful execution
- `1`: execution failed (metrics still written with error payload)
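The contract that a failed run still writes an error payload before exiting non-zero can be sketched as follows; the function names here are hypothetical, not `run.py`'s actual API.

```python
import json

def write_metrics(path, payload):
    """Serialize the payload to `path` and echo the same JSON to stdout."""
    text = json.dumps(payload, indent=2)
    with open(path, "w") as f:
        f.write(text)
    print(text)

def run_with_error_handling(pipeline, output_path, version="unknown"):
    """Run `pipeline` and write its success payload; on any exception,
    write an error payload instead. Returns the process exit code."""
    try:
        write_metrics(output_path, pipeline())
        return 0
    except Exception as exc:
        write_metrics(output_path, {
            "version": version,
            "status": "error",
            "error_message": str(exc),
        })
        return 1
```

`run.py` could then finish with `sys.exit(run_with_error_handling(...))` so the shell sees `0` or `1`.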
Success payload example:

```json
{
  "version": "1.0.0",
  "rows_processed": 10000,
  "metric": "signal_rate",
  "value": 0.4876,
  "latency_ms": 123,
  "seed": 42,
  "status": "success"
}
```

Error payload example:
```json
{
  "version": "unknown",
  "status": "error",
  "error_message": "Input file not found: missing.csv"
}
```

Build image:
```bash
docker build -t mlops-batch-pipeline .
```

Run with defaults from Dockerfile:

```bash
docker run --rm mlops-batch-pipeline
```

The container command defaults to:

```bash
python run.py --input data.csv --config config.yaml --output metrics.json --log-file pipeline.log
```

Run all tests:
```bash
pytest -q
```

Run only property-based tests:

```bash
pytest -q -k "property or determinism or persistence"
```
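The determinism property those tests target can be illustrated with a self-contained toy. The real tests and their fixtures are not shown here, and the repo may use a library such as Hypothesis for its property-based cases; this sketch only demonstrates the "same seed, same signals" invariant.

```python
import random

def deterministic_signals(seed, n=100, window=5):
    """Toy end-to-end run: seeded random-walk prices -> rolling mean
    -> binary signals (1 when price > rolling mean, else 0)."""
    rng = random.Random(seed)
    prices = [100 + rng.gauss(0, 1) for _ in range(n)]
    signals = []
    for i, price in enumerate(prices):
        if i + 1 < window:
            signals.append(0)  # window not yet full
        else:
            mean = sum(prices[i + 1 - window:i + 1]) / window
            signals.append(1 if price > mean else 0)
    return signals

def test_same_seed_same_signals():
    # Determinism property: identical seeds must yield identical output.
    assert deterministic_signals(42) == deterministic_signals(42)
```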