Aahrav/Primetrade

MLOps Batch Pipeline

Lightweight batch pipeline for financial OHLCV data. The pipeline reads input market data, computes rolling statistics, generates trading signals, and writes structured JSON metrics for observability.

What It Does

  • Loads YAML configuration (seed, window, version)
  • Loads CSV input and validates required columns
  • Computes rolling mean over close
  • Generates binary signal (1 when close > rolling_mean, else 0)
  • Writes metrics JSON to file and prints the same JSON to stdout
  • Logs pipeline lifecycle events and errors to file + console
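The rolling-mean and signal steps above can be sketched in plain Python. This is a minimal illustration, not the code in signal_generator.py: the function names are hypothetical, and two behaviours are assumptions rather than facts from this README: rows without a full window get signal 0, and signal_rate is the mean of the signal column.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing rolling mean; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    means: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        means.append(running / window if i >= window - 1 else None)
    return means


def generate_signal(closes: List[float], window: int) -> List[int]:
    """1 when close > rolling mean, else 0 (assumed 0 for warm-up rows)."""
    means = rolling_mean(closes, window)
    return [1 if m is not None and c > m else 0 for c, m in zip(closes, means)]


def signal_rate(signal: List[int]) -> float:
    """Fraction of rows with signal == 1 (assumed metric definition)."""
    return sum(signal) / len(signal) if signal else 0.0


closes = [10.0, 11.0, 12.0, 11.0, 13.0]
signal = generate_signal(closes, window=3)
print(signal, signal_rate(signal))  # [0, 0, 1, 0, 1] 0.4
```

A pandas-based implementation would likely express the same idea with a rolling window over the close column; the pure-Python version is used here only to keep the sketch dependency-free.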

Project Structure

  • run.py: CLI entry point and orchestration
  • config_loader.py: YAML config parsing and validation
  • data_loader.py: CSV loading and schema validation
  • signal_generator.py: rolling mean and signal generation
  • metrics_writer.py: success/error metric emission
  • logger.py: shared logger setup
  • generate_data.py: deterministic synthetic OHLCV data generator
  • tests/: unit and property-based tests

Installation

Use Python 3.9+.

python -m venv .venv
# Windows PowerShell
.venv\Scripts\Activate.ps1
# macOS/Linux (bash, zsh)
source .venv/bin/activate
pip install -r requirements.txt

Generate Synthetic Data

python generate_data.py

This creates data.csv with 10,000 deterministic OHLCV rows.
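One way to make synthetic OHLCV data deterministic is to drive every random draw from a single seeded generator. The sketch below shows that idea only; the column names, the random-walk model, and the function names are assumptions and may differ from what generate_data.py actually does.

```python
import csv
import io
import random
from typing import Dict, List, TextIO

# Assumed column layout; the real schema is defined by the repository.
COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def generate_rows(n_rows: int, seed: int) -> List[Dict[str, float]]:
    """Build OHLCV rows from a seeded random walk (same seed, same rows)."""
    rng = random.Random(seed)  # private generator, no global state
    rows = []
    close = 100.0
    for i in range(n_rows):
        open_ = close  # each bar opens at the previous close
        close = max(1.0, open_ + rng.gauss(0.0, 1.0))
        high = max(open_, close) + abs(rng.gauss(0.0, 0.5))
        low = max(0.01, min(open_, close) - abs(rng.gauss(0.0, 0.5)))
        rows.append({
            "timestamp": i,  # integer bar index used as a stand-in
            "open": round(open_, 2),
            "high": round(high, 2),
            "low": round(low, 2),
            "close": round(close, 2),
            "volume": rng.randint(100, 10_000),
        })
    return rows


def write_csv(rows: List[Dict[str, float]], handle: TextIO) -> None:
    """Write rows with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(generate_rows(5, seed=42), buffer)
```

Using random.Random(seed) instead of the module-level functions keeps the output independent of any other code that touches the global random state.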

Configuration Format

Example config.yaml:

seed: 42
window: 20
version: "1.0.0"

Required keys:

  • seed (int): random seed for deterministic behavior
  • window (int): rolling window for moving average
  • version (str): pipeline/report version tag
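The required-key check can be sketched as a function over the already-parsed mapping. This is an illustration under assumptions: validate_config and its error messages are hypothetical, YAML parsing itself is left out, and the window >= 1 rule is an assumed sanity check, not something this README specifies.

```python
from typing import Any, Dict

# Required keys and their expected types, taken from the list above.
REQUIRED_KEYS = {"seed": int, "window": int, "version": str}


def validate_config(raw: Any) -> Dict[str, Any]:
    """Return only the required keys, or raise ValueError with a clear reason."""
    if not isinstance(raw, dict):
        raise ValueError("Config must be a mapping of keys to values")
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"Missing required config keys: {', '.join(missing)}")
    for key, expected in REQUIRED_KEYS.items():
        value = raw[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(
                f"Config key '{key}' must be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    if raw["window"] < 1:  # assumed rule: a rolling window needs >= 1 row
        raise ValueError("Config key 'window' must be >= 1")
    return {key: raw[key] for key in REQUIRED_KEYS}


config = validate_config({"seed": 42, "window": 20, "version": "1.0.0"})
```

Quoting the version in config.yaml matters for a check like this: an unquoted value such as 1.0 would be parsed by a YAML loader as a float and fail the str test.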

CLI Usage

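# The trailing backslashes are POSIX shell line continuations.
# In PowerShell, use a backtick (`) instead or put the command on one line.
python run.py \
  --input data.csv \
  --config config.yaml \
  --output metrics.json \
  --log-file pipeline.log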

Arguments:

  • --input: path to source CSV
  • --config: path to YAML config
  • --output: path for metrics JSON output
  • --log-file: path for pipeline log file

Exit codes:

  • 0: successful execution
  • 1: execution failed (metrics still written with error payload)
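The argument and exit-code contract above could be wired up roughly as follows. This is a sketch, not the contents of run.py: build_parser, main, and the pipeline callable are hypothetical names, and marking all four flags as required is an assumption.

```python
import argparse
import json
from typing import Callable, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MLOps batch pipeline")
    parser.add_argument("--input", required=True, help="path to source CSV")
    parser.add_argument("--config", required=True, help="path to YAML config")
    parser.add_argument("--output", required=True, help="path for metrics JSON")
    parser.add_argument("--log-file", required=True, help="path for log file")
    return parser


def main(argv: List[str],
         pipeline: Callable[[argparse.Namespace], Dict]) -> int:
    """Run the pipeline; return 0 on success and 1 on failure.

    Metrics are written in both cases, so a failed run still leaves
    an error payload at the --output path.
    """
    args = build_parser().parse_args(argv)
    try:
        payload = pipeline(args)
        exit_code = 0
    except Exception as exc:  # any failure becomes an error payload
        payload = {
            "version": "unknown",
            "status": "error",
            "error_message": str(exc),
        }
        exit_code = 1
    with open(args.output, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    print(json.dumps(payload, indent=2))  # same JSON on stdout
    return exit_code
```

An entry point would then end with something like sys.exit(main(sys.argv[1:], run_pipeline)), where run_pipeline stands in for the real orchestration function.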

Output Metrics Format

Success payload example:

{
  "version": "1.0.0",
  "rows_processed": 10000,
  "metric": "signal_rate",
  "value": 0.4876,
  "latency_ms": 123,
  "seed": 42,
  "status": "success"
}

Error payload example:

{
  "version": "unknown",
  "status": "error",
  "error_message": "Input file not found: missing.csv"
}
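The two payload shapes above can be produced by small builder functions. The sketch below is illustrative only: success_payload and error_payload are hypothetical names, rounding the value to four decimals is inferred from the example rather than documented, and measuring latency with time.perf_counter is one reasonable choice, not a confirmed detail of metrics_writer.py.

```python
import json
import time
from typing import Any, Dict


def success_payload(version: str, rows_processed: int, signal_rate: float,
                    latency_ms: int, seed: int) -> Dict[str, Any]:
    """Build the success metrics in the key order shown in the example."""
    return {
        "version": version,
        "rows_processed": rows_processed,
        "metric": "signal_rate",
        "value": round(signal_rate, 4),  # assumed precision
        "latency_ms": latency_ms,
        "seed": seed,
        "status": "success",
    }


def error_payload(message: str, version: str = "unknown") -> Dict[str, Any]:
    """Build the error metrics; version falls back to 'unknown'."""
    return {"version": version, "status": "error", "error_message": message}


start = time.perf_counter()
# ... pipeline work would happen here ...
latency_ms = int((time.perf_counter() - start) * 1000)

payload = success_payload("1.0.0", 10000, 0.4876, latency_ms, 42)
serialized = json.dumps(payload, indent=2)
```

The "unknown" default reflects the error example, where the failure occurs before the config (and therefore the version) has been read.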

Docker

Build image:

docker build -t mlops-batch-pipeline .

Run with defaults from Dockerfile:

docker run --rm mlops-batch-pipeline

The container command defaults to:

python run.py --input data.csv --config config.yaml --output metrics.json --log-file pipeline.log

Testing

Run all tests:

pytest -q

Run only property-based tests:

pytest -q -k "property or determinism or persistence"
