# Derived Variable Engine

A config-driven composite & index builder for analytics workflows.
The Derived Variable Engine is a modular, configuration-driven transformation engine designed to build composite metrics, indices, and derived KPIs from structured datasets.
It supports:
- Multiple aggregation strategies (mean, sum, weighted mean, etc.)
- Special code handling
- Minimum valid response thresholds
- Optional governance validation (scale enforcement)
- Config-controlled fallback behavior
- Execution reporting with JSON audit logs
The architecture intentionally separates:
- Computation logic
- Fallback logic
- Governance validation
- Configuration validation
This keeps the system extensible, auditable, and production-friendly.
## Table of Contents

- Architecture
- Features
- Project Structure
- Configuration
- Execution Flow
- Validation Layers Explained
- Edge Case Testing
- Requirements
- License
## Architecture

```text
Config (YAML)
        ↓
Config Validation (engine.py)
        ↓
Optional Governance Layer (validation.py)
        ↓
Aggregation Registry (aggregations.py)
        ↓
Fallback Registry (fallback.py)
        ↓
Output Dataset + JSON Report
```
Each layer has a clearly defined responsibility.
## Features

### Aggregation Strategies

- `mean`
- `sum`
- `median`
- `min`
- `max`
- `std`
- `count_valid`
- `weighted_mean`

All aggregations are registry-based and easily extensible.
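As an illustration of the registry pattern (the names below are hypothetical, not the actual `src/aggregations.py` API), a new strategy can be registered without touching the engine:

```python
import numpy as np

# Hypothetical registry mirroring a few of the built-in strategies above.
AGGREGATIONS = {
    "mean": lambda values, **kw: float(np.mean(values)),
    "sum": lambda values, **kw: float(np.sum(values)),
    "median": lambda values, **kw: float(np.median(values)),
    "weighted_mean": lambda values, weights=None, **kw: float(
        np.average(values, weights=weights)
    ),
}

def register(name):
    """Decorator that adds a new aggregation strategy to the registry."""
    def wrap(fn):
        AGGREGATIONS[name] = fn
        return fn
    return wrap

# Extending the engine is one registration away:
@register("range")
def value_range(values, **kw):
    return float(np.max(values) - np.min(values))
```

Because the engine looks strategies up by name, a config entry like `aggregation: range` would resolve to the new function with no engine changes.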
### Special Code Handling

Special values (e.g., `-98`, `-99`) are excluded from aggregation and handled via configurable fallback strategies:

- `nan_if_no_valid`
- `propagate_special`
- Configurable multi-special fallback values
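A minimal sketch of the fallback semantics for a single row, assuming a simple mean aggregation (function name and signature are illustrative, not the actual `src/fallback.py` API):

```python
import math

def apply_fallback(row_values, special_codes, strategy, multi_special_fallback=None):
    """Drop special codes; if no valid values remain, apply the fallback."""
    valid = [v for v in row_values if v not in special_codes]
    if valid:
        return sum(valid) / len(valid)       # aggregation proceeds normally
    specials = {v for v in row_values if v in special_codes}
    if strategy == "nan_if_no_valid":
        return math.nan                       # no valid answers -> NaN
    if strategy == "propagate_special":
        if len(specials) == 1:
            return specials.pop()             # a single code propagates as-is
        return multi_special_fallback         # mixed codes -> configured value
    raise ValueError(f"unknown fallback strategy: {strategy}")
```

For example, a row of `[-98, -99]` under `propagate_special` with `multi_special_fallback: -98` would yield `-98`.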
### Minimum Valid Response Thresholds

Control the proportion of valid responses required before computing a derived variable.

Example:

```yaml
min_valid_ratio: 0.5
```

### Governance Validation

Enable strict scale validation:

```yaml
enable_validation: true
```
Validation Checks:
- Numeric enforcement
- Scale bounds (scale_min, scale_max)
- Special code exclusion from scale validation
### Execution Reporting

Each run generates:
- Execution time
- Total rows processed
- Derived variables created
- Valid vs invalid row counts per variable
- JSON audit report
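An audit report covering these fields might look like the following (field names and values are illustrative only; the actual schema is defined by the engine):

```json
{
  "execution_time_seconds": 0.08,
  "rows_processed": 1000,
  "derived_variables": {
    "SATIS": {"valid_rows": 940, "invalid_rows": 60}
  }
}
```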
## Project Structure

```text
derived-variable-engine/
│
├── src/
│   ├── main.py
│   ├── engine.py
│   ├── aggregations.py
│   ├── fallback.py
│   └── validation.py
│
├── configs/
│   └── derived_config.yaml
│
├── data/
│   ├── sample_input.csv
│   └── sample_input_edge_case.csv
│
├── outputs/
│   ├── derived_output.csv
│   └── derived_output_edge_case.csv
│
├── logs/
│   └── derived_report.json
│
├── requirements.txt
├── README.md
└── License
```
## Configuration

Configuration is YAML-driven.

Example:

```yaml
enable_validation: true

derived_variables:
  - name: SATIS
    source_columns:
      - q100_1
      - q100_2
      - q100_3
      - q100_4
    aggregation: mean
    special_codes: [-98, -99]
    fallback_strategy: propagate_special
    multi_special_fallback: -98
    scale_min: 1
    scale_max: 5
    min_valid_ratio: 0.5
```
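For intuition, the `SATIS` entry above can be evaluated roughly like this (a pandas sketch with made-up data, not the engine's actual code; it omits the fallback step):

```python
import pandas as pd

# Toy input matching the source_columns in the example config.
df = pd.DataFrame({
    "q100_1": [4, 5, -98],
    "q100_2": [3, -99, -98],
    "q100_3": [5, 4, -98],
    "q100_4": [4, 4, -98],
})
special_codes = [-98, -99]

src = df[["q100_1", "q100_2", "q100_3", "q100_4"]]
valid = src.where(~src.isin(special_codes))            # special codes become NaN
ratio = valid.notna().mean(axis=1)                     # per-row valid-response ratio
df["SATIS"] = valid.mean(axis=1).where(ratio >= 0.5)   # enforce min_valid_ratio
```

Row 3 is all special codes, so it fails the threshold and is left as NaN here; in the engine, the configured `propagate_special` fallback would then assign `-98`.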
No code changes are required to:
- Add new derived variables
- Change aggregation strategy
- Modify fallback logic
- Adjust governance strictness
## Execution Flow

1. Load dataset (CSV)
2. Load YAML configuration
3. Validate configuration structure
4. Optionally run governance validation
5. Apply aggregation registry
6. Apply fallback registry
7. Save derived dataset
8. Generate execution report
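The flow above can be sketched end-to-end as a toy driver (function names, report fields, and the mean-only aggregation are assumptions for illustration, not the project's API):

```python
import json
import time
import pandas as pd

def run_pipeline(df, config):
    """Toy version of the flow: derive variables, then emit a JSON report."""
    start = time.time()
    report = {"rows": len(df), "derived": []}
    for var in config["derived_variables"]:
        src = df[var["source_columns"]]
        valid = src.where(~src.isin(var["special_codes"]))  # drop special codes
        df[var["name"]] = valid.mean(axis=1)                # 'mean' strategy only
        report["derived"].append(var["name"])
    report["execution_seconds"] = round(time.time() - start, 4)
    return df, json.dumps(report)

df = pd.DataFrame({"q1": [1, 2], "q2": [3, -99]})
config = {"derived_variables": [
    {"name": "D1", "source_columns": ["q1", "q2"], "special_codes": [-99]}
]}
out, report_json = run_pipeline(df, config)
```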
## Validation Layers Explained

### 1️⃣ Configuration Validation (engine.py)
Ensures:
- No duplicate derived variables
- No overwriting existing columns
- Aggregation exists
- Source columns exist
- Weight lengths match (for weighted mean)
Stops execution if invalid.
### 2️⃣ Governance Validation (validation.py)

Triggered via config flag.
Ensures:
- Numeric data types
- Scale boundaries respected
- Special codes excluded from scale checks
Stops execution on scale violations.
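A sketch of what such a scale check might look like (a simplified stand-in, assuming pandas; the actual `validation.py` logic may differ):

```python
import pandas as pd

def validate_scale(series, scale_min, scale_max, special_codes):
    """Non-special values must be numeric and inside [scale_min, scale_max]."""
    values = pd.to_numeric(series, errors="coerce")   # non-numeric -> NaN
    in_scope = ~values.isin(special_codes)            # exclude special codes
    bad = in_scope & (values.isna() | (values < scale_min) | (values > scale_max))
    if bad.any():
        raise ValueError(f"scale violation at rows {list(series.index[bad])}")
```

Raising on the first violating column is what makes this layer "stop execution"; special codes pass through untouched.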
### 3️⃣ Minimum Valid Response Threshold (engine.py)

Inside the engine, this layer:
- Counts valid responses
- Enforces min_valid_ratio
- Determines whether fallback applies
Does not stop execution — controls derived output behavior.
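A boundary illustration of the threshold check (values chosen for the example; `min_valid_ratio` semantics as described above):

```python
import pandas as pd

row = pd.Series([4, -98, -99, 5])
special_codes = [-98, -99]

valid_ratio = (~row.isin(special_codes)).mean()   # 2 of 4 valid -> 0.5
meets_threshold = valid_ratio >= 0.5              # exactly at the boundary
```

A row sitting exactly at `min_valid_ratio` is computed rather than sent to fallback, assuming the comparison is inclusive.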
## Edge Case Testing

The edge-case dataset covers:
- Fully valid rows
- All special code rows
- Mixed special code rows
- Threshold boundary rows
- Weighted mean edge cases
- Below-threshold cases
- Multi-special fallback cases
Edge case test files:
- sample_input_edge_case.csv
- derived_output_edge_case.csv
## Requirements

```text
pandas>=1.5
numpy>=1.23
PyYAML>=6.0
```
## License

MIT License

Copyright (c) 2026