Skip to content

[Feature Request]: Data vault 2.0 wizard Industrial ETL production #7077

@mhamedbenjmaa

Description

@mhamedbenjmaa

What would you like to happen?

Data Vault 2.0 pipeline wizard — automated Hub, Satellite, and Link generation

Summary

Add a Data Vault 2.0 wizard (or JSON-spec-driven component) to Apache Hop that auto-generates Hub, Satellite, and Link pipeline fragments from a table definition and a small set of metadata inputs. This would eliminate the most repetitive scaffolding work in Raw Vault construction and significantly accelerate DV2.0 project delivery.

Motivation

Data Vault 2.0 has become a widely adopted pattern for staging and historization layers due to its scalability, auditability, and agility. However, building a Raw Vault is highly repetitive: every source table produces at least one Hub and one Satellite following identical structural rules. On a project with 50 source tables, a developer must hand-craft 100+ pipeline fragments that are mechanically similar.

Tools like Cognos Data Manager demonstrated the productivity gains possible when an ETL platform encodes modeling conventions directly — letting developers focus on business logic rather than infrastructure. A DV2.0 wizard in Apache Hop would deliver the same kind of leverage for the modern open-source ecosystem.

Proposed feature — two-phase scope

Phase 1 — Hub + Satellite generator

User provides:

Source table
Business key column(s)
Satellite attribute columns
Hash algorithm (MD5, SHA-1, SHA-256…)
Load date column name
Duplicate handling strategy (lookup-before-insert vs. destination error handling)
Output: a ready-to-run pipeline fragment containing the Hub table loader and the linked Satellite table loader.

Phase 2 — Link table generator

User provides:

Reference to Hub 1 and Hub 2 (previously defined)
Business key(s) that establish the relationship
Hash algorithm
Load date column name
Duplicate handling strategy
Output: a ready-to-run Link table pipeline fragment.

Suggested interface

Either a step-by-step wizard inside the Hop GUI, or a dedicated pipeline component that accepts a JSON specification file as input. Both approaches should produce standard Hop pipeline artifacts that can be version-controlled and re-generated if the spec changes.

Expected impact

On a 50-table project, Phase 1 alone eliminates roughly 100 hand-crafted pipeline fragments. A developer can scaffold a full Raw Vault in under 30 minutes and immediately shift focus to the Business Vault and presentation layer — the work that actually requires domain expertise.

Acceptance criteria

  • Given a source table and the required metadata, the wizard generates a syntactically valid and executable Hop pipeline for the Hub and its Satellite.
  • Hash key generation respects the chosen algorithm and follows DV2.0 conventions (concatenation order, null handling).
  • Both lookup-before-insert and error-handling duplicate strategies are supported and produce correct behaviour.
  • Phase 2 generates a Link pipeline fragment that correctly references the hash keys of the two parent Hubs.
  • Generated pipelines are standard Hop artifacts — no proprietary lock-in, fully editable after generation.

Issue Priority

Priority: 3

Issue Component

Component: Other

Metadata

Metadata

Assignees

No one assigned
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions