What would you like to happen?
Data Vault 2.0 pipeline wizard — automated Hub, Satellite, and Link generation
Summary
Add a Data Vault 2.0 wizard (or JSON-spec-driven component) to Apache Hop that auto-generates Hub, Satellite, and Link pipeline fragments from a table definition and a small set of metadata inputs. This would eliminate the most repetitive scaffolding work in Raw Vault construction and significantly accelerate DV2.0 project delivery.
Motivation
Data Vault 2.0 has become a widely adopted pattern for staging and historization layers due to its scalability, auditability, and agility. However, building a Raw Vault is highly repetitive: every source table produces at least one Hub and one Satellite following identical structural rules. On a project with 50 source tables, a developer must hand-craft 100+ pipeline fragments that are mechanically similar.
Tools like Cognos Data Manager demonstrated the productivity gains possible when an ETL platform encodes modeling conventions directly — letting developers focus on business logic rather than infrastructure. A DV2.0 wizard in Apache Hop would deliver the same kind of leverage for the modern open-source ecosystem.
Proposed feature — two-phase scope
Phase 1 — Hub + Satellite generator
User provides:
- Source table
- Business key column(s)
- Satellite attribute columns
- Hash algorithm (MD5, SHA-1, SHA-256…)
- Load date column name
- Duplicate handling strategy (lookup-before-insert vs. destination error handling)
Output: a ready-to-run pipeline fragment containing the Hub table loader and the linked Satellite table loader.
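To make the Phase 1 inputs concrete, here is a minimal Python sketch of how they could be captured and turned into target column layouts. Everything in it is hypothetical: the `HubSatSpec` class, the `hk_` prefix for hash key columns, and the default values are illustration only, not an existing Hop API or a proposed naming standard. It covers only the columns implied by the inputs listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HubSatSpec:
    """Hypothetical container for the Phase 1 inputs (not a Hop class)."""
    source_table: str
    business_keys: List[str]
    satellite_attributes: List[str]
    hash_algorithm: str = "MD5"
    load_date_column: str = "load_date"
    duplicate_strategy: str = "lookup-before-insert"


def hub_columns(spec: HubSatSpec) -> List[str]:
    # Hub: hash key, the business key column(s), and the load date.
    return [f"hk_{spec.source_table}", *spec.business_keys, spec.load_date_column]


def satellite_columns(spec: HubSatSpec) -> List[str]:
    # Satellite: parent Hub hash key, load date, then the descriptive attributes.
    return [f"hk_{spec.source_table}", spec.load_date_column, *spec.satellite_attributes]


spec = HubSatSpec(
    source_table="customer",
    business_keys=["customer_no"],
    satellite_attributes=["name", "email"],
)
print(hub_columns(spec))
print(satellite_columns(spec))
```

A generator would use layouts like these to configure the Hub and Satellite loaders in the generated pipeline fragment.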
Phase 2 — Link table generator
User provides:
- Reference to Hub 1 and Hub 2 (previously defined)
- Business key(s) that establish the relationship
- Hash algorithm
- Load date column name
- Duplicate handling strategy
Output: a ready-to-run Link table pipeline fragment.
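As a rough illustration of the Phase 2 output, the sketch below derives a Link table's column layout from the two parent Hubs. The function names and the `hk_` prefix are assumptions made for this example only; they are not part of Hop.

```python
from typing import Dict, List


def link_columns(link_name: str, hub_names: List[str], load_date_column: str) -> List[str]:
    # Link: its own hash key, one hash key per parent Hub, and the load date.
    return [f"hk_{link_name}", *[f"hk_{h}" for h in hub_names], load_date_column]


def link_key_parts(row: Dict[str, str], relationship_keys: List[str]) -> List[str]:
    # Collect the business keys in the declared order so the Link hash key
    # is always computed over the same sequence of values.
    return [row[k] for k in relationship_keys]


print(link_columns("customer_order", ["customer", "order"], "load_date"))
print(link_key_parts({"order_no": "O-7", "customer_no": "C-1"}, ["customer_no", "order_no"]))
```

Keeping the key order in the spec rather than in the row is one way to ensure the Link hash key stays stable across loads.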
Suggested interface
The feature could be exposed either as a step-by-step wizard inside the Hop GUI or as a dedicated pipeline component that accepts a JSON specification file as input. Both approaches should produce standard Hop pipeline artifacts that can be version-controlled and regenerated if the spec changes.
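For the JSON-driven variant, a specification might look something like the one embedded below. This is only a sketch: the key names (`source_table`, `business_keys`, and so on) are invented for illustration and do not describe an agreed format. The Python code shows the kind of validation a generator could run before producing any pipeline.

```python
import json

# Hypothetical required keys, mirroring the Phase 1 inputs.
REQUIRED_KEYS = {
    "source_table",
    "business_keys",
    "satellite_attributes",
    "hash_algorithm",
    "load_date_column",
    "duplicate_strategy",
}

EXAMPLE_SPEC = """
{
  "source_table": "customer",
  "business_keys": ["customer_no"],
  "satellite_attributes": ["name", "email"],
  "hash_algorithm": "SHA-256",
  "load_date_column": "load_date",
  "duplicate_strategy": "lookup-before-insert"
}
"""


def load_spec(text: str) -> dict:
    # Parse the JSON text and fail early if a required input is missing.
    spec = json.loads(text)
    missing = REQUIRED_KEYS - set(spec)
    if missing:
        raise ValueError(f"spec is missing: {sorted(missing)}")
    if not spec["business_keys"]:
        raise ValueError("at least one business key column is required")
    return spec


spec = load_spec(EXAMPLE_SPEC)
print(spec["source_table"], spec["hash_algorithm"])
```

Because the spec is a plain text file, it can live in version control next to the pipelines generated from it.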
Expected impact
On a 50-table project, Phase 1 alone would eliminate roughly 100 hand-crafted pipeline fragments (one Hub and one Satellite per table). A developer could scaffold a full Raw Vault in under 30 minutes and immediately shift focus to the Business Vault and presentation layer, the work that actually requires domain expertise.
Acceptance criteria
- Given a source table and the required metadata, the wizard generates a syntactically valid and executable Hop pipeline for the Hub and its Satellite.
- Hash key generation respects the chosen algorithm and follows DV2.0 conventions (concatenation order, null handling).
- Both lookup-before-insert and error-handling duplicate strategies are supported and produce correct behaviour.
- Phase 2 generates a Link pipeline fragment that correctly references the hash keys of the two parent Hubs.
- Generated pipelines are standard Hop artifacts — no proprietary lock-in, fully editable after generation.
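The hash key criterion above can be illustrated with a short Python sketch. The specific choices here are assumptions, since teams differ on the details: values are trimmed and upper-cased, nulls are replaced with an empty string, parts are joined with a `||` delimiter in the order given, and the digest is returned as upper-case hex. The `hash_key` function is hypothetical and only shows the behaviour a generated pipeline would need to reproduce.

```python
import hashlib
from typing import Optional, Sequence

# Map the algorithm names used in this proposal to hashlib names.
ALGORITHMS = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256"}


def hash_key(
    values: Sequence[Optional[str]],
    algorithm: str = "MD5",
    delimiter: str = "||",
) -> str:
    # Null handling: a missing value becomes an empty string, but the
    # delimiter is kept so that the position of each key is preserved.
    parts = ["" if v is None else v.strip().upper() for v in values]
    # Concatenation order: exactly the order in which keys are passed in.
    payload = delimiter.join(parts).encode("utf-8")
    return hashlib.new(ALGORITHMS[algorithm], payload).hexdigest().upper()


# Same business key with different casing and padding gives the same hash.
print(hash_key(["c-1", None]) == hash_key([" C-1 ", None]))
# Swapping the order of the keys changes the hash.
print(hash_key(["a", "b"]) != hash_key(["b", "a"]))
```

Whatever normalization rules are chosen, they need to be identical in the Hub, Satellite, and Link loaders, otherwise the hash keys will not join.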
Issue Priority
Priority: 3
Issue Component
Component: Other