Background and Motivation
The current export formats support a wide variety of data types, making it very convenient to get started. However, incorporating custom logic into the output requires defining a Custom Exporter, which involves implementing and maintaining Python scripts—a significant burden in practice.
For instance, at my company, the product team manages hundreds of tables in an RDBMS, and the schemas change almost daily. After loading these tables into BigQuery, our data platform team sets up a staging layer in dbt.
Because schema changes occur so frequently across hundreds of tables, manually updating the staging SQL files for dbt is quite labor-intensive. We therefore use the Data Contract CLI to automate these updates.
Although Data Contract CLI offers a dbt-staging-sql output format, it does not easily accommodate the pre-processing we need, such as type-specific conversions or parsing date/time values. To address this, we wrote a Jinja template file and used a Custom Exporter to render it into our own custom format.
In scenarios like this, official support for Jinja templates in the export command would make it much simpler to produce outputs that include custom logic.
Proposed Feature
I propose introducing a custom-template format, as shown below:
datacontract export datacontract.yaml --format custom-template --template template.sql --output output.sql
With this command, a Jinja template file specified via the --template option would be used for the export. I'm not particularly attached to the names custom-template or --template; if there are more suitable names, I'm happy to adopt them.
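To make the idea concrete, here is a minimal sketch of what such an export step could do: render a user-supplied Jinja template against the parsed data contract. It assumes the `jinja2` package is installed. The helper `render_custom_template`, the template variable name `data_contract`, the `raw` schema, and the per-type conversion rules are all hypothetical illustrations, not an existing Data Contract CLI interface. The contract is shown as an already-parsed dict trimmed to `models` and `fields`, and the BigQuery expressions mirror the kind of type-specific casting and date/time parsing described above.

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical template for a dbt staging model. The variable name
# `data_contract`, the `raw` schema, and the type rules are assumptions.
TEMPLATE_SOURCE = """\
{%- for model_name, model in data_contract.models.items() %}
-- stg_{{ model_name }}
select
{%- for field_name, field in model.fields.items() %}
  {% if field.type == "timestamp" %}safe.parse_timestamp('%Y-%m-%d %H:%M:%S', {{ field_name }}){% elif field.type == "integer" %}safe_cast({{ field_name }} as int64){% else %}{{ field_name }}{% endif %} as {{ field_name }}{% if not loop.last %},{% endif %}
{%- endfor %}
from raw.{{ model_name }}
{% endfor %}"""


def render_custom_template(contract: dict, template_source: str) -> str:
    """Render a Jinja template against a parsed data contract (hypothetical helper)."""
    # StrictUndefined makes typos in variable names fail loudly instead of
    # silently rendering as empty strings.
    env = Environment(undefined=StrictUndefined)
    template = env.from_string(template_source)
    # Expose the whole contract to the template under a single name.
    return template.render(data_contract=contract)


# Example contract, already parsed from datacontract.yaml and trimmed for brevity.
contract = {
    "id": "orders-contract",
    "models": {
        "orders": {
            "fields": {
                "id": {"type": "integer"},
                "created_at": {"type": "timestamp"},
                "customer_name": {"type": "string"},
            }
        }
    },
}

if __name__ == "__main__":
    print(render_custom_template(contract, TEMPLATE_SOURCE))
```

Because the template, not Python code, decides how each field is converted, a team could change its staging conventions by editing `template.sql` alone.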
Expected Benefits
With this functionality, injecting custom logic into the output becomes much easier, without having to write and maintain a Python Custom Exporter.
Beyond generating dbt resources as described above, you could, for example, produce Markdown files with custom headings or formatting to help maintain a data catalog or other documentation. I believe this approach would be useful in a wide range of scenarios.
Potential Concerns and Open Questions
Since this proposal differs from existing formats, I'd like to discuss any considerations or potential issues before it’s fully implemented.
If the proposal is accepted, I’d be happy to create and submit a Pull Request myself.
Thank you for your time, and I look forward to your feedback.