
[BUG] Using 'vault_insert_by_period' materialisation with period='MILLISECOND' results in Unhandled error: Range too big. #175

Closed · anouar-zh opened this issue on Dec 23, 2022 · 5 comments
Labels: bug (Something isn't working)

anouar-zh commented on Dec 23, 2022

Describe the bug
I want to load data that arrives with millisecond precision (example load_dts: 2022-11-09 05:01:23.351).
When I use the vault_insert_by_period materialisation with period='MILLISECOND', it results in Unhandled error: Range too big.

14:01:48 Unhandled error while executing model.datavault.hs_store_exploitation Range too big. The sandbox blocks ranges larger than MAX_RANGE (100000).

Environment

dbt version: 1.3.1
dbtvault version: 0.9.1
Database/Platform: Databricks

To Reproduce
Steps to reproduce the behavior:

  1. Use a dataset with data received in milliseconds
  2. Use the vault_insert_by_period materialisation with period='MILLISECOND'
  3. Execute dbt run
  4. See error

Expected behavior
I expect the data to be loaded just as it would have been with period='DAY'.

Screenshots
[screenshot omitted]

Log files
08:04:53.174466 [error] [MainThread]: Range too big. The sandbox blocks ranges larger than MAX_RANGE (100000).
08:04:52.211738 [debug] [Thread-4]: Databricks adapter: NotImplemented: rollback

Additional context
Databricks' datediff(unit, start, end) takes the unit of measure as its first parameter, whereas the macro at https://github.com/Datavault-UK/dbtvault/blob/4dc4cab0daa9ca17d703a42d50ad651fd2ee707e/macros/materialisations/period_mat_helpers/get_period_boundaries.sql passes it as the third parameter.
[screenshot omitted]
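A minimal illustration of the Databricks signature (the timestamp values here are made up for the example; the three-argument form of datediff is the one Databricks SQL documents):

```sql
-- Databricks SQL: the unit comes FIRST in the three-argument form
SELECT datediff(
    MILLISECOND,
    TIMESTAMP'2022-11-09 05:01:23.000',
    TIMESTAMP'2022-11-09 05:01:23.351'
);  -- returns 351
```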

anouar-zh added the bug label on Dec 23, 2022
DVAlexHiggs (Member) commented on Dec 23, 2022

Hi, thanks for your report!

It's odd that datediff has a different parameter order in Databricks; that's perhaps something that needs to be reported to them. We'll look into it (we can fix it on our end by using named parameters, though).

Generally, MILLISECOND is ill-advised as an iteration period for most use cases, as there will potentially be many thousands of iterations for dbt to perform, and performance will suffer accordingly. How many iterations would dbt have to do if this were to work?

Either way, you're right that this shouldn't produce an error like this, so we'd like to have a solution for it.
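One portable way to sidestep the per-platform argument order (our illustration, not necessarily the fix the maintainers chose; datepart support still varies by adapter) is dbt's cross-database datediff macro, which renders an appropriate expression for the active adapter:

```jinja
{# Sketch only: dbt-core 1.2+ ships adapter-dispatched cross-database macros.
   "start_timestamp" / "stop_timestamp" are placeholder column names. #}
select {{ dbt.datediff("start_timestamp", "stop_timestamp", "millisecond") }} as num_periods
```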

anouar-zh (Author) commented

In terms of iterations using milliseconds, I think a maximum of 20-50. Is this issue a candidate for a minor fix/release?

DVAlexHiggs (Member) commented

> In terms of iterations using milliseconds, I think a maximum of 20-50. Is this issue a candidate for a minor fix/release?

Yes, of course. It will be worked on after the holiday. Thank you for your report!

DVAlexHiggs (Member) commented on Jan 25, 2023

Hi. We've done some investigation into this, and essentially it's due to the Jinja range() function having a maximum of 100,000 iterations. Though this is a technical limit, it does highlight the issue I previously described: millisecond periods are ill-advised.
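To put rough numbers on that (a sketch only; the variable names are illustrative, not dbtvault's):

```jinja
{# Jinja's sandbox rejects any range() longer than MAX_RANGE (100,000),
   and the materialisation runs one range() step per period. #}
{% set ms_per_day = 24 * 60 * 60 * 1000 %}  {# 86,400,000 #}
{# So even one day of data at period='MILLISECOND' needs 86,400,000
   iterations: range(0, ms_per_day) raises "Range too big". #}
```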

Because of this, we won't be fixing it; instead we will provide some error handling and a friendly message, such as:

"Max iterations is 100,000. Consider using a different time period (e.g. day). vault_insert_by materialisations are not intended for this purpose, please see -link to docs-"

We do need more guidance on this in the docs, but essentially the vault_insert_by materialisations are not for loading large amounts of data, such as a first-time (base) load, and should not be used for highly granular millisecond-difference data. There are other approaches for loading, including those in the DV 2.0 standards.

Our aim for a while now has been to implement water-level macros for loading correctly, and to release more guidance and documentation on the matter, as we understand that how to load is a pain point for many dbtvault users.

DVAlexHiggs added and then removed the wontfix (This will not be worked on) label on Jan 25, 2023
DVAlexHiggs (Member) commented

Hi! We've added a warning and some better handling around this issue in 0.9.5.

In a future update (likely 0.9.6) we will raise an exception when an attempt is made to use milliseconds.

This report prompted us to revisit our materialisation logic, and we have some refinements and bug fixes coming up. Thank you!
