[BUG] Using 'vault_insert_by_period' materialisation with period='MILLISECOND' results in Unhandled error: Range too big. #175
Hi, thanks for your report! It's odd that datediff has a different parameter order in Databricks; that's perhaps something that needs to be reported to them. We'll look into it (we can fix it on our end by using named parameters, though). Generally, millisecond is ill-advised as an iteration period for most use cases, as there will potentially be thousands and thousands of iterations for dbt to perform, which will perform badly. How many iterations would there be for dbt to do if this were to work? Either way, you're right that this shouldn't produce an error like this, of course, so we'd like to have a solution for it.
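To make the "thousands and thousands of iterations" concern concrete, here is a rough back-of-the-envelope sketch (plain Python, not dbtvault's actual code) of how many load iterations a period-based materialisation must run to cover a given window:

```python
import math
from datetime import datetime, timedelta

def count_periods(start: datetime, stop: datetime, period: timedelta) -> int:
    """Rough estimate of the number of load iterations needed to
    cover [start, stop) when iterating one `period` at a time."""
    return math.ceil((stop - start) / period)

window_start = datetime(2022, 11, 9)
window_stop = datetime(2022, 11, 10)

# A single day's window: 1 iteration with period='DAY',
# but 86,400,000 iterations with period='MILLISECOND'.
daily = count_periods(window_start, window_stop, timedelta(days=1))
per_ms = count_periods(window_start, window_stop, timedelta(milliseconds=1))
```

This is why a millisecond period only makes sense when the overall load window is itself tiny, as in the reporter's 20-50 iteration case below.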
In terms of iterations using milliseconds, I think a maximum of 20-50 iterations. Is this issue a candidate for a minor fix/release?
Yes, of course. It will be worked on after the holiday. Thank you for your report.
Hi. We've done some investigation on this and essentially it's due to the … Due to this, this is something we won't be fixing; instead we will provide some error handling and a friendly message, such as: "Max iterations is 100,000. Consider using a different time period (e.g. day). vault_insert_by materialisations are not intended for this purpose, please see -link to docs-". We do need some more guidance on this in the docs, but essentially the … Our aim for a while now has been to implement waterlevel macros for loading correctly, and to release more guidance and documentation on the matter, as we understand that how to load is a pain-point for many dbtvault users.
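A minimal sketch of the kind of guard described above (the function name and the exact limit are illustrative, not dbtvault's actual code):

```python
MAX_ITERATIONS = 100_000  # illustrative cap, echoing the message quoted above

def check_iteration_count(num_periods: int, period: str) -> None:
    """Fail fast with a friendly message instead of an unhandled range error."""
    if num_periods > MAX_ITERATIONS:
        raise ValueError(
            f"Max iterations is {MAX_ITERATIONS:,}, but period '{period}' would "
            f"require {num_periods:,}. Consider using a different time period (e.g. day)."
        )
```

The point of the design is to validate the iteration count before generating any SQL, so the user sees an actionable message rather than a database-side range error.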
Hi! We've added a warning and some better handling around this issue in 0.9.5. In a future update (likely 0.9.6) we will raise an exception when an attempt is made to use milliseconds. This report prompted us to revisit our materialisation logic, and we have some refinements and bug fixes coming up. Thank you!
Describe the bug
I want to load data which I receive with millisecond precision (example load_dts: 2022-11-09 05:01:23.351).
When I use the 'vault_insert_by_period' materialisation with period='MILLISECond'.upper() — i.e. period='MILLISECOND' — it results in an unhandled error:
14:01:48 Unhandled error while executing model.datavault.hs_store_exploitation Range too big. The sandbox blocks ranges larger than MAX_RANGE (100000).
Environment
dbt version: dbt=1.3.1
dbtvault version: 0.9.1
Database/Platform: databricks
To Reproduce
Steps to reproduce the behavior:
1. Set period='MILLISECOND' in the materialisation config
2. Run dbt run
Expected behavior
I expect the data to be loaded just as it would have been loaded with period='DAY'.
Log files
08:04:53.174466 [error] [MainThread]: Range too big. The sandbox blocks ranges larger than MAX_RANGE (100000).
08:04:52.211738 [debug] [Thread-4 ]: Databricks adapter: NotImplemented: rollback
Additional context
The Databricks datediff(unit, start, end) function takes the unit-of-measure period as its first parameter, as opposed to the third parameter expected in https://github.com/Datavault-UK/dbtvault/blob/4dc4cab0daa9ca17d703a42d50ad651fd2ee707e/macros/materialisations/period_mat_helpers/get_period_boundaries.sql
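The mismatch, and the maintainers' suggestion to use named parameters, can be illustrated with a toy Python stand-in (the real functions are SQL; this function and its unit table are hypothetical, for illustration only):

```python
from datetime import datetime

def datediff_databricks(unit: str, start: datetime, end: datetime) -> int:
    """Toy model of Databricks' datediff: the unit comes FIRST."""
    ms_per_unit = {"day": 86_400_000, "second": 1_000, "millisecond": 1}
    elapsed_ms = round((end - start).total_seconds() * 1_000)
    return elapsed_ms // ms_per_unit[unit]

start, end = datetime(2022, 11, 9), datetime(2022, 11, 10)

# Calling positionally in the dbt-style order (start, end, unit) would pass a
# datetime where a unit string is expected and fail. Keyword arguments make
# the call order-independent, which is the "named parameters" fix mentioned above:
n_days = datediff_databricks(unit="day", start=start, end=end)
```

Keyword-style calls sidestep the positional-order difference between platforms entirely, which is why they are an attractive fix on the dbtvault side.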