Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for non-incremental materialized views #1255

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
be7ce06
Add allow_non_incremental_definition option to BigQuery materialized …
bnaul Nov 8, 2023
85e3426
Add changelog
bnaul Nov 9, 2023
c976c81
Fix default value for flag
bnaul Nov 9, 2023
b3e9eea
Set flag in test
bnaul Nov 9, 2023
97332e6
Recreate if allow_non_incremental_definition changed
bnaul Nov 10, 2023
e75ccac
Parse maxStaleness
bnaul Nov 12, 2023
017c1cc
Merge branch 'main' into non_incremental_mat_view
mikealfare Nov 12, 2023
02c9f7c
Merge branch 'main' into non_incremental_mat_view
colin-rogers-dbt Jan 2, 2024
d5ef7aa
Merge branch 'main' into non_incremental_mat_view
mikealfare Feb 13, 2024
063baeb
Merge branch 'main' into non_incremental_mat_view
mikealfare Jun 7, 2024
526cd87
Merge remote-tracking branch 'bnaul/non_incremental_mat_view' into no…
mikealfare Jun 11, 2024
40d3e55
implement feedback from #1011
mikealfare Jun 11, 2024
ce9724b
move options config change logic next to options config, reduce to on…
mikealfare Jun 12, 2024
d62a80d
fix the expected value for max_staleness
mikealfare Jun 12, 2024
ee54a7c
move setup into basic tests since it already exists in the changes base
mikealfare Jun 12, 2024
2caf436
remove setup from rerun class because it's not needed
mikealfare Jun 12, 2024
4858eef
move the setup back into the mixin so that all relations are refreshed
mikealfare Jun 12, 2024
309e880
Merge branch 'main' into non_incremental_mat_view
mikealfare Jun 13, 2024
856dbc2
account for interaction between enable_refresh and allow_non_incremen…
mikealfare Jun 13, 2024
d22d83b
Merge branch 'non_incremental_mat_view' of ssh://github.com/dbt-labs/…
mikealfare Jun 13, 2024
64c397f
remove inadvertent format changes
mikealfare Jun 13, 2024
fae9e4f
pass in the new options config as context instead of the diffs
mikealfare Jun 18, 2024
79b8830
add validation rules for inputs
mikealfare Jun 18, 2024
2841b14
allow for unsetting options during diffs
mikealfare Jun 18, 2024
1a9db60
add tests for refresh config changes
mikealfare Jun 18, 2024
06019c9
refactor materialized views to properly manage options changes
mikealfare Jun 20, 2024
b2f67f7
changelog
mikealfare Jun 20, 2024
05402b7
update replace to match create
mikealfare Jun 20, 2024
1f92240
update tests to reflect removing the options subconfig
mikealfare Jun 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20231108-140752.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Features
body: Support allow_non_incremental_definition option in BigQuery materialized views
time: 2023-11-08T14:07:52.28972-05:00
custom:
Author: bnaul
Issue: "672"
2 changes: 1 addition & 1 deletion dbt/adapters/bigquery/connections.py
Original file line number Diff line number Diff line change
Expand Up @@ -322,7 +322,7 @@ def format_rows_number(self, rows_number):
return f"{rows_number:3.1f}{unit}".strip()

@classmethod
def get_google_credentials(cls, profile_credentials) -> GoogleCredentials:
def get_google_credentials(cls, profile_credentials) -> GoogleCredentials.Credentials:
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
method = profile_credentials.method
creds = GoogleServiceAccountCredentials.Credentials

Expand Down
1 change: 1 addition & 0 deletions dbt/adapters/bigquery/impl.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ class BigqueryConfig(AdapterConfig):
max_staleness: Optional[str] = None
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
enable_list_inference: Optional[bool] = None
intermediate_format: Optional[str] = None
allow_non_incremental_definition: Optional[bool] = None
mikealfare marked this conversation as resolved.
Show resolved Hide resolved


class BigQueryAdapter(BaseAdapter):
Expand Down
7 changes: 4 additions & 3 deletions dbt/adapters/bigquery/relation.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,9 +90,10 @@ def materialized_view_config_changeset(
new_materialized_view = cls.materialized_view_from_relation_config(relation_config)

if new_materialized_view.options != existing_materialized_view.options:
config_change_collection.options = BigQueryOptionsConfigChange(
action=RelationConfigChangeAction.alter,
context=new_materialized_view.options,
# get an options change object with only the options that have changed
Copy link
Contributor

@VersusFacit VersusFacit Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opposed to what. Pardon, just not sure what this comment refers to with the prexisting code. Were we instantiating the object in a different way? I see you're no longer using kwargs which is fine, but what's the intent here?

edit: Oh I think I get it now based on this below https://github.com/dbt-labs/dbt-bigquery/pull/1255/files#r1639268560. I really like the from style creation mechanism -- reminds me of Rust trait From, but I think we might alter how you've implemented it slightly!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were creating an options object with all options, not just those that changed, and it was creating issues when altering the options. For example, options that force a full refresh would always show up in the change even if they didn't change, hence changing an option that could be implemented via an ALTER statement would still trigger a full refresh due to the presence of the other options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to indicate that better in the comment. I don't know how it would be good to say that but something like "This is how is used to be done. This is why we were are doing it this way deliberately. This resulted in X. We want to avoid X."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After going around the block on this a few times today, I'm not adopting the approach of simply submitting the entire new options config. This comment no longer applies. However there are new comments for the new scenario. I'm looking for feedback on those comments as a result of this thread.

config_change_collection.options = BigQueryOptionsConfigChange.from_options_configs(
new_materialized_view.options,
existing_materialized_view.options,
)

if new_materialized_view.partition != existing_materialized_view.partition:
Expand Down
41 changes: 37 additions & 4 deletions dbt/adapters/bigquery/relation_configs/_options.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from dataclasses import dataclass
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

Expand All @@ -9,6 +9,7 @@

from dbt.adapters.bigquery.relation_configs._base import BigQueryBaseRelationConfig
from dbt.adapters.bigquery.utility import bool_setting, float_setting, sql_escape
from dbt.adapters.relation_configs import RelationConfigChangeAction


@dataclass(frozen=True, eq=True, unsafe_hash=True)
Expand All @@ -22,6 +23,7 @@ class BigQueryOptionsConfig(BigQueryBaseRelationConfig):
refresh_interval_minutes: Optional[float] = 30
expiration_timestamp: Optional[datetime] = None
max_staleness: Optional[str] = None
allow_non_incremental_definition: Optional[bool] = None
kms_key_name: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This continues the comment above regarding defaulting Optional[bool]s to None in BigQueryConfig. The intent in this section is to match the defaulting behavior of BigQuery according to their docs. The conclusion that we reached above is that we should have a comment regarding the fact that BQ effectively allows a third falsy value (None) for some booleans (allow_non_incremental_definition). There is a link in the docstring to those configurations. While I don't want to pull too much information from that link (to avoid getting out of sync with BQ docs), I will add a generic comment to the top that indicates that some booleans are option in BQ in the sense that there is literally no setting if it's not provided (versus defaulting to false). Does that work?

description: Optional[str] = None
labels: Optional[Dict[str, str]] = None
Expand Down Expand Up @@ -58,6 +60,7 @@ def array(x):
"refresh_interval_minutes": numeric,
"expiration_timestamp": interval,
"max_staleness": interval,
"allow_non_incremental_definition": boolean,
"kms_key_name": string,
"description": escaped_string,
"labels": array,
Expand Down Expand Up @@ -85,6 +88,7 @@ def from_dict(cls, config_dict: Dict[str, Any]) -> Self:
"refresh_interval_minutes": float_setting,
"expiration_timestamp": None,
"max_staleness": None,
"allow_non_incremental_definition": bool_setting,
"kms_key_name": None,
"description": None,
"labels": None,
Expand All @@ -101,7 +105,13 @@ def formatted_setting(name: str) -> Any:
# avoid picking up defaults on dependent options
# e.g. don't set `refresh_interval_minutes` = 30 when the user has `enable_refresh` = False
if kwargs_dict["enable_refresh"] is False:
kwargs_dict.update({"refresh_interval_minutes": None, "max_staleness": None})
kwargs_dict.update(
{
"refresh_interval_minutes": None,
"max_staleness": None,
"allow_non_incremental_definition": None,
}
)

options: Self = super().from_dict(kwargs_dict) # type: ignore
return options
Expand All @@ -115,6 +125,7 @@ def parse_relation_config(cls, relation_config: RelationConfig) -> Dict[str, Any
"refresh_interval_minutes",
"expiration_timestamp",
"max_staleness",
"allow_non_incremental_definition",
"kms_key_name",
"description",
"labels",
Expand All @@ -139,7 +150,14 @@ def parse_bq_table(cls, table: BigQueryTable) -> Dict[str, Any]:
"enable_refresh": table.mview_enable_refresh,
"refresh_interval_minutes": table.mview_refresh_interval.seconds / 60,
"expiration_timestamp": table.expires,
"max_staleness": None,
"max_staleness": (
f"INTERVAL '{table._properties.get('maxStaleness')}' YEAR TO SECOND"
if table._properties.get("maxStaleness")
else None
),
"allow_non_incremental_definition": table._properties.get("materializedView", {}).get(
"allowNonIncrementalDefinition"
),
"description": table.description,
}

Expand All @@ -158,4 +176,19 @@ class BigQueryOptionsConfigChange(RelationConfigChange):

@property
def requires_full_refresh(self) -> bool:
return False
# allow_non_incremental_definition cannot be changed via an ALTER statement
return self.context.allow_non_incremental_definition is not None

@classmethod
def from_options_configs(
cls, new_options: BigQueryOptionsConfig, existing_options: BigQueryOptionsConfig
) -> Self:
new_options_dict = asdict(new_options)
existing_options_dict = asdict(existing_options)
option_diffs_dict = {
k: v
for k, v in new_options_dict.items()
if new_options_dict[k] != existing_options_dict[k]
}
option_diffs = BigQueryOptionsConfig.from_dict(option_diffs_dict)
return cls(action=RelationConfigChangeAction.alter, context=option_diffs)
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@
cluster_by=["id", "value"],
enable_refresh=True,
refresh_interval_minutes=60,
max_staleness="INTERVAL 45 MINUTE"
max_staleness="INTERVAL '0-0 0 0:45:0' YEAR TO SECOND",
allow_non_incremental_definition=True
) }}
select
id,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ def my_other_base_table(self, project) -> BaseRelation:
)

@pytest.fixture(scope="function", autouse=True)
def setup(self, project, my_base_table, my_other_base_table, my_materialized_view): # type: ignore
def setup(self, project, my_materialized_view):
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
run_dbt(["seed"])
run_dbt(["run", "--full-refresh"])

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@ def check_start_state(project, materialized_view):
assert isinstance(results, BigQueryMaterializedViewConfig)
assert results.options.enable_refresh is True
assert results.options.refresh_interval_minutes == 60
assert results.options.max_staleness == "INTERVAL '0-0 0 0:45:0' YEAR TO SECOND"
assert results.options.allow_non_incremental_definition is True
assert results.partition.field == "record_valid_date"
assert results.partition.data_type == "datetime"
assert results.partition.granularity == "day"
Expand All @@ -27,17 +29,17 @@ def check_start_state(project, materialized_view):
@staticmethod
def change_config_via_alter(project, materialized_view):
initial_model = get_model_file(project, materialized_view)
new_model = initial_model.replace("enable_refresh=True", "enable_refresh=False")
new_model = initial_model.replace(
"refresh_interval_minutes=60", "refresh_interval_minutes=45"
)
set_model_file(project, materialized_view, new_model)

@staticmethod
def check_state_alter_change_is_applied(project, materialized_view):
with get_connection(project.adapter):
results = project.adapter.describe_relation(materialized_view)
assert isinstance(results, BigQueryMaterializedViewConfig)
# these change when run manually
assert results.options.enable_refresh is False
assert results.options.refresh_interval_minutes == 30 # BQ returns it to the default
assert results.options.refresh_interval_minutes == 45

@staticmethod
def change_config_via_replace(project, materialized_view):
Expand All @@ -63,6 +65,9 @@ def change_config_via_replace(project, materialized_view):
initial_model.replace(old_partition, new_partition)
.replace("'my_base_table'", "'my_other_base_table'")
.replace('cluster_by=["id", "value"]', 'cluster_by="id"')
.replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also test going from allow_non_incremental_definition=False to allow_non_incremental_definition=True

"allow_non_incremental_definition=True", "allow_non_incremental_definition=False"
)
)
set_model_file(project, materialized_view, new_model)

Expand All @@ -75,6 +80,7 @@ def check_state_replace_change_is_applied(project, materialized_view):
assert results.partition.data_type == "int64"
assert results.partition.range == {"start": 0, "end": 500, "interval": 50}
assert results.cluster.fields == frozenset({"id"})
assert results.options.allow_non_incremental_definition is False


class TestBigQueryMaterializedViewChangesApply(
Expand Down
Loading