Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-968] [Feature] Disallow “view” materialization if Python #5569

Closed
1 task done
ChenyuLInx opened this issue Jul 28, 2022 · 3 comments · Fixed by #5695
Closed
1 task done

[CT-968] [Feature] Disallow “view” materialization if Python #5569

ChenyuLInx opened this issue Jul 28, 2022 · 3 comments · Fixed by #5695
Assignees
Labels
enhancement New feature or request jira python_models Team:Adapters Issues designated for the adapter area of the code

Comments

@ChenyuLInx
Copy link
Contributor

Is this your first time opening an issue?

Describe the Feature

Current dbt python model doesn't support view materialization. But instead of raise a clear error, it will try to run python code with sql together.

Describe alternatives you've considered

Add check for language in materialization code- meaning we will have to update a lot of them

Who will this benefit?

python model user

Are you interested in contributing this feature?

Yes

Anything else?

No response

@ChenyuLInx ChenyuLInx added enhancement New feature or request triage and removed triage labels Jul 28, 2022
@github-actions github-actions bot changed the title [Feature] Disallow “view” materialization if Python [CT-950] [Feature] Disallow “view” materialization if Python Jul 28, 2022
@ChenyuLInx ChenyuLInx changed the title [CT-950] [Feature] Disallow “view” materialization if Python [CT-944] [Feature] Disallow “view” materialization if Python Jul 28, 2022
@ChenyuLInx ChenyuLInx mentioned this issue Jul 28, 2022
5 tasks
@jtcohen6 jtcohen6 added Team:Adapters Issues designated for the adapter area of the code Team:Language python_models labels Jul 28, 2022
@jtcohen6
Copy link
Contributor

Copying idea shared in Slack thread

  • Each materialization macro records metadata about which languages it supports, as a property of the macro node
  • We validate here at runtime that the node's language is in the materialization macro's supported languages
  • By default, all materialization macros support SQL — and then we "opt in" table/incremental on Snowflake + BigQuery + Spark/Databricks to support Python too

This is a pattern I could see wanting to extend for other things that are implicit in their macro code today. E.g.

does this materialization on this adapter respect --full-refresh?
does this materialization on this adapter "carry over" grants?
does this materialization on this adapter create a database object that's "auto-updating" for new data (views, materialized views), or needs manual updating by dbt (table, incremental)?

We have syntax today to "tag" (?) materializations with their intended adapter support:

{% materialization table, default %}
...
{% materialization incremental, adapter="snowflake" %}

Maybe there's something we could use/extend here?

{% materialization incremental, adapter="snowflake", languages=["sql", "python"] %}

Behind the scenes, dbt uses those not-quite-arguments to construct the actual macro name (e.g. materialization_incremental_snowflake). I don't think we want to do that here! — but maybe pull it off and store it as a separate property of the node?

class MaterializationExtension(jinja2.ext.Extension):
tags = ["materialization"]
def parse(self, parser):
node = jinja2.nodes.Macro(lineno=next(parser.stream).lineno)
materialization_name = parser.parse_assign_target(name_only=True).name
adapter_name = "default"
node.args = []
node.defaults = []
while parser.stream.skip_if("comma"):
target = parser.parse_assign_target(name_only=True)
if target.name == "default":
pass
elif target.name == "adapter":
parser.stream.expect("assign")
value = parser.parse_expression()
adapter_name = value.value
else:
invalid_materialization_argument(materialization_name, target.name)
node.name = get_materialization_macro_name(materialization_name, adapter_name)
node.body = parser.parse_statements(("name:endmaterialization",), drop_needle=True)
return node

@gshank gshank added the jira label Aug 1, 2022
@github-actions github-actions bot changed the title [CT-944] [Feature] Disallow “view” materialization if Python [CT-968] [CT-944] [Feature] Disallow “view” materialization if Python Aug 1, 2022
@jtcohen6 jtcohen6 changed the title [CT-968] [CT-944] [Feature] Disallow “view” materialization if Python [CT-968] [Feature] Disallow “view” materialization if Python Aug 3, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 3, 2022

The place in the code (runtime) where the validation error should raise:

materialization_macro = manifest.find_materialization_macro_by_name(
self.config.project_name, model.get_materialization(), self.adapter.type()
)
if materialization_macro is None:
missing_materialization(model, self.adapter.type())
if "config" not in context:
raise InternalException(
"Invalid materialization context generated, missing config: {}".format(context)
)

why do this?

If we don't do this, dbt does this:

create view ... as (
    ... python code ...
)

Which is a very confusing message for end users :)

@ChenyuLInx
Copy link
Contributor Author

Updates that came from some pairing time with @stu-k today:

If we write our materialization macro as something like:

{% materialization seed, adapter='snowflake', language=['sql'] %}
actual logic
{% endmaterialization %}

we can just parse out the args and also add a default value for things that doesn't have language specified by updating part of the parsing code from

class MaterializationExtension(jinja2.ext.Extension):
tags = ["materialization"]
def parse(self, parser):
node = jinja2.nodes.Macro(lineno=next(parser.stream).lineno)
materialization_name = parser.parse_assign_target(name_only=True).name
adapter_name = "default"
node.args = []
node.defaults = []
while parser.stream.skip_if("comma"):
target = parser.parse_assign_target(name_only=True)
if target.name == "default":
pass
elif target.name == "adapter":
parser.stream.expect("assign")
value = parser.parse_expression()
adapter_name = value.value
else:
invalid_materialization_argument(materialization_name, target.name)
node.name = get_materialization_macro_name(materialization_name, adapter_name)
node.body = parser.parse_statements(("name:endmaterialization",), drop_needle=True)
return node

into

    def parse(self, parser):
        node = jinja2.nodes.Macro(lineno=next(parser.stream).lineno)
        materialization_name = parser.parse_assign_target(name_only=True).name

        adapter_name = "default"
        node.args = []
        node.defaults = []

        while parser.stream.skip_if("comma"):
            target = parser.parse_assign_target(name_only=True)

            if target.name == "default":
                pass

            elif target.name == "adapter":
                parser.stream.expect("assign")
                value = parser.parse_expression()
                adapter_name = value.value
            elif target.name == "language":
                parser.stream.expect("assign")
                value = parser.parse_expression()
                target.set_ctx("param")
                node.args.append(target)
                node.defaults.append(value)
            else:
                invalid_materialization_argument(materialization_name, target.name)
        if not node.args:
            node.args = [jinja2.nodes.Name("language", "store")]
            node.defaults = [jinja2.nodes.List([jinja2.nodes.Const('sql')])]
        node.name = get_materialization_macro_name(materialization_name, adapter_name)

        node.body = parser.parse_statements(("name:endmaterialization",), drop_needle=True)

        return node

Then node.args will contain argument 'language', node.defaults will contain the parse values.

Note that this part of the parsing actually doesn't look into any of the body of materialization macro. It just look up the function definition and originally only used to get the name.

The code that calls the parsing logic above is the following

ast = jinja.parse(block.full_block)
except ParsingException as e:
e.add_node(base_node)
raise
macro_nodes = list(ast.find_all(jinja2.nodes.Macro))
if len(macro_nodes) != 1:
# things have gone disastrously wrong, we thought we only
# parsed one block!
raise ParsingException(
f"Found multiple macros in {block.full_block}, expected 1", node=base_node
)
macro_name = macro_nodes[0].name
if not macro_name.startswith(MACRO_PREFIX):
continue
name: str = macro_name.replace(MACRO_PREFIX, "")
node = self.parse_macro(block, base_node, name)

You can see macro_name is being passed to create a ParsedMacro.

We can add an optional attribute in ParsedMacro to store the supported language, then check that and model's language at where @jtcohen6 linked above.

@gshank any suggestions on better ways of doing this is welcomed. I feel like we are abusing those attributes a bit but not too sure what's the better way to do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jira python_models Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants