Conversation
### @PHYSICAL_LOCATION

`@PHYSICAL_LOCATION` is a helper macro intended to be used in situations where you need to gain access to the *components* of the physical table name. It's intended for use in the following situations:
As always, naming things is one of the hardest problems. I considered:

- `@physical_table` - the object being created is not always a table, such as `kind: view`
- `@this_snapshot` - uses the terminology "snapshot", which I can see in some places is being eschewed for "model version"
- `@this_model_version` - kind of long winded
    **common_render_kwargs,
)

rendered_physical_properties = snapshot.model.render_physical_properties(
Perhaps this could be moved inside `apply`, so it doesn't occur when `limit` is not None and the properties aren't needed. Though this doesn't seem a particularly expensive render, so it's likely not that significant.
Oh, the reason I had it outside `apply` was so that we didn't incur the hit for every index being applied, since we only need to render it once.
I see what you're saying, though. Perhaps it could be rendered right after the limit check and passed to `apply` instead? The problem is, `apply` is called in a bunch of places, so all the call sites would need to be updated.
Yes, you're right, I hadn't considered that. Maybe then something like this:

rendered_physical_properties = (
    snapshot.model.render_physical_properties(**render_statements_kwargs)
    if limit is None
    else None
)

But arguably I don't think it is that much of a slowdown, so the way you have it now should be fine.
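The trade-off discussed above can be sketched in plain Python (all names here are stand-ins for illustration, not the actual SQLMesh API): render the properties once, before the per-index loop, and skip the render entirely when a limit is set.

```python
# Hypothetical stand-in for the real render step: cheap, but not free,
# so it should run at most once per evaluation.
def render_physical_properties(**render_kwargs):
    return {"location": "s3://bucket/" + render_kwargs["table_name"]}

def evaluate(indexes, limit=None, **render_kwargs):
    # Render once, up front, and only when the properties are needed
    # (limit is None means a real evaluation rather than a limited preview).
    rendered = render_physical_properties(**render_kwargs) if limit is None else None
    applied = []
    for index in indexes:
        # apply() would receive the pre-rendered properties here instead of
        # re-rendering them once per index
        applied.append((index, rendered))
    return applied
```

With a limit set the render is skipped entirely; without one it runs exactly once no matter how many indexes are applied.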
sqlmesh/core/macros.py
Outdated
    mode: exp.Literal = exp.Literal.string("literal"),
) -> t.Union[exp.Literal, exp.Table]:
    """
    Generates a either a String literal or an exp.Table representing a physical table location, based on rendering the provided template String literal.
Small typo: Generates either a String literal...
        raise SQLMeshError(f"Expected one expression but got {len(rendered_exprs)}")
    return rendered_exprs[0].transform(d.replace_merge_table_aliases)

def render_physical_properties(self, **render_kwargs: t.Any) -> t.Dict[str, exp.Expression]:
Don't we want the same behavior for virtual properties?
Recording the result of our internal discussion here: maybe, but I couldn't think of a use-case and was trying to keep this PR small.
Since the load-time rendering context is a subset of the runtime rendering context, if we were to later find a use-case to render `virtual_properties` at runtime then we could make the switch without breaking backwards compatibility.
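The backwards-compatibility argument can be illustrated with a toy sketch (the variable names here are assumptions for illustration): because every variable available at load time is also available at runtime, anything that rendered at load time would still render after the switch.

```python
# Illustrative contexts: runtime extends load time with snapshot-specific
# values such as @this_model.
load_time_context = {"gateway": "dev", "start_ds": "2024-01-01"}
runtime_context = {**load_time_context, "this_model": "catalog.schema.table"}

def renderable(template_vars, context):
    # A template renders only if every variable it references is available
    return all(var in context for var in template_vars)

# Anything renderable at load time is renderable at runtime, but not vice versa
assert renderable(["gateway"], load_time_context)
assert renderable(["gateway"], runtime_context)
assert not renderable(["this_model"], load_time_context)
assert renderable(["this_model"], runtime_context)
```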
sqlmesh/core/macros.py
Outdated
    Example:
        >>> from sqlglot import parse_one, exp
        >>> from sqlmesh.core.macros import MacroEvaluator, RuntimeStage
        >>> sql = "@physical_location('s3://data-bucket/prod/@{catalog_name}/@{schema_name}/@{table_name}')"
sqlmesh/core/macros.py
Outdated
@macro()
def physical_location(
IMHO, the name is somewhat confusing and doesn't accurately reflect what it does. We already have `resolve_table` - can this be `resolve_template`? Or perhaps we can extend `resolve_table` with an optional `template` argument instead? If it's provided then we follow the template; if not, we return the fqn.
Yep, I don't like the current name either.
The problem with extending `resolve_table` is that it's not a macro per se - it's a lambda that gets passed into the template context. It's also available directly from Python models and is defined to return a `str` instead of an AST node, so I think changing it would cause breakage.
I like `@resolve_template` - it doesn't limit the terminology to tables only (since it's really a "physical layer object", which could be a view for `kind: VIEW` models) and it indicates that templating is occurring.
sqlmesh/core/macros.py
Outdated
        >>> evaluator.transform(parse_one(sql)).sql()
        "'s3://data-bucket/prod/test_catalog/sqlmesh__test/test__test_model__2517971505'"
    """
    if evaluator.runtime_stage != "loading":
FYI, there's also the "testing" runtime stage. How would these macro calls be handled in the context of unit tests? Can we actually unit test models that use this?
Good call, I'll write a test to verify the behaviour during unit tests.
Theoretically it should be ignored entirely unless `@this_model` is available, so maybe I'll adjust the condition to just check for the presence of that and ignore the runtime stage.
I've adjusted the logic to be dependent on the presence of `@this_model` instead, and written a test to show it works from unit tests.
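A minimal sketch of the adjusted guard (assuming `evaluator.locals` is a plain dict, which is how the diff above accesses it): the macro becomes a no-op whenever `@this_model` hasn't been populated, which also covers the "testing" runtime stage used by unit tests.

```python
def resolve_template(evaluator_locals: dict, template_str: str):
    # Without @this_model there is nothing to resolve against, so return
    # None and let the caller keep the original, unrendered expression.
    # This makes the macro inert during unit tests as well.
    if "this_model" not in evaluator_locals:
        return None
    return template_str.replace("@{table_name}", evaluator_locals["this_model"])

# During unit tests @this_model is absent, so the call is left untouched:
assert resolve_template({}, "s3://bucket/@{table_name}") is None
# At evaluation time it resolves:
assert resolve_template({"this_model": "tbl"}, "s3://bucket/@{table_name}") == "s3://bucket/tbl"
```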
sqlmesh/core/macros.py
Outdated
    else:
        return exp.Literal.string(result)
[Nit] Suggested change: the `else:` is redundant since the `if` branch returns, so the trailing `return exp.Literal.string(result)` can simply be unindented.
sqlmesh/core/macros.py
Outdated
    )

    if mode.this.lower() == "table":
        return exp.to_table(result)
Should we pass the dialect in this to_table call?
Man this trap for young players always gets me 🤦
Thanks for noticing it, will fix
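The trap being referred to: identifier quoting rules differ per dialect, so turning a rendered string back into a table expression needs to know the dialect. A pure-stdlib sketch of why (the quote table here is a deliberate simplification for illustration, not sqlglot's actual implementation):

```python
import re

# Simplified per-dialect identifier quote characters (illustrative only):
# ANSI-style dialects quote with double quotes, BigQuery with backticks.
QUOTE_CHAR = {"ansi": '"', "bigquery": "`"}

def split_table_parts(fqn: str, dialect: str = "ansi") -> list:
    q = re.escape(QUOTE_CHAR[dialect])
    # Each part is either a quoted identifier (dots allowed inside the
    # quotes) or a plain run of characters between dots.
    pattern = rf"{q}([^{q}]+){q}|([^.]+)"
    return [m.group(1) or m.group(2) for m in re.finditer(pattern, fqn)]

# Parsed with the right dialect, the quoted catalog stays in one piece:
print(split_table_parts("`my.project`.dataset.tbl", "bigquery"))
# Parsed with the wrong dialect, the backticks aren't quote characters and
# the catalog is split at the embedded dot:
print(split_table_parts("`my.project`.dataset.tbl", "ansi"))
```

Passing `evaluator.dialect` through to the conversion avoids exactly this class of mis-parse.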
# Discard the potentially half-rendered versions of these properties and replace them with the
# original unrendered versions. They will get rendered properly at evaluation time
meta_fields.update(unrendered_properties)
I think this is breaking because it impacts the data hash. If someone used @gateway or other load-time-renderable variables within physical_properties, then they should get a diff due to this change.
This is similar to the recent when_matched refactor, IIRC.
Ah, yes, I was meant to ask about this.
I'm still not 100% sure how to write migrations or how to test that they're working; they seem quite difficult / error prone to write, with lots of pitfalls.
Also, is there any default upgrade logic that we get "for free" with a sqlglot version change?
I spoke with @izeigerman and there isn't a way to migrate this automatically.
The problem is that the state would contain already-rendered macros and we wouldn't know what the unrendered version would look like without the project files.
When a migration occurs, access to the project files is not guaranteed. So we are just hoping that not many people were using macros / variables in `physical_properties`.
The workaround, if this triggers a bunch of metadata changes and tries to rebuild everything, would be to run a `--forward-only` plan to update state but not rebuild any tables.
Sounds good, sorry for not getting back to this thread earlier.
### @RESOLVE_TEMPLATE

`@resolve_template` is a helper macro intended to be used in situations where you need to gain access to the *components* of the physical object name. It's intended for use in the following situations:
Why "template"? I'm not sure I understand the name; it seems a bit too generic.
maybe @render_template would be more appropriate?
It's essentially rendering an in-line template with @this_model broken out into its components and made available as @{catalog_name}, @{schema_name} and @{table_name} so the user can recombine those as required
My original implementation added these as variables to the main evaluation context so users could access them directly, something like:
physical_properties (
location = @'s3://bucket/@{catalog_name}/@{schema_name}/@{table_name}'
)
but there was a preference to create a macro instead?
this_model = exp.to_table(evaluator.locals["this_model"])
template_str: str = template.this
result = (
    template_str.replace("@{catalog_name}", this_model.catalog)
Is this just doing string replacement? Is this safe?
In short, yes. It's doing the same thing that this is doing, to keep consistency. It deliberately mimics our macro syntax - is that what you meant by safety?
I was holding off on "upgrading" it to actually run the template string through the macro evaluator via `evaluator.evaluate()`, figuring that could be easily added later if the need arose.
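The substitution the diff performs can be reproduced end-to-end with a stand-in for the parsed `@this_model` table (using `SimpleNamespace` here instead of a real `exp.Table`, whose `catalog`/`db`/`name` parts the macro reads); the output matches the docstring example earlier in the thread.

```python
from types import SimpleNamespace

def render_location(template_str: str, this_model) -> str:
    # Same plain string replacement as the macro: splice the parts of the
    # physical table name into the user-provided template.
    return (
        template_str.replace("@{catalog_name}", this_model.catalog)
        .replace("@{schema_name}", this_model.db)
        .replace("@{table_name}", this_model.name)
    )

# Stand-in for exp.to_table(evaluator.locals["this_model"])
this_model = SimpleNamespace(
    catalog="test_catalog", db="sqlmesh__test", name="test__test_model__2517971505"
)
print(render_location(
    "s3://data-bucket/prod/@{catalog_name}/@{schema_name}/@{table_name}", this_model
))
# -> s3://data-bucket/prod/test_catalog/sqlmesh__test/test__test_model__2517971505
```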
sqlmesh/core/macros.py
Outdated
def resolve_template(
    evaluator: MacroEvaluator,
    template: exp.Literal,
    mode: exp.Literal = exp.Literal.string("literal"),
This can just be a literal string; the type converter will handle it: `mode: str`.
This PR:

- Changes rendering of `physical_properties` to run time instead of load time, making `@this_model` available within them
- Adds a helper macro `@physical_location` that allows a user to create arbitrary `exp.Literal` or `exp.Table` objects in the AST based on the parts of the physical table name

The motivation is twofold:

- Users can't currently reference `@this_model` within `physical_properties`, because `physical_properties` is currently rendered at load time, which means no snapshot info is available
- `@this_model` returns a fully rendered FQN string, which makes it difficult to work with. We can't change this without breaking backwards compatibility, but we can introduce something else that allows users to build new strings based on the components of the FQN, rather than the FQN itself