Skip to content

Feat: Add virtual properties and rename table_properties to physical_properties#2561

Merged
izeigerman merged 11 commits intoSQLMesh:mainfrom
Kayrnt:virtual_properties
May 7, 2024
Merged

Feat: Add virtual properties and rename table_properties to physical_properties#2561
izeigerman merged 11 commits intoSQLMesh:mainfrom
Kayrnt:virtual_properties

Conversation

@Kayrnt
Copy link
Copy Markdown
Contributor

@Kayrnt Kayrnt commented May 3, 2024

closes #2014

  • A new property virtual_properties is introduced: it is a dictionary that propagate to the engine the provided properties to the virtual layer (the "target view")
  • physical_properties replaces table_properties (we keep the value as an alias for backward compatibility)
  • A deprecation warning will be issued to use physical_propertiesinstead of table_properties

To be figured out:

  • What should we do in the case of dbt adapter? Should we apply the label to:
    • Just the physical layer
    • Just the virtual layer
    • Both
      ? So far, it's just on the physical layer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this and do a migration to move all physical_properties in the database to virtual_properties?

but accept models in the file system from the model file to map to physical_properties?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@Kayrnt Kayrnt May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't these fields meant for Pydantic field parsing?
My goal is to be able to parse:

  • table_properties ("deprecated")
  • physical_properties (the new table_properties)
  • virtual_properties (the new properties for virtual layer)

Should that append with a custom Field to factor table_propertiesand physical_properties.

can we remove this and do a migration to move all physical_properties in the database to virtual_properties?

Do you mean all table_properties to physical_properties to migrate?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i imagine pydantic will only have two fields, physical/virtual_properties. we migrate in the state all usage of table_properties -> physical properties.

at parse time, if we detect table_properties, we log a message and remap that to physical_properties

Copy link
Copy Markdown
Collaborator

@izeigerman izeigerman May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to migrate anything. Can't we just add a table_properties alias to the physical_properties_ field?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, we can also handle this in the root model validator. And just pop table_properties, emit a warning, and then set it as physical_properties

Copy link
Copy Markdown
Collaborator

@izeigerman izeigerman May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, only physical_properties should stay, and no updates should be necessary to the EngineAdapter layer. EngineAdapter shouldn't care whether it's "physical" or "virtual". Different level of abstraction.

Copy link
Copy Markdown
Contributor Author

@Kayrnt Kayrnt May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think a migration is needed or maybe I'm missing?
@izeigerman, in this commit I've updated the Pydantic parsing to use both with the related warning. Isn't it what you suggest?
Regarding EngineAdapter, I think we could indeed skip the physical / virtual naming and skip to the actual materialization related naming: table_properties / view_properties WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this commit I tried to set proper boundaries with physical/virtual until the evaluator that translate those into table/view (depending on the actual chosen materialization).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Kayrnt, after additional consideration, I've realized that you do need the migration after all. Since we're getting rid of the table_properties field altogether we do need to reflect this in the existing state.

The fact that you had to change the fixtures/migrations/snapshots.json file confirms this suspicion. If we don't migrate, the next time users run the plan command after migration, it will show the diff because the field name has changed.

You can test this by first initializing a project with duckdb using a previous SQLMesh version and then upgrading to your current branch and running the plan command.

Sorry about the confusion initially, I really hoped we could avoid it.

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented May 3, 2024

is the error due to a new version of pydantic?

@Kayrnt
Copy link
Copy Markdown
Contributor Author

Kayrnt commented May 3, 2024

is the error due to a new version of pydantic?

No I think it's PR code related as it works if I just use physical_properties. I suspect it's in the common code you pointed out 👀

@Kayrnt Kayrnt force-pushed the virtual_properties branch 2 times, most recently from 7506512 to 32bdcbd Compare May 3, 2024 18:39
@Kayrnt Kayrnt marked this pull request as ready for review May 3, 2024 19:15
@Kayrnt Kayrnt force-pushed the virtual_properties branch from 2fb4149 to 1632b98 Compare May 3, 2024 21:44
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use the builder api for the tests,

exp.array("('test-label', 'label-value')")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's more compact 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use sqlglot builder here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, then I'll update the existing code too to be consistent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what changed here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table_properties was renamed to physical_properties and since this is a single line of JSON the whole thing is included in the diff.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file shouldn't have changed. The fact that it did indicates that a migration script is necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could keep table_properties as the Pydantic model is backward compatible (unless it's used another way?).
I didn't look into migrations so far so I'm not sure how it works and what's needed.

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented May 4, 2024

seems almost ready, @izeigerman can you take a final look

@georgesittas georgesittas changed the title WIP - Add virtual properties and rename table_properties to physical_… Feat: Add virtual properties and rename table_properties to physical_properties May 4, 2024
Copy link
Copy Markdown
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well, a few small suggestions.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- A key-value of arbitrary table properties specific to the target engine applied on SQLMesh physical layer. For example:
- A key-value mapping of arbitrary table properties specific to the target engine applied on SQLMesh's physical layer. For example:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- A key-value of arbitrary table properties specific to the target engine applied on SQLMesh virtual layer. For example:
- A key-value mapping of arbitrary table properties specific to the target engine applied on SQLMesh's virtual layer. For example:

Comment on lines 31 to 32
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `physical_properties` | Arbitrary table properties specific to the target engine applied on SQLMesh physical layer. Specified as key-value pairs (`key = value`)
| `virtual_properties` | Arbitrary table properties specific to the target engine applied on SQLMesh virtual layer. Specified as key-value pairs (`key = value`) | dict | N |
| `physical_properties` | Arbitrary table properties specific to the target engine applied on SQLMesh's physical layer. Specified as key-value pairs (`key = value`)
| `virtual_properties` | Arbitrary table properties specific to the target engine applied on SQLMesh's virtual layer. Specified as key-value pairs (`key = value`) | dict | N |

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this is expected to always exist right? Given that name is required.

Suggested change
model_name = values.get("name")
model_name = values["name"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it feels like it should always exists as it's explicitly added in the provided validation fields. I'll add that change so that it fails eagerly if that behavior changes (thus preventing a silent switch to a Python model "None"' log).

Comment on lines 351 to 354
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
physical_properties = {}
for expression in self.physical_properties_.expressions:
physical_properties[expression.this.name] = expression.expression
return physical_properties
return {e.this.name: e.expression for e in self.physical_properties_.expressions}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, those suggestions appears more "Pythonic" 👍

Comment on lines 361 to 364
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual_properties = {}
for expression in self.virtual_properties_.expressions:
virtual_properties[expression.this.name] = expression.expression
return virtual_properties
return {e.this.name: e.expression for e in self.virtual_properties_.expressions}

@Kayrnt Kayrnt force-pushed the virtual_properties branch 3 times, most recently from 4416d09 to 8bae4b4 Compare May 4, 2024 23:20
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model properties means something else throughout the code so let's not use it here. Additionally, the engine adapter is too low level of a construct to be aware of something as high-level as "model".

I'm ok keeping with keeping this as table_properties or just properties since that's what we create here. If you want to be more specific I'm also fine with something like table_or_view_properties.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see what you mean!

The goal of that change was to be consistent with the call parameters since we have both:

I would also suggest for table_or_view_properties (and table_or_view_properties_to_expressions) then as I'd rather have future maintainers understand that the function expects one or the other (and it's not a bug).

I'll make an update

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 👍

@izeigerman
Copy link
Copy Markdown
Collaborator

Unfortunately, the migration script is required to rename table_properties to physical_properties in the existing SQLMesh state. One minor naming comment. Looks great otherwise, thanks!

@Kayrnt Kayrnt force-pushed the virtual_properties branch from 8bae4b4 to c456af3 Compare May 6, 2024 18:04
@Kayrnt
Copy link
Copy Markdown
Contributor Author

Kayrnt commented May 6, 2024

@izeigerman Looking a bit at the migration system, what you suggest is to create a new migration (something like v0050_table_properties_to_physical_properties) where:

  • I parse snapshot from snapshots table
  • I substitute the table_properties with an equivalent physical_properties
  • I save again updated snapshot from snapshots table

I could do it in SQL but I wonder if I can safely assume that all state connections can support JSON modifications or else try a replace on "table_properties" but then it could conflict with existing values. Any preference? I think the most simpler/safer would be to parse the JSON in Python but hopefully there's no installation with millions of rows to migrate else it might take ages to migrate.

@izeigerman
Copy link
Copy Markdown
Collaborator

@Kayrnt yes, the steps look accurate to me.

I think the most simpler/safer would be to parse the JSON in Python but hopefully there's no installation with millions of rows to migrate else it might take ages to migrate.

Yes, that's the safest approach which works uniformly across all supported engines. You shouldn't worry about millions of rows. It's going to be 10s of thousands at most.

Copy link
Copy Markdown
Collaborator

@izeigerman izeigerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks a lot for addressing changes!

@izeigerman izeigerman merged commit 0b9e184 into SQLMesh:main May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bigquery] add labels property to VIEWs

5 participants