243 changes: 243 additions & 0 deletions docs/concepts/linter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
# Linter guide

Linting is a powerful tool for improving code quality and consistency. It enables you to automatically validate model definitions, ensuring they adhere to your team's best practices.

When a SQLMesh command is executed and the project is loaded, each model's code is checked for compliance with a set of rules you choose.

SQLMesh provides built-in rules, and you can define custom rules. This improves code quality and helps detect issues early in the development cycle when they are simpler to debug.

## Rules

Each linting rule is responsible for identifying a pattern in a model's code.

Some rules validate that a pattern is *not* present, such as not allowing `SELECT *` in a model's outermost query. Other rules validate that a pattern *is* present, like ensuring that every model's `owner` field is specified. We refer to both of these below as "validating a pattern".

Rules are defined in Python. Each rule is an individual Python class that inherits from SQLMesh's `Rule` base class and defines the logic for validating a pattern.

We display a portion of the `Rule` base class's code below ([full source code](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/linter/rule.py)). Its methods and properties illustrate the most important components of the subclassed rules you define.

Each rule class you create has four vital components:

1. Name: the class's name is used as the rule's name.
2. Description: the class should define a docstring that provides a short explanation of the rule's purpose.
3. Pattern validation logic: the class should define a `check_model()` method containing the core logic that validates the rule's pattern. The method can access any `Model` attribute.
4. Rule violation logic: if a rule's pattern is not validated, the rule is "violated" and the class should return a `RuleViolation` object. The `RuleViolation` object should include the contextual information a user needs to understand and fix the problem.

``` python linenums="1"
# Class name used as rule's name
class Rule:
    # Docstring provides rule's description
    """The base class for a rule."""

    # Pattern validation logic goes in `check_model()` method
    @abc.abstractmethod
    def check_model(self, model: Model) -> t.Optional[RuleViolation]:
        """The evaluation function that checks for a violation of this rule."""

    # Rule violation object returned by `violation()` method
    def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation:
        """Return a RuleViolation instance if this rule is violated"""
        return RuleViolation(rule=self, violation_msg=violation_msg or self.summary)
```
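
To see how the four components fit together, here is a minimal, self-contained sketch. The `Rule`, `RuleViolation`, and `FakeModel` classes are simplified stand-ins written for this illustration, not SQLMesh's implementation, and `NameHasSchema` is a hypothetical rule. Deriving `name` by lowercasing the class name and `summary` from the docstring are assumptions based on the behavior described on this page.

```python
import abc
import typing as t
from dataclasses import dataclass


@dataclass
class FakeModel:
    """Hypothetical stand-in for a SQLMesh model; only `name` is modeled."""

    name: str


@dataclass
class RuleViolation:
    """Stand-in violation object carrying the rule and a message."""

    rule: "Rule"
    violation_msg: str


class Rule(abc.ABC):
    """Simplified stand-in for the rule base class."""

    @property
    def name(self) -> str:
        # Assumption: the rule's name is the lowercased class name
        return type(self).__name__.lower()

    @property
    def summary(self) -> str:
        # Assumption: the docstring doubles as the default violation message
        return type(self).__doc__ or ""

    @abc.abstractmethod
    def check_model(self, model: FakeModel) -> t.Optional[RuleViolation]:
        """Return a violation if the pattern is not validated, else None."""

    def violation(self, violation_msg: t.Optional[str] = None) -> RuleViolation:
        return RuleViolation(rule=self, violation_msg=violation_msg or self.summary)


class NameHasSchema(Rule):
    """Model name should include a schema."""

    def check_model(self, model: FakeModel) -> t.Optional[RuleViolation]:
        # Violated when the name has no `schema.` prefix
        return self.violation() if "." not in model.name else None


rule = NameHasSchema()
ok = rule.check_model(FakeModel(name="docs_example.full_model"))
bad = rule.check_model(FakeModel(name="full_model"))
```

Here `ok` is `None` because the pattern was validated, while `bad` is a `RuleViolation` whose message falls back to the rule's docstring.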

### Built-in rules

SQLMesh includes a set of predefined rules that check for potential SQL errors or enforce code style.

An example of the latter is the `NoSelectStar` rule, which prohibits a model from using `SELECT *` in its query's outermost select statement.

Here is code for the built-in `NoSelectStar` rule class, with the different components annotated:

``` python linenums="1"
# Rule's name is the class name `NoSelectStar`
class NoSelectStar(Rule):
    # Docstring explaining rule
    """Query should not contain SELECT * on its outer most projections, even if it can be expanded."""

    def check_model(self, model: Model) -> t.Optional[RuleViolation]:
        # If this model does not contain a SQL query, there is nothing to validate
        if not isinstance(model, SqlModel):
            return None

        # Use the query's `is_star` property to detect the `SELECT *` pattern.
        # If present, call the `violation()` method to return a `RuleViolation` object.
        return self.violation() if model.query.is_star else None
```

Here are all of SQLMesh's built-in linting rules:

| Name | Check type | Explanation |
| -------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------ |
| ambiguousorinvalidcolumn | Correctness | SQLMesh found duplicate columns or was unable to determine whether a column is duplicated or not |
| invalidselectstarexpansion | Correctness | The query's top-level selection may be `SELECT *`, but only if SQLMesh can expand the `SELECT *` into individual columns |
| noselectstar | Stylistic | The query's top-level selection may not be `SELECT *`, even if SQLMesh can expand the `SELECT *` into individual columns |


### User-defined rules

You may define custom rules to implement your team's best practices.

For instance, you could ensure all models have an `owner` by defining the following linting rule:

``` python linenums="1" title="linter/user.py"
import typing as t

from sqlmesh.core.linter.rule import Rule, RuleViolation
from sqlmesh.core.model import Model

class NoMissingOwner(Rule):
    """Model owner should always be specified."""

    def check_model(self, model: Model) -> t.Optional[RuleViolation]:
        # Rule violated if the model's owner field (`model.owner`) is not specified
        return self.violation() if not model.owner else None
```

Place a rule's code in the project's `linter/` directory. SQLMesh will load all subclasses of `Rule` from that directory.

If the rule is specified in the project's [configuration file](#applying-linting-rules), SQLMesh will run it when the project is loaded. All SQLMesh commands will load the project, except for `create_external_models`, `migrate`, `rollback`, `run`, `environments`, and `invalidate`.

If a model violates the rule, SQLMesh raises an error that names the offending model(s). In this example, `full_model.sql` violated the `NoMissingOwner` rule:

``` bash
$ sqlmesh plan

Linter errors for .../models/full_model.sql:
- nomissingowner: Model owner should always be specified.

Error: Linter detected errors in the code. Please fix them before proceeding.
```

## Applying linting rules

Specify which linting rules a project should apply in the project's [configuration file](../guides/configuration.md).

Rules are specified as lists of rule names under the `linter` key. Globally enable or disable linting with the `enabled` key, which is `false` by default.

NOTE: you **must** set the `enabled` key to `true` to apply the project's linting rules.

### Specific linting rules

This example specifies that the `"ambiguousorinvalidcolumn"` and `"invalidselectstarexpansion"` linting rules should be enforced:

=== "YAML"

    ```yaml linenums="1"
    linter:
      enabled: true
      rules: ["ambiguousorinvalidcolumn", "invalidselectstarexpansion"]
    ```

=== "Python"

    ```python linenums="1"
    from sqlmesh.core.config import Config, LinterConfig

    config = Config(
        linter=LinterConfig(
            enabled=True,
            rules=["ambiguousorinvalidcolumn", "invalidselectstarexpansion"]
        )
    )
    ```

### All linting rules

Apply every built-in and user-defined rule by specifying `"ALL"` instead of a list of rules:

=== "YAML"

    ```yaml linenums="1"
    linter:
      enabled: True
      rules: "ALL"
    ```

=== "Python"

    ```python linenums="1"
    from sqlmesh.core.config import Config, LinterConfig

    config = Config(
        linter=LinterConfig(
            enabled=True,
            rules="all",
        )
    )
    ```

If you want to apply all rules except for a few, you can specify `"ALL"` and list the rules to ignore in the `ignored_rules` key:

=== "YAML"

    ```yaml linenums="1"
    linter:
      enabled: True
      rules: "ALL" # apply all built-in and user-defined rules and error if violated
      ignored_rules: ["noselectstar"] # but don't run the `noselectstar` rule
    ```

=== "Python"

    ```python linenums="1"
    from sqlmesh.core.config import Config, LinterConfig

    config = Config(
        linter=LinterConfig(
            enabled=True,
            # apply all built-in and user-defined linting rules and error if violated
            rules="all",
            # but don't run the `noselectstar` rule
            ignored_rules=["noselectstar"]
        )
    )
    ```
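
As a rough illustration of how `"ALL"` and `ignored_rules` combine, the sketch below resolves the two settings into the set of rules that would run. The `resolve_rules` function and the `AVAILABLE_RULES` catalogue are hypothetical names invented for this example, and treating rule names and `"ALL"` case-insensitively is an assumption, not a description of SQLMesh's internals.

```python
import typing as t

# Hypothetical catalogue: the three built-in rules listed above
# plus one user-defined rule.
AVAILABLE_RULES: t.FrozenSet[str] = frozenset(
    {
        "ambiguousorinvalidcolumn",
        "invalidselectstarexpansion",
        "noselectstar",
        "nomissingowner",
    }
)


def resolve_rules(
    rules: t.Union[str, t.Iterable[str]],
    ignored_rules: t.Iterable[str] = (),
    available: t.AbstractSet[str] = AVAILABLE_RULES,
) -> t.Set[str]:
    """Return the names of the rules that would run for these settings."""
    # Accept either a single string or a list of names
    names = [rules] if isinstance(rules, str) else list(rules)
    # Assumption: names (including "ALL") are matched case-insensitively
    lowered = {name.lower() for name in names}

    # "ALL" selects every built-in and user-defined rule
    selected = set(available) if "all" in lowered else lowered

    # Ignored rules are removed last, so they win over "ALL"
    return selected - {name.lower() for name in ignored_rules}


effective = resolve_rules("ALL", ignored_rules=["noselectstar"])
print(sorted(effective))
# ['ambiguousorinvalidcolumn', 'invalidselectstarexpansion', 'nomissingowner']
```

Applying the ignore list as a final set difference is what makes "everything except a few rules" expressible without enumerating every rule name.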

### Exclude a model from linting

To make an individual *model* ignore a linting rule, list the rule in the `ignored_rules` key of its `MODEL` block.

This example specifies that the model `docs_example.full_model` should not run the `invalidselectstarexpansion` rule:

```sql linenums="1"
MODEL(
  name docs_example.full_model,
  ignored_rules ["invalidselectstarexpansion"] -- or "ALL" to turn off linting completely
);
```
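
Conceptually, a model-level `ignored_rules` entry narrows the project-level rule set for that one model. The sketch below illustrates the idea; `rules_for_model` is a hypothetical helper written for this example, not part of SQLMesh, and the case-insensitive matching is an assumption.

```python
import typing as t


def rules_for_model(
    project_rules: t.AbstractSet[str],
    model_ignored_rules: t.Union[str, t.Iterable[str]] = (),
) -> t.Set[str]:
    """Return the project rules that still apply to a single model."""
    # Accept either a single string (e.g. "ALL") or a list of rule names
    if isinstance(model_ignored_rules, str):
        ignored = {model_ignored_rules.lower()}
    else:
        ignored = {name.lower() for name in model_ignored_rules}

    # "ALL" turns linting off completely for this model
    if "all" in ignored:
        return set()

    return {rule for rule in project_rules if rule not in ignored}


project_rules = {"ambiguousorinvalidcolumn", "invalidselectstarexpansion"}

# Mirrors the `docs_example.full_model` definition above
full_model_rules = rules_for_model(project_rules, ["invalidselectstarexpansion"])

# A model that opts out of linting entirely
unlinted_rules = rules_for_model(project_rules, "ALL")
```

Other models in the project are unaffected: they continue to run every rule in `project_rules`.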

### Rule violation behavior

Linting rule violations raise an error by default, preventing the project from running until the violation is addressed.

To make a rule's violation log a warning instead of raising an error, list the rule in the `warning_rules` key instead of the `rules` key.

=== "YAML"

    ```yaml linenums="1"
    linter:
      enabled: True
      # error if `ambiguousorinvalidcolumn` rule violated
      rules: ["ambiguousorinvalidcolumn"]
      # but only warn if "invalidselectstarexpansion" is violated
      warning_rules: ["invalidselectstarexpansion"]
    ```

=== "Python"

    ```python linenums="1"
    from sqlmesh.core.config import Config, LinterConfig

    config = Config(
        linter=LinterConfig(
            enabled=True,
            # error if `ambiguousorinvalidcolumn` rule violated
            rules=["ambiguousorinvalidcolumn"],
            # but only warn if "invalidselectstarexpansion" is violated
            warning_rules=["invalidselectstarexpansion"],
        )
    )
    ```

SQLMesh will raise an error if the same rule appears in more than one of the `rules`, `warning_rules`, and `ignored_rules` keys, because these keys must be mutually exclusive.
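
The mutual-exclusivity requirement can be pictured as a check like the one below. This is only a sketch of the idea: `check_mutually_exclusive` is a hypothetical function, and the error type and message are assumptions rather than SQLMesh's actual behavior.

```python
import typing as t


def check_mutually_exclusive(
    rules: t.Iterable[str],
    warning_rules: t.Iterable[str],
    ignored_rules: t.Iterable[str],
) -> None:
    """Raise ValueError if a rule name is listed under more than one key."""
    groups = {
        "rules": set(rules),
        "warning_rules": set(warning_rules),
        "ignored_rules": set(ignored_rules),
    }

    # Remember the first key each rule name was seen under
    seen: t.Dict[str, str] = {}
    for key, names in groups.items():
        for name in sorted(names):
            if name in seen:
                raise ValueError(
                    f"Rule '{name}' is listed in both `{seen[name]}` and `{key}`"
                )
            seen[name] = key


# Valid: each rule appears under exactly one key
check_mutually_exclusive(
    rules=["ambiguousorinvalidcolumn"],
    warning_rules=["invalidselectstarexpansion"],
    ignored_rules=["noselectstar"],
)

# Invalid: `noselectstar` cannot both raise an error and be ignored
try:
    check_mutually_exclusive(
        rules=["noselectstar"],
        warning_rules=[],
        ignored_rules=["noselectstar"],
    )
    conflict_detected = False
except ValueError:
    conflict_detected = True
```
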
6 changes: 6 additions & 0 deletions docs/concepts/models/overview.md
@@ -446,6 +446,12 @@ to `false` causes SQLMesh to disable query canonicalization & simplification. Th
### validate_query
: Whether the model's query will be validated at compile time. This attribute is `false` by default. Setting it to `true` causes SQLMesh to raise an error instead of emitting warnings. This will display invalid columns in your SQL statements along with models containing `SELECT *` that cannot be automatically expanded to list out all columns. This ensures SQL is verified locally before time and money are spent running the SQL in your data warehouse.

!!! warning
    This flag is deprecated as of v0.159.7 in favor of the [linter](../linter.md). To preserve validation during compilation, [configure](../../guides/configuration.md#linting) the [built-in rules](../linter.md#built-in-rules) that check for correctness to raise errors.

### ignored_rules
: Specifies which linter rules should be ignored/excluded for this model.

## Incremental Model Properties

These properties can be specified in an incremental model's `kind` definition.
5 changes: 5 additions & 0 deletions docs/guides/configuration.md
@@ -1108,6 +1108,11 @@ def grant_schema_usage(evaluator):

As demonstrated in these examples, the `environment_naming_info` is available within the macro evaluator for macros invoked within the `before_all` and `after_all` statements. Additionally, the macro `this_env` provides access to the current environment name, which can be helpful for more advanced use cases that require fine-grained control over their behaviour.

### Linting

SQLMesh provides a linter that checks for potential issues in your models' code. Enable it and specify which linting rules to apply in the configuration file's `linter` key.

Learn more about linting configuration on the [linting concepts page](../concepts/linter.md).

### Debug mode

3 changes: 1 addition & 2 deletions docs/reference/model_configuration.md
@@ -39,7 +39,7 @@ Configuration options for SQLMesh model properties. Supported by all model kinds
| `enabled` | Whether the model is enabled. This attribute is `true` by default. Setting it to `false` causes SQLMesh to ignore this model when loading the project. | bool | N |
| `gateway` | Specifies the gateway to use for the execution of this model. When not specified, the default gateway is used. | str | N |
| `optimize_query` | Whether the model's query should be optimized. This attribute is `true` by default. Setting it to `false` causes SQLMesh to disable query canonicalization & simplification. This should be turned off only if the optimized query leads to errors such as surpassing text limit. | bool | N |
| `validate_query` | Whether the model's query will be strictly validated at compile time. This attribute is `false` by default. Setting it to `true` causes SQLMesh to raise an error instead of emitting warnings. This will display invalid columns in your SQL statements along with models containing `SELECT *` that cannot be automatically expanded to list out all columns. | bool | N |
| `ignored_rules` | A list of linter rule names (or "ALL") to be ignored/excluded for this model | str \| array[str] | N |

### Model defaults

@@ -123,7 +123,6 @@ The SQLMesh project-level `model_defaults` key supports the following options, d
- on_destructive_change (described [below](#incremental-models))
- audits (described [here](../concepts/audits.md#generic-audits))
- optimize_query
- validate_query
- allow_partials
- enabled
- interval_unit
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -32,6 +32,7 @@ nav:
- SQLMesh tools:
- guides/ui.md
- guides/tablediff.md
- concepts/linter.md
- guides/observer.md
- Concepts:
- concepts/overview.md