get_source_schemas function checks yaml files that are located outside of dbt project's directory #225

Open
tahseenadit opened this issue May 21, 2024 · 0 comments
Labels
bug Something isn't working


Describe the bug
My repository root contains a subdirectory that holds the whole dbt project (models, macros, tests, seeds, etc.). There are .yaml files in both the root directory and the subdirectory, and the .pre-commit-config.yaml file is defined inside the subdirectory. One of the .yaml files in the root directory is empty (I keep it there intentionally for testing purposes).

When I invoke pre-commit programmatically from a file, the hook fails because of that empty .yaml file:

Check for source column descriptions.....................................Failed
- hook id: check-source-columns-have-desc
- exit code: 1

Traceback (most recent call last):
  File "/Users/mdana/.cache/pre-commit/repoyj6tusf_/py_env-python3.9/bin/check-source-columns-have-desc", line 8, in <module>
    sys.exit(main())
  File "/Users/mdana/.cache/pre-commit/repoyj6tusf_/py_env-python3.9/lib/python3.9/site-packages/dbt_checkpoint/check_source_columns_have_desc.py", line 54, in main
    hook_properties = check_column_desc(
  File "/Users/mdana/.cache/pre-commit/repoyj6tusf_/py_env-python3.9/lib/python3.9/site-packages/dbt_checkpoint/check_source_columns_have_desc.py", line 25, in check_column_desc
    for schema in schemas:
  File "/Users/mdana/.cache/pre-commit/repoyj6tusf_/py_env-python3.9/lib/python3.9/site-packages/dbt_checkpoint/utils.py", line 343, in get_source_schemas
    for source in schema.get("sources", []):
AttributeError: 'NoneType' object has no attribute 'get'

To Reproduce
Steps to reproduce the behavior:

  1. Create an empty .yml file in any directory in your project.
  2. Run the pre-commit hook check-source-columns-have-desc.

Example .pre-commit-config.yaml file:

repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v2.0.1
  hooks:
  - id: check-source-columns-have-desc
    args: ["--manifest", "dbt_subdir/target/manifest.json"]

Expected behavior
By default, the hook should only check .yaml files inside the dbt project's directory (i.e., dbt_subdir), and it should raise an error if any .yaml file inside that directory is empty.
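The "raise an error on an empty file" part could look roughly like the sketch below. This is only an illustration: `empty_yaml_files` and `require_non_empty` are hypothetical helper names, not part of dbt-checkpoint, and the check only catches files that are empty or whitespace-only (a file containing nothing but comments would also load as None and is not covered here).

```python
from pathlib import Path
from typing import Iterable, List


def empty_yaml_files(yml_files: Iterable[Path]) -> List[Path]:
    """Return the files whose content is empty or whitespace-only."""
    return [p for p in yml_files if not p.read_text(encoding="utf-8").strip()]


def require_non_empty(yml_files: Iterable[Path]) -> None:
    """Raise a readable error up front instead of an AttributeError later on."""
    empty = empty_yaml_files(yml_files)
    if empty:
        names = ", ".join(str(p) for p in empty)
        raise ValueError(f"Empty YAML file(s): {names}")
```

Calling `require_non_empty` on the already filtered file list would report the offending file names directly, rather than surfacing as `'NoneType' object has no attribute 'get'`.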

Suggestion:

def get_source_schemas(
    yml_files: Sequence[Path], include_disabled: bool = False
) -> Generator[SourceSchema, None, None]:
    # yml_files should be filtered here so that only files inside the dbt project directory remain
    for yml_file in yml_files:
        schema = safe_load(yml_file.open())
        for source in schema.get("sources", []):
            if not include_disabled and not source.get("config", {}).get(
                "enabled", True
            ):
                continue
            source_name = source.get("name")
            tables = source.pop("tables", [])
            for table in tables:
                table_name = table.get("name")
                yield SourceSchema(
                    source_name=source_name,
                    table_name=table_name,
                    filename=yml_file.stem,
                    source_schema=source,
                    table_schema=table,
                )
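A minimal sketch of what the missing filter and a guard against empty documents might look like. Everything here is an assumption for illustration: `within_project` and `sources_of` are hypothetical helpers, and `project_dir` is assumed to be known to the hook (for example derived from the `--manifest` path), which the current code does not do.

```python
from pathlib import Path
from typing import Any, Iterable, Iterator, List, Optional


def within_project(yml_files: Iterable[Path], project_dir: Path) -> Iterator[Path]:
    """Yield only the files located under project_dir (hypothetical helper)."""
    root = project_dir.resolve()
    for yml_file in yml_files:
        try:
            # relative_to raises ValueError when the file is not under root
            yml_file.resolve().relative_to(root)
        except ValueError:
            continue  # outside the dbt project, skip it
        yield yml_file


def sources_of(schema: Optional[Any]) -> List[dict]:
    """Return the `sources` list; an empty YAML document loads as None."""
    if not isinstance(schema, dict):
        return []
    return schema.get("sources") or []
```

With these, the loop in `get_source_schemas` could iterate over `within_project(yml_files, project_dir)` and replace `schema.get("sources", [])` with `sources_of(schema)`. Note that this variant silently treats an empty file as having no sources; raising an error instead, as described under expected behavior, is an equally valid choice.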

Version:
v2.0.1

Additional context
Debug output:
hook.id: 'check-source-columns-have-desc'
hook.files: ''
hook.exclude: '^$'
hook.types: ['file']
hook.types_or: ['yaml']
hook.exclude_types: []

filenames (11 entries):
00: '.github/workflows/pre_commit_checks.yml'
01: '.empty-test-pre-commit-config.yaml'
02: 'Taskfile.yml'
03: 'dbt_bq/.pre-commit-config.yaml'
04: 'dbt_bq/.user.yml'
05: 'dbt_bq/dbt_project.yml'
06: 'dbt_bq/dependencies.yml'
07: 'dbt_bq/macros/schema.yml'
08: 'dbt_bq/models/example/schema.yml'
09: 'dbt_bq/profiles.yml'
10: 'dbt_bq/seeds/example/properties.yml'

@tahseenadit added the bug (Something isn't working) label May 21, 2024
@tahseenadit changed the title from "pre-commit hook checks yaml files that are located outside of dbt project's directory" to "get_source_schemas checks yaml files that are located outside of dbt project's directory" May 21, 2024
@tahseenadit changed the title from "get_source_schemas checks yaml files that are located outside of dbt project's directory" to "get_source_schemas function checks yaml files that are located outside of dbt project's directory" May 21, 2024