Insights

The following insights are available in DataPilot:

1. Modelling Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - source_staging_model_integrity
     - Ensures each source has a dedicated staging model and is not directly joined to downstream models.
     - Manifest
     - None
   * - downstream_source_dependence
     - Evaluates whether downstream models (marts or intermediates) depend directly on a source. All downstream models should depend on staging models, not on source nodes.
     - Manifest
     - None
   * - Duplicate_Sources
     - Identifies cases where multiple source nodes in a dbt project refer to the same database object. Ensures that each database object is represented by a single, unique source node.
     - Manifest
     - None
   * - hard_coded_references
     - Identifies instances where SQL code within models contains hard-coded references, which can obscure data lineage and complicate project maintenance.
     - Manifest
     - None
   * - rejoining_upstream_concepts
     - Detects cases where a model is a direct child of both a parent and another of that parent’s direct children, indicating a potential loop-like rejoin or unnecessary complexity in the DAG.
     - Manifest
     - None
   * - model_fanout
     - Assesses parent models to identify high fanout, which may indicate opportunities for more efficient transformations in the BI layer or for moving common business logic further upstream in the data pipeline.
     - Manifest
     - max_fanout
   * - multiple_sources_joined
     - Checks if a model directly joins multiple source tables, encouraging the use of a single staging model per source for downstream models to enhance data consistency and maintainability.
     - Manifest
     - None
   * - root_model
     - Identifies models that have no direct parents, whether sources or other models within the dbt project. Every model should be traceable back to a source or connected to other models in the project, which is crucial for clear data lineage and project integrity.
     - Manifest
     - None
   * - source_fanout
     - Evaluates sources for high fanout, identifying when a single source has a large number of direct child models. High fanout may indicate an overly complex or source-reliant data model, potentially introducing risks and complicating maintenance and scalability.
     - Manifest
     - max_fanout
   * - staging_models_dependency
     - Checks whether staging models depend on downstream models rather than on source or raw data models. Staging models should ideally depend on upstream data sources to maintain a clear and logical data flow.
     - Manifest
     - None
   * - staging_models_on_staging
     - Checks if staging models depend on other staging models instead of on source or raw data models, so that the data flow from sources to staging stays clear and logical.
     - Manifest
     - None
   * - unused_sources
     - Identifies sources that are defined in the project’s YML files but not used in any models or sources. They may have become redundant due to model deprecation, contributing to unnecessary complexity and clutter in the dbt project.
     - Manifest
     - None
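
The fanout and rejoin checks above are both questions about direct children in the DAG. The following is a minimal sketch of that idea, not DataPilot's implementation. It assumes a toy ``child_map`` dictionary (node id to direct children), loosely modelled on the ``child_map`` section of a dbt manifest. The node ids, the helper names ``high_fanout`` and ``find_rejoins``, and the strict ``>`` comparison against ``max_fanout`` are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: node id -> list of direct children.
child_map: Dict[str, List[str]] = {
    "source.shop.raw.orders": ["model.shop.stg_orders"],
    "model.shop.stg_orders": [
        "model.shop.int_orders",
        "model.shop.fct_orders",
        "model.shop.fct_returns",
    ],
    "model.shop.int_orders": ["model.shop.fct_orders"],
    "model.shop.fct_orders": [],
    "model.shop.fct_returns": [],
}


def high_fanout(child_map: Dict[str, List[str]], prefix: str, max_fanout: int) -> List[str]:
    """Return nodes of one resource type whose direct-child count exceeds max_fanout."""
    return sorted(
        node
        for node, children in child_map.items()
        if node.startswith(prefix) and len(children) > max_fanout
    )


def find_rejoins(child_map: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (parent, middle, child) triples where child is a direct child
    of both parent and middle, and middle is itself a direct child of parent."""
    rejoins = []
    for parent, children in child_map.items():
        direct = set(children)
        for middle in children:
            for grandchild in child_map.get(middle, []):
                if grandchild in direct:
                    rejoins.append((parent, middle, grandchild))
    return sorted(rejoins)


# Assumed threshold for illustration only.
fanout_hits = high_fanout(child_map, "model.", max_fanout=2)
rejoins = find_rejoins(child_map)
print(fanout_hits)
print(rejoins)
```

In this toy graph ``stg_orders`` has three direct children, so it exceeds the assumed threshold of 2, and ``fct_orders`` is reported as a rejoin because it depends on both ``stg_orders`` and ``int_orders``.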

2. Performance Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - chain_view_linking
     - Analyzes the dbt project to identify long chains of non-materialized models (views and ephemerals). Such long chains can increase the runtime of models built on top of them due to extended computation and memory usage.
     - Manifest
     - None
   * - exposure_parent_bad_materialization
     - Evaluates the materialization types of parent models of exposures to ensure they rely on transformed dbt models or metrics rather than raw sources, and checks if these parent models are materialized efficiently for performance.
     - Manifest
     - None
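
To illustrate what a "long chain of non-materialized models" means, here is a hedged sketch that computes, for each node, the length of the longest run of consecutive view or ephemeral models ending at that node. The ``parent_map`` and ``materialized`` dictionaries are toy inputs, and the cut-off ``MAX_CHAIN`` is a hypothetical value chosen for the example; the table above lists no override for this insight, so the real threshold is not stated here.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical toy project: node -> direct parents, and node -> materialization.
parent_map: Dict[str, List[str]] = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],
    "e": ["d"],
}
materialized: Dict[str, str] = {
    "a": "view",
    "b": "view",
    "c": "ephemeral",
    "d": "table",
    "e": "view",
}

NON_MATERIALIZED = {"view", "ephemeral"}
MAX_CHAIN = 3  # assumed cut-off, for illustration only


def chain_lengths(parent_map: Dict[str, List[str]], materialized: Dict[str, str]) -> Dict[str, int]:
    """Longest run of consecutive non-materialized models ending at each node."""

    @lru_cache(maxsize=None)
    def length(node: str) -> int:
        # A table, incremental model, or unknown node (such as a source) breaks the chain.
        if materialized.get(node) not in NON_MATERIALIZED:
            return 0
        parents = parent_map.get(node, [])
        return 1 + max((length(p) for p in parents), default=0)

    return {node: length(node) for node in parent_map}


lengths = chain_lengths(parent_map, materialized)
long_chains = sorted(node for node, n in lengths.items() if n >= MAX_CHAIN)
print(lengths)
print(long_chains)
```

The table ``d`` resets the count, so ``e`` has a chain length of 1 even though it sits at the end of the lineage.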

3. Governance Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - documentation_on_stale_columns
     - Checks for columns that are documented in the dbt project but have been removed from their respective models.
     - Manifest, Catalog
     - None
   * - exposures_dependent_on_private_models
     - Detects if exposures in the dbt project depend on private models. Recommends using public, well-documented, and contracted models as trusted data sources for downstream consumption.
     - Manifest
     - None
   * - public_models_without_contracts
     - Identifies public models in the dbt project that are accessible to all downstream consumers but lack contracts specifying data types and columns.
     - Manifest
     - None
   * - missing_documentation
     - Detects columns and models that don’t have documentation.
     - Manifest, Catalog
     - None
   * - undocumented_public_models
     - Identifies models in the dbt project that are marked as public but don’t have documentation.
     - Manifest
     - None
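
The two public-model checks can be pictured as simple filters over model metadata. The sketch below is an assumption-laden illustration rather than DataPilot's code: the ``nodes`` dictionary is a hand-written toy, and the field names ``access``, ``description``, and ``contract.enforced`` are assumed to resemble what a dbt manifest records for a model.

```python
from typing import Any, Dict, List

# Hypothetical model metadata; field names are assumptions for this sketch.
nodes: Dict[str, Dict[str, Any]] = {
    "model.shop.dim_customers": {
        "access": "public",
        "description": "One row per customer.",
        "contract": {"enforced": True},
    },
    "model.shop.fct_orders": {
        "access": "public",
        "description": "",
        "contract": {"enforced": False},
    },
    "model.shop.int_orders": {
        "access": "protected",
        "description": "",
        "contract": {"enforced": False},
    },
}


def is_public(node: Dict[str, Any]) -> bool:
    return node.get("access") == "public"


def public_models_without_contracts(nodes: Dict[str, Dict[str, Any]]) -> List[str]:
    """Public models whose contract is missing or not enforced."""
    return sorted(
        name
        for name, node in nodes.items()
        if is_public(node) and not node.get("contract", {}).get("enforced", False)
    )


def undocumented_public_models(nodes: Dict[str, Dict[str, Any]]) -> List[str]:
    """Public models with an empty or missing description."""
    return sorted(
        name
        for name, node in nodes.items()
        if is_public(node) and not (node.get("description") or "").strip()
    )


no_contract = public_models_without_contracts(nodes)
no_docs = undocumented_public_models(nodes)
print(no_contract)
print(no_docs)
```

``int_orders`` is not flagged by either filter because it is not public, even though it has neither a description nor an enforced contract.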

4. Testing Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - missing_primary_key_tests
     - Identifies dbt models in the project that lack primary key tests, which are crucial for ensuring data integrity and correctness.
     - Manifest
     - None
   * - dbt_low_test_coverage
     - Identifies dbt models in the project that have a test coverage percentage below the required threshold.
     - Manifest
     - min_test_coverage_percent
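
How coverage is measured is not spelled out above. As one plausible reading, the sketch below treats a model's coverage as the share of its columns that carry at least one test and flags models that fall below ``min_test_coverage_percent``. The data structures, the helper names, and this definition of coverage are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical inputs: columns per model, and the columns that have a test.
model_columns: Dict[str, List[str]] = {
    "orders": ["id", "status", "amount"],
    "customers": ["id", "name"],
}
tested_columns: Dict[str, Set[str]] = {
    "orders": {"id", "status"},
    "customers": {"id", "name"},
}


def coverage_percent(columns: List[str], tested: Set[str]) -> float:
    """Share of columns with at least one test; a model without columns counts as 0."""
    if not columns:
        return 0.0
    covered = sum(1 for column in columns if column in tested)
    return 100.0 * covered / len(columns)


def low_coverage_models(
    model_columns: Dict[str, List[str]],
    tested_columns: Dict[str, Set[str]],
    min_test_coverage_percent: float,
) -> List[str]:
    """Models whose coverage is strictly below the threshold."""
    return sorted(
        model
        for model, columns in model_columns.items()
        if coverage_percent(columns, tested_columns.get(model, set())) < min_test_coverage_percent
    )


flagged = low_coverage_models(model_columns, tested_columns, min_test_coverage_percent=80)
print(flagged)
```

With an assumed threshold of 80, ``orders`` is flagged because only two of its three columns are tested.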

5. Project Structure Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - model_directory_structure
     - Checks for correct placement of models in their designated directories. Proper directory structure is essential for organization, discoverability, and maintenance within the dbt project.
     - Manifest
     - None
   * - model_naming_convention_check
     - Ensures all models adhere to a predefined naming convention. A consistent naming convention makes each model's purpose clear and eases navigation within the dbt project.
     - Manifest
     - None
   * - source_directory_structure
     - Verifies if sources are correctly placed in their designated directories. Proper directory placement for sources is important for organization and easy searchability.
     - Manifest
     - None
   * - test_directory_structure
     - Checks if tests are correctly placed in the same directories as their corresponding models. Co-locating tests with models aids in maintainability and clarity.
     - Manifest
     - None
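
A naming-convention check can be sketched as a regular expression per model layer. The convention DataPilot actually enforces is not given above, so the patterns below (``stg_``, ``int_``, ``fct_``/``dim_`` prefixes keyed by folder name) are an assumed example, and ``naming_violations`` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Assumed convention for illustration: folder name -> pattern for model names.
PATTERNS: Dict[str, str] = {
    "staging": r"stg_[a-z0-9_]+",
    "intermediate": r"int_[a-z0-9_]+",
    "marts": r"(fct|dim)_[a-z0-9_]+",
}


def naming_violations(models: List[Tuple[str, str]]) -> List[str]:
    """Return names of models whose name does not match the pattern of the
    first recognised layer folder found in their file path."""
    violations = []
    for name, path in models:
        layer = next((part for part in PurePosixPath(path).parts if part in PATTERNS), None)
        if layer is None:
            continue  # no known layer folder in the path, nothing to check
        if re.fullmatch(PATTERNS[layer], name) is None:
            violations.append(name)
    return sorted(violations)


models = [
    ("stg_orders", "models/staging/stg_orders.sql"),
    ("orders_cleaned", "models/staging/orders_cleaned.sql"),
    ("fct_orders", "models/marts/fct_orders.sql"),
]
violations = naming_violations(models)
print(violations)
```

Keying the pattern on the folder means the same lookup can also hint at directory-structure problems: a model named ``stg_orders`` placed under ``marts`` would fail the ``marts`` pattern.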

6. Check Insights

.. list-table::
   :header-rows: 1

   * - Name
     - Description
     - Files Required
     - Overrides
   * - column_descriptions_are_same
     - Checks if the column descriptions in the dbt project are consistent across the project.
     - Manifest
     - None
   * - column_name_contract
     - Checks if the column names in the dbt project abide by the column name contract, which consists of a regex pattern and a series of data types.
     - Manifest, Catalog
     - None
   * - check_macro_args_have_desc
     - Checks if the macro arguments in the dbt project have descriptions.
     - Manifest
     - None
   * - check_macro_has_desc
     - Checks if the macros in the dbt project have descriptions.
     - Manifest
     - None
   * - check_model_has_all_columns
     - Checks if the models in the dbt project have all the columns that are present in the data catalog.
     - Manifest, Catalog
     - None
   * - check_model_has_valid_meta_keys
     - Checks if the models in the dbt project have meta keys.
     - Manifest
     - None
   * - check_model_has_properties_file
     - Checks if the models in the dbt project have a properties file.
     - Manifest
     - None
   * - check_model_has_tests_by_name
     - Checks if the models in the dbt project have tests by name.
     - Manifest
     - None
   * - check_model_has_tests_by_type
     - Checks if the models in the dbt project have tests by type.
     - Manifest
     - None
   * - check_model_has_tests_by_group
     - Checks if the models in the dbt project have tests by group.
     - Manifest
     - None
   * - check_model_materialization_by_childs
     - Checks the materialization of models in the dbt project against a given threshold of child models.
     - Manifest
     - None
   * - model_name_by_folder
     - Checks if the models in the dbt project abide by the model name contract, which consists of a regex pattern.
     - Manifest
     - None
   * - check_model_parents_and_childs
     - Checks if the model's number of parents and children is within the given min/max bounds.
     - Manifest
     - None
   * - check_model_parents_database
     - Checks if the models in the dbt project have a parent database that is in the whitelist and not in the blacklist.
     - Manifest
     - None
   * - check_model_parents_schema
     - Checks if the models in the dbt project have a parent schema that is in the whitelist and not in the blacklist.
     - Manifest
     - None
   * - check_model_tags
     - Checks if the models in the dbt project have tags from the provided list of tags.
     - Manifest
     - None
   * - check_source_childs
     - Checks if the source's number of children is within the given min/max bounds.
     - Manifest
     - None
   * - check_source_columns_have_desc
     - Checks if the source columns have descriptions in the dbt project.
     - Manifest, Catalog
     - None
   * - check_source_has_all_columns
     - Checks if the source has all columns present in the data catalog.
     - Manifest, Catalog
     - None
   * - check_source_has_freshness
     - Checks if the source has freshness options.
     - Manifest
     - None
   * - check_source_has_loader
     - Checks if the source has a loader.
     - Manifest
     - None
   * - check_source_has_meta_keys
     - Checks if the source has meta keys.
     - Manifest
     - None
   * - check_source_has_tests_by_name
     - Checks if the source has tests by name.
     - Manifest
     - None
   * - check_source_has_tests_by_type
     - Checks if the source has tests by type.
     - Manifest
     - None
   * - check_source_has_tests_by_group
     - Checks if the source has tests by group.
     - Manifest
     - None
   * - check_source_has_tests
     - Checks if the source has tests.
     - Manifest
     - None
   * - check_source_table_has_desc
     - Checks if the source table has a description.
     - Manifest
     - None
   * - check_source_tags
     - Checks if the source has tags.
     - Manifest
     - None
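
The column name contract above pairs a regex pattern with a series of data types. One way to read that, sketched here as an assumption rather than as DataPilot's documented behaviour, is that a column should match the pattern if and only if its data type is in the list. The helper ``contract_violations`` and the ``is_`` / boolean example are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def contract_violations(columns: Dict[str, str], pattern: str, dtypes: Iterable[str]) -> List[str]:
    """Return columns that break the assumed two-way contract:
    the name matches the pattern exactly when the data type is allowed."""
    regex = re.compile(pattern)
    allowed = {dtype.upper() for dtype in dtypes}
    violations = []
    for name, dtype in columns.items():
        name_matches = regex.fullmatch(name) is not None
        type_matches = dtype.upper() in allowed
        if name_matches != type_matches:
            violations.append(name)
    return sorted(violations)


# Hypothetical catalog columns: column name -> data type.
columns = {
    "is_active": "BOOLEAN",   # name and type agree
    "is_deleted": "VARCHAR",  # name promises a boolean, type disagrees
    "signup_flag": "BOOLEAN", # boolean type without the agreed prefix
    "email": "VARCHAR",       # outside the contract entirely
}
violations = contract_violations(columns, pattern=r"is_.*", dtypes=["boolean"])
print(violations)
```

A one-directional variant, which only checks the type of columns whose names match, would keep ``is_deleted`` in the result and drop ``signup_flag``.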