Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SL-1458] [Feature] Unable to traverse linkable dimensions beyond a certain level #945

Closed
2 tasks done
gowthamgutha opened this issue Dec 19, 2023 · 2 comments
Closed
2 tasks done
Labels
linear triage Tasks that need to be triaged

Comments

@gowthamgutha
Copy link

gowthamgutha commented Dec 19, 2023

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have the following semantic model partially representing the AdventureWorks schema where the seed data is referenced from https://github.com/Data-Engineer-Camp/dbt-dimensional-modelling/tree/main/adventureworks
All the dbt models prefixed with m_ are views which are simple select * statements of their corresponding tables.

Adapter used: Snowflake 1.7.1
DBT version: 1.7.4

For the metric, sumlinetotal created from the measure salesorderdetail, I am able to group by till sales_order_id__bill_to_address_id__state_province_id (upto 3 levels), but when I try to group by sales_order_id__bill_to_address_id__state_province_id__name or sales_order_id__bill_to_address_id__state_province_id__country_region_id it fails probably because of adding __name and __country_region_id and the error is:

ERROR: Unable To Satisfy Query Error: Dimensions ['sales_order_id__bill_to_address_id__state_province_id__country_region_id'] cannot be resolved for metrics ['sumlinetotal']. The invalid dimension may not exist, require an ambiguous join (e.g. a join path that can be satisfied in multiple ways), or require a fanout join.

To validate the same, I have created another metric called sumsubtotal from the salesorderheader measure and I am able to traverse upto 3 levels, here again.
bill_to_address_id__state_province_id__country_region_id works but not bill_to_address_id__state_province_id__country_region_id__name

version: 2
semantic_models:
  - name: salesorderdetail
    defaults:
      agg_time_dimension: modified_date
    description: |
      Items contatined in each order. The grain of the table is one row per order item.
    model: ref('m_salesorderdetail')
    entities:
      - name: sales_order_detail_id
        type: primary
        expr: SalesOrderDetailID
      - name: sales_order_id
        type: foreign
        expr: SalesOrderID
    dimensions:
      - name: modified_date
        expr: modified_date
        type: time
        type_params:
          time_granularity: month
    measures:
      - name: sumlinetotal
        description: The revenue generated for each line order item.
        agg: sum
        expr: UNITPRICE

  - name: salesorderheader
    defaults:
      agg_time_dimension: order_date
    description: |
      sales order header
    model: ref('m_salesorderheader')
    entities:
      - name: sales_order_id
        type: primary
        expr: SalesOrderID
      - name: bill_to_address_id
        type: foreign
        expr: BillToAddressID
      - name: ship_to_address_id
        type: foreign
        expr: ShipToAddressID
    dimensions:
      - name: order_date
        expr: OrderDate
        type: time
        type_params:
          time_granularity: month
    measures:
      - name: sumsubtotal
        description: The revenue generated for each sub total.
        agg: sum
        expr: SubTotal

  - name: billtoaddress
    defaults:
      agg_time_dimension: modified_date
    description: |
      sales order header
    model: ref('m_address')
    entities:
      - name: bill_to_address_id
        type: primary
        expr: AddressID
      - name: state_province_id
        type: foreign
        expr: StateProvinceID
    dimensions:
      - name: city
        expr: City
        type: categorical
      - name: modified_date
        expr: ModifiedDate
        type: time
        type_params:
          time_granularity: month

  - name: shiptoaddress
    defaults:
      agg_time_dimension: modified_date
    description: |
      ship to address
    model: ref('m_address')
    entities:
      - name: ship_to_address_id
        type: primary
        expr: AddressID
      - name: state_province_id
        type: foreign
        expr: StateProvinceID
    dimensions:
      - name: modified_date
        expr: ModifiedDate
        type: time
        type_params:
          time_granularity: month

  - name: stateprovince
    defaults:
      agg_time_dimension: modified_date
    description: |
      state province
    model: ref('m_stateprovince')
    entities:
      - name: state_province_id
        type: primary
        expr: StateProvinceID
      - name: country_region_id
        type: foreign
        expr: CountryRegionCode
    dimensions:
      - name: name
        expr: Name
        type: categorical
      - name: modified_date
        expr: ModifiedDate
        type: time
        type_params:
          time_granularity: month

  - name: countryregion
    defaults:
      agg_time_dimension: modified_date
    description: |
      country region
    model: ref('m_countryregion')
    entities:
      - name: country_region_id
        type: primary
        expr: CountryRegionCode
    dimensions:
      - name: country_name
        expr: Name
        type: categorical
      - name: modified_date
        expr: ModifiedDate
        type: time
        type_params:
          time_granularity: month

metrics:
  # Simple metrics
  - name: sumlinetotal
    description: Sum of the product line total.
    type: simple
    label: SumLineTotal
    type_params:
      measure: sumlinetotal
  - name: sumsubtotal
    description: Sum of the sub total.
    type: simple
    label: SumSubTotal
    type_params:
      measure: sumsubtotal

Expected Behavior

Should be able to traverse the link-able dimensions for more than 3 levels.

Steps To Reproduce

  1. Seed the data from the below DBT project (https://github.com/Data-Engineer-Camp/dbt-dimensional-modelling/tree/main/adventureworks)
  2. Create the dbt models m_address, m_countryregion, m_salesorderdetail, m_salesorderheader, m_stateprovince each containing simple select * from <table_name_without_m_>
  3. Create semantic models as described in the current behavior (snippet).
  4. Use snowflake as adapter and run the dbt model to create the views.
  5. Use metricflow to list and query metrics with dimensions defined in the current behavior snippet.

Relevant log output

mf query --metrics sumsubtotal --group-by bill_to_address_id__state_province_id__country_region_id__name
\ Initiating query…
ERROR: Unable To Satisfy Query Error: Dimensions ['bill_to_address_id__state_province_id__country_region_id__name'] cannot be resolved for metrics ['sumsubtotal']. The invalid dimension may not exist, require an ambiguous join (e.g. a join path that can be satisfied in multiple ways), or require a fanout join.

Suggestions for invalid dimension 'bill_to_address_id__state_province_id__country_region_id__name':
    ['bill_to_address_id__state_province_id__country_region_id',
     'ship_to_address_id__state_province_id__country_region_id',
     'bill_to_address_id__state_province_id__name',
     'bill_to_address_id__state_province_id__modified_date__year',
     'bill_to_address_id__state_province_id__modified_date__quarter',
     'ship_to_address_id__state_province_id__name']

Traceback

Traceback (most recent call last):
  File "d:\dbt\aw\env\lib\site-packages\metricflow\cli\utils.py", line 113, in wrapper
    func(*args, **kwargs)
  File "d:\dbt\aw\env\lib\site-packages\metricflow\telemetry\reporter.py", line 150, in wrapped
    return func(*args, **kwargs)
  File "d:\dbt\aw\env\lib\site-packages\metricflow\cli\main.py", line 296, in query
    query_result = cfg.mf.query(mf_request=mf_request)
  File "d:\dbt\aw\env\lib\site-packages\metricflow\telemetry\reporter.py", line 150, in wrapped
    return func(*args, **kwargs)
  File "d:\dbt\aw\env\lib\site-packages\metricflow\engine\metricflow_engine.py", line 384, in query
    explain_result = self._create_execution_plan(mf_request)
  File "d:\dbt\aw\env\lib\site-packages\metricflow\engine\metricflow_engine.py", line 435, in _create_execution_plan
    query_spec = self._query_parser.parse_and_validate_query(
  File "d:\dbt\aw\env\lib\site-packages\metricflow\query\query_parser.py", line 257, in parse_and_validate_query
    return self._parse_and_validate_query(
  File "d:\dbt\aw\env\lib\site-packages\metricflow\query\query_parser.py", line 525, in _parse_and_validate_query
    self._validate_linkable_specs_for_metrics(
  File "d:\dbt\aw\env\lib\site-packages\metricflow\query\query_parser.py", line 328, in _validate_linkable_specs_for_metrics
    raise UnableToSatisfyQueryError(
metricflow.errors.errors.UnableToSatisfyQueryError: Unable To Satisfy Query Error: Dimensions ['bill_to_address_id__state_province_id__country_region_id__name'] cannot be resolved for metrics ['sumsubtotal']. The invalid dimension may not exist, require an ambiguous join (e.g. a join path that can be satisfied in multiple ways), or require a fanout join.

Environment

- OS: Windows 10 Enterprise
- Python: 3.9.6
- dbt: 1.7.4
- metricflow: 0.203.1

Which database are you using?

snowflake

Additional Context

These are the constraints

alter table countryregion add constraint COUNTRYREGIONCODE_PRIM primary key(COUNTRYREGIONCODE) enforced;

alter table stateprovince add foreign key (COUNTRYREGIONCODE) REFERENCES COUNTRYREGION(COUNTRYREGIONCODE) enforced;  
alter table stateprovince add constraint STATEPROVINCEID_prim primary key(STATEPROVINCEID) enforced;


alter table address add constraint ADDRESSID_prim primary key(ADDRESSID) enforced;
alter table address add foreign key (STATEPROVINCEID) REFERENCES STATEPROVINCE(STATEPROVINCEID) enforced;  

alter table SALESORDERDETAIL add constraint SALESORDERDETAILID_prim primary key(SALESORDERDETAILID) enforced;
alter table SALESORDERHEADER add constraint SALESORDERID_prim primary key(SALESORDERID) enforced;

alter table SALESORDERDETAIL add foreign key (SALESORDERID) REFERENCES SALESORDERHEADER(SALESORDERID) enforced;  
alter table SALESORDERHEADER add foreign key (BILLTOADDRESSID) REFERENCES person.ADDRESS(ADDRESSID) enforced;  
alter table SALESORDERHEADER add foreign key (SHIPTOADDRESSID) REFERENCES person.ADDRESS(ADDRESSID) enforced;  

SL-1458

@gowthamgutha gowthamgutha added bug Something isn't working triage Tasks that need to be triaged labels Dec 19, 2023
@Jstein77
Copy link
Contributor

Hey @gowthamgutha, thanks for opening an Issue. I'm gonna change this to a feature request since we're aware of this limitation. We currently only allow 2 hops, mainly because adding a third hop greatly increases the complexity of the join resolutions paths, and opens up a lot of ambiguous join paths.

It's something we're aware of, and we've already started doing some work to better handle ambiguous join resolution. Take a look at this PR for an example: #914.

The recommended path right now is to do some denormalization in dbt if you need metricflow to be able to join across more than 2 hops.

@Jstein77 Jstein77 added linear and removed bug Something isn't working labels Dec 19, 2023
@Jstein77 Jstein77 changed the title [Bug] Unable to traverse linkable dimensions beyond a certain level [SL-1458] [Bug] Unable to traverse linkable dimensions beyond a certain level Dec 19, 2023
@Jstein77 Jstein77 changed the title [SL-1458] [Bug] Unable to traverse linkable dimensions beyond a certain level [Feature] Unable to traverse linkable dimensions beyond a certain level Dec 19, 2023
@Jstein77 Jstein77 changed the title [Feature] Unable to traverse linkable dimensions beyond a certain level [SL-1458] [Feature] Unable to traverse linkable dimensions beyond a certain level Dec 19, 2023
@gowthamgutha
Copy link
Author

Thanks for the update @Jstein77 . I've used de-normalized views to solve my use-case.

m_address.sql

{{ config(schema='Person') }}
select sp.Country, Person.Address.* from Person.Address left outer join {{ref('m_stateprovince')}} as sp on Person.Address.StateProvinceID = sp.StateProvinceID

m_stateprovince.sql

{{ config(schema='Person') }}
select Person.CountryRegion.Name as Country, Person.StateProvince.* from Person.StateProvince left outer join Person.CountryRegion on Person.StateProvince.COUNTRYREGIONCODE = Person.CountryRegion.COUNTRYREGIONCODE

The above two models have been changed to facilitate the use-case sumlinetotal by billing country.

The actual linkable dimension path ought to be sales_order_id__bill_to_address_id__state_province_id__country_region_id__country_name.

Now, with the below semantic model changes for billtoaddress, we can make sales_order_id__bill_to_address_id__country since we added the country in the m_address as shown above, so it can become a dimension in the billtoaddress semantic model.

  - name: billtoaddress
    defaults:
      agg_time_dimension: modified_date
    description: |
      sales order header
    model: ref('m_address')
    entities:
      - name: bill_to_address_id
        type: primary
        expr: AddressID
      - name: state_province_id
        type: foreign
        expr: StateProvinceID
    dimensions:
      - name: country
        expr: Country
        type: categorical
      - name: city
        expr: City
        type: categorical
      - name: modified_date
        expr: ModifiedDate
        type: time
        type_params:
          time_granularity: month

@Jstein77 Jstein77 closed this as not planned Won't fix, can't repro, duplicate, stale Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear triage Tasks that need to be triaged
Projects
None yet
Development

No branches or pull requests

2 participants