Build SQL for materialized cubes#608

Merged
shangyian merged 4 commits into DataJunction:main from shangyian:build-cubes
Jul 6, 2023

Collaborator

@shangyian shangyian commented Jul 6, 2023

Summary

This PR adds functionality to build queries for materialized cube nodes. This was also in the presentation, but I've gone through this a second time to handle various edge cases.

The main changes include:

  • When a cube is created, we store additional info needed for building materialized cube queries in the default "materialization" config, namely a combiner expression that combines the measures back into the metric.
  • When building SQL for multiple metrics, we check whether those metrics and dimensions can be found in an existing materialized cube.
  • If a materialized cube encompasses the metrics and dimensions, we build SQL that queries the cube directly by combining the measures back into the selected metrics.
  • Refactored so that a single function, build_sql_for_multiple_metrics, generates SQL for both the /sql and /data endpoints.

Example

Let's say we have a cube named default.repairs_cube that contains the following metrics and dimensions:

  • default.discounted_orders_rate (metric)
  • default.num_repair_orders (metric)
  • default.avg_repair_price (metric)
  • default.hard_hat.country (dim)
  • default.hard_hat.city (dim)

After we post an availability state for this cube, whenever we query metrics or dimensions that are in the cube, we'll generate SQL that queries the materialized cube directly. For example, if we ask for default.discounted_orders_rate, default.num_repair_orders and default.avg_repair_price grouped by default.hard_hat.country, we'll get the following query:

SELECT
  sum(discount_sum) / count(placeholder_count) default_DOT_discounted_orders_rate,
  count(repair_order_id_count) default_DOT_num_repair_orders,
  sum(price_sum) / count(price_count) default_DOT_avg_repair_price,
  country
FROM repairs_cube
GROUP BY
  country
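
As a rough illustration of how a query like the one above could be assembled from stored combiner expressions, here is a minimal Python sketch. The `COMBINERS` mapping, `column_alias`, and `build_cube_query` are hypothetical names made up for this example, not DJ's actual implementation; the expressions are copied from the example query.

```python
# Hypothetical combiner expressions keyed by metric name (assumption: one
# expression per metric, copied from the example query in this description).
COMBINERS = {
    "default.discounted_orders_rate": "sum(discount_sum) / count(placeholder_count)",
    "default.num_repair_orders": "count(repair_order_id_count)",
    "default.avg_repair_price": "sum(price_sum) / count(price_count)",
}


def column_alias(node_name: str) -> str:
    """Turn a dotted node name into the alias style used in the example."""
    return node_name.replace(".", "_DOT_")


def build_cube_query(cube_table, metrics, dimension_columns):
    """Build a SELECT over the cube table from combiner expressions."""
    select_items = [f"  {COMBINERS[m]} {column_alias(m)}" for m in metrics]
    select_items += [f"  {d}" for d in dimension_columns]
    lines = ["SELECT", ",\n".join(select_items), f"FROM {cube_table}"]
    if dimension_columns:
        # Group by every requested dimension column.
        lines += ["GROUP BY", ",\n".join(f"  {d}" for d in dimension_columns)]
    return "\n".join(lines)


query = build_cube_query("repairs_cube", list(COMBINERS), ["country"])
```

Keeping the combiner as a plain expression string per metric means the query builder only has to substitute and alias, which is one plausible reason to store it in the materialization config at cube creation time.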

Test Plan

Deployment Plan


netlify bot commented Jul 6, 2023

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 2320251
🔍 Latest deploy log https://app.netlify.com/sites/thriving-cassata-78ae72/deploys/64a71744cf827000086a7d0f

@shangyian shangyian marked this pull request as ready for review July 6, 2023 16:41
Member

@agorajek agorajek left a comment

This is good. Two small comments in line.

existing_cubes = session.exec(statement).unique().all()
for cube in existing_cubes:
    return cube
return None
Member

Not necessary, None is the default return value.
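
To make the point concrete, here is a small standalone Python sketch (the helper names `first_implicit` and `first_or_none` are hypothetical, not part of this PR). A function that falls off its end returns None, so the trailing return can be dropped, or the loop can be collapsed with `next()`.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_implicit(items: Iterable[T]) -> Optional[T]:
    """Return the first item; falls through when there is none."""
    for item in items:
        return item
    # No explicit return here: Python returns None implicitly.


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Same behavior in one expression, using next() with a default."""
    return next(iter(items), None)
```

Whether to keep the explicit `return None` is partly a style question: PEP 8 recommends being consistent with return statements, so an explicit `return None` is also defensible when another branch returns a value.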

# The cube needs to have a materialization configured and an availability state
# posted in order for us to use the materialized datasource directly
cube = find_existing_cube(session, metric_columns, dimension_columns)
if cube and cube.materializations and cube.availability:
Member

nit: I wonder if it makes sense to add this condition to the find_existing_cube() method itself... perhaps not, but then I feel like we could add a wrapper on that one called find_materialized_cube() and have this condition in there for reuse.

Collaborator Author

Agreed, I think it would be a good idea to add an option to look for materialized cubes. I added a parameter to find_existing_cube called materialized so that we can filter for materialized cubes if desired.
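
A minimal in-memory sketch of what such a flag could look like is shown below. It is an illustration under assumptions only: `CubeStub` and `find_existing_cube_sketch` are hypothetical stand-ins, and the real find_existing_cube queries the database through a session rather than iterating over a list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class CubeStub:
    """Hypothetical stand-in for a cube node (not DJ's actual model)."""
    name: str
    columns: Set[str]
    materializations: List[dict] = field(default_factory=list)
    availability: Optional[dict] = None


def find_existing_cube_sketch(
    cubes: Iterable[CubeStub],
    metric_columns: Iterable[str],
    dimension_columns: Iterable[str],
    materialized: bool = False,
) -> Optional[CubeStub]:
    """Return the first cube covering all requested columns.

    With materialized=True, only cubes that have a materialization
    configured and an availability state posted are considered.
    """
    requested = set(metric_columns) | set(dimension_columns)
    for cube in cubes:
        if not requested <= cube.columns:
            continue  # cube does not contain every requested column
        if materialized and not (cube.materializations and cube.availability):
            continue  # caller asked for materialized cubes only
        return cube
    return None
```

With the filter folded into the lookup, the call site no longer needs to repeat the `cube.materializations and cube.availability` check.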

@shangyian shangyian merged commit 9ae4d2d into DataJunction:main Jul 6, 2023
@shangyian shangyian deleted the build-cubes branch July 6, 2023 20:03
youngman-droid pushed a commit to youngman-droid/dj that referenced this pull request Aug 26, 2023