Hi everyone,
I'm doing some tests integrating cube with metabase(I think this applies to any tool, really) but found really odd that every query is aggregated, even if a group by clause is not used. This essentially breaks every exploration feature of the said tool. For example, I'm using trino's tpch catalog and created a simple cube around orders table like this
cubes:
- name: orders
sql_table: sf1.orders
data_source: default
dimensions:
- name: orderkey
sql: orderkey
type: number
primary_key: true
- name: orderstatus
sql: orderstatus
type: string
- name: orderpriority
sql: orderpriority
type: string
- name: clerk
sql: clerk
type: string
- name: comment
sql: comment
type: string
- name: orderdate
sql: >
CAST(orderdate AS TIMESTAMP)
type: time
measures:
- name: count
type: count
- name: totalprice
sql: totalprice
type: sum
If I go to metabase's data browser, it issues a SELECT * FROM orders LIMIT 2000 query, that gets executed as
SELECT
"orders".orderkey "orders__orderkey", "orders".orderstatus "orders__orderstatus", "orders".orderpriority "orders__orderpriority", "orders".clerk "orders__clerk", "orders".comment "orders__comment", CAST(orderdate AS TIMESTAMP)
"orders__orderdate", count("orders".orderkey) "orders__count", sum("orders".totalprice) "orders__totalprice"
FROM
sf1.orders AS "orders" GROUP BY 1, 2, 3, 4, 5, 6 ORDER BY 7 DESC LIMIT 2000
For smaller datasets this is not a problem, but for big ones it's really wasteful. I understand that the group by is needed so that the measures are shown, but this left me wondering if there's an alternative that makes using the SQL API more "natural" for non-aggregating queries.
For workaround I though about hiding the high cardinality columns and ensuring that the default query hits a pre-aggregation but that seems strange too.
I apologize if this question seems dumb, and I would love to understand the motivation behind automatically aggregating all queries(even without GROUP BY). Some advice how to deal with this situation currently would be great, thanks y'all.
Hi everyone,
I'm doing some tests integrating cube with metabase(I think this applies to any tool, really) but found really odd that every query is aggregated, even if a group by clause is not used. This essentially breaks every exploration feature of the said tool. For example, I'm using trino's tpch catalog and created a simple cube around
orderstable like thisIf I go to metabase's data browser, it issues a
SELECT * FROM orders LIMIT 2000query, that gets executed asFor smaller datasets this is not a problem, but for big ones it's really wasteful. I understand that the group by is needed so that the measures are shown, but this left me wondering if there's an alternative that makes using the SQL API more "natural" for non-aggregating queries.
For workaround I though about hiding the high cardinality columns and ensuring that the default query hits a pre-aggregation but that seems strange too.
I apologize if this question seems dumb, and I would love to understand the motivation behind automatically aggregating all queries(even without GROUP BY). Some advice how to deal with this situation currently would be great, thanks y'all.