Skip to content

Question regarding the usability of SQL API and BI Tools integrations: ungrouped queries support #6750

@igorcalabria

Description

@igorcalabria

Hi everyone,

I'm doing some tests integrating cube with metabase(I think this applies to any tool, really) but found really odd that every query is aggregated, even if a group by clause is not used. This essentially breaks every exploration feature of the said tool. For example, I'm using trino's tpch catalog and created a simple cube around orders table like this

cubes:
  - name: orders
    sql_table: sf1.orders
    data_source: default

    dimensions:
      - name: orderkey
        sql: orderkey
        type: number
        primary_key: true

      - name: orderstatus
        sql: orderstatus
        type: string

      - name: orderpriority
        sql: orderpriority
        type: string

      - name: clerk
        sql: clerk
        type: string

      - name: comment
        sql: comment
        type: string

      - name: orderdate
        sql: >
          CAST(orderdate AS TIMESTAMP)
        type: time

    measures:
      - name: count
        type: count

      - name: totalprice
        sql: totalprice
        type: sum

If I go to metabase's data browser, it issues a SELECT * FROM orders LIMIT 2000 query, that gets executed as

  SELECT                                                                                                                                                                                                                                
      "orders".orderkey "orders__orderkey", "orders".orderstatus "orders__orderstatus", "orders".orderpriority "orders__orderpriority", "orders".clerk "orders__clerk", "orders".comment "orders__comment", CAST(orderdate AS TIMESTAMP)
 "orders__orderdate", count("orders".orderkey) "orders__count", sum("orders".totalprice) "orders__totalprice"                                                                                                                           
    FROM                                                                                                                                                                                                                                
      sf1.orders AS "orders"  GROUP BY 1, 2, 3, 4, 5, 6 ORDER BY 7 DESC LIMIT 2000                                                                                                                                                      

For smaller datasets this is not a problem, but for big ones it's really wasteful. I understand that the group by is needed so that the measures are shown, but this left me wondering if there's an alternative that makes using the SQL API more "natural" for non-aggregating queries.

For workaround I though about hiding the high cardinality columns and ensuring that the default query hits a pre-aggregation but that seems strange too.

I apologize if this question seems dumb, and I would love to understand the motivation behind automatically aggregating all queries(even without GROUP BY). Some advice how to deal with this situation currently would be great, thanks y'all.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementLEGACY. Use the Feature issue type instead

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions