Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CubeStore: rollupJoin aggregation error - requires index on foreign field? #2987

Closed
mfulop opened this issue Jun 21, 2021 · 3 comments
Closed
Assignees
Labels
backend:pre-aggregations Issues related to pre-aggregations bug Something isn't working

Comments

@mfulop
Copy link

mfulop commented Jun 21, 2021

We are using CubeStore with GCS bucket, with BigQuery data. Given the below two cube definitions (cubes joined together), rollup pre-aggregations byUpdatedAtDate work correctly. However when trying to execute a query that would use the rollupJoin pre-aggregation StatsAndAccountsByUpdatedAtDate, the following error is returned:

_Error: Internal: Error during planning: Can't find index to join table dev_pre_aggregations.user_statistics_by_updated_at_date_up3sg3sy_wkm2vuiu_1gd1jl9 on accounts___id. Consider creating index: CREATE INDEX user_statistics_by_updated_at_date_up3sg3sy_wkm2vuiu_1gd1jl9_accounts___id ON dev_pre_aggregations.user_statistics_by_updated_at_dateup3sg3sy_wkm2vuiu_1gd1jl9 (accounts___id)

It is unclear what index is missing. Seems like we need an accounts._id index on the user_statistics preaggregation? (how?)
Or we misunderstood something.

To Reproduce
See cube definitions below:

cube(`Accounts`, {
  title: ' Accounts',
  sql:`SELECT * FROM \`**********_accounts\``,

  measures: {
    count: { sql: `${CUBE}._id`, type: `count` }
  },

  dimensions: {
    _id: { sql: `${CUBE}._id`, type: `string`, primaryKey: true, shown: true },
    crmUserId: { sql: `${CUBE}.crmUserId`, type: `string`},
    username: { sql: `${CUBE}.username`, type: `string` },
    primaryKind: { sql: `${CUBE}.primaryKind`, type: `string` },
    updatedAt: { sql: `PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', ${CUBE}.updatedat)`, type: `time` },
    updatedAtDate: {
      sql: `TIMESTAMP_TRUNC(PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', ${CUBE}.updatedat), DAY)`,
      type: `time`
    }
  },

preAggregations: {
    byUpdatedAtDate: {
      type: `rollup`,
      external: true,
      scheduledRefresh: true,
      measureReferences: [count],
      dimensionReferences: [_id, crmUserId, username, primaryKind],
      timeDimensionReference: updatedAtDate,
      granularity: `day`,
      partitionGranularity: `month`,
      indexes: {
        main: {
          columns: [_id]
        }
      }
    }
  }
})
cube(`UserStatistics`, {
  title: 'User Statistics',
  sql:`SELECT * FROM \`**********_user_statistics\``,

  measures: {
    count: { sql: `${CUBE}._id`, type: `count` },
    value: { sql: `${CUBE}.value`, type: `number` },
    maxValue: { sql: `${CUBE}.value`, type: `max` },
    sumValue: { sql: `${CUBE}.value`, type: `sum` }
  },

  dimensions: {
    _id: { sql: `${CUBE}._id`, type: `string`, primaryKey: true, shown: true },
    owner: { sql: `${CUBE}.owner`, type: `string` },
    group: { sql: `${CUBE}.group`, type: `string` },
    stat: { sql: `${CUBE}.stat`, type: `string` },
    updatedAt: {
      sql: `PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', ${CUBE}.updatedat)`,
      type: `time`
    },
    updatedAtDate: {
      sql: `TIMESTAMP_TRUNC(PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%E*SZ', ${CUBE}.updatedat), DAY)`,
      type: `time`
    }
  },

  joins: {
    Accounts: {
      relationship: `belongsTo`,
      sql: `${Accounts._id} = ${CUBE.owner}`
    }
  },

preAggregations: {
    byUpdatedAtDate: {
      type: `rollup`,
      external: true,
      scheduledRefresh: true,
      measureReferences: [count, maxValue],
      dimensionReferences: [_id, owner, group, stat, updatedAtDate],
      timeDimensionReference: updatedAtDate,
      granularity: `day`,
      partitionGranularity: `month`,
      indexes: {
        main: {
          columns: [owner]
        }
      }
    },
    StatsAndAccountsByUpdatedAtDate: {
      type: `rollupJoin`,
      external: true,
      measureReferences: [Accounts.count, UserStatistics.count, UserStatistics.maxValue, UserStatistics.sumValue],
      dimensionReferences: [Accounts._id, Accounts.crmUserId, Accounts.username, Accounts.primaryKind,
        UserStatistics._id, UserStatistics.owner, UserStatistics.group, UserStatistics.stat, UserStatistics.updatedAtDate],
      rollupReferences: [UserStatistics.byUpdatedAtDate, Accounts.byUpdatedAtDate]
    }
  }
})

Query to run:

{
  "query": {
    "measures": [
      "UserStatistics.maxValue"
    ],
    "dimensions": [
      "Accounts.crmUserId",
      "UserStatistics.owner"
    ],
    "filters": [
      {
        "member": "UserStatistics.stat",
        "operator": "equals",
        "values": [
          "**********"
        ]
      },
      {
        "member": "UserStatistics.group",
        "operator": "equals",
        "values": [
          "**********"
        ]
      },
      {
        "member": "UserStatistics.updatedAtDate",
        "operator": "afterDate",
        "values": [
          "2021-06-16"
        ]
      }
    ]
  }
}

Version:
server-core 8.27.35
bigquery-driver 8.27.35
cubestore 8.27.35

@igorlukanin igorlukanin added the backend:pre-aggregations Issues related to pre-aggregations label Jun 21, 2021
@paveltiunov
Copy link
Member

Hey @mfulop ! Could you please try to swap join condition:

sql: `${CUBE.owner} = ${Accounts._id}`

instead of

sql: `${Accounts._id} = ${CUBE.owner}`

@paveltiunov paveltiunov added the bug Something isn't working label Jun 23, 2021
@paveltiunov paveltiunov self-assigned this Jun 23, 2021
@mfulop
Copy link
Author

mfulop commented Jun 27, 2021

Hello @paveltiunov ,

How could I miss this? I can confirm that with ${CUBE.owner} = ${Accounts._id} it works and uses both preaggregations nicely:

{
  "query": {
  // ...
  },
  "data": [
  // ...
  ],
  "usedPreAggregations": {
    "dev_pre_aggregations.accounts_by_updated_at_date": {
    // ...
    },
    "dev_pre_aggregations.user_statistics_by_updated_at_date": {
    // ...
    }
  }

Thanks!

@paveltiunov
Copy link
Member

@ilya-biryukov We need to fix this one on the Cube Store side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:pre-aggregations Issues related to pre-aggregations bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants