2 changes: 1 addition & 1 deletion docs/.prettierrc
@@ -3,7 +3,7 @@
"tabWidth": 2,
"useTabs": false,
"semi": true,
"singleQuote": true,
"singleQuote": false,
"arrowParens": "always",
"trailingComma": "es5",
"bracketSpacing": true,
10 changes: 5 additions & 5 deletions docs/content/Auth/Security-Context.mdx
@@ -11,7 +11,7 @@ context claims to evaluate access control rules. Inbound JWTs are decoded and
verified using industry-standard [JSON Web Key Sets (JWKS)][link-auth0-jwks].

For access control or authorization, Cube allows you to define granular access
- control rules for every cube in your data schema. Cube uses both the request and
+ control rules for every cube in your data model. Cube uses both the request and
security context claims in the JWT token to generate a SQL query, which includes
row-level constraints from the access control rules.

@@ -132,11 +132,11 @@ LIMIT 10000
In the example below `user_id`, `company_id`, `sub` and `iat` will be injected
into the security context and will be accessible in both the [Security
Context][ref-schema-sec-ctx] and [`COMPILE_CONTEXT`][ref-cubes-compile-ctx]
- global variable in the Cube Data Schema.
+ global variable in the Cube data model.

<InfoBox>

- `COMPILE_CONTEXT` is used by Cube at schema compilation time, which allows
+ `COMPILE_CONTEXT` is used by Cube at data model compilation time, which allows
changing the underlying dataset completely; the Security Context is only used at
query execution time, which simply filters the dataset with a `WHERE` clause.

@@ -151,8 +151,8 @@ query execution time, which simply filters the dataset with a `WHERE` clause.
}
```

- With the same JWT payload as before, we can modify schemas before they are
- compiled. The following schema will ensure users only see results for their
+ With the same JWT payload as before, we can modify models before they are
+ compiled. The following cube will ensure users only see results for their
`company_id` in a multi-tenant deployment:

```javascript
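// The cube body is elided in this diff. As a rough sketch of the pattern the
// text describes (identifiers are illustrative, not taken from the PR), the
// `company_id` claim can be read from COMPILE_CONTEXT to switch the dataset:
const {
  securityContext: { company_id: companyId },
} = COMPILE_CONTEXT;

cube(`Orders`, {
  // Each tenant compiles against its own database schema, e.g. company_42.orders
  sql: `SELECT * FROM company_${companyId}.orders`,
});
```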
30 changes: 14 additions & 16 deletions docs/content/Caching/Getting-Started-Pre-Aggregations.mdx
@@ -40,7 +40,7 @@ layer][ref-caching-preaggs-cubestore].
## Pre-Aggregations without Time Dimension

To illustrate pre-aggregations with an example, let's use a sample e-commerce
- database. We have a schema representing all our `Orders`:
+ database. We have a data model representing all our `Orders`:

```javascript
cube(`Orders`, {
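  // The rest of the cube is elided in this diff. A minimal sketch of the kind
  // of pre-aggregation this section describes (measure and dimension names
  // are assumed for illustration):
  measures: {
    count: {
      type: `count`,
    },
  },

  preAggregations: {
    ordersRollup: {
      measures: [CUBE.count],
      dimensions: [CUBE.status],
    },
  },
});
```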
@@ -106,9 +106,9 @@ cube(`Orders`, {

## Pre-Aggregations with Time Dimension

- Using the same schema as before, we are now finding that users frequently query
- for the number of orders completed per day, and that this query is performing
- poorly. This query might look something like:
+ Using the same data model as before, we are now finding that users frequently
+ query for the number of orders completed per day, and that this query is
+ performing poorly. This query might look something like:

```json
{
@@ -118,7 +118,7 @@ poorly. This query might look something like:
```
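
Although the query body is elided above, a Cube REST API query for daily
completed orders would look roughly like the following sketch (member names
are illustrative, not taken from the PR):

```json
{
  "measures": ["Orders.count"],
  "timeDimensions": [
    {
      "dimension": "Orders.completedAt",
      "granularity": "day"
    }
  ],
  "filters": [
    {
      "member": "Orders.status",
      "operator": "equals",
      "values": ["completed"]
    }
  ]
}
```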

In order to improve the performance of this query, we can add another
- pre-aggregation definition to the `Orders` schema:
+ pre-aggregation definition to the `Orders` cube:

```javascript
cube(`Orders`, {
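  // Elided in this diff. A sketch of a rollup that would match the daily
  // query above (names assumed):
  preAggregations: {
    ordersByDay: {
      measures: [CUBE.count],
      timeDimension: CUBE.completedAt,
      granularity: `day`,
    },
  },
});
```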
@@ -245,7 +245,7 @@ fields and still get a correct result:
| 2021-01-22 00:00:00.000000 | 13 | 150 |

This means that `quantity` and `price` are both **additive measures**, and we
- can represent them in the `LineItems` schema as follows:
+ can represent them in the `LineItems` cube as follows:

```javascript
cube(`LineItems`, {
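  // Elided in this diff. A sketch of the additive measures discussed in the
  // text (column names assumed):
  measures: {
    quantity: {
      sql: `quantity`,
      type: `sum`,
    },
    price: {
      sql: `price`,
      type: `sum`,
    },
  },
});
```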
@@ -340,7 +340,7 @@
We can clearly see that `523` **does not** equal `762.204545454545455`, and we
cannot treat the `profit_margin` column the same as we would any other additive
measure. Armed with the above knowledge, we can add the `profit_margin` field to
- our schema **as a [dimension][ref-schema-dims]**:
+ our cube **as a [dimension][ref-schema-dims]**:

```javascript
cube(`LineItems`, {
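  // Elided in this diff. A sketch of modeling the non-additive profit margin
  // as a dimension, per the advice above (column name assumed):
  dimensions: {
    profitMargin: {
      sql: `profit_margin`,
      type: `number`,
      format: `percent`,
    },
  },
});
```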
@@ -437,17 +437,15 @@ To recap what we've learnt so far:
`count`, `sum`, `min`, `max` or `countDistinctApprox`

Cube looks for matching pre-aggregations in the order they are defined in a
- cube's schema file. Each defined pre-aggregation is then tested for a match
+ cube's data model file. Each defined pre-aggregation is then tested for a match
based on the criteria in the flowchart below:

- <div
- style="text-align: center"
- >
+ <div style="text-align: center">
<img
- alt="Pre-Aggregation Selection Flowchart"
- src="https://ucarecdn.com/f986b0cb-a9ea-47b7-a743-ca9a4644c246/"
- style="border: none"
- width="100%"
+ alt="Pre-Aggregation Selection Flowchart"
+ src="https://ucarecdn.com/f986b0cb-a9ea-47b7-a743-ca9a4644c246/"
+ style="border: none"
+ width="100%"
/>
</div>

@@ -470,7 +468,7 @@ Some extra considerations for pre-aggregation selection:
`['2020-01-01T00:00:00.000', '2020-01-01T23:59:59.999']`. Date ranges are
inclusive, and the minimum granularity is `second`.

- - The order in which pre-aggregations are defined in schemas matter; the first
+ - The order in which pre-aggregations are defined in models matters; the first
matching pre-aggregation for a query is the one that is used. Both the
measures and dimensions of any cubes specified in the query are checked to
find a matching `rollup`.
11 changes: 5 additions & 6 deletions docs/content/Caching/Overview.mdx
@@ -49,8 +49,8 @@ more about read-only support and pre-aggregation build strategies.

</InfoBox>

- Pre-aggregations are defined in the data schema. You can learn more about
- defining pre-aggregations in [schema reference][ref-schema-ref-preaggs].
+ Pre-aggregations are defined in the data model. You can learn more about
+ defining pre-aggregations in [data modeling reference][ref-schema-ref-preaggs].

```javascript
cube(`Orders`, {
@@ -142,10 +142,9 @@ The default values for `refreshKey` are
- `every: '10 second'` for all other databases.

You can use a custom SQL query to check if a refresh is required by changing
- the [`refreshKey`][ref-schema-ref-cube-refresh-key] property in a cube's Data
- Schema. Often, a `MAX(updated_at_timestamp)` for OLTP data is a viable option,
- or examining a metadata table for whatever system is managing the data to see
- when it last ran.
+ the [`refreshKey`][ref-schema-ref-cube-refresh-key] property in a cube. Often, a
+ `MAX(updated_at_timestamp)` for OLTP data is a viable option, or examining a
+ metadata table for whatever system is managing the data to see when it last ran.
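
As a sketch of this approach (table and column names assumed), such a
`refreshKey` could look like:

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  // The cache is considered stale only when the result of this query changes
  refreshKey: {
    sql: `SELECT MAX(updated_at_timestamp) FROM orders`,
  },
});
```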

### <--{"id" : "In-memory Cache"}--> Disabling the cache

3 changes: 2 additions & 1 deletion docs/content/Caching/Using-Pre-Aggregations.mdx
@@ -7,7 +7,8 @@ menuOrder: 3

Pre-aggregations are a powerful way to speed up your Cube queries. There are many
configuration options to consider. Please make sure to also check [the
- Pre-Aggregations reference in the data schema section][ref-schema-ref-preaggs].
+ Pre-Aggregations reference in the data modeling
+ section][ref-schema-ref-preaggs].

## Refresh Strategy

20 changes: 10 additions & 10 deletions docs/content/Configuration/Advanced/Multitenancy.mdx
@@ -6,7 +6,7 @@ subCategory: Advanced
menuOrder: 3
---

- Cube supports multitenancy out of the box, both on database and data schema
+ Cube supports multitenancy out of the box, both on database and data model
levels. Multiple drivers are also supported, meaning that you can have one
customer’s data in MongoDB and others in Postgres with one Cube instance.

@@ -34,7 +34,7 @@ combinations of these configuration options.

### <--{"id" : "Multitenancy"}--> Multitenancy vs Multiple Data Sources

- In cases where your Cube schema is spread across multiple different data
+ In cases where your Cube data model is spread across multiple different data
sources, consider using the [`dataSource` cube property][ref-cube-datasource]
instead of multitenancy. Multitenancy is designed for cases where you need to
serve different datasets for multiple users, or tenants which aren't related to
@@ -169,7 +169,7 @@ cube(`Products`, {
### <--{"id" : "Multitenancy"}--> Running in Production

Each unique id generated by `contextToAppId` or `contextToOrchestratorId` will
- generate a dedicated set of resources, including schema compile cache, SQL
+ generate a dedicated set of resources, including data model compile cache, SQL
compile cache, query queues, in-memory result caching, etc. Depending on your
data model complexity and usage patterns, those resources can have a pretty
sizable memory footprint ranging from single-digit MBs on the lower end and
@@ -219,7 +219,7 @@ module.exports = {
};
```

- ## Multiple DB Instances with Same Schema
+ ## Multiple DB Instances with Same Data Model

Let's consider an example where we store data for different users in different
databases, but on the same Postgres host. The database name format is
@@ -249,12 +249,12 @@ select the database, based on the `appId` and `userId`:
<WarningBox>

The App ID (the result of [`contextToAppId`][ref-config-ctx-to-appid]) is used
- as a caching key for various in-memory structures like schema compilation
+ as a caching key for various in-memory structures like data model compilation
results, connection pool. The Orchestrator ID (the result of
[`contextToOrchestratorId`][ref-config-ctx-to-orch-id]) is used as a caching key
for database connections, execution queues and pre-aggregation table caches. Not
- declaring these properties will result in unexpected caching issues such as
- schema or data of one tenant being used for another.
+ declaring these properties will result in unexpected caching issues such as the
+ data model or data of one tenant being used for another.

</WarningBox>

@@ -292,7 +292,7 @@ module.exports = {
};
```
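
To make the warning above concrete, here is a minimal sketch of declaring both
IDs (the tenant claim name is assumed for illustration):

```javascript
module.exports = {
  // Keep data model compilation and query orchestration caches per tenant
  contextToAppId: ({ securityContext }) =>
    `CUBEJS_APP_${securityContext.tenantId}`,
  contextToOrchestratorId: ({ securityContext }) =>
    `CUBEJS_APP_${securityContext.tenantId}`,
};
```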

- ## Multiple Schema and Drivers
+ ## Multiple Data Models and Drivers

What if for application with ID 3, the data is stored not in Postgres, but in
MongoDB?
@@ -301,9 +301,9 @@ We can instruct Cube to connect to MongoDB in that case, instead of Postgres. To
do this, we'll use the [`driverFactory`][ref-config-driverfactory] option to
dynamically set database type. We will also need to modify our
[`securityContext`][ref-config-security-ctx] to determine which tenant is
- requesting data. Finally, we want to have separate data schemas for every
+ requesting data. Finally, we want to have separate data models for every
application. We can use the [`repositoryFactory`][ref-config-repofactory] option
- to dynamically set a repository with schema files depending on the `appId`:
+ to dynamically set a repository with data model files depending on the `appId`:

**cube.js:**

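The configuration is elided in this diff, but a rough sketch of the pattern
(driver types, claim names, and paths here are illustrative, not taken from
the PR) might look like:

```javascript
const { FileRepository } = require("@cubejs-backend/server-core");

module.exports = {
  // Application 3 reads from MongoDB; everyone else from Postgres
  driverFactory: ({ securityContext }) =>
    securityContext.appId === 3 ? { type: "mongobi" } : { type: "postgres" },

  // Each application gets its own directory of data model files
  repositoryFactory: ({ securityContext }) =>
    new FileRepository(`model/${securityContext.appId}`),
};
```
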
4 changes: 2 additions & 2 deletions docs/content/Configuration/Downstream/Superset.mdx
@@ -69,7 +69,7 @@ a new database:
Your cubes will be exposed as tables, where both your measures and dimensions
are columns.

- Let's use the following Cube data schema:
+ Let's use the following Cube data model:

```javascript
cube(`Orders`, {
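  // Elided in this diff. A sketch of the count measure that Superset's
  // COUNT(*) maps onto (names assumed):
  measures: {
    count: {
      type: `count`,
    },
  },
});
```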
@@ -124,7 +124,7 @@ a time grain of `month`.

The `COUNT(*)` aggregate function is being mapped to a measure of type
[count](/schema/reference/types-and-formats#measures-types-count) in Cube's
- **Orders** schema file.
+ **Orders** data model file.

## Additional Configuration

9 changes: 6 additions & 3 deletions docs/content/Deployment/Cloud/Continuous-Deployment.mdx
@@ -56,16 +56,19 @@ Cube Cloud will automatically deploy from the specified production branch

<WarningBox>

- Enabling this option will cause the Schema page to display the last known state of a Git-based codebase (if available), instead of reflecting the latest modifications made.
- It is important to note that the logic will still be updated in both the API and the Playground.
+ Enabling this option will cause the <Btn>Data Model</Btn> page to display the
+ last known state of a Git-based codebase (if available), instead of reflecting
+ the latest modifications made. It is important to note that the logic will still
+ be updated in both the API and the Playground.

</WarningBox>

You can use the CLI to set up continuous deployment for a Git repository. You
can also use the CLI to manually deploy changes without continuous deployment.

### <--{"id" : "Deploy with CLI"}--> Manual Deploys

- You can deploy your Cube project manually. This method uploads data schema and
+ You can deploy your Cube project manually. This method uploads data models and
configuration files directly from your local project directory.

You can obtain a Cube Cloud deploy token from your deployment **Settings** page.
24 changes: 13 additions & 11 deletions docs/content/Deployment/Overview.mdx
@@ -42,7 +42,7 @@ API instances.

API instances and Refresh Workers can be configured via [environment
variables][ref-config-env] or the [`cube.js` configuration file][ref-config-js].
- They also need access to the data schema files. Cube Store clusters can be
+ They also need access to the data model files. Cube Store clusters can be
configured via environment variables.

You can find an example Docker Compose configuration for a Cube deployment in
@@ -57,21 +57,22 @@ requests between multiple API instances.

The [Cube Docker image][dh-cubejs] is used for API Instance.

- API instance needs to be configured via environment variables, cube.js file and
- has access to the data schema files.
+ API instances can be configured via environment variables or the `cube.js`
+ configuration file, and **must** have access to the data model files (as
+ specified by [`schemaPath`][ref-conf-ref-schemapath]).

## Refresh Worker

A Refresh Worker updates pre-aggregations and invalidates the in-memory cache in
- the background. They also keep the refresh keys up-to-date for all defined
- schemas and pre-aggregations. Please note that the in-memory cache is just
- invalidated but not populated by Refresh Worker. In-memory cache is populated
- lazily during querying. On the other hand, pre-aggregations are eagerly
- populated and kept up-to-date by Refresh Worker.
+ the background. They also keep the refresh keys up-to-date for all data models
+ and pre-aggregations. Please note that the in-memory cache is just invalidated
+ but not populated by Refresh Worker. In-memory cache is populated lazily during
+ querying. On the other hand, pre-aggregations are eagerly populated and kept
+ up-to-date by Refresh Worker.

- [Cube Docker image][dh-cubejs] can be used for creating Refresh Workers; to make
- the service act as a Refresh Worker, `CUBEJS_REFRESH_WORKER=true` should be set
- in the environment variables.
+ The [Cube Docker image][dh-cubejs] can be used for creating Refresh Workers; to
+ make the service act as a Refresh Worker, `CUBEJS_REFRESH_WORKER=true` should be
+ set in the environment variables.

## Cube Store

@@ -275,6 +276,7 @@ guide][blog-migration-guide].
[ref-deploy-docker]: /deployment/platforms/docker
[ref-config-env]: /reference/environment-variables
[ref-config-js]: /config
+ [ref-conf-ref-schemapath]: /config#options-reference-schema-path
[redis]: https://redis.io
[ref-config-redis]: /reference/environment-variables#cubejs-redis-password
[blog-details]: https://cube.dev/blog/how-you-win-by-using-cube-store-part-1
44 changes: 26 additions & 18 deletions docs/content/Deployment/Production-Checklist.mdx
@@ -97,37 +97,45 @@ deployment's health and be alerted to any issues.

## Appropriate cluster sizing

- There's no one-size-fits-all when it comes to sizing Cube cluster, and its resources.
- Resources required by Cube depend a lot on the amount of traffic Cube needs to serve and the amount of data it needs to process.
- The following sizing estimates are based on default settings and are very generic, which may not fit your Cube use case, so you should always tweak resources based on consumption patterns you see.
+ There's no one-size-fits-all when it comes to sizing a Cube cluster and its
+ resources. Resources required by Cube significantly depend on the amount of
+ traffic Cube needs to serve and the amount of data it needs to process. The
+ following sizing estimates are based on default settings and are very generic,
+ which may not fit your Cube use case, so you should always tweak resources based
+ on consumption patterns you see.

### <--{"id" : "Appropriate cluster sizing"}--> Memory and CPU

- Each Cube cluster should contain at least 2 Cube API instances.
- Every Cube API instance should have at least 3GB of RAM and 2 CPU cores allocated for it.
+ Each Cube cluster should contain at least 2 Cube API instances. Every Cube API
+ instance should have at least 3GB of RAM and 2 CPU cores allocated for it.

- Refresh workers tend to be much more CPU and memory intensive, so at least 6GB of RAM is recommended.
- Please note that to take advantage of all available RAM, the Node.js heap size should be adjusted accordingly
- by using the [`--max-old-space-size` option][node-heap-size]:
+ Refresh workers tend to be much more CPU and memory intensive, so at least 6GB
+ of RAM is recommended. Please note that to take advantage of all available RAM,
+ the Node.js heap size should be adjusted accordingly by using the
+ [`--max-old-space-size` option][node-heap-size]:

```sh
NODE_OPTIONS="--max-old-space-size=6144"
```

- [node-heap-size]: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
+ [node-heap-size]:
+ https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes

- The Cube Store router node should have at least 6GB of RAM and 4 CPU cores allocated for it.
- Every Cube Store worker node should have at least 8GB of RAM and 4 CPU cores allocated for it.
- The Cube Store cluster should have at least two worker nodes.
+ The Cube Store router node should have at least 6GB of RAM and 4 CPU cores
+ allocated for it. Every Cube Store worker node should have at least 8GB of RAM
+ and 4 CPU cores allocated for it. The Cube Store cluster should have at least
+ two worker nodes.

### <--{"id" : "Appropriate cluster sizing"}--> RPS and data volume

- Depending on schema size, every Core Cube API instance can serve 1 to 10 requests per second.
- Every Core Cube Store router node can serve 50-100 queries per second.
- As a rule of thumb, you should provision 1 Cube Store worker node per one Cube Store partition or 1M of rows scanned in a query.
- For example if your queries scan 16M of rows per query, you should have at least 16 Cube Store worker nodes provisioned.
- `EXPLAIN ANALYZE` can be used to see partitions involved in a Cube Store query.
- Cube Cloud ballpark performance numbers can differ as it has different Cube runtime.
+ Depending on data model size, every Core Cube API instance can serve 1 to 10
+ requests per second. Every Core Cube Store router node can serve 50-100 queries
+ per second. As a rule of thumb, you should provision 1 Cube Store worker node
+ per one Cube Store partition or 1M rows scanned in a query. For example, if
+ your queries scan 16M rows per query, you should have at least 16 Cube Store
+ worker nodes provisioned. `EXPLAIN ANALYZE` can be used to see the partitions
+ involved in a Cube Store query. Cube Cloud ballpark performance numbers can
+ differ as it has a different Cube runtime.

[blog-migrate-to-cube-cloud]:
https://cube.dev/blog/migrating-from-self-hosted-to-cube-cloud/