Better example of low-cardinality dist column
begriffs committed Jan 19, 2017
1 parent 8a8c23b commit 2347678
Showing 2 changed files with 1 addition and 1 deletion.
Binary file modified images/sharding-poorly-distributed.png
2 changes: 1 addition & 1 deletion sharding/data_modeling.rst
@@ -37,7 +37,7 @@ While the multi-tenant architecture introduces a hierarchical structure and uses

Real-time queries typically ask for numeric aggregates grouped by date or category. Citus sends these queries to each shard for partial results and assembles the final answer on the coordinator node. Queries run fastest when as many nodes contribute as possible, and when no individual node bottlenecks.

-The more evenly a choice of entity id distributes data to shards the better. At the least the column should have a high cardinality. For comparison, a binary gender field is a poor choice because it assumes at most two values. These values will not be able to take advantage of a cluster with many shards. The row placement will skew into only two shards:
+The more evenly a choice of entity id distributes data to shards the better. At the least the column should have a high cardinality. For comparison, a "status" field on an order table is a poor choice of distribution column because it assumes at most a few values. These values will not be able to take advantage of a cluster with many shards. The row placement will skew into a small handful of shards:

.. image:: ../images/sharding-poorly-distributed.png
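The real-time query pattern described above (each shard computes a partial result, the coordinator assembles the final answer) can be illustrated with a small Python sketch. It is only an analogy, assuming a hypothetical table of ``(category, amount)`` rows split across three in-memory "shards"; Citus itself does this in SQL, and the helper names ``partial_aggregate`` and ``combine`` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical shards: each holds some (category, amount) rows of one table.
shards = [
    [("books", 10.0), ("games", 30.0), ("books", 20.0)],
    [("games", 50.0), ("books", 30.0)],
    [("music", 8.0), ("games", 10.0)],
]


def partial_aggregate(rows):
    """Per-shard step: compute (sum, count) for each category."""
    partial = defaultdict(lambda: [0.0, 0])
    for category, amount in rows:
        partial[category][0] += amount
        partial[category][1] += 1
    return dict(partial)


def combine(partials):
    """Coordinator step: merge all partial results, then compute averages."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for category, (subtotal, count) in partial.items():
            totals[category][0] += subtotal
            totals[category][1] += count
    return {category: s / n for category, (s, n) in totals.items()}


partials = [partial_aggregate(rows) for rows in shards]
averages = combine(partials)
print(averages)
```

The shards return sums and counts rather than averages because averaging per-shard averages gives the wrong answer when shards hold different numbers of rows: here the per-shard ``books`` averages are 15 and 30, whose mean is 22.5, while the true average is 20.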

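The skew that a low-cardinality distribution column causes can be simulated in a few lines of Python. This is a sketch under stated assumptions: the cluster is assumed to have 32 shards, the ``orders`` rows, their ``order_id`` and ``status`` columns, and the five status values are hypothetical, and SHA-256 modulo the shard count stands in for Citus's real placement, which hashes values with PostgreSQL's hash functions and gives each shard a range of hash values.

```python
import hashlib
from collections import Counter

# Assumption: 32 shards. SHA-256 modulo the shard count is a stand-in for
# Citus's hash-range placement; it shows the same effect, not the same layout.
SHARD_COUNT = 32


def shard_for(value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_histogram(values, shard_count: int = SHARD_COUNT) -> Counter:
    """Count how many rows land on each shard."""
    return Counter(shard_for(v, shard_count) for v in values)


# Hypothetical orders table: 10,000 rows, only five distinct status values.
statuses = ["pending", "paid", "shipped", "delivered", "cancelled"]
rows = [{"order_id": i, "status": statuses[i % len(statuses)]} for i in range(10_000)]

by_status = shard_histogram(row["status"] for row in rows)
by_order_id = shard_histogram(str(row["order_id"]) for row in rows)

print("shards holding rows, by status:  ", len(by_status))
print("shards holding rows, by order_id:", len(by_order_id))
print("largest shard, by status:  ", max(by_status.values()))
print("largest shard, by order_id:", max(by_order_id.values()))
```

Distributing by ``status`` can never place rows on more than five of the 32 shards, and every shard it does use holds at least 2,000 rows. Distributing by the high-cardinality ``order_id`` spreads the same rows over the whole cluster at roughly 300 per shard.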