Introduction to the use cases of sharding and partitioning #360

@amotl


Documentation feedback


Both this page and the reference docs page give reasonable insights into sharding in their introduction/preamble sections. However, when we asked an LLM-based documentation bot "Tell me about sharding in CrateDB.", it also came up with a reasonable response that touches on all the important topics. We may want to borrow fragments from it:

Sharding in CrateDB involves splitting your data into smaller units called shards. Each shard is stored on a node within the cluster, allowing CrateDB to scale horizontally by distributing data and workload across multiple nodes. If the number of shards is not specified when you create a table, CrateDB automatically sets a sensible default based on the number of nodes or available CPUs in your cluster.
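As an illustration (not part of the bot's answer), here is a minimal sketch of setting the shard count explicitly at table creation, using CrateDB's `CLUSTERED ... INTO ... SHARDS` clause. The table and column names are hypothetical:

```sql
-- Hypothetical table; CLUSTERED BY / INTO sets the routing column and shard count.
-- Omitting the clause lets CrateDB pick a default based on the cluster size.
CREATE TABLE sensor_readings (
    sensor_id TEXT,
    ts TIMESTAMP WITH TIME ZONE,
    reading DOUBLE PRECISION
) CLUSTERED BY (sensor_id) INTO 6 SHARDS;
```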

CrateDB automatically redistributes shards whenever nodes are added, removed, or if the data distribution is unbalanced, helping to maintain optimal performance. The recommended shard size generally falls between 3–70 GB, depending on your specific use case. Creating the right number of shards can optimise performance and make the most efficient use of hardware resources. Each shard in CrateDB is backed by a Lucene index, which manages the actual data storage and retrieval.
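To check whether existing shards fall into that recommended size range, the `sys.shards` system table can be queried. This is a sketch; the available columns may vary between CrateDB versions:

```sql
-- Inspect shard sizes per table, largest first.
-- size is reported in bytes by sys.shards.
SELECT schema_name,
       table_name,
       id AS shard_id,
       size / (1024.0 * 1024 * 1024) AS size_gb
FROM sys.shards
ORDER BY size DESC;
```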

Partitioning and sharding can be used together for large-scale deployments. Partitioning splits tables based on a column's value (such as date), and each partition is then further divided into shards for efficient data management and querying.
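A brief sketch of combining both features, assuming a hypothetical `metrics` table partitioned by month via a generated column, with each partition split into a fixed number of shards:

```sql
-- Hypothetical table: partitioned by month, each partition clustered into 4 shards.
CREATE TABLE metrics (
    ts TIMESTAMP WITH TIME ZONE,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts),
    value DOUBLE PRECISION
) CLUSTERED INTO 4 SHARDS
  PARTITIONED BY (month);
```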

For detailed configuration options and guidance, refer to CrateDB's official documentation on sharding and partitioning best practices.
