Remove range partitioning guide and references
begriffs committed Jun 14, 2016
1 parent 036ce4f commit 8feb7d6
Showing 7 changed files with 7 additions and 64 deletions.
3 changes: 2 additions & 1 deletion admin_guide/transitioning_from_postgresql_to_citus.rst
@@ -9,8 +9,9 @@ To move your data from a PostgreSQL table to a distributed table, you can copy
out the data into a csv file and then use the \\copy command to load it into a
distributed table. Alternatively, you could copy out data from the local table and
directly pipe it to a copy into the distributed table. For example:

::
psql -c "COPY local_table TO STDOUT" | psql -c "COPY distributed_table FROM STDIN"
psql -c "COPY local_table TO STDOUT" | psql -c "COPY distributed_table FROM STDIN"

One thing to note as you transition from a single node to multiple nodes is that you should create your extensions, operators, user defined functions, and custom data types on all nodes.

2 changes: 1 addition & 1 deletion dist_tables/querying.rst
@@ -104,7 +104,7 @@ Repartition joins

In some cases, you may need to join two tables on columns other than the distribution column. For such cases, Citus also allows joining on non-distribution key columns by dynamically repartitioning the tables for the query.

In such cases, the best partition method (hash or range) and the table(s) to be partitioned is determined by the query optimizer on the basis of the distribution columns, join keys and sizes of the tables. With repartitioned tables, it can be ensured that only relevant shard pairs are joined with each other reducing the amount of data transferred across network drastically.
In such cases the table(s) to be partitioned are determined by the query optimizer on the basis of the distribution columns, join keys and sizes of the tables. With repartitioned tables, it can be ensured that only relevant shard pairs are joined with each other reducing the amount of data transferred across network drastically.

In general, colocated joins are more efficient than repartition joins as repartition joins require shuffling of data. So, you should try to distribute your tables by the common join keys whenever possible.
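The idea behind a repartition join can be illustrated with a small, self-contained sketch. This is not how Citus implements it; the function names, the use of Python's built-in `hash`, and the sample rows are all assumptions made for illustration. Both inputs are split into buckets by hashing the join key, so rows that can match always land in the same bucket and only matching bucket pairs need to be joined.

```python
from collections import defaultdict


def repartition(rows, key, bucket_count):
    """Group rows into buckets by hashing the join key (illustrative only)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % bucket_count].append(row)
    return buckets


def repartition_join(left, right, left_key, right_key, bucket_count=4):
    """Join two row lists on non-distribution columns, bucket pair by bucket pair."""
    left_buckets = repartition(left, left_key, bucket_count)
    right_buckets = repartition(right, right_key, bucket_count)
    joined = []
    for bucket_id, left_rows in left_buckets.items():
        # Only the right-hand bucket with the same id can contain matches.
        index = defaultdict(list)
        for right_row in right_buckets.get(bucket_id, []):
            index[right_row[right_key]].append(right_row)
        for left_row in left_rows:
            for right_row in index.get(left_row[left_key], []):
                joined.append({**left_row, **right_row})
    return joined


# Hypothetical sample data: neither table is "distributed" by product_id.
orders = [
    {"order_id": 1, "product_id": 10},
    {"order_id": 2, "product_id": 20},
    {"order_id": 3, "product_id": 10},
]
products = [
    {"product_id": 10, "name": "widget"},
    {"product_id": 30, "name": "gadget"},
]

result = repartition_join(orders, products, "product_id", "product_id")
print(sorted(row["order_id"] for row in result))
```

The shuffle in `repartition` is the extra cost that a colocated join avoids, which is why distributing tables by their common join key is usually preferable.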

56 changes: 0 additions & 56 deletions dist_tables/range_distribution.rst

This file was deleted.

4 changes: 2 additions & 2 deletions dist_tables/working_with_distributed_tables.rst
@@ -24,10 +24,10 @@ The best option for the distribution column varies depending on the use case and
Distribution Method
-------------------

The next step after choosing the right distribution column is deciding the right distribution method. Citus supports two distribution methods: append and hash. Citus also provides the option for range distribution but that currently requires manual effort to set up.
The next step after choosing the right distribution column is deciding the right distribution method. Citus supports two distribution methods: append and hash.

As the name suggests, append based distribution is more suited to append-only use cases. This typically includes event based data which arrives in a time-ordered series. You can then distribute your largest tables by time, and batch load your events into Citus in intervals of N minutes. This data model can be generalized to a number of time series use cases; for example, each line in a website's log file, machine activity logs or aggregated website events. Append based distribution supports more efficient range queries. This is because given a range query on the distribution key, the Citus query planner can easily determine which shards overlap that range and send the query only to the relevant shards.
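A minimal sketch of that pruning step, assuming each shard records the minimum and maximum distribution-column value it holds. The `Shard` class, the `prune_shards` helper, and the shard ids and ranges are hypothetical and only illustrate the interval-overlap test, not the actual planner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    shard_id: int
    min_value: int  # smallest distribution-column value stored in the shard
    max_value: int  # largest distribution-column value stored in the shard


def prune_shards(shards, lower, upper):
    """Keep only shards whose [min_value, max_value] overlaps [lower, upper]."""
    return [s for s in shards if s.min_value <= upper and s.max_value >= lower]


# Made-up shard metadata; in this toy data two intervals happen to overlap.
shards = [
    Shard(102001, 0, 99),
    Shard(102002, 100, 199),
    Shard(102003, 150, 299),
    Shard(102004, 300, 399),
]

# A range filter such as "WHERE key BETWEEN 120 AND 160" touches two shards.
relevant = prune_shards(shards, 120, 160)
print([s.shard_id for s in relevant])  # [102002, 102003]
```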

Hash based distribution is more suited to cases where you want to do real-time inserts along with analytics on your data or want to distribute by a non-ordered column (e.g. user id). This data model is relevant for real-time analytics use cases; for example, actions in a mobile application, user website events, or social media analytics. In this case, Citus will maintain minimum and maximum hash ranges for all the created shards. Whenever a row is inserted, updated or deleted, Citus will redirect the query to the correct shard and issue it locally. This data model is more suited for doing co-located joins and for queries involving equality based filters on the distribution column.
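The routing described above can be sketched as follows. This is an illustration under stated assumptions, not Citus code: CRC32 stands in for the real hash function, the hash space is assumed to be the signed 32-bit range, and `make_hash_shards`, `hash_value`, and `route` are hypothetical helper names.

```python
import zlib

HASH_MIN, HASH_MAX = -2**31, 2**31 - 1


def make_hash_shards(shard_count):
    """Split the signed 32-bit hash space into contiguous (min, max) ranges."""
    step = 2**32 // shard_count
    ranges = []
    for i in range(shard_count):
        low = HASH_MIN + i * step
        # The last shard absorbs any remainder so the whole space is covered.
        high = HASH_MAX if i == shard_count - 1 else low + step - 1
        ranges.append((low, high))
    return ranges


def hash_value(key):
    """Stand-in hash: CRC32 of the key's text, shifted into the signed range."""
    return zlib.crc32(str(key).encode("utf-8")) - 2**31


def route(key, ranges):
    """Return the index of the shard whose hash range contains the key's hash."""
    h = hash_value(key)
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = make_hash_shards(4)
# The same key always maps to the same shard, which is what makes
# equality filters on the distribution column cheap to route.
print(route("user-42", ranges) == route("user-42", ranges))  # True
```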

Citus uses slightly different syntaxes for creation and manipulation of append and hash distributed tables. Also, the operations supported on the tables differ based on the distribution method chosen. In the sections that follow, we describe the syntax for creating append and hash distributed tables, and also describe the operations which can be done on them. We also briefly discuss how you can setup range distribution manually.
Citus uses slightly different syntaxes for creation and manipulation of append and hash distributed tables. Also, the operations supported on the tables differ based on the distribution method chosen. In the sections that follow, we describe the syntax for creating append and hash distributed tables, and also describe the operations which can be done on them.
1 change: 0 additions & 1 deletion index.rst
@@ -36,7 +36,6 @@ topics.
dist_tables/working_with_distributed_tables.rst
dist_tables/append_distribution.rst
dist_tables/hash_distribution.rst
dist_tables/range_distribution.rst
dist_tables/querying.rst
dist_tables/postgresql_extensions.rst

1 change: 0 additions & 1 deletion reference/metadata_tables.rst
@@ -20,7 +20,6 @@ The pg_dist_partition table stores metadata about which tables in the database a
| | | | column corresponding to different distribution methods are :- |
| | | | append: 'a' |
| | | | hash: 'h' |
| | | | range: 'r' |
+----------------+----------------------+---------------------------------------------------------------------------+
| partkey | text | | Detailed information about the distribution column including column |
| | | | number, type and other relevant information. |
4 changes: 2 additions & 2 deletions reference/user_defined_functions.rst
@@ -20,7 +20,7 @@ Arguments

**distribution_column:** The column on which the table is to be distributed.

**distribution_method:** The method according to which the table is to be distributed. Permissible values are append, hash or range.
**distribution_method:** The method according to which the table is to be distributed. Permissible values are append or hash.

Return Value
********************************
@@ -221,7 +221,7 @@ A tuple containing the following information:

**part_storage_type:** Type of storage used for the table. May be 't' (standard table), 'f' (foreign table) or 'c' (columnar table).

**part_method:** Distribution method used for the table. May be 'a' (append), 'h' (hash) or 'r' (range).
**part_method:** Distribution method used for the table. May be 'a' (append), or 'h' (hash).

**part_key:** Distribution column for the table.
