Skip to content

Commit

Permalink
Add "when to use" to the What is Citus page (#92)
Browse files Browse the repository at this point in the history
* Add "when to use" to the What is Citus page

* Shorten the considerations section

Note JOIN support

Note query parallelization engine
  • Loading branch information
begriffs committed Jun 22, 2016
1 parent e5df236 commit e2daf9b
Showing 1 changed file with 34 additions and 1 deletion.
35 changes: 34 additions & 1 deletion aboutcitus/what_is_citus.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,39 @@
What is Citus?
==============

Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.
Citus horizontally scales PostgreSQL across multiple machines using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable human real-time (a.k.a. less than a second) responses on large datasets.

Citus extends the underlying database rather than forking it, which gives developers and enterprises the power and familiarity of a traditional relational database. As an extension, Citus supports new PostgreSQL releases, allowing users to benefit from new features while maintaining compatibility with existing PostgreSQL tools.

When to Use Citus
-----------------

Example use cases:

* Analytic dashboards with subsecond response times
* Exploratory queries on unfolding events, including JOIN queries
* Aggregate reports on large archives
* Analyzing sessions with funnels, segmentation, and cohorts
* Extending a data warehouse with real-time capabilities

For concrete examples check out our customer `use cases <https://www.citusdata.com/solutions/case-studies>`_. Typical Citus workloads are operational, with aggregate queries and no long-lived transactions.

Considerations for Use
----------------------

Although Citus extends PostgreSQL with distributed functionality, it is not a drop-in replacement that scales out all workloads. A performant Citus cluster involves thinking about the data model, tooling, and choice of SQL features used.

Data models that have fewer tables (<10) work much better than those that have hundreds of tables. This is a property of distributed systems: the more tables, the more distributed dependencies.

A good way to think about tool and SQL feature coverage is the following: if your workload aligns with use-cases noted in the "When to use Citus" section and you happen to run into an unsupported query or tool, then there’s usually a good workaround.

When Citus is Inappropriate
---------------------------

Workloads which require a large (non-aggregated) flow of information between nodes generally do not work as well. For instance:

* Traditional data warehousing with long, free-form SQL
* Distributed transactions across multiple shards
* Queries that return data-heavy ETL results rather than summaries

These constraints come from the fact that we operate across many nodes (as compared to a single node database), giving you easy horizontal scaling as well as high availability.

0 comments on commit e2daf9b

Please sign in to comment.