Address some of the more straightforward issues found in release testing
serprex authored and jonels-msft committed Sep 4, 2019
1 parent 51966fc commit fd292a8
Showing 11 changed files with 19 additions and 106 deletions.
52 changes: 0 additions & 52 deletions admin_guide/cluster_management.rst
@@ -668,55 +668,3 @@ In the new db on every worker, manually run:
Now the new database will be operating as another Citus cluster.

.. _phone_home:

Checks For Updates and Cluster Statistics
=========================================

Unless you opt out, Citus checks if there is a newer version of itself during installation and every twenty-four hours thereafter. If a new version is available, Citus emits a notice to the database logs:

::

a new minor release of Citus (X.Y.Z) is available

During the check for updates, Citus also sends general anonymized information about the running cluster to Citus Data company servers. This helps us understand how Citus is commonly used and thereby improve the product. As explained below, the reporting is opt-out and does **not** contain personally identifying information about schemas, tables, queries, or data.

What we Collect
---------------

1. Citus checks if there is a newer version of itself, and if so emits a notice to the database logs.
2. Citus collects and sends these statistics about your cluster:

* Randomly generated cluster identifier
* Number of workers
* OS version and hardware type (output of ``uname -psr`` command)
* Number of tables, rounded to a power of two
* Total size of shards, rounded to a power of two
* Whether Citus is running in Docker or natively

Because Citus is an open-source PostgreSQL extension, the statistics reporting code is available for you to audit. See `statistics_collection.c <https://github.com/citusdata/citus/blob/master/src/backend/distributed/utils/statistics_collection.c>`_.
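
For a rough sense of what some of these numbers look like on your own cluster, you can query the Citus metadata directly. This is only an illustration; the reporting code may compute its figures differently:

.. code-block:: postgresql

   -- approximate counterparts of two reported statistics:
   -- number of workers and number of distributed tables
   SELECT count(*) AS worker_count FROM pg_dist_node WHERE noderole = 'primary';
   SELECT count(*) AS distributed_table_count FROM pg_dist_partition;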

How to Opt Out
--------------

If you wish to disable our anonymized cluster statistics gathering, set the following GUC in postgresql.conf on your coordinator node:

.. code-block:: ini

   citus.enable_statistics_collection = off

This disables all reporting and in fact all communication with Citus Data servers, including checks for whether a newer version of Citus is available.

If you have super-user SQL access you can also achieve this without needing to find and edit the configuration file. Just execute the following statement in psql:

.. code-block:: postgresql

   ALTER SYSTEM SET citus.enable_statistics_collection = 'off';

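``ALTER SYSTEM`` only writes the setting to ``postgresql.auto.conf``; reloading the configuration is typically enough to apply it without a restart. A quick sketch:

.. code-block:: postgresql

   SELECT pg_reload_conf();                    -- apply the changed setting
   SHOW citus.enable_statistics_collection;    -- verify it is now off
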
Since Docker users won't have the chance to edit this PostgreSQL variable before running the image, we added a Docker flag to disable reports.

.. code-block:: bash

   # Docker flag prevents reports
   docker run -e DISABLE_STATS_COLLECTION=true citusdata/citus:latest
12 changes: 6 additions & 6 deletions develop/migration_mt_django.rst
@@ -206,7 +206,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_manager
ADD CONSTRAINT myapp_manager_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -215,7 +215,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_project
ADD CONSTRAINT myapp_product_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -224,7 +224,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_task
ADD CONSTRAINT myapp_task_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -433,9 +433,9 @@ the distribution column.
For ``ForeignKey`` and ``OneToOneField`` constraints, we have a few different cases:
- Foreign key (or One to One) between distributed tables, for which you should use the ``TenantForeignKey`` (or ``TenantOneToOneField``).
- Foreign key between a distributed table and a reference table, which don't require changed.
- Foreign key between a distributed table and a local table, which require to drop the constraint by using ``models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)``.
- Foreign keys (or One to One) between distributed tables, for which you should use the ``TenantForeignKey`` (or ``TenantOneToOneField``).
- Foreign keys between a distributed table and a reference table don't require a change.
- Foreign keys between a distributed table and a local table require dropping the constraint by using ``models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)``.
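
For intuition, a ``TenantForeignKey`` roughly corresponds to a composite foreign key that includes the distribution column, so both sides of the constraint stay co-located on the same shard. A hedged sketch in SQL, reusing the table names above (the ``project_id`` column and constraint name are assumptions):

.. code-block:: postgresql

   ALTER TABLE myapp_task
     ADD CONSTRAINT myapp_task_project_fkey
     FOREIGN KEY (account_id, project_id)
     REFERENCES myapp_project (account_id, id);
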
Finally your models should look like this:
4 changes: 0 additions & 4 deletions installation/multi_machine_debian.rst
@@ -113,7 +113,3 @@ At this step, you have completed the installation process and are ready to use y
::

sudo -i -u postgres psql

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/multi_machine_rhel.rst
@@ -119,7 +119,3 @@ At this step, you have completed the installation process and are ready to use y
::

sudo -i -u postgres psql

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_debian.rst
@@ -87,7 +87,3 @@ To verify that the installation has succeeded we check that the coordinator node
You should see a row for each worker node including the node name and port.

At this step, you have completed the installation process and are ready to use your Citus cluster. To help you get started, we have a :ref:`tutorial<multi_tenant_tutorial>` which has instructions on setting up a Citus cluster with sample data in minutes.

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_docker.rst
@@ -87,7 +87,3 @@ When you wish to stop the docker containers, use Docker Compose:
.. code-block:: bash

   COMPOSE_PROJECT_NAME=citus docker-compose down -v

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_rhel.rst
@@ -86,7 +86,3 @@ To verify that the installation has succeeded we check that the coordinator node
You should see a row for each worker node including the node name and port.

At this step, you have completed the installation process and are ready to use your Citus cluster. To help you get started, we have a :ref:`tutorial<multi_tenant_tutorial>` which has instructions on setting up a Citus cluster with sample data in minutes.

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
17 changes: 1 addition & 16 deletions reference/common_errors.rst
@@ -130,25 +130,10 @@ Cannot establish a new connection for placement *n*, since DML has been executed
ERROR: 25001: cannot establish a new connection for placement 314, since DML has been executed on a connection that is in use
LOCATION: FindPlacementListConnection, placement_connection.c:612

This is a current limitation. In a single transaction Citus does not support running insert/update statements with the :ref:`router_executor` that reference multiple shards, followed by a read query that consults both of the shards.
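
Concretely, a transaction shaped like this hypothetical example (table name and values invented for illustration) could hit the error on affected versions:

.. code-block:: postgresql

   BEGIN;
   -- two single-shard (router) writes that land on different shards
   INSERT INTO events (tenant_id, payload) VALUES (1, '{}');
   INSERT INTO events (tenant_id, payload) VALUES (2, '{}');
   -- a read in the same transaction that touches both of those shards
   SELECT count(*) FROM events WHERE tenant_id IN (1, 2);
   COMMIT;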

.. note::

A similar error also occurs (misleadingly) when the :ref:`create_distributed_table` function is executed on a table by a role other than the table's owner. See this `github discussion <https://github.com/citusdata/citus/issues/2094>`_ for details. To resolve this particular problem, identify the table's owner, switch roles, and try again.

.. code-block:: sql

   -- find the role
   SELECT tablename, tableowner FROM pg_tables;
   -- switch into it
   SET ROLE table_owner_name;

Also note that ``table_owner_name`` must have LOGIN permissions on the worker nodes.
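
To check the login requirement on a worker, inspect ``pg_roles`` there (``table_owner_name`` is the placeholder role from above):

.. code-block:: postgresql

   -- run on each worker node
   SELECT rolname, rolcanlogin FROM pg_roles WHERE rolname = 'table_owner_name';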

Resolution
~~~~~~~~~~

Consider moving the read query into a separate transaction.
:ref:`Upgrade <upgrading>` to Citus 8.3 or higher.

Could not connect to server: Cannot assign requested address
------------------------------------------------------------
2 changes: 1 addition & 1 deletion use_cases/multi_tenant.rst
@@ -467,7 +467,7 @@ To improve resource allocation and make guarantees of tenant QoS it is worthwhil

In our case, let's imagine that our old friend company id=5 is very large. We can isolate the data for this tenant in two steps. We'll present the commands here, and you can consult :ref:`tenant_isolation` to learn more about them.

First sequester the tenant's data into a bundle (called a shard) suitable to move. The CASCADE option also applies this change to the rest of our tables distributed by :code:`company_id`.
First isolate the tenant's data to a dedicated shard suitable to move. The CASCADE option also applies this change to the rest of our tables distributed by :code:`company_id`.

.. code-block:: sql
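
   -- The exact command is truncated in this view. A hedged sketch of the
   -- tenant-isolation step described above: isolate_tenant_to_new_shard()
   -- is the Citus function involved; the table name 'companies' is an assumption.
   SELECT isolate_tenant_to_new_shard('companies', 5, 'CASCADE');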
16 changes: 8 additions & 8 deletions use_cases/realtime_analytics.rst
@@ -298,16 +298,16 @@ to the query in our rollup function:
@@ -1,10 +1,12 @@
INSERT INTO http_request_1min (
site_id, ingest_time, request_count,
success_count, error_count, average_response_time_msec,
+ distinct_ip_addresses
success_count, error_count, average_response_time_msec
+ , distinct_ip_addresses
) SELECT
site_id,
minute,
COUNT(1) as request_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 1 ELSE 0 END) as success_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 0 ELSE 1 END) as error_count,
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec,
+ hll_add_agg(hll_hash_text(ip_address)) AS distinct_ip_addresses
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec
+ , hll_add_agg(hll_hash_text(ip_address)) AS distinct_ip_addresses
FROM http_request
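
Once the rollup stores HLL sketches, a dashboard can merge them to estimate distinct visitors. A hedged sketch, assuming the hll extension's ``hll_union_agg`` and ``hll_cardinality`` functions:

.. code-block:: postgresql

   -- estimated distinct IP addresses per site over the last hour
   SELECT site_id,
          hll_cardinality(hll_union_agg(distinct_ip_addresses)) AS distinct_ips
   FROM http_request_1min
   WHERE ingest_time > now() - interval '1 hour'
   GROUP BY site_id;
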
Dashboard queries are a little more complicated, you have to read out the distinct
@@ -362,17 +362,17 @@ Next, include it in the rollups by modifying the rollup function:
@@ -1,14 +1,19 @@
INSERT INTO http_request_1min (
site_id, ingest_time, request_count,
success_count, error_count, average_response_time_msec,
+ country_counters
success_count, error_count, average_response_time_msec
+ , country_counters
) SELECT
site_id,
minute,
COUNT(1) as request_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 1 ELSE 0 END) as success_c
SUM(CASE WHEN (status_code between 200 and 299) THEN 0 ELSE 1 END) as error_cou
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec,
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec
- FROM http_request
+ jsonb_object_agg(request_country, country_count) AS country_counters
+ , jsonb_object_agg(request_country, country_count) AS country_counters
+ FROM (
+ SELECT *,
+ count(1) OVER (
6 changes: 3 additions & 3 deletions use_cases/timeseries.rst
@@ -29,7 +29,7 @@ Keep in mind that, in the wrong situation, reading all these partitions can hurt
Scaling Timeseries Data on Citus
--------------------------------

We can mix the single-node table partitioning techniques with Citus' distributed sharding to make a scalable time-series database. It's the best of both worlds. It's especially elegant atop the declarative table partitioning in Postgres 10.
We can mix the single-node table partitioning techniques with Citus' distributed sharding to make a scalable time-series database. It's the best of both worlds. It's especially elegant atop Postgres's declarative table partitioning.

.. image:: ../images/timeseries-sharding-and-partitioning.png
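
As a minimal sketch of the combination (table and column names here are hypothetical, not the guide's own example schema), you partition by time within each node and distribute by a device or tenant key across nodes:

.. code-block:: postgresql

   -- native range partitioning on the time column...
   CREATE TABLE events (
       device_id  bigint NOT NULL,
       event_time timestamptz NOT NULL,
       payload    jsonb
   ) PARTITION BY RANGE (event_time);

   -- ...combined with Citus hash distribution on the device column
   SELECT create_distributed_table('events', 'device_id');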

@@ -131,7 +131,7 @@ As time progresses, pg_partman will need to do some maintenance to create new pa
-- due to aggressive locks
SELECT partman.run_maintenance(p_analyze := false);
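
To confirm which partitions exist after maintenance runs, you can list the parent table's children from the system catalog (the parent name ``events`` is a hypothetical placeholder):

.. code-block:: postgresql

   SELECT inhrelid::regclass AS partition
   FROM pg_inherits
   WHERE inhparent = 'events'::regclass;
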
It's best to set up a periodic job to run the maintenance function. Pg_partman can be built with support for a background worker (BGW) process to do this. Or we can use another extension like `pg_cron <https://github.com/citusdata/pg_cron>`_:
It's best to set up a periodic job to run the maintenance function. Pg_partman can be built with support for a background worker process to do this. Or we can use another extension like `pg_cron <https://github.com/citusdata/pg_cron>`_:

.. code-block:: postgresql
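
   -- The pg_cron example is truncated in this view. A hedged sketch of a
   -- periodic maintenance job; the schedule string is an illustrative choice.
   SELECT cron.schedule('0 * * * *',
                        $$SELECT partman.run_maintenance(p_analyze := false)$$);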
@@ -154,4 +154,4 @@ Now whenever maintenance runs, partitions older than a month are automatically d

.. note::

Be aware that native partitioning in Postgres is still quite new and has a few quirks. For example, you cannot directly create an in index on a partitioned table. Instead, pg_partman lets you create a template table to define indexes for new partitions. Maintenance operations on partitioned tables will also acquire aggressive locks that can briefly stall queries. There is currently a lot of work going on within the postgres community to resolve these issues, so expect time partitioning in Postgres to only get better.
Be aware that native partitioning in Postgres is still quite new and has a few quirks. For example, you cannot directly create an index on a partitioned table. Instead, pg_partman lets you create a template table to define indexes for new partitions. Maintenance operations on partitioned tables will also acquire aggressive locks that can briefly stall queries. There is currently a lot of work going on within the postgres community to resolve these issues, so expect time partitioning in Postgres to only get better.
