Address some of the more straightforward issues found in release testing
serprex authored and jonels-msft committed Sep 4, 2019
1 parent 51966fc commit fd292a8
Showing 11 changed files with 19 additions and 106 deletions.
52 changes: 0 additions & 52 deletions admin_guide/cluster_management.rst
@@ -668,55 +668,3 @@ In the new db on every worker, manually run:
Now the new database will be operating as another Citus cluster.

.. _phone_home:

Checks For Updates and Cluster Statistics
=========================================

Unless you opt out, Citus checks if there is a newer version of itself during installation and every twenty-four hours thereafter. If a new version is available, Citus emits a notice to the database logs:

::

a new minor release of Citus (X.Y.Z) is available

During the check for updates, Citus also sends general anonymized information about the running cluster to Citus Data company servers. This helps us understand how Citus is commonly used and thereby improve the product. As explained below, the reporting is opt-out and does **not** contain personally identifying information about schemas, tables, queries, or data.

What we Collect
---------------

1. Citus checks if there is a newer version of itself, and if so emits a notice to the database logs.
2. Citus collects and sends these statistics about your cluster:

* Randomly generated cluster identifier
* Number of workers
* OS version and hardware type (output of ``uname -psr`` command)
* Number of tables, rounded to a power of two
* Total size of shards, rounded to a power of two
* Whether Citus is running in Docker or natively

Because Citus is an open-source PostgreSQL extension, the statistics reporting code is available for you to audit. See `statistics_collection.c <https://github.com/citusdata/citus/blob/master/src/backend/distributed/utils/statistics_collection.c>`_.
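
For a rough sense of what some of these numbers look like on your own cluster, you can query the Citus metadata directly. This is only an illustration; the reporting code may compute its figures differently:

.. code-block:: postgresql

   -- approximate counterparts of two reported statistics:
   -- number of workers and number of distributed tables
   SELECT count(*) AS worker_count FROM pg_dist_node WHERE noderole = 'primary';
   SELECT count(*) AS distributed_table_count FROM pg_dist_partition;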

How to Opt Out
--------------

If you wish to disable our anonymized cluster statistics gathering, set the following GUC in postgresql.conf on your coordinator node:

.. code-block:: ini

   citus.enable_statistics_collection = off

This disables all reporting and in fact all communication with Citus Data servers, including checks for whether a newer version of Citus is available.

If you have super-user SQL access you can also achieve this without needing to find and edit the configuration file. Just execute the following statement in psql:

.. code-block:: postgresql

   ALTER SYSTEM SET citus.enable_statistics_collection = 'off';

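``ALTER SYSTEM`` only writes the setting to ``postgresql.auto.conf``; reloading the configuration is typically enough to apply it without a restart. A quick sketch:

.. code-block:: postgresql

   SELECT pg_reload_conf();                    -- apply the changed setting
   SHOW citus.enable_statistics_collection;    -- verify it is now off
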
Since Docker users won't have the chance to edit this PostgreSQL variable before running the image, we added a Docker flag to disable reports.

.. code-block:: bash

   # Docker flag prevents reports
   docker run -e DISABLE_STATS_COLLECTION=true citusdata/citus:latest
12 changes: 6 additions & 6 deletions develop/migration_mt_django.rst
@@ -206,7 +206,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_manager
ADD CONSTRAINT myapp_manager_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -215,7 +215,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_project
ADD CONSTRAINT myapp_product_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -224,7 +224,7 @@ Django automatically creates a simple "id" primary key on models, so we will nee
ALTER TABLE myapp_task
ADD CONSTRAINT myapp_task_pkey
PRIMARY KEY (account_id, id)
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
@@ -433,9 +433,9 @@ the distribution column.
For ``ForeignKey`` and ``OneToOneField`` constraints, we have a few different cases:
- Foreign key (or One to One) between distributed tables, for which you should use the ``TenantForeignKey`` (or ``TenantOneToOneField``).
- Foreign key between a distributed table and a reference table, which don't require changed.
- Foreign key between a distributed table and a local table, which require to drop the constraint by using ``models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)``.
- Foreign keys (or One to One) between distributed tables, for which you should use the ``TenantForeignKey`` (or ``TenantOneToOneField``).
- Foreign keys between a distributed table and a reference table don't require a change.
- Foreign keys between a distributed table and a local table require dropping the constraint by using ``models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)``.
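
For intuition, a ``TenantForeignKey`` roughly corresponds to a composite foreign key that includes the distribution column, so both sides of the constraint stay co-located on the same shard. A hedged sketch in SQL, reusing the table names above (the ``project_id`` column and constraint name are assumptions):

.. code-block:: postgresql

   ALTER TABLE myapp_task
     ADD CONSTRAINT myapp_task_project_fkey
     FOREIGN KEY (account_id, project_id)
     REFERENCES myapp_project (account_id, id);
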
Finally your models should look like this:
4 changes: 0 additions & 4 deletions installation/multi_machine_debian.rst
@@ -113,7 +113,3 @@ At this step, you have completed the installation process and are ready to use y
::

sudo -i -u postgres psql

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/multi_machine_rhel.rst
@@ -119,7 +119,3 @@ At this step, you have completed the installation process and are ready to use y
::

sudo -i -u postgres psql

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_debian.rst
@@ -87,7 +87,3 @@ To verify that the installation has succeeded we check that the coordinator node
You should see a row for each worker node including the node name and port.

At this step, you have completed the installation process and are ready to use your Citus cluster. To help you get started, we have a :ref:`tutorial<multi_tenant_tutorial>` which has instructions on setting up a Citus cluster with sample data in minutes.

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_docker.rst
@@ -87,7 +87,3 @@ When you wish to stop the docker containers, use Docker Compose:
.. code-block:: bash

   COMPOSE_PROJECT_NAME=citus docker-compose down -v

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
4 changes: 0 additions & 4 deletions installation/single_machine_rhel.rst
@@ -86,7 +86,3 @@ To verify that the installation has succeeded we check that the coordinator node
You should see a row for each worker node including the node name and port.

At this step, you have completed the installation process and are ready to use your Citus cluster. To help you get started, we have a :ref:`tutorial<multi_tenant_tutorial>` which has instructions on setting up a Citus cluster with sample data in minutes.

.. note::

Please note that Citus reports anonymous information about your cluster to the Citus Data company servers. To learn more about what information is collected and how to opt out of it, see :ref:`phone_home`.
17 changes: 1 addition & 16 deletions reference/common_errors.rst
@@ -130,25 +130,10 @@ Cannot establish a new connection for placement *n*, since DML has been executed
ERROR: 25001: cannot establish a new connection for placement 314, since DML has been executed on a connection that is in use
LOCATION: FindPlacementListConnection, placement_connection.c:612

This is a current limitation. In a single transaction Citus does not support running insert/update statements with the :ref:`router_executor` that reference multiple shards, followed by a read query that consults both of the shards.
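
Concretely, a transaction shaped like this hypothetical example (table name and values invented for illustration) could hit the error on affected versions:

.. code-block:: postgresql

   BEGIN;
   -- two single-shard (router) writes that land on different shards
   INSERT INTO events (tenant_id, payload) VALUES (1, '{}');
   INSERT INTO events (tenant_id, payload) VALUES (2, '{}');
   -- a read in the same transaction that touches both of those shards
   SELECT count(*) FROM events WHERE tenant_id IN (1, 2);
   COMMIT;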

.. note::

A similar error also occurs (misleadingly) when the :ref:`create_distributed_table` function is executed on a table by a role other than the table's owner. See this `github discussion <https://github.com/citusdata/citus/issues/2094>`_ for details. To resolve this particular problem, identify the table's owner, switch roles, and try again.

.. code-block:: sql

   -- find the role
   SELECT tablename, tableowner FROM pg_tables;
   -- switch into it
   SET ROLE table_owner_name;

Also note that ``table_owner_name`` must have LOGIN permissions on the worker nodes.
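
To check the login requirement on a worker, inspect ``pg_roles`` there (``table_owner_name`` is the placeholder role from above):

.. code-block:: postgresql

   -- run on each worker node
   SELECT rolname, rolcanlogin FROM pg_roles WHERE rolname = 'table_owner_name';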

Resolution
~~~~~~~~~~

Consider moving the read query into a separate transaction.
:ref:`Upgrade <upgrading>` to Citus 8.3 or higher.

Could not connect to server: Cannot assign requested address
------------------------------------------------------------
2 changes: 1 addition & 1 deletion use_cases/multi_tenant.rst
@@ -467,7 +467,7 @@ To improve resource allocation and make guarantees of tenant QoS it is worthwhil

In our case, let's imagine that our old friend company id=5 is very large. We can isolate the data for this tenant in two steps. We'll present the commands here, and you can consult :ref:`tenant_isolation` to learn more about them.

First sequester the tenant's data into a bundle (called a shard) suitable to move. The CASCADE option also applies this change to the rest of our tables distributed by :code:`company_id`.
First isolate the tenant's data to a dedicated shard suitable to move. The CASCADE option also applies this change to the rest of our tables distributed by :code:`company_id`.

.. code-block:: sql
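
   -- The exact command is truncated in this view. A hedged sketch of the
   -- tenant-isolation step described above: isolate_tenant_to_new_shard()
   -- is the Citus function involved; the table name 'companies' is an assumption.
   SELECT isolate_tenant_to_new_shard('companies', 5, 'CASCADE');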
16 changes: 8 additions & 8 deletions use_cases/realtime_analytics.rst
@@ -298,16 +298,16 @@ to the query in our rollup function:
@@ -1,10 +1,12 @@
INSERT INTO http_request_1min (
site_id, ingest_time, request_count,
success_count, error_count, average_response_time_msec,
+ distinct_ip_addresses
success_count, error_count, average_response_time_msec
+ , distinct_ip_addresses
) SELECT
site_id,
minute,
COUNT(1) as request_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 1 ELSE 0 END) as success_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 0 ELSE 1 END) as error_count,
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec,
+ hll_add_agg(hll_hash_text(ip_address)) AS distinct_ip_addresses
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec
+ , hll_add_agg(hll_hash_text(ip_address)) AS distinct_ip_addresses
FROM http_request
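
Once the rollup stores HLL sketches, a dashboard can merge them to estimate distinct visitors. A hedged sketch, assuming the hll extension's ``hll_union_agg`` and ``hll_cardinality`` functions:

.. code-block:: postgresql

   -- estimated distinct IP addresses per site over the last hour
   SELECT site_id,
          hll_cardinality(hll_union_agg(distinct_ip_addresses)) AS distinct_ips
   FROM http_request_1min
   WHERE ingest_time > now() - interval '1 hour'
   GROUP BY site_id;
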
Dashboard queries are a little more complicated, you have to read out the distinct
@@ -362,17 +362,17 @@ Next, include it in the rollups by modifying the rollup function:
@@ -1,14 +1,19 @@
INSERT INTO http_request_1min (
site_id, ingest_time, request_count,
success_count, error_count, average_response_time_msec,
+ country_counters
success_count, error_count, average_response_time_msec
+ , country_counters
) SELECT
site_id,
minute,
COUNT(1) as request_count,
SUM(CASE WHEN (status_code between 200 and 299) THEN 1 ELSE 0 END) as success_c
SUM(CASE WHEN (status_code between 200 and 299) THEN 0 ELSE 1 END) as error_cou
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec,
SUM(response_time_msec) / COUNT(1) AS average_response_time_msec
- FROM http_request
+ jsonb_object_agg(request_country, country_count) AS country_counters
+ , jsonb_object_agg(request_country, country_count) AS country_counters
+ FROM (
+ SELECT *,
+ count(1) OVER (
6 changes: 3 additions & 3 deletions use_cases/timeseries.rst
@@ -29,7 +29,7 @@ Keep in mind that, in the wrong situation, reading all these partitions can hurt
Scaling Timeseries Data on Citus
--------------------------------

We can mix the single-node table partitioning techniques with Citus' distributed sharding to make a scalable time-series database. It's the best of both worlds. It's especially elegant atop the declarative table partitioning in Postgres 10.
We can mix the single-node table partitioning techniques with Citus' distributed sharding to make a scalable time-series database. It's the best of both worlds. It's especially elegant atop Postgres's declarative table partitioning.

.. image:: ../images/timeseries-sharding-and-partitioning.png
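
As a minimal sketch of the combination (table and column names here are hypothetical, not the guide's own example schema), you partition by time within each node and distribute by a device or tenant key across nodes:

.. code-block:: postgresql

   -- native range partitioning on the time column...
   CREATE TABLE events (
       device_id  bigint NOT NULL,
       event_time timestamptz NOT NULL,
       payload    jsonb
   ) PARTITION BY RANGE (event_time);

   -- ...combined with Citus hash distribution on the device column
   SELECT create_distributed_table('events', 'device_id');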

@@ -131,7 +131,7 @@ As time progresses, pg_partman will need to do some maintenance to create new pa
-- due to aggressive locks
SELECT partman.run_maintenance(p_analyze := false);
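
To confirm which partitions exist after maintenance runs, you can list the parent table's children from the system catalog (the parent name ``events`` is a hypothetical placeholder):

.. code-block:: postgresql

   SELECT inhrelid::regclass AS partition
   FROM pg_inherits
   WHERE inhparent = 'events'::regclass;
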
It's best to set up a periodic job to run the maintenance function. Pg_partman can be built with support for a background worker (BGW) process to do this. Or we can use another extension like `pg_cron <https://github.com/citusdata/pg_cron>`_:
It's best to set up a periodic job to run the maintenance function. Pg_partman can be built with support for a background worker process to do this. Or we can use another extension like `pg_cron <https://github.com/citusdata/pg_cron>`_:

.. code-block:: postgresql
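
   -- The pg_cron example is truncated in this view. A hedged sketch of a
   -- periodic maintenance job; the schedule string is an illustrative choice.
   SELECT cron.schedule('0 * * * *',
                        $$SELECT partman.run_maintenance(p_analyze := false)$$);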
@@ -154,4 +154,4 @@ Now whenever maintenance runs, partitions older than a month are automatically d

.. note::

Be aware that native partitioning in Postgres is still quite new and has a few quirks. For example, you cannot directly create an in index on a partitioned table. Instead, pg_partman lets you create a template table to define indexes for new partitions. Maintenance operations on partitioned tables will also acquire aggressive locks that can briefly stall queries. There is currently a lot of work going on within the postgres community to resolve these issues, so expect time partitioning in Postgres to only get better.
Be aware that native partitioning in Postgres is still quite new and has a few quirks. For example, you cannot directly create an index on a partitioned table. Instead, pg_partman lets you create a template table to define indexes for new partitions. Maintenance operations on partitioned tables will also acquire aggressive locks that can briefly stall queries. There is currently a lot of work going on within the postgres community to resolve these issues, so expect time partitioning in Postgres to only get better.
