Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parallel and volatile categories to PG functions #181

Closed
Algunenano opened this issue Oct 27, 2017 · 4 comments
Closed

Add parallel and volatile categories to PG functions #181

Algunenano opened this issue Oct 27, 2017 · 4 comments

Comments

@Algunenano
Copy link
Contributor

As CartoDB/cartodb-postgresql#300, with pg9.6+ there is the option to mark functions as PARALLEL (different levels) so the analyzer can decide to run the query in several processes simultaneously.
In a similar fashion, marking the VOLATILE category enables optimizations to avoid re-running queries (or some parts of a query).

Should maintain compatibility with previous PostgreSQL releases (at least for now).

@rafatower
Copy link
Contributor

Referencing CartoDB/cartodb-postgresql#300 to give a bit more of context.

cc'ing @andy-esch and @jgoizueta for awareness

@rafatower rafatower removed their assignment Nov 8, 2017
@andy-esch
Copy link
Contributor

Thanks @rafatower. From Paul's comment in CartoDB/cartodb-postgresql#300:

Almost everything can be marked parallel. If it doesn't read from (or write to) a table, it can be marked.

Some of the PL/Python functions internally use plpy.execute('select ....') to gather data from tables. For example, CDB_AreasOfInterestLocal relies on this here. Based on that comment, are they still eligible to be marked parallel?

@Algunenano
Copy link
Contributor Author

Algunenano commented Nov 8, 2017

Technically speaking, reading shouldn't be an issue as long as you are sure you are only reading. For example, in the PR (#183) I've marked for now a lot of python functions as unsafe because they execute an external subquery that we don't know anything about.
For example, you could pass CDB_AreasOfInterestGlobal something like this as the first parameter:

WITH t AS (
    UPDATE products SET price = price * 1.05
    RETURNING *
)
SELECT * FROM t;

This is not parallelizable (and not STABLE either). One possible solution is to also provide a signature that receives directly the data instead of the query; another one is to blindly trust the user and hope it sends a PARALLEL SAFE subquery.

@Algunenano
Copy link
Contributor Author

Deploying 0.6.1 to test in Staging. There was a couple of issues with the servers (https://github.com/CartoDB/cartodb-platform/issues/3737) but I've updated a single user and it seems to be working fine:

cartodb_staging_user_9824ba31-2cbc-4be5-b723-9bde90d5b883_db=# \df+ cdb_crankshaft.cdb_crankshaft_version
                                                                                               List of functions
     Schema     |          Name          | Result data type | Argument data types |  Type  | Volatility | Parallel |  Owner   | Security | Access privileges | Language |       Source code       | Description 
----------------+------------------------+------------------+---------------------+--------+------------+----------+----------+----------+-------------------+----------+-------------------------+-------------
 cdb_crankshaft | cdb_crankshaft_version | text             |                     | normal | immutable  | safe     | postgres | invoker  |                   | sql      |                        +| 
                |                        |                  |                     |        |            |          |          |          |                   |          |   SELECT '0.6.1'::text;+| 
                |                        |                  |                     |        |            |          |          |          |                   |          |                         | 
(1 row)

cartodb_staging_user_9824ba31-2cbc-4be5-b723-9bde90d5b883_db=# Select cdb_crankshaft.cdb_crankshaft_version();
 cdb_crankshaft_version 
------------------------
 0.6.1
(1 row

Once the issue is resolved I'll update all users in staging and, if no issues appear, deploy to production.

cc/ @rafatower

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants