Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure Truncate on distributed tables work as expected #86

Closed
Jhanbanan opened this issue Feb 2, 2016 · 22 comments
Closed

Make sure Truncate on distributed tables work as expected #86

Jhanbanan opened this issue Feb 2, 2016 · 22 comments

Comments

@Jhanbanan
Copy link
Contributor

Jhanbanan commented Feb 2, 2016

Make sure Truncate on distributed tables work as expected. Customer report:

postgres=# select count(*) from Merchandise; 
count 
--------
305284
(1 row)

postgres=# truncate table Merchandise; 
TRUNCATE TABLE
postgres=# select count(*) from Merchandise;
count
--------
305284
(1 row)
@ozgune ozgune changed the title Make sure Truncate on distributed tables work as expected. Customer report: postgres=# select count(*) from Merchandise; count -------- 305284 (1 row) postgres=# truncate table Merchandise; TRUNCATE TABLE postgres=# select count(*) from Merchandise; count -------- 305284 (1 row) Make sure Truncate on distributed tables work as expected Feb 10, 2016
@ozgune
Copy link
Contributor

ozgune commented Feb 10, 2016

We recently observed this issue again when interacting with a user (+1).

@marcocitus also shared a quick script to run TRUNCATE in parallel:

$ psql -tA -F" " -c "SELECT nodename, nodeport, logicalrelid::regclass||'_'||shardid FROM pg_dist_shard JOIN pg_dist_shard_placement USING (shardid) WHERE logicalrelid = 'customer_reviews'::regclass" | xargs -n 3 -P64 sh -c "psql -h $0 -p $1 -c "TRUNCATE $2""

@mtuncer
Copy link
Member

mtuncer commented Mar 2, 2016

Please make sure truncate table is also handled for distributed cstore_fdw tables.

@ozgune
Copy link
Contributor

ozgune commented Mar 4, 2016

A good part of the confusion with TRUNCATE is that we let the command run without any errors, but don't execute anything behind the covers.

@samay-sharma, how hard is it to plug in TRUNCATE to your DDL propagation changes? If it's hard, we could at least emit an error message on this command.

@samay-sharma
Copy link
Contributor

I don't think it should be difficult to plug in TRUNCATE and get it propagated. The process is:

  • capturing the TRUNCATE hook
  • getting the table from the parsed statement
  • checking if the table is distributed
  • calling ExecuteDistributedDDLCommand with the command.

This is easy because we just propagate the text commands which come in.

One thing I'd like to note is that there are issues with propagating the text commands as it is which are discussed in #356 . So, the future of DDL propagation should be by doing DDL deparsing to generate the statement from the parse tree before propagating it. The more commands we propagate, the more we'll need to write deparsing code for (if Postgres DDL deparsing changes are not enough).

@marcocitus
Copy link
Member

For append-partitioned tables, you probably want to drop all shards, which you could even do from a truncate trigger.

@anarazel
Copy link
Contributor

anarazel commented Mar 4, 2016 via email

@lithp
Copy link
Contributor

lithp commented Apr 26, 2016

This just came up again in engage@. If we aren't going to support it we should add an error message.

@ozgune
Copy link
Contributor

ozgune commented May 13, 2016

@lithp -- this fix probably won't make it to v5.2. Could you open an issue to add an error message for TRUNCATE and mention this issue in there for now?

@marcocitus
Copy link
Member

marcocitus commented May 27, 2016

Just making note:

@aamederen's delete function works pretty well:
SELECT master_modify_multiple_shards('DELETE FROM github_events');

I think it would be a ~1 day task to add proper TRUNCATE support to master_modify_multiple_shards. Additionally, we could automatically add a TRUNCATE trigger to distributed tables that calls master_modify_multiple_shards function for hash/range-partitioned tables and master_drop_all_shards for append-partitioned tables.

@ozgune
Copy link
Contributor

ozgune commented Jul 7, 2016

+1 from prospective user. This issue came up again in a conversation today.

The user ran TRUNCATE and it just worked. They then loaded in more data and were confused by the results. They said that finding a workaround was pretty simple once they understood what was going on.

@ozgune
Copy link
Contributor

ozgune commented Jul 22, 2016

I'm noting #209 as a related issue.

@ozgune
Copy link
Contributor

ozgune commented Jul 25, 2016

I'm reading @marcocitus' comment about providing a quick fix using master_modify_multiple_shards.

Based on that, I'm marking this issue as ready for v5.3, rather than quickly erroring out as described in #540.

@ozgune ozgune added this to the 5.3 Release milestone Jul 25, 2016
@mtuncer mtuncer self-assigned this Aug 2, 2016
@mtuncer
Copy link
Member

mtuncer commented Aug 8, 2016

Here is the latest status on this issue.

We used truncate trigger on a relation to detect relation being truncated. I implemented a trigger
citus_truncate_trigger() to get fired upon table truncating.

Our interpretation of truncate differs on distribution method. We treat truncate as drop all shards when we have append distributed tables. The UDF master_drop_all_shards is called to drop the shards. When truncate is issued on hash/range distributed tables, we use master_modify_multiple_shards to truncate all shards.

However, this method has a down side, postgresql does not support truncate on foreign tables, hence no truncate trigger.

We have two (may be three) options here
1 - notice/error for truncate if it is issued on a distributed foreign table (i.e. cstore_fdw) to let caller know about it is not supported.

2 - implement truncate through process utility to support foreign tables

3 - do nothing, and let it silently die. Any processing is done by the foreign server process utility hook.

I think (3) is definitely out of question, perhaps we could do (1) .

@lithp
Copy link
Contributor

lithp commented Aug 10, 2016

I think if Postgresql doesn't even support TRUNCATE on foreign tables we won't get in any trouble for not supporting it either.

@mtuncer
Copy link
Member

mtuncer commented Aug 10, 2016

what about distributed cstore_fdw tables ?

@marcocitus
Copy link
Member

Since cstore_fdw is itself an exception, would it be worth making an exception for distributed cstore_fdw tables and handle those via the utility hook and others via a trigger?

@anarazel
Copy link
Contributor

On 2016-08-10 02:26:53 -0700, Marco Slot wrote:

Since cstore_fdw is itself an exception, would it be worth making an exception for distributed cstore_fdw tables and handle those via the utility hook and others via a trigger?

Is anybody really using distributed cstore tables, without worker side partitioning?

@mtuncer
Copy link
Member

mtuncer commented Aug 16, 2016

We mention using cstore_fdw to prospective users from time to time. We don't know for sure.

@anarazel
Copy link
Contributor

On 2016-08-16 09:41:43 -0700, Murat Tuncer wrote:

We mention using cstore_fdw to prospective users from time to time. We don't know for sure.

But isn't that usually cstore being used on the worker in a partitioned
manner anyway?

@mtuncer
Copy link
Member

mtuncer commented Aug 22, 2016

This is close to final state
1 - truncate on distributed regular tables will be supported using truncate handler
2 - truncate on distributed foreign tables will NOT be supported. Command will fail with error.

@marcocitus, @anarazel Any objections ?

@marcocitus
Copy link
Member

@mtuncer fine with that

@ozgune
Copy link
Contributor

ozgune commented Sep 16, 2016

@jasonmp85 / @mtuncer -- This issue recently came up. How long do you think merging #721 will take?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants