Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support RETURNING in modification commands #242

Closed
citus-github-bot opened this issue Feb 4, 2016 · 22 comments
Closed

Support RETURNING in modification commands #242

citus-github-bot opened this issue Feb 4, 2016 · 22 comments

Comments

@citus-github-bot
Copy link

citus-github-bot commented Feb 4, 2016

Issue by digi604
Thursday Jul 23, 2015 at 15:23 GMT
Originally opened as citusdata/pg_shard#128


Any query using the RETURNING clause produces the following error:

NotSupportedError: cannot perform distributed planning for the given query
DETAIL:  RETURNING clauses are not supported in distributed queries.

Can you add support for RETURNING clauses?

@citus-github-bot
Copy link
Author

Comment by jasonmp85
Monday Jul 27, 2015 at 19:56 GMT


Hey @digi604: What's your use case? We didn't implement RETURNING to keep the feature profile abbreviated during the initial release, but with sufficient demand we could change that. 😃

@citus-github-bot
Copy link
Author

Comment by rodo
Sunday Aug 09, 2015 at 17:05 GMT


In ORM (I know it's bad, but sometimes we have to work with people who love them) like Django the SQL query include RETURNING to return the object's id to the ORM, maybe @digi604 uses an ORM.
I vote for RETURNING support in pg_shard 👍

@citus-github-bot
Copy link
Author

Comment by digi604
Tuesday Aug 11, 2015 at 07:52 GMT


i use peewee (orm) in a rather complex worker queue.

@ozgune ozgune added this to the 5.1 Release milestone Mar 16, 2016
@ozgune ozgune added the planned label Mar 16, 2016
@ozgune
Copy link
Contributor

ozgune commented Mar 16, 2016

This feature came up again. If the query executed on a single shard, how hard would it be to implement? Would we need to keep the returned values from INSERT statements on the master node?

ORMs like to have their INSERT statements to have RETURNING expressions:

returning id -- returning null
FEELINGS_DB.run(Sequel.lit("insert into observation_hash(id, created_at, server_id, data)
  values(?,?,?,?)",
  (feeling_id+i).to_s(16).rjust(32,'0'),
  Time.now,
  (server_id+i).to_s(16).rjust(32,'0'),
  data
))
# vs
FEELINGS_DB[:observation_hash] << {
  id: (feeling_id+i).to_s(16).rjust(32,'0'),
  created_at: Time.now,
  server_id: (server_id+i).to_s(16).rjust(32,'0'),
  data: dat

@ozgune
Copy link
Contributor

ozgune commented Mar 18, 2016

I just chatted with @anarazel on this. We have two ways to implement this:

  1. Execute the RETURNING statement on the master node: @anarazel notes that this approach is much easier to integrate into the PostgreSQL machinery. Andres, could you note this approach's benefits?
  2. Push the RETURNING statement to the worker node. In this approach, we'll push down the query and store results returned from the worker node in a tuple store. In this approach, we could also cover UPDATE statements that touch a single shard.

I'm also noting #213 and #240 as semi-related issues. At a higher level, we could evaluate expressions on the master node and route these requests, or evaluate them on the workers. This decision may also relate to our replication model.

@jasonmp85
Copy link
Contributor

Hey @ozgune

I don't understand what executing this on the master even means. RETURNING is part of an UPDATE, INSERT, or DELETE command that returns a particular column. It would make the most sense to push this down.

Approach two is what I was going to suggest.

@anarazel
Copy link
Contributor

@jasonmp85 The point of not pushing RETURNING down, at the very least for INSERT, is that it a) allows for easy and correct evaluation of now(), nextval(), etc. b) doesn't prevent going to a pipelined execution model c) is actually easier to implement given how RETURNING works in postgres.

@jasonmp85
Copy link
Contributor

Right, but what about UPDATE count = count + 1 RETURNING count? Isn't stuff like that a pretty common pattern, too? Or DELETE FROM accounts WHERE id = 5 RETURNING balance? That is more straightforward if you push the clause down, no?

@anarazel
Copy link
Contributor

On 2016-03-17 21:07:23 -0700, Jason Petersen wrote:

Right, but what about UPDATE count = count + 1 RETURNING count?
Isn't stuff like that a pretty common pattern, too? Or DELETE FROM accounts WHERE id = 5 RETURNING balance? That is more
straightforward if you push the clause down, no?

True. But those are much less common. The most common reason to use
RETURNING is for ORMs after an INSERT, to get the generated key et al.

@samay-sharma
Copy link
Contributor

Ran into this again during a customer engagement. It was again because their ORM was issuing INSERT queries which had a RETURNING clause.

@brettallred
Copy link

+1 - We are running into the same issue while evaluating CitusDB

@brianbroderick
Copy link

Without returning the generated primary key, how do you know what to put in a foreign key without adding the "returning" clause? Yes, I could insert a record then select that record to get the pk value, but that seems inefficient to me.

@ozgune
Copy link
Contributor

ozgune commented Apr 22, 2016

@anarazel and I talked about this last week; and I'm summarizing our meeting notes. @anarazel -- if I have inaccuracies in this description, could you point them out?

We talked about the architectural implications of executing RETURNING on the master or pushing it to the worker node:

(1) If we implement pipelined write statements on the master node: In this model, the client speaking to the master node supports pipelined execution. We then split into two alternatives:

(1.a) The master node maintains a persistent queue. When the client sends INSERT statements, we persist them in this queue -- we could use logical replication queues for this. We then immediately ACK the INSERT to the client. We then flush this queue to the worker nodes as they fill up.

This pipelined execution model is incompatible with doing INSERT .. RETURNING on the worker nodes.

Two questions that relate to this model are: (i) What happens if the INSERT statement violates a constraint, such as a unique constraint, on the worker node? How do we error out? (ii) How do we handle UPDATE or UPSERT statements in this model?

(1.b) The master node queues up INSERTs and then pushes them out in batches to worker nodes. Once we receive replies from the worker node, we ACK to the client. In this model, INSERT .. RETURNING could be executed on the master or worker node.

(2) In the v5.0 replication model, we have shard replicas living on separate machines. If we push down INSERT .. RETURNING, we need to pay extra effort into pushing it down to only one of the replicas.

Also, if we wanted to implement now(), nextval(), things get even more complicated -- we can't run now() on different replicas.

We're currently evaluating changing Citus' replication model. In this updated model, we will have one primary replica and multiple secondary replicas. If we go with the new replication model, INSERT .. RETURNING could be executed on the master or (primary) worker node.

(3) It's easier to implement these changes on the master node given the way RETURNING is implemented in Postgres. The primary note here was that if we pushed INSERT .. RETURNING to worker nodes, we may need to duplicate code. Two follow-up questions were:

  • How much code duplication is there for simple INSERT .. RETURNING on worker node?
  • How much time would it take to do this on the master vs. worker node?

We also talked about -- if we wanted to later support CTEs in RETURNING clauses, this logic would be easier to execute on the master node. I'll also look to upload an architectural diagram that shows these in writing.

(4) We also briefly touched upon UPDATE and DELETE statements. For these statements that have RETURNING clauses, we'd need to push them down to worker nodes.

@sumedhpathak sumedhpathak removed this from the 5.1 Release milestone May 2, 2016
@sumedhpathak
Copy link
Contributor

Removed 5.1 milestone from this issue.

@anarazel
Copy link
Contributor

anarazel commented May 3, 2016

@digi604 @brianbroderick @brettallred @rodo Do you need INSERT RETURNING or also UPDATE / DELETE / INSERT ON CONFLICT ... RETURNING? We're looking at implementing the feature, and your answer would help determine the shape of it.

@brianbroderick
Copy link

yeah, we use upserts all the time so having the "on conflict" clause is important to me.

@rodo
Copy link

rodo commented May 8, 2016

Mainly INSERT RETURNING for primary key with serial.

@ozgune ozgune added this to the 5.2 Release milestone May 14, 2016
@francisco1844
Copy link

When is 5.2 Release planned? since this feature was moved to 5.2 Release.

@ozgune
Copy link
Contributor

ozgune commented Jun 1, 2016

@francisco1844 -- we intend to officially release v5.2 end of July.

We expect this feature to go into the master branch earlier than that; and we'll keep this issue updated.

@ozgune
Copy link
Contributor

ozgune commented Jun 1, 2016

Two additional quick notes.

(1) ActiveRecord / Ruby only supports simple INSERT statements. RETURNING "id" is automatically added to all INSERT calls.

Apart from custom-written CTEs, RETURNING is not used in UPDATE or DELETE statements. It also looks like you can disable this behavior using the "insert_returning: false" config option. In that case it looks like it'll run a SELECT currval(..) on the sequence after the INSERT (typically within the same transaction).

(2) Sequel just inserts the ON CONFLICT clause if needed in the middle of the query, so it behaves the same as insert and uses RETURNING by default.

@anarazel
Copy link
Contributor

anarazel commented Jun 3, 2016

@ozgune @marcocitus @sumedhpathak Any opinion on whether it's ok to send the RETURNING to all nodes? My current implementation does so, and just consumes (ignores) the results for RETURNING for all but the first. I tend to think that's ok.

@marcocitus
Copy link
Member

My current implementation does so, and just consumes (ignores) the results for RETURNING for all but the first.

I think that's fine.

anarazel added a commit that referenced this issue Jun 3, 2016
@lfittl lfittl mentioned this issue Jun 11, 2016
11 tasks
anarazel added a commit that referenced this issue Jun 22, 2016
anarazel added a commit that referenced this issue Jul 1, 2016
DimCitus pushed a commit that referenced this issue Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests