
Propagate DDL commands to workers through 2PC #513

Closed
lithp opened this issue May 10, 2016 · 8 comments

@lithp (Contributor) commented May 10, 2016

(Updating this issue to track DDL propagation through 2PC.)

Citus 5.0 propagates ALTER TABLE and CREATE INDEX commands to worker nodes. We implemented this feature using Citus' current replication model, and decided to switch to 2PC (or pg_paxos) once the metadata propagation changes are in place.

One drawback of the current approach is that when ALTER TABLE ... SET NOT NULL fails, the affected shard placements are marked inactive:

postgres=# ALTER TABLE ddl_testing ALTER COLUMN address SET NOT NULL;
WARNING:  could not receive query results from [snip]
DETAIL:  Client error: column "address" contains null values
WARNING:  could not apply command on shard 102025 on [snip]
DETAIL:  Shard placement will be marked as inactive.
ALTER TABLE

If the problem was caused by the user and not by a failed node, we probably shouldn't mark the placements inactive. Instead, we should error out, as we do when you try to create a duplicate index.
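
For reference, the placements that such a failure leaves behind can be inspected in the pg_dist_shard_placement catalog, where a shardstate of 3 denotes an inactive placement; a minimal check, reusing the shard ID from the warning above:

postgres=# SELECT shardid, shardstate, nodename, nodeport FROM pg_dist_shard_placement WHERE shardid = 102025;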

@lithp added the bug label on May 10, 2016
@ozgune (Contributor) commented May 10, 2016

@lithp -- we're planning on improving DDL propagation logic to make sure that we don't have partial failures.

Do you think #25 or #265 captures the issue mentioned here?

@lithp (Contributor, Author) commented May 11, 2016

Yep, #25 (better DDL propagation) relates to this issue. Added a comment over there.

@ozgune changed the title from "When ALTER TABLE ... SET NOT NULL fails it marks shards inactive" to "Propagate DDL commands to workers through 2PC" on Jun 1, 2016
@ozgune added this to the 5.2 Release milestone on Jun 1, 2016
@ozgune (Contributor) commented Jun 1, 2016

This issue is also related to #19. We're prioritizing this issue assuming that we can use shared infrastructure for #19 and this one (#513).

@marcocitus (Member) commented Jun 1, 2016

I think the existing 2PC infrastructure for COPY and master_modify_multiple_shards (see multi_transaction.c) is suitable for this. Metadata propagation for masterless will also reuse some of that infrastructure.
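
As a point of reference, master_modify_multiple_shards is one of the callers of that infrastructure; a minimal sketch of driving it with 2PC enabled (the table and predicate reuse the ddl_testing example above, and the GUC value is only illustrative):

postgres=# SET citus.multi_shard_commit_protocol TO '2pc';
postgres=# SELECT master_modify_multiple_shards('DELETE FROM ddl_testing WHERE address IS NULL');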

@metdos (Contributor) commented Jun 2, 2016

Also related to the inactive shards problem in #480.

@ozgune (Contributor) commented Jun 9, 2016

I'm copy/pasting an internal email thread as additional context to this issue.

Marco: Speculation: A DDL command might have been blocked on something and then failed when the node crashed, which currently causes Citus to mark the placement as inactive, since it's no longer in sync with the other shards and an operator needs to step in to apply the command manually. In 5.2, we'll have DDL via 2PC, which partially addresses that problem.

Lukas: You are right - I was indeed creating an index around that time. It might have been a DDL command that was running, not (just) COPY.

I'm not sure about the logs, since the server was replaced after the crash, and we don't store historic per-node logs elsewhere right now (AFAIK; Daniel can confirm).

@aamederen (Contributor) commented Jun 17, 2016

After discussions with @marcocitus and @sumedhpathak, we have decided on our approach to this issue.

The resolution of this issue could in the future also address other DDL-related issues such as #480, #131, #356, #357, #265 and #192.

In the solution, we will focus on 2PC while also handling #480.

aamederen added a commit that referenced this issue Jun 22, 2016
Fixes #513
Fixes #480

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state.

The workflow of the successful case is this:
1. Open individual connections to all shard placements
2. Send `BEGIN; SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but before the
`PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.
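
For a single placement, the successful workflow above corresponds roughly to the following command sequence on one worker connection; a sketch only, with an illustrative shard ID, DDL command, and transaction identifier:

-- sent over one worker connection; shard ID, DDL and transaction name are illustrative
BEGIN;
SELECT worker_apply_shard_ddl_command(102025, 'ALTER TABLE ddl_testing ADD COLUMN city text');
PREPARE TRANSACTION 'citus_ddl_102025';
-- once every placement has prepared successfully, the coordinator finishes with:
COMMIT PREPARED 'citus_ddl_102025';
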
aamederen added a commit that referenced this issue Jun 29, 2016
Fixes #513
Fixes #480

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but before the
`PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.
aamederen added a commit that referenced this issue Jul 13, 2016
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but before the
`PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
@marcocitus (Member) commented Jul 15, 2016

We are making two changes to the way DDL commands are handled.

The first is that we will prevent DDL in a transaction block, since that would likely lead to various issues in combination with other commands. Currently we allow DDL commands in transaction blocks, though actually doing so would be very dangerous. Properly supporting multi-statement DDL transactions is probably a one-week task.

The second is that DDL propagation will error out if max_prepared_transactions is not set on the workers, since it always uses 2PC. We initially considered reusing multi_shard_commit_protocol, which defaults to 1PC, but recovery from commit failures is difficult in that case. In MX we have a function to recover from 2PC failures, which would take 2-3 days to integrate.

This means you cannot run CREATE INDEX on a distributed table without setting max_prepared_transactions on the workers to a value greater than 0.

Any strong objections to these changes?
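
For anyone trying this out, max_prepared_transactions is a standard PostgreSQL setting that has to be raised on every worker; a minimal sketch (the value 100 is only an example, and the setting only takes effect after a server restart):

-- on each worker node
ALTER SYSTEM SET max_prepared_transactions = 100;
-- restart the worker, then verify:
SHOW max_prepared_transactions;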

aamederen added a commit that referenced this issue Jul 19, 2016
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
committed manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
the `citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but before the
`PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
DimCitus pushed a commit that referenced this issue Jan 10, 2018
Fixes #513

This change modifies the DDL Propagation logic so that DDL queries
are propagated via 2-Phase Commit protocol. This way, failures during
the execution of distributed DDL commands will not leave the table in
an intermediate state and the pending prepared transactions can be
committed manually.

DDL commands are not allowed inside other transaction blocks or functions.

DDL commands are performed with 2PC regardless of the value of
the `citus.multi_shard_commit_protocol` parameter.

The workflow of the successful case is this:
1. Open individual connections to all shard placements and send `BEGIN`
2. Send `SELECT worker_apply_shard_ddl_command(<shardId>, <DDL Command>)`
to all connections, one by one, in a serial manner.
3. Send `PREPARE TRANSACTION <transaction_id>` to all connections.
4. Send `COMMIT PREPARED <transaction_id>` to all connections.

Failure cases:
- If a worker problem occurs before all DDL commands have been sent, then
all changes are rolled back.
- If a worker problem occurs after all DDL commands are sent but before the
`PREPARE TRANSACTION` commands have finished, then all changes are rolled back.
However, if a worker node has failed, the prepared transactions on that worker
should be rolled back manually.
- If a worker problem occurs while `COMMIT PREPARED` statements are being sent,
then the prepared transactions on the failed workers should be committed manually.
- If the master fails before the first `PREPARE TRANSACTION` is sent, then nothing is
changed on the workers.
- If the master fails while `PREPARE TRANSACTION` commands are being sent, then the
prepared transactions on the workers should be rolled back manually.
- If the master fails while `COMMIT PREPARED` or `ROLLBACK PREPARED` commands are being
sent, then the remaining prepared transactions on the workers should be handled manually.

This change also helps with #480, since failed DDL changes no longer mark
failed placements as inactive.
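
Since several of the failure cases above end with prepared transactions that must be resolved by hand, the usual tools are PostgreSQL's pg_prepared_xacts view together with COMMIT PREPARED / ROLLBACK PREPARED; a minimal recovery sketch, run directly on an affected worker (the transaction name is illustrative):

-- list prepared transactions left behind on this worker
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-- finish one that the master already committed elsewhere:
COMMIT PREPARED 'citus_ddl_102025';
-- or undo one that should be rolled back:
ROLLBACK PREPARED 'citus_ddl_102025';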