
jobs, *: refactor planning vs exec APIs #37691

Closed
dt opened this issue May 21, 2019 · 3 comments
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@dt
Member

dt commented May 21, 2019

The plan hooks generally get a limited version of the planner (planHookState), some parts of which use the planner's txn.

They use this planner as part of planning the job they ultimately start, e.g. resolving the table name or checking user privileges -- the details that go into determining if, how, and on what the job will run.

Unfortunately, the planner usually assumes it is used for the entirety of a normal SQL statement, so its embedded txn is usually not committed until after the plan has run. This means that when the plan hooks start and run a job based on calls to this planner, they are running a job -- with side effects -- based on reads made with an uncommitted txn.

However, some jobs support being planned in a transaction and then run after it commits, e.g. using DETACHED. In these cases, it is vital that they limit all their planning to the planner txn, to avoid leaking side effects.

Thus, all job-planning statements (BACKUP, IMPORT, RESTORE, etc.) should be audited to ensure they only use the plan hook (planner) txn.

Then, for those jobs which wish to execute during statement execution in order to return results (i.e. non-detached), they should a) assert that the planner txn is implicit and then b) commit it inline, a la 1PC INSERTs, before running the job.

Separately, job execution currently gets the same PlanHookState interface. However, since this closes over a planner, it also closes over the planner's transaction field, and because execution happens outside a statement, that field is nil. This has led to crashes when methods that reference that field are called, often indirectly via very non-obvious call trees.

Instead, job execution (i.e. Resume()) should be passed an entirely different interface from planning -- one that is not a planner at all, and in which any txn field is explicitly allowed to be nil.
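A minimal Go sketch of what the split could look like. All names here (Txn, PlanHookState, JobExecContext, ResolveTable, execCtx) are illustrative stand-ins, not the real cockroach APIs: the planning-side interface always carries the statement's txn, while the execution-side interface is not a planner at all, so Resume() has nothing nil to dereference:

```go
package main

import "fmt"

// Txn is a stand-in for *kv.Txn; it is nil outside statement execution.
type Txn struct{ id int }

// PlanHookState (illustrative): the planning-side interface always
// carries the statement's txn, so plan hooks read at the right snapshot.
type PlanHookState interface {
	Txn() *Txn // never nil during planning
	ResolveTable(name string) (string, error)
}

// JobExecContext (illustrative): the execution-side interface is not a
// planner, so job execution cannot accidentally touch a planner txn.
type JobExecContext interface {
	ExecutorName() string
}

type planner struct{ txn *Txn }

func (p *planner) Txn() *Txn { return p.txn }

func (p *planner) ResolveTable(name string) (string, error) {
	if p.txn == nil {
		// This is the crash the issue describes, made into a clean error.
		return "", fmt.Errorf("ResolveTable called outside statement execution")
	}
	return "db.public." + name, nil
}

type execCtx struct{ name string }

func (e *execCtx) ExecutorName() string { return e.name }

func main() {
	// Planning: all reads go through the planner txn.
	var ph PlanHookState = &planner{txn: &Txn{id: 1}}
	tbl, _ := ph.ResolveTable("t")
	fmt.Println(tbl) // db.public.t

	// Execution: Resume() receives only the narrow context.
	var ec JobExecContext = &execCtx{name: "node-1"}
	fmt.Println(ec.ExecutorName())
}
```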

@awoods187 awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jun 21, 2019
@dt
Member Author

dt commented Oct 14, 2019

Part of the problem here is that the conn executor has special code to execute "async" schema changes before sending the reply, making them appear synchronous:
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/conn_executor.go#L1996

This doesn't exist for jobs, so there is currently no way to a) commit a job record and then b) wait for its results -- the job can't run until its creation commits, but there is no way to block on its execution. Today, jobs commit, run, and block all during the statement that created them, but that is a gross misuse of the planner txn -- which won't commit until later -- to plan the job.
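The missing commit-then-wait flow could be sketched roughly as below, with hypothetical names (registry, createJob, resume are not the real jobs package API): the statement first commits the job record, then blocks on a completion signal, so the job only runs after its creation is durable:

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a job registry that can hand
// out a "done" signal per job.
type registry struct {
	mu   sync.Mutex
	done map[int]chan struct{}
}

func newRegistry() *registry { return &registry{done: map[int]chan struct{}{}} }

// createJob stands in for "a) commit the job record"; it returns a
// channel that closes when the job finishes, so the statement can block.
func (r *registry) createJob(id int) <-chan struct{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan struct{})
	r.done[id] = ch
	return ch
}

// resume stands in for the job runner picking up a committed job record.
func (r *registry) resume(id int) {
	r.mu.Lock()
	ch := r.done[id]
	r.mu.Unlock()
	// ... run the job ...
	close(ch) // signal completion
}

func main() {
	r := newRegistry()
	done := r.createJob(42) // a) commit the job record
	go r.resume(42)         // the job may only run after that commit
	<-done                  // b) block on its execution
	fmt.Println("job 42 finished")
}
```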

spaskob pushed a commit to spaskob/cockroach that referenced this issue Dec 6, 2019
It's useful to be able to create jobs in transactions. Obviously
they would need to execute after the transaction commits. We add
a new function ScheduleJob on the planner that allows that. The
main use case is converting schema changes to jobs. We sweep under
the rug the complicated semantics of handling any failures of these
jobs. This will be addressed in follow up PR(s). This PR handles the
plumbing and this function is not for production use yet.

Touches cockroachdb#37691, cockroachdb#42061.

Release note: none.
craig bot pushed a commit that referenced this issue Dec 18, 2019
42689: jobs: schedule a job in a transaction to execute after commit r=spaskob a=spaskob

jobs: schedule a job in a transaction to execute after commit

It's useful to be able to create jobs in transactions. Obviously
they would need to execute after the transaction commits. We add
a new function QueueJob on the planner that allows that. The
main use case is converting schema changes to jobs. We sweep under
the rug the complicated semantics of handling any failures of these
jobs. This will be addressed in follow up PR(s). This PR handles the
plumbing and this function is not for production use yet.

Touches #37691, #42061.

Release note: none.

43136: storage: remove getQuorumIndex from computeTruncateDecision r=bdarnell a=irfansharif

Some lineage here: `getQuorumIndex` was initially `getBehindIndex`, and was changed to `getQuorumIndex` to incorporate pending snapshots (561398f). Prior to this, `computeTruncateDecision` started off using raft.Status.Commit. As of (b64bdc9), we refactored log-truncation decision making and pulled the pending-snapshot logic back out of `getQuorumIndex`. `getQuorumIndex` was then far too similar to raft.Status.Commit, but not quite the same. Given that we already use raft.Status.Progress in `computeTruncateDecision` to track lagging yet recently active followers, we can get rid of `getQuorumIndex` altogether to simplify things.



Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@spaskob
Contributor

spaskob commented Feb 6, 2020

@ajwerner pinging you on this as it may be related to #44739

@spaskob spaskob assigned dt and unassigned spaskob Mar 3, 2020
@dt dt removed their assignment Aug 20, 2020
@dt dt changed the title from "backupccl,importccl: audit and fix txn use during planning" to "jobs, *: refactor planning vs exec APIs" Sep 17, 2020
@dt dt closed this as completed Nov 18, 2020
@dt
Member Author

dt commented Nov 18, 2020

#55506
