Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: disable DistSQL in server's InternalExecutor until migrations complete #51053

Closed
wants to merge 1 commit into from

Conversation

asubiotto
Copy link
Contributor

DistSQL is fragile on node startup (see #44101) and is therefore disabled in
the migration manager's internal executor. However, migrations sometimes have
side effects and sometimes end up using the SQL server's internal executor
which had DistSQL enabled.

This commit disabled DistSQL for this internal executor until the migrations
complete to avoid flakiness.

Release note: None (testing flake fix)

Fixes #50000

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -610,6 +610,23 @@ func (s *sqlServer) start(
// present situation).
DistSQLMode: sessiondata.DistSQLOff,
})
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'd get this of the migrationsExecutor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this comment.

// The server's internalExecutor might be used as a side effect of
// migrations, so disable distribution for the same reason as above until
// all migrations have been run.
sessionDataCopy, sessionDataWasNil := s.internalExecutor.GetSessionData()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the results of GetSessionData() vary at all? If not, let's just assert that they are what they should be and don't bother keeping track of the old settings. In fact, I wouldn't even call GetSessionData().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so since this is all in the startup path. Removed GetSessionData.

//
// SetSessionData cannot be called concurently with query execution.
func (ie *InternalExecutor) SetSessionData(sessionData *sessiondata.SessionData) {
ie.s.populateMinimalSessionData(sessionData)
if sessionData != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this guy should accept a nil. I hope this becomes moot if it's true that GetSessionData() always returns nil in Server.Start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@asubiotto
Copy link
Contributor Author

Look's like the internal executor is already in use at this point (testrace failure). I added a mutex around sessionData which I think is ok because it's not a hot path.

Something I'm confused about is how exactly an internal executor uses DistSQL? It doesn't look like anywhere sets an explicit sessiondata with distsql enabled. An empty sessiondata is equivalent to having distsql turned off right?

… complete

DistSQL is fragile on node startup (see cockroachdb#44101) and is therefore disabled in
the migration manager's internal executor. However, migrations sometimes have
side effects and sometimes end up using the SQL server's internal executor
which had DistSQL enabled.

This commit disabled DistSQL for this internal executor until the migrations
complete to avoid flakiness.

Release note: None (testing flake fix)
@asubiotto
Copy link
Contributor Author

Planning to add a test to this.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I'm confused about is how exactly an internal executor uses DistSQL? It doesn't look like anywhere sets an explicit sessiondata with distsql enabled. An empty sessiondata is equivalent to having distsql turned off right?

Good question. I've just tested it and the following results in off.

func TestXXX(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer log.Scope(t).Close(t)

	ctx := context.Background()
	params, _ := tests.CreateTestServerParams()
	s, _, _ := serverutils.StartServer(t, params)
	defer s.Stopper().Stop(ctx)

	ie := s.InternalExecutor().(*sql.InternalExecutor)
	row, err := ie.QueryRowEx(ctx, "test", nil, /* txn */
		sqlbase.InternalExecutorSessionDataOverride{User: security.RootUser},
		"SHOW DISTSQL")
	if err != nil {
		t.Fatal(err)
	}
	if len(row) != 1 {
		t.Fatalf("expected 1 col, got: %d", len(row))
	}
	r, ok := row[0].(*tree.DString)
	log.Infof(ctx, "!!! ok: %t. r: %s", ok, r)
}

I think before #47019 maybe the session vars for an internal executor were initialized from the GlobalsDefaults, but not now (and I don't think that was generally the case then either).

Let's figure this out. Can you detail what seems to happen in #50000 ?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @asubiotto)


pkg/server/server_sql.go, line 613 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I'm not sure I understand this comment.

You can just have migMgr use s.internalExecutor, can you not? There's no longer a point to migrationsExecutor, is there?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have said that uses of the internal executor from normal user sessions inherit the parent session's data through a SetSessionData() - which does get its original values from those GlobalDefaults, but the migrations start with empty data, so there should be nothing for their children to inherit I would think.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @asubiotto)

@asubiotto
Copy link
Contributor Author

Won't be able to get to this until tomorrow. In #50000, some migration calls GetAllRoles which uses the executor in the planner, which is the internalExecutor in this PR. This was the comment on the original issue: #47024 (comment)

@knz
Copy link
Contributor

knz commented Jul 21, 2020

Is there a way to move forward on this, viz failing roachtests.

@asubiotto
Copy link
Contributor Author

Checked the distsql version on a manual run forcing migrations and it seems that distsql is off so don't think this PR would change anything. Guess it's back to the drawing board. Recent runs of acceptance/version-upgrade with the no inbound stream connection don't seem to have artifacts (e.g. https://teamcity.cockroachdb.com/viewLog.html?buildId=2119855&tab=buildLog). @knz do you by any chance have a link to one that does? In the meantime will roachprod-stress with some more instrumentation.

@knz
Copy link
Contributor

knz commented Aug 6, 2020

@knz do you by any chance have a link to one that does?

not at this time no unfortunately. But i'll be on the lookout.

In the meantime will roachprod-stress with some more instrumentation.

Yes that's what I'd do too.

@asubiotto
Copy link
Contributor Author

Superseded by #52624

@asubiotto asubiotto closed this Aug 11, 2020
@asubiotto asubiotto deleted the avup branch August 11, 2020 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: acceptance/version-upgrade failed because distsql is unexpectedly used
4 participants