
sql: audit all usages of QueryEx to use iterator pattern #60693

Merged: 1 commit merged into cockroachdb:master on Feb 26, 2021

Conversation

@yuzefovich yuzefovich commented Feb 17, 2021

This commit audits the usage of the QueryEx method of the internal
executor in the following manner:

  • if the caller only needed to execute the statement, ExecEx is now
    used
  • if the query can return at most one row, then QueryRowEx is now used
  • if the caller can be refactored to use the iterator pattern, that
    refactoring has been done.

As a result, almost all usages have been refactored (most notably, the
virtual crdb_internal.jobs table now uses the iterator pattern, thus
alleviating OOM concerns; the ad-hoc memory accounting logic has been
removed).

QueryEx has been renamed to QueryBufferedEx to highlight that full
buffering occurs, and it has been added to the sqlutil.InternalExecutor
interface. The method is now used in only two places.
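For illustration, here is a minimal sketch of the iterator pattern that callers are moved to, written against a placeholder interface (the real one is sqlutil.InternalRows; the method names and signatures below are assumptions for the sketch, not the actual API):

```go
package example

import "context"

// rows is a placeholder standing in for sqlutil.InternalRows.
type rows interface {
	// Next advances to the next row, returning false once the iterator is exhausted.
	Next(ctx context.Context) (bool, error)
	// Cur returns the current row (tree.Datums in the real code).
	Cur() []string
	// Close releases the iterator's resources.
	Close() error
}

// consume processes the result set one row at a time, so the full result
// never has to be buffered in memory.
func consume(ctx context.Context, it rows) (retErr error) {
	// Make sure the iterator is closed even if we return early.
	defer func() {
		if err := it.Close(); retErr == nil && err != nil {
			retErr = err
		}
	}()
	for {
		ok, err := it.Next(ctx)
		if err != nil {
			return err
		}
		if !ok {
			return nil // no more rows
		}
		_ = it.Cur() // process the current row here
	}
}
```

Callers that only need to execute a statement or read a single row keep using ExecEx / QueryRowEx instead, per the list above.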

Addresses: #48595.

Release justification: bug fix.

Release note (bug fix): The crdb_internal.jobs virtual table is now
populated in a paginated fashion, thus alleviating memory-related
concerns where previously we could encounter an OOM crash.

@yuzefovich yuzefovich requested review from spaskob, ajwerner and a team February 17, 2021 18:58
@yuzefovich yuzefovich requested a review from a team as a code owner February 17, 2021 18:58
@cockroach-teamcity

This change is Reviewable

@yuzefovich yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @spaskob)


pkg/server/diagnostics/reporter.go, line 211 at r1 (raw file):

		}
		if err != nil {
			// TODO(yuzefovich): do we wanna clear the map here?

@andy-kimball do you think we should/need to clear the map here and below?

@ajwerner ajwerner (Contributor) left a comment

jobs lgtm

Reviewed 5 of 19 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)

@yuzefovich yuzefovich (Member, Author) commented

@knz could you please review changes in server/admin.go?

@andy-kimball could you please review changes in server/diagnostics/reporter.go?

@RichardJCai could you please review changes in sql/user.go?

@adityamaru could you please review changes in filetable/file_table_read_writer.go?

@RichardJCai RichardJCai self-requested a review February 23, 2021 16:30
@RichardJCai RichardJCai (Contributor) left a comment

sql/user.go lgtm

@knz knz (Contributor) left a comment

Reviewed 10 of 19 files at r1, 7 of 7 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob and @yuzefovich)


pkg/server/admin.go, line 255 at r2 (raw file):

	defer func() {
		closeErr := it.Close()
		if retErr == nil && closeErr != nil {

It's a problem that the closeErr cannot be joined inside the serverError.

What about this (sketched below):

  1. Extract the body of this function into a helper function, e.g. internalDatabases(), that returns a plain error without calling serverError every time.

  2. Inside internalDatabases(), do defer func() { retErr = errors.CombineErrors(retErr, it.Close()) }()

  3. In Databases(), call internalDatabases(), then if err != nil { return s.serverError(err) }
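A minimal sketch of that shape, using a placeholder iterator type and hypothetical function parameters (the real code uses sqlutil.InternalRows, adminServer.serverError, and errors.CombineErrors from github.com/cockroachdb/errors):

```go
package example

import "github.com/cockroachdb/errors"

// rowIterator is a placeholder for sqlutil.InternalRows; only Close matters here.
type rowIterator interface {
	Close() error
}

// internalDatabases is the helper from step 1: it returns plain errors and,
// per step 2, folds the iterator's Close error into the named return value,
// so nothing is dropped even when we return early.
func internalDatabases(open func() (rowIterator, error)) (retErr error) {
	it, err := open()
	if err != nil {
		return err
	}
	defer func() { retErr = errors.CombineErrors(retErr, it.Close()) }()
	// ... iterate over the rows, returning plain errors ...
	return nil
}

// Databases is step 3: the RPC entry point wraps whatever error came back
// exactly once via serverError.
func Databases(open func() (rowIterator, error), serverError func(error) error) error {
	if err := internalDatabases(open); err != nil {
		return serverError(err)
	}
	return nil
}
```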


pkg/server/admin.go, line 1338 at r2 (raw file):

	defer func() {
		closeErr := it.Close()
		if retErr == nil && closeErr != nil {

ditto above


pkg/server/admin.go, line 2030 at r2 (raw file):

	// We have to make sure to close the iterator since we might return from the
	// for loop early (before Next() returns false).
	defer func(it sqlutil.InternalRows) {

ditto above


pkg/server/admin.go, line 2163 at r2 (raw file):

	// We have to make sure to close the iterator since we might return from the
	// for loop early (before Next() returns false).
	defer func(it sqlutil.InternalRows) {

ditto above


pkg/sql/crdb_internal.go, line 628 at r2 (raw file):

		cleanup := func() {
			if err := it.Close(); err != nil {
				log.Infof(ctx, "error closing an iterator: %v", err)

I don't think this should be log.Infof; I think this deserves

return errors.AssertionFailedf(...)

(So you'll need to make cleanup return an error and have the caller check it)


pkg/sql/user.go, line 151 at r2 (raw file):

		// the for loop early (before Next() returns false).
		defer func() {
			closeErr := it.Close()

retErr = errors.CombineErrors(retErr, it.Close()); no need for a conditional.


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 672 at r2 (raw file):

		defer func() {
			if err := it.Close(); err != nil {
				log.Warningf(ctx, "failed to close %+v", err)

that sounds like an assertion error, not a warning


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 732 at r2 (raw file):

			defer func() {
				if err := it.Close(); err != nil {
					log.Warningf(ctx, "failed to close %+v", err)

ditto

@yuzefovich yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @spaskob)


pkg/server/admin.go, line 255 at r2 (raw file):

Previously, knz (kena) wrote…

It's a problem that the closeErr cannot be joined inside the serverError

What about this:

  1. extract the body of this function into a helper function eg internalDatabases(), that returns a plain error without calling serverError every time.

  2. inside internalDatabases(), do defer func() { retErr = errors.CombineErrors(retErr, it.Close()) }()

  3. in Databases() here call internalDatabases() then if err != nil { return s.serverError(err) }

Good point. I addressed it without introducing a separate function by adding a deferred function call at the very beginning of the Databases function.

We actually might want to adopt this approach in other places because, AFAIU, all errors returned by an RPC implementation function must be serverErrors, and in Databases that wasn't the case for an error occurring in userFromContext. I also adopted the approach in several other functions refactored by my previous commits.


pkg/server/admin.go, line 1338 at r2 (raw file):

Previously, knz (kena) wrote…

ditto above

Done.


pkg/server/admin.go, line 2030 at r2 (raw file):

Previously, knz (kena) wrote…

ditto above

Done.


pkg/server/admin.go, line 2163 at r2 (raw file):

Previously, knz (kena) wrote…

ditto above

Done.


pkg/sql/crdb_internal.go, line 628 at r2 (raw file):

Previously, knz (kena) wrote…

I don't think this is log.Infof, I think this deserves

return errors.AssertionFailedf(...)

(So you'll need to make cleanup return an error and have the caller check it)

I upgraded it to Warningf, but I don't think it is that easy a change: the cleanup function ends up being called by virtualTableNode.Close, which doesn't have an error return parameter, and changing the signature of planNode.Close seems like too much.


pkg/sql/user.go, line 151 at r2 (raw file):

Previously, knz (kena) wrote…

retErr = errors.CombineErrors(retErr, it.Close()); no need for a conditional.

Done.


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 672 at r2 (raw file):

Previously, knz (kena) wrote…

that sounds like an assertion error, not a warning

I followed the same pattern as in case *SQLConnFileToTableExecutor below, so I think it is reasonable.


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 732 at r2 (raw file):

Previously, knz (kena) wrote…

ditto

Same pattern as below.

@knz knz (Contributor) left a comment

admin.go changes ok

Reviewed 1 of 19 files at r1, 5 of 5 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @spaskob, and @yuzefovich)


pkg/sql/crdb_internal.go, line 628 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I upgraded it to Warningf, but I don't think it is that easy of a change: the cleanup function ends up being called by virtualTableNode.Close which doesn't have an error return parameter, and changing the signature of planNode.Close seems like too much.

Can you use a panic with an error object and then catch it somewhere higher?


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 672 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I followed the same pattern as in case *SQLConnFileToTableExecutor below, so I think it is reasonable.

This looks like a misdesign. Can you double-check with @dt or @adityamaru and file a follow-up issue for bulk I/O to look into? Thanks


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 732 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Same pattern as below.

see above


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 689 at r3 (raw file):

		defer func() {
			if err := metaRows.sqlConnExecResults.Close(); err != nil {
				log.Warningf(ctx, "failed to close %+v", err)

@dt @adityamaru IMHO this should not have been a warning.

@adityamaru adityamaru (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @spaskob, and @yuzefovich)


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 672 at r2 (raw file):

Previously, knz (kena) wrote…

This looks like a misdesign. Can you double check with @dt or @adityamaru and file a followup issue for the bulk i/o to look into? Thanks

Yup you can file an issue and assign it to me, thanks!

file_table_read_writer.go LGTM, thanks for doing this!

@yuzefovich yuzefovich (Member, Author) left a comment

I think I'm only waiting for @andy-kimball to take a look at server/diagnostics/reporter.go changes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @knz, and @spaskob)


pkg/sql/crdb_internal.go, line 628 at r2 (raw file):

Previously, knz (kena) wrote…

Can you use a panic with an error object and then catch it somewhere higher?

That's an interesting idea. We could use colexecerror.InternalError to propagate it through the vectorized engine, and it would be caught automatically (if we were to simply use panic, it wouldn't be caught by the vectorized catcher because the panic would come from a pkg/sql/.* codepath which is, at least currently, not caught). However, if the query is executed via the row-by-row engine, we would have to introduce a catcher somewhere.

I don't think that the severity of not propagating the error properly is high, because at the point when this cleanup function is called, virtualTableNode is being Closed, which means that either

  1. we were able to iterate over all of the results returned by the query without an error, and an error occurs on iterator.Close. This is pretty unlikely to occur if the iterator was exhausted; or
  2. we did hit an error during the iteration, in which case the error returned by Close is going to be exactly the same error as the one returned during the iteration, which was already propagated.

Now that I have typed it out, I think I should fix the internal executor iterator to return only "new" errors on Close to address case 2 above.

I also left a TODO to fix the error propagation (at the same time as #61123).
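To make that last point concrete, here is a purely illustrative sketch of an iterator whose Close suppresses the error it already surfaced via Next (the type and field names are made up for the sketch, not the actual internal executor implementation):

```go
package example

import "errors"

// bufferedIterator is a stand-in for the internal executor's iterator.
type bufferedIterator struct {
	numRows int          // stand-in for the buffered rows
	pos     int
	lastErr error        // the error already handed to the caller via Next
	closeFn func() error // underlying cleanup, which may itself fail
}

// Next advances the iterator and remembers any error it returns.
func (it *bufferedIterator) Next() (ok bool, err error) {
	defer func() { it.lastErr = err }()
	if it.lastErr != nil {
		return false, it.lastErr
	}
	if it.pos >= it.numRows {
		return false, nil
	}
	it.pos++
	return true, nil
}

// Close runs the cleanup but reports only "new" errors: the error the caller
// already saw during iteration is suppressed.
func (it *bufferedIterator) Close() error {
	err := it.closeFn()
	if err != nil && errors.Is(err, it.lastErr) {
		return nil
	}
	return err
}
```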


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 672 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

Yup you can file an issue and assign it to me, thanks!

Filed #61141


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 732 at r2 (raw file):

Previously, knz (kena) wrote…

see above

Done.


pkg/storage/cloudimpl/filetable/file_table_read_writer.go, line 689 at r3 (raw file):

Previously, knz (kena) wrote…

@dt @adityamaru IMHO this should not have been a warning.

Done.

This commit audits the usage of `QueryEx` method of the internal
executor in the following manner:
- if the caller only needed to execute the statement, `ExecEx` is now
used
- if the query can return at most one row, then `QueryRowEx` is now used
- if the caller can be refactored to use the iterator pattern, that
refactoring has been done.

As a result, almost all usages have been refactored (most notably, the
virtual `crdb_internal.jobs` table now uses the iterator pattern, thus
alleviating OOM concerns; the ad-hoc memory accounting logic has been
removed).

`QueryEx` has been renamed to `QueryBufferedEx` to highlight that full
buffering occurs, and it has been added to the `sqlutil.InternalExecutor`
interface. The method is now used in only three places.

Release justification: bug fix.

Release note (bug fix): The `crdb_internal.jobs` virtual table is now
populated in a paginated fashion, thus alleviating memory-related
concerns where previously we could encounter an OOM crash.
@yuzefovich yuzefovich (Member, Author) commented

Thanks everyone for the feedback!

bors r+

@craig craig bot (Contributor) commented Feb 26, 2021

Build succeeded:

@craig craig bot merged commit 11d37f5 into cockroachdb:master Feb 26, 2021