Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql,externalconn: introduce CREATE EXTERNAL CONNECTION #84310

Merged
merged 2 commits into from
Jul 21, 2022

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Jul 12, 2022

systemschema: introduce system.external_connections table

This change introduces the system.external_connections table
and its associated migration. The system table will be responsible
for storing External Connection objects that represent resources
that reside outside of CockroachDB, for example, a bucket being backed
up into or a sink being written to by a changefeed.

Informs: #84225

Release note: None

sql,cli,spanconfigccl: test updates when adding system table

This is a purely mechanical commit to update all the tests that
need updating when introducing a new system table.

Release note: None

sql,externalconn: introduce CREATE EXTERNAL CONNECTION

This change introduces the CREATE EXTERNAL CONNECTION syntax
to CockroachDB. This statement can be used to create an External
Connection object that represents an external resource.

Majority of this change is introducing the required interfaces
to persist an External Connection object in the system.external_connections
table. We only register nodelocal as a supported External Connection
to allow for end to end testing of functionality. A user can now execute:

CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';

All the other providers will be registered as part of #84228.
Furthermore, none of the permission model outlined in the
RFC #84209 has been implemented, to minimize the scope of this change.

Release note (sql change): introduce CREATE EXTERNAL CONNECTION
syntax that can be used to create an External Connection representing
a resource that resides outside of CockroachDB. The only supported resource
at the moment is a nodelocal URI that can be represented as an External
Connection object using:

CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';

Fixes: #84225

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru changed the title External create sql,externalconn: introduce CREATE EXTERNAL CONNECTION Jul 12, 2022
@adityamaru adityamaru force-pushed the external-create branch 5 times, most recently from 6ab9782 to 066577b Compare July 14, 2022 17:39
@adityamaru adityamaru marked this pull request as ready for review July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team as a code owner July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team as a code owner July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team as a code owner July 14, 2022 17:42
@adityamaru adityamaru requested a review from a team July 14, 2022 17:42
@adityamaru adityamaru requested review from a team as code owners July 14, 2022 17:42
@adityamaru adityamaru requested review from rhu713, miretskiy and benbardin and removed request for rhu713 and a team July 14, 2022 17:42
@adityamaru
Copy link
Contributor Author

Sorry for the large review but the first two commits are just the general motions of adding a system table. If you would prefer the third commit in a separate PR I'm happy to pull it out.

Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.

pkg/ccl/cloudccl/externalconnccl/datadriven_test.go Outdated Show resolved Hide resolved
"net/url"

"github.com/cockroachdb/cockroach/pkg/cloud/externalconn/connectionpb"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a "mouthful"... maybe just proto dir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting renaming connectionpb? I just went with the nomenclature some other packages like spanconfig use.

pkg/cloud/externalconn/connection_record.go Outdated Show resolved Hide resolved
pkg/cloud/externalconn/connection_record.go Outdated Show resolved Hide resolved
if err != nil {
// If we see a UniqueViolation it means that we are attempting to create an
// External Connection with a `connection_name` that already exists.
if pgerror.GetPGCode(err) == pgcode.UniqueViolation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the resolution re uniqueness? Do we need to also associate
"version" (i.e. some sort of unique number/id) with the record so that if some conn is dropped and then recreated with the same name as something else.... Or it's v2 stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main concern was around the renaming of External Connections. However, @dt rightly pointed out that we should not allow this since it opens us up to having to rewrite all persisted references of this object.

As far as reusing a name goes: this should be tackled by the v2 work around not allowing a drop if someone is actively using the connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand these concerns.. However, just like you have e.g. column or table names, and those get converted to IDs during query execution, simply having a unique ID associated with each resource and then storing that ID with the job record allows you to handle "renames" -- those are drop/create, or name reuse after old one is dropped.

pkg/sql/parser/sql.y Show resolved Hide resolved
pkg/sql/parser/sql.y Outdated Show resolved Hide resolved
pkg/sql/parser/sql.y Outdated Show resolved Hide resolved

UNSPECIFIED = 0 [(gogoproto.enumvalue_customname) = "TypeUnspecified"];
STORAGE = 1 [(gogoproto.enumvalue_customname) = "TypeStorage"];
KMS = 2 [(gogoproto.enumvalue_customname) = "TypeKMS"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where/how is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KMS is not used anywhere so I've deleted it. Storage is returned by nodelocal. As such the type isn't used anywhere today, but it will be used by SHOW BACKUP EXTERNAL CONNECTIONS when we do add that in.

pkg/ccl/cloudccl/externalconnccl/datadriven_test.go Outdated Show resolved Hide resolved
@@ -56,16 +56,20 @@ func TestDescriptorRepairOrphanedDescriptors(t *testing.T) {
// DROP DATABASE db CASCADE;
//
// This, due to #51782, leads to the table remaining public but with no
// parent database (52).
// parent database.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xiang-Gu @postamar can I get you to check this please 😃

WHERE name = 't'
);

NB: As of 07/15/22 the injected descriptor has been edited to have an ID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xiang-Gu @postamar ditto as above.

This change introduces the `system.external_connections` table
and its associated migration. The system table will be responsible
for storing External Connection objects that represent resources
that reside outside of CockroachDB, for example, a bucket being backed
up into or a sink being written to by a changefeed.

Informs: cockroachdb#84225

Release note: None
Copy link
Collaborator

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! thank you!

This change introduces the `CREATE EXTERNAL CONNECTION` syntax
to CockroachDB. This statement can be used to create an External
Connection object that represents an external resource.

Majority of this change is introducing the required interfaces
to persist an External Connection object in the `system.external_connections`
table. We only register `nodelocal` as a supported External Connection
to allow for end to end testing of functionality. A user can now execute:

`CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';`

All the other providers will be registered as part of cockroachdb#84228.
Furthermore, none of the permission model outlined in the
RFC cockroachdb#84209 has been implemented, to minimize the scope of this change.

Release note (sql change): introduce `CREATE EXTERNAL CONNECTION`
syntax that can be used to create an External Connection representing
a resource that resides outside of CockroachDB. The only supported resource
at the moment is a `nodelocal` URI that can be represented as an External
Connection object using:

`CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo'`;

Fixes: cockroachdb#84225
@adityamaru
Copy link
Contributor Author

Bazel is only failing on the known windows build error. TFTR!!

bors r=miretskiy,benbardin

craig bot pushed a commit that referenced this pull request Jul 20, 2022
84310: sql,externalconn: introduce CREATE EXTERNAL CONNECTION r=miretskiy,benbardin a=adityamaru

systemschema: introduce system.external_connections table

This change introduces the `system.external_connections` table
and its associated migration. The system table will be responsible
for storing External Connection objects that represent resources
that reside outside of CockroachDB, for example, a bucket being backed
up into or a sink being written to by a changefeed.

Informs: #84225

Release note: None

sql,cli,spanconfigccl: test updates when adding system table

This is a purely mechanical commit to update all the tests that
need updating when introducing a new system table.

Release note: None

sql,externalconn: introduce CREATE EXTERNAL CONNECTION

This change introduces the `CREATE EXTERNAL CONNECTION` syntax
to CockroachDB. This statement can be used to create an External
Connection object that represents an external resource.

Majority of this change is introducing the required interfaces
to persist an External Connection object in the `system.external_connections`
table. We only register `nodelocal` as a supported External Connection
to allow for end to end testing of functionality. A user can now execute:

`CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';`

All the other providers will be registered as part of #84228.
Furthermore, none of the permission model outlined in the
RFC #84209 has been implemented, to minimize the scope of this change.

Release note (sql change): introduce `CREATE EXTERNAL CONNECTION`
syntax that can be used to create an External Connection representing
a resource that resides outside of CockroachDB. The only supported resource
at the moment is a `nodelocal` URI that can be represented as an External
Connection object using:

`CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo'`;

Fixes: #84225

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jul 20, 2022

Build failed:

@adityamaru
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 21, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 21, 2022

Build succeeded:

@craig craig bot merged commit 9c263df into cockroachdb:master Jul 21, 2022
@adityamaru adityamaru deleted the external-create branch July 21, 2022 18:32
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 22, 2022
In cockroachdb#84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing extenral
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint inputted by the user. A nice UX win is the abilility to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change all bulk operations and cdc are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints
as well.

Note, we do not register limiter settings for the `external` provider
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its registered limiters and recorder.

Informs: cockroachdb#84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object,
These operations can then interact with the underlying resource represented by
the object as they did before.
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 25, 2022
In cockroachdb#84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing extenral
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint inputted by the user. A nice UX win is the abilility to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change all bulk operations and cdc are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints
as well.

Note, we do not register limiter settings for the `external` provider
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its registered limiters and recorder.

Informs: cockroachdb#84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object,
These operations can then interact with the underlying resource represented by
the object as they did before.
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 27, 2022
In cockroachdb#84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing extenral
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint inputted by the user. A nice UX win is the abilility to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change all bulk operations and cdc are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints
as well.

Note, we do not register limiter settings for the `external` provider
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its registered limiters and recorder.

Informs: cockroachdb#84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object,
These operations can then interact with the underlying resource represented by
the object as they did before.
craig bot pushed a commit that referenced this pull request Jul 27, 2022
84931: externalconn,nodelocal: add `external` ExternalStorage provider r=adityamaru a=adityamaru

In #84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing extenral
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint inputted by the user. A nice UX win is the abilility to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change all bulk operations and cdc are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints
as well.

Note, we do not register limiter settings for the `external` provider
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its registered limiters and recorder.

Informs: #84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object,
These operations can then interact with the underlying resource represented by
the object as they did before.

85041: sql: better logging about the optimizer as well as some cancellation checks r=yuzefovich a=yuzefovich

**sql: add logging about the optimizer into the trace**

This commit derives a separate tracing span on the main query path that
covers the whole optimizer as well as adds a few logging messages around
the optbuilder and the optimization. It also audits some of the already
present logs to be written down before performing possibly long
operations.

Release note: None

**xform: check for context cancellation**

reviously, the optimizer didn't check the context cancellation ever
which could lead to surprising behavior - e.g. if the statement timeout
occurs in the middle of the optimization, we would still finish the
whole optimization phase only to realize that we were canceled at the
execution time.

This situation is now improved by periodically checking for the context
cancellation every time a group is optimized by `optimizeGroup`. It's
still possible for the optimizer to continue running long after
cancellation if there is a long running procedure at a deeper level.
For example, a long-running, individual invocation of an exploration
rule will not halt due to context cancellation. However,
the cancellation improvement in this commit should halt long-running
optimization in most cases.

Fixes: #70245.
Addresses: #70314.

Release note: None

85101: kvcoord: remove an artifact of unsetting WriteTooOld flag r=yuzefovich a=yuzefovich

It seems like during 20.1 release cycle we had a change so that the
errors don't carry `WriteTooOld` flag set to `true` and now only the
BatchResponse can have that. We kept the ability of unsetting that flag
so that in a mixed-version scenario we wouldn't run into any issues, but
now that unsetting behavior is no longer needed and is removed in this
commit.

My particular interest in removing this is because it is the only
modification of the `SetupFlowRequest` proto (which carries the `Txn`
proto inside) that occurs during the distributed query setup, and I want
to have more parallelization there.

Release note: None

85118: roachpb: remove `RangeDescriptor.deprecated_generation_comparable` r=pavelkalinnikov,andreimatei a=erikgrinaker

This patch removes the unused field `deprecated_generation_comparable`
from `RangeDescriptor`. The preparatory work for this was done in 20.2,
see a578719.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

external: introduce CREATE EXTERNAL CONNECTION
4 participants