Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: relax restrictions on partition name reuse #39332

Merged
merged 1 commit into from
Aug 7, 2019

Conversation

solongordon
Copy link
Contributor

@solongordon solongordon commented Aug 5, 2019

Previously, we did not allow users to reuse partition names between
different indexes on the same table. The reason for this was that it
caused some ambiguity with the zone specifiers passed to the cockroach zone CLI. However, that CLI has now been removed in favor of more
explicit SQL statements, so this is no longer an issue. I removed this
restriction, so partition names now only need to be unique per index,
not per table.

My original intent was to do away with the CLI zone specifiers (also
known as zone names) entirely, since they are no longer specified by
users in any command. However, there are a few places that rely on
having a string identifier for each zone. One is the event log, which
logs an event whenever a zone configuration is set or removed. Another
is the 'SHOW ZONE CONFIGURATIONS' command, which includes a zone_name
column for each row. (We could remove this but I think readability would
suffer.) The Admin UI uses zone name as well in its proto interface,
though I'm not sure if it's actually displayed anywhere in the UI.

So rather than removing zone names, I decided to keep them around only
as a display identifier. I removed any logic which relied on parsing a
zone name into a ZoneSpecifier. I also added explicit columns to the
crdb_internal.zones table for database_name, table_name, etc. to avoid
reliance on the zone_name for JOINs and such. I also removed any
references to "CLI specifiers."

The zone name for a partition on a secondary index takes the form
DATABASE.TABLE@INDEX.PARTITION. This was previously a disallowed format
for CLI specifiers, which either specified an index or a partition.

Finally, please note that it is no longer possible to alter a secondary index
partition using the ALTER PARTITION ... OF TABLE command. Users must
use ALTER PARTITION ... OF INDEX instead.

Release note (sql change): Partition names can now be reused between
different indexes on the same table.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@solongordon solongordon requested review from rohany, a team and danhhz August 5, 2019 18:54
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does alter partition of table ... need to be removed?

Reviewed 11 of 19 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @rohany)

@solongordon
Copy link
Contributor Author

It wasn't removed entirely. It's just you can only use that command to alter partitions on the primary index now. Otherwise it's ambiguous: if you have a partition named p on multiple indexes and run ALTER PARTITION p OF TABLE, which one did you mean to alter?

@rohany
Copy link
Contributor

rohany commented Aug 5, 2019

That makes sense

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me, most of the changes are in deleting stuff from old tests. Restructuring the zones table like this definitely makes it much more usable in other contexts.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @solongordon)


pkg/sql/logictest/testdata/logic_test/crdb_internal, line 206 at r2 (raw file):

SELECT * FROM crdb_internal.zones WHERE false
----
zone_id  zone_name  range_name  database_name  table_name  index_name  partition_name  config_yaml  config_sql  config_protobuf

we implicitly test this table by using it in SHOW PARTITIONS, but I don't see anywhere that you had to update something testing it directly. should we?

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)


pkg/sql/logictest/testdata/logic_test/crdb_internal, line 206 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

we implicitly test this table by using it in SHOW PARTITIONS, but I don't see anywhere that you had to update something testing it directly. should we?

Yes, good point, I'll add a test.

Previously, we did not allow users to reuse partition names between
different indexes on the same table. The reason for this was that it
caused some ambiguity with the zone specifiers passed to the `cockroach
zone` CLI. However, that CLI has now been removed in favor of more
explicit SQL statements, so this is no longer an issue. I removed this
restriction, so partition names now only need to be unique per index,
not per table.

My original intent was to do away with the CLI zone specifiers (also
known as zone names) entirely, since they are no longer specified by
users in any command. However, there are a few places that rely on
having a string identifier for each zone. One is the event log, which
logs an event whenever a zone configuration is set or removed. Another
is the 'SHOW ZONE CONFIGURATIONS' command, which includes a `zone_name`
column for each row. (We could remove this but I think readability would
suffer.) The Admin UI uses zone name as well in its proto interface,
though I'm not sure if it's actually displayed anywhere in the UI.

So rather than removing zone names, I decided to keep them around only
as a display identifier. I removed any logic which relied on parsing a
zone name into a `ZoneSpecifier`. I also added explicit columns to the
crdb_internal.zones table for database_name, table_name, etc. to avoid
reliance on the zone_name for JOINs and such. I also removed any
references to "CLI specifiers."

The zone name for a partition on a secondary index takes the form
DATABASE.TABLE@INDEX.PARTITION. This was previously a disallowed format
for CLI specifiers, which either specified an index or a partition.

Finally, please note that it is no longer possible to alter a secondary
index partition using the `ALTER PARTITION ... OF TABLE` command. Users
must use `ALTER PARTITION ... OF INDEX` instead.

Release note (sql change): Partition names can now be reused between
different indexes on the same table.
@solongordon
Copy link
Contributor Author

TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request Aug 7, 2019
39332: sql: relax restrictions on partition name reuse r=solongordon a=solongordon

Previously, we did not allow users to reuse partition names between
different indexes on the same table. The reason for this was that it
caused some ambiguity with the zone specifiers passed to the `cockroach
zone` CLI. However, that CLI has now been removed in favor of more
explicit SQL statements, so this is no longer an issue. I removed this
restriction, so partition names now only need to be unique per index,
not per table.

My original intent was to do away with the CLI zone specifiers (also
known as zone names) entirely, since they are no longer specified by
users in any command. However, there are a few places that rely on
having a string identifier for each zone. One is the event log, which
logs an event whenever a zone configuration is set or removed. Another
is the 'SHOW ZONE CONFIGURATIONS' command, which includes a `zone_name`
column for each row. (We could remove this but I think readability would
suffer.) The Admin UI uses zone name as well in its proto interface,
though I'm not sure if it's actually displayed anywhere in the UI.

So rather than removing zone names, I decided to keep them around only
as a display identifier. I removed any logic which relied on parsing a
zone name into a `ZoneSpecifier`. I also added explicit columns to the
crdb_internal.zones table for database_name, table_name, etc. to avoid
reliance on the zone_name for JOINs and such. I also removed any
references to "CLI specifiers."

The zone name for a partition on a secondary index takes the form
DATABASE.TABLE@INDEX.PARTITION. This was previously a disallowed format
for CLI specifiers, which either specified an index or a partition.

Finally, please note that it is no longer possible to alter a secondary index
partition using the `ALTER PARTITION ... OF TABLE` command. Users must
use `ALTER PARTITION ... OF INDEX` instead.

Release note (sql change): Partition names can now be reused between
different indexes on the same table.

Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Aug 7, 2019

Build succeeded

@craig craig bot merged commit f17eb1c into cockroachdb:master Aug 7, 2019
@solongordon solongordon deleted the index-scoped-partition-names branch August 8, 2019 12:00
solongordon added a commit to solongordon/cockroach that referenced this pull request Sep 3, 2019
The commands for partitioning indexes in the TPCC import were erroring
out due to a syntax change introduced in cockroachdb#39332. I updated them to use
`ALTER PARTITION ... OF INDEX` rather than `ALTER PARTITION ... OF
TABLE`.

Fixes cockroachdb#39005
Fixes cockroachdb#40360
Fixes cockroachdb#40416

Release note: None
craig bot pushed a commit that referenced this pull request Sep 3, 2019
40248: opt: calculate number of rows processed when costing joins r=rytaft a=rytaft

This PR updates the costing of joins to take into account the number of
rows processed by the operator. This number may be larger than the
number of output rows if an additional filter is applied as part of the
ON condition that is not used to determine equality
columns for the join.

For example, consider the query
  `SELECT * FROM abc JOIN def ON a = e AND b = 3;`

Assuming there is no index on b, if a lookup join is used to execute this
query, the number of rows processed is actually the same as the query
  `SELECT * FROM abc JOIN def ON a = e;`

The difference is that the filter b=3 must also be applied to every row in
the first query. The coster now takes this into account when determining
the cost of joins.

Fixes #34810

Release note: None

40431: workload: fix partition commands in tpcc import r=solongordon a=solongordon

The commands for partitioning indexes in the TPCC import were erroring
out due to a syntax change introduced in #39332. I updated them to use
`ALTER PARTITION ... OF INDEX` rather than `ALTER PARTITION ... OF
TABLE`.

Fixes #39005
Fixes #40360
Fixes #40416

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants