-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importccl: support DEFAULT columns for constant DEFAULT expressions #50295
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this pr should have release notes; it's a user visible change.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96)
pkg/ccl/importccl/import_stmt.go, line 593 at r1 (raw file):
// expressions are nullable. for _, col := range found.VisibleColumns() { if len(isTargetCol) != 0 && !isTargetCol[col.Name] && !col.IsNullable() && !col.HasDefault() {
can you wrap this long if statement?
pkg/ccl/importccl/import_stmt_test.go, line 2490 at r1 (raw file):
}) // Supporting import into with non-targetted default or nullable columns are not implemented yet at this point.
can you wrap long lines (100 char or so)?
pkg/sql/row/row_converter.go, line 270 at r1 (raw file):
var txCtx transform.ExprTransformContext semaCtx := tree.MakeSemaContext() // We do not currently support DEFAULT expressions on target or non-target
does this comment need to be updated?
pkg/sql/row/row_converter.go, line 303 at r1 (raw file):
c.Datums = make([]tree.Datum, len(targetColDescriptors), len(cols)) // Check for a hidden column. This should be the unique_rowid PK if present.
this comment needs to be updated; we no longer just check hidden columns, but also assign defaults
264af78
to
3640e7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miretskiy just a head's up that I think this PR is still a WIP (I noticed it was marked as draft).
@Anzoteh96 I sometimes update the title temporarily to be prefixed with [WIP]
since it's not easy to tell if a PR is marked as draft in reviewable.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just changed that to "Ready for review". Made small changes as per below (except adding Release Note).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 593 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
can you wrap this long if statement?
Done.
pkg/ccl/importccl/import_stmt_test.go, line 2490 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
can you wrap long lines (100 char or so)?
Test removed: as per the discussion with @pbardea , we will focus only on adding support for IMPORT INTO
non-targeted columns withDEFAULT
expressions as it specifies exactly which columns in a table t
should the columns in CSV map to.
Having non-targeted column with DEFAULT
expression for IMPORT TABLE
, on the other hand, could introduce ambiguity for the following reason: say we have a CSV file my_csv
of types (INT, INT, INT)
and we do the following:
IMPORT TABLE t (a INT, b INT DEFAULT 10, c INT, d INT DEFAULT 8) CSV DATA ("my_csv")
How to decide which column (b or d) to be assigned default value instead of taken from the CSV?
pkg/sql/row/row_converter.go, line 270 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
does this comment need to be updated?
Done.
pkg/sql/row/row_converter.go, line 303 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
this comment needs to be updated; we no longer just check hidden columns, but also assign defaults
Done.
88b3d09
to
a1dceee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt_test.go, line 2479 at r3 (raw file):
// data from source file (like CSV) is used. It also checks IMPORT TABLE works // when there are default columns. t.Run("import-into-default", func(t *testing.T) {
Note: these are the tests I have now. Feel free to suggest anything else (including the specifications of this IMPORT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still a work in progress or is it ready for review?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @miretskiy, and @pbardea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ready for review now :)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @miretskiy, and @pbardea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 590 at r3 (raw file):
} // Ensure that non-target columns that doesn't have default
nit: s/doesn't/don't/
pkg/ccl/importccl/import_stmt.go, line 594 at r3 (raw file):
for _, col := range found.VisibleColumns() { if len(isTargetCol) != 0 && !isTargetCol[col.Name] && !col.IsNullable() && !col.HasDefault() {
This may be a personal thing, but I find this logic a lot easier to understand as !(col.IsNullable() || col.HasDefault())
since then we're just enforcing that every non-target column is either nullable or has a default value. Again, it may just be personal preference but something to consider.
pkg/ccl/importccl/import_stmt.go, line 595 at r3 (raw file):
if len(isTargetCol) != 0 && !isTargetCol[col.Name] && !col.IsNullable() && !col.HasDefault() { return errors.Errorf("all non-target columns in IMPORT INTO must be nullable")
I think we need to update this error message adding the fact that these columns must be nullable or have a default. While we're here I think we can s/Errorf/Error/
since this error message isn't using any formatting directives.
pkg/ccl/importccl/import_stmt_test.go, line 2475 at r3 (raw file):
}) // Test that IMPORT INTO works when columns with default expression is present.
nit: s/default expression is/default expressions are/
pkg/ccl/importccl/import_stmt_test.go, line 2479 at r3 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
Note: these are the tests I have now. Feel free to suggest anything else (including the specifications of this IMPORT).
I would be interested in adding a test case with a default column that also has a NOT NULL
constraint given that we have some checks around nullability and default columns. I'd also be interested in seeing some tests cases where the data has NULL
values that are inserted into default columns. I would also be interested in a test case with a more complex default expressions, perhaps using some builtins?
pkg/ccl/importccl/import_stmt_test.go, line 2495 at r3 (raw file):
}) t.Run("is-target", func(t *testing.T) { data = "1,36\n2,42"
I think having the last value set to the same values as the default expression is a bit confusing in this case - I'm not sure what we're testing here? From a test perspective, I think it would be more clear if this value was set to something else to clearly indicate that we don't expect the default expression to be evaluated.
pkg/sql/row/row_converter.go, line 301 at r3 (raw file):
// Check for a hidden column. This should be the unique_rowid PK if present. // In addition, check for non-targeted columns with non-null DEFAULT expressions. colNameSet := make(map[string]struct{}, len(targetColDescriptors))
I think targetColNames
or something of that sort may be a more descriptive name.
pkg/sql/row/row_converter.go, line 318 at r3 (raw file):
datum, err := defaultExprs[i].Eval(evalCtx) if err != nil { return nil, errors.New("Error parsing default expression")
nit: we usually format error messages with a lower-case first letter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 595 at r3 (raw file):
if len(isTargetCol) != 0 && !isTargetCol[col.Name] && !col.IsNullable() && !col.HasDefault() { return errors.Errorf("all non-target columns in IMPORT INTO must be nullable")
let's update this error message: "all non-target columsn ... must be nullable or have default expression set"
pkg/ccl/importccl/import_stmt_test.go, line 2523 at r3 (raw file):
{"11", "string3", "33"}, {"29", "string4", "33"}}) })
See the comment I left in row converter.
You definitely need a good test case where default expression changes.
For example:
create sequence testseq;
create table a(a int primary key default unique_rowid(), b string, c int default nextval('testseq'));
If you were to import INTO column B, a CSV file with the following data:
un
deux
trois
You expect something like this to be stored in the table.
a | b | c
---------------------+-------+----
566184438199091201 | un | 1
566184438203088897 | deux | 2
566184438203678721 | trois | 3
pkg/sql/row/row_converter.go, line 315 at r3 (raw file):
c.Datums = append(c.Datums, nil) } else { if _, ok := colNameSet[col.Name]; !ok && col.DefaultExpr != nil {
I know "ok" is pretty common pattern, but consider renaming to isTargetCol (or similar).
pkg/sql/row/row_converter.go, line 320 at r3 (raw file):
return nil, errors.New("Error parsing default expression") } c.Datums = append(c.Datums, datum)
This looks a bit suspicious to me. Disclaimer: I'm not an sql expert; so perhaps somebody else should also take a look at this (@dt)?
Reading documentation on "DEFAULT" expressions:
"The value may be either a hard-coded literal or an expression that is evaluated at the time the row is created."
So, the problem here is that datum converter is created once (or few times, if using parallel importer), and is then reused for the duration of the import. That means that this expression would not actually be evaluated at row creating time, but instead at the time when we start the import.
That seems a bit wrong to me. I think this code ought to be moved to Row() function below, and handled similarly to how we handle primary key (note, above, we append nil in the case of Hidden column because we want to evaluate the primary key default expression, namely unique_rowid(), when we actually call Row()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 594 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
This may be a personal thing, but I find this logic a lot easier to understand as
!(col.IsNullable() || col.HasDefault())
since then we're just enforcing that every non-target column is either nullable or has a default value. Again, it may just be personal preference but something to consider.
This is not just you -- I also find it harder to read..
There are few things we can do to help things out:
Before this loop, I would add:
isTargetCol := func(name string) bool {
return len(isTargetCol) > 0 && isTargetCol[col.Name]
}
Then, in this loop, this condition becomes
if !(isTargetCol(col.Name) || col.IsNullable() || col.HasDefault()) {
}
bc84440
to
f8537c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 590 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
nit:
s/doesn't/don't/
Done.
pkg/ccl/importccl/import_stmt.go, line 594 at r3 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
This is not just you -- I also find it harder to read..
There are few things we can do to help things out:Before this loop, I would add:
isTargetCol := func(name string) bool {
return len(isTargetCol) > 0 && isTargetCol[col.Name]
}Then, in this loop, this condition becomes
if !(isTargetCol(col.Name) || col.IsNullable() || col.HasDefault()) {
}
Did an alternative change: move len(isTargetCol)
is an outermost loop (as this is a value that stays the same across all the columns), so that we can have !(isTarget[Col.Name] || col.IsNullable() || col.HasDefault())
.
pkg/ccl/importccl/import_stmt.go, line 595 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think we need to update this error message adding the fact that these columns must be nullable or have a default. While we're here I think we can
s/Errorf/Error/
since this error message isn't using any formatting directives.
error.Error
seemed to give some sort of "missing implementation" error, I changed it to errors.New(...)
.
pkg/ccl/importccl/import_stmt.go, line 595 at r3 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
let's update this error message: "all non-target columsn ... must be nullable or have default expression set"
Done.
pkg/ccl/importccl/import_stmt_test.go, line 2475 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
nit:
s/default expression is/default expressions are/
Done.
pkg/ccl/importccl/import_stmt_test.go, line 2495 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think having the last value set to the same values as the default expression is a bit confusing in this case - I'm not sure what we're testing here? From a test perspective, I think it would be more clear if this value was set to something else to clearly indicate that we don't expect the default expression to be evaluated.
Done. (Changed to 37).
pkg/sql/row/row_converter.go, line 318 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
nit: we usually format error messages with a lower-case first letter
Done.
pkg/sql/row/row_converter.go, line 320 at r3 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
This looks a bit suspicious to me. Disclaimer: I'm not an sql expert; so perhaps somebody else should also take a look at this (@dt)?
Reading documentation on "DEFAULT" expressions:
"The value may be either a hard-coded literal or an expression that is evaluated at the time the row is created."So, the problem here is that datum converter is created once (or few times, if using parallel importer), and is then reused for the duration of the import. That means that this expression would not actually be evaluated at row creating time, but instead at the time when we start the import.
That seems a bit wrong to me. I think this code ought to be moved to Row() function below, and handled similarly to how we handle primary key (note, above, we append nil in the case of Hidden column because we want to evaluate the primary key default expression, namely unique_rowid(), when we actually call Row()).
You're right: unfortunately, the current implementation only supports DEFAULT literal expressions for non-targeted columns. For default values that are functions (that are sequence-related) the change is more elaborate than what you outlined: it also involves changing the Sequence
field in the evalCtx
into to either a planner
or importSequenceOperators
field (and in the later case, we also need to implement func ( *importSequenceOperators) IncrementSequence
).
Apologies for not catching that earlier: I will post an update when this is resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Anzoteh96, @dt, @miretskiy, and @pbardea)
pkg/ccl/importccl/import_stmt.go, line 595 at r3 (raw file):
Previously, Anzoteh96 (Anzo Teh) wrote…
error.Error
seemed to give some sort of "missing implementation" error, I changed it toerrors.New(...)
.
You might use errors.Newf(...) if you also want to include column name in the error message.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. This is done by assigning the same value of `rowid` that was generated at the `IMPORT stage (row converter) as the default value. Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
…INTO The PR cockroachdb#50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. Previously, the only support given to rowid as a default expression is for hidden column, which is a function of timestamp, row number, and source ID (the ID of processor). To accommodate for more usage of unique_rowid(), this PR modifies the unique_rowid function by making unique_rowid as a function of timestamp, row number, source ID, the total occurrences of unique_rowid in the table schema, and instances of each unique_rowid within each row. In addition, this PR also modifies the visitor method cockroachdb#51390 by adding override methods for volatile methods like unique_rowid. Annotations containing the total occurrences of unique_rowid and unique_rowid instances within a row are stored inside evalCtx, which will be read and updated when visitor walks through the default expression at the sanitization stage, and when default expression is evaluated at each row. Partially addresses cockroachdb#48253 Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression.
50922: importccl: support `unique_rowid()` as default expression for IMPORT INTO r=Anzoteh96 a=Anzoteh96 The PR #50295 supports non-targeted columns with constant expression. This PR is a follow up to that in adding support to `unique_rowid()`. Previously, the only support given to `rowid` as a default expression is for hidden column, which is a function of timestamp, row number, and source ID (the ID of processor). To accommodate for more usage of `unique_rowid()`, this PR modifies the `unique_rowid` function by making `unique_rowid` as a function of: 1. timestamp; 2. row number; 3. source ID; 4. the total occurrences of `unique_rowid` in the table schema; 5. instances of each `unique_rowid` within each row. In addition, this PR also modifies the visitor method #51390 by adding override methods for volatile methods like `unique_rowid`. Annotations containing the total occurrences of `unique_rowid` and `unique_rowid` instances within a row are stored inside `evalCtx`, which will be read and updated when visitor walks through the default expression at the sanitization stage, and when default expression is evaluated at each row. Partially addresses #48253 Release note (general change): IMPORT INTO now supports `unique_rowid()` as a default expression. 51518: rowflow,colexec: make routers propagate errors to all non-closed outputs r=yuzefovich a=yuzefovich This commit changes the way we propagate the errors in the hash router so that the error metadata is sent on all non-closed streams. Previously, we would be sending it over only the first non-closed stream which could result in the processors on the same stage as that single stream end to treat the absence of rows and errors as the input being exhausted successfully, which is wrong because the input did encounter an error. The same thing has been happening in the vectorized flow, but in that case the problem is less severe - the issue will present itself only when we have wrapped processors (because the materializers will prevent the propagation throughout the whole flow as described below): In the vectorized engine we use panic-catch mechanism of error propagation, and we end up with the following sequence of events: 1. an operator encounters an error on any node (e.g. `colBatchScan` encounters RWUI error on a remote node). It is not an internal vectorized error, so the operator will panic with `colexecerror.ExpectedError`. 2. the panic is caught by one of the catchers (it can be a parallel unordered synchronizer goroutine, an outbox goroutine, a materializer, a hash router) 3. that component will then decide how to propagate the error further: 3.1 if it is a parallel unordered synchronizer, then it will cancel all of its inputs and will repanic 3.2 if it is an outbox, the error is sent as metadata which will be received by an inbox which will panic with it 3.3. if it is a materializer, then it might swallow the error (this is the reason we need for the vectorized hash router to send the error to all of its inputs). The swallowing is acceptable if it is the root materializer though. 3.4 if it is a hash router, it'll cancel all of its outputs and will forward the error on each of the outputs. Fixes: #51458. Release note (bug fix): Previously, CockroachDB could return incorrect results on query that encountered ReadWithinUncertaintyInterval error, and this has been fixed. 52016: colexec: re-enable short-circuiting in the hash joiner r=yuzefovich a=yuzefovich This commit re-enables short-circuiting logic in the hash joiner when the build side is empty (it was temporarily disabled because of #48785 which has been fixed). Fixes: #49631. Release note: None 52027: sql: skip TestQueryProgress r=yuzefovich a=yuzefovich This test started failing more often, so we'll skip it temporarily until we figure it out. Addresses: #51356. Release note: None Co-authored-by: anzoteh96 <anzot@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding (say, every N rows) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
This PR follows up from cockroachdb#50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding every N = 1000 instances of random() and gen_random_uuid() (combined) and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT.
52247: importccl: support random() and gen_random_uuid() as default expressions r=pbardea a=Anzoteh96 This PR follows up from #50295 to support random() and gen_random_uuid() as default expression. Following the visitor and overriding patterns on previous PRs, the random() expression is supported by periodically re-seeding for every N instances of random() and gen_random_uuid() (combined), and making sure that when we resume (after a failed import), we return to the position where it's last re-seeded. The seed is determined by the row number, timestamp, and source ID. Release note (general change): random() and gen_random_uuid() are supported as default expressions for IMPORT. Co-authored-by: anzoteh96 <anzot@cockroachlabs.com>
Currently,
IMPORT
does not support importing DEFAULT columns, and non-targeted columns must be nullable.This PR adds support for
IMPORT INTO
that involves non-targeted columns with DEFAULT expressions that are constant. Constant expressions are those that return the same value in different statements. Some examples are:-- Literals (booleans, strings, integers, decimals, dates)
-- Functions where each argument is a constant expression and the functions themselves depend solely on their arguments. Examples are arithmetic operations, boolean logical operations (AND, OR), string operations like concat and repeat.
In particular, functions like
now()
andnextval
that return different values across different transactions are not supported.This is done by evaluating the required default expressions in the function
sql/row/row_converter:NewDatumRowConverter
.To illustrate this, given file.csv with content:
this will be the expected behaviour of the following line:
Note that the default expression for (c) is used since it's non-targeted, but the default expression for (a) is ignored in favour of the column in the csv.
Fixes #48253
Release note (general change):
IMPORT INTO
now supports columns with constant default expressions, and non-targeted columns with constant default expressions are no longer required to be nullable.