Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: various COPY improvements #78303

Merged
merged 5 commits into from Mar 24, 2022
Merged

Conversation

otan
Copy link
Contributor

@otan otan commented Mar 23, 2022

Refs: #41608

See individual commits for details.

@otan otan requested review from ajstorm and a team March 23, 2022 04:06
@otan otan requested a review from a team as a code owner March 23, 2022 04:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 3 of 3 files at r2, 1 of 1 files at r3, 3 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @otan)


-- commits, line 7 at r2:
Can we add some testing here to validate that the options show up as unimplemented?


pkg/util/encoding/csv/reader_test.go, line 396 at r4 (raw file):

		Escape: 'x',
		Input:  `"x"",",","xxx"",x,"xxxx,"` + "\n",
		Output: [][]string{{`"`, `,`, `x"`, `x`, `xx,`}},

I'm not following the 4th case. If we just provide an escape character, shouldn't this show nothing or a parse error?


pkg/util/encoding/csv/reader_test.go, line 402 at r4 (raw file):

		Comma:  'x',
		Input:  `"x""x,x"xxx""x"xx"x"xxxx,"` + "\n",
		Output: [][]string{{`"`, `,`, `x"`, `x`, `xx,`}},

I think I'm missing something. I would have thought that this would yield:

'"',',"x"','"x','"xx'


pkg/util/encoding/csv/writer_test.go, line 54 at r4 (raw file):

	{Input: [][]string{{"a", "a", "a"}}, Output: "a,a,a\n"},
	{Input: [][]string{{`\.`}}, Output: "\"\\.\"\n"},
	{Input: [][]string{{`"`, `,`, `x"`, `x`, `xx,`}}, Escape: 'x', Output: `"x"",",","xxx"",x,"xxxx,"` + "\n"},

More confusion on my part- why is a single x being interpreted as x instead of as the escape, and thus a parse error?

@ajstorm ajstorm self-requested a review March 23, 2022 15:29
@awoods187
Copy link
Contributor

Exciting stuff--can we list explicitly what will remain unimplemented for Copy from the linked issue?

@ajstorm
Copy link
Collaborator

ajstorm commented Mar 23, 2022 via email

@awoods187
Copy link
Contributor

Awesome news!

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm)


-- commits, line 7 at r2:

Previously, ajstorm (Adam Storm) wrote…

Can we add some testing here to validate that the options show up as unimplemented?

the new tests in TestUnimplementedSyntax in parse_test.go already does this


pkg/util/encoding/csv/reader_test.go, line 396 at r4 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I'm not following the 4th case. If we just provide an escape character, shouldn't this show nothing or a parse error?

"x"",",","xxx"",x,"xxxx,"` shouldn't.
breaking it down:

"x"",: "x"" - the wrapping " indicates to use escapes, x escapes ", so it is ".
",",: the "," is just ,
"xxx"",: the x escapes x, then x escapes ", so it is x"
x,: no quotes, so we don't have to escape anything. so it is x (see below comment)
"xxxx,", x escapes x twice, then a comma, so xx,

Code quote:

"x"",",","xxx"",x,"xxxx,"` + "\n"

pkg/util/encoding/csv/reader_test.go, line 402 at r4 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I think I'm missing something. I would have thought that this would yield:

'"',',"x"','"x','"xx'

"x""x, yields ", then a delimiter
,x yields , then a delimiter
"xxx""x yields "x", then a delimiter "xx"x": point of contention. begin quote, then x escapes x, so it becomes just x, then a delimiter
"xxxx,": quote, two escapes x's, then a comma, so xx,

fwiw all these outputs are from PG


pkg/util/encoding/csv/writer_test.go, line 54 at r4 (raw file):

Previously, ajstorm (Adam Storm) wrote…

More confusion on my part- why is a single x being interpreted as x instead of as the escape, and thus a parse error?

because things that are not in quotes are their accepted "raw value".

@otan otan force-pushed the csv_editing branch 2 times, most recently from 735456b to e690e37 Compare March 23, 2022 20:28
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @otan)


pkg/sql/pgwire/testdata/pgtest/copy, line 297 at r5 (raw file):

{"Type":"DataRow","Values":[{"text":"1"},{"text":"x\",x"}]}
{"Type":"CommandComplete","CommandTag":"SELECT 2"}
{"Type":"ReadyForQuery","TxStatus":"I"}

just to double-check, these new ones also passed against PG?

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @otan)


pkg/util/encoding/csv/reader.go, line 133 at r4 (raw file):

	// Escape, if not 0, is the character used to escape `"` characters and
	// itself.

maybe mention that if it's unset, it defaults to the same as QUOTE. (which will only be more important once we support the QUOTE option)

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @otan, and @rafiss)


pkg/util/encoding/csv/reader.go, line 133 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

maybe mention that if it's unset, it defaults to the same as QUOTE. (which will only be more important once we support the QUOTE option)

Done.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good from my end

@ajstorm ajstorm requested a review from rafiss March 23, 2022 23:42
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too. Thanks for clearing up my confusion about the unquoted characters.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @otan, and @rafiss)


-- commits, line 7 at r2:

Previously, otan (Oliver Tan) wrote…

the new tests in TestUnimplementedSyntax in parse_test.go already does this

Thanks. I see them now.


pkg/util/encoding/csv/reader_test.go, line 396 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

"x"",",","xxx"",x,"xxxx,"` shouldn't.
breaking it down:

"x"",: "x"" - the wrapping " indicates to use escapes, x escapes ", so it is ".
",",: the "," is just ,
"xxx"",: the x escapes x, then x escapes ", so it is x"
x,: no quotes, so we don't have to escape anything. so it is x (see below comment)
"xxxx,", x escapes x twice, then a comma, so xx,

Yeah, makes sense. It was step 4 that I was missing.


pkg/util/encoding/csv/reader_test.go, line 402 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

"x""x, yields ", then a delimiter
,x yields , then a delimiter
"xxx""x yields "x", then a delimiter "xx"x": point of contention. begin quote, then x escapes x, so it becomes just x, then a delimiter
"xxxx,": quote, two escapes x's, then a comma, so xx,

fwiw all these outputs are from PG

All good. Thanks.


pkg/util/encoding/csv/writer_test.go, line 54 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

because things that are not in quotes are their accepted "raw value".

Ahh, interesting. That explains all of my confusion. Thx.

@ajstorm ajstorm self-requested a review March 23, 2022 23:42
@otan otan force-pushed the csv_editing branch 3 times, most recently from f102228 to 03ee50c Compare March 24, 2022 00:51
Add `ESCAPE` logic to the `encoding/csv` package, for exposure to
SQL at a later stage.

It is worth noting I wrote this in the "safest" backportable way
possible. Ideally we'd rewrite the read logic to be more "parser"-like
to account for the change in QUOTE case, but that's a lot riskier to
backport.

Release note: None
Release note (sql change): Implemented the `COPY FROM ... ESCAPE ...`
syntax.
@otan
Copy link
Contributor Author

otan commented Mar 24, 2022

thanks

bors r=rafiss,ajstorm

@craig
Copy link
Contributor

craig bot commented Mar 24, 2022

Build succeeded:

@craig craig bot merged commit 8b36717 into cockroachdb:master Mar 24, 2022
@otan
Copy link
Contributor Author

otan commented Mar 24, 2022 via email

@otan
Copy link
Contributor Author

otan commented Mar 24, 2022

blathers backport 21.2

@otan
Copy link
Contributor Author

otan commented Jun 8, 2022

blathers backport 21.2

@blathers-crl
Copy link

blathers-crl bot commented Jun 8, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 149a1db to blathers/backport-release-21.2-78303: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants