
colserde: add simple serialization for DECIMALs #43311

Merged (1 commit) on Jan 14, 2020

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Dec 18, 2019

This commit adds simple serialization of DECIMALs using the provided
MarshalText method.

Touches: #37044.

Release note (sql change): Previously, DECIMALs could not be sent over
the network when the computation was performed by the vectorized engine.
This has been fixed, and the vectorized engine now fully supports the
DECIMAL type.

@yuzefovich yuzefovich requested review from asubiotto and a team December 18, 2019 21:27
@cockroach-teamcity
Member

This change is Reviewable

Member Author

@yuzefovich yuzefovich left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/col/colserde/arrowbatchconverter_test.go, line 102 at r1 (raw file):

				}
			}
		} else if typ == coltypes.Timestamp {

I've just run into a failure because require.Equal didn't work on equivalent timestamp slices. This probably happened because there is a *time.Location in there, and maybe my assumption that those locations are cached (which I mentioned in pkg/sql/colexec/allocator.go:172) is incorrect. But that's a separate concern, so I'll leave it as a TODO.

@yuzefovich
Member Author

The benchmarks look rather sad though:

BenchmarkArrowBatchConverter/Bool/nullFraction=0.00/BatchToArrow-16         	  447144	      2249 ns/op	 455.37 MB/s
BenchmarkArrowBatchConverter/Bool/nullFraction=0.25/BatchToArrow-16         	  542944	      2255 ns/op	 454.18 MB/s
BenchmarkArrowBatchConverter/Bool/nullFraction=0.50/BatchToArrow-16         	  546918	      2258 ns/op	 453.57 MB/s
BenchmarkArrowBatchConverter/Bool/nullFraction=0.00/ArrowToBatch-16         	  620725	      1875 ns/op	 546.18 MB/s
BenchmarkArrowBatchConverter/Bool/nullFraction=0.25/ArrowToBatch-16         	  672000	      1829 ns/op	 559.94 MB/s
BenchmarkArrowBatchConverter/Bool/nullFraction=0.50/ArrowToBatch-16         	  666283	      1827 ns/op	 560.52 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.00/BatchToArrow-16        	 5882042	       200 ns/op	327924.15 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.25/BatchToArrow-16        	 5831419	       205 ns/op	320126.79 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.50/BatchToArrow-16        	 5823836	       204 ns/op	320532.64 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.00/ArrowToBatch-16        	  769612	      1558 ns/op	42070.03 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.25/ArrowToBatch-16        	  748358	      1628 ns/op	40267.03 MB/s
BenchmarkArrowBatchConverter/Bytes/nullFraction=0.50/ArrowToBatch-16        	  674947	      1619 ns/op	40490.85 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.00/BatchToArrow-16      	    3733	    319850 ns/op	  71.26 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.25/BatchToArrow-16      	    3742	    319979 ns/op	  71.23 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.50/BatchToArrow-16      	    3673	    319458 ns/op	  71.35 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.00/ArrowToBatch-16      	    2220	    538115 ns/op	  42.36 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.25/ArrowToBatch-16      	    2264	    530742 ns/op	  42.95 MB/s
BenchmarkArrowBatchConverter/Decimal/nullFraction=0.50/ArrowToBatch-16      	    2154	    534273 ns/op	  42.66 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.00/BatchToArrow-16        	 7542470	       157 ns/op	52246.90 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.25/BatchToArrow-16        	 7385175	       163 ns/op	50367.65 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.50/BatchToArrow-16        	 7430563	       161 ns/op	50768.17 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.00/ArrowToBatch-16        	 8732474	       136 ns/op	60347.13 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.25/ArrowToBatch-16        	 6350826	       189 ns/op	43292.38 MB/s
BenchmarkArrowBatchConverter/Int64/nullFraction=0.50/ArrowToBatch-16        	 6361404	       190 ns/op	43080.26 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.00/BatchToArrow-16    	   15110	     80315 ns/op	 306.00 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.25/BatchToArrow-16    	   14852	     80079 ns/op	 306.90 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.50/BatchToArrow-16    	   14872	     80608 ns/op	 304.88 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.00/ArrowToBatch-16    	    9068	    132900 ns/op	 184.92 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.25/ArrowToBatch-16    	    9212	    132726 ns/op	 185.16 MB/s
BenchmarkArrowBatchConverter/Timestamp/nullFraction=0.50/ArrowToBatch-16    	    9378	    132135 ns/op	 185.99 MB/s

@asubiotto
Contributor

Thanks for this! Could you try running some higher-level benchmarks to see what the microbenchmark slowdown translates to when running a query? Maybe some distributed aggregation on decimals just above the row count threshold to get a worst case comparison.

@yuzefovich
Member Author

I ran a simple benchmark as you suggested: I set up a three-node local cluster, created a table with 1000 decimals split into 3 ranges (one range per node), and ran SELECT sum(d) FROM t. With the vectorized engine the query took around 1.5 ms; with the row-by-row engine, 1.2 ms. I then inserted another 9000 decimals and ran the query again: vectorized took 2.9 ms whereas row-by-row took 3.2 ms. I think the performance of the current serialization is more or less acceptable, but if we have time this release to improve it, we should. In any case, it is (probably in most cases) better than having no serialization at all.

Contributor

@asubiotto asubiotto left a comment


:lgtm:

Reviewed 5 of 6 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@yuzefovich
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Jan 14, 2020
43311: colserde: add simple serialization for DECIMALs r=yuzefovich a=yuzefovich

This commit adds simple serialization of DECIMALs using the provided
MarshalText method.

Touches: #37044.

Release note (sql change): Previously, DECIMALs could not be sent over
the network when the computation was performed by the vectorized engine.
This has been fixed, and the vectorized engine now fully supports the
DECIMAL type.

43793: sql: properly enforce uniqueness of key columns for FK references r=lucy-zhang a=lucy-zhang

We introduced a bug in 19.2 that would allow, e.g., a unique index on `(a, b,
c)` to be used as an index that is supposed to enforce uniqueness for a foreign
key constraint pointing only to `(a, b)`. This PR reintroduces a check that the
indexed columns match the FK columns exactly.

Fixes #42680.

Release note (bug fix): Fixed a bug introduced in 19.2 that allowed foreign
keys to use a unique index on the referenced columns that indexed more columns
than were included in the FK constraint, potentially allowing uniqueness in
the referenced columns themselves to be violated.

43956: build: post issues from local roachtests r=andreimatei a=tbg

Issues weren't being posted because of some missing env vars. I noticed this
since there are many failures of acceptance/version-upgrade that aren't
reflected in created issues (I get email notifications from TC).

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Contributor

craig bot commented Jan 14, 2020

Build succeeded
