Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/pgwire: binary encoding for bool, decimal, and string #6661

Merged
merged 5 commits into from
May 13, 2016
Merged

sql/pgwire: binary encoding for bool, decimal, and string #6661

merged 5 commits into from
May 13, 2016

Conversation

tamird
Copy link
Contributor

@tamird tamird commented May 12, 2016

Depends on cockroachdb/postgres-test#9

Fixes #6526.

cc @mjibson for the test.


This change is Reviewable

@maddyblue
Copy link
Contributor

:lgtm:

Previously, tamird (Tamir Duberstein) wrote…

sql/pgwire: binary encoding for bool, decimal, and string

Depends on cockroachdb/postgres-test#9

Fixes #6526.

cc @mjibson for the test.


Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 1 of 1 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 3 of 3 files at r5, 1 of 4 files at r6, 4 of 4 files at r9.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


acceptance/c_test.go, line 67 [r5] (raw file):

  }

  // // '1401-01-19 BC'

Should all of these be commented out? Don't we want to test other data types?


sql/pgwire/types.go, line 63 [r5] (raw file):

//go:generate stringer -type=pgSign
type pgSign uint16

s/psSign/pgNumericSign/?


sql/pgwire/types.go, line 72 [r5] (raw file):

const (
  DEC_DIGITS = 4

A comment here would be helpful. That said, PG's comment /* decimal digits per NBASE digit */ tells me basically nothing without NBASE defined.


sql/pgwire/types.go, line 221 [r5] (raw file):

  case *parser.DDecimal:
      alloc := struct {

👍


sql/pgwire/types.go, line 242 [r5] (raw file):

      alloc.pgNum.dscale = int16(alloc.fracDec.Scale())

      for big := alloc.bigI.Set(alloc.wholeDec.UnscaledBig()); big.Sign() > 0; big.Div(big, decShift) {

big.Int.Div internally involves a doubly nested iteration, so it scares me to see it in another loop here (and below!). We're doing a lot to avoid allocations and speed this up, but if we're going to put so much effort into it, I think we should have benchmark metrics to back up whatever approach we choose. I wouldn't be surprised if printing either the big.Int or inf.Dec to a string and working on that proves to be a lot easier and of comparable performance.


sql/pgwire/types.go, line 509 [r5] (raw file):

              }
          }
          for i := int16(0); i < alloc.pgNum.ndigits; i++ {

Same thing about the benchmark here. I'm less convinced that working with a string would help, but I'd love to get numbers on it, and having the benchmark also gives us a foundation to base future optimizations off of.

Really any of these marshaling or unmarshmaling routines that aren't dead simple could benefit from having an associated benchmark as well, given that these are going to be in critical path of almost all queries and will will see usage scale linearly with the size of the query results. However, the types not touched here are out of the scope of this change.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented May 13, 2016

Review status: 3 of 6 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


acceptance/c_test.go, line 67 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should all of these be commented out? Don't we want to test other data types?


These are the data types the binary format of which we don't support - I've left them here for future work.


sql/pgwire/types.go, line 63 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/psSign/pgNumericSign/?


Done.


sql/pgwire/types.go, line 72 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

A comment here would be helpful. That said, PG's comment /* decimal digits per NBASE digit */ tells me basically nothing without NBASE defined.


I'm open to suggestions =/


sql/pgwire/types.go, line 242 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

big.Int.Div internally involves a doubly nested iteration, so it scares me to see it in another loop here (and below!). We're doing a lot to avoid allocations and speed this up, but if we're going to put so much effort into it, I think we should have benchmark metrics to back up whatever approach we choose. I wouldn't be surprised if printing either the big.Int or inf.Dec to a string and working on that proves to be a lot easier and of comparable performance.


Added a benchmark for this guy (but not the below). That said, I found working with a string much clumsier and harder to read.


sql/pgwire/types.go, line 509 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same thing about the benchmark here. I'm less convinced that working with a string would help, but I'd love to get numbers on it, and having the benchmark also gives us a foundation to base future optimizations off of.

Really any of these marshaling or unmarshmaling routines that aren't dead simple could benefit from having an associated benchmark as well, given that these are going to be in critical path of almost all queries and will will see usage scale linearly with the size of the query results. However, the types not touched here are out of the scope of this change.


This one is a bit trickier to benchmark, but I'll add one before merging this.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 2 of 4 files at r10.
Review status: 4 of 6 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


sql/pgwire/types.go, line 72 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I'm open to suggestions =/


"the number of decimal digits per int16" or something. You know how it's used better than I do.


sql/pgwire/types.go, line 242 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Added a benchmark for this guy (but not the below). That said, I found working with a string much clumsier and harder to read.


The benchmark is indicating that this is taking on the order of 10 us, while we were able to get all key encoding routines to work on the order of 100 ns. That definitely seems like there is some area for improvement here.

I'm not sure why you think the string is clumsier, but from those previous numbers it doesnt seem unreasonable to think the string approach could give us something like a 10x speedup. For instance, instead of all of the divisions we do below, having a string we're working on (or better, a byte array from .Append, which could even be writeBuffer.putBuf) would allow us to just move an index in the array.

As a POC, try adding a benchmark for the decimal text marshaling. I would expect that to always be faster than this routine, but not by much.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented May 13, 2016

Review status: 4 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


sql/pgwire/types.go, line 72 [r5] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"the number of decimal digits per int16" or something. You know how it's used better than I do.


Done.


sql/pgwire/types.go, line 242 [r5] (raw file):

      alloc.pgNum.dscale = int16(alloc.fracDec.Scale())

      for big := alloc.bigI.Set(alloc.wholeDec.UnscaledBig()); big.Sign() > 0; big.Div(big, decShift) {

As a POC, try adding a benchmark for the decimal text marshaling. I would expect that to always be faster than this routine, but not by much.

Doesn't sound like a useful POC to me; the result is completely different.

Anyway, I did something of a hybrid string/not string implementation that is indeed way faster! Thanks for the nag, PTAL.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Nice! I'll give you your hard earned LGTM once we have a decoding benchmark.

Previously, mjibson (Matt Jibson) wrote…

:lgtm:


Reviewed 1 of 4 files at r10.
Review status: 5 of 6 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


sql/pgwire/types.go, line 242 [r5] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

As a POC, try adding a benchmark for the decimal text marshaling. I would expect that to always be faster than this routine, but not by much.

Doesn't sound like a useful POC to me; the result is completely different.

Anyway, I did something of a hybrid string/not string implementation that is indeed way faster! Thanks for the nag, PTAL.


👍


sql/pgwire/types.go, line 241 [r14] (raw file):

      // managed manually instead of actually padding the string, for reasons of
      // performance.
      decDigits := alloc.bigI.Abs(v.UnscaledBig()).String()

You might want to try .Append(writeBuffer.putBuf, 10).


sql/pgwire/types.go, line 546 [r14] (raw file):

              if overPrecision := scale - alloc.pgNum.dscale; overPrecision > 0 {
                  scale -= overPrecision
                  alloc.i16 /= int16(math.Pow10(int(overPrecision)))

I'm wondering it it would be faster to just have a quick loop here:

for i := 0; i < overPrecision; i++ {
    alloc.i16 /= 10
}

I dont think overPrecision can ever be greater than 3, right?


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented May 13, 2016

Done.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nice! I'll give you your hard earned LGTM once we have a decoding benchmark.


Review status: 1 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


sql/pgwire/types.go, line 241 [r14] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You might want to try .Append(writeBuffer.putBuf, 10).


Using putBuf for this is no bueno, since those bytes will be trampled by the various putInt calls below.


sql/pgwire/types.go, line 546 [r14] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm wondering it it would be faster to just have a quick loop here:

for i := 0; i < overPrecision; i++ {
    alloc.i16 /= 10
}

I dont think overPrecision can ever be greater than 3, right?


Done.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented May 13, 2016

Sure looks like all big arithmetic allocates. Let me see about optimizing decoding.

Previously, tamird (Tamir Duberstein) wrote…

Done.


Review status: 1 of 6 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


Comments from Reviewable

```
BenchmarkWriteBinaryDecimal-4 	  500000	      2564 ns/op	     160 B/op	       4 allocs/op
BenchmarkDecodeBinaryDecimal-4	  300000	      4491 ns/op	     384 B/op	      19 allocs/op
```
@nvanbenschoten
Copy link
Member

:lgtm:

Previously, tamird (Tamir Duberstein) wrote…

Sure looks like all big arithmetic allocates. Let me see about optimizing decoding.


Review status: 1 of 6 files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants