
Performance, decomposing requests and code generation #72

Closed
asdine opened this issue Aug 23, 2016 · 16 comments
@asdine
Owner

asdine commented Aug 23, 2016

Storm's primary goal is to be simple to use.

It has never been about achieving good performance, which is why reflection is heavily used.
I am not a fan of reflection; it just happened naturally when designing the API.
I sacrificed raw speed for ease of use, and I am glad I did.

But I believe we can do something to avoid reflection when it is not needed, and without a lot of work.
Basically, fetching one or several records goes like this:

  • Using reflection on the given structure, extracting all the relevant information
  • Fetching the selected field and value in the indexes to get the matching IDs
  • Querying BoltDB for those IDs

The reflection boilerplate is essentially done in the first step; it is necessary so we can collect the following information:

  • The name of the bucket, which is the name of the struct
  • What field is the ID and is it a zero value?
  • What fields are indexed and what kind of index is used for the field

I think that if we provide a set of methods that let users supply this information manually, we could achieve excellent performance.

These methods could also be used internally to simplify some parts of the code.

But the most interesting part is that we may be able to transform the current reflection-based struct declarations into calls to the new methods described above, ahead of time, using go generate or something similar.

@asdine
Owner Author

asdine commented Aug 23, 2016

Here are some ideas:

First step would be to allow users to query indexes by themselves. The index package as it is right now isn't very intuitive; adding methods to Node seems like a better fit.

ids, err := db.Index("User", "Group", "staff")

// signature
Index(bucketName string, fieldName string, value interface{}) ([][]byte, error)

The good side of this method is that we don't need to tell what kind of index it is (list vs unique), both will return a list of ids.
The downside is that users need to specify the bucket name themselves.

We could also play with the From method:

ids, err := db.From("User").Index("Group", "staff")

// signature
Index(fieldName string, value interface{}) ([][]byte, error)

@asdine
Owner Author

asdine commented Aug 23, 2016

We could then have a method to fetch all the ids at once.

users := make([]User, len(ids))
i := 0
err := db.GetAll("User", ids, func(value []byte, codec storm.Codec) error {
  err := codec.Decode(value, &users[i])
  if err != nil {
    return err
  }
  i++
  return nil
})

// signature
GetAll(bucketName string, ids [][]byte, fn func([]byte, storm.Codec) error) error

@asdine
Owner Author

asdine commented Aug 23, 2016

Or we could simply ask the user to provide a pointer to the variable where they want Storm to decode the value:

users := make([]User, len(ids))
err := db.GetAll("User", ids, func(i int) interface{} {
  return &users[i]
})

// signature
GetAll(bucketName string, ids [][]byte, fn func(int) interface{}) error

@asdine
Owner Author

asdine commented Aug 23, 2016

The remaining reflection is in the Codec. Storm uses encoding/gob by default, but one can easily create a custom Codec, or even generate one, using ffjson, goa, or similar tools.

@asdine
Owner Author

asdine commented Aug 24, 2016

After running some benchmarks, it turns out that removing reflection for reads doesn't speed things up at all:

# run on a 1 cpu / 1 gb ram Digital Ocean droplet

BenchmarkFind100WithIndex            300       5787692 ns/op      854553 B/op      21739 allocs/op
BenchmarkFind100WithoutIndex         100      12103953 ns/op     1711363 B/op      43725 allocs/op
BenchmarkIndex                    200000          5910 ns/op        1280 B/op         30 allocs/op
BenchmarkGetAll100                  3000        515640 ns/op       93234 B/op       1807 allocs/op
BenchmarkGetAll100WithIndex          300       5841613 ns/op      845365 B/op      21741 allocs/op
BenchmarkOneWithIndex              20000         70241 ns/op        9689 B/op        249 allocs/op
BenchmarkOneWithoutIndex             200       6021143 ns/op      843149 B/op      21914 allocs/op
BenchmarkSave                       2000        792640 ns/op       16805 B/op        245 allocs/op

What we can see:

  • BenchmarkFind100WithIndex uses the Find method to fetch 100 records out of 200 using the gob codec (it decodes 100 times)
  • BenchmarkGetAll100WithIndex does exactly the same thing, but uses db.Index() to fetch the IDs + db.GetAll to fetch the records corresponding to those IDs.

The two are approximately the same speed: sometimes slightly slower, sometimes slightly faster.

Now, let's change the Codec and use JSON:

BenchmarkFind100WithIndex           2000        741702 ns/op       53660 B/op        739 allocs/op
...
BenchmarkGetAll100WithIndex         2000        730447 ns/op       44460 B/op        741 allocs/op
...
BenchmarkSave                       2000       1035949 ns/op       84044 B/op        263 allocs/op

The two approaches are still the same speed relative to each other, but both tests are a lot faster.

Conclusion: The bottleneck is the Codec, not the struct parsing that happens before. Providing a Codec that uses generated code like Protobuf or ffjson would speed things up.

PS: Save is still slow, though.

@bep
Collaborator

bep commented Aug 24, 2016

Benchmark runs with benchcmp (before, after) would be easier to understand, but I'm not surprised by the conclusion.

@asdine
Owner Author

asdine commented Aug 24, 2016

Here is the comparison (I didn't know about benchcmp, I like it):

benchmark                        old ns/op     new ns/op     delta
BenchmarkFind100WithIndex        6360606       748201        -88.24%
BenchmarkFind100WithoutIndex     11757504      1617889       -86.24%
BenchmarkIndex                   6915          6295          -8.97%
BenchmarkGetAll100               485732        118979        -75.51%
BenchmarkGetAll100WithIndex      6059162       746119        -87.69%
BenchmarkOneWithIndex            68554         14968         -78.17%
BenchmarkOneWithoutIndex         6049717       28336         -99.53%
BenchmarkSave                    807820        879132        +8.83%

benchmark                        old allocs     new allocs     delta
BenchmarkFind100WithIndex        21739          739            -96.60%
BenchmarkFind100WithoutIndex     43725          1724           -96.06%
BenchmarkIndex                   30             30             +0.00%
BenchmarkGetAll100               1807           307            -83.01%
BenchmarkGetAll100WithIndex      21741          741            -96.59%
BenchmarkOneWithIndex            249            39             -84.34%
BenchmarkOneWithoutIndex         21914          41             -99.81%
BenchmarkSave                    245            211            -13.88%

benchmark                        old bytes     new bytes     delta
BenchmarkFind100WithIndex        854584        53660         -93.72%
BenchmarkFind100WithoutIndex     1711373       109528        -93.60%
BenchmarkIndex                   1280          1280          +0.00%
BenchmarkGetAll100               93234         27816         -70.17%
BenchmarkGetAll100WithIndex      845363        44460         -94.74%
BenchmarkOneWithIndex            9689          1688          -82.58%
BenchmarkOneWithoutIndex         843160        1888          -99.78%
BenchmarkSave                    16802         15066         -10.33%

@bep
Collaborator

bep commented Aug 24, 2016

That looks MUCH faster to me...?

@asdine
Owner Author

asdine commented Aug 24, 2016

Yeah, but all I did was change the default codec in storm.go.

The Find benchmark, for example, is right here. I printed the list to check that it had the right results, and it did.

@nooproblem

@asdine what two codecs are you using in the most recent benchcmp results?

@asdine
Owner Author

asdine commented Aug 26, 2016

@nooproblem gob for the "old" part and json for the "new" part

@bep
Collaborator

bep commented Aug 26, 2016

@asdine both tests and benchmarks fail for me when changing to JSON as default codec. I think it would add value to somehow run all tests with both (and I suspect JSON is a better default, too).

panic: reflect.Value.Slice: slice index out of bounds [recovered]
    panic: reflect.Value.Slice: slice index out of bounds

goroutine 56 [running]:
panic(0x2e8c00, 0xc4203facd0)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
testing.tRunner.func1(0xc42009a9c0)
    /usr/local/go/src/testing/testing.go:579 +0x25d
panic(0x2e8c00, 0xc4203facd0)
    /usr/local/go/src/runtime/panic.go:458 +0x243
reflect.Value.Slice(0x2e5660, 0xc42044ff88, 0x197, 0x0, 0x17, 0x197, 0x188, 0xc4204737e8)
    /usr/local/go/src/reflect/value.go:1554 +0x20c
github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew.(*dumpState).dumpSlice(0xc420473a50, 0x2e5660, 0xc42044ff20, 0x97)
    /Users/bep/go/src/github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew/dump.go:198 +0x78a
github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew.(*dumpState).dump(0xc420473a50, 0x2e5660, 0xc42044ff20, 0x97)
    /Users/bep/go/src/github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew/dump.go:354 +0xf04
github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew.fdump(0x4bece0, 0x4a6880, 0xc420460230, 0xc420473be0, 0x1, 0x1)
    /Users/bep/go/src/github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew/dump.go:467 +0x142
github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew.Sdump(0xc420473be0, 0x1, 0x1, 0x2e5660, 0x1)
    /Users/bep/go/src/github.com/stretchr/testify/vendor/github.com/davecgh/go-spew/spew/dump.go:482 +0x9b
github.com/stretchr/testify/assert.diff(0x2e5660, 0xc42044ff20, 0x2e5660, 0xc42044ff40, 0xc420473c00, 0x123b5)
    /Users/bep/go/src/github.com/stretchr/testify/assert/assertions.go:990 +0x1bd
github.com/stretchr/testify/assert.Equal(0x4a77c0, 0xc42009a9c0, 0x2e5660, 0xc42044ff20, 0x2e5660, 0xc42044ff40, 0x0, 0x0, 0x0, 0x40000000)
    /Users/bep/go/src/github.com/stretchr/testify/assert/assertions.go:264 +0xac
github.com/asdine/storm.TestSave.func1(0xc4200ed180, 0x384f30, 0xc4200ed180)
    /Users/bep/go/src/github.com/asdine/storm/save_test.go:63 +0x46f
github.com/boltdb/bolt.(*DB).View(0xc4200f6000, 0xc420473f38, 0x0, 0x0)
    /Users/bep/go/src/github.com/boltdb/bolt/db.go:626 +0xb5
github.com/asdine/storm.TestSave(0xc42009a9c0)
    /Users/bep/go/src/github.com/asdine/storm/save_test.go:65 +0x850
testing.tRunner(0xc42009a9c0, 0x384e40)
    /usr/local/go/src/testing/testing.go:610 +0x81
created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:646 +0x2ec

@asdine
Owner Author

asdine commented Aug 26, 2016

Yep, done here: 7aa972a on the getall branch.

The failing test assumes that the codec is gob and tries to decode the value with gob.Codec.

Here are the results on my current holiday laptop (an ultra-slow $200 Asus C200 Chromebook):

go test -bench=. -benchmem -run=NONE 
BenchmarkFind100WithIndexJSON-2             1000       2027367 ns/op       53664 B/op        739 allocs/op
BenchmarkFind100WithIndexGOB-2               100      13824064 ns/op      854615 B/op      21739 allocs/op
BenchmarkFind100WithoutIndexJSON-2           300       4436821 ns/op      109547 B/op       1724 allocs/op
BenchmarkFind100WithoutIndexGOB-2             50      27048721 ns/op     1711408 B/op      43725 allocs/op
BenchmarkIndex-2                          100000         13745 ns/op        1280 B/op         30 allocs/op
BenchmarkGetAll100JSON-2                    5000        268879 ns/op       27817 B/op        307 allocs/op
BenchmarkGetAll100GOB-2                     2000       1130672 ns/op       93236 B/op       1807 allocs/op
BenchmarkGetAll100WithIndexJSON-2           1000       1945721 ns/op       44464 B/op        741 allocs/op
BenchmarkGetAll100WithIndexGOB-2             100      13479462 ns/op      845402 B/op      21741 allocs/op
BenchmarkOneWithIndexJSON-2                50000         37817 ns/op        1688 B/op         39 allocs/op
BenchmarkOneWithIndexGOB-2                 10000        156288 ns/op        9689 B/op        249 allocs/op
BenchmarkOneWithoutIndexJSON-2             20000         73354 ns/op        1888 B/op         41 allocs/op
BenchmarkOneWithoutIndexGOB-2                100      13393258 ns/op      843190 B/op      21914 allocs/op
BenchmarkSaveJSON-2                         1000       2030083 ns/op       15095 B/op        211 allocs/op
BenchmarkSaveGOB-2                           500       2050051 ns/op       16857 B/op        245 allocs/op
PASS
ok      github.com/asdine/storm 43.793s

It seems that the JSON codec is much faster with Storm because gob is optimized for streams, not for encoding many small, independent values.

@asdine
Owner Author

asdine commented Aug 28, 2016

As @bep mentioned, I think we should set the default codec to JSON; it will dramatically improve the read speed and the memory footprint.

The downside of using JSON is that records are no longer ordered in a "natural" way, because the encoded IDs are compared as strings. For example, if an index has the numeric values 1, 2, 3, 4, ... 10, they will come back ordered lexicographically: "1", "10", "2", "3", ...

So a function like AllByIndex will not work as expected; maybe we should sort results after fetching the indexes... I don't know.

Also, I think we can drop GetAll and Index (at least for now).

@bep
Collaborator

bep commented Aug 28, 2016

So a function like AllByIndex will not work as expected, so maybe we should sort results after fetching the indexes... I don't know.

Sort by default? I don't think so. I don't think any of the big vendors in the SQL world does that. In SQL, if you want a guaranteed order, you have to add an ORDER BY clause.

@asdine
Owner Author

asdine commented Aug 29, 2016

Yeah, you are right. AllByIndex is expected to return sorted results, so that one will be sorted.
But for the other functions there will be no guaranteed order. Also, we should add a storm.OrderBy(field string), but that would be for another issue.

I think we can close this one after changing the default codec and removing the getall branch.
