This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Add MmapFlags option for MAP_POPULATE (unix) #455

Merged
merged 1 commit into from Nov 12, 2015

gyuho (Contributor) commented Nov 6, 2015

/cc @xiang90

This adds an MmapFlags option so that syscall.MAP_POPULATE can be set on Linux 2.6.23+.
When enabled, the kernel does sequential read-ahead on the mapped file, as discussed in [1].

Xiang helped me with this PR. I benchmarked read performance, and it improved
with the MAP_POPULATE flag. Here is what I did:

  1. Generated 2GB of random data.
  2. Dropped the page cache: `echo "echo 1 > /proc/sys/vm/drop_caches" | sudo sh`
  3. Read the data with and without MAP_POPULATE and compared the two results [2].

Results:

- SSD, Ubuntu, Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz, go version go1.5.1 linux/amd64, Linux kernel version: 3.16.0-52-generic

  For 2GB, the read takes 8 seconds without MAP_POPULATE and 4 seconds WITH MAP_POPULATE.

- HDD, 1 vCPU, 3.75 GB Memory, Google Cloud, Intel(R) Xeon(R) CPU @ 2.30GHz, go version go1.5.1 linux/amd64, Linux kernel version: 3.16.0-0.bpo.4-amd64

  For 2GB, the read takes 2 minutes 10 seconds without MAP_POPULATE and 8 seconds WITH MAP_POPULATE.


I also ran with `MAP_POPULATE` but WITHOUT dropping the page cache, on a separate machine. That took about the same time as the first test case, where the page cache was dropped.

[1]: etcd-io/etcd#3786

[2]:

package main

import (
    "fmt"
    "syscall"
    "time"

    "github.com/boltdb/bolt"
)

const (
    dbPath     = "fake.db"
    bucketName = "fake-bucket"
    writable   = false
)

func main() {
    st := time.Now()
    // Open the dbPath data file in your current directory.
    // It will be created if it doesn't exist.
    // Baseline without MAP_POPULATE (swap with the line below to compare):
    // db, err := bolt.Open(dbPath, 0600, &bolt.Options{Timeout: 5 * time.Minute, ReadOnly: true})
    db, err := bolt.Open(dbPath, 0600, &bolt.Options{Timeout: 5 * time.Minute, ReadOnly: true, MmapFlags: syscall.MAP_POPULATE})
    if err != nil {
        panic(err)
    }
    defer db.Close()

    tx, err := db.Begin(writable)
    if err != nil {
        panic(err)
    }
    defer tx.Rollback()

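    bk := tx.Bucket([]byte(bucketName))
    if bk == nil {
        // tx.Bucket returns nil when the bucket does not exist.
        panic("bucket not found: " + bucketName)
    }
    c := bk.Cursor()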

    for k, v := c.First(); k != nil; k, v = c.Next() {
        // fmt.Printf("%s ---> %s.\n", k, v)
        _ = k
        _ = v
    }
    fmt.Println("took:", time.Since(st))
}

xiang90 (Contributor) commented Nov 6, 2015

@gyuho A more meaningful comparison would be read-ahead enabled versus a cached read.

@@ -63,6 +63,9 @@ type DB struct {
// https://github.com/boltdb/bolt/issues/284
NoGrowSync bool

// When true, Linux 2.6.23+ sets MAP_POPULATE for sequential read-ahead.
MmapFlags bool

This should be an int, not a bool, so people can set the flags directly.


Then in the comment we can mention that if you want to read the entire database fast, you can set MmapFlags to syscall.MAP_POPULATE on Linux.

benbjohnson (Member) commented

Overall, lgtm. Just some minor clean up with the OR. The perf bump for spinning disks looks great!

gyuho (Contributor, Author) commented Nov 8, 2015

Thanks! I will clean things up tonight.

gyuho (Contributor, Author) commented Nov 9, 2015

@benbjohnson Just updated as you suggested and re-ran the same performance tests on the sample 2GB data:

WITHOUT read-ahead: took: 14.328380492s
WITH read-ahead:    took: 3.130086719s

Thanks,

@@ -63,6 +63,11 @@ type DB struct {
// https://github.com/boltdb/bolt/issues/284
NoGrowSync bool

// If you want to read the entire database fast, you can set MmapFlag to
// syscall.MAP_POPULATE on Linux 2.6.23+ for sequential read-ahead.
// https://github.com/coreos/etcd/issues/3786

Let's remove the reference to the etcd issue. It won't mean much to other readers.

This adds MmapFlags to DB.Options in case we need syscall.MAP_POPULATE
flag in Linux 2.6.23+ to do the sequential read-ahead, as discussed in [1].

---

[1]: etcd-io/etcd#3786

gyuho (Contributor, Author) commented Nov 9, 2015

@xiang90 Just removed it from db.go. Thanks.

benbjohnson (Member) commented

👍 lgtm, thanks for all the tweaks and the benchmarks.

benbjohnson added a commit that referenced this pull request Nov 12, 2015
Add MmapFlags option for MAP_POPULATE (unix)
@benbjohnson benbjohnson merged commit 0b00eff into boltdb:master Nov 12, 2015

gyuho (Contributor, Author) commented Nov 12, 2015

Thanks @benbjohnson !

gyuho added a commit to gyuho/etcd that referenced this pull request Nov 12, 2015
gyuho added a commit to gyuho/etcd that referenced this pull request Nov 12, 2015