Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator to iterate over keys #22

Merged
merged 19 commits into from
Nov 2, 2016
Merged

Add Iterator to iterate over keys #22

merged 19 commits into from
Nov 2, 2016

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Oct 12, 2016

No description provided.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 98.773% when pulling e2823d8 on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 98.784% when pulling 5b5116b on wendigo:master into f403062 on allegro:master.

@wendigo wendigo changed the title Add Keys() function to iterate over keys Add EntriesMatching() function to iterate over keys Oct 12, 2016
type EntryInfoFilter func(EntryInfo) bool

// EntryInfoIterator allows to iterate over entries in the cache
type EntryInfoIterator chan EntryInfo
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this iterator to be as non-blocking as possible

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With channel you add blocking. Take a look at loops you created. You are adding mutex for reading from map and then pushing value to channel which is blocking. If you remove channel and instead return closure to function that returns entry form shard i on position index then you don't need to copy all (1024) elements and this will be a lazy iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel is buffered

@wendigo wendigo changed the title Add EntriesMatching() function to iterate over keys Add Iterator to iterate over keys Oct 14, 2016
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 98.567% when pulling ce30cf9 on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 98.567% when pulling f0c671f on wendigo:master into f403062 on allegro:master.

cache *BigCache
currentShard int
currentIndex int32
elements []*EntryInfo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just slice of indexes []uin32. We don't need all data during iteration.

shard.lock.RLock()
defer shard.lock.RUnlock()

elements := make([]*EntryInfo, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to estimate initial size for elements as len(shard.queue)/cache.config.MaxEntrySize

for iterator.Next() {
current, err := iterator.Value()

if err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test when error occures: modify map when it terates, delete element under Value()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also needs bench test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark - done.


if entry, err := it.cache.shards[it.currentShard].entries.Get(int(current.index)); err != nil {
return nil, fmt.Errorf("Could not retrieve entry from cache")
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (golint)

@@ -33,6 +33,79 @@ type cacheShard struct {
onRemove func(wrappedEntry []byte)
}

// EntryInfo holds informations about entry in the cache
type EntryInfo struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not contain entry value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to keep encoded entry instead of parsed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason is simple: entryinfo is lazy - you will only parse what you need instead of everything

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it but it's lazy every time it's called instead of loading values on a first call. I think parsing data should not be much slower then copying them but it should be checked.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So whats whit this? I did a small benchmark and gready extraction is a little bit slower than copy.

diff --git a/bigcache.go b/bigcache.go
index d610216..f15c1de 100644
--- a/bigcache.go
+++ b/bigcache.go
@@ -35,27 +35,30 @@ type cacheShard struct {

 // EntryInfo holds informations about entry in the cache
 type EntryInfo struct {
-       entry []byte
+       timestamp uint64
+       hash      uint64
+       key       string
+       value     []byte
 }

 // Key returns entry's underlying key
 func (e EntryInfo) Key() string {
-       return readKeyFromEntry(e.entry)
+       return e.key
 }

 // Hash returns entry's hash value
 func (e EntryInfo) Hash() uint64 {
-       return readHashFromEntry(e.entry)
+       return e.hash
 }

 // Timestamp returns entry's timestamp (time of insertion)
 func (e EntryInfo) Timestamp() uint64 {
-       return readTimestampFromEntry(e.entry)
+       return e.timestamp
 }

 // Value returns entry's underlying value
 func (e EntryInfo) Value() []byte {
-       return readEntry(e.entry)
+       return e.value
 }

 // EntryInfoIterator allows to iterate over entries in the cache
@@ -140,11 +143,11 @@ func (it *EntryInfoIterator) Value() (EntryInfo, error) {
                return EntryInfo{}, fmt.Errorf("Could not retrieve entry from cache")
        }

-       var dst = make([]byte, len(entry))
-       copy(dst, entry)
-
        return EntryInfo{
-               entry: dst,
+               timestamp: readTimestampFromEntry(entry),
+               hash:      readHashFromEntry(entry),
+               key:       readKeyFromEntry(entry),
+               value:     readEntry(entry),
        }, nil
 }
$go test ./... -bench=BenchmarkIterateOverCache -run=NONE -count 3 -cpu 1,6 -benchmem > new.txt && git stash && go test ./... -bench=BenchmarkIterateOverCache -run=NONE -count 3 -cpu 1,6 benchmem > old.txt 
$ benchcmp old.txt new.txt                                                                                                                                                                                     
benchmark                                                    old ns/op     new ns/op     delta
BenchmarkIterateOverCache/512-shards        224           224           +0.00%
BenchmarkIterateOverCache/512-shards        217           223           +2.76%
BenchmarkIterateOverCache/512-shards        218           248           +13.76%
BenchmarkIterateOverCache/512-shards-6      273           293           +7.33%
BenchmarkIterateOverCache/512-shards-6      273           284           +4.03%
BenchmarkIterateOverCache/512-shards-6      274           284           +3.65%
BenchmarkIterateOverCache/1024-shards       214           216           +0.93%
BenchmarkIterateOverCache/1024-shards       214           215           +0.47%
BenchmarkIterateOverCache/1024-shards       215           218           +1.40%
BenchmarkIterateOverCache/1024-shards-6     268           279           +4.10%
BenchmarkIterateOverCache/1024-shards-6     271           278           +2.58%
BenchmarkIterateOverCache/1024-shards-6     268           281           +4.85%
BenchmarkIterateOverCache/8192-shards       209           210           +0.48%
BenchmarkIterateOverCache/8192-shards       207           210           +1.45%
BenchmarkIterateOverCache/8192-shards       207           211           +1.93%
BenchmarkIterateOverCache/8192-shards-6     265           276           +4.15%
BenchmarkIterateOverCache/8192-shards-6     266           275           +3.38%
BenchmarkIterateOverCache/8192-shards-6     263           279           +6.08%

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done, it saves some allocs (20B/op vs 32B/op). Times are the a little smaller.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 98.873% when pulling 50a2570 on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.2%) to 96.458% when pulling 4e08def on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 98.91% when pulling 1784e63 on wendigo:master into f403062 on allegro:master.

Mateusz Gajewski and others added 2 commits October 14, 2016 21:16
@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 98.91% when pulling b037a2a on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 98.928% when pulling edc1ec7 on wendigo:master into f403062 on allegro:master.

}

// Value returns current value from the iterator
func (it *EntryInfoIterator) Value() (*EntryInfo, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return value not a pointer to make it obvious that we return copy of underlying value.


// partition is empty - move to next
if len(it.elements) == 0 {
goto incrementShard
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use for instead of if above the label

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? :) I don't understand the pros

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean change if in line 90 (above the label) and remove this condition. You are implementing for loop with if and goto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's exactly what I'm doing and code is readable for me. Goto is a standard jmp in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind - I rewrite this part of iterator

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use for loop and extract condition to well named function so the comment will be redundant.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 98.925% when pulling 60e3989 on wendigo:master into f403062 on allegro:master.

func (it *EntryInfoIterator) Value() (EntryInfo, error) {
it.Lock()
defer it.Unlock()
current := it.elements[it.currentIndex]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check if current index is valid.


for i := it.currentShard + 1; i < it.cache.config.Shards; i++ {
it.currentShard, it.currentIndex, it.elements = i, 0, copyCurrentShardMap(it.cache.shards[i])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


it.currentIndex++

if it.currentIndex >= len(it.elements) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about reverting condition. Just for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean: if !(len(it.elements) > it.currentIndex) ?

return false
}

for i := it.currentShard + 1; i < it.cache.config.Shards; i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all shards are empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will iterate to the end and return false

return elements
}

// Next returns true if there is next element in the iterator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describes only half of the job that Next() do becouse it also moves iterator to next element or invalid position if it doesn't exist. This comment better suits HasNext()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right - changed to HasNext()

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 98.95% when pulling 183ce8d on wendigo:master into f403062 on allegro:master.

Copy link
Collaborator

@janisz janisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost ready. Only small fixes need to be done. We need to decide how EntryInfo should look like. After introducing it it will be recommended to us it as a default type instead of []byte.
@druminski @medzin Can you take a look at this?

it.valid = false
it.currentIndex++

if !(len(it.elements) > it.currentIndex) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was thinking about early return true so the code below doesn't need to be indent.

if len(it.elements) > it.currentIndex {
   it.valid = true;
    return true
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 it also simplifies logical statement as negation mark is not necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :) Renamed HasNext to SetNext also

assert.Equal(t, current.Key(), "key")
assert.Equal(t, current.Hash(), uint64(0x3dc94a19365b10ec))
assert.Equal(t, current.Value(), []byte("value"))
assert.Equal(t, current.Timestamp(), uint64(0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters order mismatch func Equal(t TestingT, expected, actual interface{}, msgAndArgs ...interface{}) bool. Check with other places too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -33,6 +33,79 @@ type cacheShard struct {
onRemove func(wrappedEntry []byte)
}

// EntryInfo holds informations about entry in the cache
type EntryInfo struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So whats whit this? I did a small benchmark and gready extraction is a little bit slower than copy.

diff --git a/bigcache.go b/bigcache.go
index d610216..f15c1de 100644
--- a/bigcache.go
+++ b/bigcache.go
@@ -35,27 +35,30 @@ type cacheShard struct {

 // EntryInfo holds informations about entry in the cache
 type EntryInfo struct {
-       entry []byte
+       timestamp uint64
+       hash      uint64
+       key       string
+       value     []byte
 }

 // Key returns entry's underlying key
 func (e EntryInfo) Key() string {
-       return readKeyFromEntry(e.entry)
+       return e.key
 }

 // Hash returns entry's hash value
 func (e EntryInfo) Hash() uint64 {
-       return readHashFromEntry(e.entry)
+       return e.hash
 }

 // Timestamp returns entry's timestamp (time of insertion)
 func (e EntryInfo) Timestamp() uint64 {
-       return readTimestampFromEntry(e.entry)
+       return e.timestamp
 }

 // Value returns entry's underlying value
 func (e EntryInfo) Value() []byte {
-       return readEntry(e.entry)
+       return e.value
 }

 // EntryInfoIterator allows to iterate over entries in the cache
@@ -140,11 +143,11 @@ func (it *EntryInfoIterator) Value() (EntryInfo, error) {
                return EntryInfo{}, fmt.Errorf("Could not retrieve entry from cache")
        }

-       var dst = make([]byte, len(entry))
-       copy(dst, entry)
-
        return EntryInfo{
-               entry: dst,
+               timestamp: readTimestampFromEntry(entry),
+               hash:      readHashFromEntry(entry),
+               key:       readKeyFromEntry(entry),
+               value:     readEntry(entry),
        }, nil
 }
$go test ./... -bench=BenchmarkIterateOverCache -run=NONE -count 3 -cpu 1,6 -benchmem > new.txt && git stash && go test ./... -bench=BenchmarkIterateOverCache -run=NONE -count 3 -cpu 1,6 benchmem > old.txt 
$ benchcmp old.txt new.txt                                                                                                                                                                                     
benchmark                                                    old ns/op     new ns/op     delta
BenchmarkIterateOverCache/512-shards        224           224           +0.00%
BenchmarkIterateOverCache/512-shards        217           223           +2.76%
BenchmarkIterateOverCache/512-shards        218           248           +13.76%
BenchmarkIterateOverCache/512-shards-6      273           293           +7.33%
BenchmarkIterateOverCache/512-shards-6      273           284           +4.03%
BenchmarkIterateOverCache/512-shards-6      274           284           +3.65%
BenchmarkIterateOverCache/1024-shards       214           216           +0.93%
BenchmarkIterateOverCache/1024-shards       214           215           +0.47%
BenchmarkIterateOverCache/1024-shards       215           218           +1.40%
BenchmarkIterateOverCache/1024-shards-6     268           279           +4.10%
BenchmarkIterateOverCache/1024-shards-6     271           278           +2.58%
BenchmarkIterateOverCache/1024-shards-6     268           281           +4.85%
BenchmarkIterateOverCache/8192-shards       209           210           +0.48%
BenchmarkIterateOverCache/8192-shards       207           210           +1.45%
BenchmarkIterateOverCache/8192-shards       207           211           +1.93%
BenchmarkIterateOverCache/8192-shards-6     265           276           +4.15%
BenchmarkIterateOverCache/8192-shards-6     266           275           +3.38%
BenchmarkIterateOverCache/8192-shards-6     263           279           +6.08%

it.valid = false
it.currentIndex++

if !(len(it.elements) > it.currentIndex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 it also simplifies logical statement as negation mark is not necessary

}

// HasNext returns true if there is next element in the iterator, false otherwise
func (it *EntryInfoIterator) HasNext() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name is misleading right now as its implementation does not only answer the question whether there is another element in iterator but also modifies currentIndex value. Because of that when you call time after time HasNext() then you can get different answers. I see two options here, we can change this method name to SetNext() bool or we can implement HasNext() to be idempotent and then Value() convert to Next().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - renamed method to SetNext().

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 98.688% when pulling f007704 on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 98.705% when pulling a20e201 on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 98.705% when pulling d1439df on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 98.956% when pulling d3bfe18 on wendigo:master into f403062 on allegro:master.

@wendigo
Copy link
Contributor Author

wendigo commented Nov 2, 2016

Waiting for final thoughts @janisz

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 98.956% when pulling 216b1fe on wendigo:master into f403062 on allegro:master.


// ErrCannotRetrieveEntry is reported when entry cannot be retrieved from underlying
var ErrCannotRetrieveEntry = errors.New("Could not retrieve entry from cache")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two should not be public. Use empty struct withe Error method like we do in entry_not_found_error.go or make this private. Take a look at http://dave.cheney.net/2016/04/07/constant-errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "What with this"? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this line but I assume you are referring to this discussion #22 (comment)

@@ -33,6 +33,79 @@ type cacheShard struct {
onRemove func(wrappedEntry []byte)
}

// EntryInfo holds informations about entry in the cache
type EntryInfo struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What with this?

@@ -17,6 +17,8 @@ func BenchmarkWriteToCacheWith1Shard(b *testing.B) {
func BenchmarkWriteToLimitedCacheWithSmallInitSizeAnd1Shard(b *testing.B) {
m := blob('a', 1024)
cache, _ := NewBigCache(Config{1, 100 * time.Second, 100, 256, false, nil, 1, nil})

b.ReportAllocs()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If applied to all tests then better use -test.benchmem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will include setup also (creating entries for reads etc)

t.Parallel()

// given
keysCount := 10000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need so many entries in unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced by 10x

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.5%) to 98.172% when pulling eeaf0df on wendigo:master into f403062 on allegro:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 98.956% when pulling aa2187b on wendigo:master into f403062 on allegro:master.

@wendigo wendigo merged commit 8c40d09 into allegro:master Nov 2, 2016
flisky pushed a commit to flisky/bigcache that referenced this pull request May 7, 2020
Add Iterator to iterate over keys
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants