
Added the support for FILTERKEYS #84

Open

MayukhSobo wants to merge 1 commit into master
Conversation

@MayukhSobo commented Nov 2, 2022

Fixes #35

It may seem like I have added too much, but there is a reason why something so trivial took this much time. Apologies for the extent of the changes; please allow me to explain.

  1. FILTERKEYS is supposed to work on multiple keys. Since we do not store keys in any ordered data structure and the regex can be completely arbitrary, the best time complexity we can achieve is O(n), by iterating over all the keys.
  2. This leaves two choices: either implement FILTERKEYS trivially, or throw concurrency at the scan to optimise performance.
  3. I chose the second option. This also fits @arpitbbhayani's approach of making individual commands concurrent, but not multiple commands at once, to keep things simple for now.
  4. Introducing a goroutine worker pool inevitably brings synchronisation issues. There are two ways to fight them: guard a plain Go map with a mutex, or use sync.Map from Go 1.9.
  5. Go has a surprising property here: maps are not thread-safe, even when goroutines work on disjoint keys. Reading keys A and B from one goroutine while writing keys C and D from another is still a data race. A map is only safe for concurrent use if it is fully initialised before the goroutines start and is never written to until all reads finish.
  6. That is not the case here, because we also delete expired keys while performing reads, so even the reads are unsafe in this scenario.
  7. Guarding the map with a mutex works but is cumbersome. People usually reach for sync.Mutex, though sync.RWMutex is also an option. In my experience, a mutex-guarded map does not scale well and brings a lot of implementation blockers.
  8. sync.Map was introduced in Go 1.9 precisely for this kind of problem. It scales exceptionally well when the machine has many cores.
  9. sync.Map has one drawback: len cannot be used on it. So I had to create my own custom storage type (DiceKVStore) with an atomic count in it; see the sketch after this list.
  10. Because I had to introduce sync.Map and the goroutine pool (I can explain why not plain goroutines if needed), I kept accidentally creating cyclic dependencies: package A importing package B while package B imports package A. This showed me how tightly coupled the current design is, so I had to split the code into several new packages.
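
To make points 1, 8, and 9 concrete, here is a minimal, self-contained sketch of the idea: a sync.Map wrapped with an atomic counter (since len does not work on sync.Map), plus the O(n) regex scan. The method names and the FilterKeys helper below are illustrative only, not the PR's actual code:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
	"sync/atomic"
)

// DiceKVStore wraps sync.Map with an atomic counter,
// since len() cannot be used on a sync.Map directly.
type DiceKVStore struct {
	store sync.Map
	count uint64
}

// Put inserts or updates a key, bumping the counter only on insert.
func (s *DiceKVStore) Put(key string, value interface{}) {
	if _, loaded := s.store.LoadOrStore(key, value); loaded {
		s.store.Store(key, value) // key existed: overwrite, count unchanged
	} else {
		atomic.AddUint64(&s.count, 1)
	}
}

// Del removes a key, decrementing the counter only if it was present.
func (s *DiceKVStore) Del(key string) {
	if _, loaded := s.store.LoadAndDelete(key); loaded {
		atomic.AddUint64(&s.count, ^uint64(0)) // atomic decrement
	}
}

// GetCount reports the number of live keys without locking the map.
func (s *DiceKVStore) GetCount() uint64 {
	return atomic.LoadUint64(&s.count)
}

// FilterKeys is the O(n) scan behind FILTERKEYS: every key is
// visited once and matched against the compiled pattern.
func (s *DiceKVStore) FilterKeys(re *regexp.Regexp) []string {
	var keys []string
	s.store.Range(func(k, _ interface{}) bool {
		if key := k.(string); re.MatchString(key) {
			keys = append(keys, key)
		}
		return true // keep iterating over the whole keyspace
	})
	return keys
}

func main() {
	kv := &DiceKVStore{}
	kv.Put("user:1", "alice")
	kv.Put("user:2", "bob")
	kv.Put("session:9", "xyz")
	fmt.Println(kv.FilterKeys(regexp.MustCompile(`^user:`))) // user:1, user:2 (order not guaranteed)
	fmt.Println(kv.GetCount())                               // 3
}
```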

I know I have introduced a lot of changes, and I also know that huge changes in a single PR are bad practice, but I could not stay away from optimising this. Rewiring the whole thing took a lot of time. I would request the reviewers to go through the code and give pointers. If you decide to merge this, I am ready to work with the other contributors to adopt whatever changed in the meantime. If not, it could live in a separate experimental branch, or, as the worst option, I can continue it on my fork.

Thanks
Mayukh

Comment on lines +13 to +14
GetCount() uint64
GetStorage() *sync.Map

Let's rename these to Count and Storage, which is more Go-like. And, as discussed, Storage should return an iterator: a function that, upon invocation, returns the next KV pair.
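
For illustration, reusing the DiceKVStore sketch from the PR description above, such an iterator could look like this (the Entry type and the snapshot approach are assumptions, not the agreed design):

```go
// Entry is one key-value pair yielded by the iterator.
type Entry struct {
	Key   string
	Value interface{}
}

// Storage returns a function that yields the next KV pair on each
// call; the second return value is false once all pairs are consumed.
func (s *DiceKVStore) Storage() func() (Entry, bool) {
	// Snapshot first: sync.Map.Range cannot be paused and resumed.
	var entries []Entry
	s.store.Range(func(k, v interface{}) bool {
		entries = append(entries, Entry{Key: k.(string), Value: v})
		return true
	})
	i := 0
	return func() (Entry, bool) {
		if i >= len(entries) {
			return Entry{}, false
		}
		e := entries[i]
		i++
		return e, true
	}
}
```

A channel-backed iterator would avoid the up-front snapshot, at the cost of a goroutine per iteration.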

Comment on lines +1 to +4
package istorageengines

// This is for future usage to use Redis Storage
// as the main storage engine instead of KV

let's remove this file.

Comment on lines +1 to +2
package istorageengines


name of the package should be storage_engines

"github.com/dicedb/dice/object"
)

type IKVStorage interface {

rename the interface and keep it StorageEngine

@@ -6,28 +6,31 @@ import (
"os"
"strings"

dbEngine "github.com/dicedb/dice/IStorageEngines"

dbEngine -> storage_engine

"github.com/dicedb/dice/config"
"github.com/dicedb/dice/object"

instead of creating another package for object, see if we can fit it into storage_engine or core

@@ -26,13 +28,13 @@ func (c *Client) TxnBegin() {
c.isTxn = true
}

func (c *Client) TxnExec() []byte {
func (c *Client) TxnExec(dh *handlers.DiceKVstoreHandler) []byte {

Instead of calling it handler, let's name our storage engine and use that here.

I propose the name - ozone

Comment on lines +1 to +30
package expiry

import (
dbEngine "github.com/dicedb/dice/IStorageEngines"
"github.com/dicedb/dice/object"
)

func expireSample(dh dbEngine.IKVStorage) float32 {
	const sampleSize = 20
	limit := sampleSize
	expiredCount := 0
	dh.GetStorage().Range(func(k, v interface{}) bool {
		key := k.(string)
		value := v.(*object.Obj)
		limit--
		if object.GetDiceExpiryStore().HasExpired(value) {
			dh.Del(key)
			expiredCount++
		}
		// stop ranging once the sample is exhausted
		return limit > 0
	})
	// fraction of the sampled keys that had expired
	sampled := sampleSize - limit
	if sampled == 0 {
		return 0
	}
	return float32(expiredCount) / float32(sampled)
}

// Deletes all the expired keys - the active way
// Sampling approach: https://redis.io/commands/expire/
func DeleteExpiredKeys(dh dbEngine.IKVStorage) {

let expiry be part of the storage engine implementation and expose it as an interface for an explicit trigger.
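
One possible shape for that explicit trigger, sketched with illustrative names (no such interface exists in the PR yet):

```go
// Expirer is implemented by a storage engine that owns its own
// active-expiry sampling loop; callers only trigger a pass explicitly.
type Expirer interface {
	// DeleteExpiredKeys runs one sampling pass and returns the
	// fraction of sampled keys that had expired.
	DeleteExpiredKeys() float32
}
```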

Comment on lines +14 to +16
type DiceKVstoreHandler struct {
object.DiceKVstore
}

rename it to our storage engine ozone

Comment on lines +15 to +23
// Max number of workers 256 * 1024
DefaultPoolSize = 1 << 18

// Whether to block when no pool worker is available;
// false means submissions will not wait
Nonblocking = false

// ExpiryDuration is the interval time to clean up those expired workers.
ExpiryDuration = 10 * time.Second

DefaultPoolSize -> MaxConcurrency and let the default value be part of the config.
Expiry duration can also be part of the config. Nonblocking should always be true. Don't see a reason for it to be configurable.
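
A hedged sketch of what moving these into the config might look like (the field names and the config package layout are assumptions, not the actual dice config):

```go
package config

import "time"

var (
	// MaxConcurrency caps the worker pool size
	// (the former DefaultPoolSize, 1 << 18 by default).
	MaxConcurrency = 1 << 18

	// WorkerExpiryDuration is how long an idle worker lives
	// before the pool cleans it up.
	WorkerExpiryDuration = 10 * time.Second
)
```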
