Skip to content

Latest commit

 

History

History
150 lines (96 loc) · 8.1 KB

set-design-en.md

File metadata and controls

150 lines (96 loc) · 8.1 KB

Proposal for Titan-Set

Introduction

The Set type in Redis is a collection of unordered data. Like an instance of List type, operations like adding elements, deleting elements or checking if some element is existed and so on can be easily done with a Set instance.

Furthermore, here we list the main difference between the List and Set.

  • All the elements in one set are different from each other. If an existed element is added for multiple times, then this element will be continuously updated with its duplications. Actually, it is quite easy to manage just by updating the last modification time.
  • Another important property of Set is that some aggregation operations within several sets as SUNION, SINTER and SDIFF can be accomplished on the server side which saves the heavy overhead of Network I/O.

Applications

Functions of calculating intersections, unions and difference sets are designated for those situations in which digging for common friends, common interest, classifying labels and so on are the basic task.

Design

Struct Set

type Set struct { meta *SetMeta key [ ]byte exist bool txn *Transaction }

Struct SetMeta

type SetMeta struct { Object Len int64 }

Key Points

  • The present implementation is still designed to use Len to keep the count of the elements within a set and we do not use the slot in hash. [Read More] Implementation of hash-slot
  • ==The characteristics of set means that we don't need to store the value in the kv-storage. So for the case of storing data in Tikv which the Set interface will be called, we only need to save the concatenated member in the key of tikv.==
  • The format of MetaKey is: {Namespace}:{DBID}:M:{key}
  • The format of DataKey is: {Namespace}:{DBID}:D:{ObjectID}
  • The format of member which will be stored in the key of tikv: {Namespace}:{DBID}:D:{ObjectID}:{member}
  • The format of value in tikv (for an empty value is not permitted in tikv and we use a placeholder instead): []byte{0}

Command processing

General command processing

SAdd key member [member ...]

  • Add the specified members to the set stored at key. Specified members that are already a member of this set are ignored. If key does not exist, a new set is created before adding the specified members.

Implementation steps

  • Check if the member passed in is already in the set and remove the duplication if there is any.
  • Call BatchGetValues to get values corresponding to the datakey in a batch manner and judge the existance of the member by the value type.
  • Eliminate the existed members and count the number of newly-added members.
  • Update Meta information and return the number of new members

SMembers key

  • Returns all the members of the set value stored at key.

Implementation steps

  • Use an iterator to find the location of the spliced prefix in the storage
  • Return all the elements with the same prefix

SCard key

  • Returns the set cardinality (number of elements) of the set stored at key.

Implementation steps

  • Return the Len property of the meta data

SIsmember key member

  • Returns if member is a member of the set stored at key.

Implementation steps

  • Try looking for the spliced datakey
  • If exists, return 1 or return 0

SPop key

  • Removes and returns one or more random elements from the set value store at key.

Implementation steps

  • Delete the first element of the set given by the key
  • Update meta information and return the deleted member

SRem key member [member ...]

  • Remove the specified members from the set stored at key.

Implementation steps

  • Try looking for the spliced datakey
  • If exists, call delete to delete it/them.
  • update meta information

SMove key key1 member

  • Move member from the set at source to the set at destination.

Implementation steps

  • First, check if the member is an element of the source set given by the key . If not, return.
  • Next, check if the member is an element of the target set given by key1. If not, add the member to the key1 target set and update the meta information.
  • Lastly, delete the member of the key source set and update the meta information.

Set-based command processing

For the set-class command, the most intuitive scheme to calculating intersections, unions and difference sets is to read all members into the memory first. Although the performance can be partially improved by using map to do the deduplicating and sorting process, it still meets with the problem of insufficient memory in case of large sets with so many elements. Another idea is based on the merging thought for each member in the set is saved in order in the memory. The detailed implementation is as follows.

SUion -- Returns the members of the set resulting from the union of all the given sets.

Implementation steps
  1. Set a pointer (iter) for each set with a specific key. The pointer is initialized to the first member of the key set. Since members are stored in order, the first member must be the smallest one in the current set.

  2. Compare the value of each member. According to the comparison result, following cases are managed.

    • If the values of the members are the same, it proves they are the same element, then record this member as a part of the union.
    • If the values are different, move the pointer of the smallest element one step backwards and record this member as a part of union.
  3. Repeat step 2 until all the members of each set finish with seeking. A special case is that there is only one set left in the end. Then just merge the remaining elements of the set into the union and finally, return the union.

SInter -- Returns the members of the set resulting from the intersection of all the given sets.

Implementation steps
  1. Read meta information while constructing a set object. If the set is not existed, regard it as an empty set. Once an empty set is found, stop calculating as the empty set is obviously the intersection.
  2. If no empty set exists, a pointer (iter) is set for each set with a specific key. The pointer is initialized to the first member of the key set. Since members are stored in order, the first member must be the smallest one in the current set.
  3. Compare the value of each member. According to the comparison result, following cases are managed.
    • If the values of the members are the same, it proves they are the same element, then record this member as a part of the intersection. Move all pointers(one for each set) one step backwards.
    • If the values are different, it proves smaller element can't be in the members of other sets. So move all the pointers one step backwards except for the set with the largest member.
  4. Repeat step 2 until some pointer exceeds the sequence. It proves that the set with the least members has already finished with seeking. It's time to exit the process and return the intersection result.

SDiff key [key]... -- Returns the members of the set resulting from the difference between the first set and all the successive sets.

Implementation steps
  1. Set a pointer (iter) for each set with a specific key. The pointer is initialized to the first member of the key set. Since members are stored in order, the first member must be the smallest one in the current set.
  2. First key denotes the benchmark set. So compare other sets(specified by [key ...]) with the first key set. The following cases are handled.
  • If the members are with the same value, then the member can't be a part of the difference set. Move both the pointers of benchmark key and other keys one step backwards.
  • If the two values of members are different, then move the pointer of the set with the smallest member one step backwards and record the current pointed member of the benchmark key set as a part of the difference set.
  1. Repeat step 2 until all the members of each set finish with seeking and return the difference set.