Skip to content
This repository has been archived by the owner on Jul 14, 2019. It is now read-only.

Log Structured Hash Table (LSHT) #9

Merged
merged 1 commit into from May 17, 2017
Merged

Log Structured Hash Table (LSHT) #9

merged 1 commit into from May 17, 2017

Conversation

ditmarus
Copy link
Contributor

Adding in-memory option and bug fixes based on initial @lioramr work.

@ditmarus ditmarus self-assigned this May 15, 2017
@@ -101,7 +101,7 @@ $(DEP): $(DIRS)

../ribified/%: $(RIBIFY_DIR)
@echo " (RIBIFY) $(@:../ribified/%=%) [ $@ $(RIBIFYFLAGS) ]"
@objcopy $(shell find $(RIBIFY_LIB_PATH) /usr/lib -name $(@:../ribified/%=%) 2>/dev/null) $@ $(RIBIFYFLAGS)
@objcopy $(shell find $(RIBIFY_LIB_PATH) /usr/lib64 /usr/lib -name $(@:../ribified/%=%) 2>/dev/null) $@ $(RIBIFYFLAGS)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does /usr/lib point to /usr/lib64 or is /usr/lib 32bit the home for libraries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On some linux systems some 64-bit libraries can be installed in lib64. I was facing issues building ribs2 on CentOS so I added it to the ribidied libs path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. That makes sense.

@eliotpearson
Copy link

Looks good to me.

@niryeffet
Copy link

http_server.c: In function ‘http_server_generate_dir_list’:
http_server.c:820:9: error: ‘readdir_r’ is deprecated [-Werror=deprecated-declarations]

Seems like adap.tv's ribs2 is very old. I believe I already fixed these. Dmitry, please merge with Lior's or Mine.

if (lshashtable_is_expired(lsht, rec)) {
lshashtable_del_key_index(lsht, rec->data, rec->key_size, NULL);
lsht_record_store_del(rs, rec);
} else
Copy link
Contributor

@lioramr lioramr May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this logic correct? if rec->val_size == UINT_MAX and in last record store then rs_rseek(rs, _rec_size(r)) should be called.

Copy link
Contributor Author

@ditmarus ditmarus May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks ok to me. If we go in to "else break" condition it is valid unexpired record that we need to deal with so we break the while loop that deals only with expiration and decide what to do with this record in later logic (pop at line 239). So we can not rseek it at this point since we did not finish dealing with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the following scenario you may end up with stale key that can be expired.

  1. key 'k1' was inserted
  2. additional keys were inserted and key 'k1' was relocated to the second file
  3. key 'k1' was deleted therefore there is a special record in the first file to indicated that
  4. more keys were inserted causing 'k1' to be deleted and special 'k1' to be relocated to the second file

There is no logic to remove 'k1' special in that scenario although it is safe to remove deleted keys (special keys) if they are in the last file. 'k1' special was inserted after 'k1' so when reaching 'k1' special in the last file the original 'k1' is already gone from the system. The purpose of keeping 'k1' special until is reaches the last file is to correctly handle restarts where the in memory index and the bitmap are re-generated from the log (records).

Copy link
Contributor Author

@ditmarus ditmarus May 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this case is handled by the outer "if" (line 222). If it is special record and it is last store it will NOT go to this "if" and will go to line 230 to rseek this record. There is a comment there that is bit confusing since it reflects the actually omitted "else" case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it looks good then.

size += rec_size;
} else {/* evict or delete LRU */
struct lshashtable_rec *rec = (struct lshashtable_rec *)r->data;
if (lshashtable_is_expired(lsht, rec)) {
Copy link
Contributor Author

@ditmarus ditmarus May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lioramr: I was wondering if we need to check expiry again here. I understand that it can happen between line 224 and 246 but we can always deal with this very unlikely case next time we do make room cleanup...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find a reason for that extra condition, you should probably remove it.

abort();
}
struct lsht_record_store_rec *r = lsht_record_store_pop_oldest_rec(rs);
if (lsht_record_store_num_free_blocks(rs) > num_free_blocks_threshold) { /* relocate */
Copy link
Contributor Author

@ditmarus ditmarus May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lioramr: I am not sure what is the benefit of relocating records here. Why not just always evicting them? I understand that evicting is more expensive but does it really worth relocating them for this rare and special case (avail < size && free > threshold)? Seems like bit over-optimization... but my main concern is that it can break LRU approach since we may be moving old records behind the new once...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If records are always moved to the next file the number of files will continue to grow. If there is enough space in the current file there is no need to move records to the next file. Consider the following scenario:

  1. start with fresh system, no keys
  2. insert 2 keys: k1, k2
  3. update k1 many times

Without the logic above, k2 will "travel" to the next file even though the system has total of 2 keys, causing the system to allocate huge file for no good reason.

Copy link
Contributor Author

@ditmarus ditmarus May 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good example. It means that k2 record is not frequently updated relatively to others and its expiration time is too long for this store full cycle so we better keep it out of the way in the next level store otherwise we will stumble over it all the time. On practice it is likely that we would have more records like that from expiration time and access pattern perspective so the store will not be wasted. I think it would be more consistent approach and will keep LRU order, simplify the code and need less computing cycles work (record moves and updates to the index that more frequent in stores with smaller index). It is probably ok to keep it like that for disk base store but for memory based case I think I will add code to avoid relocation. For this PR we can keep it as it is for both cases.

@lioramr lioramr merged commit 210db70 into master May 17, 2017
@carenas carenas deleted the lsht branch September 17, 2017 22:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants