SSD cache-dictionary #8624

nikvas0 · 2020-01-12T14:37:55Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Cache dictionary layout that stores data on SSD.

TODO:

… parameters

nikitamikhaylov · 2020-05-28T16:20:40Z

I think we need performance tests that benchmark it against DirectDictionary and CacheDictionary, or at least in-memory CacheDictionary against SSDCacheDictionary.
I don't mean performance tests in CI; I want something like the one given in #10622.

nikitamikhaylov · 2020-05-28T16:51:40Z

src/Dictionaries/BucketCache.h

+
+namespace
+{
+    inline size_t nearestPowTwo(size_t x)


It is better to use roundUpToPowerOfTwoOrZero from our codebase.
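
For readers unfamiliar with the suggested helper, here is a minimal sketch of the usual bit-smearing way to round up to a power of two. The function name `roundUpToPowerOfTwoOrZeroSketch` is hypothetical and this is not the ClickHouse implementation; it only illustrates the behaviour the name implies (0 maps to 0, and values above 2^63 wrap to 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for illustration; not the helper from the ClickHouse codebase.
// Rounds n up to the nearest power of two. Returns 0 for n == 0 and on overflow.
inline uint64_t roundUpToPowerOfTwoOrZeroSketch(uint64_t n)
{
    --n;            // exact powers of two map to themselves; 0 wraps to all ones
    n |= n >> 1;    // smear the highest set bit into every lower position
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;
    ++n;            // all ones plus one wraps to 0, which gives the "OrZero" behaviour
    return n;
}
```

Using a shared helper instead of a local `nearestPowTwo` keeps edge cases such as zero and overflow handled in one place.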

nikitamikhaylov · 2020-05-28T16:57:01Z

src/Dictionaries/BucketCache.h

+
+    size_t getPosition(const size_t bucket) const
+    {
+        const size_t idx = (bucket >> 1);


Why do we need it here? bucket is already a number from [0, buckets - 1]

nikvas0 · 2020-06-09

To determine a position in the bucket we need to read 4 bits from the array of positions. We can address only whole bytes, so two 4-bit positions share one byte and the bucket number has to be divided by 2 to get the byte index.
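
The packing described in this reply can be sketched as follows. This is a hedged illustration only: the class name `NibbleArraySketch`, its methods, and the choice of the low nibble for even indices are assumptions, not the code from BucketCache.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration; names and nibble order are assumptions.
// Stores one 4-bit value per index, two values per byte.
class NibbleArraySketch
{
public:
    explicit NibbleArraySketch(size_t count) : bytes((count + 1) / 2, 0) {}

    uint8_t get(size_t index) const
    {
        const size_t byte_idx = index >> 1;          // two entries per byte
        const unsigned shift = (index & 1) ? 4 : 0;  // odd index -> high nibble
        return (bytes[byte_idx] >> shift) & 0x0F;
    }

    void set(size_t index, uint8_t value)
    {
        const size_t byte_idx = index >> 1;
        const unsigned shift = (index & 1) ? 4 : 0;
        // Clear the target nibble, then write the new 4-bit value into it.
        bytes[byte_idx] = static_cast<uint8_t>(
            (bytes[byte_idx] & ~(0x0F << shift)) | ((value & 0x0F) << shift));
    }

private:
    std::vector<uint8_t> bytes;
};
```

The `index >> 1` here plays the same role as `bucket >> 1` in the reviewed line: it selects the byte, and the lowest bit of the index selects which half of that byte to read.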

nikitamikhaylov · 2020-05-29T22:19:10Z

src/Dictionaries/SSDCacheDictionary.cpp

+    };
+
+    if (!write_buffer)
+    {


Braces are redundant according to our code style.

nikitamikhaylov · 2020-05-29T22:28:14Z

src/Dictionaries/SSDCacheDictionary.cpp

+
+    const size_t start_block = current_file_block_id % max_size;
+    const size_t finish_block = start_block + write_buffer_size;
+    for (const auto& key : keys)


Style: for (const auto & key : keys)

nikitamikhaylov · 2020-05-29T22:34:38Z

src/Dictionaries/SSDCacheDictionary.cpp

+{
+    // add partitions to queue
+    while (partitions.size() > max_partitions_count)
+    {


nikitamikhaylov · 2020-05-29T22:50:30Z

src/Dictionaries/SSDComplexKeyCacheDictionary.cpp

+        AIOContext aio_context(1);
+
+        while (io_submit(aio_context.ctx, 1, &request_ptr) != 1)
+        {


nikitamikhaylov · 2020-05-29T22:50:38Z

src/Dictionaries/SSDComplexKeyCacheDictionary.cpp

+        }
+
+        while (io_getevents(aio_context.ctx, 1, 1, &event, nullptr) != 1)
+        {


nikitamikhaylov · 2020-05-29T23:05:02Z

src/Dictionaries/SSDComplexKeyCacheDictionary.cpp

+
+    TemporalComplexKeysPool tmp_keys_pool;
+    storage.update(
+            source_ptr,


Is it true that we can use multiple sources, not only one?

nikitamikhaylov · 2020-05-30T11:45:16Z

src/Dictionaries/SSDCacheDictionary.cpp

+void SSDCachePartition::remove()
+{
+    std::unique_lock lock(rw_lock);
+    std::filesystem::remove(std::filesystem::path(path + BIN_FILE_EXT));


Error handling?
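
One possible way to address this, sketched under assumptions: use the non-throwing `std::filesystem::remove` overload that takes a `std::error_code` and report failures explicitly. The helper name `removeFileSketch` and the error-message format are hypothetical; the actual fix in the PR may differ (for example, by throwing a ClickHouse exception).

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper for illustration only.
// Returns true if the file was removed or did not exist.
// On failure returns false and fills error_message.
inline bool removeFileSketch(const std::filesystem::path & file, std::string & error_message)
{
    std::error_code ec;
    std::filesystem::remove(file, ec);  // non-throwing overload; ec is set on failure
    if (ec)
    {
        error_message = "Cannot remove " + file.string() + ": " + ec.message();
        return false;
    }
    return true;
}
```

A missing file is not treated as an error here, since `std::filesystem::remove` reports it by returning `false` with a cleared error code. Whether that is the right policy for partition files is a design decision for the PR author.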

nikitamikhaylov · 2020-05-30T11:47:07Z

src/Dictionaries/SSDCacheDictionary.cpp

+    std::vector<Key> required_ids(not_found_ids.size());
+    std::transform(std::begin(not_found_ids), std::end(not_found_ids), std::begin(required_ids), [](const auto & pair) { return pair.first; });
+
+    storage.update(


Maybe it would be better to wrap this in a separate function that adds source_ptr and lifetime, and then call that function?

filimonov · 2020-06-22T09:48:55Z

BTW, it would be really interesting to see some perf numbers and/or a comparison with Aerospike (#5629).

alexey-milovidov · 2020-06-22T12:40:17Z

@filimonov They are published in https://presentations.clickhouse.tech/hse_2020/3rd/SSD_Dictionary_full.pdf

filimonov · 2020-06-22T21:17:03Z

Just for reference: Aerospike gives up to ~1 million req/sec.

alexey-milovidov · 2020-06-22T21:20:18Z

@filimonov This number does not mean anything on its own.
It can be too slow or too fast depending on the hardware.
Nikita tested on a laptop SSD.

nikitamikhaylov · 2020-06-27T11:00:49Z

Merged in #11947

nikvas0 and others added 30 commits October 25, 2019 21:06

fix

c1af83d

Merge remote-tracking branch 'upstream/master' into nikvas0/ssd_dict

cb0e037

Merge remote-tracking branch 'upstream/master' into nikvas0/ssd_dict

d89a6b8

changes

443a5ca

update

3bbb73e

update, changed block -> attrs

b55d8dd

compl

5dccab3

create + refactoring

55125cd

fixed update

57d9e38

fix

f3b00e6

some fixes

2c52162

change buffer

b62ac3a

fix

dbb565f

aio read

2e10fe5

fix read

297b8aa

has and ttl

05622f2

test

ddaf23d

fix

81c9d66

read with one thread

371c3f8

test for dict with MT

4a65b1b

fix

d968bf6

remove unused attributes

ce29a3c

some refactoring

fc94ffe

locks

07ffe96

opt table

75f7508

some refactoring

2e10d93

rm unused file

15afa52

fixes

50a68c4

preallocate

0e2080c

read multipartition

ee1e8cb

nikvas0 added 11 commits May 23, 2020 19:10

fix

1b9e2df

fix

3fb0eab

fix

7358410

fix other os

c70401b

fix

e76cdbd

fix other os

63ef973

Merge remote-tracking branch 'upstream/master' into nikvas0/ssd_dict

5801a33

ya.make

3150667

fix

797fa40

fix aio for other os

d6f4c66

fixed direct tests

207de9c

achulkov2 added a commit to achulkov2/ClickHouse that referenced this pull request May 24, 2020

add parsing multiple parameters from ClickHouse#8624 & add the actual…

7144495

… parameters

Merge remote-tracking branch 'upstream/master' into nikvas0/ssd_dict

dd26661

nikitamikhaylov approved these changes May 30, 2020

alexey-milovidov force-pushed the master branch from 6c77191 to 09b9a30 Compare June 9, 2020 02:04

nikvas0 added 3 commits June 21, 2020 17:22

fix review

8fef9f4

fix

68d3ec5

logger

f8c818c

nikitamikhaylov and others added 3 commits June 24, 2020 16:36

fix + bump tests

8bbf1ce

Merge branch 'nikvas0/ssd_dict' of https://github.com/nikvas0/ClickHouse

c100547

into nikvas0/ssd_dict

fix

2b64394

nikitamikhaylov added a commit that referenced this pull request Jun 27, 2020

Merge pull request #11947 from nikitamikhaylov/merging-ssh-cache

3654988

Merging #8624 (ssd-cache)

nikitamikhaylov closed this Jun 27, 2020

filimonov mentioned this pull request Sep 21, 2020

StorageEmbeddedRocksDB #15073

Merged
