
use RWMutex replace Mutex in segment #16

Conversation

@hengfeiyang (Contributor) commented May 27, 2022

The overhead will be slight, but it could cause problems when loading very many segments; we need a mechanism to release unused cache entries.

@mschoch (Member) commented May 27, 2022

I have 2 concerns with this approach:

  1. We reuse the buffer, but we keep re-decompressing the same bytes over and over again. For example, suppose I match 1000 documents and they are all in the same chunk of the same segment. Simply to load the _id field of each, we decompress that same compressed stored-doc chunk 1000 times. Do you agree this will happen and that it is bad?
  2. We now have additional lock contention, and all stored-field access will bottleneck on this single shared buffer.

I don't see how this can work either.
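Concern 1 can be illustrated with a small sketch (the `decompressChunk` closure is a hypothetical stand-in for zap's snappy decompression, not code from this PR):

```go
package main

import "fmt"

func main() {
	// Count how often we decompress when nothing is cached.
	decompressions := 0
	decompressChunk := func(chunk int) []byte {
		decompressions++
		return []byte("stored doc chunk") // stand-in for the real snappy.Decode
	}

	// 1000 matching documents, all living in the same stored-field chunk.
	const matches = 1000
	const sameChunk = 7
	for i := 0; i < matches; i++ {
		_ = decompressChunk(sameChunk) // load the _id field of each match
	}
	fmt.Println(decompressions) // prints 1000: the same bytes were decompressed 1000 times
}
```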

@mschoch (Member) commented May 27, 2022

To me, compressing chunks of stored documents is great for compression, but terrible for our existing search API; without changes to that API, it cannot perform well.

@hengfeiyang (Contributor Author)

We can reduce the scope of the lock: when loading a segment, we can initialize one mutex per chunk, without eagerly caching decompressed data. We still cache the decompressed data on first use.

That way, lock contention is split down to 1/N (N = chunk count).

Pseudocode:

    decompressedStoredFieldChunks map[uint64]cacheData

    type cacheData struct {
        data []byte
        m    sync.RWMutex
    }

@mschoch (Member) commented May 27, 2022

I guess I misunderstood what you were doing here.

Don't you still think this is going to cache too much data?

@hengfeiyang (Contributor Author) commented May 27, 2022

The plan is to add a cache manager to keep memory usage at a proper size.

@mschoch (Member) commented May 27, 2022

> The plan is to add a cache manager to keep memory usage at a proper size.

You're saying in order for this to work well, we need some other component that doesn't exist yet?

@hengfeiyang (Contributor Author) commented May 27, 2022

Yes, you are right. Let me think about it more and try to find another solution.

Reviewed code:

    var storedFieldDecompressed []byte
    var ok bool
    if storedFieldDecompressed, ok = s.decompressedStoredFieldChunks[chunkI]; !ok {
    storedFieldDecompressed := s.decompressedStoredFieldChunks[chunkI]
@mschoch (Member):

I don't think we ever actually succeed in updating items inside the map. Although you mutate storedFieldDecompressed, I believe this is a copy, not the item in the map. See this example I think is analogous: https://go.dev/play/p/s0UrS7nD_fg
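The copy semantics behind this point can be demonstrated in isolation (a minimal sketch with an illustrative `cacheData` type, independent of the PR's actual types):

```go
package main

import "fmt"

type cacheData struct {
	data []byte
}

func main() {
	m := map[uint64]cacheData{42: {}}

	// Indexing a map of struct values yields a copy of the entry;
	// mutating the copy does not change the entry in the map.
	entry := m[42]
	entry.data = []byte("decompressed")

	fmt.Println(len(entry.data), len(m[42].data)) // prints: 12 0
	// Possible fixes: store *cacheData pointers in the map, or write
	// the modified value back with m[42] = entry.
}
```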

@hengfeiyang (Contributor Author):

Yeah, you are right. I changed it. 34f03d6

@hengfeiyang hengfeiyang closed this Jul 4, 2022