
Tracking issue for Cache-service #6786

Closed · 2 of 11 tasks · drmingdrmer opened this issue Jul 24, 2022 · 17 comments

drmingdrmer (Member) commented Jul 24, 2022

The cache service will be a cluster of data-block cache servers sitting between databend-query and the underlying storage service, providing fast data-block retrieval.

Server-side tasks

The following are still some tracking-issue-level tasks :(

  • Impl cache replacement policy: modified LIRS
  • Impl Chunk store. See arch
  • Impl Object store. See arch
  • Impl Access store. See arch
  • Impl Manifest store. See arch
  • Impl service protocol http2 PUT+GET (see the sketch after this list)
  • Impl configuration and server initialization.
  • Impl server restart procedure.
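
Since the protocol item above is only a one-liner, here is a hypothetical sketch of what a PUT+GET exchange against a cache server could look like; the /v1/blocks/{key} route, the endpoint, and the use of reqwest are illustrative assumptions, not a spec:

```rust
// A hypothetical PUT+GET exchange with a cache server over cleartext HTTP/2.
// The /v1/blocks/{key} route and the endpoint are assumptions for this sketch.
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Prior knowledge forces HTTP/2 without TLS/ALPN negotiation.
    let client = Client::builder().http2_prior_knowledge().build()?;
    let base = "http://192.168.0.2:8080"; // assumed cache-server endpoint

    // PUT: store a data block under its key.
    client
        .put(format!("{base}/v1/blocks/block-42"))
        .body(vec![0u8; 4096])
        .send()
        .await?
        .error_for_status()?;

    // GET: read it back; a cache miss would surface as a 404 here.
    let bytes = client
        .get(format!("{base}/v1/blocks/block-42"))
        .send()
        .await?
        .error_for_status()?
        .bytes()
        .await?;

    println!("got {} bytes", bytes.len());
    Ok(())
}
```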

Databend-side tasks can be found at #6803.

flaneur2020 (Member) commented Jul 25, 2022

Besides accelerating block retrieval from S3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.

drmingdrmer (Member, Author) replied:

> Besides accelerating block retrieval from S3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.

AFAIK, the latest decision about the temp data generated by a join is to store it in a persistent store such as S3. Since the cache server cannot provide a durability guarantee, the temp data may be evicted while a query execution still needs it.

To support such a requirement, the cache server has to let an application explicitly disable auto eviction for some data, and the application has to purge these caches explicitly when the join job is done.

Maybe per-object durability configuration can be a future feature?
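
For illustration, the "disable auto eviction, then purge explicitly" idea could surface in a client API roughly like this; every name here is hypothetical, nothing like it exists in the cache service yet:

```rust
// Hypothetical client API for pinned (non-evictable) cache entries. All names
// are illustrative; the real cache service defines no such interface yet.
struct PutOptions {
    /// false = pinned: the eviction policy must skip this entry until the
    /// application purges it explicitly.
    evictable: bool,
}

trait CacheClient {
    fn put(&self, key: &str, value: Vec<u8>, opts: PutOptions);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    /// Explicit cleanup once, e.g., a join job is done.
    fn purge_prefix(&self, prefix: &str);
}
```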

@BohuTANG mentioned this issue on Jul 25, 2022
Xuanwo (Member) commented Jul 25, 2022

I think the client-side tasks should be split into two parts:

  • OpenDAL's opencache service implements Accessor based on the opencache API.
  • Databend's Temporary Operator defines the cluster metadata format.

> databend-meta support opencache cluster metadata.

Should databend have special knowledge of caching services? Maybe they should be handled inside the opendal client?

Xuanwo (Member) commented Jul 25, 2022

> Besides accelerating block retrieval from S3, the cache service may also be a good place to store the spilled data from a join statement's shuffle, which can reduce the OOMs caused by skewed joins.

Based on the current design, this feature will be resolved by Temporary Services. The query can store temporary data:

[architecture diagram]

OpenCache can be implemented as a temporary service too.

drmingdrmer (Member, Author) replied:

> I think the client-side tasks should be split into two parts:
>
>   • OpenDAL's opencache service implements Accessor based on the opencache API.
>   • Databend's Temporary Operator defines the cluster metadata format.

Do you mean that Accessor only connects to one opencache server and lets Operator do the load-balancing job, such as with a consistent hash? It would be nice if I understood you correctly.

> > databend-meta support opencache cluster metadata.
>
> Should databend have special knowledge of caching services? Maybe they should be handled inside the opendal client?

I agree :D

Xuanwo (Member) commented Jul 25, 2022

> Do you mean that Accessor only connects to one opencache server and lets Operator do the load-balancing job, such as with a consistent hash?

OpenDAL's opencache client can handle the load-balancing job and accept some options:

```toml
[cache]
type = "opencache"

[cache.opencache]
endpoints = ["192.168.0.2", "192.168.0.3"]
hash_methods = "ConsistentHash"
```

OpenDAL's Operator is just an Arc wrapper around Accessor; we should wrap the logic inside it.
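
For illustration, this is one way a client could map a key onto the configured endpoints with a consistent hash; the ring layout, virtual-node count, and hash function are assumptions for this sketch, not OpenDAL's actual implementation:

```rust
// A toy consistent-hash ring over the endpoints from the config above.
// Virtual-node count and hash function are arbitrary choices for this sketch.
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of(value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

struct HashRing {
    ring: BTreeMap<u64, String>, // hash point on the ring -> endpoint
}

impl HashRing {
    fn new(endpoints: &[&str], vnodes: usize) -> Self {
        let mut ring = BTreeMap::new();
        for ep in endpoints {
            // Several virtual nodes per endpoint smooth the key distribution.
            for i in 0..vnodes {
                ring.insert(hash_of(&format!("{ep}#{i}")), ep.to_string());
            }
        }
        HashRing { ring }
    }

    /// Walk clockwise from the key's hash to the first endpoint on the ring,
    /// wrapping around at the end.
    fn endpoint_for(&self, key: &str) -> &str {
        let h = hash_of(key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, ep)| ep.as_str())
            .expect("ring must not be empty")
    }
}

fn main() {
    let ring = HashRing::new(&["192.168.0.2", "192.168.0.3"], 64);
    println!("block-42 lives on {}", ring.endpoint_for("block-42"));
}
```

A BTreeMap keeps the ring ordered, so adding or removing an endpoint only remaps the keys whose hash points fall on that endpoint's virtual nodes.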

flaneur2020 (Member) replied:
Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:

  1. cache keys can be isolated in different namespaces
  2. we can collect some info or restrict usage at the namespace level, like QPS, memory limits, rate limits, etc.

Xuanwo (Member) commented Jul 25, 2022

This tracking issue combines two things:

  • Cache support in Databend
  • Distributed cache services: OpenCache

I split the Databend part out into #6803.

Xuanwo (Member) commented Jul 25, 2022

> Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:

I created a feature request for opencache: datafuselabs/opencache#3

drmingdrmer (Member, Author) replied:

> Adding a namespace-like configuration may make operations on a multi-tenant deployment easier:
>
>   1. cache keys can be isolated in different namespaces
>   2. we can collect some info or restrict usage at the namespace level, like QPS, memory limits, rate limits, etc.

I'd prefer not to introduce namespaces on the server side:

  • Splitting the cache server's disk space into several sub-spaces reduces resource efficiency. The space reserved for one sub-space cannot be used by another, heavily loaded sub-space.

  • Every time a node is added or removed, or a tenant (sub-space) is added or removed, the quota of every sub-space needs to be adjusted.

Maybe it's better done on the client side.

flaneur2020 (Member) replied:

> Splitting the cache server's disk space into several sub-spaces reduces resource efficiency.

Oh, we don't really need to split the storage physically; it can just be a builtin key prefix in the internal storage, like the bucket in boltdb: https://github.com/boltdb/bolt#using-buckets

Operational functionality like quotas doesn't need to be built in the first phase, but a design with namespaces built in would make life easier when multi-tenant workloads get into place.
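
For illustration, the "namespace as builtin key prefix" idea in miniature; the key encoding below is an assumption made up for this sketch:

```rust
// Tenants share one physical store; isolation lives entirely in the key
// encoding, comparable to boltdb buckets.
use std::collections::HashMap;

fn namespaced(tenant: &str, key: &str) -> String {
    format!("{tenant}/{key}")
}

fn main() {
    let mut store: HashMap<String, Vec<u8>> = HashMap::new();
    store.insert(namespaced("tenant-a", "block-1"), vec![1, 2, 3]);
    store.insert(namespaced("tenant-b", "block-1"), vec![4, 5, 6]);

    // Same logical key, different namespaces, no collision.
    assert_ne!(
        store[&namespaced("tenant-a", "block-1")],
        store[&namespaced("tenant-b", "block-1")]
    );
}
```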

drmingdrmer (Member, Author) replied:

> > Splitting the cache server's disk space into several sub-spaces reduces resource efficiency.
>
> Oh, we don't really need to split the storage physically; it can just be a builtin key prefix in the internal storage, like the bucket in boltdb: https://github.com/boltdb/bolt#using-buckets

It's different from a db such as bolt: the access load pattern affects the internal data layout. A tenant who reads a lot from the cache server will aggressively evict every piece of the other users' data.

It's not difficult to have a namespace inside the cache server, but it cannot be used for throttling without physical isolation.

flaneur2020 (Member) commented Jul 26, 2022

> > > Splitting the cache server's disk space into several sub-spaces reduces resource efficiency.
> >
> > Oh, we don't really need to split the storage physically; it can just be a builtin key prefix in the internal storage, like the bucket in boltdb: https://github.com/boltdb/bolt#using-buckets
>
> It's different from a db such as bolt: the access load pattern affects the internal data layout. A tenant who reads a lot from the cache server will aggressively evict every piece of the other users' data.
>
> It's not difficult to have a namespace inside the cache server, but it cannot be used for throttling without physical isolation.

Is our cache service planned to have a proxy layer like twemproxy?

If it is, I guess we can do something in the proxy layer to take some control; it'd be easy to rate-limit or restrict memory size in the proxy layer.

If not, it's hard to maintain the namespace concept on the server side IMHO; there'd be no central "server side", just many peers in a distributed manner q.q
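
For illustration, the per-namespace rate limiting mentioned above could be a token bucket enforced in a proxy or sidecar; the numbers and naming here are illustrative assumptions:

```rust
// A minimal per-namespace token bucket of the kind a proxy or sidecar could
// enforce. Capacity and refill rate are arbitrary example numbers.
use std::time::Instant;

struct TokenBucket {
    capacity: f64,       // maximum burst size in tokens
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // steady-state rate limit
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Admit a request costing `cost` tokens, or reject it.
    fn try_acquire(&mut self, cost: f64) -> bool {
        let now = Instant::now();
        let refill = now.duration_since(self.last).as_secs_f64() * self.refill_per_sec;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last = now;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}

fn main() {
    // e.g. 100 requests/sec for one namespace, with bursts up to 200.
    let mut bucket = TokenBucket::new(200.0, 100.0);
    println!("admitted: {}", bucket.try_acquire(1.0));
}
```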

flaneur2020 (Member) commented Jul 26, 2022

IMHO a proxy layer does have its complexities (especially in HA) and need not be top-priority work.

Another option we may consider is a sidecar proxy to coordinate the multi-tenant usage. It works more like a client: it encapsulates the complex parts like distributed coordination and billing logic, offering just a simple GET/SET rpc interface.

Maybe like this: https://github.com/facebook/mcrouter. I remember it's deployed on each host machine to forward mc requests to a big mc pool, and it has some additional features like failover, local cache, stats, etc.

@ZhiHanZ @hantmac what's your opinion about this? q.q

drmingdrmer (Member, Author) replied:

> Is our cache service planned to have a proxy layer like twemproxy?

Hmm... no such plan yet. Meanwhile, the proxy job is done by opendal.

Xuanwo (Member) commented Jul 26, 2022

I agree that namespaces, proxies, quotas, and auditing are all useful features. But considering our cache service only has markdown docs now, I prefer to start as simply as possible.

Let's implement the cache service and databend's cache mechanism first. After that, we will have a more solid code base to improve on.

What do you think? @flaneur2020 @drmingdrmer

Xuanwo (Member) commented Nov 18, 2022

Not used so far. Feel free to re-open it when needed.

Xuanwo closed this as not planned on Nov 18, 2022.