consider alternative KV store #433

Open
jonassmedegaard opened this issue Jun 10, 2022 · 13 comments

@jonassmedegaard

I noticed in a Safenet discussion a mention that sled "is buggy and doesn’t seem to be actively maintained", with Persy and Cacache being their current candidates for replacement.

@joepio
Member

joepio commented Jun 10, 2022

I haven't encountered any bugs in sled in the past two years, so I don't think it's worryingly buggy for my use case. But I do have some worries about active maintenance. I reached out to the maintainer some time ago, who told me he's working on a large low-level library that he'll integrate into sled too.

Some requirements for alternatives:

  • Embeddable KV database
  • Fast
  • range and prefix queries possible
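
As a rough illustration of the last requirement, here is a minimal sketch of prefix and range queries over an ordered map. It uses `std::collections::BTreeMap` as a stand-in for an embedded KV store; the helper names `scan_prefix` and `scan_range` and the sample keys are hypothetical and are not taken from sled or Atomic-Server.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical helper: all entries whose key starts with `prefix`, in key order.
fn scan_prefix<'a>(store: &'a BTreeMap<String, String>, prefix: &str) -> Vec<(&'a str, &'a str)> {
    // Seek to the first key >= prefix, then stop once the prefix no longer matches.
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Included(prefix), Bound::Unbounded);
    store
        .range::<str, _>(bounds)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.as_str(), value.as_str()))
        .collect()
}

/// Hypothetical helper: all entries with `start <= key < end`, in key order.
/// Note that `BTreeMap::range` panics if `start > end`.
fn scan_range<'a>(
    store: &'a BTreeMap<String, String>,
    start: &str,
    end: &str,
) -> Vec<(&'a str, &'a str)> {
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Included(start), Bound::Excluded(end));
    store
        .range::<str, _>(bounds)
        .map(|(key, value)| (key.as_str(), value.as_str()))
        .collect()
}

/// Made-up sample data, only for demonstration.
fn sample_store() -> BTreeMap<String, String> {
    [
        ("agent/alice", "A"),
        ("agent/bob", "B"),
        ("commit/1", "c1"),
        ("commit/2", "c2"),
        ("drive/home", "d"),
    ]
    .into_iter()
    .map(|(key, value)| (key.to_string(), value.to_string()))
    .collect()
}
```

Both queries only work efficiently because the keys are kept sorted, which is why an ordered store (a B-tree or LSM-tree) matters here and a plain hash map would not be enough.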

Some options:

  • ReDB. Impressive benchmarks, faster than pretty much anything else! But it's quite new and doesn't have a stable format yet. Definitely keep an eye on this one!
  • Reddit thread with more contenders
  • TiKV (multi-node!)
  • OpenDAL (supports sled, tikv, redb, so users can choose for themselves)

@AlexMikhalev
Collaborator

I walked through a similarly impressive list of contenders for an embedded DB before finding the atomic data server, and the choice is not clean-cut: for example, indradb has sled or Postgres as a dependency. While doing that research I started working on a test bench for the common Rust data structures used by those databases, but I ran out of steam.
If we want to consider moving to a different database, I would start with benchmarks: expand our criterion benchmarks to cover as many cases as possible, and then select a handful of candidates to try as the new backend.
I believe we can get a lot of improvements by using in-memory data structures, like dashmap, for the cache before we need to move away from sled.
One more thought: the team at https://github.com/Synerise/cleora found it faster to rebuild the graph structure from data (they use a sparse matrix of nodes and edges stored in FxHash) than to deserialize it with serde.
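
To make the cache idea concrete, here is a minimal read-through cache sketch. It is an assumption-laden stand-in: `RwLock<HashMap>` plays the role of dashmap, a `BTreeMap` plays the role of the persistent store, and `CachedStore` with its methods is a hypothetical name, not an existing Atomic-Server type.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical wrapper that puts an in-memory cache in front of a slower store.
struct CachedStore {
    /// Stand-in for the persistent KV store (sled in the real project).
    backing: BTreeMap<String, String>,
    /// Stand-in for a concurrent map such as dashmap.
    cache: RwLock<HashMap<String, String>>,
    /// Counts lookups that had to go to the backing store.
    backing_reads: AtomicUsize,
}

impl CachedStore {
    fn new(backing: BTreeMap<String, String>) -> Self {
        Self {
            backing,
            cache: RwLock::new(HashMap::new()),
            backing_reads: AtomicUsize::new(0),
        }
    }

    /// Read-through: serve from the cache if possible, otherwise load and remember.
    fn get(&self, key: &str) -> Option<String> {
        // The read guard is dropped at the end of this statement.
        let cached = self.cache.read().unwrap().get(key).cloned();
        if let Some(hit) = cached {
            return Some(hit);
        }
        self.backing_reads.fetch_add(1, Ordering::Relaxed);
        let value = self.backing.get(key).cloned()?;
        self.cache
            .write()
            .unwrap()
            .insert(key.to_string(), value.clone());
        Some(value)
    }

    /// Write-through: update the backing store and the cache together.
    fn set(&mut self, key: &str, value: &str) {
        self.backing.insert(key.to_string(), value.to_string());
        self.cache
            .write()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn backing_reads(&self) -> usize {
        self.backing_reads.load(Ordering::Relaxed)
    }
}
```

Repeated reads of the same key hit the backing store only once in this sketch. Misses are not cached, so a key that does not exist is looked up in the backing store every time.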

@joepio
Member

joepio commented Oct 4, 2022

Update on sled: maintainer of sled is working on it in the background, mostly on a new storage engine. So sled ain't dead, baby.

Some other thoughts:

  • Switching to Redis might help to achieve a multi-node setup (see #213, "Multi-node / distributed setup (scaling to huge datasets / multi-tenant)"), although it is not embeddable. Maybe we need some sort of abstraction that allows users to switch KV stores? That wouldn't be too complex, I think.
  • Cloudflare's KV store might be interesting, too, as it allows for an edge deploy. Would probably involve rewriting far more, though.
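
A minimal sketch of what such an abstraction could look like, under stated assumptions: the `KvStore` trait, `MemoryStore`, `open_store`, and `store_resource` are all hypothetical names invented for illustration, and a real version would return `Result` values because disk and network backends can fail.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait covering the operations the application needs from a KV store.
trait KvStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&mut self, key: &str, value: Vec<u8>);
    fn remove(&mut self, key: &str) -> Option<Vec<u8>>;
    /// All entries whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &str) -> Vec<(String, Vec<u8>)>;
}

/// In-memory backend; a sled, Redis, or TiKV backend would implement the same trait.
#[derive(Default)]
struct MemoryStore {
    map: BTreeMap<String, Vec<u8>>,
}

impl KvStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }

    fn remove(&mut self, key: &str) -> Option<Vec<u8>> {
        self.map.remove(key)
    }

    fn scan_prefix(&self, prefix: &str) -> Vec<(String, Vec<u8>)> {
        self.map
            .range(prefix.to_string()..)
            .take_while(|(key, _)| key.starts_with(prefix))
            .map(|(key, value)| (key.clone(), value.clone()))
            .collect()
    }
}

/// Picks a backend by name, e.g. from a config value. Only "memory" exists in this sketch.
fn open_store(backend: &str) -> Option<Box<dyn KvStore>> {
    match backend {
        "memory" => Some(Box::new(MemoryStore::default())),
        _ => None,
    }
}

/// Application code depends only on the trait, not on a concrete backend.
fn store_resource(store: &mut dyn KvStore, subject: &str, json: &str) {
    store.set(subject, json.as_bytes().to_vec());
}
```

The trade-off is that the trait can only expose what every backend supports, so backend-specific features (transactions, merge operators) would need either a wider trait or an escape hatch.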

@netthier

netthier commented Dec 30, 2022

> Update on sled: maintainer of sled is working on it in the background, mostly on a new storage engine. So sled ain't dead, baby.

Do note that the new engine is licensed under GPL3. I'm not familiar with how sled is being used in your project, but it may be incompatible with your MIT license.

https://github.com/komora-io/marble/blob/main/Cargo.toml#L7

@joepio
Member

joepio commented Dec 30, 2022

@netthier That could very well be a problem, thanks! I've sent a mail to sled's maintainer.

Relevant issue in marble: komora-io/marble#7

joepio changed the title from "maybe move away from using sled" to "consider alternative KV store" on Dec 30, 2022
@AlexMikhalev
Collaborator

I propose hooking into Apache OpenDAL (Open Data Access Layer). I was going to use it to handle S3 uploads and writes, but it also supports in-memory storage, sled, dashmap, and Redis, in addition to all major cloud services and IPFS. Fully functional example:

use std::collections::HashMap;
use std::env;

use log::{debug, info};
use opendal::layers::LoggingLayer;
use opendal::services;
use opendal::{Operator, Result, Scheme};

#[tokio::main]
async fn main() -> Result<()> {
    // Install a log subscriber; the error from a second initialisation is ignored.
    let _ = tracing_subscriber::fmt().with_env_filter("info").try_init();
    let schemes = [
        Scheme::S3,
        Scheme::Memory,
        Scheme::Dashmap,
        Scheme::Sled,
        Scheme::Redis,
    ];
    
    for scheme in schemes.iter() {
        info!("scheme: {:?}", scheme);
        read_and_write(*scheme).await?;
    }

    Ok(())
}

async fn read_and_write(scheme: Scheme) -> Result<()> {
    // Build an operator for the requested backend.
    let op = match scheme {
        // S3 is configured from a map of options plus credentials from the environment.
        Scheme::S3 => init_operator_via_map()?,
        Scheme::Dashmap => {
            let builder = services::Dashmap::default();
            // Init an operator with the logging layer enabled.
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        Scheme::Sled => {
            let mut builder = services::Sled::default();
            builder.datadir("/tmp/opendal/sled");
            // Init an operator with the logging layer enabled.
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        Scheme::Redis => {
            // The default builder is used as-is, without extra configuration.
            let builder = services::Redis::default();
            // Init an operator with the logging layer enabled.
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        // Scheme::Memory and any other scheme fall back to the in-memory service.
        _ => {
            let builder = services::Memory::default();
            // Init an operator with the logging layer enabled.
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
    };
    debug!("operator: {op:?}");

    // Write data into object test.
    let test_string = format!("Hello, World! {scheme}");
    op.write("test", test_string).await?;

    // Read data from object.
    let bs = op.read("test").await?;
    info!("content: {}", String::from_utf8_lossy(&bs));

    // Get object metadata.
    let meta = op.stat("test").await?;
    info!("meta: {:?}", meta);

    Ok(())
}

fn init_operator_via_map() -> Result<Operator> {
    // Read the S3 credentials from the environment.
    let access_key_id =
        env::var("AWS_ACCESS_KEY_ID").expect("AWS_ACCESS_KEY_ID is set and a valid String");
    let secret_access_key = env::var("AWS_SECRET_ACCESS_KEY")
        .expect("AWS_SECRET_ACCESS_KEY is set and a valid String");

    // Configure the S3 service through a map of string options.
    let mut map = HashMap::default();
    map.insert("bucket".to_string(), "test".to_string());
    map.insert("region".to_string(), "us-east-1".to_string());
    map.insert("endpoint".to_string(), "http://rpi4node3:8333".to_string());
    map.insert("access_key_id".to_string(), access_key_id);
    map.insert("secret_access_key".to_string(), secret_access_key);

    Operator::via_map(Scheme::S3, map)
}

@joepio
Member

joepio commented Jun 19, 2023

Wow @AlexMikhalev, that looks really promising! It seems to support scan, which is good, although that is missing in the Sled connector. I'm also wondering if it has Tree support, see issues:

apache/opendal#2498

apache/opendal#2497
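
For context on the Tree question: sled's Trees are separate keyspaces inside one database. If a backend only offers a flat keyspace with prefix scans, one possible workaround is to emulate trees with key prefixes. The sketch below is only an illustration under that assumption; `tree_key`, `insert`, and `tree_entries` are hypothetical helpers, a `BTreeMap` stands in for the store, and tree names are assumed never to contain the `\0` separator.

```rust
use std::collections::BTreeMap;

/// Separator between tree name and key; assumed never to appear in tree names.
const SEP: char = '\0';

/// Hypothetical helper: builds the flat key for `key` inside `tree`.
fn tree_key(tree: &str, key: &str) -> String {
    format!("{tree}{SEP}{key}")
}

/// Hypothetical helper: inserts a value into the emulated tree.
fn insert(store: &mut BTreeMap<String, String>, tree: &str, key: &str, value: &str) {
    store.insert(tree_key(tree, key), value.to_string());
}

/// Hypothetical helper: lists the entries of one emulated tree, with the prefix stripped.
fn tree_entries<'a>(store: &'a BTreeMap<String, String>, tree: &str) -> Vec<(&'a str, &'a str)> {
    let prefix = format!("{tree}{SEP}");
    store
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(prefix.as_str()))
        .map(|(key, value)| (&key[prefix.len()..], value.as_str()))
        .collect()
}
```

The separator is what keeps a tree named `resources` from also matching keys of a tree named `resources_index`.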

@Xuanwo

Xuanwo commented Jun 24, 2023

Hi, I'm the maintainer of OpenDAL. Thanks for @AlexMikhalev's sharing and @joepio's contact!

I'm here to bring some updates from OpenDAL side:

Apart from existing issues, I'm interested in adding support for more services so our users can have more choices:

Please feel free to let me know if there is anything I can help you with!

@AlexMikhalev
Collaborator

@Xuanwo awesome. A small example (like example 2 in your plans) of how to use OpenDAL from tokio async functions would help me personally. I am building a product complementary to Atomic, https://terraphim.ai/, and I want to plug in an OpenDAL operator instead of the redis.rs KV.

@Xuanwo

Xuanwo commented Jun 24, 2023

> @Xuanwo awesome. A small example (like example 2 in your plans) of how to use OpenDal from tokio async functions will help me personally - I am building complementary to atomic product, https://terraphim.ai/ and I want to plug OpenDal operator instead of redis.rs KV.

Thanks for the feedback! I will write one tomorrow 🤪

@joepio
Member

joepio commented Jul 24, 2023

OpenDAL now also supports TiKV! apache/opendal#2533

This opens up multi-node setups for Atomic-Server.

@Xuanwo

Xuanwo commented Jul 24, 2023

> Hi, I'm the maintainer of OpenDAL. Thanks for @AlexMikhalev's sharing and @joepio's contact!
>
> I'm here to bring some updates from OpenDAL side:
>
> Apart from existing issues, I'm interested in adding support for more services so our users can have more choices:
>
> Please feel free to let me know if there is anything I can help you with!

One month later, the OpenDAL community has implemented all of these issues 🚀!

joepio added a commit that referenced this issue Jul 31, 2023
joepio mentioned this issue Jul 31, 2023
7 tasks