Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: compaction test in project2 gives a false negative result #310

Open
aptend opened this issue Jan 2, 2020 · 2 comments
Open

rust: compaction test in project2 gives a false negative result #310

aptend opened this issue Jan 2, 2020 · 2 comments

Comments

@aptend
Copy link

aptend commented Jan 2, 2020

OS: Windows10 64-bit, version 1903, build 18362.535

Rust: 1.39.0

I didn’t implement compaction feature for my own KvStore at all, however, after I ran all tests, it showed that my own KvStore passed the compaction test.

Then I found the way of detecting the size of a directory seemed to be the problem. Actually, it is risky to find out the size of an opened file by inspecting its metadata. When we dump data into an opened database log file, it is possible that the metadata won’t be updated immediately.

The related code is shown as follows:

let dir_size = || {
    let entries = WalkDir::new(temp_dir.path()).into_iter();
    let len: walkdir::Result<u64> = entries
    .map(|res| {
        res.and_then(|entry| entry.metadata())
        .map(|metadata| metadata.len())
    })
    .sum();
    len.expect("fail to get directory size")
};
let mut current_size = dir_size(); // current_size = 0
for iter in 0..1000 {
    for key_id in 0..1000 {
        let key = format!("key{}", key_id);
        let value = format!("{}", iter);
        store.set(key, value)?;
    }

    let new_size = dir_size(); // new_size = 0 as metadata is not updated
    if new_size > current_size { // false --> 'compaction triggered' (not!)
        current_size = new_size;
        continue;
    }
    // Compaction triggered

    drop(store);
    ...
}

let’s say, the first call of dir_size returns 0, which is assigned to current_size. And then 1000 key-value pairs are put into store. Unfortunately, the second call of dir_size will still return 0 because there is only one log file and its metadata is not updated. Therefore,the statement new_size > current_size is false because they are equal actually, finally, the test function claims “compaction triggered”, which is not true.

If we do system call like fsync or something as we write to file, metadata will be updated of course, but it is too heavy. Here we just want to know file size, so why not just open the file, seek the file pointer to the end, and record its position? Like this:

let dir_size = || -> u64 {
    let size: io::Result<u64> = WalkDir::new(temp_dir.path())
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.path().is_file())
    .filter_map(|e| File::open(&e.path()).ok())
    .map(|mut f| f.seek(SeekFrom::End(0)))
    .sum();
    size.expect("fail to get total size of logs")
};

By this way, the compaction test fails if KvStore doesn’t deal with file compaction.

Any ideas?

@jessetham
Copy link

Also seeing the same issue on my Windows machine (Windows 10, 64-bit, Version 1903).

There's no issue when I run the test on my Raspberry Pi 4 though (Raspbian GNU/Linux 10 (buster)).

@slhmy
Copy link

slhmy commented Mar 24, 2021

I guess this problem is related with Windows' file system and also related with crate tempfile.

I reappeared this problem in a smaller version. let's see.

use std::io::prelude::*;
use std::io::BufWriter;
use std::fs::OpenOptions;
use walkdir::WalkDir;
use tempfile::TempDir;

fn main() {
    // create tmp directory in "C:\\Users\\<USER>\\AppData\\Local\\Temp"
    let temp_dir = TempDir::new().expect("unable to create temporary working directory");
    
    // create tmp directory in project directory
    // let temp_dir = TempDir::new_in("").expect("unable to create temporary working directory");

    let mut stream = BufWriter::new(
        OpenOptions::new()
            .create(true)
            .write(true)
            .append(true)
            .open(temp_dir.path().join("tmp.txt")).unwrap()
    );

    for _ in 0..10000 {
        stream.write(&[0]).unwrap();
        stream.flush().unwrap();
    }

    // drop(stream);

    let dir_size = || {
        let entries = WalkDir::new(temp_dir.path()).into_iter();
        let len: walkdir::Result<u64> = entries
            .map(|res| {
                println!("{:?}", res);
                res.and_then(|entry| entry.metadata())
                    .map(|metadata| metadata.len())
            })
            .sum();
        len.expect("fail to get directory size")
    };

    let current_size = dir_size();
    println!("{}", current_size);
}

And the dependency I use is the same to project 2.

[dependencies]
walkdir = "2.2.7"
tempfile = "3.0.7"

As you can see, this example creates a temp file and flushes the stream buf on every single byte (Which I think is similar to the set method in kvs).

But strange things happened. The flush method doesn't work.

If we change nothing. The output of dir_size() is 0.

If we create temp directory in project directory:

    // create tmp directory in "C:\\Users\\<USER>\\AppData\\Local\\Temp"
    // let temp_dir = TempDir::new().expect("unable to create temporary working directory");

    // create tmp directory in project directory
    let temp_dir = TempDir::new_in("").expect("unable to create temporary working directory");

Or drop stream before measure:

    for _ in 0..10000 {
        stream.write(&[0]).unwrap();
        stream.flush().unwrap();
    }

    drop(stream);

Output of dir_size() is correct and shows 10000.

So It seems that either two ways can fix this problem. But after I tried both of them, I found only the second way is effective.

        let mut store = KvStore::open(temp_dir.path())?;
        for key_id in 0..1000 {
            let key = format!("key{}", key_id);
            let value = format!("{}", iter);
            store.set(key, value)?;
        }
        drop(store);

I drop store every iteration before size measurement. It is not a good solution I think.
Theoretically, file's metadata should be ready after flush, but it isn't.
So maybe we need to dig deeper into the problem, and find out what is happening in the system.

Unfortunately I have no idea on how to do the further experiments, if anyone have the idea, please help us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants