how to filter invalid result from the target efficiently? #3811

zhoucheng361 · 2023-12-14T03:56:11Z

I use stateful machine test posix compatibility between /tmp/ and jfs file system, I do operations like create_file and unlink etc for the two file system, and compare their result or exception strings.
The problem is when it fails in create file, I will return INVALID_FILE_PATH for the target, and then I filter the INVALID_FILE_PATH in unlink in order to not unlink a non-exist file。
But in this way, it will filter too much data and make the test inefficient, and hypothesis will give the error hypothesis.errors.FailedHealthCheck error: It looks like your strategy is filtering out a lot of data...
So what can I handle this case? Is there a way to filter the non-exist files from the target efficiently？

import os
from string import ascii_lowercase
from hypothesis import Verbosity, assume, strategies as st
from hypothesis.stateful import rule, precondition, RuleBasedStateMachine, Bundle, initialize, consumes
from hypothesis import  seed, settings

INVALID_FILE_PATH = 'invalid_file'
@settings(verbosity=Verbosity.debug, 
    max_examples=10, 
    stateful_step_count=10)
class PosixMachine(RuleBasedStateMachine):
    
    ROOT_DIR1='/tmp/fsrand'
    ROOT_DIR2='/mnt/jfs/fsrand'
    Files = Bundle('files')
    Folders = Bundle('folders')
    @initialize(target=Folders)
    def init_folders(self):
        return ''

    def __init__(self):
        super(PosixMachine, self).__init__()

    @rule(target=Files, parent = Folders, file_name = st.text(alphabet=ascii_lowercase, min_size=254))
    def create_file(self, parent, file_name):
        result1 = self.do_create_file(self.ROOT_DIR1, parent, file_name)
        result2 = self.do_create_file(self.ROOT_DIR2, parent, file_name)
        assert result1 == result2
        if isinstance(result1, tuple):
            return os.path.join(parent, file_name)
        else: # create file failed, so we should not put the file in the target
            print('create file failed')
            return INVALID_FILE_PATH
    
    def do_create_file(self, root_dir, parent, file_name):
        relpath = os.path.join(parent, file_name)
        abspath = os.path.join(root_dir, relpath)
        try:
            with open(abspath, 'x') as file:
                file.write('abc')
        except Exception as e :
            return str(e)
        return os.stat(abspath)

    @rule(file = Files.filter(lambda x: x != INVALID_FILE_PATH))
    def unlink(self, file):
        result1 = self.do_unlink(self.ROOT_DIR1, file)
        result2 = self.do_unlink(self.ROOT_DIR2, file)
        assert result1 == result2

    def do_unlink(self, root_dir, file):
        abspath = os.path.join(root_dir, file)
        try:
            os.unlink(abspath)
        except Exception as e:
            return str(e)
        return () 

if __name__ == '__main__':
    juicefs_machine = PosixMachine.TestCase()
    juicefs_machine.runTest()

The text was updated successfully, but these errors were encountered:

jobh · 2023-12-14T08:22:44Z

A couple of suggestions:

Instead of returning INVALID_FILE_PATH, it may be cleaner to assume that the path is valid.

assume(isinstance(result1, tuple))
return os.path.join(...)

However, that will probably not fix the problem, it just stops it propagating into the bundle. The real fix would be to amend the strategy itself to be less likely to produce invalid file names in the first place. For example, if forward slashes are known to be invalid then they can maybe be removed or replaced after they are generated, without invalidating the entire example:

file_name = file_name.replace("/", "")  # for example

This can be expressed as a reusable custom valid-file-name strategy.

[edit] Maybe I'm misunderstanding, and you want those invalid filenames as part of the test. If so, just drop the filter and unlink conditionally within the function.

Zac-HD · 2023-12-14T19:00:45Z

Instead of return INVALID_FILE_PATH, use return multiple() to end a rule without adding anything to the bundle.

As a separate tip, you might find the consumes() function useful when feeding the unlink() rule, since I assume that you don't want to do anything with that path afterwards.

I'd avoid using assume() - while clean, it would be pretty inefficient to solve this with rejection sampling.

zhoucheng361 · 2023-12-15T03:00:14Z

Instead of return INVALID_FILE_PATH, use return multiple() to end a rule without adding anything to the bundle.

As a separate tip, you might find the consumes() function useful when feeding the unlink() rule, since I assume that you don't want to do anything with that path afterwards.

I'd avoid using assume() - while clean, it would be pretty inefficient to solve this with rejection sampling.

Thanks a lot, return multiple() works perfect for me.
Yes, I want to consumes the file when unlink it succeed, but I don't want to consumes it if unlink failed, how can I handle this case?

Zac-HD · 2023-12-15T13:46:23Z

Consume it, return multiple() if unlink succeeded, or the file if unlink failed (so that it's put back into the bundle)

zhoucheng361 · 2023-12-18T07:16:25Z

Thanks, it is very good idea for me！

Zac-HD closed this as completed Dec 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how to filter invalid result from the target efficiently? #3811

how to filter invalid result from the target efficiently? #3811

zhoucheng361 commented Dec 14, 2023

jobh commented Dec 14, 2023 •

edited

Loading

Zac-HD commented Dec 14, 2023

zhoucheng361 commented Dec 15, 2023

Zac-HD commented Dec 15, 2023

zhoucheng361 commented Dec 18, 2023

how to filter invalid result from the target efficiently? #3811

how to filter invalid result from the target efficiently? #3811

Comments

zhoucheng361 commented Dec 14, 2023

jobh commented Dec 14, 2023 • edited Loading

Zac-HD commented Dec 14, 2023

zhoucheng361 commented Dec 15, 2023

Zac-HD commented Dec 15, 2023

zhoucheng361 commented Dec 18, 2023

jobh commented Dec 14, 2023 •

edited

Loading