Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to filter invalid result from the target efficiently? #3811

Closed
zhoucheng361 opened this issue Dec 14, 2023 · 5 comments
Closed

how to filter invalid result from the target efficiently? #3811

zhoucheng361 opened this issue Dec 14, 2023 · 5 comments

Comments

@zhoucheng361
Copy link

I use stateful machine test posix compatibility between /tmp/ and jfs file system, I do operations like create_file and unlink etc for the two file system, and compare their result or exception strings.
The problem is when it fails in create file, I will return INVALID_FILE_PATH for the target, and then I filter the INVALID_FILE_PATH in unlink in order to not unlink a non-exist file。
But in this way, it will filter too much data and make the test inefficient, and hypothesis will give the error hypothesis.errors.FailedHealthCheck error: It looks like your strategy is filtering out a lot of data...
So what can I handle this case? Is there a way to filter the non-exist files from the target efficiently?

import os
from string import ascii_lowercase
from hypothesis import Verbosity, assume, strategies as st
from hypothesis.stateful import rule, precondition, RuleBasedStateMachine, Bundle, initialize, consumes
from hypothesis import  seed, settings

INVALID_FILE_PATH = 'invalid_file'
@settings(verbosity=Verbosity.debug, 
    max_examples=10, 
    stateful_step_count=10)
class PosixMachine(RuleBasedStateMachine):
    
    ROOT_DIR1='/tmp/fsrand'
    ROOT_DIR2='/mnt/jfs/fsrand'
    Files = Bundle('files')
    Folders = Bundle('folders')
    @initialize(target=Folders)
    def init_folders(self):
        return ''

    def __init__(self):
        super(PosixMachine, self).__init__()

    @rule(target=Files, parent = Folders, file_name = st.text(alphabet=ascii_lowercase, min_size=254))
    def create_file(self, parent, file_name):
        result1 = self.do_create_file(self.ROOT_DIR1, parent, file_name)
        result2 = self.do_create_file(self.ROOT_DIR2, parent, file_name)
        assert result1 == result2
        if isinstance(result1, tuple):
            return os.path.join(parent, file_name)
        else: # create file failed, so we should not put the file in the target
            print('create file failed')
            return INVALID_FILE_PATH
    
    def do_create_file(self, root_dir, parent, file_name):
        relpath = os.path.join(parent, file_name)
        abspath = os.path.join(root_dir, relpath)
        try:
            with open(abspath, 'x') as file:
                file.write('abc')
        except Exception as e :
            return str(e)
        return os.stat(abspath)

    @rule(file = Files.filter(lambda x: x != INVALID_FILE_PATH))
    def unlink(self, file):
        result1 = self.do_unlink(self.ROOT_DIR1, file)
        result2 = self.do_unlink(self.ROOT_DIR2, file)
        assert result1 == result2

    def do_unlink(self, root_dir, file):
        abspath = os.path.join(root_dir, file)
        try:
            os.unlink(abspath)
        except Exception as e:
            return str(e)
        return () 

if __name__ == '__main__':
    juicefs_machine = PosixMachine.TestCase()
    juicefs_machine.runTest()
@jobh
Copy link
Contributor

jobh commented Dec 14, 2023

A couple of suggestions:

Instead of returning INVALID_FILE_PATH, it may be cleaner to assume that the path is valid.

assume(isinstance(result1, tuple))
return os.path.join(...)

However, that will probably not fix the problem, it just stops it propagating into the bundle. The real fix would be to amend the strategy itself to be less likely to produce invalid file names in the first place. For example, if forward slashes are known to be invalid then they can maybe be removed or replaced after they are generated, without invalidating the entire example:

file_name = file_name.replace("/", "")  # for example

This can be expressed as a reusable custom valid-file-name strategy.

[edit] Maybe I'm misunderstanding, and you want those invalid filenames as part of the test. If so, just drop the filter and unlink conditionally within the function.

@Zac-HD
Copy link
Member

Zac-HD commented Dec 14, 2023

Instead of return INVALID_FILE_PATH, use return multiple() to end a rule without adding anything to the bundle.

As a separate tip, you might find the consumes() function useful when feeding the unlink() rule, since I assume that you don't want to do anything with that path afterwards.

I'd avoid using assume() - while clean, it would be pretty inefficient to solve this with rejection sampling.

@zhoucheng361
Copy link
Author

Instead of return INVALID_FILE_PATH, use return multiple() to end a rule without adding anything to the bundle.

As a separate tip, you might find the consumes() function useful when feeding the unlink() rule, since I assume that you don't want to do anything with that path afterwards.

I'd avoid using assume() - while clean, it would be pretty inefficient to solve this with rejection sampling.

Thanks a lot, return multiple() works perfect for me.
Yes, I want to consumes the file when unlink it succeed, but I don't want to consumes it if unlink failed, how can I handle this case?

@Zac-HD
Copy link
Member

Zac-HD commented Dec 15, 2023

Consume it, return multiple() if unlink succeeded, or the file if unlink failed (so that it's put back into the bundle)

@Zac-HD Zac-HD closed this as completed Dec 15, 2023
@zhoucheng361
Copy link
Author

Thanks, it is very good idea for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants