Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support symbolic files with symbolic size #42

Open
likebreath opened this issue Feb 23, 2018 · 13 comments
Open

Support symbolic files with symbolic size #42

likebreath opened this issue Feb 23, 2018 · 13 comments
Assignees

Comments

@likebreath
Copy link
Collaborator

CRETE now supports only symbolic files with fixed size. This symbolic file support is limited, because file size commonly makes important effects on program's behavior. Support of Symbolic size is needed to enable CRETE explore program's branches that require different file size.

@likebreath
Copy link
Collaborator Author

likebreath commented Feb 23, 2018

@zhenkun @moralismercatus
We can start with support of symbolic files with different fixed sizes. A compile-tested-only patch can be found 4f93215. Please help me verify how it works and we can find a time to talk online. Let me know your thoughts.

@moralismercatus
Copy link
Collaborator

@likebreath @zhenkun
As a first step, I enabled CRETE to consume multiple seed test cases of the same size - without applying the above patch.

The test program has 4 unique paths, and I reused the corresponding test cases resulting from the original CRETE run with the default seed. I fed these 4 TCs into CRETE as the "multiple seeds". BTW, these TCs were in binary format, and not serialized format. The final result was as follows:

Total TCs: 13
Total traces: 4

Essentially, there was some redundancy in the TCs. My suspicion is that because the seeds didn't contain branch information, CRETE generated redundant TCs, but not redundant traces, as these were checked for redundancy.

Does my analysis sound about right, Bo?

Thanks,

@moralismercatus
Copy link
Collaborator

On second thought, a total of 8 TCs makes sense. 13 TCs does not.

@likebreath
Copy link
Collaborator Author

Yes, I think your interpretation is right.

One more comment. The total number of trace as 4 indicates only 4 test cases out of 13 are unique. I am not sure it is the right behavior as you mention tc number should be 8.

@likebreath
Copy link
Collaborator Author

likebreath commented Mar 13, 2018

@moralismercatus @zhenkun
Please find the fix for checking on duplicated test case with the recent change here. Please note that this branch is from my fork, so you need to add my fork to your local remotes to checkout it through git. Please don't hesitate to tell me when you have problems.

Please use the updated fix here.

@moralismercatus
Copy link
Collaborator

@likebreath, @zhenkun
Hi Bo, thanks for putting this up so promptly. This patch seems to have fixed the issue of TCs of differing "useful_data_size" being considered redundant.

Thanks,

@likebreath
Copy link
Collaborator Author

likebreath commented Mar 14, 2018

@moralismercatus

Thanks for confirming this promptly. We can find a time to talk in details about the questions you and Zhenkun had during the meeting earlier today. Also, if you can share me with a copy of your parser for multi-seeds, I can double check whether it is good to work with the patch. You may want to do a secret git gist and send me a link through email.

Let me know if you have any questions.

@moralismercatus
Copy link
Collaborator

moralismercatus commented Mar 14, 2018

@likebreath
Sure. I'll push the changes as soon as they're verified to be working. Another concern I have is in the following line:

if(out_result != file.size) {

I have encountered this exception being thrown. It would seem that since useful_data_size is being used to write to the file, it follows that the check should be out_result != useful_size. Am I mistaken?

@likebreath
Copy link
Collaborator Author

@moralismercatus
Good catch. That also needs to be adjusted with "useful_size". I have fixed it in on my branch here. Please note that I also adjusted "TestCaseElement::operator==()" to make the duplication check right on test case.

@moralismercatus
Copy link
Collaborator

@likebreath @zhenkun

Committed changes. Preliminary testing is successful.

moralismercatus@198e904

@likebreath
Copy link
Collaborator Author

@moralismercatus

The code looks good to me.

A minor comment, it will be easier to review the code if you can split changes into smaller change sets, so that each commit only contains changes for one small task.

@moralismercatus
Copy link
Collaborator

@likebreath @zhenkun

I'd like to share with you a sample program that we tested the feature with.

#include <iostream>
#include <fstream>

using namespace std;

size_t file_size(char* f)
{
    ifstream ifs( f, ios::binary | ios::ate );
    return ifs.tellg();
}

int main( int argc, char* argv[] ) {
    if( argc != 2 ) {
        std::cerr << "missing arg[1] == filename" << std::endl;
    return 1;
    }
    std::ifstream ifs( argv[1] );
    if( !ifs.good() ) {
        std::cerr << "unable to open: " << argv[1] << std::endl;
    return 1;
    }
    if( file_size( argv[1] ) == 1 )
    {
        char c;
        ifs >> c;
        if     ( c == 'x' )
            std::cout << "c == " << c << std::endl;
        else if( c == 'y' )
            std::cout << "c == " << c << std::endl;
        else if( c == 'z' )
            std::cout << "c == " << c << std::endl;
        else
            std::cout << "c == " << c << std::endl;
    }
    else if( file_size( argv[1] ) == 2 )
    {
        char d;
        ifs >> d;
        ifs >> d;
        if( d == 'a' )
            std::cout << "d == " << d << std::endl;
        else if( d == 'b' )
            std::cout << "d == " << d << std::endl;
        else
            std::cout << "d == " << d << std::endl;
    }
    else
        std::cout << "ASSERT!!" << std::endl;

    return 0;
}
<?xml version="1.0"?>
<crete>
  <exec>/home/test/test/harness</exec>
  <args>
    <arg concolic="false" value="/home/test/test/input.bin" size="" index="1"/>
  </args>
  <files>
    <file concolic="true" size="2" argv_index="1" path="/home/test/test/input.bin"/>
  </files>
</crete>

In brief, there are 2 mutually exclusive paths through the program depending on whether the file size is 1 or 2. In terms of feasible unique paths, there are 4+3=7.

If _ represents an unspecified value, then we can expect TCs:
file_size(1):
_
x
y
z
file_size(2):
__
_a
_b

In fact, when we provide seeds of sizes 1 and 2 (e.g., _ and __), we in actuality get the following:
__
__
_a
_b
x_
xa
xb
y_
ya
yb
z_
za
zb

We thus result in 13 TCs, and 13 traces before termination. This result is peculiar to me. Despite the mutual exclusivity of bytes 1 and 2, we're getting the full set of permutations, as if the if( file_size ) statements were not even present.

@likebreath
Copy link
Collaborator Author

I agree the behavior is not normal. Few things I will do to understand what happened:

  1. Find the test cases that should not be generated, and check the trace produced them.
  2. Try experiments with one 1 byte file and one 2 byte file, and check the behavior.

Keep me updated if you have new observations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants