In [1]:
import enum
import importlib
import inspect
import os
import shutil

from IPython.display import HTML

from sflkit import *
from sflkit.color import ColorCode
from sflkit import instrument_config, analyze_config
from sflkit.config import Config

from avicenna import *

## A Faulty Program

First, we need a faulty program. We chose an implementation of the `middle(x, y, z)` function, which returns the *middle* of its three arguments. For example, `middle(1, 3, 2)` should return 2 because `1 < 2` and `2 < 3`. We introduced a fault in this implementation of `middle`; it occurs in line 7, `m = y`.

In [2]:
def middle(x, y, z):
    m = z
    if y < z:
        if x < y:
            m = y
        elif x < z:
            m = x 
    else:
        if x > y:
            m = y
        elif x > z:
            m = x # desired line
    return m

Next, we introduce a class to capture the results of test runs efficiently. `TestResult` is an enum with two possible values, `PASS` and `FAIL`. `PASS` denotes a passing test case and `FAIL` a failing one.

In [3]:
class TestResult(enum.Enum):
    
    def __repr__(self):
        return self.value
    
    PASS = 'PASS'
    FAIL = 'FAIL'

Now we implement a test function that takes the three arguments of `middle(x, y, z)` and an expected result. This test function compares the return of `middle(x, y, z)` with the desired value and returns `PASS` if they match and `FAIL` otherwise.

In [4]:
def test(function, x, y, z, expected):
    try:
        if function(x, y, z) == expected:
            return TestResult.PASS
        else:
            return TestResult.FAIL
    except BaseException:
        return TestResult.FAIL

def test_middle(x, y, z, expected):
    return test(middle, x, y, z, expected)

In [5]:
source = inspect.getsource(middle)
print(source)

def middle(x, y, z):
    m = z
    if y < z:
        if x < y:
            m = y
        elif x < z:
            m = x 
    else:
        if x > y:
            m = y
        elif x > z:
            m = x # desired line
    return m



In [6]:
middle_py = 'middle.py'
tmp_py = 'tmp.py'
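
The cell above only names the files. As a minimal sketch of how a function's source could be written to disk and loaded back as a module, the block below uses a temporary directory instead of the notebook's working directory. The helper `load_module_from_source` is hypothetical and not part of SFLKit.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source text as returned by inspect.getsource(middle) in the notebook.
source = '''def middle(x, y, z):
    m = z
    if y < z:
        if x < y:
            m = y
        elif x < z:
            m = x
    else:
        if x > y:
            m = y
        elif x > z:
            m = x
    return m
'''


def load_module_from_source(source, directory, filename='middle.py'):
    """Write `source` to `directory/filename` and import it as a module."""
    path = Path(directory) / filename
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as directory:
    loaded = load_module_from_source(source, directory)

# The loaded copy behaves like the in-notebook function.
print(loaded.middle(1, 3, 2))
```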

Running the test cases against the copy in `middle.py` produced the same results as before, so the setup seems to work.

### Configuring SFLKit

The `Config` class provides convenient access to `SFLKit` by defining the fundamental concepts we want to investigate.

The config needs a few pieces of information. First, it needs the path to the source we want to investigate, which we already have in `middle_py`. Next, it needs an output path, `tmp_py`. We also need:

- The language of our subject, which is `'python'`.
- The predicates we want to investigate. Let's start with `'line'`.
- The evaluation metric for the predicates, i.e., the similarity coefficient. We choose `'Tarantula'`.
- A list of passing and failing tests used during the analysis.

In [7]:
language='python'
predicates='line'
metrics='Tarantula'
#passing='event-files/0,event-files/1'
#failing='event-files/2'

We define a function that gives us a `Config` object, so we do not need to create it manually every time we change something.

In [8]:
def get_config():
    return Config.create(path=middle_py, working=tmp_py, language=language, predicates=predicates)

Now we can define a function that instruments our subject. We leverage `SFLKit`'s `instrument_config()`, which takes the config created by our `get_config()` and instruments the subject. With `out=True`, the function also shows the content of the instrumented Python file.

In [9]:
def instrument(out=False):
    instrument_config(get_config())
    if out:  # optionally show the instrumented file
        print(open(tmp_py).read())

Now we instrument our `middle.py` subject and check the results.

In [10]:
instrument()

sflkit :: INFO     :: I found 11 events in middle.py.
sflkit :: INFO     :: I found 0 events in middle.py.


In the instrumented file `tmp.py`, the instrumentation added an import at the beginning for a library that comes with `SFLKit` and glues the execution of the files together. Moreover, the instrumentation added a call to a function of this library in front of each executable line, which tracks the executed lines.
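
The idea of recording executed lines can be illustrated without SFLKit. The sketch below deliberately swaps source instrumentation for Python's built-in `sys.settrace` hook, so it is not how SFLKit works internally. The helper `executed_lines` is hypothetical, and line numbers are counted from the `def` line as line 1.

```python
import sys


def middle(x, y, z):
    m = z
    if y < z:
        if x < y:
            m = y
        elif x < z:
            m = x
    else:
        if x > y:
            m = y
        elif x > z:
            m = x
    return m


def executed_lines(function, *args):
    """Return the function-relative line numbers executed by function(*args)."""
    code = function.__code__
    lines = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == 'line':
            lines.append(frame.f_lineno - code.co_firstlineno + 1)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return lines


print(executed_lines(middle, 2, 1, 3))
print(executed_lines(middle, 2, 3, 1))
```

With this numbering, the input `2, 1, 3` passes through line 7 and the input `2, 3, 1` through line 12.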

## Get and Analyze Events

Now, we want to extract the events from the execution of tests. Therefore, we need to adjust our test execution function again because the shared library for tracking the events does not know when to start and stop. We need to reset this library before entering our `middle.py` and tell the library to dump the events after the function finishes.

In [11]:
def test_tmp(x, y, z, expected): 
    import tmp
    importlib.reload(tmp)
    tmp.sflkitlib.lib.reset()
    try:
        return test(tmp.middle, x, y, z, expected)
    finally:
        tmp.sflkitlib.lib.dump_events()

We define a path to write the generated event logs.

In [12]:
event_files = 'event-files'

Then, we need a function to generate the event log from the previous test cases. We change the environment variable `EVENTS_PATH` to the output path of the event log file before running each test.

In [13]:
# def run_tests():
#     if os.path.exists(event_files):
#         shutil.rmtree(event_files)
#     os.mkdir(event_files)
#     os.environ['EVENTS_PATH'] = os.path.join(event_files, '0')
#     test_tmp(3, 2, 1, expected=2)
#     os.environ['EVENTS_PATH'] = os.path.join(event_files, '1')
#     test_tmp(3, 1, 2, expected=2)
#     os.environ['EVENTS_PATH'] = os.path.join(event_files, '2')
#     test_tmp(2, 1, 3, expected=2)
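
Since each test needs its own `EVENTS_PATH`, one option is to wrap the assignment in a small context manager that restores the previous value afterwards. This is a minimal sketch: `events_path` is a hypothetical helper, and only the name of the environment variable is taken from the notebook.

```python
import os
from contextlib import contextmanager


@contextmanager
def events_path(path):
    """Temporarily point EVENTS_PATH at `path`."""
    previous = os.environ.get('EVENTS_PATH')
    os.environ['EVENTS_PATH'] = str(path)
    try:
        yield path
    finally:
        # restore the earlier state, including "not set at all"
        if previous is None:
            os.environ.pop('EVENTS_PATH', None)
        else:
            os.environ['EVENTS_PATH'] = previous


tests = [(3, 2, 1), (3, 1, 2), (2, 1, 3)]
for index, arguments in enumerate(tests):
    with events_path(os.path.join('event-files', str(index))):
        # a real run would call test_tmp(*arguments, expected=2) here
        print(arguments, '->', os.environ['EVENTS_PATH'])
```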

With this, we can execute the tests and analyze the result with the help of `analyze_config()` from SFLKit.

In [14]:
# def analyze():
#     run_tests()
#     return analyze_config(get_config())

Let's execute the tests and analyze the event logs for lines and the Tarantula metric.

In [15]:
#results = analyze()

The results look something like this:

In [16]:
#results

This structure maps analysis objects and metrics to a sorted list of suggestions for where the fault occurs.

Now, we can put all this together and produce a pretty output that shows us where the fault originates by leveraging `SFLKit`'s `ColorCode` object.

As you can see, the analysis indeed suggests the buggy line 7 as the most suspicious.
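
The ranking can be reproduced by hand in a minimal sketch that does not use SFLKit. It assumes the faulty variant described in the introduction (line 7 assigns `m = y`) and the three test cases from `run_tests()`, each with expected result 2. The coverage sets were derived manually for that variant, with the `def` line counted as line 1. The helpers `tarantula` and `rank` are hypothetical names.

```python
def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Tarantula suspiciousness of a line.

    failed_cov / passed_cov: number of failing / passing tests executing the line.
    """
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0
    return fail_ratio / (fail_ratio + pass_ratio)


# test input -> (executed lines, test passed?), derived by hand
coverage = {
    (3, 2, 1): ({2, 3, 9, 10, 13}, True),
    (3, 1, 2): ({2, 3, 4, 6, 13}, True),
    (2, 1, 3): ({2, 3, 4, 6, 7, 13}, False),  # returns 1 instead of 2
}


def rank(coverage):
    """Return (line, score) pairs, most suspicious first."""
    runs = list(coverage.values())
    total_passed = sum(1 for _, passed in runs if passed)
    total_failed = len(runs) - total_passed
    all_lines = set().union(*(executed for executed, _ in runs))
    scores = {}
    for line in all_lines:
        passed_cov = sum(1 for executed, passed in runs if passed and line in executed)
        failed_cov = sum(1 for executed, passed in runs if not passed and line in executed)
        scores[line] = tarantula(failed_cov, passed_cov, total_failed, total_passed)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank(coverage)
for line, score in ranking:
    print(f'line {line:2d}: {score:.2f}')
```

Line 7 is executed only by the failing test, so it receives the highest score, followed by lines 4 and 6, which are shared with one passing test.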

But what if lines are not enough to show the fault?

What if the metric we have chosen for evaluation is insufficient?

### Extension using Avicenna
We seek to use SFLKit's instrumentation of line events in program code to determine, with Avicenna, constraints that describe when a specific line is reached.

1. We instrument our test program, `middle`, and use the instrumented output.
2. We read the event files and check which lines were executed. The oracle answers YES or NO for the desired line.
3. If YES, we learn from that event file and start the constraint-learning loop with Avicenna.
4. If NO, we check the other event files for a line event that answers YES.


Line events are encoded as follows: `(event_type, file_name, line_num, event_id)`
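
A minimal sketch of how such records could be checked for a desired line. It assumes that an event log is plain text with one comma-separated record per line in the order given above, which is also what the oracle further below assumes. The sample records, including their `event_type` and `event_id` values, are made up for illustration, and `reached_lines` and `reaches` are hypothetical helpers.

```python
import csv
import io


def reached_lines(event_text):
    """Collect the line numbers of all well-formed records in an event log."""
    lines = set()
    for record in csv.reader(io.StringIO(event_text)):
        if len(record) < 4:
            continue  # skip blank or malformed records
        event_type, file_name, line_num, event_id = record[:4]
        if line_num.strip().isdigit():
            lines.add(int(line_num))
    return lines


def reaches(event_text, desired_line):
    """True if the log contains an event for `desired_line`."""
    return desired_line in reached_lines(event_text)


# made-up log for an execution that passes through line 7
sample_events = (
    "0,middle.py,2,0\n"
    "0,middle.py,3,1\n"
    "0,middle.py,4,2\n"
    "0,middle.py,6,4\n"
    "0,middle.py,7,5\n"
    "0,middle.py,13,10\n"
)

print(reaches(sample_events, 7))
print(reaches(sample_events, 12))
```

Taking the desired line as a parameter keeps the check reusable for any line, rather than tying it to one hard-coded line number.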


First, we need to generate the event files for a given program or function by running its instrumented version.

In [17]:
def run_test_and_instrument(middle_input, event_files_path, iterator):
    
    os.environ['EVENTS_PATH'] = os.path.join(event_files_path, str(iterator))
    
    expected_return = sorted(middle_input)[1]
    test_tmp(middle_input[0],
             middle_input[1],
             middle_input[2],
             expected=expected_return) # instrumenting middle

The oracle has to be able to determine whether a given line is executed when a test case is run. This means we need to run the SFLKit-instrumented program for each test case to decide whether the test case is valuable, i.e., whether it can be used to find constraints that lead to the desired line.

In [18]:
import string
import csv
import logging

from pathlib import Path
#from avicenna import Avicenna
from debugging_framework.oracle import OracleResult
from isla.solver import ISLaSolver


We can now go through our `event-files` folder and iterate over its files, each of which represents a test case. For now, the relevant test cases are marked with `OracleResult.FAILING`. We can use such a behavior-triggering input as the input for a run of Avicenna to find the constraints that lead to this behavior.

In [19]:
# Problems arose with mismatched input types: middle takes int inputs, but Avicenna passes its inputs as
# strings. Therefore, we convert the types in each function where required.
def oracle_run_test_and_instrument(middle_input, event_files_path, event_file_num):
    
    #if os.path.exists(event_files_path):
    #    shutil.rmtree(event_files_path) # ** can be removed, will collect trash at the end of run ** 
    #os.mkdir(event_files_path) # just needs to happen once before the run now 
    #os.environ['EVENTS_PATH'] = os.path.join(event_files_path, event_file_num)
    
    
    # convert the string-like input "x,y,z" into integers
    values = [int(part) for part in str(middle_input).split(',')[:3]]
    expected = sorted(values)[1]  # the middle value is the expected result

    test_tmp(values[0], values[1], values[2], expected=expected)  # run the instrumented middle

In [20]:
# TODO: also take the desired line as a parameter
def oracle_middle_sfl(test_case):
        
    # #is file path valid?
    # if event_file_path.exists() == False or event_file_path.is_dir() == True:
    #     # add error handling 
    #     return OracleResult.UNDEF
    
    # check if desired_line was passed as int and not as str
        # QUESTION should we only accept str to begin with? check for numbers only?
    #if desired_line.type() == int:
    #   desired_line = str(desired_line)
    path = Path('./event-files/')
    oracle_run_test_and_instrument(test_case, path, '0') # replace this with the event file instrumentation
    
    with open('./event-files/0', newline='') as event_file:
        
        # split each line at , to get columns separated
        for line in event_file.readlines():
            reached_line = line.split(',')
            #print(reached_line)
            if reached_line[-2] == '12':   
                return OracleResult.FAILING
            
        return OracleResult.PASSING 

In [21]:
#print(oracle_middle_sfl('2,3,1').value)

In [22]:
#print(oracle_middle_sfl('8,24,4'))

In [23]:
#print(oracle_middle_sfl('121,2,23'))

In [24]:
#print(oracle_middle_sfl('2,1,3'))

The oracle used in this run of Avicenna is the oracle that checks for line events. This requires us to run SFLKit on Avicenna's new hypotheses after every run.

- TODO : update Avicenna and fix up prototype for it 
- TODO : implement prototype functionality using SFLKit and Avicenna 0.2 

#### Testing Inputs for the Regular middle Bug

In [25]:
middle_grammar = {
    "<start>": ["<stmt>"],
    "<stmt>": ["<x>,<y>,<z>"],
    "<x>": ["<integer>"],
    "<y>": ["<integer>"],
    "<z>": ["<integer>"],
    "<integer>": ["<digit>", "<digit><integer>"],
    "<digit>": [str(num) for num in range(1, 10)]
}


middle_grammar_converted = {
    "<start>": ["<stmt>"],
    "<stmt>": ["str.to.int(<x>),str.to.int(<y>),str.to.int(<z>)"],
    "<x>": ["<integer>"],
    "<y>": ["<integer>"],
    "<z>": ["<integer>"],
    "<integer>": ["<digit>", "<digit><integer>"],
    "<digit>": [str(num) for num in range(1, 10)]
}
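
To get a feel for the inputs that `middle_grammar` describes, here is a minimal sketch of a random expander in plain Python. It is independent of fuzzingbook and ISLa, `expand` is a hypothetical helper, and the depth limit relies on the first alternative of every rule in this grammar being non-recursive.

```python
import random
import re

middle_grammar = {
    "<start>": ["<stmt>"],
    "<stmt>": ["<x>,<y>,<z>"],
    "<x>": ["<integer>"],
    "<y>": ["<integer>"],
    "<z>": ["<integer>"],
    "<integer>": ["<digit>", "<digit><integer>"],
    "<digit>": [str(num) for num in range(1, 10)],
}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def expand(grammar, symbol="<start>", rng=random, depth=0, max_depth=8):
    """Randomly expand `symbol` into a string of terminals."""
    alternatives = grammar[symbol]
    # Beyond max_depth, always take the first alternative to force termination.
    expansion = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    return NONTERMINAL.sub(
        lambda match: expand(grammar, match.group(0), rng, depth + 1, max_depth),
        expansion,
    )


rng = random.Random(0)  # fixed seed so that repeated runs agree
samples = [expand(middle_grammar, rng=rng) for _ in range(5)]
print(samples)
```

Every sample has the shape `x,y,z` with three integers that contain no digit `0`, because `<digit>` only ranges over 1 to 9.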

In [26]:
# 321, 312, 213
middle_inputs = ['2,3,1', '3,1,2', '2,1,3']

In [27]:
path = Path('./event-files/')
if os.path.exists(path):
    shutil.rmtree(path) 
os.mkdir(path) 
os.environ['EVENTS_PATH'] = os.path.join(path, '0')

In [47]:
avicenna = Avicenna(
    grammar=middle_grammar,
    initial_inputs=middle_inputs,
    oracle=oracle_middle_sfl,
    max_iterations=10,
    #max_excluded_features=4, 
)

{Input(('2,1,3', <OracleResult.PASSING: 'PASSING'>)), Input(('3,1,2', <OracleResult.PASSING: 'PASSING'>)), Input(('2,3,1', <OracleResult.PASSING: 'PASSING'>))}


AssertionError: Avicenna requires at least one failure-inducing input!

In [29]:
# import warnings
# # Suppress the specific SHAP warning
# warnings.filterwarnings(
#     "ignore",
#     message="LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray",
# )
# warnings.filterwarnings(
#     "ignore", 
#     message="No further splits with positive gain, best gain: -inf"
# )

In [30]:
logging.basicConfig(filename='avicenna.log', filemode='w', encoding='utf-8', level=logging.INFO, force=True)
# TODO: find out why only two constraints are used in the end
best_invariant = avicenna.explain() # unparse with islaunparse for further use


before finalize
0.6758409785932722 0.921875


In [45]:
print(best_invariant)
print(best_invariant[0])

(ConjunctiveFormula(ForallFormula(BoundVariable("elem_1", "<y>"), Constant("start", "<start>"), ExistsFormula(BoundVariable("elem_2", "<z>"), Constant("start", "<start>"), SMTFormula('(> (str.to_int elem_1) (str.to_int elem_2))', BoundVariable("elem_1", "<y>"), BoundVariable("elem_2", "<z>"), ))), ForallFormula(BoundVariable("elem", "<y>"), Constant("start", "<start>"), SMTFormula('(>= (str.to_int elem) (str.to_int "9"))', BoundVariable("elem", "<y>"), ))), 0.6758409785932722, 0.921875)
(∀ elem_1 ∈ start: (∃ elem_2 ∈ start: (StrToInt(elem_1) > StrToInt(elem_2))) ∧ ∀ elem ∈ start: (StrToInt(elem) >= StrToInt("9")))
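
For readers less familiar with ISLa syntax: according to the first output line, `elem_1` and `elem` range over `<y>` and `elem_2` ranges over `<z>`. Every input derived from `middle_grammar` contains exactly one `<y>` and one `<z>`, so the quantifiers collapse and the formula amounts to `y > z and y >= 9`. A minimal sketch of this reading in plain Python, where `satisfies_invariant` is a hypothetical helper and not part of Avicenna or ISLa:

```python
def satisfies_invariant(inp):
    """Check an input "x,y,z" against the learned constraint y > z and y >= 9."""
    x, y, z = (int(part) for part in str(inp).split(','))
    return y > z and y >= 9


for candidate in ['31,16,4', '2,3,1', '2,1,3']:
    print(candidate, satisfies_invariant(candidate))
```

Under this reading, the initial input `2,3,1` does not satisfy the constraint because its `y` is smaller than 9.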


In [32]:
solver = ISLaSolver(
    grammar=middle_grammar,
    formula=best_invariant[0],
    enable_optimized_z3_queries=False,
)

# FIXME: this does not work yet; the solver should produce inputs of the form 2,3,1
for _ in range(1,10):
    print(solver.solve())


31,16,4
96,16,4
254,16,4
7,16,4
8,16,4
1,16,4
5,16,4
6,16,4
34,16,4


In [33]:
# Convert a string input such as "2,1,3" into a list of integers for middle.

def call_func_middle(inp: str):
    middle_input = str(inp).split(',')
    return [int(middle_input[0]), int(middle_input[1]), int(middle_input[2])]

In [34]:
test_path = Path('rsc/')
print(test_path.exists())


True


In [35]:
from avicenna.avix import *
from avicenna.oracle_construction import * 

In [36]:
def middle_inp_conv(inp):
    middle_input = str(inp).split(',')
    
    converted_inp = [
        int(middle_input[0]),
        int(middle_input[1]),
        int(middle_input[2])
    ]
    
    return converted_inp

In [37]:
import tmp
importlib.reload(tmp)
tmp.sflkitlib.lib.reset()
avix_oracle = construct_oracle(program_under_test='middle',
                               inp_converter=middle_inp_conv,
                               timeout=10,
                               line=7,
                               resource_path='rsc/',
                               program_oracle=None)


In [38]:
avix_oracle('2,1,3')

<OracleResult.FAILING: 'FAILING'>

In [39]:
# # double check if this works

# AviX.create_event_file(instrumented_function='middle',
#                        #instr_path='tmp',
#                        inp = '2,1,3',
#                        conversion_func=middle_inp_conv,
#                        event_path='rsc/event_file'
#                        )

In [40]:
avix = AviX(grammar=middle_grammar,
            initial_inputs=middle_inputs,
            oracle=avix_oracle,
            max_iterations=10,
            desired_line=7,
            put_path='middle.py',
            #instr_path='instrumented.py',
            min_precision=0.7)

In [41]:
logging.basicConfig(filename='avicenna.log', filemode='w', encoding='utf-8', level=logging.INFO, force=True)
invariants = avix.avixplain()

before finalize
0.6680672268907564 1.0


In [42]:
solver = ISLaSolver(
    grammar=middle_grammar,
    formula=invariants[0],
    enable_optimized_z3_queries=False,
)
#solver.solve()
for invariant in invariants:
    print(invariant)

(∀ elem_1 ∈ start: (∃ elem_2 ∈ start: (StrToInt(elem_1) > StrToInt(elem_2))) ∧ ∀ elem ∈ start: (StrToInt(elem) <= StrToInt("73")))
0.6680672268907564
1.0


In [43]:
from fuzzingbook.GrammarFuzzer import GrammarFuzzer, DerivationTree

fuzzer = GrammarFuzzer(middle_grammar)

for i in range(10):
    print(fuzzer.fuzz())

634,9,8142
6,4,69
4184,99,7
3,7462,973
513,2,168
7,56,6
7,5615,286638
5,23,57
854,6,9
7,3,61


In [44]:
fuzzer_solver = ISLaSolver(grammar=middle_grammar,
                           formula=invariants[0])

for _ in range(1,10):
    print(fuzzer_solver.solve())

17,1,2
43,1,2
9,1,2
8,1,2
65,1,2
211,1,2
56,1,2
74,1,2
6,1,2


https://github.com/uds-se/sflkit

<img src="qrcode.png" style="width:500px">