# Concolic fuzzing

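In the chapter on information flow, we saw how dynamic taint tracking can guide fuzzing by indicating which parts of the input reach interesting locations. However, dynamic taint tracking can propagate only limited information. For example, it cannot tell us what would happen if some property of the input were changed.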

In [None]:
from fuzzingbook.fuzzingbook_utils.Coverage import Coverage
from fuzzingbook.fuzzingbook_utils.Fuzzer import Fuzzer
from fuzzingbook.fuzzingbook_utils.GrammarFuzzer import GrammarFuzzer
import inspect
import z3
import re
import subprocess
import tempfile
import random
import string

In [None]:
class ArcCoverage(Coverage):
    def traceit(self, frame, event, args):
        if event != 'return':
            f = inspect.getframeinfo(frame)
            self._trace.append((f.function, f.lineno))
        return self.traceit

    def arcs(self):
        t = [i for f, i in self._trace]
        return list(zip(t, t[1:]))

In [None]:
def factorial(n):
    if n < 0:
        return None
    if n == 0:
        return 1
    if n == 1:
        return 1
    v = 1
    while n != 0:
        v = v * n
        n = n - 1
    return v

In [None]:
with ArcCoverage() as cov:
    factorial(5)

In [None]:
cov.arcs()

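As the arcs show, some paths through `factorial()` were not executed: the early returns for `n < 0`, `n == 0`, and `n == 1` were never taken.

One way to reach them is to represent the input with symbolic variables, encode the branch conditions as constraints, and ask an SMT solver to solve the negation of one of those constraints.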

## SMT Solvers

[A survey of the development and latest applications of SMT solving techniques (in Chinese)](http://crad.ict.ac.cn/CN/abstract/abstract3470.shtml)

Official resources: [Z3 on GitHub](https://github.com/Z3Prover/z3) | [Z3 documentation](https://github.com/Z3Prover/z3/wiki/Documentation)

See also: [Z3Py Guide](https://ericpony.github.io/z3py-tutorial/guide-examples.htm) | [Programming Z3](https://theory.stanford.edu/~nikolaj/programmingz3.html)

Install the Python bindings with `pip install z3-solver`.

In [None]:
x = z3.Int('x')
y = z3.Int('y')

s = z3.Solver()
print(s)

s.add(x > 10, y == x + 2)
print(s)
print("Solving constraints in the solver s ...")
if s.check() == z3.sat:
    print(s.model())
else:
    print("un-solved")

print("Create a new scope...")
s.push()
s.add(y < 11)
print(s)
print("Solving updated set of constraints...")
if s.check() == z3.sat:
    print(s.model())
else:
    print("un-solved")

print("Restoring state...")
s.pop()
print(s)
print("Solving restored set of constraints...")
if s.check() == z3.sat:
    print(s.model())
else:
    print("un-solved")

In [None]:
# print(z3.get_version())
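# set_option is an alias of set_param, kept for backward compatibility.
# set_param sets Z3 global (or module) parameters.
# I could not find a complete list of the available global parameters.
# The two calls below select the z3str3 string solver and set a 30-second timeout.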
z3.set_option('smt.string_solver', 'z3str3')
z3.set_option('timeout', 30 * 1000)  # milliseconds

In [None]:
# Encoding the constraints requires a symbolic variable that corresponds to the input n.
zn = z3.Int('n')
predicates = [z3.Not(zn < 0), z3.Not(zn == 0)]
print(predicates)
z3.solve(predicates)

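Coverage (`cov`) tells us which paths were not executed. Handing the SMT solver the negated constraint yields an input that takes one of those paths, and repeating this step covers the paths one by one. It would be convenient to automate the whole process.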
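The iteration described above can be sketched without a solver. In this minimal sketch, `BRANCHES`, `trace`, `solve`, and `explore` are hypothetical names, and `solve` is a brute-force search over a small range that merely stands in for the SMT solver; it is an illustration of the loop, not the approach developed in the rest of this chapter.

```python
# Early-exit conditions of factorial(), in evaluation order (illustrative only).
BRANCHES = [
    lambda n: n < 0,
    lambda n: n == 0,
    lambda n: n == 1,
]

def trace(n):
    """Return the outcomes of the branch conditions that input n evaluates."""
    outcomes = []
    for pred in BRANCHES:
        taken = pred(n)
        outcomes.append(taken)
        if taken:  # factorial() returns as soon as one condition holds
            break
    return outcomes

def solve(wanted, candidates=range(-5, 6)):
    """Toy stand-in for the SMT solver: find an input whose trace starts with `wanted`."""
    for n in candidates:
        if trace(n)[:len(wanted)] == wanted:
            return n
    return None

def explore(seed):
    """Flip one outcome at a time to discover inputs for paths not yet seen."""
    seen, worklist, inputs = set(), [seed], []
    while worklist:
        n = worklist.pop()
        path = tuple(trace(n))
        if path in seen:
            continue
        seen.add(path)
        inputs.append(n)
        for i in range(len(path)):
            # Keep the outcomes before position i and negate the one at i
            wanted = list(path[:i]) + [not path[i]]
            candidate = solve(wanted)
            if candidate is not None:
                worklist.append(candidate)
    return inputs

inputs = explore(5)
print(sorted(inputs))
```

Starting from the single seed `5`, the loop finds one input for each of the four distinct outcome sequences. A real solver replaces the bounded search in `solve`, which is what makes the approach scale beyond toy ranges.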

## A Concolic Tracer

In [None]:
class ConcolicTracer:
    def __init__(self,context=None):
        # context holds the declarations of the symbolic variables seen so far, plus the preconditions, if any.
        self.context = context if context is not None else ({},[])
        self.decls,self.path = self.context
    
    def __enter__(self):
        return self
    
    def __exit__(self,exc_type,exc_value,tb):
        return
    
    def __getitem__(self,fn):
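        # Indexing the tracer with a function (tracer[fn]) lands here:
        # Python calls __getitem__ to evaluate self[key].
        # We use introspection (inspect.signature) to determine the function's parameters.
        # Guide to Python introspection (reflection), in Chinese: https://www.cnblogs.com/huxi/archive/2011/01/02/1924317.html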
        self.fn = fn
        self.fn_args = {i:None for i in inspect.signature(fn).parameters }
        return self
    
    def __call__(self,*args):
        # Called when the instance is invoked like a function.
        self.result = self.fn(*self.concolic(args))
        return self.result
    
    def concolic(self,args):
        # This will be modified later to generate symbolic variables.
        return args

In [None]:
with ConcolicTracer() as _:
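    # _ is the ConcolicTracer instance.
    # _[factorial] calls __getitem__, which sets self.fn = factorial.
    # The trailing (1) calls __call__.
    # The argument 1 is passed through concolic() and then on to fn.
    # The return value is stored in _.result and returned.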
    result = _[factorial](1)
print(result)
_.context

## Concolic Proxy Objects

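[Using staticmethod](https://www.pynote.net/archives/1388): covers staticmethod, decorators, and closures (in Chinese). A good read.

[Pass by value and pass by reference in Python](https://www.cnblogs.com/CheeseZH/p/5165283.html): Python always passes references to objects. Immutable objects such as numbers, strings, and tuples cannot be changed by the callee, so they behave as if passed by value. Mutable objects such as `dict` and `list` can be modified in place, so the caller sees the change.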
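The distinction matters for the proxies below, which all share one context. A minimal sketch, where `register` and `rebind` are hypothetical helpers and the context is a `(declarations, path)` tuple like the one `ConcolicTracer` uses:

```python
def register(context, name, sort):
    """Record a declaration in the shared context by mutating it in place."""
    decls, path = context   # both names refer to the caller's dict and list
    decls[name] = sort      # in-place change, visible to every holder of context

def rebind(context):
    """Rebinding the parameter only changes the local name."""
    context = ({}, [])      # the caller's object is untouched
    return context

context = ({}, [])
register(context, 'n', 'Int')
fresh = rebind(context)
print(context)
```

The tuple itself is immutable, but the dict and list inside it are not, which is why every proxy that holds the same `context` sees declarations and path predicates added by the others.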

In [None]:
def zproxy_create(cls,sname,z3var,context,zn,v=None):
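    # Given a proxy class, create an instance of it together with the matching
    # symbolic variable, and register that variable in the context.
    zv = cls(context,z3var(zn),v) # z3var(zn) creates the symbolic variable; v is its concrete value
    context[0][zn] = sname # register in the declarations dict, context[0], which is shared by reference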
    return zv

### A Proxy Class for Booleans

In [None]:
class zbool:
    # The classmethod below prepares the arguments before __init__ runs, which keeps construction tidy.
    @classmethod
    def create(cls,context,zn,v):
        # Arguments besides cls: the context, the name of the symbolic variable (a string), and its concrete value
        return zproxy_create(cls,'Bool',z3.Bool,context,zn,v)
    
    def __init__(self,context,z,v=None):
        # Note: z is not checked to be of Boolean sort
        self.context, self.z, self.v = context,  z, v
        self.decl, self.path = self.context
    
    def __not__(self):
        # Python has no __not__ special method (see https://docs.python.org/zh-cn/3/reference/datamodel.html),
        # so `not zbool_obj` does not call this.
        # It is an ordinary method that must be called explicitly.
        return zbool(self.context,z3.Not(self.z),not self.v)
    
    def __bool__(self):
        # Called for truth-value testing and by the built-in bool(),
        # so the object can be used directly as a condition.
        # The predicate that was taken is recorded in the path here.
        r,pred = (True,self.z) if self.v else (False,z3.Not(self.z))
        self.path.append(pred)
        return r

In [None]:
with ConcolicTracer() as _:
    # za, zb = z3.Ints('a b')

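    # Call the classmethod zbool.create.
    # z3.Bool: Return a Boolean constant named `name`. If `ctx=None`, then the global context is used.
    # z3.Bool creates a Boolean constant named my_bool_arg; its concrete value is True.
    # The symbolic variable my_bool_arg is registered in the context (a dict) with sort Bool.
    # The returned object has five attributes; see __init__.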
    val = zbool.create(_.context,'my_bool_arg',True)
    val_2 = val.__not__()

    if val:
        print("success")

print(val.z, val.v)
print(type(val.z), type(val.v))
print(_.context)
print(val.context)
assert id(_.context) == id(val.context)
print(val_2.z, val_2.v)

### A Proxy Class for Integers

In [None]:
class zint(int):
    def __new__(cls,context,zn,v,*args,**kw):
        return int.__new__(cls,v,*args,**kw)
    
    @classmethod
    def create(cls,context,zn,v=None):
        # z3.Int:Return an integer constant named `name`. If `ctx=None`, then the global context is used.
        return zproxy_create(cls,'Int',z3.Int,context,zn,v)
    
    def __init__(self,context,z,v=None):
        self.z, self.v, self.context = z, v, context
    
    def __int__(self):
        return self.v
    
    # def __pos__(self):
    #     # Implements the unary + operator.
    #     # Not needed here: the method is installed later through INT_UNARY_OPS.
    #     return self.v
    
    def _zv(self,o):
        return (o.z, o.v) if isinstance(o, zint) else (z3.IntVal(o), o) 
    
    def __ne__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, self.z != z, self.v != v)

    def __eq__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, self.z == z, self.v == v)
    
    def __req__(self, other):
        return self.__eq__(other)


    def __lt__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, self.z < z, self.v < v)

    def __gt__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, self.z > z, self.v > v)

    def __le__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, z3.Or(self.z < z, self.z == z),
                     self.v < v or self.v == v)

    def __ge__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, z3.Or(self.z > z, self.z == z),
                     self.v > v or self.v == v)
    
    def __bool__(self):
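        # Truthiness is not read from self.v directly; it goes through the comparison below.

        # A condition needs a primitive bool, so we cannot simply return a zbool
        # and rely on that object's own __bool__:
        # return zbool(self.context, self.z, self.v) <-- not allowed

        # force registering boolean condition
        # self != 0 calls __ne__ above, which returns a zbool;
        # the `if` then calls that zbool's __bool__, which records the predicate.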
        if self != 0:
            return True
        return False

In [None]:
######## Binary operations on integers ################
INT_BINARY_OPS = [
    '__add__',
    '__sub__',
    '__mul__',
    '__truediv__',
    # '__div__',
    '__mod__',
    # '__divmod__',
    '__pow__',
    # '__lshift__',
    # '__rshift__',
    # '__and__',
    # '__xor__',
    # '__or__',
    '__radd__',
    '__rsub__',
    '__rmul__',
    '__rtruediv__',
    # '__rdiv__',
    '__rmod__',
    # '__rdivmod__',
    '__rpow__',
    # '__rlshift__',
    # '__rrshift__',
    # '__rand__',
    # '__rxor__',
    # '__ror__',
]

def make_int_binary_wrapper(fname, fun, zfun):
    def proxy(self, other):
        z, v = self._zv(other)
        z_ = zfun(self.z, z)
        v_ = fun(self.v, v)
        if isinstance(v_, float):
            # we do not implement float results yet.
            assert round(v_) == v_
            v_ = round(v_)
        return zint(self.context, z_, v_)

    return proxy

def initialize():
    for fn in INITIALIZER_LIST:
        fn()


def init_concolic_1():
    for fname in INT_BINARY_OPS:
        fun = getattr(int, fname)
        # ArithRef Class Reference:https://z3prover.github.io/api/html/classz3py_1_1_arith_ref.html
        zfun = getattr(z3.ArithRef, fname)
        setattr(zint, fname, make_int_binary_wrapper(fname, fun, zfun))

###### Unary operations on integers ###########
INT_UNARY_OPS = [
    '__neg__',
    '__pos__',
    # '__abs__',
    # '__invert__',
    # '__round__',
    # '__ceil__',
    # '__floor__',
    # '__trunc__',
]

def make_int_unary_wrapper(fname, fun, zfun):
    def proxy(self):
        return zint(self.context, zfun(self.z), fun(self.v))

    return proxy

def init_concolic_2():
    for fname in INT_UNARY_OPS:
        fun = getattr(int, fname)
        zfun = getattr(z3.ArithRef, fname)
        setattr(zint, fname, make_int_unary_wrapper(fname, fun, zfun))

# For now only init_concolic_1 and init_concolic_2 need to run. Once there are more initializers, initialize() runs them all in a loop.
INITIALIZER_LIST = []
INITIALIZER_LIST.append(init_concolic_1)
INITIALIZER_LIST.append(init_concolic_2)
init_concolic_1()
init_concolic_2()

In [None]:
with ConcolicTracer() as _:
    val = zint.create(_.context, 'int_arg', 0)
    print(val.z, val.v)

    print(val._zv(0))
    print(val._zv(val))

    ia = zint.create(_.context, 'int_a', 0)
    ib = zint.create(_.context, 'int_b', 0)
    v1 = ia == ib
    v2 = ia != ib
    v3 = 0 != ib
    print(v1.z, v2.z, v3.z)


    ia = zint.create(_.context, 'int_a', 0)
    ib = zint.create(_.context, 'int_b', 1)
    v1 = ia > ib
    v2 = ia < ib
    print(v1.z, v2.z)
    v3 = ia >= ib
    v4 = ia <= ib
    print(v3.z, v4.z)


    ia = zint.create(_.context, 'int_a', 0)
    ib = zint.create(_.context, 'int_b', 1)
    print((ia + ib).z)
    print((ia + 10).z)
    print((11 + ib).z)
    print((ia - ib).z)
    print((ia * ib).z)
    print((ia / ib).z)
    print((ia ** ib).z)


    ia = zint.create(_.context, 'int_a', 0)
    print((-ia).z)
    print((+ia).z)

    za = zint.create(_.context, 'int_a', 1)
    zb = zint.create(_.context, 'int_b', 0)
    # zint is not given an __and__ method.
    # `za and zb` works through __bool__: za is truthy and zb is falsy,
    # so the condition is True and False, and nothing is printed.
    if za and zb:
        print('1')

_.context

### Translating to the SMT Expression Format

[SMT-LIB official site](http://smtlib.cs.uiowa.edu/language.shtml) | [The SMT-LIBv2 Language and Tools: A Tutorial](http://smtlib.github.io/jSMTLIB/SMTLIBTutorial.pdf) | [Chinese translation of the tutorial](https://tongtianta.site/paper/69887)

In [None]:
def triangle(a, b, c):
    assert a > 0
    assert b > 0
    assert c > 0
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles'
    else:
        if b == c:
            return 'isosceles'
        else:
            if a == c:
                return 'isosceles'
            else:
                return 'scalene'

In [None]:
class ConcolicTracer(ConcolicTracer):
    def smt_expr(self, show_decl=False, simplify=False, path=[]):
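        # Render the variables declared in self.decls as s-expressions.
        # The constraints in path are conjoined with z3.And(), optionally
        # simplified with z3.simplify(), and rendered with sexpr().
        # Note that sexpr() emits SMT-LIB syntax (for example `let` and `distinct`)
        # rather than the Python-style string of the expression.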
        r = []
        if show_decl:
            for decl in self.decls:
                v = self.decls[decl]
                v = '(_ BitVec 8)' if v == 'BitVec' else v
                r.append("(declare-const %s %s)" % (decl, v))
        path = path if path else self.path
        if path:
            path = z3.And(path)
            if show_decl:
                if simplify:
                    return '\n'.join([
                        *r,
                        "(assert %s)" % z3.simplify(path).sexpr()
                    ])
                else:
                    return '\n'.join(
                        [*r, "(assert %s)" % path.sexpr()])
                        # [*r, "(assert %s)" % path])
            else:
                return z3.simplify(path).sexpr()
        else:
            return ''

In [None]:
with ConcolicTracer() as _:
    za = zint.create(_.context, 'int_a', 1)
    zb = zint.create(_.context, 'int_b', 1)
    zc = zint.create(_.context, 'int_c', 1)
    triangle(za, zb, zc)
print(_.context)
print(_.smt_expr(show_decl=True))
z3.solve(_.path)

## Generating Fresh Names

In [None]:
COUNTER = 0
def fresh_name():
    global COUNTER
    COUNTER += 1
    return COUNTER

def reset_counter():
    global COUNTER
    COUNTER = 0

class ConcolicTracer(ConcolicTracer):
    def __enter__(self):
        reset_counter()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        return

### Translating Arguments to Concolic Proxies

In [None]:
class ConcolicTracer(ConcolicTracer):
    def concolic(self, args):
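        # Infer each argument's type and wrap it in the matching proxy class (zint for int).
        # Symbolic name: function name + parameter name + type name + a globally increasing number.
        # fn_args maps each parameter name to the symbolic name created for it.
        # Returns the wrapped arguments, each carrying a symbol and a concrete value.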
        my_args = []
        for name, arg in zip(self.fn_args, args):
            t = type(arg).__name__
            zwrap = globals()['z' + t]
            vname = "%s_%s_%s_%s" % (self.fn.__name__, name, t, fresh_name())
            my_args.append(zwrap.create(self.context, vname, arg))
            self.fn_args[name] = vname
        return my_args

In [None]:
with ConcolicTracer() as _:
    result = _[factorial](5)
    print(type(result))
    print(result)  # zint subclasses int and inherits its string conversion, so print() works
print(_.context)
print(_.smt_expr(show_decl=True))  # the let and distinct forms in the output come from sexpr()

### Evaluating the Concolic Expressions

In [None]:
class ConcolicTracer(ConcolicTracer):
    def zeval(self, python=False, log=False):
        # If python is true, solve with the Z3 Python API.
        # Otherwise, write the SMT expression to a file and solve it with the z3 command-line tool.
        r, sol = (zeval_py if python else zeval_smt)(self.path, self, log)
        if r == 'sat':
            return r, {k: sol.get(self.fn_args[k], None) for k in self.fn_args}
        else:
            return r, None

def zeval_py(path, cc, log):
    # Solver() creates a general-purpose solver.
    # Constraints are added with add(); we say they have been asserted in the solver.
    # check() solves the asserted constraints.
    # The result is sat (satisfiable) if a solution was found.
    # That solution is a model: an interpretation that makes each asserted constraint true.
    for decl in cc.decls:
        if cc.decls[decl] == 'BitVec':
            v = "z3.%s('%s', 8)" % (cc.decls[decl], decl)
        else:
            v = "z3.%s('%s')" % (cc.decls[decl], decl)
        exec(v)
    s = z3.Solver()
    s.add(z3.And(path))
    # Call check() once and reuse the result instead of solving again for each test
    res = s.check()
    if res == z3.unsat:
        return 'No Solutions', {}
    elif res == z3.unknown:
        return 'Gave up', None
    assert res == z3.sat
    m = s.model()
    return 'sat', {d.name(): m[d] for d in m.decls()}


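# This pattern tokenizes an s-expression. (?P<name>...) defines a named group,
# so each match reports which alternative (bra, ket, token, or string) matched.
# The (?mx) prefix enables multi-line and verbose mode.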
SEXPR_TOKEN = r'''(?mx)
    \s*(?:
        (?P<bra>\()|
        (?P<ket>\))|
        (?P<token>[^"()\s]+)|
        (?P<string>"[^"]*")
       )'''

def parse_sexp(sexp):
    # Parse the solver output into nested lists.
    # The format of sexp is described at: https://blog.csdn.net/weixin_39408343/article/details/102680614#t1
    stack, res = [], []
    for elements in re.finditer(SEXPR_TOKEN, sexp):
        kind, value = [(t, v) for t, v in elements.groupdict().items() if v][0]
        if kind == 'bra':
            stack.append(res)
            res = []
        elif kind == 'ket':
            last, res = res, stack.pop(-1)
            res.append(last)
        elif kind == 'token':
            res.append(value)
        elif kind == 'string':
            res.append(value[1:-1])
        else:
            assert False
    return res


def zeval_smt(path, cc, log):
    # Obtain the s-expression via smt_expr(), write it to a file,
    # and append (check-sat) and (get-model).
    s = cc.smt_expr(True, True, path)

    with tempfile.NamedTemporaryFile(mode='w', suffix='.smt') as f:
        f.write(s)
        f.write("\n(check-sat)")
        f.write("\n(get-model)")
        f.flush()

        if log:
            print(s, '(check-sat)', '(get-model)', sep='\n')
        # subprocess.getoutput(cmd) returns the output (stdout and stderr) of running cmd in a shell
        output = subprocess.getoutput("z3 " + f.name)

    if log:
        print(output)
    o = parse_sexp(output)

    kind = o[0]
    if kind == 'unknown':
        return 'Gave up', None
    elif kind == 'unsat':
        return 'No Solutions', {}
    assert kind == 'sat'
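    # For reasons I have not tracked down, the z3 output here has no leading
    # `model` token (the result itself is correct), so the original two lines
    # below are adapted slightly.
    # Each element of o[1] is one definition: index 1 (counting from zero) is the
    # symbol, the last element is the value, and the second to last is the sort.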
    # assert o[1][0] == 'model'
    # return 'sat', {i[1]: (i[-1], i[-2]) for i in o[1][1:]}
    return 'sat', {i[1]: (i[-1], i[-2]) for i in o[1][:]}

In [None]:
with ConcolicTracer() as _:
    _[factorial](5)
print(_.zeval(python=True))
print(_.zeval())

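To summarize so far:
* Each argument is converted to a proxy type that bundles three things: the argument's type, its symbol, and its concrete value.
* In branches and in straight-line code, operations on these proxies go through the overridden Boolean and integer operators (the only two kinds implemented so far).
* Each Boolean test records the condition that was taken. Together, these conditions form the path constraint.
* The Python API is then asked whether the path constraint has a solution. To follow a different path, negate the condition at the chosen position and solve again. The solutions are the new inputs.
* There are a great many execution paths, and even a small change to the input can change path coverage substantially. Solving the constraints finds the input that corresponds to a given path.
* The Z3 Python API alone may not always be enough. SMT-LIB is a common format, so translating the problem into s-expressions allows any SMT solver to be used.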
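The first three points can be illustrated without z3. In the sketch below, `TracedInt` is a hypothetical toy proxy that keeps expressions as plain strings instead of z3 terms, and it records a predicate at comparison time rather than in a separate Boolean proxy's `__bool__`. It is a simplification for illustration, not the `zint` class above.

```python
import operator

class TracedInt:
    """Toy proxy: a concrete int plus the expression (a string) that produced it."""

    def __init__(self, path, expr, value):
        self.path, self.expr, self.value = path, expr, value

    def _unwrap(self, other):
        if isinstance(other, TracedInt):
            return other.expr, other.value
        return str(other), other

    def _arith(self, op, fun, other, swapped=False):
        expr, value = self._unwrap(other)
        if swapped:  # reflected operator: `other` is the left operand
            return TracedInt(self.path, f"({expr} {op} {self.expr})", fun(value, self.value))
        return TracedInt(self.path, f"({self.expr} {op} {expr})", fun(self.value, value))

    def __sub__(self, other):
        return self._arith("-", operator.sub, other)

    def __mul__(self, other):
        return self._arith("*", operator.mul, other)

    def __rmul__(self, other):
        return self._arith("*", operator.mul, other, swapped=True)

    def _compare(self, op, fun, other):
        expr, value = self._unwrap(other)
        outcome = fun(self.value, value)
        pred = f"{self.expr} {op} {expr}"
        # Record the branch that was actually taken
        self.path.append(pred if outcome else f"not ({pred})")
        return outcome

    def __lt__(self, other):
        return self._compare("<", operator.lt, other)

    def __eq__(self, other):
        return self._compare("==", operator.eq, other)

    def __ne__(self, other):
        return self._compare("!=", operator.ne, other)


def factorial(n):
    if n < 0:
        return None
    if n == 0:
        return 1
    if n == 1:
        return 1
    v = 1
    while n != 0:
        v = v * n
        n = n - 1
    return v


path = []
result = factorial(TracedInt(path, "n", 2))
print(result.value)
for pred in path:
    print(pred)
```

Running the unmodified `factorial()` on the proxy yields both the concrete result and the list of conditions taken along the way. Negating any one of them, while keeping the ones before it, describes a different path.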

### A Proxy Class for Strings

This proxy is fairly rough; a general understanding of it is enough.

In [None]:
class zstr(str):
    def __new__(cls, context, zn, v):
        return str.__new__(cls, v)

    @classmethod
    def create(cls, context, zn, v=None):
        return zproxy_create(cls, 'String', z3.String, context, zn, v)

    def __init__(self, context, z, v=None):
        self.context, self.z, self.v = context, z, v
        self._len = zint(context, z3.Length(z), len(v))
        #self.context[1].append(z3.Length(z) == z3.IntVal(len(v)))
    
    def _zv(self, o):
        return (o.z, o.v) if isinstance(o, zstr) else (z3.StringVal(o), o)

In [None]:
def zord(context, c):
    bn = "bitvec_%d" % fresh_name()
    v = z3.BitVec(bn, 8)
    context[0][bn] = 'BitVec'
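    # z3.Unit creates a singleton (length-one) sequence. Here it turns the 8-bit
    # value v into a one-character string, so the constraint below says that c
    # consists of exactly the character v. This relies on the solver modelling
    # characters as 8-bit values.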
    z = (z3.Unit(v) == c)
    context[1].append(z)
    return v

def zchr(context, i):
    sn = 'string_%d' % fresh_name()
    s = z3.String(sn)
    context[0][sn] = 'String'
    z = z3.And([s == z3.Unit(i), z3.Length(s) == 1])
    context[1].append(z)
    return s

In [None]:
with ConcolicTracer() as _:
    zc = z3.String('arg_%d' % fresh_name())
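    zi = zord(_.context, zc) # zi is an 8-bit BitVec
print(_.context)
z3.solve(_.path + [zi == 65]) # which character has the code 65?
# The solver reports bitvec_2 = 65 and arg_1 = "A":
# the constraint zi == 65 forces the one-character string arg_1 to be "A".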

In [None]:
with ConcolicTracer() as _:
    i = z3.BitVec('bv_%d' % fresh_name(), 8)
    zc = zchr(_.context, i)
print(_.context)
z3.solve(_.path + [zc == z3.StringVal('a')])

In [None]:
class zstr(zstr):
    def __eq__(self, other):
        z, v = self._zv(other)
        return zbool(self.context, self.z == z, self.v == v)

    def __req__(self, other):
        return self.__eq__(other)

    def __add__(self, other):
        z, v = self._zv(other)
        return zstr(self.context, self.z + z, self.v + v)

    def __radd__(self, other):
        return self.__add__(other)
    
    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start, stop, step = idx.indices(len(self.v))
            assert step == 1  # for now
            assert stop >= start  # for now
            rz = z3.SubString(self.z, start, stop - start)
            rv = self.v[idx]
        elif isinstance(idx, int):
            rz = z3.SubString(self.z, idx, 1)
            rv = self.v[idx]
        else:
            assert False  # for now
        return zstr(self.context, rz, rv)

    def __iter__(self):
        # zstr_iterator is nested in this class, so it must be looked up through self
        return self.zstr_iterator(self.context, self)
    
    class zstr_iterator():
        def __init__(self, context, zstr):
            self.context = context
            self._zstr = zstr
            self._str_idx = 0
            self._str_max = zstr._len  # intz is not an _int_

        def __next__(self):
            if self._str_idx == self._str_max:  # intz#eq
                raise StopIteration
            c = self._zstr[self._str_idx]
            self._str_idx += 1
            return c

        def __len__(self):
            # The iterator stores the length as _str_max; it has no _len attribute
            return self._str_max
        
    
    def upper(self):
        empty = ''
        ne = 'empty_%d' % fresh_name()
        result = zstr.create(self.context, ne, empty)
        self.context[1].append(z3.StringVal(empty) == result.z)
        cdiff = (ord('a') - ord('A'))
        for i in self:
            oz = zord(self.context, i.z)
            uz = zchr(self.context, oz - cdiff)
            rz = z3.And([oz >= ord('a'), oz <= ord('z')])
            ov = ord(i.v)
            uv = chr(ov - cdiff)
            rv = ov >= ord('a') and ov <= ord('z')
            if zbool(self.context, rz, rv):
                i = zstr(self.context, uz, uv)
            else:
                i = zstr(self.context, i.z, i.v)
            result += i
        return result


    def lower(self):
        empty = ''
        ne = 'empty_%d' % fresh_name()
        result = zstr.create(self.context, ne, empty)
        self.context[1].append(z3.StringVal(empty) == result.z)
        cdiff = (ord('a') - ord('A'))
        for i in self:
            oz = zord(self.context, i.z)
            uz = zchr(self.context, oz + cdiff)
            rz = z3.And([oz >= ord('A'), oz <= ord('Z')])
            ov = ord(i.v)
            uv = chr(ov + cdiff)
            rv = ov >= ord('A') and ov <= ord('Z')
            if zbool(self.context, rz, rv):
                i = zstr(self.context, uz, uv)
            else:
                i = zstr(self.context, i.z, i.v)
            result += i
        return result
    

    def startswith(self, other, beg=0, end=None):
        assert end is None  # for now
        assert isinstance(beg, int)  # for now
        zb = z3.IntVal(beg)

        others = other if isinstance(other, tuple) else (other, )

        last = False
        for o in others:
            z, v = self._zv(o)
            r = z3.IndexOf(self.z, z, zb)
            last = zbool(self.context, r == zb, self.v.startswith(v))
            if last:
                return last
        return last
    

    def find(self, other, beg=0, end=None):
        assert end is None  # for now
        assert isinstance(beg, int)  # for now
        zb = z3.IntVal(beg)
        z, v = self._zv(other)
        zi = z3.IndexOf(self.z, z, zb)
        vi = self.v.find(v, beg, end)
        return zint(self.context, zi, vi)
    

    def rstrip(self, chars=None):
        if chars is None:
            chars = string.whitespace
        if self._len == 0:
            return self
        else:
            last_idx = self._len - 1
            cz = z3.SubString(self.z, last_idx.z, 1)
            cv = self.v[-1]
            zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])
            vcheck_space = any(cv == char for char in chars)
            if zbool(self.context, zcheck_space, vcheck_space):
                return zstr(self.context, z3.SubString(self.z, 0, last_idx.z),
                            self.v[0:-1]).rstrip(chars)
            else:
                return self
    
    def lstrip(self, chars=None):
        if chars is None:
            chars = string.whitespace
        if self._len == 0:
            return self
        else:
            first_idx = 0
            cz = z3.SubString(self.z, 0, 1)
            cv = self.v[0]
            zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])
            vcheck_space = any(cv == char for char in chars)
            if zbool(self.context, zcheck_space, vcheck_space):
                return zstr(self.context, z3.SubString(
                    self.z, 1, self._len.z), self.v[1:]).lstrip(chars)
            else:
                return self
    
    def strip(self, chars=None):
        return self.lstrip(chars).rstrip(chars)
    

    def split(self, sep=None, maxsplit=-1):
        assert sep is not None  # default space based split is complicated
        assert maxsplit == -1  # for now.
        zsep = z3.StringVal(sep)
        zl = z3.Length(zsep)
        # zi would be the length of prefix
        zi = z3.IndexOf(self.z, zsep, z3.IntVal(0))
        # Z3Bug: There is a bug in the `z3.IndexOf` method which returns
        # `z3.SeqRef` instead of `z3.ArithRef`. So we need to fix it.
        zi = z3.ArithRef(zi.ast, zi.ctx)

        vi = self.v.find(sep)
        if zbool(self.context, zi >= z3.IntVal(0), vi >= 0):
            zprefix = z3.SubString(self.z, z3.IntVal(0), zi)
            zmid = z3.SubString(self.z, zi, zl)
            zsuffix = z3.SubString(self.z, zi + zl,
                                   z3.Length(self.z))
            return [zstr(self.context, zprefix, self.v[0:vi])] + zstr(
                self.context, zsuffix, self.v[vi + len(sep):]).split(
                    sep, maxsplit)
        else:
            return [self]


In [None]:
# For easier debugging, we abort any calls to methods in str that are not overridden by zstr.
def make_str_abort_wrapper(fun):
    def proxy(*args, **kwargs):
        raise Exception('%s Not implemented in `zstr`' % fun.__name__)
    return proxy

def init_concolic_3():
    strmembers = inspect.getmembers(zstr, callable)
    zstrmembers = {m[0] for m in strmembers if len(
        m) == 2 and 'zstr' in m[1].__qualname__}
    for name, fn in inspect.getmembers(str, callable):
        # Omitted 'splitlines' as this is needed for formatting output in
        # IPython/Jupyter
        if name not in zstrmembers and name not in [
            'splitlines',
            '__class__',
            '__contains__',
            '__delattr__',
            '__dir__',
            '__format__',
            '__ge__',
            '__getattribute__',
            '__getnewargs__',
            '__gt__',
            '__hash__',
            '__le__',
            '__len__',
            '__lt__',
            '__mod__',
            '__mul__',
            '__ne__',
            '__reduce__',
            '__reduce_ex__',
            '__repr__',
            '__rmod__',
            '__rmul__',
            '__setattr__',
            '__sizeof__',
                '__str__']:
            setattr(zstr, name, make_str_abort_wrapper(fn))


INITIALIZER_LIST.append(init_concolic_3)
init_concolic_3()

In [None]:
with ConcolicTracer() as _:
    def tstr2(s):
        if s[0] == 'h' and s[1] == 'e' and s[3] == 'l':
            return True
        else:
            return False
    r = _[tstr2]('hello')
print(_.context)
_.zeval()

## Examples

### Triangle

In [None]:
with ConcolicTracer() as _:
    print(_[triangle](1, 2, 3))
print(_.path)
print(_.zeval())

In [None]:
za, zb, zc = [z3.Int(s) for s in _.context[0].keys()]

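# _.path[1] = z3.Not(_.path[1])  # how can a new path be used without changing the original?
# One option is to build a new list, e.g. _.path[:1] + [z3.Not(_.path[1])], rather than deep-copying.

# fn_args = [i for i in inspect.signature(_.zeval).parameters ]
# print(fn_args)
# ['python', 'log']: zeval has no parameter meant to take a set or dict.
# The set {0, za==zb} is bound to `python`. It is non-empty and therefore truthy,
# so the Python API is used. It does not modify the path.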
assert _.zeval({0,za==zb}) == _.zeval(python=True)

## Fuzzing with Constraints

### SimpleConcolicFuzzer

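`SimpleConcolicFuzzer` starts from a sample input generated by some other fuzzer. It runs the function under test under `ConcolicTracer` and collects the path predicates. It then **negates a randomly chosen predicate in the path and solves the result with Z3, producing a new input that is guaranteed to take a path different from the original one**.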
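The negation step can be sketched with predicates written as strings. `negate` and `flip_random_predicate` are hypothetical helpers, and dropping the predicates after the flipped position is an assumption made here: those later conditions were observed on the original branch and need not hold on the new one.

```python
import random

def negate(pred):
    """Negate a predicate given as a string, cancelling a simple double negation."""
    if pred.startswith("not (") and pred.endswith(")"):
        return pred[len("not ("):-1]
    return f"not ({pred})"

def flip_random_predicate(path, rng=random):
    """Keep the predicates before a randomly chosen position and negate the one at it."""
    i = rng.randrange(len(path))
    return path[:i] + [negate(path[i])]

path = ["not (n < 0)", "not (n == 0)", "n == 1"]
new_path = flip_random_predicate(path, random.Random(0))
print(new_path)
```

The resulting list is what would be handed to the solver. Any model of it agrees with the original run up to the chosen position and diverges there.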

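The data structure, roughly:

A parent node stores the s-expression of the decision that leads to its children, normalized to its non-negated (non-`Not`) form.

Children are stored in the parent's `children` dictionary, keyed by the decision bit: `{'0': child_node, '1': child_node}`.

In other words, adding a node records which of the parent's two outcomes was taken. At the same time, the possible outcomes of the newly added node are placed in a collection of leaves still to be explored.

Once a decision path has actually been observed, it is removed from the leaves and placed in the collection of completed paths.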
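The bookkeeping can be sketched with plain bit strings. `ToyTree` is a hypothetical, much reduced stand-in for the classes that follow: it keeps only the decision patterns, with no SMT expressions and no node objects.

```python
class ToyTree:
    """Track observed decision sequences and the unexplored alternatives next to them."""

    def __init__(self):
        self.seen = set()        # decision prefixes observed in some trace
        self.leaves = set()      # plausible prefixes that no trace has taken yet
        self.completed = set()   # full decision sequences of finished traces

    def add_trace(self, bits):
        """Record one execution, given as a string of '0'/'1' decisions."""
        for i, bit in enumerate(bits):
            prefix = bits[:i]
            # Both outcomes of this decision are plausible until observed
            for b in '01':
                if prefix + b not in self.seen:
                    self.leaves.add(prefix + b)
            # The outcome actually taken is no longer a candidate
            self.seen.add(prefix + bit)
            self.leaves.discard(prefix + bit)
        self.completed.add(bits)

tree = ToyTree()
tree.add_trace('001')
tree.add_trace('01')
print(sorted(tree.leaves))
```

After the two traces, the remaining leaves are exactly the decision sequences that differ from an observed path in their last bit and have not been executed yet.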

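The fuzzing strategy, roughly:

Pick a random element from the leaves (a predicate sequence that has not been observed yet) and hand it to the constraint solver. If it is satisfiable, return the solution: used as the next input, it will explore a new path.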
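A minimal sketch of that selection step follows. `next_input` and `brute_force` are hypothetical names, the leaves map decision patterns to lists of Python predicates rather than z3 expressions, and the bounded search only stands in for the constraint solver.

```python
import random

def brute_force(predicates, candidates=range(-5, 6)):
    """Toy stand-in for an SMT solver: first candidate satisfying every predicate."""
    for n in candidates:
        if all(p(n) for p in predicates):
            return n
    return None

def next_input(leaves, solve, rng=random):
    """Try unexplored decision sequences in random order; return the first solvable one."""
    patterns = list(leaves)
    rng.shuffle(patterns)
    for pattern in patterns:
        solution = solve(leaves[pattern])
        if solution is not None:
            return pattern, solution
    return None

leaves = {
    '1': [lambda n: n < 0],
    '01': [lambda n: not n < 0, lambda n: n == 0],
    '11': [lambda n: n < 0, lambda n: n == 0],  # contradictory, hence unsolvable
}
choice = next_input(leaves, brute_force, random.Random(1))
print(choice)
```

Unsatisfiable leaves are skipped rather than treated as failures, since a plausible child in the tree may correspond to a branch combination that no input can reach.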

In [None]:
class TraceNode:
    def __init__(self, smt_val, parent, info):
        # This is the smt that led to this node
        self._smt_val = z3.simplify(smt_val) if smt_val is not None else None

        # This is the predicate that this node might perform at a future point
        self.smt = None
        self.info = info
        self.parent = parent
        self.children = {}
        self.path = None
        self.tree = None
        self._pattern = None
        self.log = True

    def no(self): return self.children.get(self.tree.no_bit)

    def yes(self): return self.children.get(self.tree.yes_bit)

    def get_children(self): return (self.no(), self.yes())

    def __str__(self):
        return 'TraceNode[%s]' % ','.join(self.children.keys())


    def bit(self):
        if self._smt_val is None:
            return None
        return self.tree.no_bit if self._smt_val.decl(
        ).name() == 'not' else self.tree.yes_bit

    def pattern(self):
        # The sequence of decision bits from the root down to this node
        if self._pattern is not None:
            return self._pattern
        path = self.get_path_to_root()
        assert path[0]._smt_val is None
        assert path[0].parent is None

        self._pattern = ''.join([p.bit() for p in path[1:]])
        return self._pattern


    def add_child(self, elt, i, cc, string):
        if elt == z3.BoolVal(True):
            # No more exploration here. Simply unregister the leaves of *this*
            # node and possibly register them in completed nodes, and exit
            for bit in [self.tree.no_bit, self.tree.yes_bit]:
                child_leaf = self.pattern() + ':' + bit
                if child_leaf in self.tree.leaves:
                    del self.tree.leaves[child_leaf]
            self.tree.completed_paths[self.pattern()] = self
            return None

        child_node = TraceNode(smt_val=elt,
                               parent=self,
                               info={'num': i, 'cc': cc, 'string': string})
        child_node.tree = self.tree

        # bit represents the path that child took from this node.
        bit = child_node.bit()

        # first we update our smt decision
        if bit == self.tree.yes_bit:  # yes, which means the smt can be used as is
            if self.smt is not None:
                assert self.smt == child_node._smt_val
            else:
                self.smt = child_node._smt_val
        # no, which means we have to negate it to get the decision.
        elif bit == self.tree.no_bit:
            smt_ = z3.simplify(z3.Not(child_node._smt_val))
            if self.smt is not None:
                assert smt_ == self.smt
            else:
                self.smt = smt_
        else:
            assert False

        if bit in self.children:
            #    if self.log:
            #print(elt, child_node.bit(), i, string)
            #print(i,'overwriting', bit,'=>',self.children[bit],'with',child_node)
            child_node = self.children[bit]
            #self.children[bit] = child_node
            #child_node.children = old.children
        else:
            self.children[bit] = child_node

        # At this point, we have to unregister any leaves that correspond to this child from tree,
        # and add the plausible children of this child as leaves to be explored. Note that
        # if it is the end (z3.True), we do not have any more children.
        child_leaf = self.pattern() + ':' + bit
        if child_leaf in self.tree.leaves:
            del self.tree.leaves[child_leaf]

        pprefix = child_node.pattern() + ':'

        # Plausible children.
        for bit in [self.tree.no_bit, self.tree.yes_bit]:
            self.tree.leaves[pprefix +
                             bit] = PlausibleChild(child_node, bit, self.tree)
        return child_node


    def get_path_to_root(self):
        if self.path is not None:
            return self.path
        parent_path = []
        if self.parent is not None:
            parent_path = self.parent.get_path_to_root()
        self.path = parent_path + [self]
        return self.path

In [None]:
class PlausibleChild:
    def __init__(self, parent, cond, tree):
        self.parent = parent
        self.cond = cond
        self.tree = tree
        self._smt_val = None

    def __repr__(self):
        return 'PlausibleChild[%s]' % (self.parent.pattern() + ':' + self.cond)
    

    def smt_val(self):
        if self._smt_val is not None:
            return self._smt_val
        # If the parent has another child, that child has already updated the parent's smt.
        # Hence, we can use the opposite of that child's smt value as our value.
        assert self.parent.smt is not None
        if self.cond == self.tree.no_bit:
            self._smt_val = z3.Not(self.parent.smt)
        else:
            self._smt_val = self.parent.smt
        return self._smt_val

    def cc(self):
        if self.parent.info.get('cc') is not None:
            return self.parent.info['cc']
        # If there is a plausible child node, the parent can have
        # at most one explored child.
        siblings = list(self.parent.children.values())
        assert len(siblings) == 1
        # We expect the other child to have cc.
        return siblings[0].info['cc']
    
    def path_expression(self):
        path_to_root = self.parent.get_path_to_root()
        assert path_to_root[0]._smt_val is None
        return [i._smt_val for i in path_to_root[1:]] + [self.smt_val()]

In [None]:
class TraceTree:
    def __init__(self):
        self.root = TraceNode(smt_val=None, parent=None, info={'num': 0})
        self.root.tree = self
        self.leaves = {}
        self.no_bit, self.yes_bit = '0', '1'

        pprefix = ':'
        for bit in [self.no_bit, self.yes_bit]:
            self.leaves[pprefix + bit] = PlausibleChild(self.root, bit, self)
        self.completed_paths = {}
    
    def add_trace(self, tracer, string):
        last = self.root
        i = 0
        for i, elt in enumerate(tracer.path):
            # Note: last is reassigned here, so it no longer refers to self.root.
            last = last.add_child(elt=elt, i=i + 1, cc=tracer, string=string)
        last.add_child(elt=z3.BoolVal(True), i=i + 1, cc=tracer, string=string)
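
The leaf bookkeeping done by `add_child` and `add_trace` can be illustrated without z3. The following is a minimal sketch, not the code above: it tracks only the branch bits of a trace, omits the terminating `z3.BoolVal(True)` node, and the helper name `record_trace` is hypothetical. It assumes that `pattern()` returns the concatenated bits on the path from the root, so that leaf keys have the form `pattern + ':' + bit`.

```python
def record_trace(leaves, bits):
    """Update the set of unexplored leaves after observing one trace.

    `leaves` is a set of keys of the form '<pattern>:<bit>';
    `bits` is the sequence of branch outcomes ('0' or '1') in the trace.
    """
    pattern = ''
    for bit in bits:
        # The branch just taken is no longer unexplored.
        leaves.discard(pattern + ':' + bit)
        pattern += bit
        # Both outcomes below the new node become plausible children.
        for b in ('0', '1'):
            leaves.add(pattern + ':' + b)
    return leaves


leaves = {':0', ':1'}   # a fresh tree: both branches at the root are open
record_trace(leaves, '10')
print(sorted(leaves))
```

After one trace that takes the yes-branch and then the no-branch, the untaken sibling at each level remains in `leaves`, together with the two plausible children of the last node reached.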

In [None]:
class SimpleConcolicFuzzer(Fuzzer):
    def __init__(self):
        self.ct = TraceTree()
        self.max_tries = 1000
        self.last = None
        self.last_idx = None
    
    def add_trace(self, trace, s):
        self.ct.add_trace(trace, s)

    def next_choice(self):
        #lst = sorted(list(self.ct.leaves.keys()), key=len)
        c = random.choice(list(self.ct.leaves.keys()))
        #c = lst[0]
        return self.ct.leaves[c]
    
    def get_newpath(self):
        node = self.next_choice()
        path = node.path_expression()
        return path, node.cc()
    

    def fuzz(self):
        # Solve the constraints of a new path with the solver and return the resulting value.
        if self.ct.root.children == {}:
            # a random value to generate comparisons. This would be
            # the initial value around which we explore with concolic
            # fuzzing.
            return ' '
        for i in range(self.max_tries):   # crude retry loop; could be improved
            path, last = self.get_newpath()
            s, v = zeval_smt(path, last, log=False)
            if s != 'sat':
                #raise Exception("Unexpected UNSAT")
                continue
            val = list(v.values())[0]
            elt, typ = val
            if len(elt) == 2 and elt[0] == '-':  # negative numbers are [-, x]
                elt = '-%s' % elt[1]
            # make sure that we do not retry the tried paths
            # The tracer we add here is incomplete. This gets updated when
            # the add_trace is called from the concolic fuzzer context.
            # self.add_trace(ConcolicTracer((last.decls, path)), elt)
            if typ == 'Int':
                return int(elt)
            elif typ == 'String':
                return elt
            return elt
        return None

In [None]:
def hang_if_no_space(s):
    i = 0
    while True:
        if i < len(s):
            if s[i] == ' ':
                break
        i += 1

In [None]:
with ConcolicTracer() as _:
    _[hang_if_no_space]('ab d')

# SimpleConcolicFuzzer.__init__ runs self.ct = TraceTree().
# TraceTree.__init__ then creates root = TraceNode(smt_val=None, parent=None, info={'num': 0})
# and registers self.leaves[pprefix + bit] = PlausibleChild(self.root, bit, self),
# which in turn runs TraceNode.__init__ and PlausibleChild.__init__.
scf = SimpleConcolicFuzzer()


# scf.ct.add_trace(_, 'ab d')
scf.add_trace(_,'ab d')


node = scf.next_choice()

In [None]:
[i._smt_val for i in scf.ct.root.get_children()[0].get_children()[
    0].get_children()[1].get_path_to_root()]

In [None]:
for key in scf.ct.leaves:
    print(key, '\t', scf.ct.leaves[key])

In [None]:
# scf = SimpleConcolicFuzzer()
for i in range(10):
    v = scf.fuzz()
    print(repr(v))
    if v is None:
        continue
    with ConcolicTracer() as _:
        _[hang_if_no_space](v)
    scf.add_trace(_, v)

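SimpleConcolicFuzzer is reasonably effective at exploring paths near the one taken by a given sample input. However, it is not very smart about choosing which path to follow next. We now look at another fuzzer, which lifts the collected predicates into a grammar and thereby fuzzes more effectively.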
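
To make the path-selection point concrete, here is a minimal sketch of two selection policies over leaf keys like those printed above. The uniformly random policy corresponds to what `next_choice` does; the shortest-first policy corresponds to the alternative left commented out in `next_choice`. The function names and the sample keys are illustrative assumptions.

```python
import random


def choose_random(leaf_keys, rng=random):
    # Uniform choice among the unexplored leaves, as next_choice does.
    return rng.choice(list(leaf_keys))


def choose_shallowest(leaf_keys):
    # Prefer the leaf with the shortest key, i.e. the fewest branch decisions.
    return min(leaf_keys, key=len)


keys = [':0', '1:1', '10:0', '10:1']   # made-up leaf keys
print(choose_shallowest(keys))
print(choose_random(keys, random.Random(0)) in keys)
```

Neither policy looks at what the predicates say, which is the limitation the next section addresses.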

## ConcolicGrammarFuzzer

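Sometimes the solver cannot find a solution no matter what. Consider `Not(str.substr(str.substr(db_select_s_str_1, 7, 29), 23, 6) =="inventory")`. The inner expression takes the substring of length 29 starting at position 7 of the string, and then takes the substring of length 6 starting at position 23 of that result. To flip this branch, the solver would have to make the final substring equal to "inventory". Because that substring has at most 6 characters while "inventory" has 9, the equality can never be satisfied. Since dynamic symbolic execution fails at this point every time, the program may never be tested in depth.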
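
The length argument can be checked with plain Python slicing, used here as a stand-in for z3's `str.substr(s, offset, length)`. This is a minimal sketch: the helper `substr` is hypothetical, and the input is the sample query used in the example further below.

```python
def substr(s, offset, length):
    # Plain-Python stand-in for str.substr: returns at most `length` characters.
    return s[offset:offset + length]


query = 'select I/k-z-A(z)/W(q,S) from G.2fN0'
table = substr(substr(query, 7, 29), 23, 6)
print(repr(table))

# A result of at most 6 characters can never equal a 9-character constant.
print(len(table) <= 6 < len('inventory'))
```

Whatever the input, the nested substring is never longer than 6 characters, so no assignment to the string variable can make the comparison with "inventory" succeed.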

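If the input is generated from a grammar, we can add a rule to the grammar that produces "inventory" directly.

However, the program is relatively static, while the data varies. Editing the grammar by hand is therefore not a good approach. How can we do this automatically?

Let us walk through an example.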

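Take the string `'select I/k-z-A(z)/W(q,S) from G.2fN0'`. A comparison in the program requires its last part to be `'inventory'`.

Step 1: By parsing the input with the grammar, we learn that "G.2fN0" is derived from the grammar rule `<table>`.

Step 2: We analyze the z3 string expressions and collect all direct string comparisons against substrings of the original argument. This tells us that "inventory" is compared with the substring of length 6 starting at position 30 ("G.2fN0").

Step 3: Combining steps 1 and 2, we add a rule to the grammar that derives 'inventory' directly from `<table>`.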
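
The three steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the ConcolicGrammarFuzzer implementation: the toy grammar, the hand-written `derivations` list standing in for a parse result, the `comparison` tuple standing in for the z3 analysis, and the helper `lift_comparison` are all hypothetical.

```python
import copy

# Hypothetical toy grammar in dict-of-expansions form.
grammar = {
    '<start>': ['select <columns> from <table>'],
    '<columns>': ['I/k-z-A(z)/W(q,S)'],
    '<table>': ['G.2fN0'],
}

query = 'select I/k-z-A(z)/W(q,S) from G.2fN0'

# Step 1 (assumed given): which nonterminal derived which span of the input,
# as (nonterminal, start offset, length).
derivations = [('<columns>', 7, 17), ('<table>', 30, 6)]

# Step 2 (assumed given): a comparison found in the path condition,
# as (start offset, length, constant it was compared with).
comparison = (30, 6, 'inventory')


def lift_comparison(grammar, derivations, comparison):
    """Step 3: add the compared constant as an expansion of the nonterminal
    whose span matches the compared substring."""
    start, length, constant = comparison
    new_grammar = copy.deepcopy(grammar)
    for nonterminal, d_start, d_length in derivations:
        if (d_start, d_length) == (start, length):
            if constant not in new_grammar[nonterminal]:
                new_grammar[nonterminal].append(constant)
    return new_grammar


new_grammar = lift_comparison(grammar, derivations, comparison)
print(new_grammar['<table>'])
```

The original grammar is left untouched; the returned copy can produce `inventory` wherever `<table>` is expanded, so a grammar-based fuzzer no longer depends on the solver to satisfy that comparison.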