New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature MemStruct #284

Merged
merged 44 commits into from Jan 18, 2016

Conversation

Projects
None yet
3 participants
@fmonjalet
Contributor

fmonjalet commented Nov 24, 2015

This PR introduces an API to easily interact with C structures in miasm's sandbox.

The example/jitter/memstruct.py example file may be the best introduction to this feature. As a spoiler, here is how a linked list can be represented with this API (extracted from the aforementioned file):

class ListNode(MemStruct):
    fields = [
        # The "<I" is the struct-like format of the pointer in memory, in this
        # case a Little Endian 32 bits unsigned int
        # One way to handle reference to ListNode in ListNode is to use the
        # special marker MemSelf.
        # You could also set or modify ListNode.fields after the class
        # declaration and call ListNode.gen_fields()
        ("next", Ptr("<I", MemSelf)),
        # Ptr(_, MemVoid) is analogous to void*, MemVoid is just an empty
        # MemStruct type
        ("data", Ptr("<I", MemVoid)),
    ]

class LinkedList(MemStruct):
    fields = [
        ("head", Ptr("<I", ListNode)),
        ("tail", Ptr("<I", ListNode)),
        # Num can take any one-field struct-like format, including floats and
        # doubles
        ("size", Num("<I")),
    ]

# [...]

link = LinkedList(jitter.vm, some_addr)
link.deref_head.data = other_addr
link.size += 1
# etc

Lots of FIXME/TODO are left there for now and lots of choices can be discussed, please tell me what you think!

-- Florent

@@ -0,0 +1,229 @@
#!/usr/bin/env python
"""This script is just a short example of common usages for miasm2.analysis.mem.

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

It seems that memstruct are more related to miasm2.core, as DiGraph than analysis module, such as miasm2.analysis.depgraph.

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

I agree, if it is ok with @serpilliere too, I will move it.

self.memset()
def get_head(self):
"""Returns the head ListNode instance"""

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

You choose to add a docstring here, but it is missing in others methods

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

That's right. Do you want me to add a docstring to the other methods, or just drop this one since it seems quite obvious?

print
# If you followed, memstr and data.array point to the same object, so:
raw_miams = '\x00'.join('Miams') + '\x00'*3

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

This is what miasm2.os_dep.commons.set_str_unic do

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Yes, but the point is to show the actual bytes that are written to memory, I think it is clearer for an example.

assert data.array.cast(MemStr, "utf16") == memstr
# Default is "ansi"
assert data.array.cast(MemStr) != memstr
assert data.array.cast(MemStr, "utf16").value == memstr.value

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

The equality data.array.cast(MemStr, "utf16") == memstr does not check that value attributes are the same?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

No, these are two different strings in memory, corresponding to the same python str value, but with two different concrete encoding. That's why the value attribute is not used for equality. And even if it was, the encoding should also be used,and these two instances would not be eq.

However, the current implementation tests str(self) == str(other), so two different strings with different values and different encoding could currently be eq, if they have the same str(). This may not be the expected behaviour.

So the question is, should two MemStr be equal when they have the same memory representation AND the same encoding (i.e. "lala" in utf8 != "lala" in latin1)or only when they have the same memory representation? (I would go for the first)

# Let's play with strings
memstr = datastr.deref_data
# Note that memstr is MemStr(..., "utf16")
memstr.value = 'Miams'

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Is the typo intentional?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Of course :). Do you want me to fix it, for trademark issues?

@@ -0,0 +1,1259 @@
"""This module provides classes to manipulate C structures backed by a VmMngr
object (a miasm VM virtual memory).

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

VM and virtual memory seem redundant to me

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

VM stands for Virtual Machine here, I will change it to sandbox to avoid confusion, thanks.

As you saw previously, to use this module, you just have to inherit from
MemStruct and define a list of (<field_name>, <field_definition>). Availabe

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Typo: Availabe

structure will be automatically allocated in memory:
my_heap = miasm2.os_dep.common.heap()
set_allocator(my_heap)

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

What is the expected interface of the allocator?
EDIT: It is defined later, but it may be duplicated here, in the doc (for instance allocator: func(VmMngr) -> integer_address)

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Will be fixed, thanks.

# Cache for dynamically generated MemStructs
DYN_MEM_STRUCT_CACHE = {}
def set_allocator(alloc_func):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

It seems that you only use set_allocator externally, but this method doesn't provide any "improvement" (as check) compared to a mem.ALLOCATOR = func. Or you are keeping it for the associated docstring?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

I'm keeping it for two reasons:

  • abstraction: the ability to hide the mechanism if it changes + possible checks, as you mentioned;
  • Ease of import: the user will probably want to import classes from the mem module, but it seems that you have to import the module in itself to set a global (or did I miss something?). Importing the function is easier.
return ' '*size + ('\n' + ' '*size).join(s.split('\n'))
# FIXME: copied from miasm2.os_dep.common and fixed

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

if this function is better than the one in miasm2.os_dep.common, do not hesitate to replace and import it

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Thanks, what would you think to remove the function from miasm.os_dep.common import it from miasm.core.mem instead?
Since this module is aimed at providing utilities to interract with the virtual memory, I think these functions may belong to it. Moreover, str encoding is not directly related to the underlying OS.

That said, if we do that, we have to beware of API breakage, since the unicode function is called get_str_utf16 rather than the more ambiguous get_str_unic. Also, these functions take a vm as a parameter, and not a jitter.

tmp = addr
while ((max_char is None or l < max_char) and
vm.get_mem(tmp, 1) != "\x00"):
tmp += 1

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

If you plan to rewrite this function:

Actually, tmp is always equal to addr + l, so this variable is useless.
Additionally, there is a double memory access (one for strlen, the second one for getting the actual string), which may not reflect the reality, and could be inefficient.
You may rewrite it with a list joined at the end.

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

It seems to be a good time to apply this fix, I will do it in this PR.

return vm.get_mem(addr, l).decode("latin1")
# TODO: get_raw_str_utf16 for length calculus

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

This function can be useful in others cases, do not hesitate to move it to os_dep.commons

vm.set_mem(addr, s + "\x00")
def set_str_utf16(vm, addr, s):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

As for set_str_ansi, you may move these functions to os_dep.commons. The use of vm seems to be the usual case (and more compliant with the API of get_str_*)

def mem(field):
"""Generate a MemStruct subclass from a field. The field's value can
be accessed through self.value or self.deref_value if field is a Ptr.
"""

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Can you specify the expected format of @field?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Yes, missing from the doc, sorry (it is a MemField instance)

def set(self, vm, addr, val):
"""Set a VmMngr memory from a value.
Args:

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

To be more homogeneous with other docstring in Miasm, you can use the following format :

"""Set a VmMngr memory from a value
@vm: VmMngr instance
@addr: the start adress in memory to set
@val: the python value to serialize in @vm at @addr
"""

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

I'll fix it, thanks.

"""
self._self_type = self_type
def size(self):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

To be more homogenous in API, you can use a property for size

super(Ptr, self).__init__(fmt)
if isinstance(dst_type, MemField):
# Patch the field to propagate the MemSelf replacement
dst_type._get_self_type = lambda: self._get_self_type()

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Don't you prefer the form dst_type._get_self_type = self._get_self_type?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

The two versions are actually not equivalent:

  • dst_type._get_self_type = lambda: self._get_self_type() will make dst_type always call the current _get_self_type of self at the time of the call.
  • dst_type._get_self_type = self._get_self_type will make dst_type call the _get_self_type function of self that was set at the moment of the method assignment. Note that self._get_self_type will later be patched by MemStruct.gen_fields.

Since inner types are created before outer types (e.g. Ptr(_, MemSelf) before Ptr(_, Ptr(_, MemSelf))), the _get_self_type of the outer is not patched at the moment it patches the inner class. This is why delayed evaluation of _get_self_type is necessary. This also is why this is a function and not an attribute.

To be honest, MemSelf implementation is a hacky mess; I may have been tired or confused the day I imagined that, and it stayed stuck to my mind. :)
I would really be interested in cleaner suggestion to implement that.

PS/EDIT: this remark made me find and fix a bug related to that (mixed with the DYN_MEM_STRUCT_CACHE); the patch will also come in this PR.

def _unpack(self, raw_str):
return struct.unpack(self._fmt, raw_str)
def size(self):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Idem for property

return self.__class__ == other.__class__ and self._fmt == other._fmt
def __hash__(self):
return hash(hash(self.__class__) + hash(self._fmt))

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

In Miasm, we usually use the form (not sure it is a better one, but it consider elements as different "dimensions"):
hash((self.__class_, self._fmt))

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

I think it is functionally equivalent, but your version is cleaner + shorter, I'll fix that.

("size", Num("<I")),
]
def __init__(self, vm, *args, **kwargs):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Please choose between the form (*args, **kwargs) and (vm, addr, *args, **kwargs)

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Will be fixed

def _unpack(self, raw_str):
upck = super(Num, self)._unpack(raw_str)
if len(upck) > 1:

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

May a != 1 is more precise

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Right.

"""
def __init__(self, fmt, dst_type, *type_args, **type_kwargs):
"""Args:

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Idem for docstring args

MemStruct when instanciating it (e.g. for MemStr encoding or
MemArray field_type).
"""
if not isinstance(dst_type, MemField) and\

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

You can also use the form:

if (... and
    ... and
    ...):

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Will be fixed

@addr. Equivalent to a pointer dereference assignment in C.
"""
# Sanity check
if self.dst_type != val.__class__:

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

So if the val class is subclassed, you will always raise a warning. Is this an expected behavior, rather than a isinstance?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

This kind of thing, in a C or C++ compiler, would raise a warning or an error. The idea is that if you subclass you MemStruct to add fields, an assignment of the subclass to the parent should raise a warning. If you subclass it just to add methods, it's okay, but I do not see real usecases for that.

In both cases, casting your subclass to the superclass (via the .cast(<superclass>)) method, as you would do in C, is the way to go if you want to get rid of the warning. This is the way to indicate that you know what you are doing. Is this explanation ok for you?

self._type_kwargs == other._type_kwargs
def __hash__(self):
return hash(super(Ptr, self).__hash__() + hash(self._dst_type) +

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Idem for hash(tuple)

def get(self, vm, addr):
return self._il_type(vm, addr)
def size(self):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Idem property

self._type_kwargs == other._type_kwargs
def __hash__(self):
return hash(hash(self.__class__) + hash(self._il_type) +

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Idem hash

offset += self.field_type.size()
else:
raise NotImplementedError(

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

Even if it is not the case in the rest of Miasm code, NotImplementedError should be reserved for abstract method.
When a method is not implemented, you can raise a RuntimeError, for instance.

Source:

 exception NotImplementedError

    This exception is derived from RuntimeError. In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method.

We should probably create an exception in Miasm for this kind of case (FdsException? 😃)

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Thanks, fixed.

+1 for FdsException. :)

assert ex.get_addr("f1") == ex.get_addr("f2")
"""
def __init__(self, field_list):

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

What do you think about using a *field_list instead, avoiding the creation of a list for the caller?

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

It could be handy, but I think it would make things non homogeneous, or at least less clear, for subclasses like BitField. What do you think?

def __init__(self, field_list):
"""field_list is a [(name, field)] list, see the class doc"""
self.field_list = field_list

This comment has been minimized.

@commial

commial Nov 25, 2015

Member

In the module, you could manage field_list as an OrderedDict, giving a direct access to .keys, .values, ... and a test for unicity of keys.

This comment has been minimized.

@fmonjalet

fmonjalet Nov 25, 2015

Contributor

Actually, in an Union, the fields are not ordered. The doc talks about a list of field, but in fact any iterable is fine (I may update it in that sense).

Maybe it could be useful for the MemStruct._attrs attribute, but I am not convinced it is worth it yet.

fmonjalet added some commits Nov 27, 2015

MemStruct: Big refactoring, Mem* -> Pinned*
This commit is the first phase of the Type refactor. The PinnedType
class has been separated from the more specific PinnedStruct class.
MemStruct: big refactor in process
Doc is currently incoherent, impl will also be completed
MemStruct: Array/PinnedArray homogeneity
Array access logic has moved to Array, Pinned(Sized)Array just contains
the logic to interface with memory
MemStruct/Types: Array idx fix + MemStr.from_str feature
MemStr.from_str allows to allocate and set a string automatically if
ALLOCATOR is set. This avoids allocating a buffer and filling it later.
Types: adding the ("field", SomeMemType) syntax
Shorthand for ("field", SomeMemStruct.get_type()) in a Struct or
MemStruct fields definition.
Types: Support anonymous Struct/Union/BitField
See the test addition for an example. A Struct, Union, or BitField field
with no name will be considered anonymous: all its fields will be added
to the parent Struct/Union/BitField. This implements this kind of C
declaration:

struct foo {
    int a;
    union {
        int bar;
        struct {
            short baz;
            short foz;
        };
    };
}
Types: typo, MemStruct.get_offset should be a classmethod
Also added tests and MemArray.get_offset

@fmonjalet fmonjalet force-pushed the fmonjalet:feature_memstruct branch from 6892d3d to b421c50 Jan 18, 2016

@fmonjalet

This comment has been minimized.

Contributor

fmonjalet commented Jan 18, 2016

Ready to merge for me, as soon as tests pass.

@serpilliere

This comment has been minimized.

Contributor

serpilliere commented Jan 18, 2016

Hey, Santa Claus brought another heavy gift! 🎅
Thanks for the feature: It will be very useful to rewrite cleaner Windows structures!

serpilliere added a commit that referenced this pull request Jan 18, 2016

@serpilliere serpilliere merged commit d49e05f into cea-sec:master Jan 18, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pylint Note: +0.15
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment