Feature MemStruct #284
@@ -0,0 +1,229 @@
#!/usr/bin/env python
"""This script is just a short example of common usages for miasm2.analysis.mem.
commial
Nov 25, 2015
Contributor
It seems that memstruct is more related to miasm2.core (like DiGraph) than to an analysis module such as miasm2.analysis.depgraph.
        self.memset()

    def get_head(self):
        """Returns the head ListNode instance"""
commial
Nov 25, 2015
Contributor
You chose to add a docstring here, but it is missing in the other methods.
fmonjalet
Nov 25, 2015
Author
Contributor
That's right. Do you want me to add a docstring to the other methods, or just drop this one since it seems quite obvious?
# If you followed, memstr and data.array point to the same object, so:
raw_miams = '\x00'.join('Miams') + '\x00'*3
commial
Nov 25, 2015
Contributor
This is what miasm2.os_dep.commons.set_str_unic does.
fmonjalet
Nov 25, 2015
Author
Contributor
Yes, but the point is to show the actual bytes that are written to memory; I think that is clearer for an example.
assert data.array.cast(MemStr, "utf16") == memstr
# Default is "ansi"
assert data.array.cast(MemStr) != memstr
assert data.array.cast(MemStr, "utf16").value == memstr.value
commial
Nov 25, 2015
Contributor
Does the equality data.array.cast(MemStr, "utf16") == memstr not check that the value attributes are the same?
fmonjalet
Nov 25, 2015
Author
Contributor
No, these are two different strings in memory, corresponding to the same Python str value but with two different concrete encodings. That's why the value attribute is not used for equality. And even if it were, the encoding should also be taken into account, and these two instances would still not be equal.
However, the current implementation tests str(self) == str(other), so two strings with different values and different encodings could currently be equal if they have the same str(). This may not be the expected behaviour.
So the question is: should two MemStr be equal when they have the same memory representation AND the same encoding (i.e. "lala" in utf8 != "lala" in latin1), or only when they have the same memory representation? (I would go for the first.)
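The first option (same memory representation and same encoding) could look roughly like the sketch below. This is only an illustration of the idea, not the module's code: FakeMemStr and its raw and encoding attributes are hypothetical stand-ins for MemStr.

```python
class FakeMemStr(object):
    """Hypothetical stand-in for MemStr: raw bytes plus an encoding name."""

    def __init__(self, value, encoding):
        self.encoding = encoding
        self.raw = value.encode(encoding)

    @property
    def value(self):
        # Python-level value, independent of the concrete encoding
        return self.raw.decode(self.encoding)

    def __eq__(self, other):
        # Equal only if both the bytes in memory and the encoding match
        return (self.__class__ == other.__class__ and
                self.encoding == other.encoding and
                self.raw == other.raw)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((self.__class__, self.encoding, self.raw))


utf8 = FakeMemStr("lala", "utf-8")
latin1 = FakeMemStr("lala", "latin-1")
utf16 = FakeMemStr("lala", "utf-16-le")
```

With this definition, utf8 and latin1 hold the same bytes but compare unequal, and utf8 and utf16 share the same value while also comparing unequal.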
# Let's play with strings
memstr = datastr.deref_data
# Note that memstr is MemStr(..., "utf16")
memstr.value = 'Miams'
commial
Nov 25, 2015
Contributor
Is the typo intentional?
fmonjalet
Nov 25, 2015
Author
Contributor
Of course :). Do you want me to fix it, for trademark issues?
@@ -0,0 +1,1259 @@
"""This module provides classes to manipulate C structures backed by a VmMngr
object (a miasm VM virtual memory).
commial
Nov 25, 2015
Contributor
VM and virtual memory seem redundant to me
fmonjalet
Nov 25, 2015
Author
Contributor
VM stands for Virtual Machine here, I will change it to sandbox to avoid confusion, thanks.
As you saw previously, to use this module, you just have to inherit from
MemStruct and define a list of (<field_name>, <field_definition>). Availabe
commial
Nov 25, 2015
Contributor
Typo: Availabe
structure will be automatically allocated in memory:
    my_heap = miasm2.os_dep.common.heap()
    set_allocator(my_heap)
commial
Nov 25, 2015
Contributor
What is the expected interface of the allocator?
EDIT: It is defined later, but it could be duplicated here in the doc (for instance, allocator: func(VmMngr) -> integer_address).
fmonjalet
Nov 25, 2015
Author
Contributor
Will be fixed, thanks.
# Cache for dynamically generated MemStructs
DYN_MEM_STRUCT_CACHE = {}

def set_allocator(alloc_func):
commial
Nov 25, 2015
Contributor
It seems that set_allocator is only used externally, but this function doesn't provide any improvement (such as a check) over a plain mem.ALLOCATOR = func. Or are you keeping it for the associated docstring?
fmonjalet
Nov 25, 2015
Author
Contributor
I'm keeping it for two reasons:
- Abstraction: the ability to hide the mechanism if it changes, plus possible checks, as you mentioned;
- Ease of import: the user will probably want to import classes from the mem module, but it seems that you have to import the module itself to set a global (or did I miss something?). Importing the function is easier.
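The import point can be checked with a small self-contained sketch. The mem module built here is a hypothetical miniature (a global plus its setter), and my_alloc is a placeholder following the func(VmMngr) -> integer_address shape mentioned earlier in the thread.

```python
import types

# Hypothetical miniature of the module: a global and its setter.
mem = types.ModuleType("mem")
exec(
    "ALLOCATOR = None\n"
    "def set_allocator(alloc_func):\n"
    "    global ALLOCATOR\n"
    "    ALLOCATOR = alloc_func\n",
    mem.__dict__,
)


def my_alloc(vm):
    """Placeholder allocator returning a fixed address."""
    return 0x1000


# Equivalent of `from mem import ALLOCATOR` followed by `ALLOCATOR = my_alloc`:
# only the local name is rebound, the module global is left untouched.
ALLOCATOR = mem.ALLOCATOR
ALLOCATOR = my_alloc
rebinding_reached_module = mem.ALLOCATOR is my_alloc

# Equivalent of `from mem import set_allocator` followed by a call.
mem.set_allocator(my_alloc)
setter_reached_module = mem.ALLOCATOR is my_alloc
```

Rebinding an imported name never reaches the module, whereas the imported setter does, which supports keeping the function.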
    return ' '*size + ('\n' + ' '*size).join(s.split('\n'))


# FIXME: copied from miasm2.os_dep.common and fixed
commial
Nov 25, 2015
Contributor
If this function is better than the one in miasm2.os_dep.common, do not hesitate to replace that one and import it.
fmonjalet
Nov 25, 2015
Author
Contributor
Thanks. What would you think about removing the function from miasm.os_dep.common and importing it from miasm.core.mem instead?
Since this module is aimed at providing utilities to interact with the virtual memory, I think these functions may belong to it. Moreover, str encoding is not directly related to the underlying OS.
That said, if we do that, we have to beware of API breakage, since the unicode function is called get_str_utf16 rather than the more ambiguous get_str_unic. Also, these functions take a vm as a parameter, and not a jitter.
    tmp = addr
    while ((max_char is None or l < max_char) and
           vm.get_mem(tmp, 1) != "\x00"):
        tmp += 1
commial
Nov 25, 2015
Contributor
If you plan to rewrite this function:
Actually, tmp is always equal to addr + l, so this variable is useless.
Additionally, there is a double memory access (one for the strlen, a second one for getting the actual string), which may not reflect reality and could be inefficient.
You may rewrite it with a list joined at the end.
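A possible shape for that rewrite, as a sketch only: FakeVm is a hypothetical stand-in that models nothing but get_mem(addr, size), and the function body is one way to read each byte once and join the pieces at the end, not the code that was eventually merged.

```python
class FakeVm(object):
    """Hypothetical stand-in for VmMngr: only get_mem(addr, size) is modeled."""

    def __init__(self, base, data):
        self.base = base
        self.data = data

    def get_mem(self, addr, size):
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            raise RuntimeError("access to unmapped memory")
        return self.data[off:off + size]


def get_str_ansi(vm, addr, max_char=None):
    """Read a NUL-terminated latin1 string at @addr, one access per byte."""
    chars = []
    while max_char is None or len(chars) < max_char:
        char = vm.get_mem(addr + len(chars), 1)
        if char == b"\x00":
            break
        chars.append(char)
    # Single join at the end, no second read of the whole string
    return b"".join(chars).decode("latin1")


vm = FakeVm(0x1000, b"Miams\x00junk")
```

The address of the next byte is derived from the number of bytes already collected, so no separate tmp or length counter is needed.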
fmonjalet
Nov 25, 2015
Author
Contributor
It seems to be a good time to apply this fix, I will do it in this PR.
    return vm.get_mem(addr, l).decode("latin1")


# TODO: get_raw_str_utf16 for length calculus
commial
Nov 25, 2015
Contributor
This function can be useful in other cases; do not hesitate to move it to os_dep.commons.
    vm.set_mem(addr, s + "\x00")


def set_str_utf16(vm, addr, s):
commial
Nov 25, 2015
Contributor
As for set_str_ansi, you may move these functions to os_dep.commons. The use of vm seems to be the usual case (and more compliant with the API of get_str_*).
def mem(field):
    """Generate a MemStruct subclass from a field. The field's value can
    be accessed through self.value or self.deref_value if field is a Ptr.
    """
commial
Nov 25, 2015
Contributor
Can you specify the expected format of @field?
fmonjalet
Nov 25, 2015
Author
Contributor
Yes, missing from the doc, sorry (it is a MemField instance)
    def set(self, vm, addr, val):
        """Set a VmMngr memory from a value.
        Args:
commial
Nov 25, 2015
Contributor
To be more homogeneous with the other docstrings in Miasm, you can use the following format:
"""Set a VmMngr memory from a value
@vm: VmMngr instance
@addr: the start address in memory to set
@val: the python value to serialize in @vm at @addr
"""
fmonjalet
Nov 25, 2015
Author
Contributor
I'll fix it, thanks.
        """
        self._self_type = self_type

    def size(self):
commial
Nov 25, 2015
Contributor
To make the API more homogeneous, you can use a property for size.
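As a rough sketch of what the property form would look like: RawField is a hypothetical miniature of a struct-backed field, and computing the size with struct.calcsize is an assumption made for the example.

```python
import struct


class RawField(object):
    """Hypothetical miniature of a field described by a struct format."""

    def __init__(self, fmt):
        self._fmt = fmt

    @property
    def size(self):
        # Accessed as field.size, without parentheses
        return struct.calcsize(self._fmt)


field = RawField("<I")
```

Callers then write field.size instead of field.size(), which matches attribute-style access elsewhere.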
        super(Ptr, self).__init__(fmt)
        if isinstance(dst_type, MemField):
            # Patch the field to propagate the MemSelf replacement
            dst_type._get_self_type = lambda: self._get_self_type()
commial
Nov 25, 2015
Contributor
Wouldn't you prefer the form dst_type._get_self_type = self._get_self_type?
fmonjalet
Nov 25, 2015
Author
Contributor
The two versions are actually not equivalent:
- dst_type._get_self_type = lambda: self._get_self_type() will make dst_type always call the current _get_self_type of self at the time of the call.
- dst_type._get_self_type = self._get_self_type will make dst_type call the _get_self_type function of self that was set at the moment of the method assignment. Note that self._get_self_type will later be patched by MemStruct.gen_fields.
Since inner types are created before outer types (e.g. Ptr(_, MemSelf) before Ptr(_, Ptr(_, MemSelf))), the _get_self_type of the outer type is not yet patched at the moment it patches the inner one. This is why delayed evaluation of _get_self_type is necessary. This is also why it is a function and not an attribute.
To be honest, the MemSelf implementation is a hacky mess; I may have been tired or confused the day I imagined it, and it stayed stuck in my mind. :)
I would really be interested in cleaner suggestions to implement that.
PS/EDIT: this remark made me find and fix a bug related to that (mixed with the DYN_MEM_STRUCT_CACHE); the patch will also come in this PR.
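The difference between the two assignments can be reproduced in isolation. The Field class below is a hypothetical miniature with a patchable _get_self_type; only the binding behaviour is meant to mirror the discussion.

```python
class Field(object):
    """Hypothetical miniature of a field with a patchable _get_self_type."""

    def _get_self_type(self):
        return None  # default: not patched yet


inner_bound, inner_lambda, outer = Field(), Field(), Field()

# Variant 1: captures whatever outer._get_self_type is right now.
inner_bound._get_self_type = outer._get_self_type
# Variant 2: looks up outer._get_self_type again on every call.
inner_lambda._get_self_type = lambda: outer._get_self_type()

# Later, the outer field gets patched (the step the thread attributes
# to MemStruct.gen_fields).
outer._get_self_type = lambda: "MyStruct"

bound_result = inner_bound._get_self_type()
lambda_result = inner_lambda._get_self_type()
```

Only the lambda variant observes the later patch; the direct assignment keeps calling the unpatched method.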
    def _unpack(self, raw_str):
        return struct.unpack(self._fmt, raw_str)

    def size(self):
commial
Nov 25, 2015
Contributor
Idem for property
        return self.__class__ == other.__class__ and self._fmt == other._fmt

    def __hash__(self):
        return hash(hash(self.__class__) + hash(self._fmt))
commial
Nov 25, 2015
Contributor
In Miasm, we usually use the form (not sure it is a better one, but it considers the elements as different "dimensions"):
hash((self.__class__, self._fmt))
fmonjalet
Nov 25, 2015
Author
Contributor
I think it is functionally equivalent, but your version is cleaner and shorter; I'll fix that.
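One observable difference, shown here on two hypothetical toy classes rather than on the PR's fields: summing the hashes is commutative, so swapping the components always collides, while hashing a tuple takes the position of each element into account.

```python
class SumHash(object):
    """Toy class hashing its two components by summing their hashes."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __hash__(self):
        return hash(hash(self.a) + hash(self.b))


class TupleHash(object):
    """Toy class hashing its two components as a tuple."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __hash__(self):
        return hash((self.a, self.b))


sum_collides = hash(SumHash(1, 2)) == hash(SumHash(2, 1))
tuple_collides = hash(TupleHash(1, 2)) == hash(TupleHash(2, 1))
```

Both forms are valid hashes as long as equal objects hash equally; the tuple form simply avoids this family of systematic collisions.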
        ("size", Num("<I")),
    ]

    def __init__(self, vm, *args, **kwargs):
commial
Nov 25, 2015
Contributor
Please choose between the forms (*args, **kwargs) and (vm, addr, *args, **kwargs).
fmonjalet
Nov 25, 2015
Author
Contributor
Will be fixed
    def _unpack(self, raw_str):
        upck = super(Num, self)._unpack(raw_str)
        if len(upck) > 1:
commial
Nov 25, 2015
Contributor
Maybe a != 1 is more precise.
fmonjalet
Nov 25, 2015
Author
Contributor
Right.
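To illustrate why != 1 is the tighter check, here is a small sketch. unpack_one is a hypothetical helper, not the PR's Num._unpack; it relies on the fact that struct.unpack returns an empty tuple for a format with no items, a case that a > 1 test would let through.

```python
import struct


def unpack_one(fmt, raw):
    """Unpack exactly one value from @raw, rejecting any other count."""
    upck = struct.unpack(fmt, raw)
    if len(upck) != 1:
        raise ValueError("expected exactly one value, got %d" % len(upck))
    return upck[0]


value = unpack_one("<I", b"\x01\x00\x00\x00")
```

A format such as "<II" yields two values and an empty format such as "<" yields none; both are rejected by the != 1 test.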
    """

    def __init__(self, fmt, dst_type, *type_args, **type_kwargs):
        """Args:
commial
Nov 25, 2015
Contributor
Idem for docstring args
            MemStruct when instanciating it (e.g. for MemStr encoding or
            MemArray field_type).
        """
        if not isinstance(dst_type, MemField) and\
commial
Nov 25, 2015
Contributor
You can also use the form:
if (... and
    ... and
    ...):
fmonjalet
Nov 25, 2015
Author
Contributor
Will be fixed
        @addr. Equivalent to a pointer dereference assignment in C.
        """
        # Sanity check
        if self.dst_type != val.__class__:
commial
Nov 25, 2015
Contributor
So if the class of val is a subclass, you will always raise a warning. Is this the expected behavior, rather than using isinstance?
fmonjalet
Nov 25, 2015
Author
Contributor
This kind of thing, in a C or C++ compiler, would raise a warning or an error. The idea is that if you subclass your MemStruct to add fields, an assignment of the subclass to the parent should raise a warning. If you subclass it just to add methods, it's okay, but I do not see real use cases for that.
In both cases, casting your subclass to the superclass (via the .cast(<superclass>) method), as you would do in C, is the way to go if you want to get rid of the warning. This is the way to indicate that you know what you are doing. Is this explanation OK for you?
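The behaviour described above can be sketched as follows. Node, BigNode and check_assign are hypothetical names, and cast here only builds an instance of the requested type, whereas the real method is described as reinterpreting the same memory.

```python
import warnings


class Node(object):
    """Hypothetical stand-in for a MemStruct subclass."""

    def cast(self, other_type):
        # Placeholder: the real cast would reuse the same vm and address
        return other_type()


class BigNode(Node):
    """Subclass that, in the scenario discussed, adds fields."""


def check_assign(dst_type, val):
    """Warn when @val is not exactly of type @dst_type; True if clean."""
    if dst_type != val.__class__:
        warnings.warn("Assigning %s where %s is expected"
                      % (val.__class__.__name__, dst_type.__name__))
        return False
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    direct_ok = check_assign(Node, BigNode())            # subclass: warns
    cast_ok = check_assign(Node, BigNode().cast(Node))   # explicit cast: clean
```

An isinstance test would accept BigNode() silently; the exact-class comparison makes the explicit cast the way to state intent.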
                self._type_kwargs == other._type_kwargs

    def __hash__(self):
        return hash(super(Ptr, self).__hash__() + hash(self._dst_type) +
commial
Nov 25, 2015
Contributor
Idem for hash(tuple)
    def get(self, vm, addr):
        return self._il_type(vm, addr)

    def size(self):
commial
Nov 25, 2015
Contributor
Idem property
                self._type_kwargs == other._type_kwargs

    def __hash__(self):
        return hash(hash(self.__class__) + hash(self._il_type) +
commial
Nov 25, 2015
Contributor
Idem hash
                offset += self.field_type.size()

        else:
            raise NotImplementedError(
commial
Nov 25, 2015
Contributor
Even if it is not the case in the rest of the Miasm code, NotImplementedError should be reserved for abstract methods.
When a feature is not implemented, you can raise a RuntimeError, for instance. From the Python documentation:
exception NotImplementedError
This exception is derived from RuntimeError. In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method.
We should probably create an exception in Miasm for this kind of case (FdsException? 😃)
fmonjalet
Nov 25, 2015
Author
Contributor
Thanks, fixed.
+1 for FdsException. :)
        assert ex.get_addr("f1") == ex.get_addr("f2")
    """

    def __init__(self, field_list):
commial
Nov 25, 2015
Contributor
What do you think about using a *field_list instead, avoiding the creation of a list for the caller?
fmonjalet
Nov 25, 2015
Author
Contributor
It could be handy, but I think it would make things inconsistent, or at least less clear, for subclasses like BitField. What do you think?
    def __init__(self, field_list):
        """field_list is a [(name, field)] list, see the class doc"""
        self.field_list = field_list
commial
Nov 25, 2015
Contributor
In the module, you could manage field_list as an OrderedDict, giving direct access to .keys, .values, ... and a test for uniqueness of keys.
fmonjalet
Nov 25, 2015
Author
Contributor
Actually, in a Union, the fields are not ordered. The doc talks about a list of fields, but in fact any iterable is fine (I may update it in that sense).
Maybe it could be useful for the MemStruct._attrs attribute, but I am not convinced it is worth it yet.
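For reference, the OrderedDict suggestion could be sketched like this. build_fields is a hypothetical helper and the field definitions are plain placeholder strings; it accepts any iterable of (name, field) pairs, as noted in the reply above.

```python
from collections import OrderedDict


def build_fields(field_list):
    """Build an OrderedDict from (name, field) pairs, rejecting duplicates."""
    fields = OrderedDict()
    for name, field in field_list:
        if name in fields:
            raise ValueError("duplicate field name: %r" % name)
        fields[name] = field
    return fields


fields = build_fields([("next", "Ptr"), ("data", "Num")])
names = list(fields.keys())
```

Declaration order is preserved for structures, and a repeated name is reported at definition time instead of silently shadowing an earlier field.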
This commit is the first phase of the Type refactor. The PinnedType class has been separated from the more specific PinnedStruct class.
Doc is currently incoherent, impl will also be completed
Array access logic has moved to Array, Pinned(Sized)Array just contains the logic to interface with memory
MemStr.from_str allows to allocate and set a string automatically if ALLOCATOR is set. This avoids allocating a buffer and filling it later.
Shorthand for ("field", SomeMemStruct.get_type()) in a Struct or MemStruct fields definition.
See the test addition for an example. A Struct, Union, or BitField field with no name will be considered anonymous: all its fields will be added to the parent Struct/Union/BitField. This implements this kind of C declaration: struct foo { int a; union { int bar; struct { short baz; short foz; }; }; }
Also added tests and MemArray.get_offset
Ready to merge for me, as soon as tests pass.
Hey, Santa Claus brought another heavy gift!
This PR introduces an API to easily interact with C structures in miasm's sandbox.
The example/jitter/memstruct.py example file may be the best introduction to this feature. As a spoiler, here is how a linked list can be represented with this API (extracted from the aforementioned file):
Lots of FIXME/TODO are left there for now and lots of choices can be discussed, please tell me what you think!
-- Florent