New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dalvik executable (DEX) module #604

Closed
wants to merge 8 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@CalebFenton

CalebFenton commented Feb 9, 2017

This is a module for parsing Dalvik executables for Android. It makes a lot of information available to yara rules that opens the door for some useful yara rules, e.g. for compiler fingerprinting. Here are some examples: https://github.com/rednaga/APKiD/blob/master/apkid/rules/dex/compilers.yara

This module was originally proposed here: #484

The code's fairly robust. I've used it on 100k dex files with a very small number of failures and @strazzere did some extensive fuzzing to harden against possible exploitation.

I'm not just dropping this here. I'm willing to fix it up to make it acceptable, but I'm not sure if this is something y'all even want.

}
PDEX_HEADER dex_header = dex_get_header(block_data, block->size);
if (dex_header != NULL) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

if dex_header is NULL should not return SUCCESS

@trufae

trufae Apr 14, 2017

Contributor

if dex_header is NULL should not return SUCCESS

This comment has been minimized.

@CalebFenton

CalebFenton May 29, 2017

It looks like pe.c and dotnet.c will fail gracefully by returning SUCCESS if the header's can't be parsed. I'm just trying to simulate this. If I don't do this and return INVALID_FILE or something, any rule which imports dex and doesn't explicitly check the dex magic itself will error out. Am I misreading how pe.c and other modules are supposed to work?

@CalebFenton

CalebFenton May 29, 2017

It looks like pe.c and dotnet.c will fail gracefully by returning SUCCESS if the header's can't be parsed. I'm just trying to simulate this. If I don't do this and return INVALID_FILE or something, any rule which imports dex and doesn't explicitly check the dex magic itself will error out. Am I misreading how pe.c and other modules are supposed to work?

Show outdated Hide outdated libyara/modules/dex.c
return ERROR_SUCCESS;
}
bool is_dex_corrupt(PDEX_HEADER dex_header) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

static?

@trufae

trufae Apr 14, 2017

Contributor

static?

Show outdated Hide outdated libyara/modules/dex.c
}
bool is_dex_corrupt(PDEX_HEADER dex_header) {
if (*dex_header->endian_tag != 0x12345678) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

use constant define instead of inlined magic

@trufae

trufae Apr 14, 2017

Contributor

use constant define instead of inlined magic

set_integer(*dex_header->checksum, module, "header.checksum");
unsigned long signature_size = member_size(DEX_HEADER, signature);
char *signature = yr_malloc(signature_size + 1);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

check if allocation != NULL

@trufae

trufae Apr 14, 2017

Contributor

check if allocation != NULL

Show outdated Hide outdated libyara/modules/dex.c
void load_string_ids(PDEX_HEADER dex_header, uint8_t *data, size_t data_size, YR_OBJECT *module) {
uint32_t offset = *dex_header->string_ids_offset;
int string_ids_size = sizeof(STRING_ID_ITEM[*dex_header->string_ids_size]);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

this is const

@trufae

trufae Apr 14, 2017

Contributor

this is const

Show outdated Hide outdated libyara/modules/dex.c
unsigned int uleb_size = 0;
uint64_t utf16_size = read_uleb128(&string_data, &uleb_size);
uint64_t string_size = get_bytes_until_null(&string_data);
char *string = yr_malloc(string_size + 1);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

check null

@trufae

trufae Apr 14, 2017

Contributor

check null

Show outdated Hide outdated libyara/modules/dex.c
return;
}
TYPE_ID_ITEM *type_ids = yr_malloc(type_ids_size);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

check null

@trufae

trufae Apr 14, 2017

Contributor

check null

Show outdated Hide outdated libyara/modules/dex.c
memcpy(proto_ids, data + offset, proto_ids_size);
proto_ids_size = *dex_header->proto_ids_size * sizeof(PROTO_ID_ITEM);
int i,p;

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

missing space

@trufae

trufae Apr 14, 2017

Contributor

missing space

Show outdated Hide outdated libyara/modules/dex.c
yr_free(proto_ids);
}

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

empty line

@trufae

trufae Apr 14, 2017

Contributor

empty line

Show outdated Hide outdated libyara/modules/dex.c
memcpy(field_ids, data + offset, field_ids_size);
field_ids_size = *dex_header->field_ids_size * sizeof(FIELD_ID_ITEM);
int i = 0;

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

those vars are initialized twice

@trufae

trufae Apr 14, 2017

Contributor

those vars are initialized twice

Show outdated Hide outdated libyara/modules/dex.c
int i = 0;
int p = 0;
for (i = 0, p = 0; p < field_ids_size; i += 1, p += sizeof(FIELD_ID_ITEM)) {
uint16_t class_idx = *field_ids[i].class_idx;

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

this will not work on big endian machines

@trufae

trufae Apr 14, 2017

Contributor

this will not work on big endian machines

This comment has been minimized.

@CalebFenton

CalebFenton May 26, 2017

Could you help me figure out why this is? I'm not sure how to fix it.

@CalebFenton

CalebFenton May 26, 2017

Could you help me figure out why this is? I'm not sure how to fix it.

This comment has been minimized.

@radare

radare May 27, 2017

You should always construct the integer values by applying or operations on the bytes directly. You can ifdef assuming a host and target endian, but compilers should be smart enough to optimize those cases if you use some generic code for this.

You can have a look at the r_endian.h we use in radare2, to get an idea about the primitives you'll need and how to use them

@radare

radare May 27, 2017

You should always construct the integer values by applying or operations on the bytes directly. You can ifdef assuming a host and target endian, but compilers should be smart enough to optimize those cases if you use some generic code for this.

You can have a look at the r_endian.h we use in radare2, to get an idea about the primitives you'll need and how to use them

Show outdated Hide outdated libyara/modules/dex.c
return;
}
MAP_LIST *map_list = yr_malloc(map_data_size);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

null check

@trufae

trufae Apr 14, 2017

Contributor

null check

Show outdated Hide outdated libyara/modules/dex.c
uint32_t offset = *dex_header->map_offset;
uint8_t *pmap_list = data + offset;
size_t map_size = *pmap_list;
size_t map_data_size = sizeof(MAP_LIST) + (map_size * sizeof(MAP_ITEM));

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

dont use multiply for sizes, it may overflow size_t. use calloc instead of malloc to handle this properly

@trufae

trufae Apr 14, 2017

Contributor

dont use multiply for sizes, it may overflow size_t. use calloc instead of malloc to handle this properly

Show outdated Hide outdated libyara/modules/dex.c
char *get_string_item(uint32_t index, YR_OBJECT *module) {
SIZED_STRING *sized_string = get_string(module, "string_ids[%i].value", index);
char *string = yr_malloc(sized_string->length + 1);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

null check

@trufae

trufae Apr 14, 2017

Contributor

null check

Show outdated Hide outdated libyara/modules/dex.c
uint8_t *ptype_list = data + offset;
size_t type_list_size = *ptype_list;
int size = sizeof(TYPE_LIST) + (type_list_size * sizeof(TYPE_ITEM));
if ((unsigned long)(size) > data_size) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

scary cast

@trufae

trufae Apr 14, 2017

Contributor

scary cast

Show outdated Hide outdated libyara/modules/dex.c
return "";
}
TYPE_LIST *type_list = yr_malloc(size);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

null check

@trufae

trufae Apr 14, 2017

Contributor

null check

Show outdated Hide outdated libyara/modules/dex.c
int string_size = 0;
int i = 0;
for (i = 0; i < type_list_size; i++) {
char *param = get_string_item(get_integer(module, "type_ids[%i].descriptor_idx", *type_list->type_items[i].type_idx), module);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

null chk

@trufae

trufae Apr 14, 2017

Contributor

null chk

Show outdated Hide outdated libyara/modules/dex.c
params[i] = param;
}
char *parameters = yr_malloc(string_size + 1);

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

null chk

@trufae

trufae Apr 14, 2017

Contributor

null chk

Show outdated Hide outdated libyara/modules/dex.c
return parameters;
}

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

empty line

@trufae

trufae Apr 14, 2017

Contributor

empty line

Show outdated Hide outdated libyara/modules/dex.c
}
uint64_t read_uleb128(uint8_t **buf, unsigned *uleb_size) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

missing input buffer size which may turn into out of bounds access

@trufae

trufae Apr 14, 2017

Contributor

missing input buffer size which may turn into out of bounds access

Show outdated Hide outdated libyara/modules/dex.c
}
uint64_t get_bytes_until_null(const uint8_t **buf) {

This comment has been minimized.

@trufae

trufae Apr 14, 2017

Contributor

this is done by strlen, when working with buffers is usually good to know the size of it to avoid oobs

@trufae

trufae Apr 14, 2017

Contributor

this is done by strlen, when working with buffers is usually good to know the size of it to avoid oobs

@plusvic plusvic added this to the v3.7.0 milestone May 22, 2017

@CalebFenton

This comment has been minimized.

Show comment
Hide comment
@CalebFenton

CalebFenton May 26, 2017

Thanks for the detailed CR @trufae. I really appreciate it. This is, in fact, my first real CR for C code. I'll push a new commit when I get home. Sorry it took so long!

CalebFenton commented May 26, 2017

Thanks for the detailed CR @trufae. I really appreciate it. This is, in fact, my first real CR for C code. I'll push a new commit when I get home. Sorry it took so long!

CalebFenton added some commits May 26, 2017

Check malloc, fix potential oobs
Also, add support DEX036, DEX037
Return accurate errors when something weird happens
@trufae

This comment has been minimized.

Show comment
Hide comment
@trufae

trufae Jul 5, 2017

Contributor

Fails to build on windows because of this:

libtool: compile:  x86_64-w64-mingw32-gcc -DPACKAGE_NAME=\"yara\" -DPACKAGE_TARNAME=\"yara\" -DPACKAGE_VERSION=\"3.6.0\" "-DPACKAGE_STRING=\"yara 3.6.0\"" -DPACKAGE_BUGREPORT=\"vmalvarez@virustotal.com\" -DPACKAGE_URL=\"\" -DPACKAGE=\"yara\" -DVERSION=\"3.6.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_LIBM=1 -DHAVE_LIBM=1 -I. -Wall -Wno-deprecated-declarations -std=gnu99 -I./include -O3 -fvisibility=hidden -MT modules/dex.lo -MD -MP -MF modules/.deps/dex.Tpo -c modules/dex.c  -DDLL_EXPORT -DPIC -o modules/.libs/dex.o
In file included from /usr/lib/gcc/x86_64-w64-mingw32/4.8/include/stddef.h:1:0,
                 from /usr/share/mingw-w64/include/stdint.h:32,
                 from /usr/lib/gcc/x86_64-w64-mingw32/4.8/include/stdint.h:9,
                 from modules/dex.c:2:
modules/dex.c:203:12: error: static declaration of ‘_errno’ follows non-static declaration
 static int errno;
            ^
/usr/share/mingw-w64/include/stddef.h:18:31: note: previous declaration of ‘_errno’ was here
   _CRTIMP extern int *__cdecl _errno(void);
                               ^
modules/dex.c:203:12: warning: ‘_errno’ used but never defined [enabled by default]
 static int errno;
            ^
make[1]: *** [modules/dex.lo] Error 1
make[1]: Leaving directory `/home/travis/build/VirusTotal/yara/libyara'
make: *** [all-recursive] Error 1

apart from this it lgtm

Contributor

trufae commented Jul 5, 2017

Fails to build on windows because of this:

libtool: compile:  x86_64-w64-mingw32-gcc -DPACKAGE_NAME=\"yara\" -DPACKAGE_TARNAME=\"yara\" -DPACKAGE_VERSION=\"3.6.0\" "-DPACKAGE_STRING=\"yara 3.6.0\"" -DPACKAGE_BUGREPORT=\"vmalvarez@virustotal.com\" -DPACKAGE_URL=\"\" -DPACKAGE=\"yara\" -DVERSION=\"3.6.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_LIBM=1 -DHAVE_LIBM=1 -I. -Wall -Wno-deprecated-declarations -std=gnu99 -I./include -O3 -fvisibility=hidden -MT modules/dex.lo -MD -MP -MF modules/.deps/dex.Tpo -c modules/dex.c  -DDLL_EXPORT -DPIC -o modules/.libs/dex.o
In file included from /usr/lib/gcc/x86_64-w64-mingw32/4.8/include/stddef.h:1:0,
                 from /usr/share/mingw-w64/include/stdint.h:32,
                 from /usr/lib/gcc/x86_64-w64-mingw32/4.8/include/stdint.h:9,
                 from modules/dex.c:2:
modules/dex.c:203:12: error: static declaration of ‘_errno’ follows non-static declaration
 static int errno;
            ^
/usr/share/mingw-w64/include/stddef.h:18:31: note: previous declaration of ‘_errno’ was here
   _CRTIMP extern int *__cdecl _errno(void);
                               ^
modules/dex.c:203:12: warning: ‘_errno’ used but never defined [enabled by default]
 static int errno;
            ^
make[1]: *** [modules/dex.lo] Error 1
make[1]: Leaving directory `/home/travis/build/VirusTotal/yara/libyara'
make: *** [all-recursive] Error 1

apart from this it lgtm

@trufae

This comment has been minimized.

Show comment
Hide comment
@trufae

trufae Jul 5, 2017

Contributor

use different variable name, errno is suposed to be used by libc not by the library code

Contributor

trufae commented Jul 5, 2017

use different variable name, errno is suposed to be used by libc not by the library code

@strazzere

This comment has been minimized.

Show comment
Hide comment
@strazzere

strazzere Jul 6, 2017

@trufae Done - tested on my Windows 7 machine as well, though might be good to triple check as that build environment isn't often up to date (analysis vm mainly)

strazzere commented Jul 6, 2017

@trufae Done - tested on my Windows 7 machine as well, though might be good to triple check as that build environment isn't often up to date (analysis vm mainly)

@plusvic plusvic modified the milestones: v3.7.0, v3.8.0 Oct 31, 2017

@trufae

This comment has been minimized.

Show comment
Hide comment
@trufae

trufae Nov 2, 2017

Contributor

that looks red for appveyour and it cant be merged because there are conflicts with master, pls update

Contributor

trufae commented Nov 2, 2017

that looks red for appveyour and it cant be merged because there are conflicts with master, pls update

@plusvic

I believe this module needs an important rework before being merged into master branch.

char *magic = yr_malloc(magic_size + 1);
if (magic == NULL) {
error_num = ERROR_INSUFICIENT_MEMORY;
return -1;

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

Using a global variable for the error code is not thread-safe. It seems that you are trying to follow the convention used by the standard C library, which uses errno, but errno is stored in thread-local storage. You can simply return ERROR_INSUFICIENT_MEMORY and ERROR_SUCCESS if everything went fine.

@plusvic

plusvic Nov 2, 2017

Collaborator

Using a global variable for the error code is not thread-safe. It seems that you are trying to follow the convention used by the standard C library, which uses errno, but errno is stored in thread-local storage. You can simply return ERROR_INSUFICIENT_MEMORY and ERROR_SUCCESS if everything went fine.

static int load_header(PDEX_HEADER dex_header, YR_OBJECT *module) {
unsigned long magic_size = member_size(DEX_HEADER, magic);
char *magic = yr_malloc(magic_size + 1);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's a memory leak here. The memory allocated by this yr_malloc is never freed. Actually you don't need to do a yr_malloc at all, you can simply do:

set_sized_string(dex_header->magic, magic_size, module, ""header.magic")

@plusvic

plusvic Nov 2, 2017

Collaborator

There's a memory leak here. The memory allocated by this yr_malloc is never freed. Actually you don't need to do a yr_malloc at all, you can simply do:

set_sized_string(dex_header->magic, magic_size, module, ""header.magic")

}
memcpy(signature, dex_header->signature, signature_size);
signature[signature_size] = '\0';
set_string(signature, module, "header.signature");

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

Same thing with signature you don't need to do a yr_malloc use set_sized_string instead.

@plusvic

plusvic Nov 2, 2017

Collaborator

Same thing with signature you don't need to do a yr_malloc use set_sized_string instead.

static int load_string_ids(PDEX_HEADER dex_header, uint8_t *data, size_t data_size, YR_OBJECT *module) {
uint32_t offset = yr_le32toh(*dex_header->string_ids_offset);
uint32_t string_ids_size = sizeof(STRING_ID_ITEM[*dex_header->string_ids_size]);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

This won't do what you intend. It seems that your intention is calculating the size of an array of STRING_ID_ITEM structs with string_ids_size items, but sizeof is a compile-time operation, so you won't get a size based on the value of string_ids_size in the header at runtime. In fact this sizeof will translate to 0, at least with clang.

@plusvic

plusvic Nov 2, 2017

Collaborator

This won't do what you intend. It seems that your intention is calculating the size of an array of STRING_ID_ITEM structs with string_ids_size items, but sizeof is a compile-time operation, so you won't get a size based on the value of string_ids_size in the header at runtime. In fact this sizeof will translate to 0, at least with clang.

#define MODULE_NAME dex
typedef struct {

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

Why all fields in these structs are 1-sized arrays? O_o

This leads to a lot of confusing expressions like *dex_header->string_ids_size where they should be simply dex_header->string_ids_size. In fact I was wondering how *dex_header->string_ids_size compiled at all if dex_header was a pointer to a struct, until I discovered all these 1-element arrays.

@plusvic

plusvic Nov 2, 2017

Collaborator

Why all fields in these structs are 1-sized arrays? O_o

This leads to a lot of confusing expressions like *dex_header->string_ids_size where they should be simply dex_header->string_ids_size. In fact I was wondering how *dex_header->string_ids_size compiled at all if dex_header was a pointer to a struct, until I discovered all these 1-element arrays.

}
memcpy(string_ids, data + offset, string_ids_size);
string_ids_size = *dex_header->string_ids_size * sizeof(STRING_ID_ITEM);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

This time string_ids_size is calculated right, but this should be done earlier in this function.

@plusvic

plusvic Nov 2, 2017

Collaborator

This time string_ids_size is calculated right, but this should be done earlier in this function.

string_ids_size = *dex_header->string_ids_size * sizeof(STRING_ID_ITEM);
int i;
int p;
for (i = 0, p = 0; p < string_ids_size; i += 1, p += sizeof(STRING_ID_ITEM)) {

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no need to keep both p and i for controlling this loop. You can use just:

i < *dex_header->string_ids_size

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no need to keep both p and i for controlling this loop. You can use just:

i < *dex_header->string_ids_size

return 0;
}
size_t string_size = strlen((char *)string_data);
char *string = yr_malloc(string_size + 1);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

You don't need to do a yr_malloc use set_sized_string instead.

@plusvic

plusvic Nov 2, 2017

Collaborator

You don't need to do a yr_malloc use set_sized_string instead.

error_num = ERROR_INSUFICIENT_MEMORY;
return -1;
}
memcpy(type_ids, data + offset, type_ids_size);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no verification that offset is within buffer boundaries.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no verification that offset is within buffer boundaries.

error_num = ERROR_INSUFICIENT_MEMORY;
return -1;
}
memcpy(class_defs, data + offset, class_defs_size);

This comment has been minimized.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no verification that offset is within buffer boundaries.

@plusvic

plusvic Nov 2, 2017

Collaborator

There's no verification that offset is within buffer boundaries.

@CalebFenton

This comment has been minimized.

Show comment
Hide comment
@CalebFenton

CalebFenton Nov 2, 2017

Thanks for the review @plusvic. I look forward to incorporating the changes.

CalebFenton commented Nov 2, 2017

Thanks for the review @plusvic. I look forward to incorporating the changes.

@plusvic

This comment has been minimized.

Show comment
Hide comment
@plusvic

plusvic Feb 7, 2018

Collaborator

A new module for dex files has been already committed in e6e4360

Collaborator

plusvic commented Feb 7, 2018

A new module for dex files has been already committed in e6e4360

@plusvic plusvic closed this Feb 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment