Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow in _yr_re_emit() #674

Closed
fumfel opened this issue May 29, 2017 · 10 comments

Comments

Projects
None yet
4 participants
@fumfel
Copy link

commented May 29, 2017

Stack overflow in _yr_re_emit()

Tested on Git HEAD: cdbacf5

Payload

To reproduce: yara yara_so_yr_re_emit.yar strings

ASAN:

==29878==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc175c7d7c (pc 0x00000045d4bb bp 0x7ffc175c9190 sp 0x7ffc175c7d70 T0)
    #0 0x45d4ba in _yr_re_emit XYZ/yara/libyara/re.c:754
    #1 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    ---------------------- !SNIP! ----------------------
    #246 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    #247 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    #248 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    #249 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    #250 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931
    #251 0x45ecb9 in _yr_re_emit XYZ/yara/libyara/re.c:931

SUMMARY: AddressSanitizer: stack-overflow XYZ/yara/libyara/re.c:754 _yr_re_emit
==29878==ABORTING

plusvic added a commit that referenced this issue May 29, 2017

plusvic added a commit that referenced this issue May 29, 2017

@fumfel

This comment has been minimized.

Copy link
Author

commented Jun 1, 2017

Slightly different issue in 5e2d279

Payload (about 78 kb)

ASAN:

==28982==ERROR: AddressSanitizer: stack-overflow on address 0x7fff4a83c8ec (pc 0x00000045c45b bp 0x7fff4a83dd00 sp 0x7fff4a83c8e0 T0)
    #0 0x45c45a in _yr_re_emit XYZ/yara/libyara/re.c:738
    #1 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    ---------------------- !SNIP! ----------------------
    #246 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    #247 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    #248 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    #249 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    #250 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915
    #251 0x45dba9 in _yr_re_emit XYZ/yara/libyara/re.c:915

SUMMARY: AddressSanitizer: stack-overflow XYZ/yara/libyara/re.c:738 _yr_re_emit
==28982==ABORTING

plusvic pushed a commit that referenced this issue Jun 1, 2017

Victor Manuel Alvarez

@plusvic plusvic closed this Jun 5, 2017

@wxsBSD

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2017

I just debugged an issue for a friend who had a bunch of rules break due to the limit of 2000. He has some rules which have very long hex strings in them, and bumping the value up works fine for him but I'm curious if you would consider bumping it up to 3000 (that more than covers the one rule I saw from him). I realize arbitrary limits are, by definition, arbitrary but I'm hoping we can bump this as to not break rules that exist.

@plusvic

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2017

Yes sure, as long as it avoids the stack overflow we should go for the largest number.

plusvic added a commit that referenced this issue Jun 6, 2017

@wxsBSD

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2017

Just to make sure I could reproduce the problem I checked out cdbacf5 and built with --enable-address-sanitizer --disable-optimization --enable-debug but I'm unable to reproduce it.

wxs@wxs-mbp yara % ./yara ~/Desktop/yara_so_yr_re_emit.txt /bin/ls
/Users/wxs/Desktop/yara_so_yr_re_emit.txt(8): error: syntax error, unexpected _IDENTIFIER_, expecting _CONDITION_
wxs@wxs-mbp yara %

My plan was once I knew how to trigger it I would keep find the largest value for RE_MAX_AST_LIMIT which wouldn't crash. Am I just not triggering this correctly?

@plusvic

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2017

@wxsBSD actually I had a similar problem and had to create my own rule to trigger the bug. I just used a very long regexp, like /xxxxxxxxx...{a few thousand more}.....xxxxxxxxxxxx/. Maybe @fumfel has a smaller stack size limit and the bug is triggered in his system but not in yours. Check your stack size limit with ulimit -s. In my case is 8192KB, @fumfel, what's the maximum stack size in your test system?

@wxsBSD

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2017

I forgot about ulimit -s. I think it's probably best to leave it at 3000 for now and see if anyone complains, and if so we can make it runtime dependent or bump it again.

@fumfel

This comment has been minimized.

Copy link
Author

commented Jun 7, 2017

@plusvic 8192KB, but AFAIR ASAN for their purposes have different stack limits - 2/4 MB.

@marnao

This comment has been minimized.

Copy link

commented Jun 8, 2017

We had many rules break with the 2000 limit. I had to go all the way up to 6000. Is this safe? Just potentially uses more resources? Not sure I understand the implications.

@plusvic

This comment has been minimized.

Copy link
Collaborator

commented Jun 8, 2017

@marnao a 6000 limit should be safe too. In a 64 bits system my estimation is that each recursive call to _yr_re_emit() consumes around 180 bytes of stack space. So, setting the limit to 10000 will consume around 1.8MB of stack. With the typical 8MB stack space that's less than 25%.

However I wonder what kind of regular expressions are hitting the 2000 limit, I'm afraid that by raising the limit I could be encouraging massive regular expressions that don't actually make sense.

@marnao

This comment has been minimized.

Copy link

commented Jun 8, 2017

@plusvic thanks for the quick response as always. That's very helpful context.

I think you have a legitimate concern about encouraging poor use cases. Some of our worst offending rules were externally sourced, i.e. we did not write them. I can't speak for these, but the signatures that we wrote that resulted in lengthy hex strings were either:

  1. automatically generated signatures on PE artifacts
  2. static constants used to identify a particular kind of compression

Hope this helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.