
database.get.string() incorrectly includes null terminator in string #20

Closed
x0rloser opened this issue Nov 15, 2018 · 3 comments

@x0rloser

If a C-style string "abc" exists in the IDB at address 0, then database.get.string(0) returns the string "abc\0" instead of "abc", so its length is 4 instead of 3.

Using the returned string in Python formatting, such as print("foo %s bar" % database.get.string(0)), prints "foo abc" instead of the expected "foo abc bar".
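The symptom can be reproduced in plain Python 3 without IDA. This sketch hard-codes the bytes "abc\0" in place of an actual database read, and it assumes (not verified here) that the output window treats text as NUL-terminated; that behaviour is modeled by the hypothetical as_c_string helper.

```python
# Simulate the buggy read: the raw item bytes still include the C terminator.
raw = b"abc\0"
buggy = raw.decode("ascii")      # roughly what the old db.get.string(0) handed back
print(len(buggy))                # 4, not 3

formatted = "foo %s bar" % buggy
print(repr(formatted))           # 'foo abc\x00 bar'

# Hypothetical consumer that treats text as NUL-terminated (C-style):
# everything after the first NUL is silently dropped.
def as_c_string(text):
    return text.split("\0", 1)[0]

print(as_c_string(formatted))    # foo abc
```

The Python string itself is intact; the " bar" only disappears once something downstream stops reading at the embedded NUL.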

I am using the latest ida-minsc code from GitHub with IDA32 v7.1 (64-bit) for Windows.

@arizvisa
Owner

arizvisa commented Nov 15, 2018

Okay. You caught me. I was being lazy and didn't want to implement decoding for all of the available strtypes, since db.get.string() works regardless of whether a string is defined at the address and also lets you specify the length.

The reason idaapi.get_strlit_contents(ea, idaapi.get_item_size(ea), idaapi.get_str_type(ea)) never made it into db.get.string is that it actually UTF-8 encodes UTF-16LE strings (or any MBCS string, really). This changes the bytes of the actual string, when it should really return it as a unicode type. So, to preserve those bytes, I always just called .rstrip() and then .decode() on every string I read from a database.

Here is an example of this UTF-8 encoding applied to a UTF-16LE string (IDA 7.1.180227) whose raw bytes are: 9a 3f 43 56 4f 54 57 3e 43 62 00 00

Python>repr(idaapi.get_strlit_contents(h(), idaapi.get_item_size(h()), idaapi.get_str_type(h())))
'\xe3\xbe\x9a\xe5\x99\x83\xe5\x91\x8f\xe3\xb9\x97\xe6\x89\x83'

With db.get.string(), however, decoding the raw bytes yourself gives the correct string:

Python>repr(db.get.string().decode('utf-16le'))
u'\u3f9a\u5643\u544f\u3e57\u6243\x00'
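Both console results can be reproduced in plain Python 3 (no IDA required) from the 12 bytes listed above. The sketch models, rather than calls, get_strlit_contents, assuming its output is simply the UTF-8 encoding of the decoded characters without the terminator; that assumption matches the bytes shown in the first console line.

```python
# The 12 raw bytes listed in the example (UTF-16LE text plus a 2-byte NUL).
raw = bytes.fromhex("9a3f43564f54573e43620000")

# Decoding the raw bytes yourself, as with db.get.string().decode('utf-16le'):
text = raw.decode("utf-16le")
print(text == "\u3f9a\u5643\u544f\u3e57\u6243\x00")  # True

# Model of what get_strlit_contents() returned in IDA 7.1: the same characters,
# without the terminator, re-encoded as UTF-8. The original bytes are gone.
ida_style = text.rstrip("\0").encode("utf-8")
print(ida_style == b"\xe3\xbe\x9a\xe5\x99\x83\xe5\x91\x8f\xe3\xb9\x97\xe6\x89\x83")  # True

# Getting the original bytes back means decoding the UTF-8 and re-encoding as UTF-16LE.
print(ida_style.decode("utf-8").encode("utf-16le") == raw[:-2])  # True
```

Each UTF-16LE code unit here becomes three UTF-8 bytes, which is why the 10 bytes of text grow to 15.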

I just didn't want to implement all of the ASCSTR types, but I guess I will.
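Decoding several string types mostly comes down to knowing the character width and finding the terminator at that width. The following is a minimal sketch under stated assumptions, not ida-minsc's or IDA's implementation: decode_c_string and the CODECS table are hypothetical, and latin-1 is an arbitrary single-byte choice that maps every byte to a code point. It looks for the first aligned all-zero code unit, which avoids the pitfall of stripping a zero high byte that belongs to a UTF-16LE character such as "A".

```python
# Hypothetical mapping from character width (bytes per code unit) to codec.
CODECS = {1: "latin-1", 2: "utf-16le", 4: "utf-32le"}

def decode_c_string(data, width):
    """Decode a NUL-terminated string whose code units are `width` bytes wide.

    The terminator is the first *aligned* all-zero code unit, so zero bytes
    inside a wider character are not mistaken for the end of the string.
    """
    codec = CODECS[width]
    nul = b"\0" * width
    for offset in range(0, len(data) - width + 1, width):
        if data[offset:offset + width] == nul:
            return data[:offset].decode(codec)
    # No terminator found: decode every complete code unit.
    usable = len(data) - len(data) % width
    return data[:usable].decode(codec)

print(decode_c_string(b"abc\0", 1))                              # abc
print(decode_c_string("AB".encode("utf-16le") + b"\0\0", 2))     # AB
print(decode_c_string(b"A\0\0\0", 2))                            # A
# By contrast, stripping every trailing zero byte eats half of "A":
print(b"A\0\0\0".rstrip(b"\0"))                                  # b'A'
```

The odd-length b'A' left by a byte-level strip can no longer be decoded as UTF-16LE, which is one reason to locate the terminator per code unit.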

Since you're only using C-style strings, you can put something like the following in your ~/.idapythonrc.py until I get it done.

import operator
db.get.string = staticmethod(fcompose(db.get.string, operator.methodcaller('rstrip', '\0')))  # strip trailing NULs
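The composition can be checked outside IDA. In this sketch, fcompose is a hypothetical stand-in whose left-to-right behaviour is inferred from how it is used in the rc-file line, and get_string is a fake replacing the unpatched db.get.string(). It also shows that str.rstrip() with no arguments only strips whitespace, and "\0" is not whitespace, so the NUL has to be named explicitly.

```python
import operator

def fcompose(*funcs):
    """Hypothetical stand-in for ida-minsc's fcompose: apply funcs left to right."""
    def composed(*args, **kwargs):
        result = funcs[0](*args, **kwargs)
        for f in funcs[1:]:
            result = f(result)
        return result
    return composed

# Fake for the unpatched db.get.string(), which kept the terminator.
def get_string(ea):
    return {0: "abc\0"}[ea]

no_args = fcompose(get_string, operator.methodcaller("rstrip"))
with_nul = fcompose(get_string, operator.methodcaller("rstrip", "\0"))

print(repr(no_args(0)))    # 'abc\x00'  (rstrip() leaves the NUL alone)
print(repr(with_nul(0)))   # 'abc'
```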

@arizvisa
Owner

@x0rloser, PR #21 should fix this. I submitted the PR for review, then I reviewed my own code and realized that I'm just so awesome and so I merged it upstream because I'm always right and of course it fixes the issue...so, yeah...

Anyways, you should just need to do a git pull to re-sync, and you should be good to go. It also handles the other string types without the UTF-8 re-encoding I mentioned. If that didn't fix it for you for some reason, or there's some particular case I missed, just re-open this issue and lmk.

Thanks for your contribution, sir

@x0rloser
Author

Can confirm it's fixed (for C strings, at least). Awesomeness ftw ;)

This issue was closed.