Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringDataItem decoding removed #689

Merged
merged 17 commits into from May 12, 2019
Merged

Conversation

nlachfr
Copy link
Contributor

@nlachfr nlachfr commented May 1, 2019

As told in #686, not decoding the strings during parsing could bring a huge performance gain. I did a new class MUTF8String for replacing the results of decode(b) and patch_string(decode(b)) and capable of decoding when needed.

Test with patch_string(decode(b)) :

Runtime : 28.01447200s

              Name               | TotTime  |  Calls   | Source

patch_string                     | 7.291792 |   537811 | mutf8.py:73    
__next__                         | 4.955242 | 16682452 | mutf8.py:57    
decode                           | 3.113555 |   537811 | mutf8.py:1     
read                             | 1.876643 |  5206598 | bytecode.py:795   
<built-in method builtins.ord>   | 1.295802 | 37497133 |
read_null_terminated_string      | 1.071799 |   102793 | dvm.py:97    
<built-in method builtins.chr>   | 0.871598 | 16144642 |
get_byte                         | 0.658616 |  1021791 | dvm.py:200   
<built-in method builtins.len>   | 0.657512 | 17236692 |
get_string                       | 0.512339 |   498603 | dvm.py:7277  
get                              | 0.416270 |   537811 | dvm.py:1904  
<built-in method builtins.isinst | 0.388776 |  5339444 |
__init__                         | 0.388250 |    77741 | dvm.py:6717  
<built-in method _struct.unpack> | 0.256094 |  2400031 |
__init__                         | 0.210497 |    77741 | dvm.py:6483  
readuleb128                      | 0.201282 |   566395 | dvm.py:204   
__init__                         | 0.191760 |   100931 | dvm.py:2421  
<method 'append' of 'bytearray'  | 0.182567 |  2552524 |
<listcomp>                       | 0.182123 |        2 | dvm.py:7028  
__init__                         | 0.181986 |    82413 | dvm.py:2820  

Test with MUTF8String(b) :

Runtime : 10.38462600s

              Name               | TotTime  |  Calls   | Source

read                             | 1.849554 |  5206598 | bytecode.py:795   
read_null_terminated_string      | 1.111465 |   102793 | dvm.py:97    
get_string                       | 0.661906 |   498603 | dvm.py:7276  
get_byte                         | 0.621419 |  1021791 | dvm.py:200   
<built-in method builtins.isinst | 0.423100 |  5993425 |
from_bytes                       | 0.414873 |   653981 | mutf8.py:105   
__init__                         | 0.355376 |    77741 | dvm.py:6482  
__init__                         | 0.311230 |   653981 | mutf8.py:93    
__init__                         | 0.302810 |    77741 | dvm.py:6716  
<built-in method _struct.unpack> | 0.245051 |  2400031 |
get_type_ref                     | 0.201302 |   306893 | dvm.py:7342  
readuleb128                      | 0.199724 |   566395 | dvm.py:204   
<built-in method builtins.ord>   | 0.184095 |  5207851 |
<method 'append' of 'bytearray'  | 0.183919 |  2552524 |
add_type_item                    | 0.175883 |       35 | dvm.py:7234  
__init__                         | 0.165159 |   100931 | dvm.py:2420  
get                              | 0.164683 |   537811 | dvm.py:1902  
_load_elements                   | 0.144982 |    44096 | dvm.py:3373  
__init__                         | 0.142053 |     9894 | dvm.py:1456  
get_type                         | 0.126710 |   306893 | dvm.py:7327 

It's also possible to optimize read_null_terminated_string as I did in #686.

Resolves : #684

androguard/core/bytecode.py Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented May 2, 2019

Codecov Report

Merging #689 into master will increase coverage by 0.07%.
The diff coverage is 85.91%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #689      +/-   ##
=========================================
+ Coverage   72.53%   72.6%   +0.07%     
=========================================
  Files          50      50              
  Lines       16057   16188     +131     
=========================================
+ Hits        11647   11754     +107     
- Misses       4410    4434      +24
Impacted Files Coverage Δ
androguard/decompiler/dad/decompile.py 51.97% <ø> (ø) ⬆️
androguard/core/bytecode.py 43.62% <ø> (-0.23%) ⬇️
androguard/cli/main.py 34.39% <100%> (ø) ⬆️
androguard/core/analysis/analysis.py 77.15% <100%> (+0.25%) ⬆️
tests/test_dexcodeparsing.py 94.87% <100%> (ø) ⬆️
tests/test_strings.py 96.29% <100%> (-0.14%) ⬇️
androguard/decompiler/dad/writer.py 92.71% <100%> (-3.52%) ⬇️
androguard/decompiler/decompiler.py 26.91% <100%> (ø) ⬆️
tests/test_decompiler.py 97.5% <100%> (ø) ⬆️
androguard/core/mutf8.py 81.3% <81.3%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55f282b...a39ff67. Read the comment docs.

Copy link
Contributor Author

@nlachfr nlachfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seems to be ok according to tests.

Test with last updates :

Runtime : 7.62316800s

              Name               | TotTime  |  Calls   | Source

read                             | 0.971308 |  2654552 | bytecode.py:795   
get_string                       | 0.612781 |   498603 | dvm.py:7280  
from_bytes                       | 0.356392 |   653981 | mutf8.py:105   
__init__                         | 0.348029 |    77741 | dvm.py:6486  
get_byte                         | 0.323302 |  1021791 | dvm.py:204   
__init__                         | 0.312980 |   653981 | mutf8.py:93    
__init__                         | 0.303062 |    77741 | dvm.py:6720  
<built-in method builtins.isinst | 0.255098 |  3441379 |
<built-in method _struct.unpack> | 0.241028 |  2400031 |
get_type_ref                     | 0.224705 |   306893 | dvm.py:7346  
readuleb128                      | 0.195157 |   566395 | dvm.py:208   
add_type_item                    | 0.175564 |       35 | dvm.py:7238  
__init__                         | 0.165144 |   100931 | dvm.py:2424  
get                              | 0.164950 |   537811 | dvm.py:1906  
__init__                         | 0.141590 |     9894 | dvm.py:1460  
_load_elements                   | 0.141182 |    44096 | dvm.py:3377  
read_null_terminated_string      | 0.127051 |   102793 | dvm.py:97    
get_type                         | 0.126590 |   306893 | dvm.py:7331  
__init__                         | 0.115519 |    11024 | dvm.py:3258  
__init__                         | 0.111835 |    82413 | dvm.py:2823

It may require to update documentation.

What do you think about moving MUTF8String into androguard.core.bytecodes.dvm ? And maybe renaming it as DalvikString ?

@reox
Copy link
Member

reox commented May 3, 2019

wow, you have been very busy :)

mh MUTF8 is not only used in Dalvik but in the whole java world... So I would leave it as MUTF8String.
We should consider making the module depth more flat - so maybe move it where it fits? dunno really..

@nlachfr
Copy link
Contributor Author

nlachfr commented May 3, 2019

@reox Maybe put mutf8.py package to androguard.core ?

@nlachfr nlachfr changed the title [WIP] StringDataItem decoding removed StringDataItem decoding removed May 8, 2019
@nlachfr
Copy link
Contributor Author

nlachfr commented May 9, 2019

I moved the package to androguard.core.mutf8 instead of androguard.core.bytecodes.mutf8.
Do you think it's ready to merge @reox ?

@reox
Copy link
Member

reox commented May 10, 2019

sure! I pushed some bugfixes, that unfortunately broke your PR, sorry!

you could also merge all your three PRs into a single one. I think it is enough to merge the branches locally and push one - that should automatically close the other two and add them to one PR

@reox
Copy link
Member

reox commented May 10, 2019

ah I meant your two PRs :D

@nlachfr
Copy link
Contributor Author

nlachfr commented May 10, 2019

#686 merged and PR is ready to be merged :)

you could also merge all your three PRs into a single one. I think it is enough to merge the branches locally and push one - that should automatically close the other two and add them to one PR

I don't know how it works, #686 is still there

@reox
Copy link
Member

reox commented May 12, 2019

Mhh last time I did that, the other PR would close... maybe that changed or I did something different :D

anyways, merging these now!

@reox reox merged commit c5104fb into androguard:master May 12, 2019
@reox
Copy link
Member

reox commented May 12, 2019

Ah... Just realized: The other PR was closed automatically when this one was merged!
So I just had wrong memories ;)

@nlachfr nlachfr deleted the dexparser/#684-1 branch May 12, 2019 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving DEX parsing time
3 participants