Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static fields showing up in wrong class #989

Closed
ehrenb opened this issue Jan 8, 2024 · 10 comments
Closed

static fields showing up in wrong class #989

ehrenb opened this issue Jan 8, 2024 · 10 comments

Comments

@ehrenb
Copy link
Contributor

ehrenb commented Jan 8, 2024

I was in the process of writing more test cases for fields, when I noticed that a static field was being attached to the wrong class. The 'i' and 'j' static fields are within the TestLoops$Loop class, not TestLoops. Instance fields seem to be connected properly (see TestSynthetic$1) though.

Using TestActivity.apk:

from androguard.misc import AnalyzeAPK
fpath = 'tests/data/APK/TestActivity.apk'
a, d, dx = AnalyzeAPK(fpath)

class_name = 'Ltests/androguard/TestLoops;'
print(class_name)
test_class_analysis = dx.get_class_analysis(class_name)
print('fields: ')
for f in test_class_analysis.get_fields():
    print(f'field: {f.name} {f.get_field().get_access_flags()}')

class_name = 'Ltests/androguard/TestLoops$Loop;'
print(class_name)
test_class_analysis = dx.get_class_analysis(class_name)
print('fields: ')
for f in test_class_analysis.get_fields():
    print(f'field: {f.name} {f.get_field().get_access_flags()}')


class_name = 'Ltests/androguard/TestSynthetic$1;'
print(class_name)
print('fields: ')
test_class_analysis = dx.get_class_analysis(class_name)
for f in test_class_analysis.get_fields():
    print(f'field: {f.name} {f.get_field().get_access_flags()}')

output:

Ltests/androguard/TestLoops;           # has static fields of $Loop?
fields: 
field: i 9
field: j 9
Ltests/androguard/TestLoops$Loop;   # no fields?
fields: 
Ltests/androguard/TestSynthetic$1;   # shows instance fields correctly
fields: 
field: val$o 4114

In smali, TestLoops$Loop.smali:

.class public Ltests/androguard/TestLoops$Loop;
.super Ljava/lang/Object;
.source "TestLoops.java"


# annotations
.annotation system Ldalvik/annotation/EnclosingClass;
    value = Ltests/androguard/TestLoops;
.end annotation

.annotation system Ldalvik/annotation/InnerClass;
    accessFlags = 0xc
    name = "Loop"
.end annotation


# static fields
.field public static i:I

.field public static j:I


# direct methods
.method protected constructor <init>()V
    .locals 0

    .prologue
    .line 8
    invoke-direct {p0}, Ljava/lang/Object;-><init>()V

    return-void
.end method

In smali, TestLoops.smali, there are no fields:

.class public Ltests/androguard/TestLoops;
.super Ljava/lang/Object;
.source "TestLoops.java"


# annotations
.annotation system Ldalvik/annotation/MemberClasses;
    value = {
        Ltests/androguard/TestLoops$Loop;
    }
.end annotation


# direct methods
.method public constructor <init>()V
    .locals 0

    .prologue
    .line 6
    invoke-direct {p0}, Ljava/lang/Object;-><init>()V

    return-void
.end method
...

TestSynthetic$1.smali

.class Ltests/androguard/TestSynthetic$1;
.super Ljava/lang/Thread;
.source "TestSynthetic.java"


# annotations
.annotation system Ldalvik/annotation/EnclosingMethod;
    value = Ltests/androguard/TestSynthetic;->TestSynthetic1()V
.end annotation

.annotation system Ldalvik/annotation/InnerClass;
    accessFlags = 0x0
    name = null
.end annotation


# instance fields
.field private final synthetic val$o:Ljava/lang/Object;
...

I also used Kaitai Struct to take a look at the dex, and the 'i' and 'j' fields are indeed within TestLoops$Loop:

kaitai

ehrenb added a commit to ehrenb/androguard that referenced this issue Jan 8, 2024
@ehrenb ehrenb changed the title fields showing up in wrong class static fields showing up in wrong class Jan 8, 2024
@ehrenb
Copy link
Contributor Author

ehrenb commented Jan 14, 2024

I think my understanding of what ClassAnalysis is intended to do caused some confusion. After reading through the code a bit, ClassAnalysis seems to only intend to represent xrefs of a class while still offering some API into the underlying DEX. For example, accordin to the docstring, get_fields and get_methods should only return methods and fields that the current class (that the ClassAnalysis object is wrapping) calls:

class ClassAnalysis:
    """
    ClassAnalysis contains the XREFs from a given Class.
    It is also used to wrap :class:`~androguard.core.dex.ClassDefItem`, which
    contain the actual class content like bytecode.

    Also external classes will generate xrefs, obviously only XREF_FROM are
    shown for external classes.

    :param classobj: class:`~androguard.core.dex.ClassDefItem` or :class:`ExternalClass`
    """

But this is really confusing to a user that decides to use the Analysis object from AnalyzeAPK, since the method names are 'get_fields' and 'get_methods', which don't indicate (nor do the docstrings of them) they are only supposed to return xrefs from. If this is the true intention of ClassAnalysis, then it is even more confusing that methods seem to be added to ClassAnalysis regardless of them being xrefs. Fields in ClassAnalysis only seem to get added when an xref is created via ClassAnalysis.add_field_xref_read() and ClassAnalysis.add_field_xref_write(). All methods within a particular class seem to get added to a ClassAnalysis _methods field, not just xref froms, when building up an Analysis object:

analysis.py Analysis object

    def add(self, vm):
        """
        Add a DEX to this Analysis.

        :param androguard.core.bytecodes.dvm.DEX vm: :class:`dvm.DEX` to add to this Analysis
        """
        self.vms.append(vm)

        logger.info("Adding DEX file version {}".format(vm.version))

        # TODO: This step can easily be multithreaded, as there is no dependecy between the objects at this stage
        tic = time.time()
        for i, current_class in enumerate(vm.get_classes()):
            self.classes[current_class.get_name()] = ClassAnalysis(current_class)
            new_class = self.classes[current_class.get_name()]
            # Fix up the hidden api annotations (Android 10)
            hidden_api = vm.get_hidden_api()
            if hidden_api:
                rf, df = hidden_api.get_flags(i)
                new_class.set_restriction_flag(rf)
                new_class.set_domain_flag(df)

            for method in current_class.get_methods():
                self.methods[method] = MethodAnalysis(vm, method)

                new_class.add_method(self.methods[method]).  # All methods being added to ClassAnalysis here

                # Store for faster lookup during create_xrefs
                m_hash = (current_class.get_name(), method.get_name(), str(method.get_descriptor()))
                self.__method_hashes[m_hash] = self.methods[method]

The below is an updated version of the first script I posted above, it shows that if we access the underlying Dex classes (via orig_class or get_vm_class()) static and instance fields, we can see that the fields exist as expected, but not in the wrapped ClassAnalysis get_fields:

from androguard.misc import AnalyzeAPK
from pprint import pprint
fpath = 'tests/data/APK/TestActivity.apk'
a, d, dx = AnalyzeAPK(fpath)

####################################################################
print('#'*64)
class_name = 'Ltests/androguard/TestLoops;'
print(class_name)
test_class_analysis = dx.get_class_analysis(class_name)
class_data_item = test_class_analysis.orig_class.class_data_item
print('static_fields: ')
[encoded_field.show() for encoded_field in class_data_item.static_fields]
print('instance_fields: ')
[encoded_field.show() for encoded_field in class_data_item.instance_fields]

print('fields: ')
for f in test_class_analysis.get_fields():
    print(f'field: {f.name} {f.get_field().get_access_flags()}')
    f.field.show()
####################################################################

####################################################################
print('#'*64)
class_name2 = 'Ltests/androguard/TestLoops$Loop;'
print(class_name2)
test_class_analysis2 = dx.get_class_analysis(class_name2)


class_data_item2 = test_class_analysis2.orig_class.class_data_item
print('static_fields: ')
[encoded_field.show() for encoded_field in class_data_item2.static_fields]
print('instance_fields: ')
[encoded_field.show() for encoded_field in class_data_item2.instance_fields]

print('fields: ')
for f in test_class_analysis2.get_fields():
    print(type(f))
    print(f'field: {f.name} {f.get_field().get_access_flags()}')
####################################################################
    
####################################################################
print('#'*64)
class_name3 = 'Ltests/androguard/TestSynthetic$1;'
print(class_name3)
test_class_analysis3 = dx.get_class_analysis(class_name3)

class_data_item3 = test_class_analysis3.orig_class.class_data_item
print('static_fields: ')
[encoded_field.show() for encoded_field in class_data_item3.static_fields]
print('instance_fields: ')
[encoded_field.show() for encoded_field in class_data_item3.instance_fields]
print('fields: ')
for f in test_class_analysis3.get_fields():
    print(f'field: {f.name} {f.get_field().get_access_flags()}')
    f.field.show()
####################################################################

output

...
################################################################
Ltests/androguard/TestLoops;
static_fields: 
instance_fields: 
fields: 
field: i 9
########## Field Information
Ltests/androguard/TestLoops$Loop;->i I [access_flags=public static]
field: j 9
########## Field Information
Ltests/androguard/TestLoops$Loop;->j I [access_flags=public static]
################################################################
Ltests/androguard/TestLoops$Loop;
static_fields: 
########## Field Information
Ltests/androguard/TestLoops$Loop;->i I [access_flags=public static]
########## Field Information
Ltests/androguard/TestLoops$Loop;->j I [access_flags=public static]
instance_fields: 
fields: 
################################################################
Ltests/androguard/TestSynthetic$1;
static_fields: 
instance_fields: 
########## Field Information
Ltests/androguard/TestSynthetic$1;->val$o Ljava/lang/Object; [access_flags=private final synthetic]
fields: 
field: val$o 4114
########## Field Information
Ltests/androguard/TestSynthetic$1;->val$o Ljava/lang/Object; [access_flags=private final synthetic]

TestLoop contains the fields because it sets $Loop.i and $Loop.j at some point, which causes those fields to be added to TestLoop's fields because they're xrefs. The $Loop class doesn't have an xref to itself, so it never gets those fields added to its _fields attribute.

Screen Shot 2024-01-14 at 12 54 57 PM

Should ClassAnalysis.get_fields/methods() be returning ALL fields and methods, or just xrefs? If all fields and methods, we can simply return a list of the underlying DEX class' methods and fields, right?

@erev0s
Copy link
Collaborator

erev0s commented Jan 14, 2024

i will try to take a close look at this in the following days but from a first glance the get_fields/get_methods implies returning all fields and methods and not just the xrefs.
maybe a slight reformat of the code to provide both cases seems proper.

@ehrenb
Copy link
Contributor Author

ehrenb commented Jan 14, 2024

i will try to take a close look at this in the following days but from a first glance the get_fields/get_methods implies returning all fields and methods and not just the xrefs. maybe a slight reformat of the code to provide both cases seems proper.

Thanks, I'm happy to make PRs where helpful, just let me know. I have unit tests for access flags for fields and methods in a fork, but will hold until these changes make it in.

@erev0s
Copy link
Collaborator

erev0s commented Jan 27, 2024

having my reservations regarding my understanding over the reasons why this was initially developed this way i can say the following:

  • the Analysis class should contain all info necessary about the DEX (including the XREFs)
  • ClassAnalysis class should contain the XREFs from that class for the fields/methods
  • ClassDefItem is 'wrapped' by ClassAnalysis and can be used to access other info about the class (similar to what you did with the updated code you provided using .orig_class.
  • the MethodAnalysis class does contain related info about XREFs of methods etc

having said that, tested a bit the get_methods and saw that it does not simply add the methods of a class, but also the XREFs of methods in the class. So get_methods intentionally adds all the methods it can identify within a class. Not sure though why it was designed as such.

besides this issue with the get_methods, having methods to retrieve all the fields/methods identified in a class seems to be out of the scope of the ClassAnalysis class and more of the ClassDefItem.

@erev0s
Copy link
Collaborator

erev0s commented Jan 30, 2024

i will try to take a close look at this in the following days but from a first glance the get_fields/get_methods implies returning all fields and methods and not just the xrefs. maybe a slight reformat of the code to provide both cases seems proper.

Thanks, I'm happy to make PRs where helpful, just let me know. I have unit tests for access flags for fields and methods in a fork, but will hold until these changes make it in.

maybe we can discuss further regarding adding these unit tests and along the way improve the "user friendliness" of the source

@ehrenb
Copy link
Contributor Author

ehrenb commented Jan 31, 2024

i will try to take a close look at this in the following days but from a first glance the get_fields/get_methods implies returning all fields and methods and not just the xrefs. maybe a slight reformat of the code to provide both cases seems proper.

Thanks, I'm happy to make PRs where helpful, just let me know. I have unit tests for access flags for fields and methods in a fork, but will hold until these changes make it in.

maybe we can discuss further regarding adding these unit tests and along the way improve the "user friendliness" of the source

The unit tests were added here: https://github.com/ehrenb/androguard/blob/master/tests/test_dex.py. I noticed that one of the field tests was failing, and that's what led me to the issue in this thread. For these test, I can access using the ClassAnalysis' 'orig_class' attribute for now and that will be fine (as is in my above example) and create a PR. Once new "user friendliness" designs are implemented, it probably makes sense to port those changes to the test case as well.

I will try to write up my thoughts on what "user friendliness" changes may look like. I'm thinking this could be cleared up with some method name changes and definitive examples for use.

@erev0s
Copy link
Collaborator

erev0s commented Feb 3, 2024

I will try to write up my thoughts on what "user friendliness" changes may look like. I'm thinking this could be cleared up with some method name changes and definitive examples for use.

That would be perfect and it can help in creating a plan on what needs to be done further. Unfortunately I have very very limited time these days to help sort things faster.

@ehrenb
Copy link
Contributor Author

ehrenb commented Feb 4, 2024

I will try to write up my thoughts on what "user friendliness" changes may look like. I'm thinking this could be cleared up with some method name changes and definitive examples for use.

That would be perfect and it can help in creating a plan on what needs to be done further. Unfortunately I have very very limited time these days to help sort things faster.

My understanding has changed slightly, I don't think the *Analysis classes were supposed to exclusively contain xrefs. Analysis was intended to add all app-internal classes/methods/fields by wrapping them in their respective ClassAnalysis, MethodAnalysis, and FieldAnalysis (*Analysis for shorthand) classes to enhance them. Along the way, if xrefs are discovered that pull in external-to-the-app code, they also get new *Analysis entries and are added into the Analysis

This is apparent by their addition in the Analysis.add() function:

...
self.classes[current_class.get_name()] = ClassAnalysis(current_class)
...
new_class.add_method(self.methods[method])

But fields were seemingly never wrapped and added like the above, maybe this was left undone? They were only wrapped and added into a ClassAnalysis when an XREF occurred. That may have led to my confusion about why only certain fields were in the Analysis with my test. We can ensure all of the app-internal fields are wrapped by adding a ClassAnalysis.add_field() function and calling it in Analysis.add().

Then, Analysis.create_xref() is invoked to "enhance" by adding xref links and potentially additional ExternalClass and ExternalMethod as ClassAnalysis and MethodAnalysis respectively.

Anyways, I've spent a decent amount of time trying to understand the intent of this code. If we move forward with the above understanding about the design (which I now believe wasn't completely finished), here are some things we can add to complete it:

  • Python type hints (typing)
    • Will help ensure api docs will stay consistent with the actual code base once we start auto generating API docs :)
    • Should help reduce time spent unrolling some of the code
  • core.analysis.analysis.Analysis.add() - this enhances the underlying DEX with wrapped versions of classes/methods/fields by adding ClassAnalysis, MethodAnalysis, FieldAnalysis for them. It also may add additional classes/methods/fields than the DEX because it has learned about and added ExternalClasses that were XREFd.
    • Add Analysis.fields property that returns Analysis.get_fields() for completeness, since we already allow accessing classes, strings, and methods by attribute.
    • Wrap and add fields in FieldAnalysis by creating ClassAnalysis.add_field() and invoking in Analysis.add(), this is a similar workflow to how methods and classes are already wrapped. This will ensure that all FieldAnalysis are created and seeded into their respective ClassAnalysis. If this doesn't happen, then classes that contain fields that are never internally xref'd will not have their fields attached at all. This is what happened with above, where it seemed TestLoop$Loop had no record of the i and j fields, but they were within TestLoop because it writes to them.
    • I've coded this up here: https://github.com/ehrenb/androguard/blob/refactor-analysis/androguard/core/analysis/analysis.py#L1459
      • Will need to fix the test_analysis.py unittest, as the number of fields returned will have changed since the new seeding. Since Analysis is not purely what's in the DEX header, we shouldn't be testing against the DEX header values in this test. If testing against the DEX header, we should rely on the underlying DEX vm class.
      • The same concept as above needs to be applied for strings/StringAnalysis

@erev0s
Copy link
Collaborator

erev0s commented Feb 8, 2024

@ehrenb there are a few things scattered around the source that are either forgotten or stopped midway while developing them due to lack of time etc, so I would not be surprised if fields are just not completed in that sense.

Additionally, its a project that is several years old, without actively being maintained since 2019 and therefore proper testing of the code is a bit behind on schedule. This is the reason why one of the first moves I did was to reinstate the tests.

To conclude, I am up to proceed with your ideas, when I will get a bit more time I will review things and consider what more can be done

@ehrenb
Copy link
Contributor Author

ehrenb commented Feb 8, 2024

@ehrenb there are a few things scattered around the source that are either forgotten or stopped midway while developing them due to lack of time etc, so I would not be surprised if fields are just not completed in that sense.

Additionally, its a project that is several years old, without actively being maintained since 2019 and therefore proper testing of the code is a bit behind on schedule. This is the reason why one of the first moves I did was to reinstate the tests.

To conclude, I am up to proceed with your ideas, when I will get a bit more time I will review things and consider what more can be done

Understood, getting testing up and running will help reveal some of these issues/designs that aren't so obvious (such as the fields!). I'll work on PRs for the above tasks I mentioned, then can help tackle whatever else comes up (issue backlog or new features, docs, etc).

ehrenb added a commit to ehrenb/androguard that referenced this issue Feb 25, 2024
…. fix failing test mentioned in androguard#989 so that it assumes testing for  wrapped fields.  make new tests for DEX-class level testing that tests counts of parsed values in the DEX header
@erev0s erev0s closed this as completed Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants