Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use xmlpull-style parser and serialiser #18

Open
MuntashirAkon opened this issue Mar 25, 2023 · 30 comments
Open

Use xmlpull-style parser and serialiser #18

MuntashirAkon opened this issue Mar 25, 2023 · 30 comments

Comments

@MuntashirAkon
Copy link

This is just what I've felt while using the library and only my expert opinion. I can also help you implementing this if you choose to do this.

The use of xmlpull-style parser and serialiser is quite common in Android ecosystem. The framework itself uses this library for parsing both normal and binary XML (newly introduced in Android 12) files. So, it allows greater control such as easy to convert between one XML format to another by simultaneous use of parser and serialiser. This also make it easier to separate reading and writing the XML contents. So, I propose, possibly, creating a wrapper around the XMLDocument class.

@REAndroid
Copy link
Owner

_ .. I can also help you implementing this if you choose to do this._

I really appreciate and you are welcome do this.

The framework itself uses this library for parsing both normal and binary XML (newly introduced in Android 12) files

I tried to implement exactly this, Here are my reasons not to use the standard XmlPullParser.

  • XML and speed don't go together, android uses native C/C++ to execute traditional pull parser thus it doesn't care if you parse 10x on single document to get one job done. You can see easily the effect of parsing xml document on APKEditor compiling takes at least 8x time than decompiling. Keep in mind that we already loaded the entire resource table(arsc) on memory and we can't afford to do memory intensive operations.
  • Class conflict: this lib is intended to work everywhere, android already has classes of org.xmlpull.* on framework and dex loader will ignore our class.

We can't separate XML and android but I can't see any good reason converting to/from readable XML, we have the power to do any editing live.

@MuntashirAkon
Copy link
Author

When I proposed the use of xmlpull, I didn't say that I'd use it to parse raw XML files. I was only implying that the binary XML parser (i.e. ResXmlDocument) should also be wrapped around an XmlPullParser so that we can have more control over it. It is only a simple wrapper and will not add any additional memory overheads. These standardised methods would be beneficial when you're trying to convert binary XML to a readable XML or only trying to make simple changes to it. If you've looked at the implementation of XmlResourceParser of the AOSP, you'd find that they provide a dumbed-down implementation of XmlPullParser. Developers all have their unique requirements. So, providing a binary XML to raw XML converter should not be part of the library. Instead, it should be the job of the developers to do that, and XmlPullParser (along with other tools such as XmlSerializer and other built-in and third-party transformers) greatly simplifies that. It will also save you from taking the pain of developing a binary XML to raw XML converter fulfilling all the requests from the developers (I've already made a few, for instance). I hope you've now understood my point. (I've, in fact, created a parser myself which works great but I'm a bit confused with namespaces which I've left unimplemented.)

For raw XML files, I don't see any reason to package the MXParser with this library in Android (since Android already offers an efficient FastPullParser alternative). So, you can solve this issue by creating two separate modules alongside the core module: One for Android without the xmlpull dependency (and possibly, other optimizations) and another for other platforms with the dependency, and then release them separately. There are a few libraries that does this already.

@REAndroid
Copy link
Owner

Sorry i misunderstood you, you are right it make sense for bin XML to have standard serializer. A developer have to go thru all lines of ResXmlDocument code to understand what it does. It is easy to implement XmlPullParser on ResXmlDocument.

in fact, created a parser myself which works great but I'm a bit confused with namespaces which I've left unimplemented

if it is on your public repo please share me the link

Now everything on arsc side becoming stable and now i have a chance to fully concentrate on XML

@MuntashirAkon
Copy link
Author

if it is on your public repo please share me the link

I haven't yet pushed it.

A developer have to go thru all lines of ResXmlDocument code to understand what it does. It is easy to implement XmlPullParser on ResXmlDocument.

Precisely. Working with ResXmlDocument is not very straight-forward and requires some documentation in order to make sense of the API. I would always closely follow Android because the developers are expected to already familiar with Android ecosystem.

@REAndroid
Copy link
Owner

Check ResXmlPullParser

@MuntashirAkon
Copy link
Author

Check ResXmlPullParser

Almost perfect! This is my initial review:

  1. It returns null on missing references. It should return reference numbers (@0xxxxxxxx or ?0xxxxxxx) so that they might be parsed later on.
  2. In addition to above, it does not offer an option to provide EntryStore (which is similar to ResourceTable in the AOSP)
  3. It does not handle namespace attributes i.e. xmlns and xmlns:prefix attributes (which should be present by default). Optionally, the xmlpull implementation in Android should support the following features:
    i. FEATURE_PROCESS_NAMESPACES - Whether to process namespaces (this is actually sort of prefix-handling if you've read the description)
    ii. FEATURE_REPORT_NAMESPACE_ATTRIBUTES - Whether to display namespace attributes when FEATURE_PROCESS_NAMESPACES is set to true.

@REAndroid
Copy link
Owner

1. It returns `null` on missing references. It should return reference numbers (`@0xxxxxxxx` or `?0xxxxxxx`) so that they might be parsed later on.

On which method ?

2. In addition to above, it does not offer an option to provide `EntryStore` (which is similar to `ResourceTable` in the AOSP)

For what purpose ?

3. It does not handle namespace attributes i.e. `xmlns` and `xmlns:prefix` attributes (which should be present by default). Optionally, the xmlpull implementation in Android should support the following features:

Will do on next commit

@MuntashirAkon
Copy link
Author

On which method ?

The default XmlPullParser methods i.e. getAttributeValue(index) and getAttributeValue(namespace, name)

For what purpose ?

For resolving references. With this implementation, a nasty workaround is required to do this. Example:

    public static String getAttributeName(@NonNull ResXmlPullParser parser, @NonNull EntryStore entryStore, int index) {
        int resourceId = parser.getAttributeNameResource(index);
        String name;
        if (resourceId == 0) {
            name = parser.getAttributeName(index);
        } else {
            EntryGroup group = entryStore.getEntryGroup(resourceId);
            if (group == null) {
                name = String.format("@0x%08x", resourceId);
            } else {
                name = group.getSpecName();
            }
        }
        return name;
    }

    public static String getAttributeValue(@NonNull ResXmlPullParser parser, @NonNull EntryStore entryStore, int currentPackageId, int index) {
        int resourceId = parser.getAttributeNameResource(index);
        // Use Reflection to fetch ResXmlAttribute to resolve references
        ResXmlAttribute attr = null;
        try {
            Method getResXmlAttributeAt = ResXmlPullParser.class.getDeclaredMethod("geResXmlAttributeAt", int.class);
            getResXmlAttributeAt.setAccessible(true);
            attr = (ResXmlAttribute) getResXmlAttributeAt.invoke(parser, index);
        } catch (Throwable th) {
            th.printStackTrace();
        }
        if (attr == null) {
            return parser.getAttributeValue(index);
        }
        ValueType valueType = attr.getValueType();
        int raw = attr.getData();
        String value;
        if (valueType == ValueType.STRING) {
            value = ValueDecoder.escapeSpecialCharacter(attr.getValueAsString());
        } else {
            value = ValueDecoder.decode(entryStore,
                    currentPackageId,
                    resourceId,
                    valueType,
                    raw);
        }
        return value;
    }

@REAndroid
Copy link
Owner

getAttributeName If you can provide the values I can implement ResXmlPullParser#setDecoder(EntryStore entryStore, int currentPackageId)

getAttributeValue I don't see any purpose of this method except for logging, if you gonna to process this value latter you have to do some encoding again. e.g. dimension value 24.0dp. Anyways if dev can provide EntryStore and currentPackageId we can implement.

If you loaded your bin xml from ApkModule, we can retrieve all decoding materials ResXmlDocument.getApkFile().getTableBlock()

    public boolean getAttributeBooleanValue(int index, boolean defaultValue);
    public int getAttributeResourceValue(int index, int defaultValue);
    public int getAttributeIntValue(int index, int defaultValue); // this one needs fix
    public int getAttributeUnsignedIntValue(int index, int defaultValue);
    public float getAttributeFloatValue(int index, float defaultValue);

I didn't test this on android os, I appreciate if you share me your test result

@MuntashirAkon
Copy link
Author

getAttributeName If you can provide the values I can implement ResXmlPullParser#setDecoder(EntryStore entryStore, int currentPackageId)

Sorry, I don't know what values are you looking for. But this is precisely what is needed, I think.

getAttributeValue I don't see any purpose of this method except for logging, if you gonna to process this value latter you have to do some encoding again. e.g. dimension value 24.0dp. Anyways if dev can provide EntryStore and currentPackageId we can implement.

I understand that. But it has use-case when a dev is working with XmlPullParser interface rather than XmlResourceParser. This is why I proposed that it should return what we are already used to (e.g. for references, it is @0xxxxxxxxx, ?0xxxxxxxxx and so on). or deferencing the values using EntryStore. null isn't exactly a helpful output.

I didn't test this on android os, I appreciate if you share me your test result

I'm exclusively testing the library on Android. So, it's working as expected.

@MuntashirAkon
Copy link
Author

I have done a bit more analysis on this, and I think you should make geResXmlAttributeAt protected (maybe final) so that it's possible to subclass getAttributeValue methods to output whatever we want (we require lower level access to ResXmlAttribute for handling special cases). This, I think, is better than handling EntryStore in the ResXmlPullParser. Nonetheless, getAttributeValue should never null if it has a value (of any kind). What do you think?

@REAndroid
Copy link
Owner

it has use-case when a dev is working with XmlPullParser interface rather than XmlResourceParser

Convincing

and I think you should make geResXmlAttributeAt protected (maybe final) so that it's possible to subclass getAttributeValue methods to output whatever we want

I will do on next push, but i don't like protected , final and Immutable things unless it is very necessary for functionality or to avoid confusion resulting from too many public methods, some dev might need extend/override.

Since you are testing on android os, can you test class loader is using class XmlPullParser from our's or frameworks ? You can test simply by add the following method on XmlPullParser

public interface XmlPullParser{
    public static void checkThisClassUsed(){
        throw new RuntimeException("This class is used");
    }
   

Then call method XmlPullParser.checkThisClassUsed() somewhere (e.g application.onCreate) , you will get our exception or MethodNotFound

@MuntashirAkon
Copy link
Author

Since you are testing on android os, can you test class loader is using class XmlPullParser from our's or frameworks ?

It uses the framework implementation as expected.

Point 3 from the comment above is not yet implemented, it appears. It still does not return namespace attributes.

@REAndroid
Copy link
Owner

Here I tried to implement FEATURE_PROCESS_NAMESPACES & FEATURE_REPORT_NAMESPACE_ATTRIBUTES .

I didn't check the exact implementation of AOSP, I just assumed:

  • FEATURE_REPORT_NAMESPACE_ATTRIBUTES means treat/count namespaces as attribute
  • FEATURE_PROCESS_NAMESPACES means append prefix on attribute name

@MuntashirAkon
Copy link
Author

I didn't check the exact implementation of AOSP, I just assumed:

  • FEATURE_REPORT_NAMESPACE_ATTRIBUTES means treat/count namespaces as attribute
  • FEATURE_PROCESS_NAMESPACES means append prefix on attribute name

Your assumptions are almost correct. As per XmlPullParser specification, FEATURE_REPORT_NAMESPACE_ATTRIBUTES works only when both FEATURE_REPORT_NAMESPACE_ATTRIBUTES and FEATURE_PROCESS_NAMESPACES are set to true.

@REAndroid
Copy link
Owner

Here I tried strict implementation of XMLPULL. I am not sure I got it

REAndroid added a commit that referenced this issue Jun 18, 2023
@REAndroid
Copy link
Owner

@MuntashirAkon
If you are still on this project, here is the serializer part d9daea5

@MuntashirAkon
Copy link
Author

Thanks! I am actually quite busy rewriting some logic in my app (also had an accident two weeks ago), so haven't got the time reply to your last message. I'll surely test them all and reply as soon as I'm back on track.

REAndroid added a commit that referenced this issue Jun 19, 2023
@MuntashirAkon
Copy link
Author

It appears my tests are failing because both unknown reference markers, namely @ and ?, are now replaced by UNK_REF. So, basically all my tests were failing. I'm not sure how this is useful as opposed to those conventional markers:

expected:<...er"
    android:id="[@0x7f0902c2"
    android:layout_width="?0x7f0403c0"
    android:layout_height="?]0x7f0403c0">
  </Ima...> but was:<...er"
    android:id="[UNK_REF0x7f0902c2"
    android:layout_width="UNK_REF0x7f0403c0"
    android:layout_height="UNK_REF]0x7f0403c0">
  </Ima...>

All APK editing software use those conventional markers. So, it would be very confusing to the users if we use something different.

@REAndroid
Copy link
Owner

The reason I changed to UNK_* format is to support xml parsers for correct parsing of attribute name
<app @0x7f0902c2="abcd"> will throw exceptions due to character "@".

@MuntashirAkon
Copy link
Author

The reason I changed to UNK_* format is to support xml parsers for correct parsing of attribute name
<app @0x7f0902c2="abcd"> will throw exceptions due to character "@".

Two observations:

  1. UNK_ (or unk_) cannot be a unique identifier for attribute names. I would suggest using a shorter prefix that has hyphens in it (for example, r-) since it's legal in XML but illegal in Android XML.
  2. You might want to use this naming convention only for attribute names and not their values in order to support convension that reversers typically use. Don't know how difficult would it be though.

@REAndroid
Copy link
Owner

1. `UNK_` (or `unk_`) _cannot_ be a unique identifier for attribute names. I would suggest using a shorter prefix that has hyphens in it (for example, `r-`) since it's legal in XML but illegal in Android XML.

Decoding to XML is intended to un - obfuscated or lightly obfuscated resources, the dev is responsible to correct all obfuscation on binary resources or use JSON. r- can not unique either, I am not also happy with UNK_ . I am open to any suggestion with 1) Valid for all XML standard 2) Easily indicates it is unknown resource id

2. You might want to use this naming convention only for attribute names and not their values

My intention at the beginning was just for attribute names but I lost it somewhere.

  • Other brutal solution for this is create separate PackageBlock that stores all unknown ids and drop it latter just after compiling.
  • UNK_ATTR0x***** practically unique , someone might rename valid resource ids with this pattern but we can correct it ApkModuleDecoder#validateResourceNames

@MuntashirAkon
Copy link
Author

What I wrote above were only suggestions. You should be the one to decide how you would set standards, not us. But whatever you do, you should also take the conventional usage in mind and note down those that aren't so that we can also pass that to our users.

I think you should really think about making the output stable, that is, regardless of the changes in the API, the output should be consistent. And to do that, you should think about how cases like this would be taken care of before you do anything else. This way, developers like me can make the best use of this library without constantly thinking about the output representation each time an update is made. Changes in output representation also has other side-effects in real life. Suppose, a user has stored the output XML and wants to convert it into AXML later on but only to realise that it won't work because the parser can no longer parse the XML file properly.


Now, I think attribute names should be small and consistent. The attribute name documentation only suggest a handful of usable ASCII characters. So, if you're really looking to develop some meaningful standards for attribute names, here's what I think is more practical and android-way to solve this issue: Use the http://schemas.android.com/tools (tools) namespace. This reserved namespace is particularly intended for development purposes and cannot be used by others. So, if you have an attribute 0x7f0403c0, you may simply use tools:0x7f0403c0 as the attribute name. This should also be straightforward for reversers as they're also familiar with the tools namespace.

@REAndroid
Copy link
Owner

Thank you for brief suggestion !

The attribute name documentation only suggest a handful of usable ASCII

Characters 0xd8 Ø and 0xf8 ø catches my attention I am thinking of app:Øx7f0403c0 but since this char is on extended ASCII and not available on standard keyboards thus it affects practicality. BTW thank you for the link helps me a lot !

you may simply use tools:0x7f0403c0 as the attribute name

Name can not start with digits 0 or do mean letter O ?

Keep in mind that namespace is not available on some cases like values of styles and enums/flag

@MuntashirAkon
Copy link
Author

Name can not start with digits 0 or do mean letter O ?

As per the the docs, the actual name is http://schemas.android.com/tools:0x7f0403c0 not just 0x7f0403c0. So, this should be allowed.

@REAndroid
Copy link
Owner

As per the the docs, the actual name is http://schemas.android.com/tools:0x7f0403c0 not just 0x7f0403c0. So, this should be allowed.

Prefix doesn't count as a part of attribute name, unless you mod parser (I did this when we had used MXParser). Check it again

@MuntashirAkon
Copy link
Author

Prefix doesn't count as a part of attribute name, unless you mod parser (I did this when we had used MXParser). Check it again

You're right. XML validator seems to complain when tools:0x7f0403c0 is used as an attribute name. We still need to do something about 0x. Some posibilities:

  1. r0x7f0403c0
  2. res0x7f0403c0
  3. rx7f0403c0
  4. resx7f0403c0
  5. res7f0403c0
  6. r7f0403c0
  7. ...

@REAndroid
Copy link
Owner

Nice suggestions , I will try with r0x7f0403c0

REAndroid added a commit that referenced this issue Jul 19, 2023
@REAndroid
Copy link
Owner

REAndroid commented Jul 19, 2023

Now unknown resource ids are decoded as:

  • attribute name r0x12345678
  • reference value type @0x12345678
  • attribute value type ?0x12345678

NB: Use the new xml encode/decode class XmlCoder the old methods are going to be depreciated

@MuntashirAkon
Copy link
Author

Now unknown resource ids are decoded as:

  • attribute name r0x12345678
  • reference value type @0x12345678
  • attribute value type ?0x12345678

All tests are passing again. Thank you so much! I'll test out the serialiser more thoroughly and report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants