Disambiguate obfuscated/non-obfuscated names #718

justhecuke · 2022-08-30T00:34:15Z

My team and I have been developing passes for use internally but we commonly hit an issue where we interact with deobfuscated names (perhaps from a hardcoded list, usage.txt from proguard, or other sources) but then use them with names retrieved from DexClass or DexType reachable from scope which are generally obfuscated. This has caused many problems.

Is this also an issue for others? And does anyone have ideas for a fix?

My idea is to create two different types for obfuscated and deobfuscated strings and have all names derived from the APK be instances of obfuscated strings and where APIs that read deobfuscated names (ones that developers write for input names) use the deobfuscated type.

I'm willing to do the work to refactor, but this would be a huge change and I wanted to get feedback before starting work on what may be a doomed project.

The text was updated successfully, but these errors were encountered:

NTillmann · 2022-08-30T00:53:30Z

I don't fully understand the problem... I'd rather want to avoid another parallel obfuscated name infrastructure.

Are you passing a rename map into Redex, e.g. from a previous Proguard run?

A few thoughts / suggestions:

"deobfuscated names" in Redex should always reflect the original name of an entity.
Run obfuscation / renaming passes towards the very end of your pass list.
Every pass can override an "eval_pass" method. All "eval_pass" methods run before any "run_pass". This would be a good place to look up some named entities (DexClass, DexMethod, etc.), and store them in some pass state. Note that this is only a good idea if those entities have keep rules, so that the pointers to them won't get released during optimizations such as RemoveUnreachablePass.

Does that help?

justhecuke · 2022-08-30T17:46:17Z

We run proguard obfuscation before redex since some passes require a proguard mapping file in order to run properly. I'm not sure why this is the case but it is (or was when we first adopted redex). This means we have a bunch of obfuscated classes in the mix as well as the rename pass to further shrink the renaming although this runs second to last right before RegAllocPass.

This leads to an error-prone situation where some pass developers forget that they are working with obfuscated/deobfuscated strings and try to do comparisons between them which always fails leading to problems. We've hit this several times already and were trying to come up with a fix to make the situation less error-prone.

While using eval_pass may fix the immediate issue of mixing obfuscated and deobfuscated strings, it requires that the developer realize they are doing the wrong thing which is the actual problem we are having. Once we realize we've made this mistake again, the fix is usually relatively simple -- DexClass allows you to get its deobfuscated name so it's mostly a matter of creating a deobfuscated name -> obfuscated class/type mapping to normalize on using obfuscated names.

Every developer we have working on redex passes has made this mistake at least once.

thezhangwei · 2022-08-30T19:25:52Z

It sounds like in your situation the deobfuscated_name is already "obfuscated" by Proguard, is that right?

At one point, we were in the same spot. Maybe @ssj933 has more context on the Proguard deprecation history.

But I wonder how often does your optimization logic rely on the names of a type/class. That's not very common for us. In most cases, we rely on the name of some key classes that are the root of a hierarchy. Most of them are not renamed either because they are external (part of JDK), or marked as "donotrename" in the dependency (Kotlin runtime).

My idea is to create two different types for obfuscated and deobfuscated strings and have all names derived from the APK be instances of obfuscated strings

Do you plan to add an option to plugin the Proguard rename map into the deobfuscated_names in Redex?

justhecuke · 2022-08-30T19:41:22Z

I'm pretty sure the proguard map is already taken into account for the deobfuscated_name. That isn't a problem. It's just the ease with which new developers can forget to translate between de obfuscated and obfuscated. We have some cases where we refer to classes by their deobfuscated names such as when we input proguard's usage.txt or are dealing with some internal special classes.

…

On Tue, Aug 30, 2022, 12:26 PM Wei Zhang ***@***.***> wrote: It sounds like in your situation the deobfuscated_name is already "obfuscated" by Proguard, is that right? At one point, we were in the same spot. Maybe @ssj933 <https://github.com/ssj933> has more context on the Proguard deprecation history. But I wonder how often does your optimization logic rely on the names of a type/class. That's not very common for us. In most cases, we rely on the name of some key classes that are the root of a hierarchy. Most of them are not renamed either because they are external (part of JDK), or marked as "donotrename" in the dependency (Kotlin runtime). My idea is to create two different types for obfuscated and deobfuscated strings and have all names derived from the APK be instances of obfuscated strings Do you plan to add an option to plugin the Proguard rename map into the deobfuscated_names in Redex? — Reply to this email directly, view it on GitHub <#718 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQ3JAL7TY3YG5SXNYV6LWZLV3ZN4ZANCNFSM577VSLLA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

thezhangwei · 2022-08-30T22:54:26Z

It's just the ease with which new
developers can forget to translate between de obfuscated and obfuscated.

Yeah we have to live with that. It hasn't been a major concern tho. Can I ask why is get_deobfuscated_name not enough for you? It should always give you the deobfuscated_name, and that's what you want right?

NTillmann · 2022-08-30T23:13:12Z

We run proguard obfuscation before redex since some passes require a proguard mapping file in order to run properly. I'm not sure why this is the case but it is (or was when we first adopted redex).

Many passes require that proguard keep rules have been communicated to Redex via some ProGuard config files (and even that is just a superficial sanity check), but I am not aware of any relevant passes that would require an actual pro guard (class-name) mapping.

If at all possible, I suggest that you stop running Proguard to obfuscate classes first. That's the direction we've been taking.

justhecuke · 2022-08-31T00:34:42Z

It's just the ease with which new
developers can forget to translate between de obfuscated and obfuscated.

Yeah we have to live with that. It hasn't been a major concern tho. Can I ask why is get_deobfuscated_name not enough for you? It should always give you the deobfuscated_name, and that's what you want right?

It's what we want, but we also want it to be obvious that we need to use get_deobfuscated_name. The issue is that our devs sometimes forget to translate to/from deobfuscated strings and things end up not working right and it takes a while to track down what went wrong.

If we had this typing difference, we'd know at compile-time what we made a mistake.

thezhangwei · 2022-08-31T19:31:43Z

If we had this typing difference, we'd know at compile-time what we made a mistake.

IC. Thanks for explaining that. It's more clear to me what this is about now.

Sounds like you are proposing adding subtypes to DexString to encode obfuscation status into types? Or something similar? Personally I'm open to a change like that as long as it does not require a sweeping update for all the existing name handling logic in the code base.

justhecuke · 2022-08-31T21:07:46Z

Yes, that's pretty much what I am suggesting. I think we can keep DexString (and probably std::string as well) as a common parent and treat obfuscated and deobfuscated types as specializations to keep compatibility with existing code.

…

On Wed, Aug 31, 2022, 12:31 PM Wei Zhang ***@***.***> wrote: If we had this typing difference, we'd know at compile-time what we made a mistake. IC. Thanks for explaining that. It's more clear to me what this is about now. Sounds like you are proposing adding subtypes to DexString to encoding obfuscation status into types. Or something similar? Personally I'm open to a change like that as long as it does not require a sweeping update for existing name handling logic in the code base. — Reply to this email directly, view it on GitHub <#718 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQ3JALZVEFZZS6YFWQKGKI3V36XKVANCNFSM577VSLLA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

thezhangwei · 2022-09-01T18:05:15Z

SG. I will be open to see what the change looks like.

But I guess there's an underlying question about the differences in how we set things up. Like in your case, there's more interaction between the preceding Proguard and Redex. But we have deprecated Proguard a few years back. So the vast majority of the symbol obfuscation is done by Redex (towards the end of the optimization pipeline). That's maybe a worthwhile discussion by itself.

justhecuke · 2022-09-01T18:12:43Z

It's a good discussion to have, deprecating proguard. But we can't do it in our build because things start crashing when proguard does not run obfuscation. Must have too many Gradle plugins that depend on proguardisms that we can look into later.

…

On Thu, Sep 1, 2022, 11:05 AM Wei Zhang ***@***.***> wrote: SG. I will be open to see what the change looks like. But I guess there's an underlying question about the differences in how we set things up. Like in your case, there's more interaction between the preceding Proguard and Redex. But we have deprecated Proguard a few years back. So the vast majority of the symbol obfuscation is done by Redex (towards the end of the optimization pipeline). That's maybe a worthwhile discussion by itself. — Reply to this email directly, view it on GitHub <#718 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AQ3JAL6YWAQA3PF7CSJLS6DV4DV6NANCNFSM577VSLLA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

NTillmann · 2022-09-01T23:49:02Z

Obfuscated entities in dex files may include types, field names, method names, method protos with their contained result types and (argument) type lists. Do you want subclasses for DexType, DexField, etc.? For DexField and DexMethod we already have a special thing going where we play games with the -Ref subtypes. I wonder how the different subtypes would affect existing passes.

For example, strings, there's a rule that there is only one DexString* for each unique string payload. Would your subtype get a different pointer address for identical strings? This would create problems, semantic and optimization-quality-wise.

In general, the Dex* concepts mirror quite closely concepts that are found in dex files, including uniqueness properties.

Regarding DexStrings in particular, we've tried to make sure they are very lightweight. On an 64-bit platform, each DexString takes up 16 bytes, and doesn't have a vtable. I'd like to keep that.

Anyway, I am not sure what you exactly you have in mind. Some examples for potential API changes would help :-)

justhecuke mentioned this issue Nov 17, 2022

Fix RemoveUninstantiablesPass in face of Proguard stripping #744

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disambiguate obfuscated/non-obfuscated names #718

Disambiguate obfuscated/non-obfuscated names #718

justhecuke commented Aug 30, 2022

NTillmann commented Aug 30, 2022

justhecuke commented Aug 30, 2022

thezhangwei commented Aug 30, 2022

justhecuke commented Aug 30, 2022 via email

thezhangwei commented Aug 30, 2022

NTillmann commented Aug 30, 2022 •

edited

justhecuke commented Aug 31, 2022

thezhangwei commented Aug 31, 2022 •

edited

justhecuke commented Aug 31, 2022 via email

thezhangwei commented Sep 1, 2022

justhecuke commented Sep 1, 2022 via email

NTillmann commented Sep 1, 2022

Disambiguate obfuscated/non-obfuscated names #718

Disambiguate obfuscated/non-obfuscated names #718

Comments

justhecuke commented Aug 30, 2022

NTillmann commented Aug 30, 2022

justhecuke commented Aug 30, 2022

thezhangwei commented Aug 30, 2022

justhecuke commented Aug 30, 2022 via email

thezhangwei commented Aug 30, 2022

NTillmann commented Aug 30, 2022 • edited

justhecuke commented Aug 31, 2022

thezhangwei commented Aug 31, 2022 • edited

justhecuke commented Aug 31, 2022 via email

thezhangwei commented Sep 1, 2022

justhecuke commented Sep 1, 2022 via email

NTillmann commented Sep 1, 2022

NTillmann commented Aug 30, 2022 •

edited

thezhangwei commented Aug 31, 2022 •

edited