Some questions #1

@Git-JekG


Great work. The idea of statically devirtualizing Themida through symbolic evaluation and aggressive optimization is fascinating.

After reading the article, I was left with a few technical questions that I hope you might be willing to elaborate on.

The most interesting part of the article, in my opinion, is control-flow recovery. It also seems to be the area where the least implementation detail is provided. Would it be fair to say that recovering virtual control flow is the main remaining challenge, while most VM handler logic naturally collapses through symbolic evaluation, constant propagation, dead-store elimination, and related optimization passes?

Regarding VMEXITs, how are they discovered automatically? Do you rely on Themida-specific signatures or patterns, or is there a more generic mechanism for identifying exits from the virtual machine?

The article mentions that Themida-specific knowledge is mainly required for control-flow recovery. Could you elaborate on how virtual conditional branches are recovered? How do you determine the targets of virtual JCCs during symbolic evaluation, especially when dealing with more complex control-flow structures?

How do you handle path explosion when exploring virtualized control flow? Are there any techniques such as path merging, state pruning, selective exploration, or other heuristics involved?

I am also curious about the practical limits of the approach. How well does it perform on larger real-world protected functions containing loops, indirect branches, virtual switches, and more complex CFGs? Do you have any statistics regarding success rates, coverage, or known limitations?

Was the implementation developed and tested against a specific Themida generation, or does it work across multiple versions of Themida and CodeVirtualizer?

Finally, how much of the process is currently fully automated? Are there still scenarios where manual analyst intervention is required, or can the pipeline run end-to-end without user guidance?

I would also be very interested in any future documentation describing the IR design, symbolic evaluation engine, and CFG recovery algorithms in greater detail.

Thank you for sharing the research and open-sourcing the project.