Severe Performance Loss (450x) in Unicorn ARM64-to-ARM64 Emulation on Android? #2168

Open
Gavin0210 opened this issue Apr 21, 2025 · 7 comments

@Gavin0210

When using Unicorn Engine to emulate ARM64 code on an Android ARM64 device, the emulated code runs about 450x slower than native, even with no hooks enabled. This seems abnormally high for same-architecture emulation.
Is this normal? If not, what are the possible reasons?
The test code is:
extern "C" int64_t looptest() {
    int64_t j = 0;
    for (int64_t i = 0; i < 1000000; i++) {
        j += 1;
    }
    return j;
}

@wtdcode
Member

wtdcode commented Apr 21, 2025

Some slowdown is expected, but 450x sounds like too much to me. How can I reproduce it?

@Gavin0210
Author

Gavin0210 commented Apr 21, 2025

The test runs looptest natively, then emulates it with Unicorn, and logs the elapsed time of each run in microseconds. According to the log, emulation is more than 500x slower (1482573 vs. 2697 microseconds).
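The measurement described above can be sketched with a monotonic clock instead of gettimeofday. This is only an illustration: `best_time_us` and `looptest_volatile` are hypothetical names that do not appear in the reproduction, and the callable passed in stands for either the native call or the `uc_emu_start` call. Taking the fastest of several runs may help separate one-time costs from the steady-state cost, although that assumption has not been verified against Unicorn here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of Unicorn): call `fn` `runs` times and return
// the fastest wall-clock duration in microseconds, using a monotonic clock.
template <typename Fn>
int64_t best_time_us(Fn &&fn, int runs) {
    assert(runs > 0);
    int64_t best = INT64_MAX;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        if (us < best) best = us;
    }
    return best;
}

// Same loop as looptest, but with a volatile accumulator so that an optimizing
// compiler cannot fold the whole loop into a constant.
inline int64_t looptest_volatile() {
    volatile int64_t j = 0;
    for (int64_t i = 0; i < 1000000; i++) {
        j = j + 1;
    }
    return j;
}
```

For example, `best_time_us([] { looptest_volatile(); }, 5)` would give the native figure, and wrapping the emulation call in the same way would give a comparable emulated figure.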

The Unicorn build is https://github.com/saicao/unicorn/tree/master, which saicao patched last year; I built it for Android myself.
I use this fork because UC_HOOK_MEM_READ was triggered only once on arm64 (#1908), and saicao fixed that.

This is the log:

2025-04-21 17:01:29.917 10195-10195 testuniLog              com.example.testunicorn              I  looptest addr is 7d8ef19930
2025-04-21 17:01:29.917 10195-10195 testuniLog              com.example.testunicorn              I  looptest 2697  loop count 1000000
2025-04-21 17:01:31.402 10195-10195 testuniLog              com.example.testunicorn              I  success
2025-04-21 17:01:31.402 10195-10195 testuniLog              com.example.testunicorn              I  emu time 1482573 

This is the code:

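#include <jni.h>
#include <android/log.h>
#include <sys/time.h>
#include <cstdint>
#include <unicorn/unicorn.h>

// Assumed logging macro (not shown in the original post); the tag matches
// the "testuniLog" tag in the log above.
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, "testuniLog", __VA_ARGS__)

// Loop under test: 1,000,000 iterations of j += 1.
extern "C"
int64_t looptest(){
    int64_t j=0;
    for (int64_t i=0;i<1000000;i++){
        j+=1;
    }
    return j;
}

// Emulate looptest with Unicorn and log the elapsed time in microseconds.
void uc_emu(){
    uc_engine *uc=NULL;
    uc_err err = uc_open(UC_ARCH_ARM64, UC_MODE_ARM, &uc);
    if (err) {
        LOGI("Failed on uc_open() with error returned: %u (%s)\n", err,
             uc_strerror(err));
        return ;
    }

    uc_ctl_set_cpu_model(uc, UC_CPU_ARM64_A72);
    // Map the page that contains looptest (RWX) and copy 100 bytes of its code.
    uc_mem_map(uc,(uint64_t )looptest&0xfffffffffffff000,4096,7);
    uc_mem_write(uc,(uint64_t )looptest,(void *)looptest,100);
    // Map an 8 KiB stack at 0x1000 and point SP into it.
    uint64_t sp_w=4096*2;
    uc_mem_map(uc,4096,4096*2,7);
    uc_reg_write(uc, UC_ARM64_REG_SP, &sp_w);
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    // Run from the start of looptest until PC reaches looptest+0x50.
    err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
    gettimeofday(&tv2, NULL);
    if (err) {
        LOGI("Failed on uc_emu_start with error returned: %u (%s)\n", err,
             uc_strerror(err));
    }else{
        LOGI("success");
        LOGI("emu time %ld ",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec);
    }
    uc_close(uc);  // release the engine on both the success and the error path
}

// JNI entry point: time the native call, then the emulated run.
extern "C" JNIEXPORT jstring
JNICALL
Java_com_example_testunicorn_MainActivity_speedtest(
        JNIEnv *env,
        jobject a2 /* this */) {
    struct timeval tv,tv2;
    gettimeofday(&tv, NULL);
    int64_t j= looptest();
    gettimeofday(&tv2, NULL);
    // Cast the function address to an integer type that matches %lx.
    LOGI("looptest addr is %lx",(unsigned long)looptest);
    LOGI("looptest %ld  loop count %ld",(tv2.tv_sec-tv.tv_sec)*1000000+tv2.tv_usec-tv.tv_usec,j);
    uc_emu();
    return env->NewStringUTF("");
}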
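One detail of the reproduction worth illustrating: it maps a single 4096-byte page around looptest and then copies 100 bytes starting at looptest. With the addresses in the logs (page offsets 0x930 and 0xe40) the copy fits inside that page, but a function that starts within the last 100 bytes of a page would run past the mapping. The sketch below computes a page-aligned range that covers the whole copy; `MapRange` and `page_span` are hypothetical names introduced for illustration, not Unicorn APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: a page-aligned base address and size.
struct MapRange {
    uint64_t base;
    uint64_t size;
};

// Hypothetical helper: smallest page-aligned range covering [addr, addr + len).
// `page` must be a power of two; 4096 matches the page size used in the post.
inline MapRange page_span(uint64_t addr, uint64_t len, uint64_t page = 4096) {
    assert(len > 0);
    assert(page != 0 && (page & (page - 1)) == 0);
    uint64_t base = addr & ~(page - 1);                    // round start down
    uint64_t end = (addr + len + page - 1) & ~(page - 1);  // round end up
    return MapRange{base, end - base};
}
```

The resulting `base` and `size` could then be passed to `uc_mem_map` in place of the hard-coded mask and 4096.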

@wtdcode
Member

wtdcode commented Apr 21, 2025

saicao’s patch wasn’t correct. You should try current dev/master instead.

@Gavin0210
Author

Gavin0210 commented Apr 21, 2025

> saicao’s patch wasn’t correct. You should try current dev/master instead.

I tried the dev branch; here is the new log. The slowdown is still about 450x.

2025-04-21 17:59:15.087 10836-10836 testuniLog              com.example.testunicorn              I  looptest addr is 7d8c16ee40
2025-04-21 17:59:15.087 10836-10836 testuniLog              com.example.testunicorn              I  looptest 2915  loop count 1000000
2025-04-21 17:59:16.415 10836-10836 testuniLog              com.example.testunicorn              I  success
2025-04-21 17:59:16.415 10836-10836 testuniLog              com.example.testunicorn              I  emu time 1325992 
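The slowdown figures quoted in this thread can be recomputed from the logged timings (both in microseconds, as produced by the gettimeofday arithmetic in the code). The sketch below is only that arithmetic; `slowdown` and `ns_per_iteration` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of emulated time to native time.
inline double slowdown(int64_t native_us, int64_t emu_us) {
    assert(native_us > 0);
    return static_cast<double>(emu_us) / static_cast<double>(native_us);
}

// Average cost of one loop iteration in nanoseconds.
inline double ns_per_iteration(int64_t total_us, int64_t iterations) {
    assert(iterations > 0);
    return static_cast<double>(total_us) * 1000.0 / static_cast<double>(iterations);
}
```

With the logged values, `slowdown(2697, 1482573)` is about 550 for the saicao build and `slowdown(2915, 1325992)` is about 455 for the dev branch. On the dev branch that corresponds to roughly 1326 ns per emulated iteration against roughly 2.9 ns per native iteration.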

@PhilippTakacs
Contributor

    err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);

Are you sure this is correct? I doubt that the looptest function is exactly 0x50 bytes long. A slightly better way to test would be something like this:

err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)uc_emu, 0, 0);

But this is still not completely correct: looptest returns, so execution might jump to an address taken from the stack, which is not initialized. Also, the compiler is not required to lay out the functions in the order they appear in the source file.

@Gavin0210
Author

>     err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
>
> Are you sure this is correct? I doubt that the looptest function is exactly 0x50 bytes long. A slightly better way to test would be something like this:
>
> err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)uc_emu, 0, 0);
>
> But this is still not completely correct: looptest returns, so execution might jump to an address taken from the stack, which is not initialized. Also, the compiler is not required to lay out the functions in the order they appear in the source file.

In IDA, the ret of looptest is at exactly looptest+0x50.
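Since `uc_emu_start` stops when execution reaches the `until` address, pointing it at the ret means the ret itself is not executed. Rather than hard-coding 0x50 from a disassembler, the offset could be located at run time by scanning the copied bytes for the A64 encoding of a plain `ret` (0xD65F03C0). This is a minimal sketch under stated assumptions: `find_first_ret` is a hypothetical helper, it only recognizes `ret` through x30 (not pointer-authenticated variants), and it assumes the first match is the function's only exit and that no data is embedded in the scanned range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A64 encoding of `ret` (return through x30).
constexpr uint32_t kArm64Ret = 0xD65F03C0u;

// Hypothetical helper: byte offset of the first `ret` in code[0, len),
// or -1 if none is found. A64 instructions are 4 bytes, little-endian.
inline ptrdiff_t find_first_ret(const uint8_t *code, size_t len) {
    assert(code != nullptr);
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t insn = static_cast<uint32_t>(code[off]) |
                        (static_cast<uint32_t>(code[off + 1]) << 8) |
                        (static_cast<uint32_t>(code[off + 2]) << 16) |
                        (static_cast<uint32_t>(code[off + 3]) << 24);
        if (insn == kArm64Ret) return static_cast<ptrdiff_t>(off);
    }
    return -1;
}
```

In the reproduction this might be called as `find_first_ret((const uint8_t *)looptest, 100)`, with the result added to the address of looptest to form the `until` argument.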

@wtdcode
Member

wtdcode commented Apr 22, 2025

>     err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)looptest+0x50, 0, 0);
>
> Are you sure this is correct? I doubt that the looptest function is exactly 0x50 bytes long. A slightly better way to test would be something like this:
>
> err = uc_emu_start(uc, (uint64_t)looptest, (uint64_t)uc_emu, 0, 0);
>
> But this is still not completely correct: looptest returns, so execution might jump to an address taken from the stack, which is not initialized. Also, the compiler is not required to lay out the functions in the order they appear in the source file.

I have sometimes used this approach for very quick testing. With flags like -O3, gcc tends to keep everything in registers in a case like this. But it is certainly not portable, and I'm afraid it is UB.
