Performance loss on ARM CPU #1690

Closed
aircrack-ng opened this Issue Mar 10, 2018 · 15 comments

3 participants
@aircrack-ng
Owner

aircrack-ng commented Mar 10, 2018

Reported by misterx on 22 Jun 2016 01:40 UTC

1.2rc4 brought improved performance on x86 CPUs. On ARM CPUs such as the Raspberry Pi, however, performance was worse than before.

This forum post shows the loss:

With aircrack-ng rc3, the test "aircrack-ng -S" gave some comfortable numbers (on both Debian 7.8 and 8.0):
raspi2 about 530 k/sec
raspi3 about 925 k/sec

Now with aircrack-ng rc4 (on both Debian 7.8 and 8.0):
raspi2 about 139 k/sec
raspi3 about 234 k/sec

@aircrack-ng aircrack-ng added this to the 1.3 milestone Mar 10, 2018

@aircrack-ng

Owner

aircrack-ng commented Apr 2, 2018

beta3 is around 470k/sec
current git is around 170k/sec

@jbenden

Collaborator

jbenden commented Apr 2, 2018

Does this chip support NEON? Did aircrack-ng launch the NEON version, too? If it supports NEON, it might be that the trampoline binary does not detect NEON on this chip for some reason...
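One way to check whether NEON should have been detected is to look for the `neon` token in the `Features` line of /proc/cpuinfo. The sketch below is a hypothetical helper (`cpuinfo_has_feature` is an illustrative name, not aircrack-ng's actual trampoline code) that matches a flag as a whole, whitespace-separated token, so that `vfp` is not mistaken for `vfpv3`. On 32-bit ARM Linux, querying `getauxval(AT_HWCAP)` for the `HWCAP_NEON` bit is a common alternative to text parsing.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not aircrack-ng's real detection code: returns
 * true if `flag` appears as a whole token in a /proc/cpuinfo
 * "Features" line such as "Features\t: half thumb vfp edsp neon ...". */
static bool cpuinfo_has_feature(const char *line, const char *flag)
{
    const char *p = strchr(line, ':');   /* skip the "Features" key */
    size_t flen = strlen(flag);

    if (p == NULL || flen == 0)
        return false;
    p++;

    while (*p != '\0') {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        /* Length of the current token (0 once the string ends). */
        size_t tlen = strcspn(p, " \t\n");
        if (tlen == flen && strncmp(p, flag, flen) == 0)
            return true;
        p += tlen;
    }
    return false;
}
```

Fed the `Features` line from the Raspberry Pi 3 output posted later in this thread, the helper reports `neon` as present.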

@aircrack-ng

Owner

aircrack-ng commented Apr 2, 2018

It was on a Raspberry Pi 3, so I assume it supports NEON.

@jbenden

Collaborator

jbenden commented Apr 4, 2018

I would need a Pi 3 in order to debug this matter. I do not have one available for testing and development purposes.

@kimocoder

Contributor

kimocoder commented Apr 4, 2018

I may be able to give you SSH access to mine in 2 days if you need it @jbenden :)

@jbenden

Collaborator

jbenden commented Apr 4, 2018

I may need to take you up on the offer.

In the meantime, would you please post the output of: cat /proc/cpuinfo && gcc -v

@kimocoder

Contributor

kimocoder commented Apr 4, 2018

I'm not home at the moment, but in two days I'll be at my place for a short time, so I may just set it up for you before I leave again. I'll give you a heads up.

@jbenden

Collaborator

jbenden commented May 13, 2018

Is the performance problem still present in the latest master branch? If so, please test using the commands below, with the binaries from the latest master branch:

./src/aircrack-ng--generic -S
./src/aircrack-ng--neon -S

Also, please post the output of:

uname -a
cat /proc/cpuinfo

Thanks,
-Joe

@aircrack-ng

Owner

aircrack-ng commented May 14, 2018

Generic: 158k/s
Neon: 160k/s

uname: Linux kali 4.9.80-Re4son-v7+ #1 SMP Fri Apr 20 18:17:48 CDT 2018 armv7l GNU/Linux
/proc/cpuinfo:

processor	: 0
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 38.40
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 1
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 38.40
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 2
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 38.40
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 3
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 38.40
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

Hardware	: BCM2835
Revision	: a22082
@jbenden

Collaborator

jbenden commented Jun 4, 2018

Please try out the current master branch.

I've fixed a significant problem, which should help resolve this bug report.

Please let me know how testing goes...

Thanks,
-Joe

@aircrack-ng

Owner

aircrack-ng commented Jun 4, 2018

ASIMD: 293 k/s
Neon: 293 k/s
Generic: 156 k/s

For comparison, rc3 on the same system: 604 k/s

@jbenden

Collaborator

jbenden commented Jun 5, 2018

This doesn't make sense. I am now getting 500+ on my RPi 3B.

@aircrack-ng

Owner

aircrack-ng commented Jun 5, 2018

Tested again on a regular RPi 3B (not the 3B+): same numbers.

Do you have a 3B or the 3B+?

@jbenden

Collaborator

jbenden commented Jun 5, 2018

Please compress both of the folders into a tarball and upload them to my server. I need to debug this matter further.

Thanks!
-Joe

@jbenden

Collaborator

jbenden commented Jun 7, 2018

Problem computing statistics on non-Intel/AMD systems

TL;DR: The previous version computed the statistic incorrectly; divide its reported value by four (4).

Long Explanation

1.2-rc-3

On line 3943, it defaults the number of parallel computations to one (1). The following lines set it to four (4) only on an i386 or amd64 machine.
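The architecture-dependent default described above can be sketched as a compile-time choice. This is a hypothetical reconstruction for illustration (`default_parallel_keys` is an invented name, and the predefined macros `__i386__`/`__x86_64__` are GCC/Clang conventions), not the actual rc-3 source.

```c
/* Hypothetical sketch of the rc-3 default, not the real aircrack-ng
 * code: one key per iteration everywhere, four only on ia32/amd64. */
static int default_parallel_keys(void)
{
    int nparallel = 1;   /* default for every architecture (e.g. ARM) */
#if defined(__i386__) || defined(__x86_64__)
    nparallel = 4;       /* i386/amd64 builds process four keys at once */
#endif
    return nparallel;
}
```

On a Raspberry Pi build this returns 1, which matters for the counting logic described next.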

Next, beginning at line 4092, the count of attempted computations is incremented by a constant value of four (4). This count might then be decremented by a constant of three (3) if key[1][0], key[2][0], and key[3][0] are zeroed.

Finally, on line 4101, disaster strikes: the smoking gun always increments nb_kprev by four, regardless of how many computations were performed. This is the primary variable used in the statistics calculation.

When the speed test results are displayed (around line 3869), the computation depends only on the global nb_kprev variable, which was always incremented by four (4), as described above.
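To make the effect concrete, here is a small model of the counting bug. Only `nb_kprev` comes from the explanation above; the struct, `nb_tried`, and the function names are hypothetical and exist purely for illustration. Each loop iteration really tests `nparallel` keys, but the rc-3 behaviour adds a constant 4 to `nb_kprev`, and the displayed speed is computed from `nb_kprev` alone.

```c
#include <stdbool.h>

/* Hypothetical model of the rc-3 statistic, not the real source. */
struct key_counter {
    unsigned long nb_kprev;   /* what the speed display is based on */
    unsigned long nb_tried;   /* keys actually tested */
};

/* One iteration of the cracking loop that tested `nparallel` keys. */
static void count_iteration(struct key_counter *c, int nparallel, bool buggy)
{
    c->nb_tried += (unsigned long) nparallel;
    /* rc-3: constant +4 regardless of keys tested; fixed: real count. */
    c->nb_kprev += buggy ? 4UL : (unsigned long) nparallel;
}

/* Keys per second as the speed test would display it. */
static double displayed_rate(const struct key_counter *c, double seconds)
{
    return (double) c->nb_kprev / seconds;
}
```

With `nparallel == 1` (the non-ia32 default), 1000 iterations test 1000 keys but leave `nb_kprev` at 4000, so the displayed rate is four times too high; with the fixed increment, `nb_kprev` equals the number of keys actually tested.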

This leads to a wrong displayed value. Dividing the displayed value by four (4) gives the actual number of keys tried per second.

In short, at least rc-3, and possibly older releases, computes the speed incorrectly on all non-ia32 machines. Previous speed test results must be divided by four (4) to obtain the real performance metric.
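Applying the divide-by-four rule to the rc-3 figures quoted earlier in this thread is a quick sanity check. The sketch below is illustrative only (`rc3_real_rate` and `print_corrected` are hypothetical names); the input numbers are the ones reported above, and the forum figures were approximate ("about").

```c
#include <stddef.h>
#include <stdio.h>

/* Per the analysis above, rc-3 (and possibly older releases) reported
 * four times the real speed on non-ia32 machines. */
static double rc3_real_rate(double reported_kps)
{
    return reported_kps / 4.0;
}

/* Print the rc-3 figures from this thread next to corrected values. */
static void print_corrected(void)
{
    const struct { const char *label; double rc3_kps; } runs[] = {
        { "raspi2 (forum post)", 530.0 },
        { "raspi3 (forum post)", 925.0 },
        { "RPi 3B (Jun 4 test)", 604.0 },
    };
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-20s rc3 %6.1f k/s -> real %6.2f k/s\n",
               runs[i].label, runs[i].rc3_kps, rc3_real_rate(runs[i].rc3_kps));
}
```

The corrected values (132.5, 231.25 and 151 k/s) land close to the newer figures reported in the thread (139 and 234 k/s for rc4, 156 k/s for the generic build), which is consistent with the conclusion that the apparent regression came from the old statistic rather than from slower code.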

-Joe

@jbenden jbenden closed this Jun 7, 2018
