
SwiftFormer meets Android #14

Open
escorciav opened this issue Jan 9, 2024 · 11 comments

@escorciav

As mentioned in #13, I forked the project to bring SwiftFormer to Android (on Qualcomm hardware).

As of today, the performance of a single block is not encouraging: just under 2.2 msec, measured on the S23 Ultra (S8G2) with QNN 2.16. Details here.

@escorciav
Author

Update: the results were so discouraging that I had to benchmark SwiftFormer_L1 (as in the paper?). The results on the S23 Ultra (S8G2) with QNN 2.16 are worse than on the iPhone, but perhaps decent: under 2.7 msec.

@Amshaker
Owner

Amshaker commented Jan 9, 2024

Thank you for the update.

Could you kindly benchmark MobileViT (or MobileViT2×1) in addition to EfficientFormer_L1? I understand that we may not achieve exactly the same performance on the S23 Ultra as observed on the iPhone 14 Pro Max due to differences in hardware.

Please note that EfficientFormer_L1 has demonstrated speed comparable to SwiftFormer_L1 on the iPhone 14 Pro Max. If EfficientFormer_L1 also runs at around 2.63 msec on the S23 Ultra, it suggests that the ANE of the iPhone 14 Pro Max is simply faster than the GPU or NPU of the S23 Ultra. If, instead, EfficientFormer_L1 significantly outperforms SwiftFormer_L1, it may indicate that the activations, normalization, and certain layers of SwiftFormer_L1 are not well suited to the S23 Ultra, meaning SwiftFormer would require additional optimization to reach its best performance on this hardware.
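
If it helps, the baselines could be pulled from timm and pushed through the same ONNX export path used for SwiftFormer; a rough sketch (the timm model names and export options are assumptions about what the installed timm version registers, not a prescribed recipe):

```python
# baseline_export.py -- sketch for exporting candidate baselines to ONNX so they
# can go through the same Android/QNN pipeline. Model names and export options
# are illustrative; adjust them to whatever your timm version actually provides.
import timm
import torch

BASELINES = ["efficientformer_l1", "mobilevit_s", "mobilevitv2_100"]

for name in BASELINES:
    model = timm.create_model(name, pretrained=False).eval()
    dummy = torch.randn(1, 3, 224, 224)  # ImageNet-style input
    torch.onnx.export(
        model,
        dummy,
        f"{name}.onnx",
        input_names=["input"],
        output_names=["logits"],
        opset_version=13,
    )
    print(f"Exported {name}.onnx")
```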

I would appreciate your thoughts on this proposed plan.

Thank you.

@escorciav
Author

escorciav commented Jan 9, 2024

Agreed. SwiftFormer_L1 (PyTorch implementation) + QNN 2.16 (+ my way of porting) may be leaving room for optimization.

😉 I will leave it to someone else as:

  1. I'm kinda happy with the runtime,
  2. I'm not interested in the architectures mentioned above atm 😆
  3. Qualcomm does not pay my bills 🙃 (for optimizing 3rd-party models on their hardware)

Perhaps add/edit your message with the relevant links for those architectures 😊

@Amshaker
Owner

Amshaker commented Jan 9, 2024

I can do that soon and will update you 😄

I would be grateful if you could provide details on the steps or requirements involved in measuring the inference time on the S23 Ultra. For iOS, Apple has introduced a valuable feature in its IDE (Xcode 14) that allows measuring prediction time, load time, and compilation time. Could you please share this information, or update the forked repository with these Android-specific details? I am following your repo and have already checked the export file.

@escorciav
Author

escorciav commented Jan 9, 2024

There are multiple ways to port an ML model to Android 😊. Feel free to rename the issue accordingly; I titled it that way for marketing reasons 😉

My approach is specific to Qualcomm hardware using QNN.

  1. My fork has the script used to export the model to ONNX (a minimal sketch is included after this list).
  2. Then, it's just the QNN pipeline:
    1. conversion to cpp
    2. model library generation
    3. (optional, yet recommended for fast inference & speeding up trials) context (aka npu/dsp/gpu) library generation
  3. profiling & execution of the binaries from step 2
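
A minimal sketch of step 1, assuming the SwiftFormer_L1 factory can be imported roughly as in the main repo's layout; the file names, input shape, and export options are illustrative rather than exactly what my fork uses:

```python
# export_onnx.py -- illustrative ONNX export for SwiftFormer_L1.
# Assumptions: the import path below approximates the repo layout, the input is a
# 1x3x224x224 ImageNet-style tensor, and the opset is one the QNN converter accepts.
import torch
from models.swiftformer import SwiftFormer_L1

model = SwiftFormer_L1(pretrained=False)
model.eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "swiftformer_l1.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=13,
    do_constant_folding=True,
)
print("Exported swiftformer_l1.onnx")
```

From there, the ONNX file goes through the QNN tooling (steps 2-3).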

I'm preparing a tutorial for other folks in my org. I will share the slides later in Q1/Q2.

@escorciav
Author

escorciav commented Jan 9, 2024

Attaching the latency results:

  • The JSON file with _basic corresponds to the most reliable results.
  • The model was run over 100 times.
  • I believe I used the fast (less energy-efficient) mode of the S23 Ultra HTP (i.e., the DSP/NPU without the Qualcomm marketing name).

The JSON files were generated with an internal/private tool; however, the QNN docs provide all the info needed to parse the binary profiling output from step 3. The TXT file was generated by a tiny wrapper that digests the JSON.

report_ops.txt
model.iters-100.qnn.int8.json
model.iters-100.qnn.int8_basic.json
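
For anyone writing a similar wrapper, here is a minimal sketch; the JSON layout assumed below (an "iterations" list with a per-run latency in microseconds) is a hypothetical placeholder, since the real schema comes from the internal tool mentioned above:

```python
# summarize_latency.py -- toy summary of per-iteration latencies from a JSON report.
# The keys "iterations" and "total_us" are assumptions, not the real schema.
import json
import statistics
import sys


def summarize(path: str) -> None:
    with open(path) as f:
        report = json.load(f)

    # One latency value (microseconds) per inference iteration (assumed layout).
    totals_us = [it["total_us"] for it in report["iterations"]]
    print(
        f"{path}: n={len(totals_us)} "
        f"mean={statistics.mean(totals_us) / 1000:.3f} ms "
        f"median={statistics.median(totals_us) / 1000:.3f} ms "
        f"min={min(totals_us) / 1000:.3f} ms"
    )


if __name__ == "__main__":
    for json_path in sys.argv[1:]:
        summarize(json_path)
```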

@escorciav
Author

escorciav commented Jan 9, 2024

(Perhaps) good news: the latency of the block I'm interested in improving got a 1.27× speed-up by using QNN >= 2.17.

With enough ⭐s on my fork, I may be persuaded to benchmark SwiftFormer L1 😊 🤣

@Amshaker
Owner

Amshaker commented Jan 9, 2024

That's great! 🚀
You have one star now, come on! 🤣

If you benchmark the SwiftFormer models (let's say L1), we can do a pull request, and I will add you as a contributor to the main repo with a special shoutout in the acknowledgments 👀. Isn't that a good deal? 🤣

@escorciav
Author

Pushed the latency performance of SwiftFormer_L1 with QNN 2.17 & 2.18. The improvement is as much as 1.16×.

we can do a pull request and I will add you as a contributor to the main repo with a special shoutout in the acknowledgments 👀. Isn't it a good deal?

Done with 80% of my duties. Awaiting instructions for the remaining 20% & collecting the brownie points mentioned earlier 🍪

@Amshaker
Owner

You have my word on it 💯. Here we go!

Please create a pull request against the README of the main repo with the following change: create a new sub-section under "Latency Measurement" named "SwiftFormer meets Android" (I liked the name). In this section, you can add the two tables (SwiftFormer Encoder & SwiftFormer-L1) with the latency measurements across the QNN variants (feel free to add the scripts as well). Then I will check & merge the pull request, and you will automatically be added as a contributor! 🚀 Following this, I'll update the acknowledgments, earning you a well-deserved second brownie 🍪

escorciav pushed a commit to escorciav/SwiftFormer that referenced this issue Jan 12, 2024
Community-driven contributions: SwiftFormer meets Android. Qualcomm S8G2
DSP/HTP hardware, via Qualcomm tooling (QNN). Details in Amshaker#14. Work done
by @3scorciav. Refer to his fork for details.
@escorciav
Author

Thanks for merging 🥰. Let's keep the issue open for 6-12 months in case someone else is interested in improving runtime performance or exploring other porting avenues for Android 😉
