Language models only really need to use an exponential fraction of their neurons for individual inferences. As proof, we present UltraFastBERT, a BERT variant that uses 0.3% of its neurons during inference while performing on par with similar BERT models. UltraFastBERT selectively engages just 12 out of 4095 neurons for each layer inference. This is achieved by replacing feedforward networks with fast feedforward networks (FFFs). While no truly efficient implementation currently exists to unlock the full acceleration potential of conditional neural execution, we provide high-level CPU code achieving 78x speedup over the optimized baseline feedforward implementation, and a PyTorch implementation delivering 40x speedup over the equivalent batched feedforward inference. We publish our training code, benchmarking setup, and model weights.
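The 12-of-4095 figure corresponds to a balanced binary tree of depth 12 (2^12 - 1 = 4095 neurons), where each inference evaluates only the neurons on a single root-to-leaf path. The following is a minimal single-sample sketch of that conditional traversal; the weight layout, the ReLU activation, and the sign-based routing rule are illustrative assumptions, not the authors' exact FFF design.

```python
import random

def fff_forward(x, w_in, w_out, depth=12):
    """Sketch of fast feedforward (FFF) inference for one input vector.

    The layer holds 2**depth - 1 = 4095 neurons arranged as a balanced
    binary tree, but only the `depth` neurons on one root-to-leaf path
    are evaluated -- 12 of 4095, matching the abstract's numbers.
    """
    d = len(x)
    y = [0.0] * d
    node = 0  # start at the root neuron
    for _ in range(depth):
        # pre-activation of the current node's neuron
        act = sum(xi * wi for xi, wi in zip(x, w_in[node]))
        a = max(act, 0.0)  # ReLU here is an assumption
        for j in range(d):
            y[j] += a * w_out[node][j]
        # the sign of the pre-activation routes to the left or right child
        node = 2 * node + (1 if act > 0 else 2)
    return y

# toy usage: hidden width 8, depth-12 tree => 4095 neurons, 12 evaluated
random.seed(0)
d, depth = 8, 12
n_nodes = 2 ** depth - 1  # 4095
w_in = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_nodes)]
w_out = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_nodes)]
x = [random.gauss(0, 1) for _ in range(d)]
out = fff_forward(x, w_in, w_out, depth)
```

Because the traversal touches only `depth` neurons regardless of tree size, per-sample cost grows logarithmically in the layer width, which is the source of the claimed speedup over a dense feedforward pass.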