Ternary and binary neural networks enable multiplication-free computation and promise multiple orders of magnitude efficiency gains over full-precision networks if implemented on specialized hardware. However, since both the parameter and the output space are highly discretized, such networks have proven very difficult to optimize. The difficulties are compounded for the class of transformer text generation models due to the sensitivity of the attention operation to quantization and the noise-compounding effects of autoregressive decoding in the high-cardinality output space. We approach the problem with a mix of statistics-based quantization for the weights and elastic quantization of the activations, and demonstrate the first ternary and binary transformer models on the downstream tasks of summarization and machine translation. Our ternary BART base achieves an R1 score of 41 on the CNN/DailyMail benchmark, which is merely 3.9 points behind the full model while being 16x more efficient. Our binary model, while less accurate, achieves a highly non-trivial score of 35.6. For machine translation, we achieved BLEU scores of 21.7 and 17.6 on the WMT16 En-Ro benchmark, compared with a full-precision mBART model score of 26.8. We also compare our approach in the 8-bit activation setting, where our ternary and even binary weight models can match or outperform the best existing 8-bit weight models in the literature. Our code and models are available at: https://github.com/facebookresearch/Ternary_Binary_Transformer
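To make "statistics-based quantization for the weights" concrete, here is a minimal sketch of per-tensor ternarization in the style of Ternary Weight Networks, where a threshold and scale are derived from simple weight statistics; the exact scheme, threshold factor, and granularity used in the paper may differ, so treat this as an illustration of the general technique rather than the paper's implementation.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Map each weight to {-alpha, 0, +alpha} using statistics of w.

    delta_factor=0.7 follows the Ternary Weight Networks heuristic;
    it is an assumption, not a value taken from this paper.
    """
    # Threshold below which weights are zeroed, derived from the mean magnitude.
    delta = delta_factor * np.mean(np.abs(w))
    mask = np.abs(w) > delta  # weights that remain nonzero
    # Scale alpha is the mean magnitude of the surviving weights.
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
wt = ternarize(w)
# wt takes at most three distinct values: -alpha, 0, +alpha
```

The multiplication-free property follows because a matrix product with such weights reduces to sign-conditional additions plus a single scaling by alpha per output tensor.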