LanguageModelExperimentation

Research conducted under Prof. Kurt Keutzer at Berkeley Artificial Intelligence Research (BAIR).

Example setup

# Install conda if not already installed
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh

# Create and activate the project environment
conda env create -f environment.yml
conda activate LME

# Train with DeepSpeed
deepspeed run.py

# Summarize the results
python scripts/read_results.py

Max batch sizes

For Tibetan-to-English (tib to eng) translation:

GPU size    Model       Batch size
24GB        NLLB 600M   16
24GB        NLLB 1B     4
24GB        mT5 600M    8
24GB        mT5 1B      4
49GB        NLLB 1B     16
49GB        NLLB 3B     16
49GB        mT5 1B      16
49GB        mT5 3B      4
49GB        mT5 13B     ?

For Flores200:

GPU size    Precision   Model       Seq len   Batch size
24GB        BF16        mT5 300M    128       32
24GB        BF16        mT5 300M    256       16
24GB        FP32        mT5 300M    128       8
24GB        BF16        mT5 600M    256       8
24GB        BF16        mT5 1B      256       4
48GB        BF16        mT5 300M    128       64
48GB        BF16        mT5 1B      128       32
48GB        BF16        mT5 1B      256       16
48GB        BF16        mT5 3B      256       4
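
Where these measurements need to be queried programmatically, they can be encoded as a small lookup table. The sketch below covers the Flores200 numbers; the dictionary name, key layout, and helper function are illustrative assumptions, not part of this repository. The Tibetan-to-English table can be encoded the same way.

# Hypothetical helper encoding the Flores200 table above.
# Keys are (GPU memory in GB, precision, model, sequence length).
FLORES200_MAX_BATCH_SIZE = {
    (24, "BF16", "mT5 300M", 128): 32,
    (24, "BF16", "mT5 300M", 256): 16,
    (24, "FP32", "mT5 300M", 128): 8,
    (24, "BF16", "mT5 600M", 256): 8,
    (24, "BF16", "mT5 1B", 256): 4,
    (48, "BF16", "mT5 300M", 128): 64,
    (48, "BF16", "mT5 1B", 128): 32,
    (48, "BF16", "mT5 1B", 256): 16,
    (48, "BF16", "mT5 3B", 256): 4,
}

def max_batch_size(gpu_gb, precision, model, seq_len):
    """Return the measured max batch size; raises KeyError for untested configs."""
    return FLORES200_MAX_BATCH_SIZE[(gpu_gb, precision, model, seq_len)]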

Ideas

  • Activation function diversification
  • Single-layer model with many attention heads (both ideas are sketched after this list)
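
A minimal PyTorch sketch of how these two ideas might combine, assuming mT5-like dimensions; the class name, sizes, and branch structure are illustrative assumptions, not code from this repository:

import torch
import torch.nn as nn

class SingleLayerManyHeads(nn.Module):
    """One attention layer with an unusually large head count, plus a
    feed-forward block whose parallel branches use different activations."""

    def __init__(self, vocab_size=32000, d_model=1024, num_heads=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # d_model must be divisible by num_heads (here: 1024 / 64 = 16).
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Activation function diversification: parallel FFN branches,
        # each with a different nonlinearity, summed into the residual.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), act)
            for act in (nn.GELU(), nn.ReLU(), nn.SiLU(), nn.Tanh())
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + sum(branch(x) for branch in self.branches))
        return self.out(x)

# Example: logits for a batch of 2 sequences of 16 tokens -> (2, 16, 32000).
logits = SingleLayerManyHeads()(torch.randint(0, 32000, (2, 16)))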
