GitHub - JackHanke/qt: qt 1B is a from-scratch LLM

           ___        ___             
          /\  \      /\  \            
         /88\  \     \8\  \           
        /8/\8\  \     \8\  \          
        \8\~\8\__\    /88\  \         
         \8\/8/  /   /8/\8\__\        
          \88/  /   /8/  \/__/        
          /8/__/   /8/  /             
          \8\__\   \/__/              
           \/__/

qt (pronounced "cutie") is a 1 billion parameter, hand-coded, from-scratch, uncased, English-only language model.

Model Card

qt is a dense transformer with grouped-query attention (GQA), ALiBi/NoPE position handling, and flash attention. We use RMSNorm and GELU activations.
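As a rough illustration of two of the pieces named above, the sketch below shows how grouped-query attention can map query heads onto shared key/value heads, and how ALiBi slopes are commonly derived for a power-of-two head count. It assumes the model card's n_heads = 32 and n_heads_kv = 8 and the standard geometric ALiBi slope recipe. The function names are hypothetical and nothing here is taken from qt.py.

```python
N_HEADS = 32     # n_heads from the model card
N_HEADS_KV = 8   # n_heads_kv from the model card


def kv_head_for(query_head: int) -> int:
    """Index of the key/value head shared by a given query head under GQA."""
    group_size = N_HEADS // N_HEADS_KV  # query heads per key/value head
    return query_head // group_size


def alibi_slopes(n_heads: int) -> list[float]:
    """Geometric ALiBi slopes for a power-of-two head count (assumed recipe)."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]


def alibi_bias(slope: float, query_pos: int, key_pos: int) -> float:
    """Linear distance penalty added to an attention score (zero on the diagonal)."""
    return -slope * (query_pos - key_pos)
```

With these values, every group of four consecutive query heads reads from the same key/value head, and heads with smaller slopes penalize distant tokens less.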

Vocab Size: 10,001
Parameters: 1.01B
    Embedding: 
    Non-embedding: 
d_model = 2048
ffw_size = 8196
n_heads = 32
n_heads_kv = 8
n_layers = 22
seq_len = 512
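
The hyperparameters above can be sanity-checked against the reported parameter count with a back-of-the-envelope estimate. This is only a sketch: it assumes a plain two-matrix GELU feed-forward block, no bias terms, ignores the RMSNorm gains, and treats weight tying as an unknown toggle. The function name `estimate_params` is hypothetical.

```python
# Hyperparameters copied from the model card.
VOCAB_SIZE = 10_001
D_MODEL = 2048
FFW_SIZE = 8196
N_HEADS = 32
N_HEADS_KV = 8
N_LAYERS = 22


def estimate_params(tied_embeddings: bool = False) -> dict[str, int]:
    """Rough weight-matrix count; biases and norm gains are ignored (assumption)."""
    head_dim = D_MODEL // N_HEADS
    kv_dim = head_dim * N_HEADS_KV
    # Query and output projections are d_model x d_model; key and value
    # projections are narrower because of grouped-query attention.
    attn = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * kv_dim
    # Assumes an ungated feed-forward block: one up and one down projection.
    ffw = 2 * D_MODEL * FFW_SIZE
    non_embedding = N_LAYERS * (attn + ffw)
    # Untied: input embedding plus a separate output projection.
    embedding = VOCAB_SIZE * D_MODEL * (1 if tied_embeddings else 2)
    return {
        "embedding": embedding,
        "non_embedding": non_embedding,
        "total": embedding + non_embedding,
    }
```

Under these assumptions, the untied estimate comes to roughly 1.01B weights, which is in line with the reported total.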

Data

Pretraining

For pretraining, I source my data from the fineweb-edu dataset.

The pretraining dataset is a ~21.5B-token subset of fineweb-edu, stored as 2.15GB parquet files, each containing ~754M tokens.
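
The figures above imply a rough file count and disk footprint, worked out below. This is plain arithmetic on the approximate numbers quoted in this README. The sequences-per-file figure additionally assumes tokens are packed into full seq_len = 512 windows with no padding, which is a guess about the data loader rather than a documented fact.

```python
import math

TOTAL_TOKENS = 21.5e9     # ~21.5B tokens in the pretraining subset
TOKENS_PER_FILE = 754e6   # ~754M tokens per parquet file
FILE_SIZE_GB = 2.15       # approximate size of each parquet file
SEQ_LEN = 512             # seq_len from the model card

# Number of parquet files needed to hold the whole subset.
n_files = math.ceil(TOTAL_TOKENS / TOKENS_PER_FILE)

# Approximate on-disk size of the subset.
approx_disk_gb = TOTAL_TOKENS / TOKENS_PER_FILE * FILE_SIZE_GB

# Full-length training sequences per file, assuming dense packing.
sequences_per_file = int(TOKENS_PER_FILE // SEQ_LEN)
```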

The learning rate schedule is Warmup-Stable-Decay (WSD).
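
A Warmup-Stable-Decay schedule has three phases: the learning rate ramps up, holds at its peak, then decays. The sketch below is one minimal way to express that, assuming linear warmup and linear decay. The function name `wsd_lr`, the decay shape, and every hyperparameter value are illustrative assumptions, not the settings used in pretrain.py.

```python
def wsd_lr(
    step: int,
    *,
    peak_lr: float,
    warmup_steps: int,
    stable_steps: int,
    decay_steps: int,
    min_lr: float = 0.0,
) -> float:
    """Learning rate at a given step under a Warmup-Stable-Decay schedule."""
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: move linearly from peak_lr to min_lr, then stay at min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress
```

One commonly cited advantage of this shape is that the stable phase can be extended without committing to a total step count up front, with the decay phase run only when a final checkpoint is wanted.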

Finetuning

TODO

Tokenizer

Custom HuggingFace tokenizer trained on uncased English text with a vocab_size of 10,001, stored at data/tokenizer.json.

Folders and files

data
logs
models
.gitignore
README.md
pretrain.py
qt.py
speed_benchmark.py
workspace.ipynb
