JackHanke/qt


           ___        ___             
          /\  \      /\  \            
         /88\  \     \8\  \           
        /8/\8\  \     \8\  \          
        \8\~\8\__\    /88\  \         
         \8\/8/  /   /8/\8\__\        
          \88/  /   /8/  \/__/        
          /8/__/   /8/  /             
          \8\__\   \/__/              
           \/__/                                                  

qt (pronounced "cutie") is a 1-billion-parameter, hand-coded, from-scratch, uncased, English-only language model.

Model Card

qt is a dense transformer using grouped-query attention (GQA), ALiBi/NoPE positional handling, and FlashAttention. We use RMSNorm for normalization and GELU activations.
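A minimal sketch of how the two attention ingredients named above fit together, in plain Python for readability rather than speed. It assumes the standard ALiBi recipe for a power-of-two head count (head h gets slope 2^(-8h/n_heads)) and contiguous grouping of query heads onto KV heads. The function names are hypothetical, and this README does not say how ALiBi and NoPE are split across qt's layers.

```python
import math

N_HEADS = 32     # from the model card
N_HEADS_KV = 8   # from the model card


def alibi_slopes(n_heads: int) -> list[float]:
    """Geometric ALiBi slopes 2^(-8h/n) for h = 1..n (power-of-two head counts only)."""
    assert n_heads & (n_heads - 1) == 0, "this sketch only handles powers of two"
    return [2.0 ** (-8.0 * h / n_heads) for h in range(1, n_heads + 1)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal ALiBi bias for one head, added to q.k scores before softmax.

    Past/current keys j <= i get -slope * (i - j); future keys get -inf (causal mask).
    """
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for(query_head: int, n_heads: int = N_HEADS, n_heads_kv: int = N_HEADS_KV) -> int:
    """GQA: each run of n_heads // n_heads_kv consecutive query heads shares one K/V head."""
    group_size = n_heads // n_heads_kv  # 32 // 8 = 4 query heads per KV head
    return query_head // group_size


if __name__ == "__main__":
    slopes = alibi_slopes(N_HEADS)
    print(f"first/last slope: {slopes[0]:.4f} / {slopes[-1]:.6f}")
    print("query heads 0-7 -> kv heads", [kv_head_for(h) for h in range(8)])
    for row in alibi_bias(slopes[0], 4):
        print(row)
```

With 32 query heads and 8 KV heads, the K/V projections (and the KV cache) are a quarter the width of the query projection, while ALiBi adds no learned parameters at all.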

Vocab Size: 10,001
Parameters: 1.01B
    Embedding: 
    Non-embedding: 
d_model = 2048
ffw_size = 8196
n_heads = 32
n_heads_kv = 8
n_layers = 22
seq_len = 512
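The hyperparameters above can be turned into a rough parameter count. This is a sketch under stated assumptions, not the repo's own accounting: head_dim = d_model / n_heads = 64, no bias terms, an ungated two-matrix GELU MLP, one RMSNorm weight vector per sublayer plus a final norm, and an untied output head (`tie_embeddings` is a hypothetical flag).

```python
from dataclasses import dataclass


@dataclass
class QtConfig:
    vocab_size: int = 10_001
    d_model: int = 2048
    ffw_size: int = 8196
    n_heads: int = 32
    n_heads_kv: int = 8
    n_layers: int = 22
    tie_embeddings: bool = False  # assumption: separate input embedding and output head


def count_params(cfg: QtConfig) -> dict[str, int]:
    head_dim = cfg.d_model // cfg.n_heads    # assumption: 2048 / 32 = 64
    kv_dim = cfg.n_heads_kv * head_dim       # GQA: K and V are narrower than Q
    # Wq and Wo are d_model x d_model; Wk and Wv are d_model x kv_dim (no biases assumed)
    attn = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim
    ffn = 2 * cfg.d_model * cfg.ffw_size     # up and down projections, ungated GELU MLP
    norms = 2 * cfg.d_model                  # RMSNorm weights before attention and MLP
    per_layer = attn + ffn + norms
    embed = cfg.vocab_size * cfg.d_model
    head = 0 if cfg.tie_embeddings else embed
    non_embedding = cfg.n_layers * per_layer + cfg.d_model  # + final RMSNorm
    return {
        "embedding": embed + head,
        "non_embedding": non_embedding,
        "total": embed + head + non_embedding,
    }


if __name__ == "__main__":
    for name, n in count_params(QtConfig()).items():
        print(f"{name:>14}: {n:,} ({n / 1e9:.3f}B)")
```

Under these assumptions the total comes to roughly 1.01B, in line with the card; tying the input and output embeddings would bring it down to roughly 0.99B.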

Data

Pretraining

For pretraining, I source my data from the fineweb-edu dataset.

  • The pretraining dataset is a ~21.5B-token subset of fineweb-edu, stored as Parquet files of ~2.15 GB each, each containing ~754M tokens.
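A back-of-the-envelope check of the figures above. Since both the subset size and the per-file token count are approximate, the shard count is an estimate; the sequence count additionally assumes, hypothetically, that tokens are packed into full rows of seq_len = 512.

```python
import math

TOTAL_TOKENS = 21.5e9      # ~21.5B-token pretraining subset (from this README)
TOKENS_PER_SHARD = 754e6   # ~754M tokens per Parquet file (from this README)
SHARD_GB = 2.15            # ~2.15 GB per Parquet file (from this README)
SEQ_LEN = 512              # context length from the model card


def shard_stats(total_tokens: float = TOTAL_TOKENS,
                tokens_per_shard: float = TOKENS_PER_SHARD,
                seq_len: int = SEQ_LEN) -> dict[str, float]:
    # Round up: a partial final shard still needs its own file
    n_shards = math.ceil(total_tokens / tokens_per_shard)
    return {
        "shards": n_shards,
        "disk_gb": n_shards * SHARD_GB,
        # Assumes tokens are packed back-to-back into full seq_len-token rows
        "sequences": total_tokens / seq_len,
    }


if __name__ == "__main__":
    for name, value in shard_stats().items():
        print(f"{name:>10}: {value:,.1f}")
```

That works out to about 28.5 shards' worth of tokens (so roughly 29 files), around 62 GB on disk, and on the order of 42M training sequences of 512 tokens.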

The learning rate schedule is Warmup-Stable-Decay (WSD).
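A minimal sketch of a Warmup-Stable-Decay schedule: ramp up linearly, hold the peak learning rate for most of training, then decay over a final window. This README does not give qt's peak LR, warmup length, decay length, or decay shape, so `warmup_frac`, `decay_frac`, `min_lr`, and the linear decay below are placeholder assumptions; other decay shapes are also used with WSD.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.01, decay_frac: float = 0.2,
           min_lr: float = 0.0) -> float:
    """Warmup-Stable-Decay: linear warmup, constant plateau, linear decay to min_lr.

    warmup_frac, decay_frac, and min_lr are hypothetical defaults, not qt's settings.
    """
    warmup_steps = max(1, int(warmup_frac * total_steps))
    decay_steps = max(1, int(decay_frac * total_steps))
    decay_start = total_steps - decay_steps
    assert warmup_steps <= decay_start, "warmup and decay windows overlap"

    if step < warmup_steps:          # warmup: reaches peak_lr at step warmup_steps - 1
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:           # stable: hold the peak
        return peak_lr
    # decay: interpolate linearly from peak_lr down to min_lr
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    for s in (0, 9, 10, 500, 800, 900, 1000):
        print(s, wsd_lr(s, total_steps=1000, peak_lr=3e-4))
```

A practical appeal of the long constant phase is that a run can be extended, or branched into a decay from an intermediate checkpoint, without committing to the total step count up front.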

Finetuning

TODO

Tokenizer

A custom Hugging Face tokenizer, trained on uncased English text with a vocab_size of 10,001, is stored at data/tokenizer.json.
