- Flash Attention 2 (via `F.scaled_dot_product_attention`)
- Weight decay denylist for norm, bias, and embedding parameters
- PEFT with LoRA, LoRA+
- RAG for text generation
- PyTorch Lightning Trainer
- int8 and int4 quantization with torchao
- 8-bit optimizers from bitsandbytes
- Llama-2, Llama-3
- Mistral
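
The Flash Attention 2 path needs no custom kernel code on the user side: PyTorch's `F.scaled_dot_product_attention` dispatches to a fused Flash Attention kernel when the hardware and dtypes support it, and falls back to an equivalent math implementation otherwise. A minimal sketch (toy shapes, not this repo's actual model code):

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# On supported GPUs this dispatches to a fused Flash Attention 2 kernel;
# on CPU it falls back to a math implementation with the same semantics.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 16, 32])
```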
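
The weight decay denylist means norm, bias, and embedding parameters get `weight_decay=0.0` while everything else is decayed. One common way to build the optimizer parameter groups (a sketch with a hypothetical `param_groups` helper, not this repo's exact code):

```python
import torch
import torch.nn as nn

def param_groups(model, weight_decay=0.1):
    # Split parameters: norms, biases, and embeddings are denylisted
    # from weight decay; all other weights are decayed.
    decay, no_decay = [], []
    for module in model.modules():
        for name, p in module.named_parameters(recurse=False):
            if not p.requires_grad:
                continue
            if name.endswith("bias") or isinstance(module, (nn.LayerNorm, nn.Embedding)):
                no_decay.append(p)
            else:
                decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Embedding(10, 8), nn.LayerNorm(8), nn.Linear(8, 4))
groups = param_groups(model)
optimizer = torch.optim.AdamW(groups, lr=3e-4)
```

Here only the `Linear` weight lands in the decayed group; the embedding table, the norm's weight and bias, and the linear bias are all exempt.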
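
LoRA fine-tuning freezes the base weights and learns a low-rank update `B @ A`, scaled by `alpha / r`, so only a small fraction of parameters train. A minimal from-scratch sketch of the idea (the repo presumably wires this up through its fine-tuning scripts; the `LoRALinear` class here is illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Illustrative LoRA wrapper: frozen base linear plus a trainable
    # low-rank update (B @ A) scaled by alpha / r.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64), r=8)
x = torch.randn(2, 64)
y = layer(x)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8*64 + 64*8 = 1024 trainable LoRA params
```

Because `lora_B` starts at zero, the wrapped layer initially reproduces the base layer exactly; training then moves only the 1024 LoRA parameters. (LoRA+ differs mainly in using a larger learning rate for `lora_B` than for `lora_A`.)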
- Run `python main.py` to train the model
- Run `python generate_sequence.py` to generate text
- Run `python finetune.py` to fine-tune the model
- Run `python prompt.py` to prompt the model after fine-tuning