This repository is a simplified codebase for pretraining a large language model. It leverages DeepSpeed and Megatron-LM through Hugging Face's Accelerate library to keep things simple. It is still a work in progress. It currently implements a simplified form of dataset packing for improved training efficiency.
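The idea behind the packing step is roughly the following (a minimal sketch under my own assumptions, not the exact code in this repo): tokenized documents are concatenated, separated by an EOS token, and sliced into fixed-length blocks so that no space in a batch is wasted on padding.

```python
# Minimal sketch of dataset packing (illustrative, not this repo's exact implementation).
# Tokenized documents are concatenated with an EOS separator, then split into
# fixed-length blocks so every training sample is fully packed.
from typing import Iterable, List

def pack_sequences(tokenized_docs: Iterable[List[int]], block_size: int, eos_token_id: int) -> List[List[int]]:
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_token_id)
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks  # leftover tokens shorter than block_size are dropped
```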
I provide a simple Accelerate config in configs/; edit it to your needs.
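For orientation, a DeepSpeed-backed Accelerate config looks roughly like the snippet below. The values and the referenced file name are placeholders, not necessarily what ships in configs/.

```yaml
# Illustrative Accelerate config for DeepSpeed; values are placeholders.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: configs/deepspeed_config.json  # placeholder path
mixed_precision: bf16
num_machines: 1
num_processes: 8
```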
Afterwards, run the bash script in the examples/ directory to train a 1B-parameter Llama model, passing the directory in which your JSONL files reside.
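An invocation would look something like this; the script name below is a placeholder, so use whatever script actually lives in examples/.

```bash
# Illustrative; substitute the actual script name from examples/.
bash examples/train_llama_1b.sh /path/to/jsonl_dir
```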
DeepSpeed currently runs great! Check out the DeepSpeed configuration file in the configs/ directory for an example.
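For comparison, a typical ZeRO stage 2 DeepSpeed config looks like the following; the values here are illustrative, so refer to configs/ for the file actually used in this repo.

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```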
There have been some issues fully integrating Megatron-LM. Accelerate relies on Hugging Face's Megatron-LM repository, which has long been abandoned; I am currently developing an updated fork that is compatible with Accelerate. In theory, though, the training code here is designed to work with Megatron-LM.
It should work, but it's a bit finicky; Megatron-LM support is not mature yet.
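If you want to try it anyway, the Megatron-LM path is selected through the Accelerate config rather than through this repo's code. The fragment below is a rough sketch from memory of Accelerate's Megatron-LM plugin options; the key names and values are assumptions, so run `accelerate config` to generate the exact fields.

```yaml
# Rough, untested sketch; key names and values are assumptions.
distributed_type: MEGATRON_LM
megatron_lm_config:
  megatron_lm_tp_degree: 2            # tensor parallel degree
  megatron_lm_pp_degree: 2            # pipeline parallel degree
  megatron_lm_num_micro_batches: 4
  megatron_lm_sequence_parallelism: true
```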