GG-Training-Trick

General Training

Training AI models at a large scale

Getting Started with Fully Sharded Data Parallel (FSDP)
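As a quick orientation to what that tutorial covers, here is a minimal sketch: initialize the process group, then wrap the model with FSDP so its parameters are sharded across ranks. The toy model, sizes, and the `torchrun` launch are assumptions for illustration, not taken from the tutorial.

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # Assumes a launch via `torchrun --nproc_per_node=<num_gpus> train.py`.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model; FSDP shards its parameters and gradients across ranks.
    model = torch.nn.Linear(1024, 1024).cuda()
    model = FSDP(model)

    # Optimizer is created after wrapping, so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```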

Bottleneck Issue

A goldmine about training bottlenecks from the PyTorch discussion forum

WebDataset: Efficient PyTorch I/O library for Large Datasets, Many Files, Many GPUs
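A minimal sketch of how WebDataset streams samples out of sharded tar files; the shard URL pattern and the per-sample keys `"jpg"`/`"cls"` are assumptions for illustration and depend on how the tars were written.

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Hypothetical shard pattern; adapt to your own tar files.
urls = "data/shard-{000000..000099}.tar"

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)            # shuffle samples within a small buffer
    .decode("torchrgb")       # decode images straight to CHW float tensors
    .to_tuple("jpg", "cls")   # pick the (image, label) entries of each sample
)

# Assumes fixed-size images so the default collation can stack them.
loader = DataLoader(dataset, batch_size=64, num_workers=4)
```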

A Reddit post about bottlenecks

Batched data augmentations using Kornia, since PyTorch doesn't support them natively yet.
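A minimal sketch of applying augmentations to an entire batch on the GPU with Kornia; the specific transforms and tensor sizes are just examples.

```python
import torch
import kornia.augmentation as K

# Kornia augmentations are nn.Modules that operate on (B, C, H, W) batches,
# so they can be chained and run on the GPU.
augment = torch.nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(0.1, 0.1, 0.1, 0.1),
).cuda()

images = torch.rand(64, 3, 224, 224, device="cuda")  # a whole batch at once
augmented = augment(images)  # per-sample randomness, applied on-device
```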

NVIDIA's blog post about data transfer and benchmarking

Write a custom streaming DataLoader when the dataset is too big to fit in memory. Alternatively, periodically recreate a new Dataset object that contains only part of the whole dataset during training; this may require handling the sharding of the dataset yourself, although you can also skip that, since a few duplicated samples may even stabilize training. Make each sub-dataset as large as memory allows: batching many small transfers between host memory and GPU memory into one larger transfer performs much better because it eliminates most of the per-transfer overhead. A sketch of this pattern follows.
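A minimal sketch of the sub-dataset idea above: each chunk is loaded from disk as one large tensor, copied to the GPU in a single transfer, and sliced into mini-batches on-device. The file layout, sizes, and `train_step` are hypothetical.

```python
import torch

# Hypothetical pre-saved tensor shards, each small enough to fit in host memory.
chunk_files = [f"chunk_{i:03d}.pt" for i in range(100)]
batch_size = 256

def train_step(batch):
    pass  # placeholder for the real forward/backward/optimizer step

for epoch in range(10):
    for path in chunk_files:
        chunk = torch.load(path)                                  # e.g. a (N, C, H, W) tensor
        chunk = chunk.pin_memory().to("cuda", non_blocking=True)  # one large host-to-GPU transfer
        for batch in chunk.split(batch_size):                     # mini-batches sliced on the GPU
            train_step(batch)
```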

Paper: Profiling and Improving the PyTorch Dataloader for high-latency Storage: A Technical Report

PyTorch 效能懶人包 (a PyTorch performance cheat sheet, in Chinese)
