FastTFWorkflow

A tutorial on how to make your slow TensorFlow training faster.

Description

THIS CODE ONLY WORKS ON NVIDIA GPUS

Assuming the dataset is effectively unlimited in length, inline preprocessing on the CPU can become a bottleneck that reduces training throughput.
The code samples in this repo show an unoptimized and an optimized TensorFlow workflow.
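
The bottleneck can be illustrated with tf.data alone. The sketch below contrasts a sequential input pipeline with one that parallelizes and prefetches preprocessing; decode_and_augment and the data/*.jpg layout are placeholders for illustration, not code taken from this repo.

    import tensorflow as tf

    # Hypothetical per-sample preprocessing (JPEG decode, resize, augmentation).
    def decode_and_augment(path):
        image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        image = tf.image.resize(image, (224, 224))
        return tf.image.random_flip_left_right(image)

    files = tf.data.Dataset.list_files("data/*.jpg")  # assumed dataset layout

    # Unoptimized: sequential map, no prefetch -> the GPU waits on the CPU.
    slow_ds = files.map(decode_and_augment).batch(64)

    # Optimized: parallel map + prefetch overlaps preprocessing with training.
    fast_ds = (files
               .map(decode_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
               .batch(64)
               .prefetch(tf.data.AUTOTUNE))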

Requirements

Hardware Requirements

  • x86-64 (AMD64) CPU
  • RAM >= 8GiB
  • NVIDIA GPU with Compute Capability 7.0 or higher
    • GPU memory > 12GiB for the default batch size

Test Environment

  • CPU : Intel(R) Xeon(R) Gold 5218R
  • GPU : 2x A100 80GB PCI-E
  • RAM : 255GiB

Optimizations used in this repo

  1. NVIDIA DALI - GPU-accelerated data loading
  2. Mixed Precision - better MMA (Matrix Multiply-Accumulate) throughput than TF32
  3. XLA - JIT-compiles and fuses operators for more effective scheduling on GPUs (a sketch of enabling 2 and 3 follows this list)
  4. (Optional) Multi-GPU training - use more than one GPU for training
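
For reference, enabling mixed precision and XLA in plain TensorFlow/Keras can look like the minimal sketch below; the ResNet50 model is only a placeholder, and the exact setup in the notebooks may differ.

    import tensorflow as tf

    # Mixed precision: compute in float16 on Tensor Cores, keep variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # XLA: JIT-compile and fuse graph operators.
    tf.config.optimizer.set_jit(True)

    model = tf.keras.applications.ResNet50(weights=None, classes=10)  # placeholder model
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  jit_compile=True)  # per-model XLA compilation in recent TF versions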

Usage

  1. Clone this repo with its submodules
    git clone --recursive https://github.com/ReturnToFirst/FastTFWorkflow.git

  2. Compare the performance of the unoptimized and optimized workflows

For advanced users

after_optimization_multi.ipynb shows the training process with multiple GPUs. A minimal sketch of the idea is shown below.
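
The sketch below uses tf.distribute.MirroredStrategy for data-parallel training; the model and dataset names are placeholders, not the exact code in the notebook.

    import tensorflow as tf

    # Replicate the model across all visible local GPUs; gradients are
    # aggregated with all-reduce.
    strategy = tf.distribute.MirroredStrategy()
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None, classes=10)  # placeholder
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Scale the global batch size with the replica count so each GPU keeps
    # the same per-device batch size, e.g.:
    # model.fit(train_ds.batch(64 * strategy.num_replicas_in_sync), epochs=10)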

DISCLAIMER

Depending on the devices in your computer, performance may decrease.
This optimized code will not necessarily show the best possible performance.
Multi-GPU training did not work in the test environment.
Descriptions or code here may contain errors.
