# Free FP8 training with unit scaling

Zero-effort, zero-cost FP8 training using the `unit_scaling` library.

💻 **Use the library**: [graphcore-research.github.io/unit-scaling](https://graphcore-research.github.io/unit-scaling/)

📖 **Read the paper**: [arxiv.org/abs/2303.11257](https://arxiv.org/abs/2303.11257)

## TL;DR

Naïvely casting a model to FP8 causes training to fail, because some values fall outside the range FP8 can represent.
Swapping in
[unit-scaled](https://graphcore-research.github.io/unit-scaling/) layers fixes this:

```python
from notebook_utils import config, train
from nanoGPT.model import GPT
from unit_scaling.transforms import simulate_fp8, unit_scale

gpt = GPT(config)  # model unchanged from original nanoGPT
fp8_gpt = simulate_fp8(gpt)
unit_scaled_fp8_gpt = unit_scale(fp8_gpt)

models = [gpt, fp8_gpt, unit_scaled_fp8_gpt]
for model in models:
    train(model)
```
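
To make the "out-of-range" problem concrete, here is a rough sketch in plain Python that does not use `unit_scaling` at all. It counts how many values of a tensor would overflow or round to zero in FP8. The helper `out_of_range_fraction` and the scale factors `1e-4` and `1e4` are illustrative assumptions, not taken from the library or the paper; the range limits are those of the common E4M3 and E5M2 FP8 formats. The check is a simple range test, not a full FP8 quantiser.

```python
import random

# Representable magnitudes of the two common FP8 formats:
# (smallest subnormal, largest finite value).
FP8_RANGES = {
    "E4M3": (2.0 ** -9, 448.0),
    "E5M2": (2.0 ** -16, 57344.0),
}


def out_of_range_fraction(values, fmt):
    """Approximate fraction of values that FP8 format `fmt` cannot hold.

    A value counts as out of range if its magnitude exceeds the largest
    finite value, or is non-zero but below half the smallest subnormal
    (so it would round to zero).
    """
    smallest, largest = FP8_RANGES[fmt]
    bad = sum(
        1 for v in values
        if abs(v) > largest or 0 < abs(v) < smallest / 2
    )
    return bad / len(values)


random.seed(0)
unit = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # unit-variance values
tiny = [v * 1e-4 for v in unit]  # hypothetical badly scaled (too small) tensor
huge = [v * 1e4 for v in unit]   # hypothetical badly scaled (too large) tensor

for name, xs in [("unit", unit), ("tiny", tiny), ("huge", huge)]:
    print(f"{name:>4}: {out_of_range_fraction(xs, 'E4M3'):.1%} out of E4M3 range")
```

Values with roughly unit variance sit comfortably inside the E4M3 range, while the same values scaled far up or down mostly do not. This is the intuition behind keeping tensors at unit scale before casting.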