From e25c8475f633b29f6ad1df0e160f6b6c08b1d106 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Tue, 7 Feb 2023 14:55:04 +0100
Subject: [PATCH 01/10] wip

---
 docs/source-pytorch/fabric/fabric.rst | 42 ++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 8744fae2ba2c6..dd6297c24eda5 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -2,18 +2,12 @@
 Fabric (Beta)
 #############
 
-Fabric allows you to scale any PyTorch model with just a few lines of code!
-With Fabric, you can easily scale your model to run on distributed devices using the strategy of your choice while keeping complete control over the training loop and optimization logic.
+Fabric is the fast and lightweight way to scale your models without boilerplate code.
 
-With only a few changes to your code, Fabric allows you to:
-
-- Automatic placement of models and data onto the device
-- Automatic support for mixed precision (speedup and smaller memory footprint)
-- Seamless switching between hardware (CPU, GPU, TPU)
-- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed)
-- Easy-to-use launch command for spawning processes (DDP, torchelastic, etc)
-- Multi-node support (TorchElastic, SLURM, and more)
-- You keep complete control of your training loop
+- Handles all the boilerplate device logic for you
+- Easily switch from debugging on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
+- Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
+- Designed with multi-billion parameter models in mind
 
 
 .. code-block:: diff
@@ -60,6 +54,32 @@ With only a few changes to your code, Fabric allows you to:
 ----
 
+***********
+Why Fabric?
+***********
+
+Fabric differentiates itself from a fully-fledged trainer like `Lightning Trainer `_ in these key aspects:
+
+**Maximum Flexibility**
+You write your own training and/or inference logic down to the individual optimizer calls.
+This also makes it super easy to adopt Fabric in existing PyTorch projects to speed-up and scale your models without the compromise on large refactors.
+Just remember: With great power comes a great responsibility.
+
+**Personalization**
+While a general-purpose trainer like `Lightning Trainer `_ contains all features *any* researcher could ever ask for,
+it may contain way more stuff than you, the individual, would ever need.
+This can make it more difficult to adapt it to your domain of research than it should be, but at the same time, building a well-tested, efficient and hackable trainer is very time-consuming.
+Fabric bridges this gap by providing important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but at the same time it leaves the design and orchestration fully up to you.
+
+**Opt-in Philosophy**
+Everything in Fabric is opt-in.
+Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
+This makes it easier to develop and debug your PyTorch code as you gradually add more features to it.
+
+
+----
+
+
 ************
 Fundamentals
 ************
 

From d2b720e16a54b2e14809c867883b5364dad31ec2 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Tue, 7 Feb 2023 15:00:31 +0100
Subject: [PATCH 02/10] links

---
 docs/source-pytorch/fabric/fabric.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index dd6297c24eda5..7cfd77751d84d 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -58,15 +58,16 @@ Why Fabric?
 ***********
 
-Fabric differentiates itself from a fully-fledged trainer like `Lightning Trainer `_ in these key aspects:
+Fabric differentiates itself from a fully-fledged trainer like :doc:`Lightning Trainer <../common/trainer>` in these key aspects:
 
 **Maximum Flexibility**
-You write your own training and/or inference logic down to the individual optimizer calls.
+Wite your own training and/or inference logic down to the individual optimizer calls.
+You aren't forced to conform to a standardized epoch-based training loop like the one in :doc:`Lightning Trainer <../common/trainer>`.
 This also makes it super easy to adopt Fabric in existing PyTorch projects to speed-up and scale your models without the compromise on large refactors.
 Just remember: With great power comes a great responsibility.
 
 **Personalization**
-While a general-purpose trainer like `Lightning Trainer `_ contains all features *any* researcher could ever ask for,
+While a general-purpose trainer like :doc:`Lightning Trainer <../common/trainer>` contains all features *any* researcher could ever ask for,
 it may contain way more stuff than you, the individual, would ever need.
 This can make it more difficult to adapt it to your domain of research than it should be, but at the same time, building a well-tested, efficient and hackable trainer is very time-consuming.
 Fabric bridges this gap by providing important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but at the same time it leaves the design and orchestration fully up to you.
 

From 927d8a57d008b47fd5cef1581a516fafcfb1e3ea Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Tue, 7 Feb 2023 16:42:38 +0100
Subject: [PATCH 03/10] add a space

---
 docs/source-pytorch/fabric/fabric.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 7cfd77751d84d..10eb7845950f4 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -9,6 +9,7 @@ Fabric is the fast and lightweight way to scale your models without boilerplate
 - Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
 - Designed with multi-billion parameter models in mind
 
+|
 
 .. code-block:: diff
 

From c36eb9a70a9cc8c61b3cff4a9ad6c9c56901ea34 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Tue, 7 Feb 2023 17:00:02 +0100
Subject: [PATCH 04/10] typo fix

---
 docs/source-pytorch/fabric/fabric.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 10eb7845950f4..d2363cd9d80b6 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -62,7 +62,7 @@ Why Fabric?
 Fabric differentiates itself from a fully-fledged trainer like :doc:`Lightning Trainer <../common/trainer>` in these key aspects:
 
 **Maximum Flexibility**
-Wite your own training and/or inference logic down to the individual optimizer calls.
+Write your own training and/or inference logic down to the individual optimizer calls.
 You aren't forced to conform to a standardized epoch-based training loop like the one in :doc:`Lightning Trainer <../common/trainer>`.
 This also makes it super easy to adopt Fabric in existing PyTorch projects to speed-up and scale your models without the compromise on large refactors.
 Just remember: With great power comes a great responsibility.

From 7cd880a8b1425bf14ecf6fcb3e3f156d36a19563 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrian=20W=C3=A4lchli?=
Date: Wed, 8 Feb 2023 06:07:04 -0500
Subject: [PATCH 05/10] Update docs/source-pytorch/fabric/fabric.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
---
 docs/source-pytorch/fabric/fabric.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index d2363cd9d80b6..60490521bae12 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -2,7 +2,7 @@
 Fabric (Beta)
 #############
 
-Fabric is the fast and lightweight way to scale your models without boilerplate code.
+Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.
 
 - Handles all the boilerplate device logic for you
 - Easily switch from debugging on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training

From 735b7b5243d970d94cf7e264e60ea05492b0afd2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrian=20W=C3=A4lchli?=
Date: Wed, 8 Feb 2023 06:07:15 -0500
Subject: [PATCH 06/10] Update docs/source-pytorch/fabric/fabric.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
---
 docs/source-pytorch/fabric/fabric.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 60490521bae12..ea79c8eb6b20f 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -5,7 +5,7 @@ Fabric (Beta)
 Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.
 
 - Handles all the boilerplate device logic for you
-- Easily switch from debugging on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
+- Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
 - Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
 - Designed with multi-billion parameter models in mind
 

From e3ba862a1d23ea5fd6021b7d993d922c1d54aeed Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Wed, 8 Feb 2023 16:15:10 +0100
Subject: [PATCH 07/10] eden feedback

---
 docs/source-pytorch/fabric/fabric.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index ea79c8eb6b20f..04fb6611f88e3 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -64,14 +64,15 @@ Fabric differentiates itself from a fully-fledged trainer like :doc:`Lightning T
 **Maximum Flexibility**
 Write your own training and/or inference logic down to the individual optimizer calls.
 You aren't forced to conform to a standardized epoch-based training loop like the one in :doc:`Lightning Trainer <../common/trainer>`.
+You can do flexible iteration based training, meta-learning, cross-validation and other types of optimization algorithms without digging into framework internals.
 This also makes it super easy to adopt Fabric in existing PyTorch projects to speed-up and scale your models without the compromise on large refactors.
 Just remember: With great power comes a great responsibility.
 
 **Personalization**
 While a general-purpose trainer like :doc:`Lightning Trainer <../common/trainer>` contains all features *any* researcher could ever ask for,
 it may contain way more stuff than you, the individual, would ever need.
-This can make it more difficult to adapt it to your domain of research than it should be, but at the same time, building a well-tested, efficient and hackable trainer is very time-consuming.
+This makes it more difficult to study and trust the internals than it should be, but at the same time, building a well-tested, efficient and hackable trainer yourself is very time-consuming.
-Fabric bridges this gap by providing important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but at the same time it leaves the design and orchestration fully up to you.
+Fabric bridges this gap by providing important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but leaves the design and orchestration fully up to you.
 
 **Opt-in Philosophy**
 Everything in Fabric is opt-in.
 Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
 This makes it easier to develop and debug your PyTorch code as you gradually add more features to it.

From 77ca713131ad1d75b29bcfb7da3ea8413d46ee40 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Thu, 9 Feb 2023 15:52:33 +0100
Subject: [PATCH 08/10] Address Eden's feedback

---
 docs/source-pytorch/fabric/fabric.rst | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 04fb6611f88e3..96ea3d3cfa645 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -61,6 +61,9 @@ Why Fabric?
 
 Fabric differentiates itself from a fully-fledged trainer like :doc:`Lightning Trainer <../common/trainer>` in these key aspects:
 
+**Fast to implement**
+There is no need to restructure your code: Just change a few lines in the PyTorch script and you'll be able to leverage Fabric features.
+
 **Maximum Flexibility**
 Write your own training and/or inference logic down to the individual optimizer calls.
 You aren't forced to conform to a standardized epoch-based training loop like the one in :doc:`Lightning Trainer <../common/trainer>`.
@@ -68,16 +71,11 @@ You can do flexible iteration based training, meta-learning, cross-validation an
 This also makes it super easy to adopt Fabric in existing PyTorch projects to speed-up and scale your models without the compromise on large refactors.
 Just remember: With great power comes a great responsibility.
 
-**Personalization**
-While a general-purpose trainer like :doc:`Lightning Trainer <../common/trainer>` contains all features *any* researcher could ever ask for,
-it may contain way more stuff than you, the individual, would ever need.
-This makes it more difficult to study and trust the internals than it should be, but at the same time, building a well-tested, efficient and hackable trainer yourself is very time-consuming.
-Fabric bridges this gap by providing important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but leaves the design and orchestration fully up to you.
-
-**Opt-in Philosophy**
-Everything in Fabric is opt-in.
-Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
+**Maximum Control**
+The :doc:`Lightning Trainer <../common/trainer>` has many built in features to make research simpler with less boilerplate, but debugging it requires some familiarity with the framework internals.
+In Fabric, everything is opt-in. Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
 This makes it easier to develop and debug your PyTorch code as you gradually add more features to it.
+Fabric provides important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but leaves the design and orchestration fully up to you.
 
 ----
 

From e77e9d38e375adb15fcadee2bb014fb8c86f9e19 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrian=20W=C3=A4lchli?=
Date: Thu, 9 Feb 2023 12:44:11 -0500
Subject: [PATCH 09/10] Update docs/source-pytorch/fabric/fabric.rst

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
---
 docs/source-pytorch/fabric/fabric.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 96ea3d3cfa645..e00ffe9cb110f 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -4,8 +4,9 @@ Fabric (Beta)
 
 Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.
 
-- Handles all the boilerplate device logic for you
 - Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
+- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the bix
+- Handles all the boilerplate device logic for you
 - Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
 - Designed with multi-billion parameter models in mind
 

From 22f8390381d40b64f703c8343ce5f771ff96514f Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Thu, 9 Feb 2023 18:44:46 +0100
Subject: [PATCH 10/10] fix typo

---
 docs/source-pytorch/fabric/fabric.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index e00ffe9cb110f..1bf32a5ed9c23 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -5,7 +5,7 @@ Fabric (Beta)
 Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.
 
 - Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
-- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the bix
+- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box
 - Handles all the boilerplate device logic for you
 - Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
 - Designed with multi-billion parameter models in mind
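
For reviewers reading this series as raw patches, the loop below is a minimal sketch of the "just change a few lines" workflow the new introduction describes. It is not part of the patches themselves; it assumes the public ``lightning.fabric.Fabric`` API of the 1.9/2.0 releases, and the tiny model, optimizer, and dataset are placeholders chosen only for illustration.

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from lightning.fabric import Fabric

    # Switch hardware by changing only these arguments,
    # e.g. accelerator="cuda", devices=2, strategy="ddp" for multi-GPU training.
    fabric = Fabric(accelerator="cpu", devices=1)
    fabric.launch()

    model = torch.nn.Linear(32, 2)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    dataloader = DataLoader(dataset, batch_size=8)

    model, optimizer = fabric.setup(model, optimizer)  # device placement + strategy wrapping
    dataloader = fabric.setup_dataloaders(dataloader)  # adds a distributed sampler when needed

    for inputs, targets in dataloader:  # batches already arrive on the right device
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        fabric.backward(loss)  # replaces loss.backward()
        optimizer.step()

The design point the "Why Fabric?" section makes is visible here: everything outside the ``fabric.*`` calls is plain PyTorch, so the loop can be reshaped freely (iteration-based training, meta-learning, cross-validation) without touching framework internals.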