Rename fabric run model to fabric run (#19527)
awaelchli committed Feb 27, 2024
1 parent e461e90 commit ea89133
Showing 18 changed files with 60 additions and 72 deletions.
12 changes: 6 additions & 6 deletions docs/source-fabric/fundamentals/launch.rst
@@ -67,7 +67,7 @@ An alternative way to launch your Python script in multiple processes is to use

 .. code-block:: bash
-fabric run model path/to/your/script.py
+fabric run path/to/your/script.py
 This is essentially the same as running ``python path/to/your/script.py``, but it also lets you configure the following settings externally without changing your code:

@@ -80,9 +80,9 @@ This is essentially the same as running ``python path/to/your/script.py``, but i

 .. code-block:: bash
-fabric run model --help
+fabric run --help
-Usage: fabric run model [OPTIONS] SCRIPT [SCRIPT_ARGS]...
+Usage: fabric run [OPTIONS] SCRIPT [SCRIPT_ARGS]...
 Run a Lightning Fabric script.
@@ -128,7 +128,7 @@ Here is how you run DDP with 8 GPUs and `torch.bfloat16 <https://pytorch.org/doc

 .. code-block:: bash
-fabric run model ./path/to/train.py \
+fabric run ./path/to/train.py \
 --strategy=ddp \
 --devices=8 \
 --accelerator=cuda \
@@ -138,7 +138,7 @@ Or `DeepSpeed Zero3 <https://www.deepspeed.ai/2021/03/07/zero3-offload.html>`_ w

 .. code-block:: bash
-fabric run model ./path/to/train.py \
+fabric run ./path/to/train.py \
 --strategy=deepspeed_stage_3 \
 --devices=8 \
 --accelerator=cuda \
@@ -148,7 +148,7 @@ Or `DeepSpeed Zero3 <https://www.deepspeed.ai/2021/03/07/zero3-offload.html>`_ w

 .. code-block:: bash
-fabric run model ./path/to/train.py \
+fabric run ./path/to/train.py \
 --devices=auto \
 --accelerator=auto \
 --precision=16
2 changes: 1 addition & 1 deletion docs/source-fabric/fundamentals/precision.rst
@@ -66,7 +66,7 @@ The same values can also be set through the :doc:`command line interface <launch

 .. code-block:: bash
-lightning run model train.py --precision=bf16-mixed
+fabric run train.py --precision=bf16-mixed
 .. note::
8 changes: 4 additions & 4 deletions docs/source-fabric/guide/multi_node/barebones.rst
@@ -72,7 +72,7 @@ Log in to the **first node** and run this command:
 .. code-block:: bash
 :emphasize-lines: 2,3
-lightning run model \
+fabric run \
 --node-rank=0 \
 --main-address=10.10.10.16 \
 --accelerator=cuda \
@@ -85,7 +85,7 @@ Log in to the **second node** and run this command:
 .. code-block:: bash
 :emphasize-lines: 2,3
-lightning run model \
+fabric run \
 --node-rank=1 \
 --main-address=10.10.10.16 \
 --accelerator=cuda \
@@ -129,7 +129,7 @@ The most likely reasons and how to fix it:
 export GLOO_SOCKET_IFNAME=eno1
 export NCCL_SOCKET_IFNAME=eno1
-lightning run model ...
+fabric run ...
 You can find the interface name by parsing the output of the ``ifconfig`` command.
 The name of this interface **may differ on each node**.
@@ -152,7 +152,7 @@ Launch your command by prepending ``NCCL_DEBUG=INFO`` to get more info.

 .. code-block:: bash
-NCCL_DEBUG=INFO lightning run model ...
+NCCL_DEBUG=INFO fabric run ...
 ----
6 changes: 3 additions & 3 deletions examples/fabric/image_classifier/README.md
@@ -27,11 +27,11 @@ This script shows you how to scale the pure PyTorch code to enable GPU and multi

 ```bash
 # CPU
-lightning run model train_fabric.py
+fabric run train_fabric.py

 # GPU (CUDA or M1 Mac)
-lightning run model train_fabric.py --accelerator=gpu
+fabric run train_fabric.py --accelerator=gpu

 # Multiple GPUs
-lightning run model train_fabric.py --accelerator=gpu --devices=4
+fabric run train_fabric.py --accelerator=gpu --devices=4
 ```
8 changes: 4 additions & 4 deletions examples/fabric/image_classifier/train_fabric.py
@@ -20,10 +20,10 @@
 3. Apply ``setup`` over each model and optimizers pair, ``setup_dataloaders`` on all your dataloaders,
 and replace ``loss.backward()`` with ``self.backward(loss)``.
-4. Run the script from the terminal using ``lightning run model path/to/train.py``
+4. Run the script from the terminal using ``fabric run path/to/train.py``
 Accelerate your training loop by setting the ``--accelerator``, ``--strategy``, ``--devices`` options directly from
-the command line. See ``lightning run model --help`` or learn more from the documentation:
+the command line. See ``fabric run --help`` or learn more from the documentation:
 https://lightning.ai/docs/fabric.
 """
@@ -71,7 +71,7 @@ def forward(self, x):

 def run(hparams):
 # Create the Lightning Fabric object. The parameters like accelerator, strategy, devices etc. will be proided
-# by the command line. See all options: `lightning run model --help`
+# by the command line. See all options: `fabric run --help`
 fabric = Fabric()

 seed_everything(hparams.seed) # instead of torch.manual_seed(...)
@@ -168,7 +168,7 @@ def run(hparams):
 if __name__ == "__main__":
 # Arguments can be passed in through the CLI as normal and will be parsed here
 # Example:
-# lightning run model image_classifier.py accelerator=cuda --epochs=3
+# fabric run image_classifier.py accelerator=cuda --epochs=3
 parser = argparse.ArgumentParser(description="Fabric MNIST Example")
 parser.add_argument(
 "--batch-size", type=int, default=64, metavar="N", help="input batch size for training (default: 64)"
6 changes: 3 additions & 3 deletions examples/fabric/kfold_cv/README.md
@@ -14,13 +14,13 @@ This script shows you how to scale the pure PyTorch code to enable GPU and multi

 ```bash
 # CPU
-lightning run model train_fabric.py
+fabric run train_fabric.py

 # GPU (CUDA or M1 Mac)
-lightning run model train_fabric.py --accelerator=gpu
+fabric run train_fabric.py --accelerator=gpu

 # Multiple GPUs
-lightning run model train_fabric.py --accelerator=gpu --devices=4
+fabric run train_fabric.py --accelerator=gpu --devices=4
 ```

 ### References
4 changes: 2 additions & 2 deletions examples/fabric/kfold_cv/train_fabric.py
@@ -107,7 +107,7 @@ def validate_dataloader(model, data_loader, fabric, hparams, fold, acc_metric):

 def run(hparams):
 # Create the Lightning Fabric object. The parameters like accelerator, strategy, devices etc. will be proided
-# by the command line. See all options: `lightning run model --help`
+# by the command line. See all options: `fabric run --help`
 fabric = Fabric()

 seed_everything(hparams.seed) # instead of torch.manual_seed(...)
@@ -171,7 +171,7 @@ def run(hparams):
 if __name__ == "__main__":
 # Arguments can be passed in through the CLI as normal and will be parsed here
 # Example:
-# lightning run model image_classifier.py accelerator=cuda --epochs=3
+# fabric run image_classifier.py accelerator=cuda --epochs=3
 parser = argparse.ArgumentParser(description="Fabric MNIST K-Fold Cross Validation Example")
 parser.add_argument(
 "--batch-size", type=int, default=64, metavar="N", help="input batch size for training (default: 64)"
6 changes: 3 additions & 3 deletions examples/fabric/language_model/README.md
@@ -7,11 +7,11 @@ It is a simplified version of the [official PyTorch example](https://github.com/

 ```bash
 # CPU
-lightning run model --accelerator=cpu train.py
+fabric run --accelerator=cpu train.py

 # GPU (CUDA or M1 Mac)
-lightning run model --accelerator=gpu train.py
+fabric run --accelerator=gpu train.py

 # Multiple GPUs
-lightning run model --accelerator=gpu --devices=4 train.py
+fabric run --accelerator=gpu --devices=4 train.py
 ```
2 changes: 1 addition & 1 deletion examples/fabric/meta_learning/README.md
@@ -33,7 +33,7 @@ torchrun --nproc_per_node=2 --standalone train_torch.py
 **Accelerated using Lightning Fabric:**

 ```bash
-lightning run model train_fabric.py --devices 2 --strategy ddp --accelerator cpu
+fabric run train_fabric.py --devices 2 --strategy ddp --accelerator cpu
 ```

 ### References
4 changes: 2 additions & 2 deletions examples/fabric/meta_learning/train_fabric.py
@@ -12,7 +12,7 @@
 - gym<=0.22
 Run it with:
-lightning run model train_fabric.py --accelerator=cuda --devices=2 --strategy=ddp
+fabric run train_fabric.py --accelerator=cuda --devices=2 --strategy=ddp
 """

 import cherry
@@ -59,7 +59,7 @@ def main(
 seed=42,
 ):
 # Create the Fabric object
-# Arguments get parsed from the command line, see `lightning run model --help`
+# Arguments get parsed from the command line, see `fabric run --help`
 fabric = Fabric()

 meta_batch_size = meta_batch_size // fabric.world_size
12 changes: 6 additions & 6 deletions examples/fabric/reinforcement_learning/README.md
@@ -40,7 +40,7 @@ torchrun --nproc_per_node=2 --standalone train_torch.py
 ### Lightning Fabric:

 ```bash
-lightning run model --accelerator=cpu --strategy=ddp --devices=2 train_fabric.py
+fabric run --accelerator=cpu --strategy=ddp --devices=2 train_fabric.py
 ```

 ### Visualizing logs
@@ -71,7 +71,7 @@ The following video shows a trained agent on the [LunarLander-v2 environment](ht
 The agent was trained with the following:

 ```bash
-lightning run model \
+fabric run \
 --accelerator=cpu \
 --strategy=ddp \
 --devices=2 \
@@ -98,25 +98,25 @@ where, differently from the previous example, we have completely decoupled the e
 So for example:

 ```bash
-lightning run model --devices=3 train_fabric_decoupled.py --num-envs 4
+fabric run --devices=3 train_fabric_decoupled.py --num-envs 4
 ```

 will spawn 3 processes, one is the Player and the others the Trainers, with the Player running 4 independent environments, where every process runs on the CPU;

 ```bash
-lightning run model --devices=3 train_fabric_decoupled.py --num-envs 4 --cuda
+fabric run --devices=3 train_fabric_decoupled.py --num-envs 4 --cuda
 ```

 will instead run only the Trainers on the GPU.
 If one wants to run both the Player and the Trainers on the GPU, then both the flags `--cuda` and `--player-on-gpu` must be provided:

 ```bash
-lightning run model --devices=3 train_fabric_decoupled.py --num-envs 4 --cuda --player-on-gpu
+fabric run --devices=3 train_fabric_decoupled.py --num-envs 4 --cuda --player-on-gpu
 ```

 > **Warning**
 >
-> With this second example, there is no need for the user to provide the `accelerator` and the `strategy` to the `lightning run model` script.
+> With this second example, there is no need for the user to provide the `accelerator` and the `strategy` to the `fabric run` script.
 ## Number of updates, environment steps and share data
2 changes: 1 addition & 1 deletion examples/fabric/reinforcement_learning/train_fabric.py
@@ -14,7 +14,7 @@
 Run it with:
-lightning run model --accelerator=cpu --strategy=ddp --devices=2 train_fabric.py
+fabric run --accelerator=cpu --strategy=ddp --devices=2 train_fabric.py
 """

 import argparse
Expand Down
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@
 Run it with:
-lightning run model --devices=2 train_fabric_decoupled.py
+fabric run --devices=2 train_fabric_decoupled.py
 """

 import argparse
8 changes: 0 additions & 8 deletions src/lightning/app/cli/lightning_cli.py
@@ -18,7 +18,6 @@
 from typing import Tuple, Union

 import click
-from lightning_utilities.core.imports import RequirementCache
 from requests.exceptions import ConnectionError

 import lightning.app.core.constants as constants
@@ -303,13 +302,6 @@ def run_app(
 )


-if RequirementCache("lightning-fabric>=1.9.0") or RequirementCache("lightning>=1.9.0"):
-# note it is automatically replaced to `from lightning.fabric.cli` when building monolithic/mirror package
-from lightning.fabric.cli import _run_model
-
-run.add_command(_run_model)
-
-
 @_main.command("open", hidden=True)
 @click.argument("path", type=str, default=".")
 @click.option("--name", help="The name to use for the CloudSpace", default="", type=str)
2 changes: 1 addition & 1 deletion src/lightning/fabric/CHANGELOG.md
@@ -17,7 +17,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Changed

-- Renamed `lightning run model` to `fabric run model` ([#19442](https://github.com/Lightning-AI/pytorch-lightning/pull/19442))
+- Renamed `lightning run model` to `fabric run` ([#19442](https://github.com/Lightning-AI/pytorch-lightning/pull/19442), [#19527](https://github.com/Lightning-AI/pytorch-lightning/pull/19527))


 - The `Fabric.rank_zero_first` context manager now uses a barrier without timeout to avoid long-running tasks to be interrupted ([#19448](https://github.com/Lightning-AI/lightning/pull/19448))
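Downstream docs or scripts that still contain the old spellings can apply the rename recorded in this changelog entry mechanically. The following is a minimal sketch and not part of this commit: the helper name `migrate_command` and the regular expression are illustrative assumptions. It rewrites both `lightning run model` and the intermediate `fabric run model` to `fabric run`, while leaving a script that merely happens to be called `model.py` untouched.

```python
import re

# Hypothetical helper (not part of Lightning). Matches the two legacy
# spellings, `lightning run model` and `fabric run model`. The trailing
# lookahead rejects file names such as `model.py` or `model-v2.py`.
_LEGACY_COMMAND = re.compile(r"(?<![\w-])(?:lightning|fabric)\s+run\s+model(?![\w./-])")


def migrate_command(text: str) -> str:
    """Rewrite legacy invocations found in ``text`` to the ``fabric run`` form."""
    return _LEGACY_COMMAND.sub("fabric run", text)


if __name__ == "__main__":
    print(migrate_command("lightning run model train.py --precision=bf16-mixed"))
    print(migrate_command("NCCL_DEBUG=INFO fabric run model ..."))
```

The pattern is deliberately conservative; review the result before committing it, since prose that mentions the old command on purpose (for example a deprecation notice) would be rewritten as well.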
14 changes: 5 additions & 9 deletions src/lightning/fabric/cli.py
@@ -55,7 +55,7 @@ def _legacy_main() -> None:
 """
 print(
 "`lightning run model` is deprecated and will be removed in future versions."
-" Please call `fabric run model` instead."
+" Please call `fabric run` instead."
 )
 args = sys.argv[1:]
 if args and args[0] == "run" and args[1] == "model":
@@ -70,12 +70,8 @@ def _legacy_main() -> None:
 def _main() -> None:
 pass

-@_main.group()
-def run() -> None:
-pass
-
-@run.command(
-"model",
+@_main.command(
+"run",
 context_settings={
 "ignore_unknown_options": True,
 },
@@ -146,7 +142,7 @@ def run() -> None:
 ),
 )
 @click.argument("script_args", nargs=-1, type=click.UNPROCESSED)
-def _run_model(**kwargs: Any) -> None:
+def _run(**kwargs: Any) -> None:
 """Run a Lightning Fabric script.
 SCRIPT is the path to the Python script with the code to run. The script must contain a Fabric object.
@@ -225,4 +221,4 @@ def main(args: Namespace, script_args: Optional[List[str]] = None) -> None:
 )
 raise SystemExit(1)

-_run_model()
+_run()
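The `_legacy_main` hunk in this file tests `args[0] == "run" and args[1] == "model"`, but the diff does not show what happens to the arguments afterwards. As a sketch only, assuming the intent is to drop the `model` token and forward everything else to the new `run` command, the translation could look like the following. The helper name `translate_legacy_args` is hypothetical and does not appear in the commit. Comparing a slice also sidesteps the `IndexError` that `args[1]` raises when the argument list is just `["run"]`.

```python
from typing import List, Sequence


def translate_legacy_args(args: Sequence[str]) -> List[str]:
    """Map the old ``run model <rest>`` form to the new ``run <rest>`` form.

    Hypothetical helper for illustration; the real ``_legacy_main`` body is
    only partially visible in the diff.
    """
    # A slice never raises, even for an empty or one-element argument list.
    if list(args[:2]) == ["run", "model"]:
        return ["run", *args[2:]]
    # Anything else is passed through unchanged.
    return list(args)
```

For example, `translate_legacy_args(["run", "model", "train.py", "--devices=2"])` yields `["run", "train.py", "--devices=2"]`, which is the argument list the renamed command expects.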
4 changes: 2 additions & 2 deletions src/lightning/fabric/fabric.py
@@ -839,7 +839,7 @@ def launch(self, function: Callable[["Fabric"], Any] = _do_nothing, *args: Any,
 Returns the output of the function that ran in worker process with rank 0.
 The ``launch()`` method should only be used if you intend to specify accelerator, devices, and so on in
-the code (programmatically). If you are launching with the Lightning CLI, ``lightning run model ...``, remove
+the code (programmatically). If you are launching with the Lightning CLI, ``fabric run ...``, remove
 ``launch()`` from your code.
 The ``launch()`` is a no-op when called multiple times and no function is passed in.
@@ -1028,7 +1028,7 @@ def _validate_launched(self) -> None:
 if not self._launched and not isinstance(self._strategy, (SingleDeviceStrategy, DataParallelStrategy)):
 raise RuntimeError(
 "To use Fabric with more than one device, you must call `.launch()` or use the CLI:"
-" `lightning run model --help`."
+" `fabric run --help`."
 )

 def _validate_setup(self, module: nn.Module, optimizers: Sequence[Optimizer]) -> None:
