
Restructure package #12

Merged: 30 commits, Apr 13, 2023

Commits (30)
9908d83  Restructre hpu pytorch lightning code (jerome-habana, Apr 13, 2023)
f2fc2f4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
d371d62  Remove codecov from requirements (jerome-habana, Apr 13, 2023)
f807bae  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
0e6b5bb  Correct ruff errors (jerome-habana, Apr 13, 2023)
fd3613d  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
0b5a1df  Update test path (jerome-habana, Apr 13, 2023)
647f809  Correct missing module link (jerome-habana, Apr 13, 2023)
1708f73  Update tests (jerome-habana, Apr 13, 2023)
7265be1  Update parallel tests (jerome-habana, Apr 13, 2023)
d90ce7b  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
c1f22ef  Update tests (jerome-habana, Apr 13, 2023)
3dcb5a8  Remove commented code (jerome-habana, Apr 13, 2023)
fbecefd  Add bcast override and enable test (jerome-habana, Apr 13, 2023)
36c77f5  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
0acced7  Correct missing definition (jerome-habana, Apr 13, 2023)
71e4d70  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
606d2e9  Correct mypy errors, update info (jerome-habana, Apr 13, 2023)
fb7c476  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 13, 2023)
a897291  Resolve ruff errors (jerome-habana, Apr 13, 2023)
18ffb34  Update return type (jerome-habana, Apr 13, 2023)
413276d  Update src/lightning_habana/__about__.py (jerome-habana, Apr 13, 2023)
bafda6a  cleaning (Borda, Apr 13, 2023)
5fdf20b  fixing (Borda, Apr 13, 2023)
7846712  fixing (Borda, Apr 13, 2023)
ea4dec4  fixing (Borda, Apr 13, 2023)
ccfc7df  fixing (Borda, Apr 13, 2023)
a57bde3  lasting (Borda, Apr 13, 2023)
a1d96eb  ... (Borda, Apr 13, 2023)
a551416  . (Borda, Apr 13, 2023)
2 changes: 1 addition & 1 deletion .azure/hpu-tests-fabric.yml
@@ -86,7 +86,7 @@ jobs:

- task: PublishTestResults@2
inputs:
testResultsFiles: 'tests/hpu*_test-results.xml'
testResultsFiles: 'tests/fabric_hpu*_test-results.xml'
testRunTitle: '$(Build.DefinitionName) - Python $(python.version)'
condition: succeededOrFailed()
displayName: 'Publish test results'
12 changes: 6 additions & 6 deletions .azure/hpu-tests-pl.yml
@@ -73,19 +73,19 @@ jobs:
displayName: 'Check the driver status'

- bash: |
python -m pytest -sv test_accelerator.py --forked --junitxml=hpu1_test-results.xml
python -m pytest -sv tests_pytorch/test_accelerator.py --forked --junitxml=hpu1_test-results.xml
workingDirectory: tests/
displayName: 'Single card HPU test'

- bash: |
python -m pytest -sv test_accelerator.py --forked --hpus 8 --junitxml=hpu8_test-results.xml
python -m pytest -sv tests_pytorch/test_accelerator.py --forked --hpus 8 --junitxml=hpu8_test-results.xml
workingDirectory: tests/
displayName: 'Multi card(8) HPU test'

- bash: |
python -m pytest -sv plugins/test_precision.py --hmp-bf16 \
'plugins/ops_bf16.txt' --hmp-fp32 \
'plugins/ops_fp32.txt' --forked \
python -m pytest -sv tests_pytorch/test_precision.py --hmp-bf16 \
'tests_pytorch/ops_bf16.txt' --hmp-fp32 \
'tests_pytorch/ops_fp32.txt' --forked \
--junitxml=hpu1_precision_test-results.xml
workingDirectory: tests/
displayName: 'HPU precision test'
@@ -102,7 +102,7 @@ jobs:

- task: PublishTestResults@2
inputs:
testResultsFiles: 'tests/hpu*_test-results.xml'
testResultsFiles: 'tests/pl_hpu*_test-results.xml'
testRunTitle: '$(Build.DefinitionName) - Python $(python.version)'
condition: succeededOrFailed()
displayName: 'Publish test results'
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-
### Changed

- Changed code hierarchy in compliance with base lightning code for pytorch ([#12](https://github.com/Lightning-AI/lightning-Habana/pull/12))
-
### Fixed

### Removed
1 change: 0 additions & 1 deletion _requirements/test.txt
@@ -1,5 +1,4 @@
coverage>=5.0
codecov>=2.1
pytest>=6.0
pytest-cov
pytest-forked==1.6.0
18 changes: 13 additions & 5 deletions examples/mnist_sample.py → examples/pytorch/mnist_sample.py
@@ -11,14 +11,22 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
from jsonargparse import lazy_instance
from pytorch_lightning import LightningModule
from pytorch_lightning.cli import LightningCLI
from pytorch_lightning.demos.mnist_datamodule import MNISTDataModule
from lightning_utilities import module_available
from torch.nn import functional as F # noqa: N812

from lightning_habana.plugins.precision import HPUPrecisionPlugin
if module_available("lightning"):
from lightning.pytorch import LightningModule
from lightning.pytorch.cli import LightningCLI
from lightning.pytorch.demos.mnist_datamodule import MNISTDataModule
elif module_available("pytorch_lightning"):
from pytorch_lightning import LightningModule
from pytorch_lightning.cli import LightningCLI
from pytorch_lightning.demos.mnist_datamodule import MNISTDataModule

from lightning_habana.pytorch.plugins.precision import HPUPrecisionPlugin


class LitClassifier(LightningModule):
@@ -61,7 +69,7 @@ def configure_optimizers(self):
"accelerator": "hpu",
"devices": 1,
"max_epochs": 1,
"plugins": lazy_instance(HPUPrecisionPlugin, precision="16-mixed"),
"plugins": lazy_instance(HPUPrecisionPlugin, precision="bf16-mixed"),
},
run=False,
save_config_kwargs={"overwrite": True},
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions src/lightning_habana/__about__.py
@@ -1,10 +1,10 @@
__version__ = "0.1.0rc1"
__version__ = "0.2.0dev"
__author__ = "Lightning-AI et al."
__author_email__ = "name@lightning.ai"
__license__ = "Apache-2.0"
__copyright__ = f"Copyright (c) 2020-2022, {__author__}."
__copyright__ = f"Copyright (c) 2020-2023, {__author__}."
__homepage__ = "https://github.com/Lightning-AI/lightning-habana"
__docs__ = "PyTorch Lightning Sample project."
__docs__ = "Lightning suport for Intel Habana accelerators"

__all__ = [
"__author__",
45 changes: 15 additions & 30 deletions src/lightning_habana/__init__.py
@@ -1,31 +1,16 @@
"""Root package info."""
# Copyright (c) 2023 Habana Labs, Ltd. an Intel Company
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

_PACKAGE_ROOT = os.path.dirname(__file__)
_PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)

from lightning_utilities.core.imports import package_available # noqa: E402

_HABANA_FRAMEWORK_AVAILABLE = package_available("habana_frameworks")
if _HABANA_FRAMEWORK_AVAILABLE:
from habana_frameworks.torch.utils.library_loader import is_habana_available

_HPU_AVAILABLE: bool = is_habana_available()
else:
_HPU_AVAILABLE = False

from lightning_habana.__about__ import * # noqa: E402, F401, F403
from lightning_habana.accelerator import HPUAccelerator # noqa: E402
from lightning_habana.plugins.io_plugin import HPUCheckpointIO # noqa: E402
from lightning_habana.plugins.precision import HPUPrecisionPlugin # noqa: E402
from lightning_habana.strategies.parallel import HPUParallelStrategy # noqa: E402
from lightning_habana.strategies.single import SingleHPUStrategy # noqa: E402

__all__ = [
"HPUAccelerator",
"HPUParallelStrategy",
"SingleHPUStrategy",
"HPUPrecisionPlugin",
"HPUCheckpointIO",
]
from lightning_habana.__about__ import * # noqa: F401, F403
from lightning_habana.utils import * # noqa: F401, F403
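For context: the HPU-availability check that used to live in this root __init__.py now comes from lightning_habana.utils (see the hunks further down that switch to lightning_habana.utils.imports). An illustrative sketch, assuming the restructured package is installed, of the guarded-import pattern the relocated modules rely on:

from lightning_habana.utils.imports import _HPU_AVAILABLE

if _HPU_AVAILABLE:
    # Habana bindings are imported only when the habana_frameworks stack is present.
    import habana_frameworks.torch.core as htcore  # noqa: F401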
28 changes: 28 additions & 0 deletions src/lightning_habana/pytorch/__init__.py
@@ -0,0 +1,28 @@
# Copyright (c) 2023 Habana Labs, Ltd. an Intel Company
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from lightning_habana.__about__ import * # noqa: F401, F403
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.plugins.io_plugin import HPUCheckpointIO
from lightning_habana.pytorch.plugins.precision import HPUPrecisionPlugin
from lightning_habana.pytorch.strategies.parallel import HPUParallelStrategy
from lightning_habana.pytorch.strategies.single import SingleHPUStrategy

__all__ = [
"HPUAccelerator",
"HPUParallelStrategy",
"SingleHPUStrategy",
"HPUPrecisionPlugin",
"HPUCheckpointIO",
]
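A minimal sketch, assuming the restructured package is installed, of what user-facing imports look like after this change; the old top-level paths removed in the root __init__.py above are kept only as comments for comparison:

# Old layout (removed in this PR):
#   from lightning_habana.accelerator import HPUAccelerator
#   from lightning_habana.strategies.single import SingleHPUStrategy
# New layout, re-exported by src/lightning_habana/pytorch/__init__.py:
from lightning_habana.pytorch import (
    HPUAccelerator,
    HPUCheckpointIO,
    HPUParallelStrategy,
    HPUPrecisionPlugin,
    SingleHPUStrategy,
)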
17 changes: 17 additions & 0 deletions src/lightning_habana/pytorch/accelerator/__init__.py
@@ -0,0 +1,17 @@
# Copyright (c) 2023 Habana Labs, Ltd. an Intel Company
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from lightning_habana.pytorch.accelerator.hpu import HPUAccelerator

__all__ = ["HPUAccelerator"]
@@ -30,7 +30,7 @@
else:
raise ModuleNotFoundError("You are missing `lightning` or `pytorch-lightning` package, please install it.")

from lightning_habana import _HPU_AVAILABLE
from lightning_habana.utils.imports import _HPU_AVAILABLE

if _HPU_AVAILABLE:
import habana_frameworks.torch.hpu as torch_hpu
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from lightning_habana.plugins.io_plugin import HPUCheckpointIO
from lightning_habana.plugins.precision import HPUPrecisionPlugin
from lightning_habana.pytorch.plugins.io_plugin import HPUCheckpointIO
from lightning_habana.pytorch.plugins.precision import HPUPrecisionPlugin

__all__ = ["HPUPrecisionPlugin", "HPUCheckpointIO"]
@@ -23,12 +23,12 @@
else:
raise ModuleNotFoundError("You are missing `lightning` or `pytorch-lightning` package, please install it.")

from lightning_habana import _HPU_AVAILABLE
from lightning_habana.utils.imports import _HPU_AVAILABLE

if _HPU_AVAILABLE:
from habana_frameworks.torch.hpex import hmp

_PRECISION_INPUT = Literal["32-true", "16-mixed", "bf16-mixed"]
_PRECISION_INPUT = Literal["32", "32-true", "bf16", "bf16-mixed"]


class HPUPrecisionPlugin(PrecisionPlugin):
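The hunk above narrows the accepted precision strings to 32-bit and bf16 variants. A minimal sketch, assuming an HPU-enabled environment, of passing the plugin to a Trainer with one of the new literals, mirroring examples/pytorch/mnist_sample.py:

from lightning.pytorch import Trainer  # pytorch_lightning works as well, as in the sample above

from lightning_habana.pytorch.plugins.precision import HPUPrecisionPlugin

# "bf16-mixed" is accepted after this change; "16-mixed" is no longer part of _PRECISION_INPUT.
trainer = Trainer(
    accelerator="hpu",
    devices=1,
    max_epochs=1,
    plugins=[HPUPrecisionPlugin(precision="bf16-mixed")],
)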
@@ -11,7 +11,8 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from lightning_habana.strategies.parallel import HPUParallelStrategy
from lightning_habana.strategies.single import SingleHPUStrategy

from lightning_habana.pytorch.strategies.parallel import HPUParallelStrategy
from lightning_habana.pytorch.strategies.single import SingleHPUStrategy

__all__ = ["HPUParallelStrategy", "SingleHPUStrategy"]
@@ -17,13 +17,13 @@

import torch.distributed
from lightning_utilities import module_available
from torch.distributed import broadcast_object_list

if module_available("lightning"):
from lightning.fabric.plugins import CheckpointIO, ClusterEnvironment
from lightning.fabric.utilities.distributed import group as _group
from lightning.pytorch import LightningModule
from lightning.pytorch.accelerators import Accelerator
from lightning.pytorch.plugins.io.hpu_plugin import HPUCheckpointIO
from lightning.pytorch.plugins.io.wrapper import _WrappingCheckpointIO
from lightning.pytorch.plugins.precision import PrecisionPlugin
from lightning.pytorch.strategies.ddp import DDPStrategy
@@ -41,7 +41,7 @@
from torch.nn import Module
from torch.optim.optimizer import Optimizer

from lightning_habana import _HPU_AVAILABLE
from lightning_habana.utils.imports import _HPU_AVAILABLE

if _HPU_AVAILABLE:
import habana_frameworks.torch.core as htcore
@@ -94,7 +94,7 @@ def checkpoint_io(self) -> CheckpointIO:

@checkpoint_io.setter
def checkpoint_io(self, io: Optional[CheckpointIO]) -> None:
self._checkpoint_io = io # type: ignore[assignment]
self._checkpoint_io = io

def setup_environment(self) -> None:
os.environ["ID"] = str(self.local_rank)
@@ -111,7 +111,7 @@ def broadcast(self, obj: object, src: int = 0) -> object:
if self.global_rank != src:
obj = [None]

broadcast_object_list(obj, src, group=_group.WORLD)
_hpu_broadcast_object_list(obj, src, group=_group.WORLD)
return obj[0]

def on_after_backward(self) -> None:
@@ -143,3 +143,80 @@ def teardown(self) -> None:
# Was set to local rank
os.environ.pop("ID", None)
os.environ.pop("HCCL_DISTRIBUTED_BACKEND", None)


# The code underneath is taken from PyTorch `torch/distributed/distributed_c10d.py`
# the distributed backend and tensor type updates for the habana backend are done here before broadcast
def _hpu_broadcast_object_list(object_list, src=0, group=None, device=None): # type: ignore
from torch.distributed import Backend, _rank_not_in_group, broadcast, get_backend, get_rank
from torch.distributed.distributed_c10d import _object_to_tensor, _tensor_to_object

if _rank_not_in_group(group):
return

my_rank = get_rank()
# Serialize object_list elements to tensors on src rank.
if my_rank == src:
tensor_list, size_list = zip(*[_object_to_tensor(obj, device) for obj in object_list])
object_sizes_tensor = torch.cat(size_list)
else:
object_sizes_tensor = torch.empty(len(object_list), dtype=torch.long)

# Current device selection.
# To preserve backwards compatibility, ``device`` is default to ``None``
# in which case we run current logic of device selection, i.e.
# ``current_device`` is CUDA if backend is NCCL otherwise CPU device. In the
# case it is not ``None`` we move the size and object tensors to be
# broadcasted to this device.
group_backend = get_backend(group)
is_nccl_backend = group_backend == Backend.NCCL
is_hpu_backend = os.environ.get("HCCL_DISTRIBUTED_BACKEND") == "1"
if device is not None:
if is_nccl_backend and device.type != "cuda":
raise ValueError("device type must be cuda for nccl backend")
current_device = device
else:
current_device = torch.device("cpu")
if is_nccl_backend:
# See note about using torch.cuda.current_device() here in
# docstring. We cannot simply use my_rank since rank == device is
# not necessarily true.
current_device = torch.device("cuda", torch.cuda.current_device())
if is_nccl_backend:
object_sizes_tensor = object_sizes_tensor.to(current_device)

elif is_hpu_backend:
current_device = torch.device("hpu")
# Workaround: HPU doesn't support long tensors for collectives
if (object_sizes_tensor.type() == "torch.LongTensor") or (object_sizes_tensor.type() == "torch.hpu.LongTensor"):
object_sizes_tensor = object_sizes_tensor.int()
else:
print("unhandled hpu object_sizes_tensor type :: ", object_sizes_tensor.type())
object_sizes_tensor = object_sizes_tensor.to(current_device)

# Broadcast object sizes
broadcast(object_sizes_tensor, src=src, group=group)

# Concatenate and broadcast serialized object tensors
if my_rank == src:
object_tensor = torch.cat(tensor_list)
else:
object_tensor = torch.empty(
torch.sum(object_sizes_tensor).int().item(),
dtype=torch.uint8,
)

if is_nccl_backend or is_hpu_backend:
object_tensor = object_tensor.to(current_device)

broadcast(object_tensor, src=src, group=group)
# Deserialize objects using their stored sizes.
offset = 0
if my_rank != src:
for i, obj_size in enumerate(object_sizes_tensor):
obj_view = object_tensor[offset : offset + obj_size]
obj_view = obj_view.type(torch.uint8)
if obj_view.device != torch.device("cpu"):
obj_view = obj_view.cpu()
offset += obj_size
object_list[i] = _tensor_to_object(obj_view, obj_size)
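For illustration only (not code from this PR): in an HPU distributed run the override above is reached through the strategy's broadcast call. A hedged sketch, assuming a Trainer configured with HPUParallelStrategy; the hook and payload below are hypothetical:

from lightning.pytorch import LightningModule


class BroadcastExample(LightningModule):
    def on_train_epoch_end(self) -> None:
        # Only rank 0 fills the payload; the other ranks receive it via broadcast.
        payload = {"epoch": self.current_epoch} if self.trainer.global_rank == 0 else None
        # With HPUParallelStrategy this dispatches to the broadcast() override,
        # which in turn calls _hpu_broadcast_object_list defined above.
        payload = self.trainer.strategy.broadcast(payload, src=0)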
@@ -21,6 +21,7 @@
from lightning.fabric.utilities.types import _DEVICE
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.accelerators import Accelerator
from lightning.pytorch.plugins.io.hpu_plugin import HPUCheckpointIO
from lightning.pytorch.plugins.io.wrapper import _WrappingCheckpointIO
from lightning.pytorch.plugins.precision import PrecisionPlugin
from lightning.pytorch.strategies.single_device import SingleDeviceStrategy
@@ -39,7 +40,7 @@
from torch.nn import Module
from torch.optim.optimizer import Optimizer

from lightning_habana import _HPU_AVAILABLE
from lightning_habana.utils.imports import _HPU_AVAILABLE

if _HPU_AVAILABLE:
import habana_frameworks.torch.core as htcore
@@ -75,7 +76,7 @@ def checkpoint_io(self) -> CheckpointIO:

@checkpoint_io.setter
def checkpoint_io(self, io: Optional[CheckpointIO]) -> None:
self._checkpoint_io = io # type: ignore[assignment]
self._checkpoint_io = io

@property
def is_distributed(self) -> bool: