Weekly patch release v1.6.5 (#13481)
* update NGC docker (#13136)

* update docker
* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185)

* Add pull-legacy-checkpoints action
* Replace pulls with the new action and script
* Simplify

* Merge pull request #13250 from PyTorchLightning/ci/rm-base

CI: Remove simple test `ci_test-base.yml`

* Update rich requirement from !=10.15.*,<=12.0.0,>=10.2.2 to >=10.2.2,!=10.15.0.a,<13.0.0 in /requirements (#13047)

* Update rich requirement in /requirements

Updates the requirements on [rich](https://github.com/willmcgugan/rich) to permit the latest version.
- [Release notes](https://github.com/willmcgugan/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](Textualize/rich@v10.2.2...v12.4.1)

---
updated-dependencies:
- dependency-name: rich
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix torch.distributed._sharded_tensor DeprecationWarning (#13261)

* update tutorials (#13268)

* [BUG] `estimated_stepping_batches` requires distributed comms in `configure_optimizers` for `DeepSpeedStrategy` (#13350)

* Update torchmetrics requirement from <=0.7.2,>=0.4.1 to >=0.4.1,<0.9.2 in /requirements (#13275)

Update torchmetrics requirement in /requirements

Updates the requirements on [torchmetrics](https://github.com/PyTorchLightning/metrics) to permit the latest version.
- [Release notes](https://github.com/PyTorchLightning/metrics/releases)
- [Changelog](https://github.com/PyTorchLightning/metrics/blob/master/CHANGELOG.md)
- [Commits](Lightning-AI/torchmetrics@v0.4.1...v0.9.1)

---
updated-dependencies:
- dependency-name: torchmetrics
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix mypy errors for model summary utilities (#13384)

* rename org Lightning AI

* Modified python version check to accommodate for legacy version styles (#13420)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

(cherry picked from commit b332b66)

* Call `set_epoch` for distributed batch samplers (#13396)

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

(cherry picked from commit 2dd332f)

* _RICH_AVAILABLE

* _FAIRSCALE_AVAILABLE

* _BAGUA_AVAILABLE

* redefine

* chlog spaces

* CI: Fix `fatal: unsafe repository` (#13515)

* update release date

* CI: azure rename

* Restore log step during restart (#13467)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove redundant test

* Update CI setup (#13291)

* drop mamba
* use legacy GPU machines

* fix schema check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Martino Sorbaro <martinosorb@users.noreply.github.com>
10 people committed Jul 12, 2022
1 parent 74b1317 commit ff53616
Showing 72 changed files with 292 additions and 297 deletions.
9 changes: 9 additions & 0 deletions .actions/pull_legacy_checkpoints.sh
@@ -0,0 +1,9 @@
#!/bin/bash
# Run this script from the project root.
URL="https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"
mkdir -p legacy
# wget is simpler but does not work on Windows
python -c "from urllib.request import urlretrieve; urlretrieve('$URL', 'legacy/checkpoints.zip')"
ls -l legacy/
unzip -o legacy/checkpoints.zip -d legacy/
ls -l legacy/checkpoints/
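
The script's download and extract steps can also be expressed without a shell. The sketch below is illustrative only and is not part of this commit: the function name `pull_legacy_checkpoints`, its parameters, and its return value are assumptions made for the example; only the URL and the `urlretrieve` call come from the script above.

```python
import zipfile
from pathlib import Path
from typing import List
from urllib.request import urlretrieve

# URL copied from the shell script above.
LEGACY_URL = "https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"


def pull_legacy_checkpoints(url: str = LEGACY_URL, root: str = "legacy") -> List[str]:
    """Download the archive into `root`, extract it there, and return the archive's member names.

    Hypothetical helper that mirrors the steps of the shell script.
    """
    root_dir = Path(root)
    root_dir.mkdir(parents=True, exist_ok=True)  # mkdir -p legacy
    archive = root_dir / "checkpoints.zip"
    urlretrieve(url, str(archive))  # the same call the script runs via `python -c`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root_dir)  # like `unzip -o ... -d legacy/`: existing files are overwritten
        return zf.namelist()
```

Like the script, calling `pull_legacy_checkpoints()` with its defaults assumes the current working directory is the project root.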
@@ -26,7 +26,7 @@ jobs:
  - job: benchmarks
  timeoutInMinutes: "90"
  cancelTimeoutInMinutes: "2"
- pool: azure-gpus-spot
+ pool: azure-jirka-spot
  container:
  image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11"
  options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=32g"
7 changes: 2 additions & 5 deletions .azure-pipelines/gpu-tests.yml → .azure/gpu-tests.yml
@@ -29,7 +29,7 @@ jobs:
  # how much time to give 'run always even if cancelled tasks' before stopping them
  cancelTimeoutInMinutes: "2"

- pool: azure-gpus-spot
+ pool: azure-jirka-spot

  container:
  image: $(image)
@@ -69,10 +69,7 @@ jobs:
  python requirements/check-avail-extras.py
  displayName: 'Env details'
- - bash: |
- wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
- unzip -o legacy/checkpoints.zip -d legacy/
- ls -l legacy/checkpoints/
+ - bash: bash .actions/pull_legacy_checkpoints.sh
  displayName: 'Get legacy checkpoints'

  - bash: |
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions .github/workflows/ci_dockers.yml
@@ -97,7 +97,7 @@ jobs:
  UBUNTU_VERSION=${{ matrix.ubuntu_version }}
  file: dockers/base-cuda/Dockerfile
  push: false
- timeout-minutes: 75
+ timeout-minutes: 95

  build-Conda:
  runs-on: ubuntu-20.04
@@ -123,7 +123,7 @@ jobs:
  CUDA_VERSION=${{ matrix.cuda_version }}
  file: dockers/base-conda/Dockerfile
  push: false
- timeout-minutes: 75
+ timeout-minutes: 95

  build-ipu:
  runs-on: ubuntu-20.04
2 changes: 1 addition & 1 deletion .github/workflows/ci_schema.yml
@@ -21,4 +21,4 @@ jobs:
  - name: Azure Pipelines
  env:
  SCHEMA_FILE: https://raw.githubusercontent.com/microsoft/azure-pipelines-vscode/v1.204.0/service-schema.json
- run: check-jsonschema .azure-pipelines/*.yml --schemafile "$SCHEMA_FILE"
+ run: check-jsonschema .azure/*.yml --schemafile "$SCHEMA_FILE"
81 changes: 0 additions & 81 deletions .github/workflows/ci_test-base.yml

This file was deleted.

29 changes: 16 additions & 13 deletions .github/workflows/ci_test-conda.yml
@@ -31,19 +31,27 @@ jobs:
  timeout-minutes: 30
  steps:
  - name: Workaround for https://github.com/actions/checkout/issues/760
- run: git config --global --add safe.directory /__w/pytorch-lightning/pytorch-lightning
+ run: git config --global --add safe.directory /__w/lightning/lightning

  - uses: actions/checkout@v2

- - name: Update dependencies
+ - name: Update base dependencies
+ run: |
+ conda info
+ conda list
+ pip install -r requirements/test.txt
+ - name: DocTests
+ run: |
+ coverage run --source pytorch_lightning -m pytest pytorch_lightning
+ - name: Update all dependencies
  env:
  HOROVOD_BUILD_ARCH_FLAGS: "-mfma"
  HOROVOD_WITHOUT_MXNET: 1
  HOROVOD_WITHOUT_TENSORFLOW: 1
  run: |
  set -e
- conda info
- conda list
  # adjust versions according installed Torch version
  python ./requirements/adjust-versions.py requirements/extra.txt
  python ./requirements/adjust-versions.py requirements/examples.txt
@@ -55,17 +63,12 @@ jobs:
  # sanity check
  python requirements/check-avail-extras.py
- - name: Pull checkpoints from S3
- working-directory: ./legacy
- run: |
- # enter legacy and update checkpoints from S3
- curl https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip --output checkpoints.zip
- unzip -o checkpoints.zip
- ls -l checkpoints/
+ - name: Pull legacy checkpoints
+ run: bash .actions/pull_legacy_checkpoints.sh

- - name: Tests
+ - name: UnitTests
  run: |
- coverage run --source pytorch_lightning -m pytest --timeout 150 pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
+ coverage run --source pytorch_lightning -m pytest --timeout 150 tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
  - name: Upload pytest results
  uses: actions/upload-artifact@v2
21 changes: 9 additions & 12 deletions .github/workflows/ci_test-full.yml
@@ -76,27 +76,21 @@ jobs:
  restore-keys: |
  ${{ runner.os }}-pip-td${{ env.TIME_PERIOD }}-py${{ matrix.python-version }}-${{ matrix.release }}-${{ matrix.requires }}-
- - name: Pull checkpoints from S3
- working-directory: ./legacy
- run: |
- # wget is simpler but does not work on Windows
- python -c "from urllib.request import urlretrieve ; urlretrieve('https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip', 'checkpoints.zip')"
- ls -l .
- unzip -o checkpoints.zip
- ls -l checkpoints/
+ - name: Pull legacy checkpoints
+ run: bash .actions/pull_legacy_checkpoints.sh

  - name: Install dependencies
  run: |
  flag=$(python -c "print('--pre' if '${{matrix.release}}' == 'pre' else '')" 2>&1)
  url=$(python -c "print('test/cpu/torch_test.html' if '${{matrix.release}}' == 'pre' else 'cpu/torch_stable.html')" 2>&1)
  pip install -r requirements.txt --upgrade $flag --find-links "https://download.pytorch.org/whl/${url}"
- # adjust versions according installed Torch version
- python ./requirements/adjust-versions.py requirements/examples.txt
- pip install -r requirements/examples.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
  pip install -r requirements/test.txt --upgrade
  pip list
  shell: bash

+ - name: DocTests
+ run: coverage run --source pytorch_lightning -m pytest pytorch_lightning

  - name: Install extra dependencies
  run: |
  # adjust versions according installed Torch version
@@ -132,13 +126,16 @@ jobs:
  run: |
  python requirements/check-avail-extras.py
- - name: Tests
+ - name: UnitTests
  run: |
  # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
  coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml
  - name: Examples
  run: |
+ # adjust versions according installed Torch version
+ python ./requirements/adjust-versions.py requirements/examples.txt
+ pip install -r requirements/examples.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
  python -m pytest pl_examples -v --durations=10
  - name: Upload pytest results
4 changes: 2 additions & 2 deletions .github/workflows/events-nightly.yml
@@ -146,7 +146,7 @@ jobs:
  file: dockers/base-cuda/Dockerfile
  push: ${{ env.PUSH_TO_HUB }}
  tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
- timeout-minutes: 85
+ timeout-minutes: 95

  # report failure to Slack
  - name: Slack notification
@@ -197,7 +197,7 @@ jobs:
  file: dockers/base-conda/Dockerfile
  push: ${{ env.PUSH_TO_HUB }}
  tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
- timeout-minutes: 85
+ timeout-minutes: 95

  # report failure to Slack
  - name: Slack notification
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,16 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).


+ ## [1.6.5] - 2022-07-12
+
+ ### Fixed
+
+ - Fixed `estimated_stepping_batches` requiring distributed comms in `configure_optimizers` for the `DeepSpeedStrategy` ([#13350](https://github.com/PyTorchLightning/pytorch-lightning/pull/13350))
+ - Fixed bug with Python version check that prevented use with development versions of Python ([#13420](https://github.com/PyTorchLightning/pytorch-lightning/pull/13420))
+ - The loops now call `.set_epoch()` also on batch samplers if the dataloader has one wrapped in a distributed sampler ([#13396](https://github.com/PyTorchLightning/pytorch-lightning/pull/13396))
+ - Fixed the restoration of log step during restart ([#13467](https://github.com/PyTorchLightning/pytorch-lightning/pull/13467))
+
+
  ## [1.6.4] - 2022-06-01

  ### Added
2 changes: 1 addition & 1 deletion _notebooks
7 changes: 3 additions & 4 deletions dockers/base-conda/Dockerfile
@@ -72,16 +72,15 @@ COPY environment.yml environment.yml

  # conda init
  RUN conda update -n base -c defaults conda && \
- conda install mamba -n base -c conda-forge && \
- mamba create -y --name $CONDA_ENV python=${PYTHON_VERSION} pytorch=${PYTORCH_VERSION} torchvision torchtext cudatoolkit=${CUDA_VERSION} -c nvidia -c pytorch -c pytorch-test -c pytorch-nightly && \
+ conda create -y --name $CONDA_ENV python=${PYTHON_VERSION} pytorch=${PYTORCH_VERSION} torchvision torchtext cudatoolkit=${CUDA_VERSION} -c nvidia -c pytorch -c pytorch-test -c pytorch-nightly && \
  conda init bash && \
  # NOTE: this requires that the channel is presented in the yaml before packages \
  printf "import re;\nfname = 'environment.yml';\nreq = open(fname).read();\nfor n in ['python', 'pytorch', 'torchtext', 'torchvision']:\n req = re.sub(rf'- {n}[>=]+', f'# - {n}=', req);\nopen(fname, 'w').write(req)" > prune.py && \
  python prune.py && \
  rm prune.py && \
  cat environment.yml && \
- mamba env update --name $CONDA_ENV --file environment.yml && \
- mamba clean -ya && \
+ conda env update --name $CONDA_ENV --file environment.yml && \
+ conda clean -ya && \
  rm environment.yml

  ENV \
15 changes: 7 additions & 8 deletions dockers/nvidia/Dockerfile
@@ -12,7 +12,7 @@
  # See the License for the specific language governing permissions and
  # limitations under the License.

- ARG PYTORCH_VERSION=21.11
+ ARG PYTORCH_VERSION=22.04

  # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes
  FROM nvcr.io/nvidia/pytorch:${PYTORCH_VERSION}-py3
@@ -37,20 +37,19 @@ RUN \
  cd .. ; \
  fi && \
  # save the examples
- mv pytorch-lightning/_notebooks notebooks && \
+ mv pytorch-lightning/_notebooks/.notebooks/ notebooks && \
  mv pytorch-lightning/pl_examples . && \

  # Installations \
  pip install -q fire && \
  python ./pytorch-lightning/.actions/assistant.py requirements_prune_pkgs horovod --req_files ./pytorch-lightning/requirements/extra.txt && \
- pip install "Pillow>=8.2, !=8.3.0" "cryptography>=3.4" "py>=1.10" --no-cache-dir --upgrade-strategy only-if-needed && \
- pip install -r ./pytorch-lightning/requirements/extra.txt --no-cache-dir --upgrade-strategy only-if-needed && \
- pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir --upgrade-strategy only-if-needed && \
- pip install ./pytorch-lightning --no-cache-dir && \
+ pip install "Pillow>=8.2, !=8.3.0" "cryptography>=3.4" "py>=1.10" --no-cache-dir && \
+ pip install ./pytorch-lightning["extra","loggers","strategies"] --no-cache-dir && \
+ pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir && \
  rm -rf pytorch-lightning && \
+ pip install jupyterlab[all] -U && \
  pip list

- RUN pip install jupyterlab[all] -U

  RUN pip install lightning-grid -U && \
  pip install "py>=1.10" "protobuf>=3.15.6" --upgrade-strategy only-if-needed

2 changes: 1 addition & 1 deletion dockers/release/Dockerfile
@@ -36,7 +36,7 @@ RUN \
  mv pytorch-lightning-*/ pytorch-lightning ; \
  rm *.zip ; \
  fi && \
- pip install ./pytorch-lightning["extra"] --no-cache-dir && \
+ pip install ./pytorch-lightning["extra","loggers","strategies"] --no-cache-dir && \
  rm -rf pytorch-lightning

  RUN python --version && \
5 changes: 1 addition & 4 deletions dockers/tpu-tests/Dockerfile
@@ -22,10 +22,7 @@ LABEL maintainer="PyTorchLightning <https://github.com/PyTorchLightning>"
  COPY ./ ./pytorch-lightning/

  # Pull the legacy checkpoints
- RUN cd pytorch-lightning && \
- wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/ && \
- unzip -o legacy/checkpoints.zip -d legacy/ && \
- ls -l legacy/checkpoints/
+ RUN cd pytorch-lightning && bash .actions/pull_legacy_checkpoints.sh

  RUN \
  pip install -q fire && \
3 changes: 1 addition & 2 deletions legacy/README.md
@@ -7,8 +7,7 @@ At this moment we focus on ability running old checkpoints, so the flow here is
  If you want to pull all saved version-checkpoints for local testing/development, call

  ```bash
- wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip
- unzip -o checkpoints.zip
+ bash .actions/pull_legacy_checkpoints.sh
  ```

  To back populate collection with past version you can use following bash:
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/mnist_datamodule.py
@@ -36,7 +36,7 @@ class _MNIST(Dataset):
  """Carbon copy of ``tests.helpers.datasets.MNIST``.
  We cannot import the tests as they are not distributed with the package.
- See https://github.com/PyTorchLightning/pytorch-lightning/pull/7614#discussion_r671183652 for more context.
+ See https://github.com/Lightning-AI/lightning/pull/7614#discussion_r671183652 for more context.
  """

  RESOURCES = (
2 changes: 1 addition & 1 deletion pl_examples/domain_templates/reinforce_learn_ppo.py
@@ -125,7 +125,7 @@ def get_log_prob(self, pi: Normal, actions: torch.Tensor):


  class ExperienceSourceDataset(IterableDataset):
- """Implementation from PyTorch Lightning Bolts: https://github.com/PyTorchLightning/lightning-
+ """Implementation from PyTorch Lightning Bolts: https://github.com/Lightning-AI/lightning-
  bolts/blob/master/pl_bolts/datamodules/experience_source.py.
  Basic experience source dataset. Takes a generate_batch function that returns an iterator. The logic for the
