Weekly patch release v1.6.5 (#13481)

* update NGC docker (#13136)
* update docker
* Apply suggestions from code review
  Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
  Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185)
* Add pull-legacy-checkpoints action
* Replace pulls with the new action and script
* Simplify
* Merge pull request #13250 from PyTorchLightning/ci/rm-base
  CI: Remove simple test `ci_test-base.yml`
* Update rich requirement from !=10.15.*,<=12.0.0,>=10.2.2 to >=10.2.2,!=10.15.0.a,<13.0.0 in /requirements (#13047)
* Update rich requirement in /requirements
  Updates the requirements on [rich](https://github.com/willmcgugan/rich) to permit the latest version.
  - [Release notes](https://github.com/willmcgugan/rich/releases)
  - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
  - [Commits](Textualize/rich@v10.2.2...v12.4.1)
  ---
  updated-dependencies:
  - dependency-name: rich
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Fix torch.distributed._sharded_tensor DeprecationWarning (#13261)
* update tutorials (#13268)
* [BUG] `estimated_stepping_batches` requires distributed comms in `configure_optimizers` for `DeepSpeedStrategy` (#13350)
* Update torchmetrics requirement from <=0.7.2,>=0.4.1 to >=0.4.1,<0.9.2 in /requirements (#13275)
  Update torchmetrics requirement in /requirements
  Updates the requirements on [torchmetrics](https://github.com/PyTorchLightning/metrics) to permit the latest version.
  - [Release notes](https://github.com/PyTorchLightning/metrics/releases)
  - [Changelog](https://github.com/PyTorchLightning/metrics/blob/master/CHANGELOG.md)
  - [Commits](Lightning-AI/torchmetrics@v0.4.1...v0.9.1)
  ---
  updated-dependencies:
  - dependency-name: torchmetrics
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix mypy errors for model summary utilities (#13384)
* rename org Lightning AI
* Modified python version check to accommodate for legacy version styles (#13420)
  Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
  (cherry picked from commit b332b66)
* Call `set_epoch` for distributed batch samplers (#13396)
  Co-authored-by: Jirka <jirka.borovec@seznam.cz>
  Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
  (cherry picked from commit 2dd332f)
* _RICH_AVAILABLE
* _FAIRSCALE_AVAILABLE
* _BAGUA_AVAILABLE
* redefine
* chlog spaces
* CI: Fix `fatal: unsafe repository` (#13515)
* update release date
* CI: azure rename
* Restore log step during restart (#13467)
  Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove redundant test
* Update CI setup (#13291)
* drop mamba
* use legacy GPU machines
* fix schema check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Martino Sorbaro <martinosorb@users.noreply.github.com>
Lightning-AI · Jul 12, 2022 · ff53616
1 parent 74b1317
commit ff53616
Showing 72 changed files with 292 additions and 297 deletions.
diff --git a/.actions/pull_legacy_checkpoints.sh b/.actions/pull_legacy_checkpoints.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+# Run this script from the project root.
+URL="https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"
+mkdir -p legacy
+# wget is simpler but does not work on Windows
+python -c "from urllib.request import urlretrieve; urlretrieve('$URL', 'legacy/checkpoints.zip')"
+ls -l legacy/
+unzip -o legacy/checkpoints.zip -d legacy/
+ls -l legacy/checkpoints/
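
For readers who want the same steps without bash, the script above can be approximated in pure Python. This is a minimal sketch, not part of the commit: the URL comes from the script, while the function name `pull_legacy_checkpoints` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List
from urllib.request import urlretrieve

# URL copied from the shell script; everything else here is illustrative.
LEGACY_URL = "https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"


def pull_legacy_checkpoints(url: str = LEGACY_URL, target_dir: str = "legacy") -> List[str]:
    """Download the checkpoint archive into ``target_dir`` and unpack it there.

    Returns the names of the archive members that were extracted.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p legacy`
    archive = target / "checkpoints.zip"
    urlretrieve(url, str(archive))  # the same call the script runs via `python -c`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # like `unzip -o ... -d legacy/`: existing files are overwritten
        return zf.namelist()
```

Calling `pull_legacy_checkpoints()` from the project root should leave the files under `legacy/checkpoints/`, assuming the archive keeps its current layout.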
diff --git a/.azure-pipelines/gpu-benchmark.yml → .azure/gpu-benchmark.yml b/.azure-pipelines/gpu-benchmark.yml → .azure/gpu-benchmark.yml
@@ -26,7 +26,7 @@ jobs:
   - job: benchmarks
     timeoutInMinutes: "90"
     cancelTimeoutInMinutes: "2"
-    pool: azure-gpus-spot
+    pool: azure-jirka-spot
     container:
       image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.11"
       options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=32g"

diff --git a/.azure-pipelines/gpu-tests.yml → .azure/gpu-tests.yml b/.azure-pipelines/gpu-tests.yml → .azure/gpu-tests.yml
@@ -29,7 +29,7 @@ jobs:
     # how much time to give 'run always even if cancelled tasks' before stopping them
     cancelTimeoutInMinutes: "2"
 
-    pool: azure-gpus-spot
+    pool: azure-jirka-spot
 
     container:
       image: $(image)
@@ -69,10 +69,7 @@ jobs:
         python requirements/check-avail-extras.py
       displayName: 'Env details'
 
-    - bash: |
-        wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
-        unzip -o legacy/checkpoints.zip -d legacy/
-        ls -l legacy/checkpoints/
+    - bash: bash .actions/pull_legacy_checkpoints.sh
       displayName: 'Get legacy checkpoints'
 
     - bash: |

diff --git a/.azure-pipelines/hpu-tests.yml → .azure/hpu-tests.yml b/.azure-pipelines/hpu-tests.yml → .azure/hpu-tests.yml
diff --git a/.azure-pipelines/ipu-tests.yml → .azure/ipu-tests.yml b/.azure-pipelines/ipu-tests.yml → .azure/ipu-tests.yml
diff --git a/.github/workflows/ci_dockers.yml b/.github/workflows/ci_dockers.yml
@@ -97,7 +97,7 @@ jobs:
             UBUNTU_VERSION=${{ matrix.ubuntu_version }}
           file: dockers/base-cuda/Dockerfile
           push: false
-        timeout-minutes: 75
+        timeout-minutes: 95
 
   build-Conda:
     runs-on: ubuntu-20.04
@@ -123,7 +123,7 @@ jobs:
             CUDA_VERSION=${{ matrix.cuda_version }}
           file: dockers/base-conda/Dockerfile
           push: false
-        timeout-minutes: 75
+        timeout-minutes: 95
 
   build-ipu:
     runs-on: ubuntu-20.04

diff --git a/.github/workflows/ci_schema.yml b/.github/workflows/ci_schema.yml
@@ -21,4 +21,4 @@ jobs:
       - name: Azure Pipelines
         env:
           SCHEMA_FILE: https://raw.githubusercontent.com/microsoft/azure-pipelines-vscode/v1.204.0/service-schema.json
-        run: check-jsonschema .azure-pipelines/*.yml --schemafile "$SCHEMA_FILE"
+        run: check-jsonschema .azure/*.yml --schemafile "$SCHEMA_FILE"
diff --git a/.github/workflows/ci_test-base.yml b/.github/workflows/ci_test-base.yml
diff --git a/.github/workflows/ci_test-conda.yml b/.github/workflows/ci_test-conda.yml
@@ -31,19 +31,27 @@ jobs:
     timeout-minutes: 30
     steps:
     - name: Workaround for https://github.com/actions/checkout/issues/760
-      run: git config --global --add safe.directory /__w/pytorch-lightning/pytorch-lightning
+      run: git config --global --add safe.directory /__w/lightning/lightning
 
     - uses: actions/checkout@v2
 
-    - name: Update dependencies
+    - name: Update base dependencies
+      run: |
+        conda info
+        conda list
+        pip install -r requirements/test.txt
+
+    - name: DocTests
+      run: |
+        coverage run --source pytorch_lightning -m pytest pytorch_lightning
+
+    - name: Update all dependencies
       env:
         HOROVOD_BUILD_ARCH_FLAGS: "-mfma"
         HOROVOD_WITHOUT_MXNET: 1
         HOROVOD_WITHOUT_TENSORFLOW: 1
       run: |
         set -e
-        conda info
-        conda list
         # adjust versions according installed Torch version
         python ./requirements/adjust-versions.py requirements/extra.txt
         python ./requirements/adjust-versions.py requirements/examples.txt
@@ -55,17 +63,12 @@ jobs:
         # sanity check
         python requirements/check-avail-extras.py
 
-    - name: Pull checkpoints from S3
-      working-directory: ./legacy
-      run: |
-        # enter legacy and update checkpoints from S3
-        curl https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip --output checkpoints.zip
-        unzip -o checkpoints.zip
-        ls -l checkpoints/
+    - name: Pull legacy checkpoints
+      run: bash .actions/pull_legacy_checkpoints.sh
 
-    - name: Tests
+    - name: UnitTests
       run: |
-        coverage run --source pytorch_lightning -m pytest --timeout 150 pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
+        coverage run --source pytorch_lightning -m pytest --timeout 150 tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
 
     - name: Upload pytest results
       uses: actions/upload-artifact@v2

diff --git a/.github/workflows/ci_test-full.yml b/.github/workflows/ci_test-full.yml
@@ -76,27 +76,21 @@ jobs:
         restore-keys: |
           ${{ runner.os }}-pip-td${{ env.TIME_PERIOD }}-py${{ matrix.python-version }}-${{ matrix.release }}-${{ matrix.requires }}-
 
-    - name: Pull checkpoints from S3
-      working-directory: ./legacy
-      run: |
-        # wget is simpler but does not work on Windows
-        python -c "from urllib.request import urlretrieve ; urlretrieve('https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip', 'checkpoints.zip')"
-        ls -l .
-        unzip -o checkpoints.zip
-        ls -l checkpoints/
+    - name: Pull legacy checkpoints
+      run: bash .actions/pull_legacy_checkpoints.sh
 
     - name: Install dependencies
       run: |
         flag=$(python -c "print('--pre' if '${{matrix.release}}' == 'pre' else '')" 2>&1)
         url=$(python -c "print('test/cpu/torch_test.html' if '${{matrix.release}}' == 'pre' else 'cpu/torch_stable.html')" 2>&1)
         pip install -r requirements.txt --upgrade $flag --find-links "https://download.pytorch.org/whl/${url}"
-        # adjust versions according installed Torch version
-        python ./requirements/adjust-versions.py requirements/examples.txt
-        pip install -r requirements/examples.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
         pip install -r requirements/test.txt --upgrade
         pip list
       shell: bash
 
+    - name: DocTests
+      run: coverage run --source pytorch_lightning -m pytest pytorch_lightning
+
     - name: Install extra dependencies
       run: |
         # adjust versions according installed Torch version
@@ -132,13 +126,16 @@ jobs:
       run: |
         python requirements/check-avail-extras.py
 
-    - name: Tests
+    - name: UnitTests
       run: |
         # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
         coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml
 
     - name: Examples
       run: |
+        # adjust versions according installed Torch version
+        python ./requirements/adjust-versions.py requirements/examples.txt
+        pip install -r requirements/examples.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
         python -m pytest pl_examples -v --durations=10
 
     - name: Upload pytest results

diff --git a/.github/workflows/events-nightly.yml b/.github/workflows/events-nightly.yml
@@ -146,7 +146,7 @@ jobs:
           file: dockers/base-cuda/Dockerfile
           push: ${{ env.PUSH_TO_HUB }}
           tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
-        timeout-minutes: 85
+        timeout-minutes: 95
 
       # report failure to Slack
       - name: Slack notification
@@ -197,7 +197,7 @@ jobs:
           file: dockers/base-conda/Dockerfile
           push: ${{ env.PUSH_TO_HUB }}
           tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
-        timeout-minutes: 85
+        timeout-minutes: 95
 
       # report failure to Slack
       - name: Slack notification

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 
+## [1.6.5] - 2022-07-12
+
+### Fixed
+
+- Fixed `estimated_stepping_batches` requiring distributed comms in `configure_optimizers` for the `DeepSpeedStrategy` ([#13350](https://github.com/PyTorchLightning/pytorch-lightning/pull/13350))
+- Fixed bug with Python version check that prevented use with development versions of Python ([#13420](https://github.com/PyTorchLightning/pytorch-lightning/pull/13420))
+- The loops now call `.set_epoch()` also on batch samplers if the dataloader has one wrapped in a distributed sampler ([#13396](https://github.com/PyTorchLightning/pytorch-lightning/pull/13396))
+- Fixed the restoration of log step during restart ([#13467](https://github.com/PyTorchLightning/pytorch-lightning/pull/13467))
+
+
 ## [1.6.4] - 2022-06-01
 
 ### Added

diff --git a/_notebooks b/_notebooks
diff --git a/dockers/base-conda/Dockerfile b/dockers/base-conda/Dockerfile
@@ -72,16 +72,15 @@ COPY environment.yml environment.yml
 
 # conda init
 RUN conda update -n base -c defaults conda && \
-    conda install mamba -n base -c conda-forge && \
-    mamba create -y --name $CONDA_ENV python=${PYTHON_VERSION} pytorch=${PYTORCH_VERSION} torchvision torchtext cudatoolkit=${CUDA_VERSION} -c nvidia -c pytorch -c pytorch-test -c pytorch-nightly && \
+    conda create -y --name $CONDA_ENV python=${PYTHON_VERSION} pytorch=${PYTORCH_VERSION} torchvision torchtext cudatoolkit=${CUDA_VERSION} -c nvidia -c pytorch -c pytorch-test -c pytorch-nightly && \
     conda init bash && \
     # NOTE: this requires that the channel is presented in the yaml before packages \
     printf "import re;\nfname = 'environment.yml';\nreq = open(fname).read();\nfor n in ['python', 'pytorch', 'torchtext', 'torchvision']:\n    req = re.sub(rf'- {n}[>=]+', f'# - {n}=', req);\nopen(fname, 'w').write(req)" > prune.py && \
     python prune.py && \
     rm prune.py && \
     cat environment.yml && \
-    mamba env update --name $CONDA_ENV --file environment.yml && \
-    mamba clean -ya && \
+    conda env update --name $CONDA_ENV --file environment.yml && \
+    conda clean -ya && \
     rm environment.yml
 
 ENV \
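
The `printf ... > prune.py` one-liner in the hunk above is hard to read in its escaped form. Unrolled, it comments out the packages that `conda create` has already pinned so that `conda env update` does not reinstall them. The sketch below applies the same regular expression to a string instead of rewriting `environment.yml` in place; the function name `prune_requirements` and the sample input are assumptions for illustration.

```python
import re
from typing import Iterable

# Package names taken from the Dockerfile's inline script.
PINNED_BY_CONDA_CREATE = ("python", "pytorch", "torchtext", "torchvision")


def prune_requirements(text: str, names: Iterable[str] = PINNED_BY_CONDA_CREATE) -> str:
    """Comment out ``- <name>>=...`` entries for the given package names."""
    for name in names:
        # Same substitution as the Dockerfile: "- pkg>=1.0" becomes "# - pkg=1.0".
        text = re.sub(rf"- {name}[>=]+", f"# - {name}=", text)
    return text


env = "dependencies:\n  - python>=3.7\n  - pytorch>=1.8\n  - numpy>=1.17.2\n"
print(prune_requirements(env))
```

With this input, the `python` and `pytorch` entries become comments and the `numpy` entry is left unchanged.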

diff --git a/dockers/nvidia/Dockerfile b/dockers/nvidia/Dockerfile
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-ARG PYTORCH_VERSION=21.11
+ARG PYTORCH_VERSION=22.04
 
 # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes
 FROM nvcr.io/nvidia/pytorch:${PYTORCH_VERSION}-py3
@@ -37,20 +37,19 @@ RUN \
         cd .. ; \
     fi && \
 # save the examples
-    mv pytorch-lightning/_notebooks notebooks && \
+    mv pytorch-lightning/_notebooks/.notebooks/ notebooks && \
     mv pytorch-lightning/pl_examples . && \
 
 # Installations \
     pip install -q fire && \
-    python ./pytorch-lightning/.actions/assistant.py requirements_prune_pkgs horovod --req_files ./pytorch-lightning/requirements/extra.txt  && \
-    pip install "Pillow>=8.2, !=8.3.0" "cryptography>=3.4" "py>=1.10" --no-cache-dir --upgrade-strategy only-if-needed && \
-    pip install -r ./pytorch-lightning/requirements/extra.txt --no-cache-dir --upgrade-strategy only-if-needed && \
-    pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir --upgrade-strategy only-if-needed && \
-    pip install ./pytorch-lightning --no-cache-dir && \
+    pip install "Pillow>=8.2, !=8.3.0" "cryptography>=3.4" "py>=1.10" --no-cache-dir && \
+    pip install ./pytorch-lightning["extra","loggers","strategies"] --no-cache-dir && \
+    pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir && \
     rm -rf pytorch-lightning && \
-    pip install jupyterlab[all] -U && \
     pip list
 
+RUN pip install jupyterlab[all] -U
+
 RUN pip install lightning-grid -U && \
     pip install "py>=1.10" "protobuf>=3.15.6" --upgrade-strategy only-if-needed
 

diff --git a/dockers/release/Dockerfile b/dockers/release/Dockerfile
@@ -36,7 +36,7 @@ RUN \
         mv pytorch-lightning-*/ pytorch-lightning ; \
         rm *.zip ; \
     fi && \
-    pip install ./pytorch-lightning["extra"] --no-cache-dir && \
+    pip install ./pytorch-lightning["extra","loggers","strategies"] --no-cache-dir && \
     rm -rf pytorch-lightning
 
 RUN python --version && \

diff --git a/dockers/tpu-tests/Dockerfile b/dockers/tpu-tests/Dockerfile
@@ -22,10 +22,7 @@ LABEL maintainer="PyTorchLightning <https://github.com/PyTorchLightning>"
 COPY ./ ./pytorch-lightning/
 
 # Pull the legacy checkpoints
-RUN cd pytorch-lightning && \
-    wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/ && \
-    unzip -o legacy/checkpoints.zip -d legacy/ && \
-    ls -l legacy/checkpoints/
+RUN cd pytorch-lightning && bash .actions/pull_legacy_checkpoints.sh
 
 RUN \
     pip install -q fire && \

diff --git a/legacy/README.md b/legacy/README.md
@@ -7,8 +7,7 @@ At this moment we focus on ability running old checkpoints, so the flow here is
 If you want to pull all saved version-checkpoints for local testing/development, call
 
 ```bash
-wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip
-unzip -o checkpoints.zip
+bash .actions/pull_legacy_checkpoints.sh
 ```
 
 To back populate collection with past version you can use following bash:

diff --git a/pl_examples/basic_examples/mnist_datamodule.py b/pl_examples/basic_examples/mnist_datamodule.py
@@ -36,7 +36,7 @@ class _MNIST(Dataset):
     """Carbon copy of ``tests.helpers.datasets.MNIST``.
 
     We cannot import the tests as they are not distributed with the package.
-    See https://github.com/PyTorchLightning/pytorch-lightning/pull/7614#discussion_r671183652 for more context.
+    See https://github.com/Lightning-AI/lightning/pull/7614#discussion_r671183652 for more context.
     """
 
     RESOURCES = (

diff --git a/pl_examples/domain_templates/reinforce_learn_ppo.py b/pl_examples/domain_templates/reinforce_learn_ppo.py
@@ -125,7 +125,7 @@ def get_log_prob(self, pi: Normal, actions: torch.Tensor):
 
 
 class ExperienceSourceDataset(IterableDataset):
-    """Implementation from PyTorch Lightning Bolts: https://github.com/PyTorchLightning/lightning-
+    """Implementation from PyTorch Lightning Bolts: https://github.com/Lightning-AI/lightning-
     bolts/blob/master/pl_bolts/datamodules/experience_source.py.
 
     Basic experience source dataset. Takes a generate_batch function that returns an iterator. The logic for the