From 0711711a3186c643dde056faece5b905caeecedd Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Thu, 23 Jun 2022 14:48:34 -0700 Subject: [PATCH 01/17] remove registry references for 1.7, update closed compound word version of `finetuning` to hyphenated compound version `fine-tuning` as that form remains the dominant version for now --- .../finetuning-scheduler/.meta.yml | 11 +- .../finetuning-scheduler.py | 174 ++++++++---------- .../finetuning-scheduler/logo_fts.png | Bin 7798 -> 7586 bytes 3 files changed, 82 insertions(+), 103 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/.meta.yml b/lightning_examples/finetuning-scheduler/.meta.yml index 9cfa148fe..02f37bf8e 100644 --- a/lightning_examples/finetuning-scheduler/.meta.yml +++ b/lightning_examples/finetuning-scheduler/.meta.yml @@ -1,20 +1,19 @@ -title: Finetuning Scheduler +title: Fine-Tuning Scheduler author: "[Dan Dale](https://github.com/speediedan)" created: 2021-11-29 updated: 2022-06-10 license: CC BY-SA build: 0 tags: - - Finetuning + - Fine-Tuning description: | - This notebook introduces the [Finetuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) extension - and demonstrates the use of it to finetune a small foundational model on the + This notebook introduces the [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) extension + and demonstrates the use of it to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/) with iterative early-stopping defined according to a user-specified schedule. It uses Hugging Face's ``datasets`` and ``transformers`` libraries to retrieve the relevant benchmark data and foundational model weights. The required dependencies are installed via the finetuning-scheduler ``[examples]`` extra. requirements: - - finetuning-scheduler[examples] - - hydra-core>=1.1.0 + - finetuning-scheduler[examples]>=0.1.8 accelerator: - GPU diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 27835265a..08ef55ea1 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -1,35 +1,35 @@ # %% [markdown] -# ## Scheduled Finetuning with the Finetuning Scheduler Extension +# ## Scheduled Fine-Tuning with the Fine-Tuning Scheduler Extension # -# ![Finetuning Scheduler logo](logo_fts.png){height="58px" width="401px"} +# ![Fine-Tuning Scheduler logo](logo_fts.png){height="55px" width="401px"} # -# The [Finetuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) extension accelerates and enhances model experimentation with flexible finetuning schedules. +# The [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) extension accelerates and enhances model experimentation with flexible fine-tuning schedules. # # Training with the extension is simple and confers a host of benefits: # -# - it dramatically increases finetuning flexibility +# - it dramatically increases fine-tuning flexibility # - expedites and facilitates exploration of model tuning dynamics -# - enables marginal performance improvements of finetuned models +# - enables marginal performance improvements of fine-tuned models # # Setup is straightforward, just install from PyPI! 
Since this notebook-based example requires a few additional packages (e.g. # ``transformers``, ``sentencepiece``), we installed the ``finetuning-scheduler`` package with the ``[examples]`` extra above. # Once the ``finetuning-scheduler`` package is installed, the [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) callback is available for use with PyTorch Lightning. -# For additional installation options, please see the Finetuning Scheduler [README](https://github.com/speediedan/finetuning-scheduler/blob/main/README.md). +# For additional installation options, please see the Fine-Tuning Scheduler [README](https://github.com/speediedan/finetuning-scheduler/blob/main/README.md). # # # #
# -# Fundamentally, [Finetuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) enables -# scheduled, multi-phase, finetuning of foundational models. Gradual unfreezing (i.e. thawing) can help maximize +# Fundamentally, [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) enables +# scheduled, multi-phase, fine-tuning of foundational models. Gradual unfreezing (i.e. thawing) can help maximize # foundational model knowledge retention while allowing (typically upper layers of) the model to # optimally adapt to new tasks during transfer learning [1, 2, 3](#f1) # #
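# To make the idea of gradual unfreezing concrete, the sketch below shows what "thawing" looks like in plain
# PyTorch terms (the module names are illustrative and this is not the extension's internal implementation):
# early phases train only the task head while later phases progressively re-enable deeper blocks.
#
# ```python
# # phase 0: freeze the entire foundational model...
# for param in model.parameters():
#     param.requires_grad = False
# # ...except the task head
# for param in model.classifier.parameters():
#     param.requires_grad = True
#
# # a later phase might additionally thaw the top encoder block, e.g.:
# # for param in model.encoder.layer[-1].parameters():
# #     param.requires_grad = True
# ```
#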
# # The [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) callback orchestrates the gradual unfreezing -# of models via a finetuning schedule that is either implicitly generated (the default) or explicitly provided by the user -# (more computationally efficient). Finetuning phase transitions are driven by +# of models via a fine-tuning schedule that is either implicitly generated (the default) or explicitly provided by the user +# (more computationally efficient). Fine-tuning phase transitions are driven by # [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) # criteria (a multi-phase extension of ``EarlyStopping`` packaged with FinetuningScheduler), user-specified epoch transitions or a composition of the two (the default mode). # A [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) training session completes when the @@ -44,8 +44,8 @@ # #
# -# If no finetuning schedule is provided by the user, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will generate a -# [default schedule](#The-Default-Finetuning-Schedule) and proceed to finetune according to the generated schedule, +# If no fine-tuning schedule is provided by the user, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will generate a +# [default schedule](#The-Default-Finetuning-Schedule) and proceed to fine-tune according to the generated schedule, # using default [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) and [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) callbacks with ``monitor=val_loss``. # #
@@ -57,16 +57,16 @@ # ``` # %% [markdown] -# ## The Default Finetuning Schedule +# ## The Default Fine-Tuning Schedule # -# Schedule definition is facilitated via the [gen_ft_schedule](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.ScheduleImplMixin.gen_ft_schedule) method which dumps a default finetuning schedule (by default using a naive, 2-parameters per level heuristic) which can be adjusted as -# desired by the user and/or subsequently passed to the callback. Using the default/implicitly generated schedule will likely be less computationally efficient than a user-defined finetuning schedule but is useful for exploring a model's finetuning behavior and can serve as a good baseline for subsequent explicit schedule refinement. +# Schedule definition is facilitated via the [gen_ft_schedule](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.ScheduleImplMixin.gen_ft_schedule) method which dumps a default fine-tuning schedule (by default using a naive, 2-parameters per level heuristic) which can be adjusted as +# desired by the user and/or subsequently passed to the callback. Using the default/implicitly generated schedule will likely be less computationally efficient than a user-defined fine-tuning schedule but is useful for exploring a model's fine-tuning behavior and can serve as a good baseline for subsequent explicit schedule refinement. # While the current version of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) only supports single optimizer and (optional) lr_scheduler configurations, per-phase maximum learning rates can be set as demonstrated in the next section. # %% [markdown] -# ## Specifying a Finetuning Schedule +# ## Specifying a Fine-Tuning Schedule # -# To specify a finetuning schedule, it's convenient to first generate the default schedule and then alter the thawed/unfrozen parameter groups associated with each finetuning phase as desired. Finetuning phases are zero-indexed and executed in ascending order. +# To specify a fine-tuning schedule, it's convenient to first generate the default schedule and then alter the thawed/unfrozen parameter groups associated with each fine-tuning phase as desired. Fine-tuning phases are zero-indexed and executed in ascending order. # # 1. First, generate the default schedule to ``Trainer.log_dir``. It will be named after your # ``LightningModule`` subclass with the suffix ``_ft_schedule.yaml``. @@ -79,9 +79,9 @@ # # 2. Alter the schedule as desired. # -# ![side-by-side-yaml](side_by_side_yaml.png){height="327px" width="800px"} +# ![side_by_side_yaml](side_by_side_yaml.png){height="327px" width="800px"} # -# 3. Once the finetuning schedule has been altered as desired, pass it to +# 3. Once the fine-tuning schedule has been altered as desired, pass it to # [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to commence scheduled training: # # ```python @@ -96,7 +96,7 @@ # # # By default, [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) and epoch-driven -# transition criteria are composed. 
If a ``max_transition_epoch`` is specified for a given phase, the next finetuning phase will begin at that epoch unless [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) criteria are met first. +# transition criteria are composed. If a ``max_transition_epoch`` is specified for a given phase, the next fine-tuning phase will begin at that epoch unless [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) criteria are met first. # If [FinetuningScheduler.epoch_transitions_only](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler.params.epoch_transitions_only) is ``True``, [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) will not be used # and transitions will be exclusively epoch-driven. # @@ -105,19 +105,19 @@ # # **Tip:** Use of regex expressions can be convenient for specifying more complex schedules. Also, a per-phase base maximum lr can be specified: # -# ![emphasized-yaml](emphasized_yaml.png){height="380px" width="800px"} +# ![emphasized_yaml](emphasized_yaml.png){height="380px" width="800px"} # # # # # -# The end-to-end example in this notebook ([Scheduled Finetuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to finetune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). -# Please see the [official Finetuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#scheduled-finetuning-superglue) using the LightningCLI. +# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). +# Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#scheduled-finetuning-superglue) using the LightningCLI. 
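#
# As a brief illustration of the transition criteria discussed above, here is a minimal sketch of a schedule that
# combines a per-phase ``max_transition_epoch`` bound with a per-phase maximum ``lr`` (the parameter name patterns,
# file name and values below are hypothetical, not taken from the model used later in this notebook):
#
# ```python
# ft_schedule_yaml = """
# 0:
#   params:                    # phase 0: thaw only the task head
#   - model.classifier.*
#   max_transition_epoch: 3    # advance to phase 1 no later than epoch 3
# 1:
#   params:                    # phase 1: additionally thaw the top encoder layer
#   - model.encoder.layer.11.*
#   lr: 1.0e-06                # optional per-phase base maximum lr
# """
# with open("explicit_schedule.yaml", "w") as f:
#     f.write(ft_schedule_yaml)
#
# # composed mode (the default): FTSEarlyStopping criteria or the epoch bound, whichever is met first
# trainer = Trainer(callbacks=[FinetuningScheduler(ft_schedule="explicit_schedule.yaml")])
# # passing ``epoch_transitions_only=True`` instead would drive transitions exclusively by the epoch bounds
# ```
#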
# %% [markdown] -# ## Resuming Scheduled Finetuning Training Sessions +# ## Resuming Scheduled Fine-Tuning Training Sessions # -# Resumption of scheduled finetuning training is identical to the continuation of +# Resumption of scheduled fine-tuning training is identical to the continuation of # [other training sessions](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html) with the caveat that the provided checkpoint must have been saved by a [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) session. # [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) uses [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) (an extension of ``ModelCheckpoint``) to maintain schedule state with special metadata. # @@ -131,14 +131,14 @@ # # Training will resume at the depth/level of the provided checkpoint according to the specified schedule. Schedules can be altered between training sessions but schedule compatibility is left to the user for maximal flexibility. If executing a user-defined schedule, typically the same schedule should be provided for the original and resumed training sessions. # -# By default ([FinetuningScheduler.restore_best](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=restore_best#finetuning_scheduler.fts.FinetuningScheduler.params.restore_best) is ``True``), [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will attempt to restore the best available checkpoint before finetuning depth transitions. +# By default ([FinetuningScheduler.restore_best](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=restore_best#finetuning_scheduler.fts.FinetuningScheduler.params.restore_best) is ``True``), [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will attempt to restore the best available checkpoint before fine-tuning depth transitions. # # ```python # trainer = Trainer(callbacks=[FinetuningScheduler()]) # trainer.fit(..., ckpt_path="some/path/to/my_kth_best_checkpoint.ckpt") # ``` # -# Note that similar to the behavior of [ModelCheckpoint](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.callbacks.ModelCheckpoint.html), (specifically [this PR](https://github.com/PyTorchLightning/pytorch-lightning/pull/12045)), +# Note that similar to the behavior of [ModelCheckpoint](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.callbacks.ModelCheckpoint.html), (specifically [this PR](https://github.com/Lightning-AI/lightning/pull/12045)), # when resuming training with a different [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) ``dirpath`` from the provided # checkpoint, the new training session's checkpoint state will be re-initialized at the resumption depth with the provided checkpoint being set as the best checkpoint. @@ -160,28 +160,25 @@ # %% [markdown] #
# -# ## Scheduled Finetuning For SuperGLUE +# ## Scheduled Fine-Tuning For SuperGLUE # -# The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to finetune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule. +# The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule. # # %% import os import warnings from datetime import datetime -from importlib import import_module from typing import Any, Dict, List, Optional -import datasets - import sentencepiece as sp # noqa: F401 # isort: split +import datasets import pytorch_lightning as pl import torch +from datasets import logging as datasets_logging from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint from pytorch_lightning.loggers.tensorboard import TensorBoardLogger from pytorch_lightning.utilities import rank_zero_warn -from pytorch_lightning.utilities.cli import _Registry -from pytorch_lightning.utilities.exceptions import MisconfigurationException from torch.optim.adamw import AdamW from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts from torch.utils.data import DataLoader @@ -190,38 +187,24 @@ from transformers.tokenization_utils_base import BatchEncoding # %% -# a couple helper functions to prepare code to work with a user module registry -MOCK_REGISTRY = _Registry() - - -def mock_register_module(key: str, require_fqn: bool = False) -> List: - if key.lower() == "finetuningscheduler": - mod = import_module("finetuning_scheduler") - MOCK_REGISTRY.register_classes(mod, pl.callbacks.Callback) - else: - raise MisconfigurationException(f"user module key '{key}' not found") - registered_list = [] - # make registered class available by unqualified class name by default - if not require_fqn: - for n, c in MOCK_REGISTRY.items(): - globals()[f"{n}"] = c - registered_list = ", ".join([n for n in MOCK_REGISTRY.names]) - else: - registered_list = ", ".join([c.__module__ + "." + c.__name__ for c in MOCK_REGISTRY.classes]) - print(f"Imported and registered the following callbacks: {registered_list}") +# Import the `FinetuningScheduler` PyTorch Lightning extension module we want to use. This will import all necessary callbacks. +import finetuning_scheduler as fts # isort: split - -# %% -# Load the `FinetuningScheduler` PyTorch Lightning extension module we want to use. This will import all necessary callbacks. 
-mock_register_module("finetuningscheduler") # set notebook-level variables TASK_NUM_LABELS = {"boolq": 2, "rte": 2} DEFAULT_TASK = "rte" -transformers_logging.set_verbosity_error() +# reduce hf logging verbosity to focus on tutorial-relevant code/messages +for hflogger in [transformers_logging, datasets_logging]: + hflogger.set_verbosity_error() # ignore warnings related tokenizers_parallelism/DataLoader parallelism trade-off and # expected logging behavior -for warnf in [".*does not have many workers*", ".*The number of training samples.*"]: +for warnf in [ + ".*does not have many workers.*", + ".*The number of training samples.*", + ".*converting to a fast.*", + ".*number of training batches.*", +]: warnings.filterwarnings("ignore", warnf) @@ -329,8 +312,8 @@ def _convert_to_features(self, example_batch: datasets.arrow_dataset.Batch) -> B # %% class RteBoolqModule(pl.LightningModule): - """A ``LightningModule`` that can be used to finetune a foundational model on either the RTE or BoolQ SuperGLUE - tasks using Hugging Face implementations of a given model and the `SuperGLUE Hugging Face dataset.""" + """A ``LightningModule`` that can be used to fine-tune a foundational model on either the RTE or BoolQ + SuperGLUE tasks using Hugging Face implementations of a given model and the `SuperGLUE Hugging Face dataset.""" def __init__( self, @@ -376,9 +359,9 @@ def __init__( self.no_decay = ["bias", "LayerNorm.weight"] @property - def finetuningscheduler_callback(self) -> FinetuningScheduler: # type: ignore # noqa - fts = [c for c in self.trainer.callbacks if isinstance(c, FinetuningScheduler)] # type: ignore # noqa - return fts[0] if fts else None + def finetuningscheduler_callback(self) -> fts.FinetuningScheduler: + fts_callback = [c for c in self.trainer.callbacks if isinstance(c, fts.FinetuningScheduler)] + return fts_callback[0] if fts_callback else None def forward(self, **inputs): return self.model(**inputs) @@ -389,12 +372,9 @@ def training_step(self, batch, batch_idx): self.log("train_loss", loss) return loss - def on_train_epoch_start(self) -> None: + def training_epoch_end(self, outputs: List[Any]) -> None: if self.finetuningscheduler_callback: - self.logger.log_metrics( - metrics={"finetuning_schedule_depth": float(self.finetuningscheduler_callback.curr_depth)}, - step=self.global_step, - ) + self.log("finetuning_schedule_depth", float(self.finetuningscheduler_callback.curr_depth)) def validation_step(self, batch, batch_idx, dataloader_idx=0): outputs = self(**batch) @@ -451,20 +431,20 @@ def configure_optimizers(self): # %% [markdown] # ### Our Training Sessions # -# We'll be comparing three different finetuning training configurations. Every configuration in this example depends -# upon a shared set of defaults, only differing in their respective finetuning schedules. +# We'll be comparing three different fine-tuning training configurations. Every configuration in this example depends +# upon a shared set of defaults, only differing in their respective fine-tuning schedules. 
#
# | Experiment Tag | Training Scenario Description |
# |:-----------------:| ---------------------------------------------------------------------- |
-# | ``fts_explicit`` | Training with a finetuning schedule explicitly provided by the user |
-# | ``nofts_baseline``| A baseline finetuning training session (without scheduled finetuning) |
-# | ``fts_implicit`` | Training with an implicitly generated finetuning schedule (the default)|
+# | ``fts_explicit`` | Training with a fine-tuning schedule explicitly provided by the user |
+# | ``nofts_baseline``| A baseline fine-tuning training session (without scheduled fine-tuning) |
+# | ``fts_implicit`` | Training with an implicitly generated fine-tuning schedule (the default)|
#
# Let's begin by configuring the ``fts_explicit`` scenario. We'll subsequently run the other two scenarios for
# comparison.

 # %%
-# Let's create a finetuning schedule for our model and run an explicitly scheduled finetuning training scenario with it
+# Let's create a fine-tuning schedule for our model and run an explicitly scheduled fine-tuning training scenario with it
 # Please see the [FinetuningScheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) for a full description of the schedule format

@@ -488,7 +468,7 @@ def configure_optimizers(self):
 - model.deberta.embeddings.word_embeddings.weight
 """
 ft_schedule_name = "RteBoolqModule_ft_schedule_deberta_base.yaml"
-# Let's write the schedule to a file so we can simulate loading an explicitly defined finetuning
+# Let's write the schedule to a file so we can simulate loading an explicitly defined fine-tuning
 # schedule.
 with open(ft_schedule_name, "w") as f:
     f.write(ft_schedule_yaml)
@@ -505,8 +485,8 @@ def configure_optimizers(self):
 #
 # Though other optimizers can arguably yield some marginal advantage contingent on the context,
 # the Adam optimizer (and the [AdamW version](https://pytorch.org/docs/stable/_modules/torch/optim/adamw.html#AdamW) which
-# implements decoupled weight decay) remains robust to hyperparameter choices and is commonly used for finetuning
-# foundational language models. See [(Sivaprasad et al., 2020)](#f2) and [(Mosbach, Andriushchenko & Klakow, 2020)](#f3) for theoretical and systematic empirical justifications of Adam and its use in finetuning
+# implements decoupled weight decay) remains robust to hyperparameter choices and is commonly used for fine-tuning
+# foundational language models. See [(Sivaprasad et al., 2020)](#f2) and [(Mosbach, Andriushchenko & Klakow, 2020)](#f3) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
 # large transformer-based language models. The values used here have some justification
 # in the referenced literature but have been largely empirically determined and, while a good
 # starting point, could be further tuned.
@@ -521,11 +501,11 @@ def configure_optimizers(self):
 #
 #
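#
# To make the decoupled weight decay point concrete, the sketch below shows the typical way parameter groups are
# constructed so that ``bias`` and ``LayerNorm.weight`` parameters are excluded from weight decay (the
# ``weight_decay`` and ``lr`` values here are illustrative; see ``_init_param_groups`` in the module above for the
# implementation actually used in this example):
#
# ```python
# no_decay = ["bias", "LayerNorm.weight"]
# parameter_groups = [
#     {
#         "params": [p for n, p in model.named_parameters()
#                    if not any(nd in n for nd in no_decay) and p.requires_grad],
#         "weight_decay": 1e-05,  # illustrative value
#     },
#     {
#         "params": [p for n, p in model.named_parameters()
#                    if any(nd in n for nd in no_decay) and p.requires_grad],
#         "weight_decay": 0.0,
#     },
# ]
# optimizer = AdamW(parameter_groups, lr=1e-05)  # illustrative lr
# ```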
#
-# The [CosineAnnealingWarmRestarts scheduler](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html?highlight=cosineannealingwarm#torch.optim.lr_scheduler.CosineAnnealingWarmRestarts) nicely fits with our iterative finetuning since it does not depend upon a global max_epoch
+# The [CosineAnnealingWarmRestarts scheduler](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html?highlight=cosineannealingwarm#torch.optim.lr_scheduler.CosineAnnealingWarmRestarts) nicely fits with our iterative fine-tuning since it does not depend upon a global max_epoch
 # value. The importance of initial warmup is reduced due to the innate warmup effect of Adam bias correction [[5]](#f3)
 # and the gradual thawing we are performing. Note that commonly used LR schedulers that depend on providing
 # max_iterations/epochs (e.g. the
-# [CosineWarmupScheduler](https://github.com/PyTorchLightning/lightning-tutorials/blob/0c325829101d5a6ebf32ed99bbf5b09badf04a59/course_UvA-DL/05-transformers-and-MH-attention/Transformers_MHAttention.py#L688)
+# [CosineWarmupScheduler](https://github.com/Lightning-AI/tutorials/blob/0c325829101d5a6ebf32ed99bbf5b09badf04a59/course_UvA-DL/05-transformers-and-MH-attention/Transformers_MHAttention.py#L688)
 # used in other pytorch-lightning tutorials) also work with FinetuningScheduler. Though the LR scheduler is theoretically
 # justified [(Loshchilov & Hutter, 2016)](#f4), the particular values provided here are primarily empirically driven.
 #
@@ -562,14 +542,14 @@ def configure_optimizers(self):
 checkpoint_kwargs = {"monitor": "val_loss", "save_top_k": 1}
 fts_kwargs = {"max_depth": 1}
 callbacks = [
-    FinetuningScheduler(ft_schedule=ft_schedule_name, **fts_kwargs),  # type: ignore # noqa
-    FTSEarlyStopping(**earlystopping_kwargs),  # type: ignore # noqa
-    FTSCheckpoint(**checkpoint_kwargs),  # type: ignore # noqa
+    fts.FinetuningScheduler(ft_schedule=ft_schedule_name, **fts_kwargs),
+    fts.FTSEarlyStopping(**earlystopping_kwargs),
+    fts.FTSCheckpoint(**checkpoint_kwargs),
 ]

 # %%
 logger = TensorBoardLogger("lightning_logs", name="fts_explicit")
-# optionally start tensorboard and monitor progress graphically while viewing multi-phase finetuning specific training
+# optionally start tensorboard and monitor progress graphically while viewing multi-phase fine-tuning specific training
 # logs in the cell output below by uncommenting the next 2 lines
 # # %load_ext tensorboard
 # # %tensorboard --logdir lightning_logs
@@ -593,12 +573,12 @@ def train() -> None:


 print(
-    "Note given the computation associated w/ the multiple phases of finetuning demonstrated, this notebook is best used with an accelerator"
+    "Note given the computation associated w/ the multiple phases of fine-tuning demonstrated, this notebook is best used with an accelerator"
 )
 train()

 # %% [markdown]
-# ### Running the Baseline and Implicit Finetuning Scenarios
+# ### Running the Baseline and Implicit Fine-Tuning Scenarios
 #
 # Let's now compare our ``nofts_baseline`` and ``fts_implicit`` scenarios with the ``fts_explicit`` one we just ran.
 #
@@ -609,7 +589,7 @@ def train() -> None:
 # code.
 #
 # Note that we'll be using identical callback configurations to the ``fts_explicit`` scenario.
Keeping [max_depth](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=max_depth#finetuning_scheduler.fts.FinetuningScheduler.params.max_depth) for -# the implicit schedule will limit finetuning to just the last 4 parameters of the model, which is only a small fraction +# the implicit schedule will limit fine-tuning to just the last 4 parameters of the model, which is only a small fraction # of the parameters you'd want to tune for maximum performance. Since the implicit schedule is quite computationally # intensive and most useful for exploring model behavior, leaving [max_depth](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=max_depth#finetuning_scheduler.fts.FinetuningScheduler.params.max_depth) 1 allows us to demo implicit mode # behavior while keeping the computational cost and runtime of this notebook reasonable. To review how a full implicit @@ -620,9 +600,9 @@ def train() -> None: # %% nofts_callbacks = [EarlyStopping(**earlystopping_kwargs), ModelCheckpoint(**checkpoint_kwargs)] fts_implicit_callbacks = [ - FinetuningScheduler(**fts_kwargs), # type: ignore # noqa - FTSEarlyStopping(**earlystopping_kwargs), # type: ignore # noqa - FTSCheckpoint(**checkpoint_kwargs), # type: ignore # noqa + fts.FinetuningScheduler(**fts_kwargs), + fts.FTSEarlyStopping(**earlystopping_kwargs), + fts.FTSCheckpoint(**checkpoint_kwargs), ] scenario_callbacks = {"nofts_baseline": nofts_callbacks, "fts_implicit": fts_implicit_callbacks} @@ -645,18 +625,18 @@ def train() -> None: # produced in the scenarios [here](https://drive.google.com/file/d/1t7myBgcqcZ9ax_IT9QVk-vFH_l_o5UXB/view?usp=sharing) # (caution, ~3.5GB). # -# [![fts-explicit-accuracy](fts_explicit_accuracy.png){height="315px" width="492px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOnRydWUsIm5vZnRzX2Jhc2VsaW5lIjpmYWxzZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) -# [![nofts-baseline](nofts_baseline_accuracy.png){height="316px" width="505px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6dHJ1ZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) +# [![fts_explicit_accuracy](fts_explicit_accuracy.png){height="315px" width="492px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOnRydWUsIm5vZnRzX2Jhc2VsaW5lIjpmYWxzZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) +# [![nofts_baseline](nofts_baseline_accuracy.png){height="316px" width="505px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6dHJ1ZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) # # Note there could be around ~1% variation in performance from the tensorboard summaries generated by this notebook # which uses DP and 1 GPU. # -# [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) expands the space of possible finetuning schedules and the composition of more sophisticated schedules can -# yield marginal finetuning performance gains. 
That stated, it should be emphasized the primary utility of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) is to grant -# greater finetuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3's implicit training +# [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) expands the space of possible fine-tuning schedules and the composition of more sophisticated schedules can +# yield marginal fine-tuning performance gains. That stated, it should be emphasized the primary utility of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) is to grant +# greater fine-tuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3's implicit training # run, a critical tuning transition point is immediately apparent: # -# [![implicit-training-transition](implicit_training_transition.png){height="272px" width="494px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6ZmFsc2UsImZ0c19pbXBsaWNpdCI6dHJ1ZX0%3D) +# [![implicit_training_transition](implicit_training_transition.png){height="272px" width="494px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6ZmFsc2UsImZ0c19pbXBsaWNpdCI6dHJ1ZX0%3D) # # Our `val_loss` begins a precipitous decline at step 3119 which corresponds to phase 17 in the schedule. Referring to our # schedule, in phase 17 we're beginning tuning the attention parameters of our 10th encoder layer (of 11). Interesting! @@ -667,7 +647,7 @@ def train() -> None: # # Note that though this example is intended to capture a common usage scenario, substantial variation is expected # among use cases and models. -# In summary, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) provides increased finetuning flexibility that can be useful in a variety of +# In summary, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) provides increased fine-tuning flexibility that can be useful in a variety of # contexts from exploring model tuning behavior to maximizing performance. 
# %% [markdown]
# ## Footnotes
diff --git a/lightning_examples/finetuning-scheduler/logo_fts.png b/lightning_examples/finetuning-scheduler/logo_fts.png
index 00599a54ddb97acfae0f0b4eaeb9dfb9989376eb..02e14a323cf0d728967c939e3bf4382258a2c31c 100644
GIT binary patch
[base85-encoded binary image delta data for logo_fts.png (7798 -> 7586 bytes) omitted]
z!MhHKNV8H(f6Wv1az_Y}#FwH(?3+_rif2|A%%%rUMbf0?xEi!=WJGp{%baggn|y-2 z$o6U?mr^6gp!`FeR|DT5@XSgj)h$7#R`hBpw>s$q+yd=yA^QB>-zU(#0$2-rCFrN*IqLrL*X}CP z_kptY=Y1j5nQYbm`*IAjsy$5%)KS!rg%uAWF28G#pt|S7_4I1Vzz|St-HD_EvdDjY z8QPu#{teZ~2HnKxY)LS#;KO(aj>ZatGxt^)5;5rN50JfCa%~#{+-?lhcHKQe-w)J* zx-1-*Q+o*kJ4m*{s9F6eNg2|-5YZg?>1)~@hh;NpwfeA5fj$eonMm}$k5rP1vWV54 z-hgDf$Z1mg0*z)H54-Ltxi{Xeh>(A-3)ivxYjK-jDPz-!T74vO8)(~gE&+WwiLRXh z)mur_as9rKeKEN%^Fg0uR3L@6F8u>&D?gR4Mz-ReLOu{~A@V}BBAwzuq!V6`q|Ash z{_8{J8Xf0(2k{a)2&k6m(Imh0%tY{h?>qicB==DhcsnC1tlt5@A=#vn06PXW`z$LILbAlPs%vO!m+IsP8=?=U2h z>Q%3r3qw;&^sV5^96{{*;NVZc7oBzpMqx_ zMYfP+XF0s5DUFX!q&wg>(UH1C^^M3Jr%5BI~S&`op-dj(8 z=28jDJS^{paC3AEppAbHdx99eyb*}Oe=V>8)Ro{pS-^7)(9ggXJ44F}NRqRy<@c#8 z!TUAT6-b4jenMt{>R9mhK-6QSz*{FPFK>p-%=AIPT}W=QtZ8S9B*^RV+IRw#S*Cja zc$WU4Ow7dK+yMWg@0sMzN&cAtRGCx~sa4?Rp2-^Ua@Xq%ss?|&c3%W~OZ4+oRniUs zWmbaCHK3-Ox*N>zrP@zVSw9sPf{BS~u;^fSL*eMrZ~WBze6 z*UtdFh%C}G-UCh$Pd+sf*=r_upO%Aq7SvPmk?bmJ3{8Q`t#;dzT#=JzfDT6wLhpyH z_=W+CAaxehMNAMn>JKrKgW15ortm}1MHrAfOZ~x+ea0$m1qK$LlsFvaMQ8Uoa5y#= zw~$Xke|mqi00Hs1OFv97eg0>RgD91{XTh`M5voAl6j|NjC}W%uag+=G;J66a-#CT$i`sA?8@=k_%QG4k)9A%3WB93yw@8NbsymwL#T`SH5UM`cYu`O&YE?)#G zTL)bUD@utvcQ|@H&L-k06LbpHrN+QN5XlpCKjMGMwhS={)*@Tejz?0TY>A|CzaEBM zSKKzNUE1a`pk4*f-_Nqpe;8aGk^Ii#C_kJKag-Tz`@z~HfCG?}Dnk+f_*%rDh&UYGO8t3|y*}p4^}r>_ z?t*^~M_G~-iRo}SI)xr;g2y7Z_ytpb6ijG>1;wN&cQ|?+&am!qbQ@|S!M!Gk@iLxc zwh?wX$_^()9FFcIB6D-32Dt{>j*I%j;ppi&A>wd!8#Rw3=}3w|6>wR0IC>^dh&UYG z#ckW933{q3iqT}l*3S>l9 Date: Fri, 22 Jul 2022 13:11:59 -0700 Subject: [PATCH 02/17] added sync_dist to log ops --- .../finetuning-scheduler/finetuning-scheduler.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 08ef55ea1..9a9279a56 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -369,12 +369,12 @@ def forward(self, **inputs): def training_step(self, batch, batch_idx): outputs = self(**batch) loss = outputs[0] - self.log("train_loss", loss) + self.log("train_loss", loss, sync_dist=True) return loss def training_epoch_end(self, outputs: List[Any]) -> None: if self.finetuningscheduler_callback: - self.log("finetuning_schedule_depth", float(self.finetuningscheduler_callback.curr_depth)) + self.log("finetuning_schedule_depth", float(self.finetuningscheduler_callback.curr_depth), sync_dist=True) def validation_step(self, batch, batch_idx, dataloader_idx=0): outputs = self(**batch) @@ -384,9 +384,9 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0): elif self.num_labels == 1: preds = logits.squeeze() labels = batch["labels"] - self.log("val_loss", val_loss, prog_bar=True) + self.log("val_loss", val_loss, prog_bar=True, sync_dist=True) metric_dict = self.metric.compute(predictions=preds, references=labels) - self.log_dict(metric_dict, prog_bar=True) + self.log_dict(metric_dict, prog_bar=True, sync_dist=True) def _init_param_groups(self) -> List[Dict]: """Initialize the parameter groups. 
Used to ensure weight_decay is not applied to our specified bias From d51cd1796903fff5b8db03362cd097ba26672122 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Sun, 31 Jul 2022 12:43:52 -0700 Subject: [PATCH 03/17] minor clarification in language, bump required finetuningscheduler version --- .../finetuning-scheduler/.meta.yml | 2 +- .../finetuning-scheduler.py | 18 +++++++++++++++--- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/.meta.yml b/lightning_examples/finetuning-scheduler/.meta.yml index 02f37bf8e..c580dda7f 100644 --- a/lightning_examples/finetuning-scheduler/.meta.yml +++ b/lightning_examples/finetuning-scheduler/.meta.yml @@ -14,6 +14,6 @@ description: | schedule. It uses Hugging Face's ``datasets`` and ``transformers`` libraries to retrieve the relevant benchmark data and foundational model weights. The required dependencies are installed via the finetuning-scheduler ``[examples]`` extra. requirements: - - finetuning-scheduler[examples]>=0.1.8 + - finetuning-scheduler[examples]>=0.2.0 accelerator: - GPU diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 9a9279a56..51488bd48 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -1,3 +1,15 @@ +# --- +# jupyter: +# jupytext: +# cell_metadata_filter: -all +# formats: ipynb,py:percent +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.14.1 +# --- + # %% [markdown] # ## Scheduled Fine-Tuning with the Fine-Tuning Scheduler Extension # @@ -620,7 +632,7 @@ def train() -> None: # See the [tensorboard experiment summaries](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/) to get a sense # of the relative computational and performance tradeoffs associated with these [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) configurations. # The summary compares a full ``fts_implicit`` execution to ``fts_explicit`` and ``nofts_baseline`` scenarios using DDP -# training with 2 GPUs. The full logs/schedules for all three scenarios are available +# training with 2 GPUs. The full logs/schedules and detailed system configuration used for all three scenarios are available # [here](https://drive.google.com/file/d/1LrUcisRLHeJgh_BDOOD_GUBPp5iHAkoR/view?usp=sharing) and the checkpoints # produced in the scenarios [here](https://drive.google.com/file/d/1t7myBgcqcZ9ax_IT9QVk-vFH_l_o5UXB/view?usp=sharing) # (caution, ~3.5GB). @@ -628,8 +640,8 @@ def train() -> None: # [![fts_explicit_accuracy](fts_explicit_accuracy.png){height="315px" width="492px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOnRydWUsIm5vZnRzX2Jhc2VsaW5lIjpmYWxzZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) # [![nofts_baseline](nofts_baseline_accuracy.png){height="316px" width="505px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6dHJ1ZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D) # -# Note there could be around ~1% variation in performance from the tensorboard summaries generated by this notebook -# which uses DP and 1 GPU. 
+# Note that the results above may vary to a small degree from the tensorboard summaries generated by this notebook +# which uses DP, 1 GPU and likely when you're running this, different versions of certain software components (e.g. pytorch, transformers). # # [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) expands the space of possible fine-tuning schedules and the composition of more sophisticated schedules can # yield marginal fine-tuning performance gains. That stated, it should be emphasized the primary utility of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) is to grant From fbc9d77d3c6eca4f5a4b7fcdeeffa8802375e582 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Wed, 3 Aug 2022 11:39:30 -0700 Subject: [PATCH 04/17] apply small examples patch for torch issue #80809 with torch 1.12.0 --- lightning_examples/finetuning-scheduler/.meta.yml | 2 +- .../finetuning-scheduler/finetuning-scheduler.py | 10 +++++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/.meta.yml b/lightning_examples/finetuning-scheduler/.meta.yml index c580dda7f..421086fff 100644 --- a/lightning_examples/finetuning-scheduler/.meta.yml +++ b/lightning_examples/finetuning-scheduler/.meta.yml @@ -1,7 +1,7 @@ title: Fine-Tuning Scheduler author: "[Dan Dale](https://github.com/speediedan)" created: 2021-11-29 -updated: 2022-06-10 +updated: 2022-08-03 license: CC BY-SA build: 0 tags: diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 51488bd48..d57d763c0 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -183,6 +183,8 @@ from datetime import datetime from typing import Any, Dict, List, Optional +from packaging.version import Version + import sentencepiece as sp # noqa: F401 # isort: split import datasets import pytorch_lightning as pl @@ -191,13 +193,19 @@ from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint from pytorch_lightning.loggers.tensorboard import TensorBoardLogger from pytorch_lightning.utilities import rank_zero_warn -from torch.optim.adamw import AdamW from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts from torch.utils.data import DataLoader from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer from transformers import logging as transformers_logging from transformers.tokenization_utils_base import BatchEncoding +if Version(torch.__version__) == Version("1.12.0") or torch.__version__.startswith("1.12.0"): + # we need to use a patched version of AdamW to fix https://github.com/pytorch/pytorch/issues/80809 + # and allow examples to succeed with torch 1.12.0 (this torch bug is fixed in 1.12.1) + from fts_examples.patched_adamw import AdamW +else: + from torch.optim.adamw import AdamW + # %% # Import the `FinetuningScheduler` PyTorch Lightning extension module we want to use. This will import all necessary callbacks. 
import finetuning_scheduler as fts # isort: split From d1c1094238552ec161adf2669a916c0291e80947 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Sat, 6 Aug 2022 12:48:25 -0700 Subject: [PATCH 05/17] update build date --- lightning_examples/finetuning-scheduler/.meta.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lightning_examples/finetuning-scheduler/.meta.yml b/lightning_examples/finetuning-scheduler/.meta.yml index 421086fff..c9985be4b 100644 --- a/lightning_examples/finetuning-scheduler/.meta.yml +++ b/lightning_examples/finetuning-scheduler/.meta.yml @@ -1,7 +1,7 @@ title: Fine-Tuning Scheduler author: "[Dan Dale](https://github.com/speediedan)" created: 2021-11-29 -updated: 2022-08-03 +updated: 2022-08-06 license: CC BY-SA build: 0 tags: From fd3d216e5c0f39a639ff3571785b6150fdde8b82 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Sat, 6 Aug 2022 13:14:26 -0700 Subject: [PATCH 06/17] link anchor update --- lightning_examples/finetuning-scheduler/finetuning-scheduler.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index d57d763c0..68fce61be 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -124,7 +124,7 @@ # # # The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). -# Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#scheduled-finetuning-superglue) using the LightningCLI. +# Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#example-scheduled-fine-tuning-for-superglue) using the LightningCLI. 
# %% [markdown] # ## Resuming Scheduled Fine-Tuning Training Sessions From bed34ac71b1a7fb1fd7ffc29022cb011be2af754 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Sat, 6 Aug 2022 18:49:26 -0700 Subject: [PATCH 07/17] remove jupytext metadata --- .../finetuning-scheduler/finetuning-scheduler.py | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 68fce61be..e3ee24f98 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -1,15 +1,3 @@ -# --- -# jupyter: -# jupytext: -# cell_metadata_filter: -all -# formats: ipynb,py:percent -# text_representation: -# extension: .py -# format_name: percent -# format_version: '1.3' -# jupytext_version: 1.14.1 -# --- - # %% [markdown] # ## Scheduled Fine-Tuning with the Fine-Tuning Scheduler Extension # From a81ac7347f89460e0f3c4f76679e575743cc2e59 Mon Sep 17 00:00:00 2001 From: Dan Dale Date: Sun, 7 Aug 2022 15:51:59 -0700 Subject: [PATCH 08/17] aesthetic table formatting fix Co-authored-by: Rohit Gupta --- .../finetuning-scheduler/finetuning-scheduler.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index e3ee24f98..135435fce 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -444,9 +444,9 @@ def configure_optimizers(self): # # | Experiment Tag | Training Scenario Description | # |:-----------------:| ---------------------------------------------------------------------- | -# | ``fts_explicit`` | Training with a fine-tuning schedule explicitly provided by the user | +# | ``fts_explicit`` | Training with a fine-tuning schedule explicitly provided by the user | # | ``nofts_baseline``| A baseline fine-tuning training session (without scheduled fine-tuning) | -# | ``fts_implicit`` | Training with an implicitly generated fine-tuning schedule (the default)| +# | ``fts_implicit`` | Training with an implicitly generated fine-tuning schedule (the default) | # # Let's begin by configuring the ``fts_explicit`` scenario. We'll subsequently run the other two scenarios for # comparison. 
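The ``fts_explicit`` scenario tabulated in the hunk above boils down to handing the ``FinetuningScheduler`` callback a user-authored schedule file instead of letting it generate one. A minimal sketch of that wiring follows; the schedule filename, monitored metric, and callback arguments here are illustrative assumptions, not the notebook's exact values.

    from pytorch_lightning import Trainer
    from finetuning_scheduler import FinetuningScheduler, FTSCheckpoint, FTSEarlyStopping

    # hypothetical user-authored YAML schedule: phase 0 typically thaws only the task head,
    # while later phases progressively unfreeze deeper layers of the encoder
    ft_schedule = "./RteBoolqModule_ft_schedule_deberta_base.yaml"

    callbacks = [
        FinetuningScheduler(ft_schedule=ft_schedule),  # explicit mode: schedule supplied by the user
        FTSEarlyStopping(monitor="val_loss", min_delta=0.001),  # gates each phase transition
        FTSCheckpoint(monitor="val_loss", save_top_k=1),  # checkpointing that tracks schedule state
    ]
    trainer = Trainer(callbacks=callbacks)  # then trainer.fit(model, datamodule=dm) as usual

Omitting ``ft_schedule`` is what the ``fts_implicit`` scenario exercises: the callback then generates a default schedule that progressively thaws the model, typically layer by layer in reverse order.
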
From 85ce97c012e98226380e97b93448b30ae0f3e259 Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Sun, 7 Aug 2022 16:04:47 -0700 Subject: [PATCH 09/17] remove unnecessary sync_dist logging config --- .../finetuning-scheduler/finetuning-scheduler.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 135435fce..8d279c033 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -377,12 +377,12 @@ def forward(self, **inputs): def training_step(self, batch, batch_idx): outputs = self(**batch) loss = outputs[0] - self.log("train_loss", loss, sync_dist=True) + self.log("train_loss", loss) return loss def training_epoch_end(self, outputs: List[Any]) -> None: if self.finetuningscheduler_callback: - self.log("finetuning_schedule_depth", float(self.finetuningscheduler_callback.curr_depth), sync_dist=True) + self.log("finetuning_schedule_depth", float(self.finetuningscheduler_callback.curr_depth)) def validation_step(self, batch, batch_idx, dataloader_idx=0): outputs = self(**batch) @@ -392,9 +392,9 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0): elif self.num_labels == 1: preds = logits.squeeze() labels = batch["labels"] - self.log("val_loss", val_loss, prog_bar=True, sync_dist=True) + self.log("val_loss", val_loss, prog_bar=True) metric_dict = self.metric.compute(predictions=preds, references=labels) - self.log_dict(metric_dict, prog_bar=True, sync_dist=True) + self.log_dict(metric_dict, prog_bar=True) def _init_param_groups(self) -> List[Dict]: """Initialize the parameter groups. Used to ensure weight_decay is not applied to our specified bias From 985b105850c609c13d0045cbaf0c336e5d9e10bc Mon Sep 17 00:00:00 2001 From: Dan Dale Date: Mon, 8 Aug 2022 11:15:19 -0700 Subject: [PATCH 10/17] change regex strings to raw string literals Co-authored-by: Jirka Borovec --- .../finetuning-scheduler/finetuning-scheduler.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py index 8d279c033..3ab1b9c73 100644 --- a/lightning_examples/finetuning-scheduler/finetuning-scheduler.py +++ b/lightning_examples/finetuning-scheduler/finetuning-scheduler.py @@ -208,10 +208,10 @@ # ignore warnings related tokenizers_parallelism/DataLoader parallelism trade-off and # expected logging behavior for warnf in [ - ".*does not have many workers.*", - ".*The number of training samples.*", - ".*converting to a fast.*", - ".*number of training batches.*", + r".*does not have many workers.*", + r".*The number of training samples.*", + r".*converting to a fast.*", + r".*number of training batches.*", ]: warnings.filterwarnings("ignore", warnf) From 4877cde480db518aaa0bc37e83563832cd2ab0bd Mon Sep 17 00:00:00 2001 From: rohitgr7 Date: Fri, 12 Aug 2022 13:14:12 +0530 Subject: [PATCH 11/17] try fix --- lightning_examples/basic-gan/gan.py | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/lightning_examples/basic-gan/gan.py b/lightning_examples/basic-gan/gan.py index 24520fa1a..776b3397a 100644 --- a/lightning_examples/basic-gan/gan.py +++ b/lightning_examples/basic-gan/gan.py @@ -20,7 +20,7 @@ # ### MNIST DataModule # # Below, we define a DataModule for the MNIST Dataset. 
To learn more about DataModules, check out our tutorial -# on them or see the [latest docs](https://pytorch-lightning.readthedocs.io/en/stable/extensions/datamodules.html). +# on them or see the [latest release docs](https://pytorch-lightning.readthedocs.io/en/stable/data/datamodules.html). # %% @@ -43,9 +43,6 @@ def __init__( ] ) - # self.dims is returned when you call dm.size() - # Setting default dims here because we know them. - # Could optionally be assigned dynamically in dm.setup() self.dims = (1, 28, 28) self.num_classes = 10 @@ -248,7 +245,7 @@ def on_validation_epoch_end(self): # %% dm = MNISTDataModule() -model = GAN(*dm.size()) +model = GAN(*dm.dims) trainer = Trainer( accelerator="auto", devices=1 if torch.cuda.is_available() else None, # limiting got iPython runs From b1f3475cf629c4216fb13b381af8cfc796a95a5e Mon Sep 17 00:00:00 2001 From: Rohit Gupta Date: Fri, 12 Aug 2022 13:26:54 +0530 Subject: [PATCH 12/17] Apply suggestions from code review --- lightning_examples/basic-gan/gan.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lightning_examples/basic-gan/gan.py b/lightning_examples/basic-gan/gan.py index 776b3397a..1f2bd54a9 100644 --- a/lightning_examples/basic-gan/gan.py +++ b/lightning_examples/basic-gan/gan.py @@ -20,7 +20,7 @@ # ### MNIST DataModule # # Below, we define a DataModule for the MNIST Dataset. To learn more about DataModules, check out our tutorial -# on them or see the [latest release docs](https://pytorch-lightning.readthedocs.io/en/stable/data/datamodules.html). +# on them or see the [latest release docs](https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html). # %% From e9d59e863fb8cfce7136406b214b227dfccfeed1 Mon Sep 17 00:00:00 2001 From: rohitgr7 Date: Fri, 12 Aug 2022 14:15:07 +0530 Subject: [PATCH 13/17] try fix --- lightning_examples/datamodules/.meta.yml | 2 +- lightning_examples/mnist-tpu-training/mnist-tpu.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lightning_examples/datamodules/.meta.yml b/lightning_examples/datamodules/.meta.yml index 93a86ce8a..11cb43b60 100644 --- a/lightning_examples/datamodules/.meta.yml +++ b/lightning_examples/datamodules/.meta.yml @@ -8,7 +8,7 @@ description: This notebook will walk you through how to start using Datamodules. the release of `pytorch-lightning` version 0.9.0, we have included a new class called `LightningDataModule` to help you decouple data related hooks from your `LightningModule`. The most up-to-date documentation on datamodules can be found - [here](https://pytorch-lightning.readthedocs.io/en/stable/extensions/datamodules.html). + [here](https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html). requirements: - torchvision accelerator: diff --git a/lightning_examples/mnist-tpu-training/mnist-tpu.py b/lightning_examples/mnist-tpu-training/mnist-tpu.py index eab46c80c..35f018ee6 100644 --- a/lightning_examples/mnist-tpu-training/mnist-tpu.py +++ b/lightning_examples/mnist-tpu-training/mnist-tpu.py @@ -23,7 +23,7 @@ # ### Defining The `MNISTDataModule` # # Below we define `MNISTDataModule`. You can learn more about datamodules -# in [docs](https://pytorch-lightning.readthedocs.io/en/stable/extensions/datamodules.html). +# in [docs](https://pytorch-lightning.readthedocs.io/en/stable/data/datamodule.html). 
# %% From f7522fa9689bda66b95cddd792c57ff7a55e6391 Mon Sep 17 00:00:00 2001 From: rohitgr7 Date: Fri, 12 Aug 2022 14:47:19 +0530 Subject: [PATCH 14/17] try fix --- lightning_examples/datamodules/datamodules.py | 9 +++------ lightning_examples/mnist-tpu-training/mnist-tpu.py | 7 ++----- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/lightning_examples/datamodules/datamodules.py b/lightning_examples/datamodules/datamodules.py index b5731765b..eff583d37 100644 --- a/lightning_examples/datamodules/datamodules.py +++ b/lightning_examples/datamodules/datamodules.py @@ -146,7 +146,7 @@ def test_dataloader(self): # 1. ```__init__``` # - Takes in a `data_dir` arg that points to where you have downloaded/wish to download the MNIST dataset. # - Defines a transform that will be applied across train, val, and test dataset splits. -# - Defines default `self.dims`, which is a tuple returned from `datamodule.size()` that can help you initialize models. +# - Defines default `self.dims`. # # # 2. ```prepare_data``` @@ -176,9 +176,6 @@ def __init__(self, data_dir: str = PATH_DATASETS): ] ) - # self.dims is returned when you call dm.size() - # Setting default dims here because we know them. - # Could optionally be assigned dynamically in dm.setup() self.dims = (1, 28, 28) self.num_classes = 10 @@ -274,7 +271,7 @@ def configure_optimizers(self): # Init DataModule dm = MNISTDataModule() # Init model from datamodule's attributes -model = LitModel(*dm.size(), dm.num_classes) +model = LitModel(*dm.dims, dm.num_classes) # Init trainer trainer = Trainer( max_epochs=3, @@ -341,7 +338,7 @@ def test_dataloader(self): # %% dm = CIFAR10DataModule() -model = LitModel(*dm.size(), dm.num_classes, hidden_size=256) +model = LitModel(*dm.dims, dm.num_classes, hidden_size=256) tqdm_progress_bar = TQDMProgressBar(refresh_rate=20) trainer = Trainer( max_epochs=5, diff --git a/lightning_examples/mnist-tpu-training/mnist-tpu.py b/lightning_examples/mnist-tpu-training/mnist-tpu.py index 35f018ee6..4be9b00ea 100644 --- a/lightning_examples/mnist-tpu-training/mnist-tpu.py +++ b/lightning_examples/mnist-tpu-training/mnist-tpu.py @@ -33,9 +33,6 @@ def __init__(self, data_dir: str = "./"): self.data_dir = data_dir self.transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]) - # self.dims is returned when you call dm.size() - # Setting default dims here because we know them. 
- # Could optionally be assigned dynamically in dm.setup() self.dims = (1, 28, 28) self.num_classes = 10 @@ -151,7 +148,7 @@ def configure_optimizers(self): # Init DataModule dm = MNISTDataModule() # Init model from datamodule's attributes -model = LitModel(*dm.size(), dm.num_classes) +model = LitModel(*dm.dims, dm.num_classes) # Init trainer trainer = Trainer( max_epochs=3, @@ -170,7 +167,7 @@ def configure_optimizers(self): # Init DataModule dm = MNISTDataModule() # Init model from datamodule's attributes -model = LitModel(*dm.size(), dm.num_classes) +model = LitModel(*dm.dims, dm.num_classes) # Init trainer trainer = Trainer( max_epochs=3, From 43451b7beff6c016ed996b273b216ad396e834a8 Mon Sep 17 00:00:00 2001 From: rohitgr7 Date: Fri, 12 Aug 2022 20:36:27 +0530 Subject: [PATCH 15/17] fix link --- lightning_examples/mnist-hello-world/hello-world.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lightning_examples/mnist-hello-world/hello-world.py b/lightning_examples/mnist-hello-world/hello-world.py index ee9303467..5536a070d 100644 --- a/lightning_examples/mnist-hello-world/hello-world.py +++ b/lightning_examples/mnist-hello-world/hello-world.py @@ -94,7 +94,7 @@ def configure_optimizers(self): # - If you don't mind loading all your datasets at once, you can set up a condition to allow for both 'fit' related setup and 'test' related setup to run whenever `None` is passed to `stage` (or ignore it altogether and exclude any conditionals). # - **Note this runs across all GPUs and it *is* safe to make state assignments here** # -# 3. [x_dataloader()](https://pytorch-lightning.readthedocs.io/en/stable/api_references.html#core-api) ♻️ +# 3. [x_dataloader()](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.core.hooks.DataHooks.html#pytorch_lightning.core.hooks.DataHooks.train_dataloader) ♻️ # - `train_dataloader()`, `val_dataloader()`, and `test_dataloader()` all return PyTorch `DataLoader` instances that are created by wrapping their respective datasets that we prepared in `setup()` From 1de1dec2dcc1e8b46e4615913bd58af42e33712c Mon Sep 17 00:00:00 2001 From: Daniel Dale Date: Fri, 12 Aug 2022 11:10:00 -0700 Subject: [PATCH 16/17] upgrade pip to avoid rich triggering a logging error prompting to upgrade pip --- .azure/ipynb-publish.yml | 1 + .azure/ipynb-tests.yml | 1 + 2 files changed, 2 insertions(+) diff --git a/.azure/ipynb-publish.yml b/.azure/ipynb-publish.yml index 59441bf4b..f1dbe08c4 100644 --- a/.azure/ipynb-publish.yml +++ b/.azure/ipynb-publish.yml @@ -64,6 +64,7 @@ jobs: set -e sudo apt-get update -q --fix-missing sudo apt install -y tree ffmpeg + pip install --upgrade pip pip --version pip install --requirement requirements.txt pip install --requirement requirements/data.txt diff --git a/.azure/ipynb-tests.yml b/.azure/ipynb-tests.yml index bbb9f2fd3..11a6a8977 100644 --- a/.azure/ipynb-tests.yml +++ b/.azure/ipynb-tests.yml @@ -44,6 +44,7 @@ jobs: set -e sudo apt-get update -q --fix-missing sudo apt install -y tree ffmpeg + pip install --upgrade pip pip --version pip install --requirement requirements.txt pip install --requirement requirements/data.txt From 0472d78f078aaf20367dd21a8361b944c1d88015 Mon Sep 17 00:00:00 2001 From: rohitgr7 Date: Sat, 13 Aug 2022 17:12:55 +0530 Subject: [PATCH 17/17] shoot