Stability and additional improvements
App
Added
- Added the ability to set up basic authentication for Lightning apps (#16105)
Changed
- The LoadBalancer now uses the internal IP and port instead of the exposed URL (#16119)
- Added support for logging in different trainer stages with `DeviceStatsMonitor` (#16002)
- Changed `lightning_app.components.serve.gradio` to `lightning_app.components.serve.gradio_server` (#16201)
- Made cluster creation/deletion async by default (#16185)
Fixed
- Fixed not being able to run multiple lightning apps locally due to port collision (#15819)
- Avoid `relpath` bug on Windows (#16164)
- Avoid using the deprecated `LooseVersion` (#16162)
- Ported fixes to the autoscaler component (#16249)
- Fixed a bug where `lightning login` with env variables would not correctly save the credentials (#16339)
Fabric
Added
- Added `Fabric.launch()` to programmatically launch processes (e.g. in Jupyter notebooks) (#14992) (see the usage sketch after this list)
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
- Added the `lightning_fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
- Added basic support for LightningModules (#16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
- Added Logger support (#16121):
  - Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
  - Added `Fabric.log` for logging scalars using multiple loggers
  - Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
  - Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
  - Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
  - Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning_fabric.loggers.TensorBoardLogger` (#16121)
- Added `lightning_fabric.loggers.CSVLogger` (#16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
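A minimal usage sketch tying the new Fabric APIs above together. It is illustrative rather than taken from the release notes: the toy model, data, `PrintCallback`, and directory names are assumptions; only the `Fabric` arguments and methods named in the list are the actual new API.

```python
import torch
from lightning_fabric import Fabric
from lightning_fabric.loggers import CSVLogger, TensorBoardLogger


class PrintCallback:
    # Any object can act as a callback; Fabric.call() dispatches by method name.
    def on_train_start(self):
        print("training started")


# Toy model and data, purely for illustration.
model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

fabric = Fabric(
    accelerator="auto",
    devices=1,
    callbacks=[PrintCallback()],                                 # new: Fabric(callbacks=...)
    loggers=[CSVLogger("logs"), TensorBoardLogger("tb_logs")],   # new: Fabric(loggers=...)
)
fabric.launch()                 # new: programmatic launch (works in scripts and notebooks)
fabric.call("on_train_start")   # new: emit an event to all registered callbacks

model = fabric.setup_module(model)              # new: set up the model on its own ...
optimizer = fabric.setup_optimizers(optimizer)  # ... then the optimizer
dataloader = fabric.setup_dataloaders(dataloader)

for step, (x, y) in enumerate(dataloader):
    optimizer.zero_grad(set_to_none=True)       # now consistent across strategies
    loss = torch.nn.functional.cross_entropy(model(x), y)
    fabric.backward(loss)
    optimizer.step()
    fabric.log("train_loss", loss)              # new: forwarded to every configured logger
    fabric.log_dict({"step": step, "lr": 0.1})  # new: several metrics at once
```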
Changed
- Renamed the class `LightningLite` to `Fabric` (#15932, #15938)
- The `Fabric.run()` method is no longer abstract (#14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101)
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
Removed
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully Sharded Data Parallel instead (`strategy='fsdp'`) (#16329). A short migration sketch follows.
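For code that used the removed sharded strategies, a minimal migration sketch: the `strategy="fsdp"` value is the replacement named above, while the device count, model, and optimizer are illustrative assumptions.

```python
import torch
from lightning_fabric import Fabric

# before (removed): fabric = Fabric(strategy="ddp_sharded", accelerator="cuda", devices=2)
fabric = Fabric(strategy="fsdp", accelerator="cuda", devices=2)  # PyTorch-native FSDP instead
fabric.launch()

# With FSDP, set up the (sharded) model first, then create and set up the optimizer.
model = fabric.setup_module(torch.nn.Linear(32, 2))
optimizer = fabric.setup_optimizers(torch.optim.Adam(model.parameters()))
```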
Fixed
- Restored sampling parity between PyTorch and Fabric dataloaders when using the `DistributedSampler` (#16101)
- Fixed an issue where the error message wouldn't tell the user the real value that was passed through the CLI (#16334)
PyTorch
Added
- Added support for native logging of `MetricCollection` with enabled compute groups (#15580)
- Added support for custom artifact names in `pl.loggers.WandbLogger` (#16173)
- Added support for DDP with `LRFinder` (#15304)
- Added utilities to migrate checkpoints from one Lightning version to another (#15237)
- Added support for upgrading all checkpoints in a folder using the `pl.utilities.upgrade_checkpoint` script (#15333)
- Added an `ax` argument to `.lr_find().plot()` to enable writing to a user-defined axes in a matplotlib figure (#15652) (see the sketch after this list)
- Added a `log_model` parameter to `MLFlowLogger` (#9187)
- Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
- Added a warning when `self.log(..., logger=True)` is called without a configured logger (#15814)
- Added support for colossalai 0.1.11 (#15888)
- Added `LightningCLI` support for optimizers and learning rate schedulers via callable type dependency injection (#15869)
- Added support for activation checkpointing for the `DDPFullyShardedNativeStrategy` strategy (#15826)
- Added the option to set `DDPFullyShardedNativeStrategy(cpu_offload=True|False)` via a bool instead of needing to pass a configuration object (#15832)
- Added an info message for Ampere CUDA GPU users to enable tf32 matmul precision (#16037)
- Added support for returning optimizer-like classes in `LightningModule.configure_optimizers` (#16189)
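A hedged sketch of the new `ax` argument on the learning-rate finder plot. Here `model` is assumed to be an existing LightningModule whose optimizer reads a tunable `self.lr`, and `trainer.tuner.lr_find` is the pre-2.0 tuner entry point.

```python
import matplotlib.pyplot as plt
from pytorch_lightning import Trainer

# `model` is assumed to be a LightningModule set up for LR finding.
trainer = Trainer(accelerator="auto", devices=1, max_epochs=1)
lr_finder = trainer.tuner.lr_find(model)

# New: draw the sweep into a user-provided matplotlib axes instead of a fresh figure.
fig, ax = plt.subplots(figsize=(6, 4))
lr_finder.plot(suggest=True, ax=ax)
fig.savefig("lr_sweep.png")
```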
Changed
- Switched from `tensorboard` to `tensorboardX` in `TensorBoardLogger` (#15728)
- From now on, the Lightning Trainer and `LightningModule.load_from_checkpoint` automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (#15237)
- `Trainer.{validate,test,predict}(ckpt_path=...)` no longer restores the `Trainer.global_step` and `trainer.current_epoch` values from the checkpoint. From now on, only `Trainer.fit` will restore these values (#15532)
- The `ModelCheckpoint.save_on_train_epoch_end` attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (#15300)
- The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (#15634)
- `MLFlowLogger` now logs hyperparameters and metrics in batched API calls (#15915)
- Overriding the `on_train_batch_{start,end}` hooks in conjunction with taking a `dataloader_iter` in the `training_step` no longer errors out and instead shows a warning (#16062)
- Moved `tensorboardX` to the extra dependencies; the `CSVLogger` is now used by default (#16349) (see the sketch after this list)
- Dropped PyTorch 1.9 support (#15347)
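Because `tensorboardX` is now an optional extra and the `CSVLogger` is the fallback default, TensorBoard logging has to be requested explicitly. A small sketch, assuming `tensorboardX` is installed (e.g. `pip install tensorboardX`) and using illustrative directory names:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger

# Opt back into TensorBoard explicitly ...
trainer = Trainer(logger=TensorBoardLogger(save_dir="tb_logs", name="my_run"))

# ... or combine it with the new default CSV logging.
trainer = Trainer(logger=[CSVLogger("logs"), TensorBoardLogger("tb_logs")])
```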
Deprecated
- Deprecated the `description`, `env_prefix` and `env_parse` parameters in `LightningCLI.__init__` in favour of giving them through `parser_kwargs` (#15651)
- Deprecated `pytorch_lightning.profiler` in favor of `pytorch_lightning.profilers` (#16059)
- Deprecated `Trainer(auto_select_gpus=...)` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147)
- Deprecated `pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus}` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147)
- `nvidia/apex` deprecation (#16039) (a migration sketch follows this list):
  - Deprecated `pytorch_lightning.plugins.NativeMixedPrecisionPlugin` in favor of `pytorch_lightning.plugins.MixedPrecisionPlugin`
  - Deprecated the `LightningModule.optimizer_step(using_native_amp=...)` argument
  - Deprecated the `Trainer(amp_backend=...)` argument
  - Deprecated the `Trainer.amp_backend` property
  - Deprecated the `Trainer(amp_level=...)` argument
  - Deprecated the `pytorch_lightning.plugins.ApexMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
  - Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation (#16141):
  - Deprecated `Trainer(strategy="horovod")`
  - Deprecated the `HorovodStrategy` class
- Deprecated `pytorch_lightning.lite.LightningLite` in favor of `lightning.fabric.Fabric` (#16314)
- `FairScale` deprecation (in favor of PyTorch's FSDP implementation) (#16353):
  - Deprecated the `pytorch_lightning.overrides.fairscale.LightningShardedDataParallel` class
  - Deprecated the `pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded.DDPShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy` class
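For the `nvidia/apex` deprecations above, a minimal migration sketch: drop the apex-specific flags and rely on the built-in (native) AMP via the `precision` argument alone. The accelerator and device values are illustrative.

```python
from pytorch_lightning import Trainer

# before (deprecated): Trainer(precision=16, amp_backend="apex", amp_level="O2")
trainer = Trainer(precision=16, accelerator="gpu", devices=1)

# Plugin users: pytorch_lightning.plugins.MixedPrecisionPlugin replaces NativeMixedPrecisionPlugin.
```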
Removed
- Removed the deprecated `pytorch_lightning.utilities.memory.get_gpu_memory_map` in favor of `pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats` (#15617)
- Temporarily removed support for Hydra multi-run (#15737)
- Removed the deprecated `pytorch_lightning.profiler.base.AbstractProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed the deprecated `pytorch_lightning.profiler.base.BaseProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed deprecated code in `pytorch_lightning.utilities.meta` (#16038)
- Removed the deprecated `LightningDeepSpeedModule` (#16041)
- Removed the deprecated `pytorch_lightning.accelerators.GPUAccelerator` in favor of `pytorch_lightning.accelerators.CUDAAccelerator` (#16050)
- Removed the deprecated `pytorch_lightning.profiler.*` classes in favor of `pytorch_lightning.profilers` (#16059) (see the import sketch after this list)
- Removed the deprecated `pytorch_lightning.utilities.cli` module in favor of `pytorch_lightning.cli` (#16116)
- Removed the deprecated `pytorch_lightning.loggers.base` module in favor of `pytorch_lightning.loggers.logger` (#16120)
- Removed the deprecated `pytorch_lightning.loops.base` module in favor of `pytorch_lightning.loops.loop` (#16142)
- Removed the deprecated `pytorch_lightning.core.lightning` module in favor of `pytorch_lightning.core.module` (#16318)
- Removed the deprecated `pytorch_lightning.callbacks.base` module in favor of `pytorch_lightning.callbacks.callback` (#16319)
- Removed the deprecated `Trainer.reset_train_val_dataloaders()` in favor of `Trainer.reset_{train,val}_dataloader` (#16131)
- Removed support for `LightningCLI(seed_everything_default=None)` (#16131)
- Removed support in LightningLite for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)
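A quick before/after import sketch for the most commonly hit module removals above. The class names shown (`SimpleProfiler`, `LightningCLI`, `Logger`, `Callback`) are the usual residents of those modules; note that the old `LightningLoggerBase` name corresponds to today's `Logger`.

```python
# from pytorch_lightning.profiler import SimpleProfiler          # removed
from pytorch_lightning.profilers import SimpleProfiler            # use this instead

# from pytorch_lightning.utilities.cli import LightningCLI        # removed
from pytorch_lightning.cli import LightningCLI

# from pytorch_lightning.loggers.base import LightningLoggerBase  # removed
from pytorch_lightning.loggers.logger import Logger

# from pytorch_lightning.callbacks.base import Callback           # removed
from pytorch_lightning.callbacks.callback import Callback
```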
Fixed
- Enhanced `reduce_boolean_decision` to accommodate `any`-analogous semantics expected by the `EarlyStopping` callback (#15253)
- Fixed the incorrect optimizer step synchronization when running across multiple TPU devices (#16020)
- Fixed a type error when dividing the chunk size in the ColossalAI strategy (#16212)
- Fixed a bug where the `interval` key of the scheduler would be ignored during manual optimization, making the `LearningRateMonitor` callback fail to log the learning rate (#16308)
- Fixed an issue with `MLFlowLogger` not finalizing correctly when status code 'finished' was passed (#16340)
Contributors
@1SAA, @akihironitta, @AlessioQuercia, @awaelchli, @bipinKrishnan, @Borda, @carmocca, @dmitsf, @erhoo82, @ethanwharris, @Forbu, @hhsecond, @justusschock, @lantiga, @lightningforever, @Liyang90, @manangoel99, @mauvilsa, @nicolai86, @nohalon, @rohitgr7, @schmidt-jake, @speediedan, @yMayanand
If we forgot someone due to not matching commit email with GitHub account, let us know :]