Lightning-AI pytorch-lightning Discussions
Discussions
⚡ Gist on how to use Schedule-Free in Lightning
CompRhys asked Apr 10, 2024 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ Disabling Validation Progress Bar
ludwigwinkler asked Oct 6, 2022 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ OOM after automatic batch size finder
JonathanDZiegler asked Apr 25, 2024 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ how are the samples picked when limit_train_batches < 1.0?
Labels: trainer: argument
miccio-dk asked Nov 22, 2021 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
🤖 Proper way to log things when using DDP
Labels: strategy: ddp (DistributedDataParallel)
🤖 DDP: NCCL " The server socket has failed to bind to..."
Labels: strategy: ddp (DistributedDataParallel)
⚡ Using Mlflow logger
Labels: logger: mlflow
JCardoso9 asked Dec 21, 2021 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
🤖 When I set num_works> 0, there is a error Producer process has been terminated before all shared CUDA tensors released
Labels: accelerator: cuda (Compute Unified Device Architecture GPU)
🤖 How to scale learning rate with batch size for DDP training?
Labels: distributed (Generic distributed-related topic), strategy: ddp (DistributedDataParallel)
⚡ Gradient checkpointing with DDP in a loop
shivammehta25 asked Nov 11, 2021 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ The Trainer freeze when I use the logger
xugaoqi1993 asked Jan 16, 2023 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ How can I use median as the reduction function for loss aggregation?
dempsey-ryan asked Apr 15, 2024 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ "No hparams data was found" in tensorboard
adosar asked Apr 14, 2024 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered
⚡ LightningCLI fails with num_workers > 0 and custom dataset
adosar asked Apr 11, 2024 in Lightning Trainer API: Trainer, LightningModule, LightningDataModule · Unanswered