Closed
Labels
pl: Generic label for PyTorch Lightning package
question: Further information is requested
waiting on author: Waiting on user action, correction, or update
Description
🐛 Bug
`trainer.fit` fails to run. It stops with the message: `Trainer.fit` stopped: `max_epochs=1` reached.
(py3.10) ➜ Deepgit:(main) ✗ python torch_nn.py
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  | Name    | Type       | Params
---------------------------------------
0 | encoder | Sequential | 50.4 K
1 | decoder | Sequential | 51.2 K
---------------------------------------
101 K     Trainable params
0         Non-trainable params
101 K     Total params
0.407     Total estimated model params size (MB)
/opt/miniconda3/envs/py3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:225: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 0: 100%|███████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 324.18it/s, loss=0.0663, v_num=13]
`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|███████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 318.34it/s, loss=0.0663, v_num=13]
(py3.10) ➜ Deepgit:(main) ✗
To Reproduce
import os

from torch import optim, nn, utils, Tensor
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import pytorch_lightning as pl

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)

# setup data
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

# train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
trainer = pl.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
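As a rough sanity check on the log above, the `100/100` in the progress bar can be compared with what the two `Trainer` arguments in the script would allow. The sketch below is plain Python and does not call Lightning. `expected_train_steps` is a hypothetical helper, and its handling of `limit_train_batches` (an int as an absolute cap per epoch, a float as a fraction of the available batches) is an assumption about the documented behavior, not code taken from the library. The inputs are the 60,000 images in the MNIST training split and the `DataLoader` default of `batch_size=1`.

```python
import math
from typing import Union


def expected_train_steps(
    dataset_size: int,
    batch_size: int,
    limit_train_batches: Union[int, float],
    max_epochs: int,
) -> int:
    """Hypothetical helper: estimate how many training steps a run performs."""
    # Batches per epoch with no limit applied (a final partial batch is kept).
    batches_per_epoch = math.ceil(dataset_size / batch_size)
    if isinstance(limit_train_batches, float):
        # Assumption: a float is a fraction of the available batches.
        batches_per_epoch = int(batches_per_epoch * limit_train_batches)
    else:
        # Assumption: an int is an absolute cap on batches per epoch.
        batches_per_epoch = min(batches_per_epoch, limit_train_batches)
    return batches_per_epoch * max_epochs


# Values from the script: MNIST train split, default batch_size=1,
# limit_train_batches=100, max_epochs=1.
steps = expected_train_steps(60_000, 1, limit_train_batches=100, max_epochs=1)
print(steps)  # 100
```

Under these assumptions the estimate is 100 steps, which matches the `100/100` shown for epoch 0 just before the `max_epochs=1` message.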
Expected behavior
Environment
- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
- pytorch_lightning==1.7.3
- torch==1.12.1
- torchaudio==0.12.1
- torchvision==0.13.1
- PyTorch Lightning Version (e.g., 1.5.0):
- Lightning App Version (e.g., 0.5.2):
- PyTorch Version (e.g., 1.10):
- Python version (e.g., 3.9): 3.10
- OS (e.g., Linux): macOS Monterey
- CUDA/cuDNN version:
- GPU models and configuration:
- How you installed PyTorch (conda, pip, source): pip
- If compiling from source, the output of torch.__config__.show():
- Running environment of LightningApp (e.g. local, cloud):
- Any other relevant information:
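The version fields above can also be gathered programmatically. This is a minimal sketch using only the standard library. `collect_versions` is a hypothetical helper name, and the distribution names passed to it are assumed to match how the packages were installed with pip.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map each distribution name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


print("python", platform.python_version())
print("os", platform.platform())
# Distribution names are an assumption; adjust them to the local install.
for name, version in collect_versions(
    ["pytorch-lightning", "torch", "torchaudio", "torchvision"]
).items():
    print(f"{name}=={version}")
```

The printed lines can be pasted directly into the Environment list.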
Additional context