CSVLogger fails if save_dir is an s3 path #16196

Closed
turian opened this issue Dec 25, 2022 · 7 comments · Fixed by #16880
Labels
bug Something isn't working logger: csv

Comments

@turian
Contributor

turian commented Dec 25, 2022

Bug description

Cloud checkpoints are cool! But I also want CSVLogger to periodically write to cloud storage. This doesn't work.

Related bug: #16195. See 'More info' at the bottom of this issue.

There are some related issues:
#14325
#5935
#11769
#15539
#2318
#2161
but none of them covers this exact case.

How to reproduce the bug

Here is a Google Colab notebook that reproduces this bug and a related one. I share the code for both because it's easier to configure the AWS credentials once and see both bugs at the same time.

Here is the most important part (see the Colab for a full minimal reproduction):

from torch.utils.data import DataLoader

from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel, RandomDataset
from pytorch_lightning.loggers import WandbLogger

# Defined in the Colab; placeholder for your own S3 bucket URL (AWS credentials must be configured)
BORING_BUCKET = "s3://<your-bucket>"

def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    logger = WandbLogger(
        project="boringbug",
        log_model="all",
    )

    model = BoringModel()
    trainer = Trainer(
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
        logger=logger,
        default_root_dir=f"{BORING_BUCKET}/wandbtest/",
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    trainer.test(model, dataloaders=test_data)

run()

Error messages and logs

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.utilities.rank_zero:`Trainer(limit_train_batches=1)` was configured so 1 batch per epoch will be used.
INFO:pytorch_lightning.utilities.rank_zero:`Trainer(limit_val_batches=1)` was configured so 1 batch will be used.
INFO:pytorch_lightning.utilities.rank_zero:`Trainer(limit_test_batches=1)` was configured so 1 batch will be used.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     37         else:
---> 38             return trainer_fn(*args, **kwargs)
     39 

16 frames
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    644         )
--> 645         self._run(model, ckpt_path=self.ckpt_path)
    646 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py in _run(self, model, ckpt_path)
   1085 
-> 1086         self._log_hyperparams()
   1087 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py in _log_hyperparams(self)
   1155             logger.log_graph(self.lightning_module)
-> 1156             logger.save()
   1157 

/usr/local/lib/python3.8/dist-packages/lightning_utilities/core/rank_zero.py in wrapped_fn(*args, **kwargs)
     23         if rank == 0:
---> 24             return fn(*args, **kwargs)
     25         return None

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loggers/csv_logs.py in save(self)
    206         super().save()
--> 207         self.experiment.save()
    208 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loggers/csv_logs.py in save(self)
     86         hparams_file = os.path.join(self.log_dir, self.NAME_HPARAMS_FILE)
---> 87         save_hparams_to_yaml(hparams_file, self.hparams)
     88 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/saving.py in save_hparams_to_yaml(config_yaml, hparams, use_omegaconf)
    378     if not fs.isdir(os.path.dirname(config_yaml)):
--> 379         raise RuntimeError(f"Missing folder: {os.path.dirname(config_yaml)}.")
    380 

RuntimeError: Missing folder: s3://boringbucketjpt/csvloggerdoesntwork/lightning_logs/version_1.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-025edeeafe89> in <module>
     24     trainer.test(model, dataloaders=test_data)
     25 
---> 26 run()

<ipython-input-11-025edeeafe89> in run()
     21         default_root_dir = f"{BORING_BUCKET}/csvloggertest/"
     22     )
---> 23     trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
     24     trainer.test(model, dataloaders=test_data)
     25 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    601             raise TypeError(f"`Trainer.fit()` requires a `LightningModule`, got: {model.__class__.__qualname__}")
    602         self.strategy._lightning_module = model
--> 603         call._call_and_handle_interrupt(
    604             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    605         )

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     60         trainer._call_callback_hooks("on_exception", exception)
     61         for logger in trainer.loggers:
---> 62             logger.finalize("failed")
     63         trainer._teardown()
     64         # teardown might access the stage so we reset it after

/usr/local/lib/python3.8/dist-packages/lightning_utilities/core/rank_zero.py in wrapped_fn(*args, **kwargs)
     22             raise RuntimeError("The `rank_zero_only.rank` needs to be set before use")
     23         if rank == 0:
---> 24             return fn(*args, **kwargs)
     25         return None
     26 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loggers/csv_logs.py in finalize(self, status)
    213             # initialized there
    214             return
--> 215         self.save()
    216 
    217     @property

/usr/local/lib/python3.8/dist-packages/lightning_utilities/core/rank_zero.py in wrapped_fn(*args, **kwargs)
     22             raise RuntimeError("The `rank_zero_only.rank` needs to be set before use")
     23         if rank == 0:
---> 24             return fn(*args, **kwargs)
     25         return None
     26 

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loggers/csv_logs.py in save(self)
    205     def save(self) -> None:
    206         super().save()
--> 207         self.experiment.save()
    208 
    209     @rank_zero_only

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loggers/csv_logs.py in save(self)
     85         """Save recorded hparams and metrics into files."""
     86         hparams_file = os.path.join(self.log_dir, self.NAME_HPARAMS_FILE)
---> 87         save_hparams_to_yaml(hparams_file, self.hparams)
     88 
     89         if not self.metrics:

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/saving.py in save_hparams_to_yaml(config_yaml, hparams, use_omegaconf)
    377     fs = get_filesystem(config_yaml)
    378     if not fs.isdir(os.path.dirname(config_yaml)):
--> 379         raise RuntimeError(f"Missing folder: {os.path.dirname(config_yaml)}.")
    380 
    381     # convert Namespace or AD to dict

RuntimeError: Missing folder: s3://boringbucketjpt/csvloggerdoesntwork/lightning_logs/version_1.

Environment

* CUDA:
	- GPU:
		- Tesla T4
	- available:         True
	- version:           11.6
* Lightning:
	- lightning-utilities: 0.5.0
	- pytorch-lightning: 1.8.6
	- torch:             1.13.0+cu116
	- torchaudio:        0.13.0+cu116
	- torchmetrics:      0.11.0
	- torchsummary:      1.5.1
	- torchtext:         0.14.0
	- torchvision:       0.14.0+cu116
* Packages:
	- absl-py:           1.3.0
	- aeppl:             0.0.33
	- aesara:            2.7.9
	- aiobotocore:       2.4.2
	- aiohttp:           3.8.3
	- aioitertools:      0.11.0
	- aiosignal:         1.3.1
	- alabaster:         0.7.12
	- albumentations:    1.2.1
	- altair:            4.2.0
	- appdirs:           1.4.4
	- arviz:             0.12.1
	- astor:             0.8.1
	- astropy:           4.3.1
	- astunparse:        1.6.3
	- async-timeout:     4.0.2
	- atari-py:          0.2.9
	- atomicwrites:      1.4.1
	- attrs:             22.1.0
	- audioread:         3.0.0
	- autograd:          1.5
	- awscli:            1.25.60
	- babel:             2.11.0
	- backcall:          0.2.0
	- beautifulsoup4:    4.6.3
	- bleach:            5.0.1
	- blis:              0.7.9
	- bokeh:             2.3.3
	- boto3:             1.24.59
	- botocore:          1.27.59
	- branca:            0.6.0
	- bs4:               0.0.1
	- cachecontrol:      0.12.11
	- cachetools:        5.2.0
	- catalogue:         2.0.8
	- certifi:           2022.12.7
	- cffi:              1.15.1
	- cftime:            1.6.2
	- chardet:           3.0.4
	- charset-normalizer: 2.1.1
	- click:             7.1.2
	- clikit:            0.6.2
	- cloudpickle:       1.5.0
	- cmake:             3.22.6
	- cmdstanpy:         1.0.8
	- colorama:          0.3.7
	- colorcet:          3.0.1
	- colorlover:        0.3.0
	- community:         1.0.0b1
	- confection:        0.0.3
	- cons:              0.4.5
	- contextlib2:       0.5.5
	- convertdate:       2.4.0
	- crashtest:         0.3.1
	- crcmod:            1.7
	- cryptography:      38.0.4
	- cufflinks:         0.17.3
	- cupy-cuda11x:      11.0.0
	- cvxopt:            1.3.0
	- cvxpy:             1.2.2
	- cycler:            0.11.0
	- cymem:             2.0.7
	- cython:            0.29.32
	- daft:              0.0.4
	- dask:              2022.2.1
	- datascience:       0.17.5
	- db-dtypes:         1.0.5
	- debugpy:           1.0.0
	- decorator:         4.4.2
	- defusedxml:        0.7.1
	- descartes:         1.1.0
	- dill:              0.3.6
	- distributed:       2022.2.1
	- dlib:              19.24.0
	- dm-tree:           0.1.7
	- dnspython:         2.2.1
	- docker-pycreds:    0.4.0
	- docutils:          0.16
	- dopamine-rl:       1.0.5
	- earthengine-api:   0.1.335
	- easydict:          1.10
	- ecos:              2.0.10
	- editdistance:      0.5.3
	- en-core-web-sm:    3.4.1
	- entrypoints:       0.4
	- ephem:             4.1.3
	- et-xmlfile:        1.1.0
	- etils:             0.9.0
	- etuples:           0.3.8
	- fa2:               0.3.5
	- fastai:            2.7.10
	- fastcore:          1.5.27
	- fastdownload:      0.0.7
	- fastdtw:           0.3.4
	- fastjsonschema:    2.16.2
	- fastprogress:      1.0.3
	- fastrlock:         0.8.1
	- feather-format:    0.4.1
	- filelock:          3.8.2
	- firebase-admin:    5.3.0
	- fix-yahoo-finance: 0.0.22
	- flask:             1.1.4
	- flatbuffers:       1.12
	- folium:            0.12.1.post1
	- frozenlist:        1.3.3
	- fsspec:            2022.11.0
	- future:            0.16.0
	- gast:              0.4.0
	- gdal:              2.2.2
	- gdown:             4.4.0
	- gensim:            3.6.0
	- geographiclib:     1.52
	- geopy:             1.17.0
	- gin-config:        0.5.0
	- gitdb:             4.0.10
	- gitpython:         3.1.29
	- glob2:             0.7
	- google:            2.0.3
	- google-api-core:   2.8.2
	- google-api-python-client: 1.12.11
	- google-auth:       2.15.0
	- google-auth-httplib2: 0.0.4
	- google-auth-oauthlib: 0.4.6
	- google-cloud-bigquery: 3.3.6
	- google-cloud-bigquery-storage: 2.16.2
	- google-cloud-core: 2.3.2
	- google-cloud-datastore: 2.9.0
	- google-cloud-firestore: 2.7.2
	- google-cloud-language: 2.6.1
	- google-cloud-storage: 2.5.0
	- google-cloud-translate: 3.8.4
	- google-colab:      1.0.0
	- google-crc32c:     1.5.0
	- google-pasta:      0.2.0
	- google-resumable-media: 2.4.0
	- googleapis-common-protos: 1.57.0
	- googledrivedownloader: 0.4
	- graphviz:          0.10.1
	- greenlet:          2.0.1
	- grpcio:            1.51.1
	- grpcio-status:     1.48.2
	- gspread:           3.4.2
	- gspread-dataframe: 3.0.8
	- gym:               0.25.2
	- gym-notices:       0.0.8
	- h5py:              3.1.0
	- heapdict:          1.0.1
	- hijri-converter:   2.2.4
	- holidays:          0.17.2
	- holoviews:         1.14.9
	- html5lib:          1.0.1
	- httpimport:        0.5.18
	- httplib2:          0.17.4
	- httpstan:          4.6.1
	- humanize:          0.5.1
	- hyperopt:          0.1.2
	- idna:              2.10
	- imageio:           2.9.0
	- imagesize:         1.4.1
	- imbalanced-learn:  0.8.1
	- imblearn:          0.0
	- imgaug:            0.4.0
	- importlib-metadata: 5.1.0
	- importlib-resources: 5.10.1
	- imutils:           0.5.4
	- inflect:           2.1.0
	- intel-openmp:      2022.2.1
	- intervaltree:      2.1.0
	- ipykernel:         5.3.4
	- ipython:           7.9.0
	- ipython-genutils:  0.2.0
	- ipython-sql:       0.3.9
	- ipywidgets:        7.7.1
	- itsdangerous:      1.1.0
	- jax:               0.3.25
	- jaxlib:            0.3.25+cuda11.cudnn805
	- jieba:             0.42.1
	- jinja2:            2.11.3
	- jmespath:          0.9.3
	- joblib:            1.2.0
	- jpeg4py:           0.1.4
	- jsonschema:        4.3.3
	- jupyter-client:    6.1.12
	- jupyter-console:   6.1.0
	- jupyter-core:      5.1.0
	- jupyterlab-widgets: 3.0.4
	- kaggle:            1.5.12
	- kapre:             0.3.7
	- keras:             2.9.0
	- keras-preprocessing: 1.1.2
	- keras-vis:         0.4.1
	- kiwisolver:        1.4.4
	- korean-lunar-calendar: 0.3.1
	- langcodes:         3.3.0
	- libclang:          14.0.6
	- librosa:           0.8.1
	- lightgbm:          2.2.3
	- lightning-utilities: 0.5.0
	- llvmlite:          0.39.1
	- lmdb:              0.99
	- locket:            1.0.0
	- logical-unification: 0.4.5
	- lunarcalendar:     0.0.9
	- lxml:              4.9.2
	- markdown:          3.4.1
	- markupsafe:        2.0.1
	- marshmallow:       3.19.0
	- matplotlib:        3.2.2
	- matplotlib-venn:   0.11.7
	- minikanren:        1.0.3
	- missingno:         0.5.1
	- mistune:           0.8.4
	- mizani:            0.7.3
	- mkl:               2019.0
	- mlxtend:           0.14.0
	- more-itertools:    9.0.0
	- moviepy:           0.2.3.5
	- mpmath:            1.2.1
	- msgpack:           1.0.4
	- multidict:         6.0.3
	- multipledispatch:  0.6.0
	- multitasking:      0.0.11
	- murmurhash:        1.0.9
	- music21:           5.5.0
	- natsort:           5.5.0
	- nbconvert:         5.6.1
	- nbformat:          5.7.0
	- netcdf4:           1.6.2
	- networkx:          2.8.8
	- nibabel:           3.0.2
	- nltk:              3.7
	- notebook:          5.7.16
	- numba:             0.56.4
	- numexpr:           2.8.4
	- numpy:             1.21.6
	- oauth2client:      4.1.3
	- oauthlib:          3.2.2
	- okgrade:           0.4.3
	- olefile:           0.45.1
	- opencv-contrib-python: 4.6.0.66
	- opencv-python:     4.6.0.66
	- opencv-python-headless: 4.6.0.66
	- openpyxl:          3.0.10
	- opt-einsum:        3.3.0
	- osqp:              0.6.2.post0
	- packaging:         21.3
	- palettable:        3.3.0
	- pandas:            1.3.5
	- pandas-datareader: 0.9.0
	- pandas-gbq:        0.17.9
	- pandas-profiling:  1.4.1
	- pandocfilters:     1.5.0
	- panel:             0.12.1
	- param:             1.12.3
	- parso:             0.8.3
	- partd:             1.3.0
	- pastel:            0.2.1
	- pathlib:           1.0.1
	- pathtools:         0.1.2
	- pathy:             0.10.1
	- patsy:             0.5.3
	- pep517:            0.13.0
	- pexpect:           4.8.0
	- pickleshare:       0.7.5
	- pillow:            7.1.2
	- pip:               21.1.3
	- pip-tools:         6.2.0
	- platformdirs:      2.6.0
	- plotly:            5.5.0
	- plotnine:          0.8.0
	- pluggy:            0.7.1
	- pooch:             1.6.0
	- portpicker:        1.3.9
	- prefetch-generator: 1.0.3
	- preshed:           3.0.8
	- prettytable:       3.5.0
	- progressbar2:      3.38.0
	- prometheus-client: 0.15.0
	- promise:           2.3
	- prompt-toolkit:    2.0.10
	- prophet:           1.1.1
	- proto-plus:        1.22.1
	- protobuf:          3.19.6
	- psutil:            5.4.8
	- psycopg2:          2.9.5
	- ptyprocess:        0.7.0
	- py:                1.11.0
	- pyarrow:           9.0.0
	- pyasn1:            0.4.8
	- pyasn1-modules:    0.2.8
	- pycocotools:       2.0.6
	- pycparser:         2.21
	- pyct:              0.4.8
	- pydantic:          1.10.2
	- pydata-google-auth: 1.4.0
	- pydot:             1.3.0
	- pydot-ng:          2.0.0
	- pydotplus:         2.0.2
	- pydrive:           1.3.1
	- pyemd:             0.5.1
	- pyerfa:            2.0.0.1
	- pygments:          2.6.1
	- pygobject:         3.26.1
	- pylev:             1.4.0
	- pymc:              4.1.4
	- pymeeus:           0.5.12
	- pymongo:           4.3.3
	- pymystem3:         0.2.0
	- pyopengl:          3.1.6
	- pyopenssl:         22.1.0
	- pyparsing:         3.0.9
	- pyrsistent:        0.19.2
	- pysimdjson:        3.2.0
	- pysndfile:         1.3.8
	- pysocks:           1.7.1
	- pystan:            3.3.0
	- pytest:            3.6.4
	- python-apt:        0.0.0
	- python-dateutil:   2.8.2
	- python-louvain:    0.16
	- python-slugify:    7.0.0
	- python-utils:      3.4.5
	- pytorch-lightning: 1.8.6
	- pytz:              2022.6
	- pyviz-comms:       2.2.1
	- pywavelets:        1.4.1
	- pyyaml:            5.4.1
	- pyzmq:             23.2.1
	- qdldl:             0.1.5.post2
	- qudida:            0.0.4
	- regex:             2022.6.2
	- requests:          2.23.0
	- requests-oauthlib: 1.3.1
	- resampy:           0.4.2
	- roman:             2.0.0
	- rpy2:              3.5.5
	- rsa:               4.7.2
	- s3fs:              2022.11.0
	- s3transfer:        0.6.0
	- scikit-image:      0.18.3
	- scikit-learn:      1.0.2
	- scipy:             1.7.3
	- screen-resolution-extra: 0.0.0
	- scs:               3.2.2
	- seaborn:           0.11.2
	- send2trash:        1.8.0
	- sentry-sdk:        1.9.0
	- setproctitle:      1.3.2
	- setuptools:        57.4.0
	- setuptools-git:    1.2
	- shapely:           2.0.0
	- shortuuid:         1.0.11
	- six:               1.15.0
	- sklearn-pandas:    1.8.0
	- smart-open:        6.3.0
	- smmap:             5.0.0
	- snowballstemmer:   2.2.0
	- sortedcontainers:  2.4.0
	- soundfile:         0.11.0
	- spacy:             3.4.4
	- spacy-legacy:      3.0.10
	- spacy-loggers:     1.0.4
	- sphinx:            1.8.6
	- sphinxcontrib-serializinghtml: 1.1.5
	- sphinxcontrib-websupport: 1.2.4
	- sqlalchemy:        1.4.45
	- sqlparse:          0.4.3
	- srsly:             2.4.5
	- statsmodels:       0.12.2
	- sympy:             1.7.1
	- tables:            3.7.0
	- tabulate:          0.8.10
	- tblib:             1.7.0
	- tenacity:          8.1.0
	- tensorboard:       2.9.1
	- tensorboard-data-server: 0.6.1
	- tensorboard-plugin-wit: 1.8.1
	- tensorboardx:      2.5.1
	- tensorflow:        2.9.2
	- tensorflow-datasets: 4.6.0
	- tensorflow-estimator: 2.9.0
	- tensorflow-gcs-config: 2.9.1
	- tensorflow-hub:    0.12.0
	- tensorflow-io-gcs-filesystem: 0.28.0
	- tensorflow-metadata: 1.12.0
	- tensorflow-probability: 0.17.0
	- termcolor:         2.1.1
	- terminado:         0.13.3
	- testpath:          0.6.0
	- text-unidecode:    1.3
	- textblob:          0.15.3
	- thinc:             8.1.5
	- threadpoolctl:     3.1.0
	- tifffile:          2022.10.10
	- toml:              0.10.2
	- tomli:             2.0.1
	- toolz:             0.12.0
	- torch:             1.13.0+cu116
	- torchaudio:        0.13.0+cu116
	- torchmetrics:      0.11.0
	- torchsummary:      1.5.1
	- torchtext:         0.14.0
	- torchvision:       0.14.0+cu116
	- tornado:           6.0.4
	- tqdm:              4.64.1
	- traitlets:         5.7.1
	- tweepy:            3.10.0
	- typeguard:         2.7.1
	- typer:             0.7.0
	- typing-extensions: 4.4.0
	- tzlocal:           1.5.1
	- uritemplate:       3.0.1
	- urllib3:           1.25.11
	- vega-datasets:     0.9.0
	- wandb:             0.13.7
	- wasabi:            0.10.1
	- wcwidth:           0.2.5
	- webargs:           8.2.0
	- webencodings:      0.5.1
	- werkzeug:          1.0.1
	- wheel:             0.38.4
	- widgetsnbextension: 3.6.1
	- wordcloud:         1.8.2.2
	- wrapt:             1.14.1
	- xarray:            2022.12.0
	- xarray-einstats:   0.4.0
	- xgboost:           0.90
	- xkit:              0.0.0
	- xlrd:              1.2.0
	- xlwt:              1.3.0
	- yarl:              1.8.2
	- yellowbrick:       1.5
	- zict:              2.2.0
	- zipp:              3.11.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.8.16
	- version:           #1 SMP Fri Aug 26 08:44:51 UTC 2022

More info

What I really want for Christmas this year, all packaged together:

  • I have a CSVLogger that persists to s3.
  • I have a WandbLogger that saves checkpoints to Wandb.
  • I have an S3 trainer.default_root_dir that also saves checkpoints to s3.

cc @Borda

@turian
Contributor Author

turian commented Dec 30, 2022

Fixed formatting issues in the description.

@awaelchli awaelchli added feature Is an improvement or enhancement logger: csv and removed needs triage Waiting to be triaged by maintainers labels Jan 10, 2023
@awaelchli awaelchli added this to the future milestone Jan 10, 2023
@Borda Borda changed the title CSVLogger fails if save_dir is an s3 path (RuntimeError: Missing folder: s3://boringbucketjpt/csvloggerdoesntwork/lightning_logs/version_1.) CSVLogger fails if save_dir is an s3 path Jan 12, 2023
@Borda Borda changed the title CSVLogger fails if save_dir is an s3 path CSVLogger fails if save_dir is an s3 path Jan 12, 2023
@ELind77

ELind77 commented Feb 22, 2023

I also have this issue, and I noticed something that I think is related. When I run my Trainer with default_root_dir=s3:/bucket/path/, I find an s3: directory in my local working directory! It even contains all of the subdirectories.
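
A plausible explanation for the stray local s3: folder, assuming (this thread does not confirm it) that the CSV logger creates its log folder with plain `os.makedirs` rather than through fsspec: `os.makedirs` has no notion of URL schemes, so it treats an `s3://` string as a relative local path. The stdlib-only sketch below reproduces that on Linux, with a temporary directory standing in for the working directory; the bucket and prefix are made up.

```python
import os
import tempfile

URL = "s3://bucket/path/lightning_logs/version_0"  # hypothetical bucket/prefix

with tempfile.TemporaryDirectory() as cwd:
    # os.makedirs does not parse URLs: "s3:" becomes an ordinary folder name and the
    # doubled slash collapses, so this creates <cwd>/s3:/bucket/path/... on local disk.
    os.makedirs(os.path.join(cwd, URL), exist_ok=True)

    local_copy = os.path.join(cwd, "s3:", "bucket", "path", "lightning_logs", "version_0")
    created = os.path.isdir(local_copy)

print(created)  # True on Linux
```

Nothing is sent to S3 here; the whole "remote" tree ends up on local disk, which matches the symptom described above.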

I happened to have this set up in PyCharm, so I ran a quick debug session and confirmed that pl.core.saving.save_hparams_to_yaml gets the correct fs object: it is indeed an S3 filesystem, and it can see and access the bucket I was trying to write to.

This makes me think that some other subroutine is being given a local filesystem and is creating the required paths there. Since S3 is a key-value store, there must be some special function that creates those empty directories on S3 so that the check can pass, right (assuming this worked properly in the past)?
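
The key-value intuition can be illustrated with a toy model. This is a hypothetical stand-in, not s3fs or any real S3 client: if a "directory" exists only when some object key starts with its prefix, then a pre-write check like `fs.isdir(os.path.dirname(config_yaml))` in the traceback fails for a brand-new `version_1/` prefix, and simply writing the file is what makes the prefix appear.

```python
from typing import Dict


class ToyObjectStore:
    """Hypothetical flat key-value store mimicking S3-style prefixes (illustration only)."""

    def __init__(self) -> None:
        # key -> bytes; there are no real directory entries at all
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def isdir(self, prefix: str) -> bool:
        # A "directory" is only implied by keys that start with "<prefix>/".
        prefix = prefix.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.objects)


store = ToyObjectStore()
log_dir = "bucket/csvloggerdoesntwork/lightning_logs/version_1"

# Before anything is uploaded, the version folder does not "exist",
# so an existence check done before the first write fails.
print(store.isdir(log_dir))  # False

# Writing a file under the prefix is enough to make the folder appear.
store.put(log_dir + "/hparams.yaml", b"{}\n")
print(store.isdir(log_dir))  # True
```

Under these semantics, writing first (or skipping the check) is the natural order, rather than requiring an empty folder to exist up front.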

I'm very new to Lightning (slowly dragging myself away from all of my very old Keras code), so I don't know the code base well enough to just dive in and patch this, but this is definitely a serious irritant in my workflow.

@ELind77

ELind77 commented Feb 24, 2023

Why is this labeled as a feature and not a bug?

@Borda Borda added bug Something isn't working and removed feature Is an improvement or enhancement labels Feb 24, 2023
@ELind77

ELind77 commented Feb 24, 2023

Thank you @Borda. I appreciate that.

@carmocca
Member

This is considered a feature and not a bug because fsspec support for the CSV logger is not implemented. If it were implemented but not working properly, we would consider it a bug.
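
For context, "fsspec support" means resolving the filesystem from the path's URL scheme instead of calling `os` functions directly. Below is a minimal sketch of an fsspec-aware CSV write under stated assumptions: `save_metrics_csv` is a hypothetical helper, not the code merged in #16880; it assumes `fsspec` is installed; and the in-memory `memory://` filesystem stands in for S3 so it runs without credentials.

```python
import csv
import io
from typing import Dict, List

import fsspec


def save_metrics_csv(log_dir: str, rows: List[Dict[str, float]]) -> str:
    """Hypothetical helper: write `rows` to <log_dir>/metrics.csv on any fsspec filesystem."""
    # Pick the filesystem from the URL scheme: s3://, gs://, memory://, or a plain local path.
    fs, root = fsspec.core.url_to_fs(log_dir)
    # Create the folder through fsspec instead of os.makedirs, so no stray local "s3:" folder
    # appears; on object stores this may do little or nothing.
    fs.makedirs(root, exist_ok=True)

    # Render the whole CSV in memory, then upload it with a single write.
    buf = io.StringIO()
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    # Join with "/" instead of os.path.join: fsspec paths always use forward slashes.
    target = root.rstrip("/") + "/metrics.csv"
    with fs.open(target, "wb") as f:
        f.write(buf.getvalue().encode("utf-8"))
    return target


# Usage: the in-memory filesystem stands in for S3, so no credentials are needed.
path = save_metrics_csv("memory://bucket/lightning_logs/version_1", [{"step": 0, "loss": 0.5}])
print(fsspec.filesystem("memory").cat_file(path).decode())
```

Swapping `memory://` for an `s3://` URL would route the same calls through s3fs, assuming it is installed and credentials are configured.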

@ELind77

ELind77 commented Feb 28, 2023

Oh, that's interesting. Maybe the documentation needs to be changed, then? I was going by the Remote Filesystems documentation page, which has this example at the top:

# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)

If my understanding is correct, that example should use the CSVLogger by default, right (as stated in the Trainer API docs)?

Thank you very much for the fast PR though. Maybe it's fixing a docs bug and adding a new feature?

@carmocca
Member

I understand your confusion now. We recently changed the default logger from TensorBoardLogger to CSVLogger (#9900). TensorBoardLogger supported fsspec, but CSVLogger didn't, so you are correct that the docs are incorrect until #16880 is merged.

@carmocca carmocca modified the milestones: future, 2.0 Feb 28, 2023
5 participants