I run experiments where the output CSV file easily grows beyond 100 GiB. One example is fitting a model with a Gaussian process as a latent variable: there is essentially one parameter per data point, and these are repeated on every row of the output file. Running this many times over for different inputs makes even managing the file storage challenging, and reading the file into memory becomes tricky as well.
It would be great to have the option to store the outputs directly in other formats. Apache Parquet and Avro in particular are popular in data science: they use a more compact data representation with compression on top, and they integrate naturally with other big-data tooling.
Personally, I would favor Parquet. It is a columnar format, which would be useful if we want to discard columns with nuisance parameters or runtime values (e.g. stepsize__) from the stored Stan output without unnecessary computational overhead (i.e. without processing the entire file). It also supports structured values, which means a vector/matrix parameter could be stored in a single column, making the output easier to parse than the CSV.