
open_dataset on folder with gzip files #8505

@martindut

Hi
I'm trying to run open_dataset on a folder that contains gzipped TSV files:
ds <- open_dataset(.path, format = "tsv", delim = "\t", schema = aschema)
which returns
FileSystemDataset with 24 csv files
valuationdate: int32
CGTClientID: int32
CGTInstrumentID: int32
AiaRecType: int32
ParcelID: int32
n: int32
AiaAdjustAmt: float
mindate: int32
maxdate: int32

However, when I call collect() on the dataset, I get this error:

Error in dataset___Scanner__ToTable(self) :
Invalid: Could not open CSV input source 'C:/inndx/investmentaccountingdata/snapshot/aiaparcelsumm/obelix/v1.0/curo/TPA_UnitTrust/2020/01/06/135844/curo_[TPA_UnitTrust]_20200103_135844.gz': Invalid: CSV parse error: Expected 1 columns, got 2
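
Since the error is about column counts, one way to sanity-check a file outside arrow is to count the tab-separated fields in its first few lines. This is only a minimal base-R sketch: `count_fields_gz` is a hypothetical helper (not an arrow function), and the demo file below is made-up data, not one of the real files.

```r
# Hypothetical helper (not part of arrow): count the delimiter-separated
# fields in each of the first `n` lines of a gzip-compressed text file.
count_fields_gz <- function(path, n = 5L, delim = "\t") {
  con <- gzfile(path, open = "rt")  # gzfile() decompresses transparently
  on.exit(close(con))
  lines <- readLines(con, n = n)
  # strsplit() drops a trailing empty field, so append a sentinel field
  # to every line and subtract it again afterwards.
  lengths(strsplit(paste0(lines, delim, "x"), delim, fixed = TRUE)) - 1L
}

# Demo on a small made-up gzipped TSV file (header plus one data row).
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, open = "wt")
writeLines(c("valuationdate\tn", "20200103\t1"), out)
close(out)

field_counts <- count_fields_gz(tmp)
print(field_counts)
```

If the counts differ between the header and the data rows, or differ from the number of fields in the schema, the problem is likely in the file itself rather than in open_dataset.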

I can open an individual file with

a_df <- read_tsv_arrow(
  file = .file,
  schema = rschema,
  col_names = TRUE,
  skip_empty_rows = TRUE,
  as_data_frame = FALSE
)

and it works perfectly. I can also run open_dataset on folders that contain Parquet files, and that works perfectly too.
I'm running on Windows 10.

Please advise if I'm doing something wrong here.
