Description
Hi
I'm trying to run open_dataset on a folder that contains gzipped TSV files:
ds <- open_dataset(.path, format = "tsv", delim = "\t", schema = aschema)
which returns
FileSystemDataset with 24 csv files
valuationdate: int32
CGTClientID: int32
CGTInstrumentID: int32
AiaRecType: int32
ParcelID: int32
n: int32
AiaAdjustAmt: float
mindate: int32
maxdate: int32
However, when I call collect on the dataset, I get this error:
Error in dataset___Scanner__ToTable(self) :
Invalid: Could not open CSV input source 'C:/inndx/investmentaccountingdata/snapshot/aiaparcelsumm/obelix/v1.0/curo/TPA_UnitTrust/2020/01/06/135844/curo_[TPA_UnitTrust]_20200103_135844.gz': Invalid: CSV parse error: Expected 1 columns, got 2
I can open an individual file with
a_df <- read_tsv_arrow(
  file = .file,
  schema = rschema,
  col_names = TRUE,
  skip_empty_rows = TRUE,
  as_data_frame = FALSE
)
and it works perfectly. I can also run open_dataset on folders that contain Parquet files, and that works perfectly too.
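Since reading one file at a time works, a possible stopgap is to read each gzipped file separately and combine the results. The following is only a minimal sketch under assumptions, not a fix for open_dataset itself: `read_gz_tsv_dir` is a hypothetical helper name, the `\\.gz$` pattern assumes every data file ends in `.gz`, and it assumes an arrow version that provides `concat_tables`. Any extra arguments (for example `schema`, `col_names`, `skip_empty_rows`) are passed straight through to read_tsv_arrow.

```r
library(arrow)

# Hypothetical helper: read every gzipped TSV under `dir` one file at a time
# and combine the resulting Arrow Tables into a single Table.
read_gz_tsv_dir <- function(dir, ..., pattern = "\\.gz$") {
  # Assumption: all data files end in ".gz" and may sit in nested subfolders.
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  stopifnot(length(files) > 0)

  # Read each file as an Arrow Table; `...` is forwarded to read_tsv_arrow.
  tables <- lapply(files, function(f) {
    read_tsv_arrow(f, ..., as_data_frame = FALSE)
  })

  # Stack the per-file Tables row-wise.
  do.call(concat_tables, tables)
}
```

With the names used above, a call might look like `read_gz_tsv_dir(.path, schema = rschema, col_names = TRUE, skip_empty_rows = TRUE)`. Unlike a dataset, this loads every file into memory up front, so it is only practical when the combined data fits in RAM.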
I'm running on Windows 10.
Please advise if I'm doing something wrong here.