Hi,
Some expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). Specifically, the code below:
library(dplyr)
library(arrow)
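data <- data.frame(a = c("a", "a2", "a3"))
# create the target directory first, otherwise write_parquet() fails
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data, "Test_filter/data.parquet")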
ds <- open_dataset("Test_filter/")
data_flt <- ds %>%
  filter(substr(a, 1, 1) == "a")
gives this error:
Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
Call collect() first to pull data into R.
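For reference, the workaround suggested by the error message would look roughly like the sketch below. It assumes the same Test_filter/ directory as above and that the whole dataset fits in memory, which is exactly what is not the case for a very large dataset:

```r
library(dplyr)
library(arrow)

# Re-open the dataset written above
ds <- open_dataset("Test_filter/")

# Pull everything into R first, then filter with ordinary dplyr.
# This works, but the filter is no longer applied by Arrow,
# so all rows are loaded into memory before any are dropped.
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```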
These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?
Thank you.