Reading data from partitioned S3 storage is not yet implemented:

```sql
CREATE TABLE forex_p (
    datetime DateTime64(3),
    bid String,
    ask String,
    base String,
    quote String,
    month String
)
ENGINE = S3('https://datasets-documentation.s3.amazonaws.com/forex/csv/year_month/2000{_partition_id}-tick.csv.zst', 'CSVWithNames')
PARTITION BY month;

SELECT count(*) FROM forex_p WHERE month = '05';
-- Reading from a partitioned S3 storage is not implemented yet. (NOT_IMPLEMENTED)
```
Usually one needs to read from the s3 table or table function with globs and filter by the `_file` or `_path` virtual columns to read only the files/paths we care about. However, with `extract` and a view, life can be made a little easier:
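For comparison, a minimal sketch of that conventional pattern (glob over the files, then prune with the `_file` virtual column; same dataset URL as above):

```sql
-- Glob over both months, then keep only rows coming from the May file
SELECT count(*)
FROM s3('https://datasets-documentation.s3.amazonaws.com/forex/csv/year_month/2000{05,06}-tick.csv.zst', 'CSVWithNames')
WHERE _file = '200005-tick.csv.zst';
```

The downside is that the full file name has to be spelled out in every query, which is what the view below avoids.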
```sql
SELECT count(*)
FROM s3('https://datasets-documentation.s3.amazonaws.com/forex/csv/year_month/2000{05,06}-tick.csv.zst', 'CSVWithNames');
-- 181425
-- 1 row in set. Elapsed: 3.632 sec. Processed 4.24 thousand rows, 38.14 KB (1.17 thousand rows/s., 10.50 KB/s.)

SELECT count(*)
FROM s3('https://datasets-documentation.s3.amazonaws.com/forex/csv/year_month/200005-tick.csv.zst', 'CSVWithNames');
-- 4238
-- 1 row in set. Elapsed: 2.634 sec.
```
```sql
CREATE VIEW s3_partition_read AS (
    SELECT
        *,
        _file,
        extract(_file, '2000([0-9]{2})') AS month
    FROM s3('https://datasets-documentation.s3.amazonaws.com/forex/csv/year_month/2000{05,06}-tick.csv.zst', 'CSVWithNames')
);

SELECT count(*) FROM s3_partition_read;
-- 181425

SELECT count(*) FROM s3_partition_read WHERE month = '05';
-- 4238
```
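The `extract(_file, '2000([0-9]{2})')` call pulls the month out of the file name with a capturing group: `extract` returns the first subpattern if the pattern has one, otherwise the whole match, and an empty string when nothing matches. In Python terms (a sketch of the regex semantics, not ClickHouse itself; the helper name is made up for illustration):

```python
import re


def extract(haystack: str, pattern: str) -> str:
    """Mimic ClickHouse extract(): first capture group if present,
    else the whole match, else an empty string."""
    m = re.search(pattern, haystack)
    if not m:
        return ''
    return m.group(1) if m.groups() else m.group(0)


# The view derives 'month' from the _file virtual column like this:
print(extract('200005-tick.csv.zst', r'2000([0-9]{2})'))  # -> 05
print(extract('200006-tick.csv.zst', r'2000([0-9]{2})'))  # -> 06
```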
I don't have verbose output to confirm that only the relevant file was read (any tips on how to get the list of transferred S3 files are appreciated), only the time difference. Also, for some reason, the number of processed rows only shows up above a certain amount of transferred data.
Hope that helps someone, good luck!