
ISSUE: IMPORT from s3 parquet with where statement (LESS ROWS COUNT INSERTED) #93204

@ashit-official

Description

Company or project name

No response

Describe what's wrong

INSERT INTO logs_prod.logs (a_1, a_2) SELECT a_1 AS a1, a_2 AS a2
FROM s3(
  'https://prod.s3.ap-south-1.amazonaws.com/logs-parquet/logs-2025-12-*.parquet',
  'Parquet',
  extra_credentials(
    role_arn = 'arn:aws:iam::xxxx:role/clickhouse-s3'
  )
) ;

The query above works as expected, but not when a WHERE clause is added:

where a2 = '111409077535327';

a2 is a String in the Parquet files.
a_2 is FixedString(17) in the logs table.

-- No dedup issue --
ENGINE = ReplacingMergeTree(a_1)
PARTITION BY toYYYYMM(a_1)
ORDER BY (a_2, a_1)
SETTINGS index_granularity = 8192;

Does it reproduce on the most recent release?

No

How to reproduce

STEP 1: CREATE TABLE

CREATE TABLE logs_prod.logs
(
    `a_1` DateTime64(3),
    `a_2` FixedString(17)
)
ENGINE = ReplacingMergeTree(a_1)
PARTITION BY toYYYYMM(a_1)
ORDER BY (a_2, a_1)
SETTINGS index_granularity = 8192;

STEP 2: IMPORT FROM S3 PARQUET WITH WHERE CLAUSE

INSERT INTO logs_prod.logs (a_1, a_2) SELECT a_1 AS a1, a_2 AS a2
FROM s3(
  'https://prod.s3.ap-south-1.amazonaws.com/logs-parquet/logs-2025-12-*.parquet',
  'Parquet',
  extra_credentials(
    role_arn = 'arn:aws:iam::xxxx:role/clickhouse-s3'
  )
) where a2 = '111409077535327'; 

Expected behavior

select count()
FROM s3(
  'https://prod.s3.ap-south-1.amazonaws.com/logs-parquet/logs-2025-12-*.parquet',
  'Parquet',
  extra_credentials(
    role_arn = 'arn:aws:iam::xxxx:role/clickhouse-s3'
  )
)   where a2 = '111409077535327';  

This count returns 200000 records.

The INSERT should therefore write 200000 records.
**Only 15k records are inserted.**
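
A minimal sketch for narrowing down the gap, assuming the anonymized names used in this report (source columns `a_1` and `a_2`, target table `logs_prod.logs`). ReplacingMergeTree keeps one row per distinct ORDER BY key once parts are merged, so comparing the raw source count with the number of distinct (a_2, a_1) pairs would show whether all 200000 matching rows are unique under the sorting key. The `toStringCutToZero` call is an assumption about how to compare the null-padded FixedString(17) column with a 15-character literal; adjust it if the stored values differ.

```sql
-- Source side: total matching rows vs. distinct sorting-key pairs.
-- If distinct_keys < total_rows, ReplacingMergeTree may collapse the difference.
SELECT
    count() AS total_rows,
    uniqExact(a_2, a_1) AS distinct_keys
FROM s3(
  'https://prod.s3.ap-south-1.amazonaws.com/logs-parquet/logs-2025-12-*.parquet',
  'Parquet',
  extra_credentials(
    role_arn = 'arn:aws:iam::xxxx:role/clickhouse-s3'
  )
)
WHERE a_2 = '111409077535327';

-- Target side: rows physically stored in the table right now.
SELECT count()
FROM logs_prod.logs
WHERE toStringCutToZero(a_2) = '111409077535327';

-- Target side: rows remaining after replacing logic is applied at read time.
SELECT count()
FROM logs_prod.logs FINAL
WHERE toStringCutToZero(a_2) = '111409077535327';
```

If `distinct_keys` equals `total_rows` and the target counts are still near 15k, deduplication by the sorting key is unlikely to explain the missing rows.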

Error message and/or stacktrace

No response

Additional context

No response
