
[VL] GlutenCSVLegacyTimeParserSuite: legacy timestamp mode is not supported #11505

@jinchengchenghh


Backend

VL (Velox)

Bug description

Unsupported format: yyyy-MM-dd HH:mm:ss.SSS. The following tests in GlutenCSVLegacyTimeParserSuite fail:

2026-01-22T09:15:29.4117826Z - Write timestamps correctly in ISO8601 format by default *** FAILED ***
2026-01-22T09:15:29.4118820Z   Results do not match for query:
2026-01-22T09:15:29.4121572Z   Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
2026-01-22T09:15:29.4124474Z   Timezone Env: 
2026-01-22T09:15:29.4124883Z   
2026-01-22T09:15:29.4125269Z   == Parsed Logical Plan ==
2026-01-22T09:15:29.4125946Z   UnresolvedDataSource format: csv, isStreaming: false, paths: 1 provided
2026-01-22T09:15:29.4126636Z   
2026-01-22T09:15:29.4127042Z   == Analyzed Logical Plan ==
2026-01-22T09:15:29.4127508Z   date: string
2026-01-22T09:15:29.4127952Z   Relation [date#23116] csv
2026-01-22T09:15:29.4128393Z   
2026-01-22T09:15:29.4128803Z   == Optimized Logical Plan ==
2026-01-22T09:15:29.4129380Z   Relation [date#23116] csv
2026-01-22T09:15:29.4129752Z   
2026-01-22T09:15:29.4129996Z   == Physical Plan ==
2026-01-22T09:15:29.4131626Z   FileScan csv [date#23116] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-4339fa22-4a66-46b0-a8e1-535b6d2bfbcd/iso8601timestamps..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<date:string>
2026-01-22T09:15:29.4133503Z   
2026-01-22T09:15:29.4133787Z   == Results ==
2026-01-22T09:15:29.4134765Z   !== Correct Answer - 5 ==             == Spark Answer - 5 ==
2026-01-22T09:15:29.4135354Z   !struct<>                             struct<date:string>
2026-01-22T09:15:29.4135898Z   ![1800-01-01T10:07:02.000-07:52:58]   [01/01/1800 18:00Z]
2026-01-22T09:15:29.4136432Z   ![1885-01-01T10:30:00.000-08:00]      [01/01/1885 18:30Z]
2026-01-22T09:15:29.4136922Z   ![2014-10-27T18:30:00.000-07:00]      [26/08/2015 18:00]
2026-01-22T09:15:29.4137434Z   ![2015-08-26T18:00:00.000-07:00]      [27/10/2014 18:30]
2026-01-22T09:15:29.4138037Z   ![2016-01-28T20:00:00.000-08:00]      [28/01/2016 20:00] (QueryTest.scala:273)
2026-01-22T09:16:00.9069041Z - csv with variant *** FAILED ***
2026-01-22T09:16:00.9069591Z   Results do not match for query:
2026-01-22T09:16:00.9072913Z   Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
2026-01-22T09:16:00.9075775Z   Timezone Env: 
2026-01-22T09:16:00.9076098Z   
2026-01-22T09:16:00.9076380Z   == Parsed Logical Plan ==
2026-01-22T09:16:00.9076829Z   'Project [unresolvedalias(cast('v as string))]
2026-01-22T09:16:00.9077337Z   +- Relation [v#30332] csv
2026-01-22T09:16:00.9077706Z   
2026-01-22T09:16:00.9078003Z   == Analyzed Logical Plan ==
2026-01-22T09:16:00.9078383Z   v: string
2026-01-22T09:16:00.9078726Z   Project [cast(v#30332 as string) AS v#30333]
2026-01-22T09:16:00.9079198Z   +- Relation [v#30332] csv
2026-01-22T09:16:00.9079545Z   
2026-01-22T09:16:00.9079842Z   == Optimized Logical Plan ==
2026-01-22T09:16:00.9080314Z   Project [cast(v#30332 as string) AS v#30333]
2026-01-22T09:16:00.9080799Z   +- Relation [v#30332] csv
2026-01-22T09:16:00.9081156Z   
2026-01-22T09:16:00.9081430Z   == Physical Plan ==
2026-01-22T09:16:00.9081867Z   *(1) Project [cast(v#30332 as string) AS v#30333]
2026-01-22T09:16:00.9083780Z   +- FileScan csv [v#30332] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e0f3df4e-41e5-468b-a044-8748de0bdfd6], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<v:variant>
2026-01-22T09:16:00.9085414Z   
2026-01-22T09:16:00.9085698Z   == Results ==
2026-01-22T09:16:00.9086783Z   !== Correct Answer - 6 ==                                   == Spark Answer - 6 ==
2026-01-22T09:16:00.9087522Z   !struct<>                                                   struct<v:string>
2026-01-22T09:16:00.9088314Z   ![{"_c0":"2000-01-01","_c1":"2000-01-01 01:02:03+00:00"}]   [{"_c0":"2000-01-01","_c1":"2000-01-01 01:02:03"}]
2026-01-22T09:16:00.9089409Z    [{"_c0":"field 1","_c1":"field2"}]                         [{"_c0":"field 1","_c1":"field2"}]
2026-01-22T09:16:00.9090142Z    [{"_c0":"missing"}]                                        [{"_c0":"missing"}]
2026-01-22T09:16:00.9090904Z    [{"_c0":100,"_c1":1.1}]                                    [{"_c0":100,"_c1":1.1}]
2026-01-22T09:16:00.9091709Z    [{"_c0":1000000000,"_c1":"hello","_c2":"extra"}]           [{"_c0":1000000000,"_c1":"hello","_c2":"extra"}]
2026-01-22T09:16:00.9092851Z    [{"_c0":null,"_c1":true}]                                  [{"_c0":null,"_c1":true}] (QueryTest.scala:273)
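For reference, the "Correct Answer" values in the first failure can be reproduced with the legacy java.text.SimpleDateFormat API. This is a minimal sketch, assuming the test data is read with the pattern dd/MM/yyyy HH:mm and written with yyyy-MM-dd'T'HH:mm:ss.SSSXXX in the America/Los_Angeles session time zone shown in the log; the class LegacyTimestampDemo and the method toIso are hypothetical helpers, not Gluten or Spark APIs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class LegacyTimestampDemo {
    // Session time zone reported by the failing suite.
    private static final TimeZone LA = TimeZone.getTimeZone("America/Los_Angeles");

    // Assumption: input strings follow "dd/MM/yyyy HH:mm" (e.g. "28/01/2016 20:00").
    // Parses the string in the LA time zone and re-formats it with the assumed
    // ISO 8601 write pattern "yyyy-MM-dd'T'HH:mm:ss.SSSXXX".
    static String toIso(String input) {
        SimpleDateFormat in = new SimpleDateFormat("dd/MM/yyyy HH:mm", Locale.US);
        in.setTimeZone(LA);
        in.setLenient(false);
        Date parsed;
        try {
            parsed = in.parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + input, e);
        }
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.US);
        out.setTimeZone(LA);
        return out.format(parsed);
    }

    public static void main(String[] args) {
        System.out.println(toIso("28/01/2016 20:00")); // 2016-01-28T20:00:00.000-08:00
        System.out.println(toIso("27/10/2014 18:30")); // 2014-10-27T18:30:00.000-07:00
    }
}
```

A new SimpleDateFormat is created on each call because the class is not thread-safe. The offsets follow daylight saving time in the session time zone: January yields -08:00, while late October 2014 (before the November 2 switch back) yields -07:00.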

Gluten version

No response

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

Labels: bug (Something isn't working), triage
