
[SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do n… #637

Triggered via push: October 6, 2023, 04:57
Status: Failure
Total duration: 5h 5m 33s
Artifacts: 25

Workflow: build_main.yml (on: push)
Run / Check changes (49s)
Run / Base image build (55s)
Run / Breaking change detection with Buf (branch-3.5) (1m 0s)
Run / Run TPC-DS queries with SF=1 (1h 16m)
Run / Run Docker integration tests (40m 55s)
Run / Run Spark on Kubernetes Integration test (1h 26m)
Matrix: Run / build
Matrix: Run / java-other-versions
Run / Build modules: sparkr (35m 34s)
Run / Linters, licenses, dependencies and documentation generation (1h 38m)
Matrix: Run / pyspark

Annotations

24 errors and 2 warnings
Run / Build modules: catalyst, hive-thriftserver
Process completed with exit code 18.
Run / Build modules: pyspark-sql, pyspark-resource, pyspark-testing
Process completed with exit code 19.
Run / Build modules: sql - other tests
Process completed with exit code 18.
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-9c83958b03806f1b-exec-1".
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-8981a58b03819bc1-exec-1".
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-fa20228b0385cd30-exec-1".
Run / Run Spark on Kubernetes Integration test
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-test-app-3851af25ab884f6a9ffcbfebcb653c38-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-test-app-3851af25ab884f6a9ffcbfebcb653c38-driver" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={})..
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-99076e8b039b4c11-exec-1".
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-280f578b039c7e4b-exec-1".
Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-45821d8b03a0ab14-exec-1".
Run / Run Spark on Kubernetes Integration test
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-test-app-582d0486bc854331984d83d80bb955cd-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-test-app-582d0486bc854331984d83d80bb955cd-driver" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={})..
Run / Build modules: pyspark-mllib, pyspark-ml, pyspark-ml-connect
The job running on runner GitHub Actions 20 has exceeded the maximum execution time of 300 minutes.
Run / Build modules: pyspark-mllib, pyspark-ml, pyspark-ml-connect
The operation was canceled.
CSVInferSchemaSuite.SPARK-45433: inferring the schema when timestamps do not match specified timestampFormat with only one row: CSVInferSchemaSuite#L274
org.scalatest.exceptions.TestFailedException: TimestampType did not equal StringType
JsonInferSchemaSuite.SPARK-45433: inferring the schema when timestamps do not match specified timestampFormat with only one row: JsonInferSchemaSuite#L121
org.scalatest.exceptions.TestFailedException: StructType(StructField("a", TimestampType, true, {})) did not equal StructType(StructField("a", StringType, true, {}))
CSVLegacyTimeParserSuite.SPARK-37326: Timestamp type inference for a column with TIMESTAMP_NTZ values: CSVLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#78885] csv
== Analyzed Logical Plan ==
col0: string
Relation [col0#78885] csv
== Optimized Logical Plan ==
Relation [col0#78885] csv
== Physical Plan ==
FileScan csv [col0#78885] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-eadd0bd6-2cf3-4f33..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<col0:timestamp>     struct<col0:string>
![2020-12-12 12:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0]    [2020-12-12T12:12:12.000]
CSVLegacyTimeParserSuite.SPARK-40474: Infer schema for columns with a mix of dates and timestamp: CSVLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [_c0#84318] csv
== Analyzed Logical Plan ==
_c0: string
Relation [_c0#84318] csv
== Optimized Logical Plan ==
Relation [_c0#84318] csv
== Physical Plan ==
FileScan csv [_c0#84318] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-abdc142e-b580-4d37..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string>
== Results ==
== Results ==
!== Correct Answer - 3 ==   == Spark Answer - 3 ==
!struct<>                   struct<_c0:string>
![1423-11-12 23:41:00.0]    [1423-11-12T23:41:00]
![1765-03-28 00:00:00.0]    [1765-03-28]
![2016-01-28 20:00:00.0]    [2016-01-28T20:00:00]
JsonLegacyTimeParserSuite.SPARK-37360: Timestamp type inference for a column with TIMESTAMP_NTZ values: JsonLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#744162] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#744162] json
== Optimized Logical Plan ==
Relation [col0#744162] json
== Physical Plan ==
FileScan json [col0#744162] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-2ea8518d-d91a-454d..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<col0:timestamp>     struct<col0:string>
![2020-12-12 12:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0]    [2020-12-12T12:12:12.000]
JsonLegacyTimeParserSuite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ: JsonLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#744236] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#744236] json
== Optimized Logical Plan ==
Relation [col0#744236] json
== Physical Plan ==
FileScan json [col0#744236] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-b22ba0d5-2c7d-4749..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
== Results ==
!== Correct Answer - 4 ==   == Spark Answer - 4 ==
!struct<col0:timestamp>     struct<col0:string>
![2020-12-12 04:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000Z]
JsonV1Suite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ: JsonV1Suite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#726500] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#726500] json
== Optimized Logical Plan ==
Relation [col0#726500] json
== Physical Plan ==
FileScan json [col0#726500] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-419728bb-b614-4194..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
== Results ==
!== Correct Answer - 4 ==   == Spark Answer - 4 ==
!struct<col0:timestamp>     struct<col0:string>
![2020-12-12 04:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000Z]
JsonV2Suite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ: JsonV2Suite#L1
org.scalatest.exceptions.TestFailedException: Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Analyzed Logical Plan ==
col0: string
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Optimized Logical Plan ==
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Physical Plan ==
*(1) Project [col0#737657]
+- BatchScan json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047[col0#737657] JsonScan DataFilters: [], Format: json, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string> RuntimeFilters: []
== Results ==
== Results ==
!== Correct Answer - 4 ==   == Spark Answer - 4 ==
!struct<col0:timestamp>     struct<col0:string>
![2020-12-12 04:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0]    [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0]    [2020-12-12T17:12:12.000Z]
Run / Build modules: pyspark-core, pyspark-streaming
No files were found with the provided path: **/target/test-reports/*.xml. No artifacts will be uploaded.
Run / Build modules: pyspark-errors
No files were found with the provided path: **/target/test-reports/*.xml. No artifacts will be uploaded.
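
The two SPARK-45433 failures in CSVInferSchemaSuite and JsonInferSchemaSuite report TimestampType where the tests expected StringType for a single-row column whose value does not match the specified timestampFormat. The rule those assertions appear to encode can be sketched in plain Python as below. This is not Spark's implementation: it assumes a strict Python strptime format in place of Spark's datetime patterns, and the helpers infer_value_type and infer_column_type are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Iterable, Optional

TIMESTAMP = "TimestampType"
STRING = "StringType"


def infer_value_type(value: str, timestamp_format: str) -> str:
    """Hypothetical helper: TimestampType only if the whole value parses
    with the user-specified format, otherwise StringType."""
    try:
        # strptime is strict: a 'T' separator, or trailing fractional
        # seconds the format does not cover, raises ValueError.
        datetime.strptime(value, timestamp_format)
        return TIMESTAMP
    except ValueError:
        return STRING


def infer_column_type(values: Iterable[str], timestamp_format: str) -> str:
    """Hypothetical helper: merge per-row types; a single mismatching row
    (including the only row of a one-row input) widens the column to StringType."""
    result: Optional[str] = None
    for value in values:
        if infer_value_type(value, timestamp_format) == STRING:
            return STRING
        result = TIMESTAMP
    # Assumption of this sketch: an empty column falls back to StringType.
    return result or STRING


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M:%S"
    print(infer_column_type(["2020-12-12 12:12:12"], fmt))
    print(infer_column_type(["2020-12-12T12:12:12.000"], fmt))
```

Note that the legacy-parser failures above go the other way (the expected answer is struct<col0:timestamp>, Spark returned struct<col0:string>), so a fix has to keep those cases inferring timestamps as well; the sketch does not model the legacy parser.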

Artifacts

Produced during runtime. Every artifact below is marked Expired (name: size):
site: 59.2 MB
test-results-catalyst, hive-thriftserver--17-hadoop3-hive2.3: 2.61 MB
test-results-core, unsafe, kvstore, avro, network-common, network-shuffle, repl, launcher, examples, sketch, graphx--17-hadoop3-hive2.3: 132 KB
test-results-docker-integration--8-hadoop3-hive2.3: 119 KB
test-results-hive-- other tests-17-hadoop3-hive2.3: 911 KB
test-results-hive-- slow tests-17-hadoop3-hive2.3: 853 KB
test-results-pyspark-connect--8-hadoop3-hive2.3: 409 KB
test-results-pyspark-mllib, pyspark-ml, pyspark-ml-connect--8-hadoop3-hive2.3: 480 KB
test-results-pyspark-pandas--8-hadoop3-hive2.3: 1.14 MB
test-results-pyspark-pandas-connect-part0--8-hadoop3-hive2.3: 1.06 MB
test-results-pyspark-pandas-connect-part1--8-hadoop3-hive2.3: 972 KB
test-results-pyspark-pandas-connect-part2--8-hadoop3-hive2.3: 637 KB
test-results-pyspark-pandas-connect-part3--8-hadoop3-hive2.3: 326 KB
test-results-pyspark-pandas-slow--8-hadoop3-hive2.3: 1.85 MB
test-results-pyspark-sql, pyspark-resource, pyspark-testing--8-hadoop3-hive2.3: 47.9 KB
test-results-sparkr--8-hadoop3-hive2.3: 280 KB
test-results-sql-- extended tests-17-hadoop3-hive2.3: 2.96 MB
test-results-sql-- other tests-17-hadoop3-hive2.3: 4.29 MB
test-results-sql-- slow tests-17-hadoop3-hive2.3: 2.76 MB
test-results-streaming, sql-kafka-0-10, streaming-kafka-0-10, mllib-local, mllib, yarn, kubernetes, hadoop-cloud, spark-ganglia-lgpl, connect, protobuf--17-hadoop3-hive2.3: 2.12 MB
test-results-tpcds--8-hadoop3-hive2.3: 21.8 KB
unit-tests-log-catalyst, hive-thriftserver--17-hadoop3-hive2.3: 8.38 MB
unit-tests-log-pyspark-mllib, pyspark-ml, pyspark-ml-connect--8-hadoop3-hive2.3: 322 MB
unit-tests-log-pyspark-sql, pyspark-resource, pyspark-testing--8-hadoop3-hive2.3: 456 MB
unit-tests-log-sql-- other tests-17-hadoop3-hive2.3: 288 MB