[SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do n… #637
build_main.yml
on: push

Run / Check changes (49s)
Run / Breaking change detection with Buf (branch-3.5) (1m 0s)
Run / Run TPC-DS queries with SF=1 (1h 16m)
Run / Run Docker integration tests (40m 55s)
Run / Run Spark on Kubernetes Integration test (1h 26m)
Matrix: Run / build
Matrix: Run / java-other-versions
Run / Build modules: sparkr (35m 34s)
Run / Linters, licenses, dependencies and documentation generation (1h 38m)
Matrix: Run / pyspark
Annotations
24 errors and 2 warnings
Run / Build modules: catalyst, hive-thriftserver
Process completed with exit code 18.

Run / Build modules: pyspark-sql, pyspark-resource, pyspark-testing
Process completed with exit code 19.

Run / Build modules: sql - other tests
Process completed with exit code 18.

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-9c83958b03806f1b-exec-1".

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-8981a58b03819bc1-exec-1".

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-fa20228b0385cd30-exec-1".

Run / Run Spark on Kubernetes Integration test
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-test-app-3851af25ab884f6a9ffcbfebcb653c38-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-test-app-3851af25ab884f6a9ffcbfebcb653c38-driver" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-99076e8b039b4c11-exec-1".

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-280f578b039c7e4b-exec-1".

Run / Run Spark on Kubernetes Integration test
HashSet() did not contain "decomtest-45821d8b03a0ab14-exec-1".

Run / Run Spark on Kubernetes Integration test
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-test-app-582d0486bc854331984d83d80bb955cd-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-test-app-582d0486bc854331984d83d80bb955cd-driver" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).

Run / Build modules: pyspark-mllib, pyspark-ml, pyspark-ml-connect
The job running on runner GitHub Actions 20 has exceeded the maximum execution time of 300 minutes.

Run / Build modules: pyspark-mllib, pyspark-ml, pyspark-ml-connect
The operation was canceled.
CSVInferSchemaSuite.SPARK-45433: inferring the schema when timestamps do not match specified timestampFormat with only one row (CSVInferSchemaSuite#L274):
org.scalatest.exceptions.TestFailedException: TimestampType did not equal StringType

JsonInferSchemaSuite.SPARK-45433: inferring the schema when timestamps do not match specified timestampFormat with only one row (JsonInferSchemaSuite#L121):
org.scalatest.exceptions.TestFailedException: StructType(StructField("a", TimestampType, true, {})) did not equal StructType(StructField("a", StringType, true, {}))
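The two SPARK-45433 failures above capture the behavior under test: when a column's only value does not parse with the user-supplied timestampFormat, schema inference should fall back to StringType instead of keeping TimestampType. A minimal Python sketch of that fallback rule (a hypothetical helper for illustration, not Spark's actual CSVInferSchema/JsonInferSchema code):

```python
from datetime import datetime

def infer_field_type(value: str, timestamp_format: str) -> str:
    """Infer a field type for a single CSV/JSON value.

    Hypothetical simplification: try the user-supplied timestamp
    format; if the value does not match, fall back to 'string'.
    """
    try:
        datetime.strptime(value, timestamp_format)
        return "timestamp"
    except ValueError:
        return "string"

# A value matching the format infers as a timestamp ...
print(infer_field_type("2020-12-12 12:12:12", "%Y-%m-%d %H:%M:%S"))  # timestamp
# ... while a non-matching value must infer as a string, even when
# it is the only row in the column (the SPARK-45433 case).
print(infer_field_type("2020-12-12T12:12:12.000", "%Y-%m-%d %H:%M:%S"))  # string
```

The sample values are taken from the failure logs below; the format strings use Python's strptime syntax rather than Spark's DateTimeFormatter patterns.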
python/pyspark/sql/tests/pandas/test_pandas_map.py.test_self_join (python/pyspark/sql/tests/pandas/test_pandas_map.py#L1):
[Errno 111] Connection refused
CSVLegacyTimeParserSuite.SPARK-37326: Timestamp type inference for a column with TIMESTAMP_NTZ values:
CSVLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#78885] csv
== Analyzed Logical Plan ==
col0: string
Relation [col0#78885] csv
== Optimized Logical Plan ==
Relation [col0#78885] csv
== Physical Plan ==
FileScan csv [col0#78885] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-eadd0bd6-2cf3-4f33..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<col0:timestamp> struct<col0:string>
![2020-12-12 12:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0] [2020-12-12T12:12:12.000]
CSVLegacyTimeParserSuite.SPARK-40474: Infer schema for columns with a mix of dates and timestamp:
CSVLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [_c0#84318] csv
== Analyzed Logical Plan ==
_c0: string
Relation [_c0#84318] csv
== Optimized Logical Plan ==
Relation [_c0#84318] csv
== Physical Plan ==
FileScan csv [_c0#84318] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-abdc142e-b580-4d37..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string>
== Results ==
!== Correct Answer - 3 == == Spark Answer - 3 ==
!struct<> struct<_c0:string>
![1423-11-12 23:41:00.0] [1423-11-12T23:41:00]
![1765-03-28 00:00:00.0] [1765-03-28]
![2016-01-28 20:00:00.0] [2016-01-28T20:00:00]
JsonLegacyTimeParserSuite.SPARK-37360: Timestamp type inference for a column with TIMESTAMP_NTZ values:
JsonLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#744162] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#744162] json
== Optimized Logical Plan ==
Relation [col0#744162] json
== Physical Plan ==
FileScan json [col0#744162] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-2ea8518d-d91a-454d..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<col0:timestamp> struct<col0:string>
![2020-12-12 12:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0] [2020-12-12T12:12:12.000]
JsonLegacyTimeParserSuite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ:
JsonLegacyTimeParserSuite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#744236] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#744236] json
== Optimized Logical Plan ==
Relation [col0#744236] json
== Physical Plan ==
FileScan json [col0#744236] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-b22ba0d5-2c7d-4749..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
!== Correct Answer - 4 == == Spark Answer - 4 ==
!struct<col0:timestamp> struct<col0:string>
![2020-12-12 04:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000Z]
JsonV1Suite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ:
JsonV1Suite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
Relation [col0#726500] json
== Analyzed Logical Plan ==
col0: string
Relation [col0#726500] json
== Optimized Logical Plan ==
Relation [col0#726500] json
== Physical Plan ==
FileScan json [col0#726500] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-419728bb-b614-4194..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string>
== Results ==
!== Correct Answer - 4 == == Spark Answer - 4 ==
!struct<col0:timestamp> struct<col0:string>
![2020-12-12 04:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000Z]
JsonV2Suite.SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ:
JsonV2Suite#L1
org.scalatest.exceptions.TestFailedException:
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Analyzed Logical Plan ==
col0: string
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Optimized Logical Plan ==
RelationV2[col0#737657] json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047
== Physical Plan ==
*(1) Project [col0#737657]
+- BatchScan json file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2-8afe-2e00ac520047[col0#737657] JsonScan DataFilters: [], Format: json, Location: InMemoryFileIndex(1 paths)[file:/home/runner/work/spark/spark/target/tmp/spark-2cfc479c-0f8c-46b2..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:string> RuntimeFilters: []
== Results ==
!== Correct Answer - 4 == == Spark Answer - 4 ==
!struct<col0:timestamp> struct<col0:string>
![2020-12-12 04:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 09:12:12.0] [2020-12-12T12:12:12.000]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000+05:00]
![2020-12-12 12:12:12.0] [2020-12-12T17:12:12.000Z]
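The mixed TIMESTAMP_NTZ/TIMESTAMP_LTZ failures above all have the same shape: the expected column collapses to a single timestamp type, with zoned values ("+05:00", "Z") shifted into the suite's America/Los_Angeles session zone and zone-less values read as local time. A small Python sketch of that normalization (a simplified stand-in for Spark's TIMESTAMP_LTZ resolution, with hypothetical helper names, not Spark's actual code):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Matches the suite's session time zone from the failure logs.
SESSION_TZ = ZoneInfo("America/Los_Angeles")

def to_session_local(value: str) -> str:
    """Normalize an ISO-8601 timestamp string to session-local time.

    Hypothetical simplification: a column mixing zoned (LTZ-looking)
    and zone-less (NTZ-looking) values collapses to TIMESTAMP_LTZ, so
    every value is rendered in the session time zone.
    """
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Zone-less values are interpreted in the session zone as-is.
        ts = ts.replace(tzinfo=SESSION_TZ)
    return ts.astimezone(SESSION_TZ).strftime("%Y-%m-%d %H:%M:%S")

print(to_session_local("2020-12-12T17:12:12.000Z"))       # 2020-12-12 09:12:12
print(to_session_local("2020-12-12T17:12:12.000+05:00"))  # 2020-12-12 04:12:12
print(to_session_local("2020-12-12T12:12:12.000"))        # 2020-12-12 12:12:12
```

The three outputs reproduce the "Correct Answer" values in the diffs above (17:12 UTC and 17:12+05:00 both shifted into Pacific time), which is what the suites expect before the regression turned the column into a plain string.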
Run / Build modules: pyspark-core, pyspark-streaming
No files were found with the provided path: **/target/test-reports/*.xml. No artifacts will be uploaded.

Run / Build modules: pyspark-errors
No files were found with the provided path: **/target/test-reports/*.xml. No artifacts will be uploaded.
Artifacts
Produced during runtime

Name | Size | Status
---|---|---
site | 59.2 MB | Expired
test-results-catalyst, hive-thriftserver--17-hadoop3-hive2.3 | 2.61 MB | Expired
test-results-core, unsafe, kvstore, avro, network-common, network-shuffle, repl, launcher, examples, sketch, graphx--17-hadoop3-hive2.3 | 132 KB | Expired
test-results-docker-integration--8-hadoop3-hive2.3 | 119 KB | Expired
test-results-hive-- other tests-17-hadoop3-hive2.3 | 911 KB | Expired
test-results-hive-- slow tests-17-hadoop3-hive2.3 | 853 KB | Expired
test-results-pyspark-connect--8-hadoop3-hive2.3 | 409 KB | Expired
test-results-pyspark-mllib, pyspark-ml, pyspark-ml-connect--8-hadoop3-hive2.3 | 480 KB | Expired
test-results-pyspark-pandas--8-hadoop3-hive2.3 | 1.14 MB | Expired
test-results-pyspark-pandas-connect-part0--8-hadoop3-hive2.3 | 1.06 MB | Expired
test-results-pyspark-pandas-connect-part1--8-hadoop3-hive2.3 | 972 KB | Expired
test-results-pyspark-pandas-connect-part2--8-hadoop3-hive2.3 | 637 KB | Expired
test-results-pyspark-pandas-connect-part3--8-hadoop3-hive2.3 | 326 KB | Expired
test-results-pyspark-pandas-slow--8-hadoop3-hive2.3 | 1.85 MB | Expired
test-results-pyspark-sql, pyspark-resource, pyspark-testing--8-hadoop3-hive2.3 | 47.9 KB | Expired
test-results-sparkr--8-hadoop3-hive2.3 | 280 KB | Expired
test-results-sql-- extended tests-17-hadoop3-hive2.3 | 2.96 MB | Expired
test-results-sql-- other tests-17-hadoop3-hive2.3 | 4.29 MB | Expired
test-results-sql-- slow tests-17-hadoop3-hive2.3 | 2.76 MB | Expired
test-results-streaming, sql-kafka-0-10, streaming-kafka-0-10, mllib-local, mllib, yarn, kubernetes, hadoop-cloud, spark-ganglia-lgpl, connect, protobuf--17-hadoop3-hive2.3 | 2.12 MB | Expired
test-results-tpcds--8-hadoop3-hive2.3 | 21.8 KB | Expired
unit-tests-log-catalyst, hive-thriftserver--17-hadoop3-hive2.3 | 8.38 MB | Expired
unit-tests-log-pyspark-mllib, pyspark-ml, pyspark-ml-connect--8-hadoop3-hive2.3 | 322 MB | Expired
unit-tests-log-pyspark-sql, pyspark-resource, pyspark-testing--8-hadoop3-hive2.3 | 456 MB | Expired
unit-tests-log-sql-- other tests-17-hadoop3-hive2.3 | 288 MB | Expired