Timestamps not matching format are replaced with nulls #662
Comments
dolfinus changed the title from "Timestamps not matchinf format are replaced with nulls" to "Timestamps not matching format are replaced with nulls" on Oct 9, 2023
You did not include the column |

Tried:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType, StringType

spark = SparkSession.builder.config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.17.0").getOrCreate()
schema = StructType([StructField("created-at", TimestampType()), StructField("_corrupt_record", StringType())])
spark.read.format("xml").options(rowTag="item").schema(schema).load("1.xml").show(10, False)
```

It is worth mentioning in the README that |
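For reference, a minimal `1.xml` of the shape the snippet above expects might look like the following. This is a hypothetical reconstruction: the original file was not captured in the issue, and the element values here are illustrative only.

```xml
<items>
  <item>
    <created-at>2023-10-09T12:00:00</created-at>
  </item>
  <item>
    <!-- 'T' replaced with a space: value no longer matches the timestamp format -->
    <created-at>2023-10-09 12:00:00</created-at>
  </item>
</items>
```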
Hi.

I'm trying to parse a simple XML file:

Result:

But if the timestamp does not match the format, e.g. the `T` is replaced with a space, it is read as `null`:

I see that there is an option `mode` with `PERMISSIVE` as the default, which means that "when it encounters a field of the wrong datatype, it sets the offending field to null". But the malformed value is not added to the `_corrupt_record` column, because there is nothing wrong with the XML structure. So there is no way to tell whether the input file contained a tag with a bad field value or the `nullValue`, unless the user sets a different `mode`.

Is that the desired behavior?
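The behavior described above can be sketched in plain Python, without Spark. This is only an analogy for how `PERMISSIVE` mode handles a type mismatch: a value that fails to parse against the expected timestamp format silently becomes `None` rather than raising an error. The function name and format string are illustrative, not part of spark-xml.

```python
from datetime import datetime

def parse_permissive(value, fmt="%Y-%m-%dT%H:%M:%S"):
    # Analogy for PERMISSIVE mode: a value that does not match the
    # expected format becomes None instead of raising an error, so
    # a bad value and a genuinely missing value look identical.
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

print(parse_permissive("2023-10-09T12:00:00"))  # matches the format, parses
print(parse_permissive("2023-10-09 12:00:00"))  # 'T' replaced with space -> None
```

This is exactly the ambiguity raised in the issue: once the value is `None`, the caller cannot distinguish a malformed timestamp from an absent one.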