Releases · databricks/spark-xml

10 Apr 16:06

srowen

v0.18.0

ddd1ef5

v0.18.0 Latest

What's Changed

Use defined timezone on write for formats that need TZ info by @srowen in #665
Add notes about file extensions and _corrupt_record to documentation by @dolfinus in #674
Fix for xml expression to not parse arbitrary strings by @xanderbailey in #679
Update for 0.18.0, move CICD configs to supported Spark versions by @srowen in #680

New Contributors

@dolfinus made their first contribution in #674
@xanderbailey made their first contribution in #679

Full Changelog: v0.17.0...v0.18.0

Contributors

srowen, dolfinus, and xanderbailey

07 Sep 05:11

srowen

v0.17.0

b2611bd

Version 0.17.0

Improve handling of XSD complex type, decimal (#631, #638)
Restore behavior of ignoreSurroundingSpaces (#637)
Improve schema inference performance (#660)
Fix corner case of double/float type inference (#644)

See https://github.com/databricks/spark-xml/milestone/14?closed=1

Note that this is intended to be the final stand-alone release of spark-xml, as it is being incorporated into Apache Spark 4.0.

05 Jan 14:39

srowen

v0.16.0

088c83f

Version 0.16.0

Minor bug fixes
Custom timestamp formats now use session timezone when not specified in the format/input (#621)
Some "ref" elements now work in XSD schemas (#619)
'arrayElementName' can be used to control the schema name used for array elements when writing (#603)
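
The timezone fallback described for custom timestamp formats (#621) can be pictured with a short sketch. This is plain standard-library Python, not spark-xml's implementation; `parse_timestamp` and `session_tz` are hypothetical names, and a fixed UTC offset stands in for the Spark session timezone.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(text, fmt, session_tz):
    """Parse `text` with `fmt`; use `session_tz` only when no offset was parsed."""
    parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        # Neither the format nor the input carried a zone: fall back to the session zone.
        parsed = parsed.replace(tzinfo=session_tz)
    return parsed


# Assumed session timezone for the illustration (UTC-05:00).
session_tz = timezone(timedelta(hours=-5))

# No zone in the format or the input: the session timezone is applied.
no_zone = parse_timestamp("2023-01-05 14:39:00", "%Y-%m-%d %H:%M:%S", session_tz)

# The format and input carry an explicit offset: it wins over the session timezone.
with_zone = parse_timestamp("2023-01-05 14:39:00+0100", "%Y-%m-%d %H:%M:%S%z", session_tz)
```

The design point is that an explicit offset in the data is never overridden; the session timezone only fills the gap.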

See https://github.com/databricks/spark-xml/milestone/13?closed=1

03 Jun 17:17

srowen

v0.15.0

f4d592b

Version 0.15.0

This is a minor bug fix release, primarily for:

#582 Fix a Hadoop conf bug that interferes with running multiple separate spark-xml read/write jobs concurrently

21 Oct 19:23

srowen

v0.14.0

a17f473

Version 0.14.0

This release is primarily to support Spark 3.2.0 and Scala 2.13. Support for Scala 2.11, previously deprecated, is removed. Spark 2 is not officially supported now, but should continue to work with Scala 2.12 builds.

Otherwise, it includes one new feature:

Control XML declaration in XML output (#560)
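
As a rough illustration of what controlling the XML declaration means for output, here is a minimal sketch in plain Python. It is not spark-xml code: `render_document` and `DEFAULT_DECLARATION` are hypothetical names, and the default value and the "empty string suppresses the declaration" behavior are assumptions modeled on how the project README is understood to describe the `declaration` write option.

```python
# Assumed default declaration content; treat this value as an assumption.
DEFAULT_DECLARATION = 'version="1.0" encoding="UTF-8" standalone="yes"'


def render_document(body, declaration=DEFAULT_DECLARATION):
    """Prefix `body` with an XML declaration, or omit it when `declaration` is empty."""
    if declaration:
        return f"<?xml {declaration}?>\n{body}"
    return body


with_default = render_document("<ROWS></ROWS>")
custom = render_document("<ROWS></ROWS>", declaration='version="1.1"')
suppressed = render_document("<ROWS></ROWS>", declaration="")
```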

See https://github.com/databricks/spark-xml/issues?q=is%3Aclosed+milestone%3A0.14.0

21 Sep 00:40

srowen

v0.13.0

80f8737

Version 0.13.0

This is a minor bug fix release; see https://github.com/databricks/spark-xml/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.13.0

Improvement: better handling of certain XSD complexTypes in XSD -> schema parsing (#559)
Fix: Return null for primitive types when value matches nullValue string (#542)
Deprecated the Dataset[String] implicit and improved XmlReader options (#528)
Deprecated Scala 2.11 support
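
The nullValue fix (#542) can be sketched as follows. This is a plain-Python illustration of the described behavior, not the library's code; `cast_primitive` and its parameters are hypothetical names.

```python
def cast_primitive(text, caster, null_value=None):
    """Return None when `text` equals the configured null marker, else cast it."""
    if null_value is not None and text == null_value:
        # Check the marker before casting, so "NA" never reaches int() or float().
        return None
    return caster(text)


as_int = cast_primitive("42", int, null_value="NA")
as_null = cast_primitive("NA", int, null_value="NA")
as_float = cast_primitive("1.5", float, null_value="NA")
```

Checking the marker before casting is the point of the sketch: a primitive column holding the null marker yields a null instead of a cast failure.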

23 Feb 20:28

srowen

v0.12.0

ceed1b8

Version 0.12.0

Fixed schema inference for date types (#521)
Fixed some type inferences of primitive types (int vs long) from XSDs (#522)
Fixed parsing of partial result when a row fails to parse (#518)
Fixed bug in parsing missing optional child tags in certain situations (#513)
Fixed parsing of non-UTF-8 XML data (#511)
Added support for additional timestampFormat and dateFormat formats for reading and writing timestamps and dates in XML

https://github.com/databricks/spark-xml/milestone/9?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.12.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.12.0/

07 Dec 17:51

srowen

v0.11.0

74b9802

Version 0.11.0

Reading:
- Support for 'wildcard' columns (wildcardColName) matching anything, corresponding to XSD xs:any types
- Can optionally ignore namespace prefixes with ignoreNamespace
- MapType columns now read attributes correctly
Writing:
- Root tag can have attributes
- Timestamp output format now follows XML standards
Minor fixes and improvements to XSD schema support
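
To show what ignoring namespace prefixes amounts to when reading, here is a small sketch using Python's standard `xml.etree.ElementTree`. It only illustrates the idea behind ignoreNamespace and is not how spark-xml implements it; `local_name`, `row_to_dict`, and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def row_to_dict(xml_text):
    """Map each child element of the row to its text, keyed by local name only."""
    root = ET.fromstring(xml_text)
    return {local_name(child.tag): child.text for child in root}


doc = (
    '<b:book xmlns:b="http://example.com/books">'
    "<b:title>Spark</b:title><b:year>2020</b:year>"
    "</b:book>"
)
row = row_to_dict(doc)
```

With prefixes ignored, `b:title` and a plain `title` would land in the same column, which is the behavior the option name suggests.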

Changes: https://github.com/databricks/spark-xml/milestone/8?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.11.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.11.0/

25 Aug 22:04

srowen

v0.10.0

61d74ee

Version 0.10.0

Highlights:

Bug fix: in rare cases, parsing an uncompressed XML file could miss a record. (#468)
Bug fix: parsing XML subtree as string field would lose attributes (#469)
Feature: experimental support for inferring a Spark schema from an XSD (#457)
Other minor bug fixes

Changes: https://github.com/databricks/spark-xml/milestone/7?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.10.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.10.0/

02 Mar 02:18

srowen

v0.9.0

bb4b9ff

Version 0.9.0

Highlights:

Support XSD validation in from_xml (#433)
Don't ignore unclosed tag content (#437)
Helper functions to support manually using from_xml, etc from Python (#438)

Changes: https://github.com/databricks/spark-xml/milestone/6?closed=1
