PARQUET-111: Updates for apache release

Updates for first Apache release of parquet-mr.

Author: Ryan Blue <blue@apache.org>

Closes #109 from rdblue/PARQUET-111-update-for-apache-release and squashes the following commits:

bf19849 [Ryan Blue] PARQUET-111: Add ARRIS copyright header to parquet-tools.
f1a5c28 [Ryan Blue] PARQUET-111: Update headers in parquet-protobuf.
ee4ea88 [Ryan Blue] PARQUET-111: Remove leaked LICENSE and NOTICE files.
5bf178b [Ryan Blue] PARQUET-111: Update module names, urls, and binary LICENSE files.
6736320 [Ryan Blue] PARQUET-111: Add RAT exclusion for auto-generated POM files.
7db4553 [Ryan Blue] PARQUET-111: Add attribution for Spark dev script to LICENSE.
45e29f2 [Ryan Blue] PARQUET-111: Update LICENSE and NOTICE.
516c058 [Ryan Blue] PARQUET-111: Update license headers to pass RAT check.
da688e3 [Ryan Blue] PARQUET-111: Update NOTICE with Apache boilerplate.
234715d [Ryan Blue] PARQUET-111: Add DISCLAIMER and KEYS.
f1d3601 [Ryan Blue] PARQUET-111: Update to use Apache parent POM.
latest commit 3df3372a1e
@rdblue rdblue authored

README.md

Parquet Jackson

Parquet-Jackson is just a dummy module to shade Jackson artifacts.

Rationale

Parquet internally uses the well-known JSON processor Jackson. Because Apache Hadoop (amongst others) sometimes uses an older version of Jackson, Parquet "shades" its copy of Jackson to avoid version conflicts on the classpath. Originally, a copy of Jackson was embedded in each Parquet artifact that required it; to avoid this duplication, a shared module, "Parquet-Jackson", was created.

Note that this is not a fork of Jackson but the same classes as provided by Jackson artifacts, relocated under the parquet.org.codehaus.jackson namespace.

Detailed explanations

Shading is performed by the Apache Maven Shade plugin. It runs during the package lifecycle phase, right after the original jar is created. The plugin replaces both the jar and the pom file with new versions: the jar has the specified dependencies embedded, and the pom.xml is updated so that it no longer refers to those dependencies.

The parquet-jackson module creates a new jar artifact containing all Jackson classes, relocated under the parquet.org.codehaus.jackson package. The shade plugin also transforms the pom.xml to remove any reference to the Jackson dependencies.
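As a rough illustration of the setup described above, a shade configuration for a module like parquet-jackson could look like the following. This is a minimal sketch, not the project's actual POM: the plugin version is omitted and the exact include pattern is an assumption.

```xml
<!-- Illustrative sketch only; not copied from the parquet-jackson POM. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- Run shading in the package phase, after the original jar is built. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Embed the Jackson artifacts in the shaded jar (pattern is an assumption). -->
        <artifactSet>
          <includes>
            <include>org.codehaus.jackson:*</include>
          </includes>
        </artifactSet>
        <!-- Move the embedded classes under the Parquet namespace. -->
        <relocations>
          <relocation>
            <pattern>org.codehaus.jackson</pattern>
            <shadedPattern>parquet.org.codehaus.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the plugin's default settings, a dependency-reduced pom.xml is generated in which the embedded artifacts no longer appear as dependencies.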

Other Parquet modules that require Jackson are configured to depend on the parquet-jackson module and still perform shading. The difference is that the Jackson classes themselves are not included in the resulting jar, while references from Parquet classes to Jackson classes are still relocated. The shade plugin again removes any reference to the Jackson dependencies from the pom.xml, but it preserves the parquet-jackson dependency, which contains the relocated classes.
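One way to get this behavior from the shade plugin is to list the Jackson artifacts in the artifact set, so that they are dropped from the generated pom.xml, while filtering out all of their content. The fragment below is a sketch under that assumption; the real Parquet build may be configured differently.

```xml
<!-- Illustrative sketch for a module that uses Jackson; not the project's actual configuration. -->
<configuration>
  <!-- Listing Jackson here makes the plugin drop it from the generated pom.xml.
       parquet-jackson is not listed, so it stays a regular dependency. -->
  <artifactSet>
    <includes>
      <include>org.codehaus.jackson:*</include>
    </includes>
  </artifactSet>
  <!-- Exclude every file of the Jackson artifacts, so no Jackson class is copied into the jar. -->
  <filters>
    <filter>
      <artifact>org.codehaus.jackson:*</artifact>
      <excludes>
        <exclude>**</exclude>
      </excludes>
    </filter>
  </filters>
  <!-- References from the module's own classes to Jackson are still rewritten. -->
  <relocations>
    <relocation>
      <pattern>org.codehaus.jackson</pattern>
      <shadedPattern>parquet.org.codehaus.jackson</shadedPattern>
    </relocation>
  </relocations>
</configuration>
```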

Why Parquet modules still refer directly to org.codehaus.jackson:*

This is because of the way Maven handles multi-module projects. Let's assume that the parquet-foo module uses Jackson. When executing mvn package, the parquet-jackson module is built and packaged first, and a new pom.xml without the Jackson dependencies is created and used by the parquet-foo module. Since the shade plugin has removed the Jackson dependencies from that pom.xml, parquet-foo would not receive them transitively and its compilation would fail. Each module that uses Jackson therefore declares org.codehaus.jackson:* itself.
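For the hypothetical parquet-foo module, the dependency section might then look roughly like this. The choice of jackson-mapper-asl and the `jackson.version` property are assumptions made for illustration only.

```xml
<!-- Illustrative sketch for the hypothetical parquet-foo module. -->
<dependencies>
  <!-- Provides the relocated Jackson classes to consumers of the shaded jar. -->
  <dependency>
    <groupId>${project.groupId}</groupId>
    <artifactId>parquet-jackson</artifactId>
    <version>${project.version}</version>
  </dependency>
  <!-- Declared directly so that parquet-foo compiles against the original Jackson classes;
       jackson.version is a hypothetical property name. -->
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>${jackson.version}</version>
  </dependency>
</dependencies>
```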
