Thrift integration for Hadoop Writables
Thadoop

Thadoop (Thrift + Hadoop) is a small set of wrapper templates that integrate Apache Thrift-generated classes with Hadoop's Writable interface.

The goal of this module is to provide a quick and easy way to use auto-generated Thrift code as input/output for Hadoop-related tasks (MapReduce, Pig, and Hive).

Requirements

  • JDK 1.7+

  • Maven 3.0.4+

  • Working Thrift-generated code

  • A working Hadoop environment

    (A sample IDL, src/thrift/idl/thadoop.thrift, and the Java code generated from it, src/thrift/java/*, are included for convenience.)

Maven dependency configuration

In the Maven project file (pom.xml), add the after-sunrise repositories and the thadoop dependency to retrieve the module. Two repositories are provided: after-sunrise Release and after-sunrise Snapshot.

Hadoop Writable

This is the foundation for all the other integrations, so implement a subclass of TWritable first. (Replace ThadoopSample in the sample below with your own Thrift-generated class.)


public class SampleWritable extends TWritable<ThadoopSample> {
	private final ThadoopSample base = new ThadoopSample();

	@Override
	public ThadoopSample get() {
		return base; // Note: do NOT create a new instance here!
	}
}
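The note in get() matters because of how values are deserialized. The stdlib-only sketch below illustrates it under one assumption: that TWritable's write/readFields operate on whatever instance get() returns, in line with Hadoop's Writable contract, where readFields fills an existing object. Sample, SketchTWritable, GoodWritable, BadWritable and WritableSketch are hypothetical stand-ins, not thadoop classes, and plain DataOutput calls replace Thrift's protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a Thrift-generated struct such as ThadoopSample.
class Sample {
    String name = "";
    int count;
}

// Hypothetical, simplified stand-in for TWritable. ASSUMPTION: the real class
// serializes and deserializes whatever instance get() returns.
abstract class SketchTWritable {
    abstract Sample get();

    void write(DataOutput out) throws IOException {
        Sample s = get();
        out.writeUTF(s.name);
        out.writeInt(s.count);
    }

    void readFields(DataInput in) throws IOException {
        Sample s = get(); // fields are read into the instance get() returns
        s.name = in.readUTF();
        s.count = in.readInt();
    }
}

// Follows the README note: one instance, returned on every call.
class GoodWritable extends SketchTWritable {
    private final Sample base = new Sample();

    @Override
    Sample get() {
        return base;
    }
}

// Violates the README note: a fresh instance on every call.
class BadWritable extends SketchTWritable {
    @Override
    Sample get() {
        return new Sample();
    }
}

public class WritableSketch {
    static byte[] serialize(SketchTWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    static void deserialize(SketchTWritable w, byte[] data) throws IOException {
        w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    }

    public static void main(String[] args) throws IOException {
        GoodWritable source = new GoodWritable();
        source.get().name = "hello";
        source.get().count = 42;
        byte[] data = serialize(source);

        GoodWritable good = new GoodWritable();
        deserialize(good, data);
        System.out.println(good.get().name + " " + good.get().count); // hello 42

        BadWritable bad = new BadWritable();
        deserialize(bad, data);
        // The data went into a discarded instance, so only defaults remain.
        System.out.println(bad.get().name.isEmpty() + " " + bad.get().count); // true 0
    }
}
```

With GoodWritable the round trip restores "hello 42"; with BadWritable the fields land in an object that is thrown away immediately, so every later call to get() sees only default values.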

This subclass can be passed to a Mapper/Reducer directly, because the superclass implements Hadoop's Writable interface.


public class SampleJob extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		Job job = ...
	}
}

Pig Storage

Create a subclass of TStorage. The superclass implements Pig's load function.

  • Handles Thrift records stored in Hadoop's SequenceFile format.
  • The key is ignored; only the value is parsed.

A subclass implementation should look something like this:


public class SampleStorage extends TStorage<ThadoopSample._Fields, SampleWritable> {
	public SampleStorage() {
		// Pass the Writable class and the Thrift-generated field metadata map.
		super(SampleWritable.class, ThadoopSample.metaDataMap);
	}
}
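How TStorage turns a value into a Pig tuple is not spelled out here. Assuming it emits one column per Thrift field in declaration order (the iteration order of the EnumMap that Thrift's Java generator uses for metaDataMap), the "key is ignored, only the value is parsed" behavior can be sketched with the standard library alone. Fields, Sample and toTuples are hypothetical stand-ins; Sample.getFieldValue only mirrors the accessor of the same name on Thrift-generated structs, and plain Lists stand in for Pig tuples.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StorageSketch {
    // Hypothetical stand-in for a generated enum such as ThadoopSample._Fields.
    enum Fields { ID, NAME }

    // Hypothetical stand-in for a Thrift struct; getFieldValue mirrors the
    // accessor Thrift generates, but this is not generated code.
    static final class Sample {
        final long id;
        final String name;

        Sample(long id, String name) {
            this.id = id;
            this.name = name;
        }

        Object getFieldValue(Fields field) {
            switch (field) {
                case ID:
                    return id;
                case NAME:
                    return name;
                default:
                    throw new IllegalArgumentException("unknown field " + field);
            }
        }
    }

    // ASSUMPTION: one tuple per record, one column per field in declaration
    // order. The key of each record is never read.
    static List<List<Object>> toTuples(List<Map.Entry<String, Sample>> records) {
        List<List<Object>> tuples = new ArrayList<List<Object>>();
        for (Map.Entry<String, Sample> record : records) {
            Sample value = record.getValue();
            List<Object> tuple = new ArrayList<Object>();
            for (Fields field : Fields.values()) {
                tuple.add(value.getFieldValue(field));
            }
            tuples.add(tuple);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Sample>> records = new ArrayList<Map.Entry<String, Sample>>();
        records.add(new AbstractMap.SimpleEntry<String, Sample>("key-1", new Sample(1L, "alpha")));
        records.add(new AbstractMap.SimpleEntry<String, Sample>("key-2", new Sample(2L, "beta")));
        System.out.println(toTuples(records)); // [[1, alpha], [2, beta]]
    }
}
```

Running main prints [[1, alpha], [2, beta]]; the keys key-1 and key-2 never reach the output.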

Hive SerDe

Create a subclass of TSerDe. The superclass implements Hive's SerDe interface.


public class SampleSerDe extends TSerDe {

	public SampleSerDe() {