
CascadingValueWriter should also handle Hadoop value types and not just JDK value types #110

Closed
jeoffreylim opened this issue Nov 26, 2013 · 2 comments


@jeoffreylim

In some cases a Cascading tuple can contain Hadoop value types instead of plain JDK types. I encountered this in my Cascading project, which threw the following exception:

Caused by: org.elasticsearch.hadoop.serialization.SerializationException: Cannot handle type [class cascading.scheme.ConcreteCall], instance [cascading.scheme.ConcreteCall@5694fe42] using writer [org.elasticsearch.hadoop.cascading.CascadingValueWriter@4fc0cb76]

After some investigation, the problem is that JdkValueWriter fails to write the object to the sink because the actual object from the tuple is of type org.apache.hadoop.io.Text, which should be handled by WritableValueWriter instead.
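
For context, here is a minimal, hypothetical sketch of the situation (the field values and class name are made up); the tuple entry is a Hadoop Writable, so the JDK writer declines it:

import org.apache.hadoop.io.Text;

import cascading.tuple.Tuple;

public class TextTupleExample {
    public static void main(String[] args) {
        // A tuple populated by a Hadoop-backed tap can carry Writable
        // values rather than plain JDK types.
        Tuple tuple = new Tuple(new Text("user-42"), new Text("click"));

        // Prints "class org.apache.hadoop.io.Text" (not java.lang.String),
        // which is why JdkValueWriter.write(...) returns false for it.
        System.out.println(tuple.getObject(0).getClass());
    }
}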

With some minor fixes to CascadingValueWriter, this solved my problem:

import java.util.List;

import org.apache.hadoop.io.Writable;

// Note: exact package locations may differ across elasticsearch-hadoop versions.
import org.elasticsearch.hadoop.serialization.Generator;
import org.elasticsearch.hadoop.serialization.JdkValueWriter;
import org.elasticsearch.hadoop.serialization.ValueWriter;
import org.elasticsearch.hadoop.serialization.WritableValueWriter;

import cascading.scheme.SinkCall;
import cascading.tuple.Tuple;

public class CascadingValueWriter implements ValueWriter<SinkCall<Object[], ?>> {

    private final ValueWriter<Object> jdkWriter;
    private final ValueWriter<Writable> hadoopWriter;

    public CascadingValueWriter() {
        this(false);
    }

    public CascadingValueWriter(boolean writeUnknownTypes) {
        jdkWriter = new JdkValueWriter(writeUnknownTypes);
        hadoopWriter = new WritableValueWriter(writeUnknownTypes);
    }

    @SuppressWarnings("unchecked")
    @Override
    public boolean write(SinkCall<Object[], ?> sinkCall, Generator generator) {
        Tuple tuple = sinkCall.getOutgoingEntry().getTuple();
        List<String> names = (List<String>) sinkCall.getContext()[0];

        generator.writeBeginObject();
        for (int i = 0; i < tuple.size(); i++) {
            String name = (i < names.size() ? names.get(i) : "tuple" + i);
            generator.writeFieldName(name);
            if (!jdkWriter.write(tuple.getObject(i), generator)) {
                // The JDK writer cannot handle Hadoop Writables (e.g. Text),
                // so fall back to the Writable writer before giving up.
                Object obj = tuple.getObject(i);
                if (obj instanceof Writable && hadoopWriter.write((Writable) obj, generator)) {
                    continue;
                }
                return false;
            }
        }
        generator.writeEndObject();
        return true;
    }
}
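
Note the ordering in write(...): the JDK writer is tried first, and the Writable writer is consulted only when the JDK writer declines and the value is actually a Hadoop Writable, so tuples that mix plain JDK values and Writables are handled in a single pass.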

costin closed this as completed in 113ab42 on Nov 26, 2013

@costin (Member) commented Nov 26, 2013

Thanks for catching and reporting this. I've pushed a fix in master and also a nightly build (#93).
Please try it out and let me know how it works for you.

Cheers,

@jeoffreylim (Author)

Works like a charm, thanks a lot :)
