[CASCADING] Provide the sink implementation for ParquetTupleScheme #285

Merged
4 commits merged on Feb 28, 2014

5 participants

@mickaellcr
Apache Parquet member

Add the sink implementation into ParquetTupleScheme

  • Only primitive types are supported

TODO (before accepting the pull request)

  • Unit tests
  • Documentation
@mickaellcr
Apache Parquet member

Pull request for issue #284

@mickaellcr
Apache Parquet member

@JasonRuckman, take a look at the code we have written; I will start looking at yours.

FYI, I'm only working with PrimitiveType for the moment (no map, array, or list, as you did). I will look at how you handled those.
And I'm missing some unit tests :D

@JasonRuckman

Will do. I think the internals of my stuff could use a good refactor, since when I wrote it, it was more of a personal project. So I'd love any sort of feedback.

@julienledem julienledem and 1 other commented on an outdated diff Jan 28, 2014
.../main/java/parquet/cascading/ParquetWriteSupport.java
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import parquet.hadoop.api.WriteSupport;
+import parquet.io.api.Binary;
+import parquet.io.api.RecordConsumer;
+import parquet.schema.MessageType;
+import parquet.schema.MessageTypeParser;
+import parquet.schema.PrimitiveType;
+import parquet.schema.Type;
+
+/**
+ *
+ *
+ * @author Mickaël Lacour <m.lacour@criteo.com>
+ */
+public class ParquetWriteSupport extends WriteSupport<TupleEntry> {
@julienledem
Apache Parquet member

[Cascading]TupleWriteSupport ?

@mickaellcr
Apache Parquet member

As you like, I'm fine with TupleWriteSupport.

@julienledem
Apache Parquet member

sounds good

@mickaellcr
Apache Parquet member

Done

@JasonRuckman

@mickaellcr I think the submodule is now in a state (minus a few nits here and there) where I can start integrating it with what you've done, so I'll begin doing that over the next couple of days.

@mickaellcr mickaellcr [CASCADING] Provide the sink implementation
in order to write some parquet files with ParquetTupleScheme
76bbf4a
@julienledem
Apache Parquet member

LGTM!
Should I merge?

@mickaellcr
Apache Parquet member

I would say yes :p But I would like to create another issue to handle complex types with Cascading. What do you think?

@fs111

I am trying to create a Cascading Lingual provider (http://www.cascading.org/lingual/) for Parquet, so that it can be used for SQL processing. This patch looks like the missing link for me. Do you guys have any idea when it will be merged?

@julienledem julienledem merged commit 6063921 into Parquet:master Feb 28, 2014

1 check passed

The Travis CI build passed
@julienledem
Apache Parquet member

@fs111 I just did
@mickaellcr please open an issue. How do people do it? At Twitter we use the Thrift integration for complex types in Cascading.

@fs111

@julienledem cool, thanks!

@quintona quintona referenced this pull request in Cascading/lingual Mar 18, 2014
Open

Parquet provider #21
