Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix supported file formats for Hadoop vs Native batch doc #7069

Merged
merged 3 commits into from Mar 1, 2019

Conversation

jihoonson
Copy link
Contributor

@jihoonson jihoonson commented Feb 13, 2019

See the discussion at #7044 (comment).

@jon-wei jon-wei added this to the 0.14.0 milestone Feb 13, 2019
@@ -38,6 +38,6 @@ ingestion method.
| Supported [rollup modes](http://druid.io/docs/latest/ingestion/index.html#roll-up-modes) | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
| Supported partitioning methods | [Both Hash-based and range partitioning](http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification) | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented [firehoses](./firehose.html) | All implemented [firehoses](./firehose.html) |
| Supported file formats | All implemented Hadoop InputFormats | Currently only text file format (CSV, TSV, JSON) | Currently only text file format (CSV, TSV, JSON) |
| Supported file formats | All implemented Hadoop InputFormats | Currently text file formats (CSV, TSV, JSON) by default. Any custom implmentation can be supported. See [FiniteFirehoseFactory](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java) for custom implementation. | Currently text file formats (CSV, TSV, JSON) by default. Any custom implementation can be supported. See [FiniteFirehoseFactory](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java) for custom implementation. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the final column should refer to FiniteFirehoseFactory? It doesn't split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comment. FiniteFirehoseFactory is for any type of batch indexing and split is just optional hint for parallel indexing. I think it's better to recommend to use it for local index task as well because it assumes that input data is finite as opposed to FirehoseFactory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
Looking through the built in hierarchy, IngestSegment is the only FirehoseFactory that seems like you'd want to be able to use it from a local index task that isn't a FiniteFirehoseFactory, and we are fixing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Thanks for your understanding! I also raised a proposal about this: #7071.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest making it more clear that this would be done with an extension, maybe something like:

Additional formats can be added though a [custom extension](link to extension making docs) implementing [`FiniteFirehoseFactory`](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@fjy fjy merged commit 45f12de into apache:master Mar 1, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Mar 1, 2019
fjy pushed a commit that referenced this pull request Mar 1, 2019
)

* Fix supported file formats

* address comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants