ORC-1071: Update adopters page #985

Merged 2 commits on Dec 31, 2021
site/_docs/adopters.md (48 changes: 41 additions & 7 deletions)
@@ -14,6 +14,33 @@
but with the ORC 1.1.0 release it is now easier than ever without pulling in
Hive's exec jar and all of its dependencies. OrcStruct now also implements
WritableComparable and can be serialized through the MapReduce shuffle.
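
For instance, here is a minimal sketch of a MapReduce job setup that writes ORC output through the `orc-mapreduce` module; the schema string and output path are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcOutputFormat;

public class OrcJobSetup {
  public static Job buildJob(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf, "write-orc");
    // Declare the ORC schema for the job's output rows.
    job.getConfiguration().set("orc.mapred.output.schema",
        "struct<name:string,age:int>");
    // OrcStruct implements WritableComparable, so it can also be used
    // as a shuffle key or value between the map and reduce stages.
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(OrcStruct.class);
    job.setOutputFormatClass(OrcOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/orc-out"));
    return job;
  }
}
```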

### [Apache Spark](https://spark.apache.org/)

Apache Spark has [added
support](https://databricks.com/blog/2015/07/16/joint-blog-post-bringing-orc-support-into-apache-spark.html)
for reading and writing ORC files with support for column projection and
predicate push down.
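
A short sketch of that usage with Spark's DataFrame API; the paths and column names are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkOrcExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("orc-example")
        .getOrCreate();

    // Column projection: only "name" and "age" are read from the file.
    // Predicate push down: the age filter is evaluated inside the reader.
    Dataset<Row> people = spark.read().orc("/data/people.orc")
        .select("name", "age")
        .filter("age > 30");

    people.write().orc("/data/people-over-30.orc");
    spark.stop();
  }
}
```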

### [Apache Arrow](https://arrow.apache.org/)

Apache Arrow supports reading and writing the [ORC file format](https://arrow.apache.org/docs/index.html?highlight=orc#apache-arrow).

### [Apache Flink](https://flink.apache.org/)

Apache Flink supports the
[ORC format in its Table API](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/table/formats/orc/)
for reading and writing ORC files.
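
A sketch of declaring an ORC-backed filesystem table through the Table API's SQL DDL, following the Flink 1.14 documentation linked above; the table schema and path are illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkOrcExample {
  public static void main(String[] args) throws Exception {
    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.inBatchMode());

    // A filesystem table whose files are read and written as ORC.
    tEnv.executeSql(
        "CREATE TABLE people (" +
        "  name STRING," +
        "  age INT" +
        ") WITH (" +
        "  'connector' = 'filesystem'," +
        "  'path' = '/data/people'," +
        "  'format' = 'orc'" +
        ")");

    tEnv.executeSql("INSERT INTO people VALUES ('Alice', 42)").await();
  }
}
```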

### [Apache Iceberg](https://iceberg.apache.org/)

Apache Iceberg [supports ORC](https://iceberg.apache.org/#spec/#orc) as a data file format for its tables.
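
For example, an Iceberg table can be told to write ORC data files by setting the `write.format.default` property at creation time. A sketch against the Java API, assuming an already-configured catalog; the schema and table identifier are illustrative:

```java
import java.util.Map;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

public class IcebergOrcExample {
  // Assumes a configured Catalog implementation (Hive, Hadoop, ...).
  public static Table createOrcTable(Catalog catalog) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "name", Types.StringType.get()),
        Types.NestedField.optional(2, "age", Types.IntegerType.get()));

    return catalog.createTable(
        TableIdentifier.of("db", "people"),
        schema,
        PartitionSpec.unpartitioned(),
        // New data files for this table will be written as ORC.
        Map.of("write.format.default", "orc"));
  }
}
```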

### [Apache Druid](https://druid.apache.org/)

Apache Druid provides an
[ORC extension](https://druid.apache.org/docs/0.22.1/development/extensions-core/orc.html#orc-extension)
that lets it ingest and understand the Apache ORC data format.

### [Apache Hive](https://hive.apache.org/)

Apache Hive was the original use case and home for ORC. ORC's strong
@@ -22,6 +49,12 @@
down, and vectorization support make Hive [perform
better](https://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/)
than any other format for your data.
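
Declaring an ORC-backed Hive table takes a single `STORED AS ORC` clause. A minimal sketch over JDBC, assuming a local HiveServer2 endpoint; the table definition is illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveOrcExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 endpoint; adjust host, port, and database as needed.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // STORED AS ORC keeps the table's data in ORC files, enabling
      // Hive's predicate push down and vectorized reader.
      stmt.execute(
          "CREATE TABLE people (name STRING, age INT) STORED AS ORC");
    }
  }
}
```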

### [Apache Gobblin](https://gobblin.apache.org/)

Apache Gobblin supports
[writing data to ORC files](https://gobblin.apache.org/docs/case-studies/Writing-ORC-Data/)
by leveraging Apache Hive's SerDe library.

### [Apache Nifi](https://nifi.apache.org/)

Apache Nifi is [adding
@@ -33,13 +66,6 @@
ORC files.
Apache Pig added support for reading and writing ORC files in [Pig
0.14.0](https://hortonworks.com/blog/announcing-apache-pig-0-14-0/).

### [EEL](https://github.com/51zero/eel-sdk)

EEL is a Scala BigData API that supports reading and writing data for
@@ -58,6 +84,14 @@
or directly into Hive tables backed by an ORC file format.
With more than 300 PB of data, Facebook was an [early adopter of
ORC](https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/) and quickly put it into production.

### [LinkedIn](https://linkedin.com)

LinkedIn uses
[the ORC file format](https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin)
with the Apache Iceberg metadata catalog and Apache Gobblin to provide its
data customers with high query performance.

### [Trino (formerly Presto SQL)](https://trino.io/)

The Trino team has done a lot of work [integrating