diff --git a/docs/configurations.md b/docs/configurations.md
index 9b4cf5d0a2b3..23c8b8795ea8 100644
--- a/docs/configurations.md
+++ b/docs/configurations.md
@@ -81,4 +81,7 @@ summary: "Here we list all possible configurations and what they mean"
 - [S3Configs](s3_hoodie.html) (Hoodie S3 Configs)
 Configurations required for S3 and Hoodie co-operability.
+- [GCSConfigs](gcs_hoodie.html) (Hoodie GCS Configs)
+Configurations required for GCS and Hoodie co-operability.
+
 
 {% include callout.html content="Hoodie is a young project. A lot of pluggable interfaces and configurations to support diverse workloads need to be created. Get involved [here](https://github.com/uber/hoodie)" type="info" %}
diff --git a/docs/gcs_filesystem.md b/docs/gcs_filesystem.md
new file mode 100644
index 000000000000..7f9a0227bea3
--- /dev/null
+++ b/docs/gcs_filesystem.md
@@ -0,0 +1,62 @@
+---
+title: GCS Filesystem (experimental)
+keywords: sql hive gcs spark presto
+sidebar: mydoc_sidebar
+permalink: gcs_hoodie.html
+toc: false
+summary: In this page, we go over how to configure hoodie with Google Cloud Storage.
+---
+Hoodie works with HDFS by default and GCS **regional** buckets provide an HDFS API with strong consistency.
+
+## GCS Configs
+
+There are two configurations required for Hoodie GCS compatibility:
+
+- Adding GCS Credentials for Hoodie
+- Adding required jars to classpath
+
+### GCS Credentials
+
+Add the required configs in your core-site.xml from where Hoodie can fetch them. Replace the `fs.defaultFS` with your GCS bucket name and Hoodie should be able to read/write from the bucket.
+
+```xml
+  <property>
+    <name>fs.defaultFS</name>
+    <value>gs://hoodie-bucket</value>
+  </property>
+
+  <property>
+    <name>fs.gs.impl</name>
+    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
+    <description>The FileSystem for gs: (GCS) uris.</description>
+  </property>
+
+  <property>
+    <name>fs.AbstractFileSystem.gs.impl</name>
+    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
+    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
+  </property>
+
+  <property>
+    <name>fs.gs.project.id</name>
+    <value>GCS_PROJECT_ID</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.enable</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.email</name>
+    <value>GCS_SERVICE_ACCOUNT_EMAIL</value>
+  </property>
+  <property>
+    <name>google.cloud.auth.service.account.keyfile</name>
+    <value>GCS_SERVICE_ACCOUNT_KEYFILE</value>
+  </property>
+```
+
+### GCS Libs
+
+GCS hadoop libraries to add to our classpath
+
+- com.google.cloud.bigdataoss:gcs-connector:1.6.0-hadoop2
diff --git a/hoodie-client/src/main/java/com/uber/hoodie/io/storage/HoodieWrapperFileSystem.java b/hoodie-client/src/main/java/com/uber/hoodie/io/storage/HoodieWrapperFileSystem.java
index d413fc5c38da..3b8ba42ed5ec 100644
--- a/hoodie-client/src/main/java/com/uber/hoodie/io/storage/HoodieWrapperFileSystem.java
+++ b/hoodie-client/src/main/java/com/uber/hoodie/io/storage/HoodieWrapperFileSystem.java
@@ -53,6 +53,10 @@ public class HoodieWrapperFileSystem extends FileSystem {
     SUPPORT_SCHEMES.add("file");
     SUPPORT_SCHEMES.add("hdfs");
     SUPPORT_SCHEMES.add("s3");
+
+    // Hoodie currently relies on underlying object store being fully
+    // consistent so only regional buckets should be used.
+    SUPPORT_SCHEMES.add("gs");
   }
 
   private ConcurrentMap openStreams =