Installing the connector
To install the connector manually, complete the following steps.
Ensure authenticated Cloud Storage access
Depending on where the machines which comprise your cluster are located, you must do one of the following:
- Google Cloud Platform - Each Google Compute Engine VM must be configured to have access to the Cloud Storage scope you intend to use the connector for.
- non-Google Cloud Platform - Obtain an OAuth 2.0 private key. Installing the connector on a machine other than a GCE VM can lead to higher Cloud Storage access costs. For more information, see Cloud Storage Pricing.
Add the connector jar to Hadoop's classpath
Placing the connector jar in the
$HADOOP_COMMON_LIB_JARS_DIR directory should
be sufficient to have Hadoop load the jar. Alternatively, to be certain that the
jar is loaded, you can add
hadoop-env.sh in the Hadoop configuration directory.
To begin, you will need a JSON keyfile so the connector can authenticate to Google Cloud Storage. You can follow these directions to obtain a JSON keyfile.
Once you have the JSON key file, you need to configure following properties in
core-site.xml on your server:
<property> <name>fs.AbstractFileSystem.gs.impl</name> <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value> <description>The AbstractFileSystem for gs: uris.</description> </property> <property> <name>fs.gs.project.id</name> <value></value> <description> Optional. Google Cloud Project ID with access to GCS buckets. Required only for list buckets and create bucket operations. </description> </property> <property> <name>google.cloud.auth.service.account.enable</name> <value>true</value> <description> Whether to use a service account for GCS authorization. Setting this property to `false` will disable use of service accounts for authentication. </description> </property> <property> <name>google.cloud.auth.service.account.json.keyfile</name> <value>/path/to/keyfile</value> <description> The JSON key file of the service account used for GCS access when google.cloud.auth.service.account.enable is true. </description> </property>
You can alternatively set the environment variable
GOOGLE_APPLICATION_CREDENTIALS to your keyfile (
Additional properties can be specified for the Cloud Storage connector,
including alternative authentication options. For more information, see the
values documented in the gcs-core-default.xml
file inside the
Note that you do not need to configure Hadoop in order to use the GCS connector with Spark.
If you are using Spark with Hadoop, Spark may not install a Hadoop
core-site.xml in its
conf dir, so you may need to create one or,
spark.hadoop.* properties in the
Custom Hadoop/Hive Configuration).
Otherwise, with Spark the preferred configuration is to set the following
spark-defaults.conf instead of using
spark.hadoop.google.cloud.auth.service.account.enable true spark.hadoop.google.cloud.auth.service.account.json.keyfile <path/to/keyfile.json>
Test the installation
On the command line, type
hadoop fs -ls gs://<some-bucket>, where
<some-bucket> is the Google Cloud Storage bucket to which you gave the
connector read access. The command should output the top-level directories and
objects contained in
<some-bucket>. If there is a problem, see Troubleshooting
Troubleshooting the installation
- If the installation test reported
No FileSystem for scheme: gs, make sure that you correctly set the two properties in the correct
- If the test reported
java.lang.ClassNotFoundException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem, check that you added the connector to the Hadoop/Spark classpath. If this error caused by
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkState(ZLjava/lang/String;J)V, it is likely due to a conflicting version of transitive dependencies (in this case, Guava); using a shaded version of the
gcs-connectorjar can resolve this.
- If the test issued a message related to authorization, make sure that you
have access to Cloud Storage using
gsutil ls -b gs://<some-bucket>), and that the credentials in your configuration are correct.