dunnhumby/democratizing-dataproc

Using Terraform, deploy multiple Dataproc clusters that share a single Hive metastore.

This project demonstrates how to use Terraform to create multiple Dataproc clusters that share a single Hive metastore. It is essentially a Terraform equivalent of the Google tutorial Using Apache Hive on Cloud Dataproc, which achieves the same result with gcloud commands.
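
At its core, the pattern is one Cloud SQL instance backing the Hive metastore, with every Dataproc cluster attached to it through the cloud-sql-proxy initialization action. The sketch below shows that shape only; the resource names, region, project variable, cluster count, and warehouse bucket are all illustrative, not this repository's actual configuration.

# Minimal sketch of the pattern; names and values are illustrative,
# not this repository's actual configuration.

# One Cloud SQL instance holds the shared Hive metastore database.
resource "google_sql_database_instance" "hive_metastore" {
  name             = "hive-metastore"
  database_version = "MYSQL_5_7"
  region           = "us-central1"

  settings {
    tier = "db-n1-standard-1"
  }
}

# Every cluster proxies to the same metastore instance, so tables
# created on one cluster are visible from the others.
resource "google_dataproc_cluster" "cluster" {
  count  = 2
  name   = "cluster-${count.index}"
  region = "us-central1"

  cluster_config {
    gce_cluster_config {
      metadata = {
        # Tells the init action which Cloud SQL instance to proxy to.
        "hive-metastore-instance" = "${var.project}:us-central1:${google_sql_database_instance.hive_metastore.name}"
      }
    }

    initialization_action {
      script      = "gs://goog-dataproc-initialization-actions-us-central1/cloud-sql-proxy/cloud-sql-proxy.sh"
      timeout_sec = 500
    }

    software_config {
      override_properties = {
        # All clusters also share one warehouse bucket.
        "hive:hive.metastore.warehouse.dir" = "gs://${var.project}-warehouse/datasets"
      }
    }
  }
}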

Ensure you have enabled the relevant APIs by issuing:

gcloud services enable dataproc.googleapis.com sqladmin.googleapis.com


The project can be deployed by issuing

gcloud init # choose the project that you will be deploying to
gcloud services enable dataproc.googleapis.com sqladmin.googleapis.com
export GCP_PROJECT=$(gcloud config get-value project)
gsutil mb gs://${GCP_PROJECT}-tf-state # GCS bucket used as the Terraform state backend
make init apply

and destroyed using

make destroy
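
The make init apply step presumably points Terraform at the state bucket created above as a GCS backend. A minimal sketch of that wiring follows; the exact backend configuration and Makefile targets in this repository may differ.

# backend.tf -- sketch only; the bucket name could instead be supplied
# at init time via: terraform init -backend-config="bucket=${GCP_PROJECT}-tf-state"
terraform {
  backend "gcs" {
    bucket = "my-project-tf-state" # i.e. the ${GCP_PROJECT}-tf-state bucket
    prefix = "terraform/state"
  }
}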

To generate a graph displaying all the resources in this Terraform project, run:

docker build -t graphwiz . &&
   terraform graph -type=plan > graph.dot &&
   docker run -v $(pwd):/tmp graphwiz dot /tmp/graph.dot -Tpng -o /tmp/graph.png &&
   rm graph.dot
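
The graphwiz image is built from the Dockerfile at the root of the repository; all it really needs is Graphviz on the path. A minimal equivalent might look like this (the repository's actual Dockerfile may differ):

# Sketch of a minimal Graphviz image; the actual Dockerfile may differ.
FROM alpine:3
RUN apk add --no-cache graphviz
WORKDIR /tmp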

The latest committed state of that graph is checked into the repository.
