This repository provides Terraform resources to quickly deploy Cloudera on Cloud and associated pre-requisite Cloud Service Provider (CSP) resources. It uses the CDP Terraform Modules to do this.
A summary of the requirements, configuration and execution steps needed to use this repository is given below.
To use the module provided here, you will need:
- An AWS, Azure, or GCP Cloud account;
- A Cloudera on cloud account (you can sign up for a 60-day free pilot);
- A recent version of Terraform software (version 0.13 or higher).
- Terraform can be installed by following the instructions at https://developer.hashicorp.com/terraform/downloads.
- If you have not yet configured your ~/.cdp/credentials file, follow the steps for Generating an API access key.
- To create resources in the Cloud Provider, access credentials or a service account are needed for authentication.
  - For AWS, access keys are required to create the Cloud resources via the Terraform aws provider. See the AWS documentation for Managing access keys.
  - For Azure, authentication with the Azure subscription is required. There are a number of ways to do this, outlined in the Azure Terraform Provider documentation.
  - For GCP, authentication with the GCP API is required. There are a number of ways to do this, outlined in the Google Terraform Provider documentation.
Note
See the Additional Authentication & Configuration Notes section for further details on authentication with the Cloud Providers.
Important
Make sure your Cloudera and your Cloud provider credentials are properly configured before proceeding.
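As a quick pre-flight check for the Cloudera credentials mentioned above, the sketch below looks for the ~/.cdp/credentials file and the two entries it is expected to hold. The helper name check_cdp_credentials is hypothetical and not part of this repository, and the entry names cdp_access_key_id and cdp_private_key are an assumption about the file layout; adjust them if your file differs.

```shell
# check_cdp_credentials is a hypothetical helper, not part of this repository.
# Assumption: the credentials file holds cdp_access_key_id and cdp_private_key entries.
check_cdp_credentials() {
  local creds_file="${1:-$HOME/.cdp/credentials}"
  local key

  if [ ! -f "$creds_file" ]; then
    echo "missing: $creds_file" >&2
    return 1
  fi

  # Each expected key must appear at the start of a line, followed by "=".
  for key in cdp_access_key_id cdp_private_key; do
    if ! grep -q "^[[:space:]]*${key}[[:space:]]*=" "$creds_file"; then
      echo "no ${key} entry in $creds_file" >&2
      return 1
    fi
  done

  echo "CDP credentials file looks complete: $creds_file"
}
```

Running check_cdp_credentials with no argument checks the default location; a different path can be passed as the first argument. The check only confirms that the entries are present, not that the key is valid.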
git clone https://github.com/cloudera-labs/cdp-tf-quickstarts.git
cd cdp-tf-quickstarts
Change to the required cloud provider directory and create a terraform.tfvars file with the variable configuration for your deployment.
Reference the terraform.tfvars.template in each cloud provider directory and the sample contents shown below, which indicate the values to change.
# Change into cloud provider directory, e.g. for aws
cd aws
cp terraform.tfvars.template terraform.tfvars
vi terraform.tfvars
AWS configuration file:
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and CDP resources, e.g. cldr1
# ------- Cloud Settings -------
aws_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eu-west-1
# ------- CDP Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
Azure configuration file:
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and CDP resources, e.g. cldr1
# ------- Cloud Settings -------
azure_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eastus
# ------- CDP Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
GCP configuration file:
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and CDP resources, e.g. cldr1
# ------- Cloud Settings -------
gcp_project = "<ENTER_VALUE>" # Change this to specify the GCP Project ID
gcp_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. europe-west2
# ------- CDP Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
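Because every configuration file above ships with <ENTER_VALUE> placeholders, it can help to confirm that deployment_template holds one of the three listed options before running Terraform. The following is a minimal sketch of such a check; the function name check_deployment_template is hypothetical and not provided by the repository.

```shell
# check_deployment_template is a hypothetical helper, not part of this repository.
# It reads a tfvars file and verifies the deployment_template value.
check_deployment_template() {
  local tfvars_file="${1:-terraform.tfvars}"
  local value

  # Pull the quoted value from a line such as: deployment_template = "public"
  value="$(sed -n 's/^[[:space:]]*deployment_template[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$tfvars_file" | head -n 1)"

  case "$value" in
    public|semi-private|private)
      echo "deployment_template OK: $value"
      ;;
    *)
      echo "deployment_template must be public, semi-private or private (found: '${value}')" >&2
      return 1
      ;;
  esac
}
```

Called without arguments from a cloud provider directory, it inspects the local terraform.tfvars; an unedited template fails the check because the placeholder is not an allowed value.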
terraform init
terraform apply
⏱️ Note: The deployment can take up to 60 minutes.
You can follow the deployment process on the Cloudera on cloud Management Console from your browser at cdp.cloudera.com.
After it completes, you can add Data Hubs and Data Services to your newly deployed environment from the Management Console UI or using the CLI.
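The init and apply steps above can also be run with a saved plan, so that the changes reviewed are exactly the changes applied. This is a sketch rather than the repository's prescribed workflow: the function name deploy_with_saved_plan and the TF override variable are hypothetical conveniences, while plan -out and apply with a plan file are standard Terraform CLI usage.

```shell
# TF is a hypothetical override for the Terraform binary (defaults to terraform).
TF="${TF:-terraform}"

# deploy_with_saved_plan is a hypothetical wrapper, not part of this repository.
deploy_with_saved_plan() {
  local plan_file="${1:-tfplan}"

  # Stop at the first failing step so a bad plan is never applied.
  "$TF" init &&
    "$TF" plan -out="$plan_file" &&
    "$TF" apply "$plan_file"
}
```

Applying a saved plan does not prompt for confirmation, so review the plan output before the apply step if you split these commands up.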
If you no longer need the infrastructure and Cloudera on cloud environment that's provisioned by Terraform, run the following command to remove the deployment infrastructure and terminate all resources.
terraform destroy
⏱️ Note: Cleanup of the deployment will take about 20 minutes.
By default, the Terraform quickstarts create a new SSH keypair that is associated with all nodes provisioned by Cloudera on cloud. The private key is stored in the <env_prefix>-ssh-key.pem file in the Terraform cloud provider project directory.
To use an existing SSH key, set the keypair name (for AWS) or public key text (for Azure and GCP) variable in the terraform.tfvars file.
The optional variable ingress_extra_cidrs_and_ports in the terraform.tfvars file defines the list of client IPs allowed to access the UI and API endpoints of your deployment via SSH and HTTPS.
When commented out, this variable defaults to the current public IP of the Terraform client. If this IP is leased, and so might change over time, you can uncomment the variable and set additional CIDRs or IP ranges.
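When adding single client addresses to the extra CIDR list, each address needs to be written in CIDR form. The sketch below validates a dotted IPv4 address and prints it as a /32 CIDR; the helper name to_cidr32 is hypothetical, and the exact structure expected by ingress_extra_cidrs_and_ports should be taken from terraform.tfvars.template rather than from this example.

```shell
# to_cidr32 is a hypothetical helper: it turns one IPv4 address into a /32 CIDR.
to_cidr32() {
  local ip="$1" octet old_ifs

  # Reject empty input, non-digit characters, and empty fields.
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*)
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac

  # Split on dots into the positional parameters.
  old_ifs="$IFS"; IFS=.
  set -- $ip
  IFS="$old_ifs"

  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi

  for octet in "$@"; do
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      echo "octet out of range in: $ip" >&2
      return 1
    fi
  done

  echo "${ip}/32"
}
```

For example, to_cidr32 203.0.113.7 prints 203.0.113.7/32, which can then be pasted into the CIDR list.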
- Details of the different methods to authenticate with AWS are available in the aws Terraform provider docs.
- The most common ways to specify AWS access and secret keys are:
  - via environment variables (i.e. setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) or;
  - via shared configuration/credential files (e.g. the $HOME/.aws/credentials file). The AWS_PROFILE environment variable can be set to specify a named AWS profile.
- Note that the AWS region to use should always be specified as a Terraform input variable (with the aws_region variable). This region variable is also used as an input to the CDP deploy module to identify the Cloud Provider region.
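To see which of the AWS credential methods described above is configured in the current shell, a small check like the following can be used. It is a sketch only: aws_auth_source is a hypothetical helper, and it does not reproduce the full credential resolution order of the aws Terraform provider.

```shell
# aws_auth_source is a hypothetical helper, not part of this repository.
# It reports the first credential source it finds, checking in a simplified order.
aws_auth_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment variables"
  elif [ -n "${AWS_PROFILE:-}" ]; then
    echo "named profile: ${AWS_PROFILE}"
  elif [ -f "${HOME}/.aws/credentials" ]; then
    echo "shared credentials file"
  else
    echo "no AWS credentials found" >&2
    return 1
  fi
}
```

The function only detects that a source is present; it does not verify that the keys are valid or that they carry the permissions needed to create the Cloud resources.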
- Where you have more than one Azure Subscription, the ID to use can be passed via the ARM_SUBSCRIPTION_ID environment variable.
- When using a Service Principal (SP) to authenticate with Azure, it is not possible to authenticate with the azuread Terraform Provider (the provider used to create the Azure Cross Account AD Application) using the command az login --service-principal. We found that the best way to authenticate using an SP is by setting environment variables. Details of the required environment variables are in the azuread docs and azurerm docs and are summarized below.
  export ARM_CLIENT_ID="<sp_client_id>"
  export ARM_CLIENT_SECRET="<sp_client_secret>"
  export ARM_TENANT_ID="<sp_tenant_id>"
  export ARM_SUBSCRIPTION_ID="<sp_subscription_id>"
- The Azure API permissions listed below are required by the provisioning account to create the Azure pre-requisite resources. Note that all permissions are of type Application (rather than Delegated).
API Permission | Purpose |
---|---|
Microsoft Graph - Application.Read.All | Read all applications |
Microsoft Graph - Application.ReadWrite.All | Read and write all applications |
Microsoft Graph - Application.ReadWrite.OwnedBy | Manage apps that this app creates or owns |
Microsoft Graph - Directory.ReadWrite.All | Read and write directory data |
Microsoft Graph - User.Read.All | Read all users' full profiles |
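For the Service Principal environment variables described above, the following sketch lists any of the four ARM_ variables that are still unset in the current shell. The helper name missing_arm_vars is hypothetical and not part of this repository.

```shell
# missing_arm_vars is a hypothetical helper: it prints each unset ARM_* variable
# and returns non-zero if any of them is missing.
missing_arm_vars() {
  local name value missing=0

  for name in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID; do
    # Indirect lookup of the variable named in $name.
    # The names come from the fixed list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done

  return "$missing"
}
```

An empty output with a zero exit status means all four variables are set; it does not confirm that the Service Principal holds the API permissions in the table.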
- The Getting Started Docs for the Google Terraform Provider give details on the two recommended ways to authenticate with the GCP API.
  - The Google Cloud SDK (gcloud) can be installed and User Application Default Credentials ("ADCs") can be created by running the command gcloud auth application-default login
  - A Google Cloud Service Account key file can be generated and downloaded. The GOOGLE_APPLICATION_CREDENTIALS environment variable can then be set to the location of the file.
    export GOOGLE_APPLICATION_CREDENTIALS=<location_of_gcp_sa_json_file>
- The Google Cloud IAM roles listed below are required by the provisioning account to create the GCP pre-requisite resources.
IAM Role |
---|
Compute Network Admin |
Compute Security Admin |
Role Administrator |
Security Admin |
Service Account Admin |
Service Account Key Admin |
Storage Admin |
Viewer |
- The Google project ID can be specified via the gcp_project input variable, the GOOGLE_PROJECT environment variable or the default project set via the Cloud SDK. This is described in the Google Provider Default Values Configuration documentation.
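To check which of the two GCP authentication methods described above is in place, a sketch like the one below can help. The helper name gcp_auth_source is hypothetical, and the Application Default Credentials path is an assumption based on the usual gcloud location on Linux and macOS; it may differ on other platforms or with a custom gcloud configuration directory.

```shell
# gcp_auth_source is a hypothetical helper, not part of this repository.
gcp_auth_source() {
  # Assumption: default ADC location used by gcloud on Linux and macOS.
  local adc_file="${HOME}/.config/gcloud/application_default_credentials.json"

  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    # A service account key file was requested, so it must exist.
    if [ -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
      echo "service account key file: $GOOGLE_APPLICATION_CREDENTIALS"
    else
      echo "GOOGLE_APPLICATION_CREDENTIALS points to a missing file" >&2
      return 1
    fi
  elif [ -f "$adc_file" ]; then
    echo "user application default credentials: $adc_file"
  else
    echo "no GCP credentials found" >&2
    return 1
  fi
}
```

As with the other checks, this only detects that credentials are present; it does not confirm that the account holds the IAM roles listed above.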