- 08/2022 - Initial Release
This collection of Cloud Pak for Data - Data Fabric Terraform automation layers has been crafted from a set of Terraform modules created by the IBM GSI Ecosystem Lab team, part of the IBM Partner Ecosystem organization. Please contact Matthew Perrins mjperrin@us.ibm.com, Andrew Trice amtrice@us.ibm.com, Sumeet Kapoor sumeet_kapoor@in.ibm.com, Sasikanth Gumpana bgumpana@in.ibm.com, or Snehal Pansare spansari@in.ibm.com for more details, or raise an issue on the repository.
The automation supports the installation of the Data Fabric Solution on three cloud platforms (AWS, Azure, and IBM Cloud). The Data Fabric Solution on Cloud Pak for Data requires additional tools, services, or cartridges to be installed, such as Watson Knowledge Catalog, Watson Studio, Watson Machine Learning, and Data Virtualization, which together make up the multi-product Data Fabric solution.
The Cloud Pak for Data 4.0 - Data Fabric automation assumes you already have an OpenShift cluster configured on your cloud of choice. The supported managed options are ROSA for AWS, ARO for Azure, and ROKS for IBM Cloud.
Before you start to install and configure Cloud Pak for Data, you will need to identify what your target infrastructure is going to be. You can start from scratch and use one of the pre-defined reference architectures from IBM or bring your own.
The reference architectures are provided in three different forms, with increasing security and associated sophistication to support production configuration. These three forms are as follows:
- Quick Start - a simple architecture to quickly get an OpenShift cluster provisioned
- Standard - a standard production deployment environment with typical security protections, private endpoints, VPN server, key management encryption, etc.
- Advanced - a more advanced deployment that employs network isolation to securely route traffic between the different layers.
For each of these reference architectures, we have provided a detailed set of automation to create the environment for the software. If you do not have an OpenShift environment provisioned, please use one of these; they are optimized for the installation of this solution.
Note: For the base platform, the Cloud Pak for Data 4.0 system requirements recommend at least 3 worker nodes, each with a minimum of 16 vCPUs and 64 GB RAM (128 GB RAM is recommended). The Data Fabric Solution requires a total of 8 worker nodes, each with a minimum of 16 vCPUs and 64 GB RAM.
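You can check an existing cluster against these numbers with a short script like the following. It is a minimal sketch that assumes you are logged in with oc, that worker nodes carry the standard node-role.kubernetes.io/worker label, and that node memory capacity is reported in Ki. The 60 GiB tolerance for a nominal 64 GB node is our own assumption, not an IBM figure.

```shell
#!/usr/bin/env bash
# Sketch: compare worker node capacity with the Data Fabric sizing note
# (8 workers, each with 16 vCPU and 64 GB RAM). Assumes memory is reported in Ki.
set -euo pipefail

MIN_NODES=8
MIN_CPU=16
# Assumed tolerance: a nominal 64 GB node usually reports slightly less capacity.
MIN_MEM_KI=$((60 * 1024 * 1024))

# Reads "<cpu> <memory>Ki" lines on stdin and prints how many nodes qualify.
count_qualifying_nodes() {
  local count=0 cpu mem
  while read -r cpu mem; do
    [[ -z "$cpu" ]] && continue
    mem="${mem%Ki}"
    if (( cpu >= MIN_CPU && mem >= MIN_MEM_KI )); then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Only query the cluster when explicitly asked to.
if [[ "${1:-}" == "--check" ]]; then
  found=$(oc get nodes -l node-role.kubernetes.io/worker \
    -o jsonpath='{range .items[*]}{.status.capacity.cpu} {.status.capacity.memory}{"\n"}{end}' \
    | count_qualifying_nodes)
  echo "Qualifying worker nodes: ${found} (${MIN_NODES} required)"
  # Exit non-zero when the cluster is too small.
  if (( found < MIN_NODES )); then
    exit 1
  fi
fi
```

Save it under any name and run it with the --check argument. Without that argument it only defines the function, so you can reuse it in other scripts.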
Cloud Platform | Automation and Documentation |
---|---|
IBM Cloud | IBM Cloud Quick Start, IBM Cloud Standard, IBM Cloud Advanced |
AWS | AWS Quick Start, AWS Standard, AWS Advanced |
Azure | Azure Quick Start, Azure Standard, Azure Advanced |
Bring Your Own Infrastructure | You will need a cluster with at least 8 nodes, each with at least 16 CPUs and 64 GB of memory, to set up the Data Fabric Solution on Cloud Pak for Data. |
Within this repository you will find a set of Terraform template bundles that embody best practices for configuring and setting up the Data Fabric in multiple cloud environments. This README.md describes the SRE steps required to configure and set up the Data Fabric Solution.
This suite of automation can be used for a Proof of Technology environment to configure and set up the Data Fabric Solution with a fully working end-to-end cloud-native environment. The software installs using GitOps best practices with Red Hat OpenShift GitOps.
The following reference architecture represents the logical view of how the Data Fabric Solution works after it is installed. Data Fabric is configured on top of Data Foundation, which is deployed with either Portworx or OpenShift Data Foundation storage within an OpenShift cluster on the cloud provider of your choice.
The following instructions will help you configure and set up the Data Fabric Solution in AWS, Azure, and IBM Cloud OpenShift Kubernetes environments.
Details on Cloud Pak for Data licensing are available at https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=planning-licenses-entitlements
You must have your IBM entitlement API key to access images in the IBM Entitled Registry.
After you purchase Cloud Pak for Data, an entitlement API key for the software is associated with your My IBM account. You need this key to complete the Cloud Pak for Data installation. To obtain the entitlement key, complete the following steps:
- Log in to Container software library on My IBM with the IBM ID and password that are associated with the entitled software.
- On the Get entitlement key tab, select Copy key to copy the entitlement key to the clipboard.
- Save the API key for later in this installation.
The Data Fabric automation is broken into what we call layers of automation, or bundles. The bundles enable SRE activities to be optimized. The automation is generic across clouds, except for the storage configuration options, which are platform specific.
BOM ID | Name | Description | Run Time |
---|---|---|---|
200 | 200 - OpenShift GitOps | Set up OpenShift GitOps tools in an OpenShift cluster. This is required to install the software using GitOps approaches. | 10 Mins |
210 | 210 - IBM Portworx Storage, 210 - IBM OpenShift Data Foundation, 210 - AWS Portworx Storage, 210 - Azure Portworx Storage | Use this automation to deploy a storage solution for your cluster. | 10 Mins |
300 | 300 - Cloud Pak for Data Entitlement | Update the OpenShift Cluster with your entitlement key | 5 Mins |
305 | 305 - Cloud Pak for Data Foundation | Deploy the Cloud Pak for Data (4.0) Foundation components | 45 Mins |
600 | 600 - datafabric-services-odf, 600 - datafabric-services-portworx | Deploy the Data Fabric Services with ODF or Portworx | 120 Mins |
610 | 610 - datafabric-setup | Deploy the Data Fabric Setup | 5 Mins |
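Adding up the run times in the table gives a rough estimate of a full install. The short sketch below does the arithmetic. It counts one storage variant and one 600 variant, because only one of each is applied, and the bundle labels are shortened for readability.

```shell
#!/usr/bin/env bash
# Sketch: total the approximate run times from the bundle table above.
set -euo pipefail

# "<bundle>:<minutes>" pairs copied from the table (labels shortened);
# only one 210 storage variant and one 600 services variant is applied.
BUNDLES=(
  "200-openshift-gitops:10"
  "210-storage:10"
  "300-cloud-pak-for-data-entitlement:5"
  "305-cloud-pak-for-data-foundation:45"
  "600-datafabric-services:120"
  "610-datafabric-setup:5"
)

# Sums the minutes after the colon in each entry.
total_minutes() {
  local entry total=0
  for entry in "${BUNDLES[@]}"; do
    total=$((total + ${entry##*:}))
  done
  echo "$total"
}

t=$(total_minutes)
# Prints: Estimated total: 195 min (about 3h 15m)
printf 'Estimated total: %d min (about %dh %02dm)\n' "$t" $((t / 60)) $((t % 60))
```

The result falls within the 3 to 4 hour estimate given for the full deployment.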
At this time, the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped container image or with native tools installed. We provide a container image that has all the common SRE tools installed (see CLI Tools Image and Source Code for CLI Tools).
Before you start the installation, please install the prerequisite tools on your machine.
We have tested this on a modern Mac laptop, and testing on M1 machines is in progress. On an M1 Mac, you will need to set up the tools natively in macOS rather than run the launch.sh script.
Please install the following prerequisite tools to help you get started with the SRE tasks for installing Data Foundation into an existing OpenShift cluster on AWS, Azure, or IBM Cloud.
Pre-requisites:
- Check that you have a valid GitHub ID that can be used to create a repository in your own organization's GitHub or GitHub Enterprise account.
- Install a code editor; we recommend Visual Studio Code.
- Install Brew.
Ensure you have the following before continuing:
- A GitHub account exists.
- A GitHub token is available with permissions set to create and remove repositories.
- You are able to log in to the OpenShift cluster and obtain an OpenShift login token.
- You have a Cloud Pak entitlement key, which can be obtained by visiting the IBM Container Library as described above.
The installation process will use a standard GitOps repository that has been built using the modules to support Data Foundation installation. The automation is consistent across three cloud environments: AWS, Azure, and IBM Cloud.
At this time, the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped Docker image or a virtual machine. We provide both a container image and a virtual machine cloud-init script that have all the common SRE tools installed.
We recommend using Docker Desktop if you choose the container image method, and Multipass if you choose the virtual machine method. Detailed instructions for downloading and configuring both Docker Desktop and Multipass can be found in RUNTIMES.md.
- The first step is to clone the automation code to your local machine. Run this git command in your favorite command-line shell:

    git clone https://github.com/IBM/automation-data-fabric.git
- Navigate into the automation-data-fabric folder using your command line.
  a. The README.md has comprehensive instructions on how to install this into cloud environments other than TechZone. This document focuses on getting it running in a TechZone-requested environment.
- Next, set up your credentials.properties file. This will enable a secure deployment to your cluster. Copy the template and open the copy in your editor (code is the Visual Studio Code command-line launcher):

    cp credentials.template credentials.properties
    code credentials.properties
  In the credentials.properties file, you will need to populate the values for your deployment.

    ## Add the values for the Credentials to access the OpenShift Environment
    ## Instructions to access this information can be found in the README.MD
    ## This is a template file and the ./launch.sh script looks for a file based on this template named credentials.properties

    ## gitops_repo_host: The host for the git repository
    export TF_VAR_gitops_repo_host=github.com
    ## gitops_repo_username: The username of the user with access to the repository
    export TF_VAR_gitops_repo_username=
    ## gitops_repo_token: The personal access token used to access the repository
    export TF_VAR_gitops_repo_token=

    ## TF_VAR_server_url: The url for the OpenShift api server
    export TF_VAR_server_url=
    ## TF_VAR_cluster_login_token: Token used for authentication to the api server
    export TF_VAR_cluster_login_token=

    ## TF_VAR_entitlement_key: The entitlement key used to access the IBM software images in the container registry. Visit https://myibm.ibm.com/products-services/containerlibrary to get the key
    export TF_VAR_entitlement_key=

    # Only needed if targeting IBM Cloud Deployment
    export TF_VAR_ibmcloud_api_key=

    # AWS Credentials are required to Create AWS S3 bucket and upload Datafiles to the S3 Bucket (https://github.com/IBM/automation-data-fabric/tree/main/610-datafabric-setup/terraform/Datafiles)
    ## particular permissions in order to interact with the account and the OpenShift cluster. Use the
    ## provided `aws-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    export TF_VAR_access_key=
    export TF_VAR_secret_key=

    ##
    ## Azure credentials
    ## Credentials are required to install Portworx on an Azure account. These credentials must have
    ## particular permissions in order to interact with the account and the OpenShift cluster. Use the
    ## provided `azure-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    ## TF_VAR_azure_subscription_id: The subscription id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_subscription_id=
    ## TF_VAR_azure_tenant_id: The tenant id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_tenant_id=
    ## TF_VAR_azure_client_id: The client id of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_id=
    ## TF_VAR_azure_client_secret: The client secret of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_secret=
- Add your GitHub username and your Personal Access Token to gitops_repo_username and gitops_repo_token.
- From your OpenShift console, click on the top-right menu, select Copy login command, and click on Display Token.
- Copy the API Token value into the cluster_login_token value.
- Copy the Server URL into the server_url value, keeping only the part starting with https.
- Copy the entitlement key, which can be obtained by visiting the IBM Container Library, and place it in the entitlement_key variable.
- Provide the IBM Cloud API Key for the target IBM Cloud account as the value for TF_VAR_ibmcloud_api_key.
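The steps above can be partly scripted. This is a minimal sketch that is not part of the repository, and the function names and the --check flag are hypothetical. The parse_login_command function turns the text shown under Display Token (oc login --token=... --server=...) into the two export lines for credentials.properties. The missing_vars function lists required TF_VAR_ values that are still empty after sourcing the file. The required list covers only the cloud-agnostic values from the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper for credentials.properties; not shipped with the automation.
set -euo pipefail

# Cloud-agnostic values from the template; IBM Cloud, AWS and Azure keys are optional.
REQUIRED_VARS=(
  TF_VAR_gitops_repo_host
  TF_VAR_gitops_repo_username
  TF_VAR_gitops_repo_token
  TF_VAR_server_url
  TF_VAR_cluster_login_token
  TF_VAR_entitlement_key
)

# Converts "oc login --token=<token> --server=<url>" into export lines.
parse_login_command() {
  local arg token="" server=""
  local -a args
  read -r -a args <<< "$1"
  for arg in "${args[@]}"; do
    case "$arg" in
      --token=*) token="${arg#--token=}" ;;
      --server=*) server="${arg#--server=}" ;;
    esac
  done
  printf 'export TF_VAR_cluster_login_token=%s\n' "$token"
  printf 'export TF_VAR_server_url=%s\n' "$server"
}

# Prints the name of every required variable that is unset or empty.
missing_vars() {
  local name
  for name in "${REQUIRED_VARS[@]}"; do
    if [[ -z "${!name:-}" ]]; then
      echo "$name"
    fi
  done
}

if [[ "${1:-}" == "--check" ]]; then
  # Load the file into this shell, then report any gaps.
  # shellcheck disable=SC1091
  source ./credentials.properties
  missing=$(missing_vars)
  if [[ -n "$missing" ]]; then
    printf 'Still empty:\n%s\n' "$missing" >&2
    exit 1
  fi
  echo "All required values are set."
fi
```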
If Cloud Pak for Data (CP4D) will be deployed on OpenShift on Azure, you must provide the credentials for the Azure account. Several CLIs are required for these steps:
- az CLI - https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
- jq CLI - https://stedolan.github.io/jq/download/
You can install these CLIs on your local machine OR run the following commands within the provided container image by running launch.sh.
- Log in to your Azure account:

    az login

- Run the azure-portworx-credentials.sh script to gather/create the credentials:

    ./azure-portworx-credentials.sh -t {cluster type} -g {resource group name} -n {cluster name} [-s {subscription id}]

  where:
  - cluster type is the type of OpenShift cluster (aro or ipi).
  - resource group name is the name of the Azure resource group where the cluster has been provisioned.
  - cluster name is the name of the OpenShift cluster.
  - subscription id is the subscription id of the Azure account. If a value is not provided, it will be looked up.
- Update credentials.properties with the values output from the script:

    {
      "azure_client_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_client_secret": "XXXXXXX",
      "azure_tenant_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_subscription_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    }

- If you used the container image to run the script, type exit to close the container shell, then re-run launch.sh to pick up the changes to the environment variables.
- Follow the steps to download the Portworx configuration spec.
- Copy the downloaded file into the root directory of the cloned automation-data-fabric repository.
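Copying the four Azure values by hand is error-prone. The hypothetical azure_json_to_exports function below rewrites each azure_* key into an export TF_VAR_azure_* line that you can paste into credentials.properties. It is a minimal sketch that assumes the script prints flat JSON with string values and no escaped quotes, like the sample above. It relies only on grep and sed. If jq is installed (it is listed as a prerequisite above), a jq filter built on to_entries is a sturdier alternative.

```shell
#!/usr/bin/env bash
# Sketch: turn the JSON printed by azure-portworx-credentials.sh into export lines.
# Assumes flat JSON with string values and no escaped quotes, as in the sample above.
set -euo pipefail

# Reads JSON on stdin; prints one export line per azure_* key.
azure_json_to_exports() {
  grep -o '"azure_[a-z_]*"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/"(azure_[a-z_]+)"[[:space:]]*:[[:space:]]*"([^"]*)"/export TF_VAR_\1="\2"/'
}

# Example (resource group and cluster names are placeholders):
# ./azure-portworx-credentials.sh -t aro -g my-rg -n my-cluster | azure_json_to_exports
```

Review the printed lines before you paste them, and replace the empty Azure entries in the template rather than appending duplicates.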
- Launch the automation runtime.
  - If using Docker Desktop, run ./launch.sh. This will start a container image with the prompt opened in the /terraform directory.
  - If using Multipass, run multipass shell cli-tools to start the interactive shell, and cd into the /automation/{template} directory, where {template} is the folder into which you cloned this repo. Be sure to run source credentials.properties once in the shell.
Next we need to create a workspace to run the Terraform automation. Below you can see the parameters to configure your workspace for terraform execution.

    /terraform $ ./setup-workspace.sh -h
    Creates a workspace folder and populates it with automation bundles you require.
    Usage: setup-workspace.sh
    options:
     -p   Cloud provider (aws, azure, ibm)
     -s   Storage (portworx or odf)
     -n   (optional) prefix that should be used for all variables
     -x   (optional) Portworx spec file - the name of the file containing the Portworx configuration spec yaml
     -c   (optional) Self-signed Certificate Authority issuer CRT file
     -h   Print this help

You will need to select the cloud provider of your choice, storage option, and if desired, a prefix for naming new resource instances on the Cloud account. If you are using Azure, you will need a Portworx spec file name (as described above), and if your cluster is using a self-signed SSL certificate, you will need a copy of the issuer cert and the file name.
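If you script workspace creation, a small guard can catch invalid combinations before setup-workspace.sh runs. The sketch below is hypothetical. It checks only the values listed in the help output above, and its rule that Azure needs an -x spec file comes from the paragraph above, not from the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical guard that builds setup-workspace.sh arguments from simple inputs.
set -euo pipefail

# Usage: setup_args <provider> <storage> [prefix] [portworx-spec-file]
setup_args() {
  local provider="$1" storage="$2" prefix="${3:-}" spec="${4:-}"
  case "$provider" in
    aws|azure|ibm) ;;
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  case "$storage" in
    portworx|odf) ;;
    *) echo "unknown storage: $storage" >&2; return 1 ;;
  esac
  # Per the text above, Azure deployments need the Portworx spec file name.
  if [[ "$provider" == "azure" && -z "$spec" ]]; then
    echo "azure requires a Portworx spec file (-x)" >&2
    return 1
  fi
  local -a args=(-p "$provider" -s "$storage")
  if [[ -n "$prefix" ]]; then args+=(-n "$prefix"); fi
  if [[ -n "$spec" ]]; then args+=(-x "$spec"); fi
  echo "${args[*]}"
}
```

For example, setup_args ibm portworx df prints -p ibm -s portworx -n df, which matches the command used in the next step.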
- Run the command setup-workspace.sh -p ibm -s portworx -n df and include optional parameters as needed.

    /terraform $ ./setup-workspace.sh -p ibm -s portworx -n df
    Setting up workspace in '/terraform/../workspaces/current'
    *****
    Setting up workspace from '' template
    *****
    Setting up automation /workspaces/current /terraform
    Setting up current/200-openshift-gitops from 200-openshift-gitops
    Skipping 210-aws-portworx-storage because it does't match ibm
    Skipping 210-azure-portworx-storage because it does't match ibm
    Setting up current/210-ibm-odf-storage from 210-ibm-odf-storage
    Setting up current/210-ibm-portworx-storage from 210-ibm-portworx-storage
    Setting up current/300-cloud-pak-for-data-entitlement from 300-cloud-pak-for-data-entitlement
    Setting up current/305-cloud-pak-for-data-foundation from 305-cloud-pak-for-data-foundation
    Setting up current/600-datafabric-services-odf [or] 600-datafabric-services-portworx from 600-datafabric-services-odf [or] 600-datafabric-services-portworx
    Setting up current/610-datafabric-setup from 610-datafabric-setup
    move to /workspaces/current this is where your automation is configured
- The default terraform.tfvars file is symbolically linked to the new workspaces/current folder, which enables you to edit the file in your native operating system using your editor of choice.
- Edit the default terraform.tfvars file to set up the GitOps parameters.
You will be prompted for the following variables. The table lists suggested values.
Variable | Description | Suggested Value |
---|---|---|
gitops-repo_host | The host for the git repository. | github.com |
gitops-repo_type | The type of the hosted git repository (github or gitlab). | github |
gitops-repo_org | The org/group/username where the git repository exists | github userid or org - if left blank the value will default to your username |
gitops-repo_repo | The short name of the repository to create | cp4d-gitops |
The gitops-repo_repo, gitops-repo_token, entitlement_key, server_url, and cluster_login_token values will be loaded automatically from the credentials.properties file that was configured in an earlier step.
- The cp4d-instance_storage_vendor variable should have already been populated by the setup-workspace.sh script. It should have the value portworx or ocs, depending on the selected storage option.
- You will see that repo_type and repo_host are set to GitHub. You can change these to other Git providers, like GitHub Enterprise or GitLab.
- Set the repo_org value to your default org name, or specify a custom org value. The GitOps repository will be created in this organization. To find your default organization in GitHub, click on the top-right menu and select Your Profile.
- Set the repo_repo value to a unique name that you will recognize as the place where the GitOps configuration will be placed before Data Foundation is installed into the cluster.
- You can change the gitops-cluster-config_banner_text banner text to something useful for your client project or demo.
- Save the terraform.tfvars file.
- Navigate into the /workspaces/current folder.
  ❗️ Do not skip this step. You must execute from the /workspaces/current folder.
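You can also set these values from the shell instead of an editor. The sketch below shows a hypothetical set_tfvar helper. It assumes simple one-line key = "value" entries. It writes through the existing path rather than replacing the file, so the symbolic link into workspaces/current is preserved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set or add a simple string value in terraform.tfvars.
set -euo pipefail

# Usage: set_tfvar <file> <key> <value>
set_tfvar() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    # Replace an existing, uncommented "key = ..." line.
    $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = \"" v "\""; found = 1; next }
    { print }
    # Append the key if the file did not contain it.
    END { if (!found) print k " = \"" v "\"" }
  ' "$file" > "$tmp"
  # Write through the path instead of using mv, so a symlinked file stays a symlink.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (values from the table above):
# set_tfvar terraform.tfvars gitops-repo_repo cp4d-gitops
```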
- To perform the deployment automatically, execute the ./apply-all.sh script in the /workspaces/current directory. This will apply each of the Data Foundation layers sequentially. This operation will complete in 10-15 minutes, and the Data Foundation installation will continue asynchronously in the background, which can take an additional 45 minutes. Alternatively, you can run each of the layers individually by following the manual deployment instructions.
  Once complete, skip to the Access the Data Foundation Deployment section.
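The following simplified sketch shows what a sequential run of the numbered layers involves. It is not the repository's apply-all.sh and may differ from it. In particular, this naive loop applies every NNN-* folder it finds. If your workspace contains alternative variants (for example both 210-ibm-odf-storage and 210-ibm-portworx-storage), remove the one you do not need first.

```shell
#!/usr/bin/env bash
# Simplified sketch of a sequential layer run; not the repository's apply-all.sh.
set -euo pipefail

# Prints the numbered layer folders (e.g. 200-openshift-gitops) in sorted order.
list_layers() {
  local root="${1:-.}" dir
  for dir in "$root"/[0-9][0-9][0-9]-*/; do
    [[ -d "$dir" ]] || continue
    basename "$dir"
  done
}

# Runs terraform init and apply in each layer; set -e stops at the first failure.
apply_layers() {
  local root="${1:-.}" layer
  for layer in $(list_layers "$root"); do
    echo "==> Applying ${layer}"
    (cd "$root/$layer" && terraform init && terraform apply --auto-approve)
  done
}

if [[ "${1:-}" == "--apply" ]]; then
  apply_layers "${2:-.}"
fi
```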
- You can also deploy each layer manually. To begin, navigate into the 200-openshift-gitops folder and run the following commands:

    cd 200-openshift-gitops
    terraform init
    terraform apply --auto-approve

- This will kick off the automation for setting up the GitOps Operator in your cluster. Once complete, you should see a message similar to:

    Apply complete! Resources: 78 added, 0 changed, 0 destroyed.

- You can check the progress in two places. First, look in your GitHub repository. You will see that a git repository has been created with the name you provided. The Cloud Pak for Data (CP4D) install will populate it with the information OpenShift GitOps needs to install the software. Second, look at the OpenShift console: click Workloads -> Pods and you will see the GitOps operator being installed.
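You can also poll from the command line instead of refreshing the console. The sketch below defines a generic retry helper. The namespace openshift-gitops and the deployment name openshift-gitops-server are the usual defaults of the Red Hat OpenShift GitOps operator, but treat them as assumptions and confirm them in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or a timeout (in seconds) expires.
set -euo pipefail

# Usage: wait_for <timeout-seconds> <interval-seconds> <command> [args...]
wait_for() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@"; do
    if (( waited >= timeout )); then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

if [[ "${1:-}" == "--gitops" ]]; then
  # Assumed default names for the OpenShift GitOps operator; adjust if yours differ.
  wait_for 900 15 oc get deployment/openshift-gitops-server -n openshift-gitops >/dev/null
  oc rollout status deployment/openshift-gitops-server -n openshift-gitops --timeout=10m
fi
```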
- Change directories to the 210-* folder and run the following commands to deploy storage into your cluster. If you are using IBM TechZone's IBM Cloud ROKS OpenShift cluster (VPC), OCS is already configured, and you can skip this step.

    cd ../210-ibm-portworx-storage
    terraform init
    terraform apply --auto-approve

  This folder will vary based on the platform and storage options that you selected in earlier steps.
  Storage configuration will run asynchronously in the background inside the cluster and should be complete within 10 minutes.
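To confirm that the storage layer is usable before moving on, you can look for the expected storage classes. The sketch below assumes the ODF/OCS classes are prefixed ocs-storagecluster, which is the usual naming for OpenShift Data Foundation. Portworx class names depend on how Portworx was installed, so any prefix you pass for Portworx is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: check "oc get storageclass -o name" output for a class name prefix.
set -euo pipefail

# Reads lines like "storageclass.storage.k8s.io/<name>" and succeeds if any name
# starts with the prefix. It consumes all input to avoid SIGPIPE under pipefail.
has_storage_class() {
  local prefix="$1" line found=1
  while read -r line; do
    if [[ "${line##*/}" == "$prefix"* ]]; then
      found=0
    fi
  done
  return "$found"
}

if [[ "${1:-}" == "--check" ]]; then
  # Prefix is an assumption: ocs-storagecluster for ODF, something like portworx for Portworx.
  prefix="${2:-ocs-storagecluster}"
  if oc get storageclass -o name | has_storage_class "$prefix"; then
    echo "Found a storage class starting with ${prefix}"
  else
    echo "No storage class starting with ${prefix} yet" >&2
    exit 1
  fi
fi
```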
- Change directories to the 300-cloud-pak-for-data-entitlement folder and run the following commands to deploy entitlements into your cluster:

    cd ../300-cloud-pak-for-data-entitlement
    terraform init
    terraform apply --auto-approve

  This step does not require worker nodes to be restarted, as some other installation methods describe.
- Change directories to the 305-cloud-pak-for-data-foundation folder and run the following commands to deploy Data Foundation 4.0 into the cluster:

    cd ../305-cloud-pak-for-data-foundation
    terraform init
    terraform apply --auto-approve

  Data Foundation deployment will run asynchronously in the background and may require up to 45 minutes to complete.
- Change directories to the 600-datafabric-services-odf or 600-datafabric-services-portworx folder (matching your storage option) and run the following commands to deploy the Data Fabric Services (WKC, WS, WML, DV) into the cluster:

    cd ../600-datafabric-services-odf    # or: cd ../600-datafabric-services-portworx
    terraform init
    terraform apply --auto-approve

  The Data Fabric Services (WKC, WS, WML, DV, and DV provisioning) will run asynchronously in the background and may require up to 120 minutes to complete.
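Only one of the two 600 folders applies to your cluster, so it can help to derive the folder from the storage option you chose. The mapping below (ocs or odf to the -odf folder, portworx to the -portworx folder) is our reading of the folder names and the cp4d-instance_storage_vendor values mentioned earlier, so treat it as an assumption. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the 600 layer folder from the storage vendor value.
set -euo pipefail

# Assumed mapping from storage vendor to folder name.
datafabric_services_dir() {
  case "$1" in
    ocs|odf) echo "600-datafabric-services-odf" ;;
    portworx) echo "600-datafabric-services-portworx" ;;
    *) echo "unknown storage vendor: $1" >&2; return 1 ;;
  esac
}

# Example, run from a layer folder inside /workspaces/current:
# cd "../$(datafabric_services_dir portworx)" && terraform init && terraform apply --auto-approve
```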
- Change directories to the 610-datafabric-setup folder and run the following commands to:
  a. Create an AWS S3 bucket
  b. Upload the datafiles to the AWS S3 bucket
  c. Configure the Data Fabric Solution on top of Cloud Pak for Data

    cd ../610-datafabric-setup
    terraform init
    terraform apply --auto-approve

  Data Fabric setup will run asynchronously in the background and may require up to 5 minutes to complete.
- You can check the progress of the deployment by opening Argo CD (OpenShift GitOps). From the OpenShift user interface, click on the Application menu (3x3 icon) in the header and select the Cluster Argo CD menu item.
  This process will take between 3 and 4 hours to complete. During the deployment, several cluster projects/namespaces and deployments will be created.
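You can also read Argo CD progress from the command line. The sketch below lists every application that is not yet both Healthy and Synced, using the standard status.health.status and status.sync.status fields of an Argo CD Application. It assumes the cluster Argo CD instance keeps its Application resources in the openshift-gitops namespace; adjust this if your instance differs.

```shell
#!/usr/bin/env bash
# Sketch: list Argo CD applications that are not yet Healthy and Synced.
set -euo pipefail

# Reads "<name>|<health>|<sync>" lines and prints the ones still pending.
pending_apps() {
  local name health sync
  while IFS='|' read -r name health sync; do
    [[ -z "$name" ]] && continue
    if [[ "$health" != "Healthy" || "$sync" != "Synced" ]]; then
      echo "${name} (${health:-unknown}/${sync:-unknown})"
    fi
  done
}

if [[ "${1:-}" == "--check" ]]; then
  # Namespace is an assumption: the default cluster Argo CD instance.
  oc get applications.argoproj.io -n openshift-gitops \
    -o jsonpath='{range .items[*]}{.metadata.name}|{.status.health.status}|{.status.sync.status}{"\n"}{end}' \
    | pending_apps
fi
```

An empty result means every application reports Healthy and Synced.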
- Once deployment is complete, go back into the OpenShift cluster user interface and navigate to view Routes for the cp4d namespace. Here you can see the URL to the deployed Data Foundation instance. Open this URL in a new browser window.
- Navigate to Secrets in the cp4d namespace, and find the admin-user-details secret. Copy the value of the initial_admin_password key inside that secret.
- Go back to the Cloud Pak for Data Foundation instance that you opened in a separate window. Log in using the username admin with the password copied in the previous step.
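You can do the same lookups with oc. The sketch below uses the cp4d namespace, the admin-user-details secret and the initial_admin_password key named in the steps above. It prints the host of the first route in that namespace. If more than one route exists, pick the Cloud Pak for Data route by name instead.

```shell
#!/usr/bin/env bash
# Sketch: print the Cloud Pak for Data URL and initial admin password with oc.
set -euo pipefail

NAMESPACE="cp4d"

# Secret values are base64-encoded in the .data field; decode stdin to plain text.
decode_secret_value() {
  base64 --decode
}

if [[ "${1:-}" == "--show" ]]; then
  # Takes the first route in the namespace (assumption: it is the CP4D route).
  host=$(oc get route -n "$NAMESPACE" -o jsonpath='{.items[0].spec.host}')
  echo "URL: https://${host}"
  oc get secret admin-user-details -n "$NAMESPACE" \
    -o jsonpath='{.data.initial_admin_password}' | decode_secret_value
  echo
fi
```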
This concludes the instructions for installing Data Foundation on AWS, Azure, and IBM Cloud.
Now that the Data Foundation deployment is complete you can deploy Cloud Pak for Data services into this cluster.
Please refer to the Troubleshooting Guide for uninstallation instructions and instructions to correct common issues.
If you continue to experience issues with this automation, please file an issue or reach out on our public Discord server.
This set of automation packages was generated using the open-source iascable tool. This tool enables a Bill of Material YAML file to describe your software requirements. If you want upstream releases or versions, you can use iascable to generate a new Terraform module.
The iascable tool is targeted for use by advanced SRE developers. It requires deep knowledge of how the modules plug together into a customized architecture. This repository is a fully tested output from that tool, which makes it ready for projects to consume.