IBM/automation-data-fabric

Cloud Pak for Data 4.0 - Data Fabric Solution Automation for AWS, Azure, and IBM Cloud

Change Log

  • 08/2022 - Initial Release

This collection of Cloud Pak for Data - Data Fabric Terraform automation layers has been crafted from a set of Terraform modules created by the IBM GSI Ecosystem Lab team, part of the IBM Partner Ecosystem organization. Please contact Matthew Perrins (mjperrin@us.ibm.com), Andrew Trice (amtrice@us.ibm.com), Sumeet Kapoor (sumeet_kapoor@in.ibm.com), Sasikanth Gumpana (bgumpana@in.ibm.com), or Snehal Pansare (spansari@in.ibm.com) for more details, or raise an issue on the repository.

The automation supports the installation of the Data Fabric Solution on three cloud platforms (AWS, Azure, and IBM Cloud). Cloud Pak for Data provides the foundation required to install additional tools, services, or cartridges, such as Watson Knowledge Catalog, Watson Studio, Watson Machine Learning, and Data Virtualization, or multi-product solutions like Data Fabric.

Target Infrastructure

The Cloud Pak for Data 4.0 - Data Fabric automation assumes you have an OpenShift cluster already configured on your cloud of choice. The supported managed options are ROSA for AWS, ARO for Azure, or ROKS for IBM Cloud.

Before you start to install and configure Cloud Pak for Data, you will need to identify what your target infrastructure is going to be. You can start from scratch and use one of the pre-defined reference architectures from IBM or bring your own.

Reference Architectures

The reference architectures are provided in three different forms, with increasing security and associated sophistication to support production configuration. These three forms are as follows:

  • Quick Start - a simple architecture to quickly get an OpenShift cluster provisioned

  • Standard - a standard production deployment environment with typical security protections, private endpoints, VPN server, key management encryption, etc.

  • Advanced - a more advanced deployment that employs network isolation to securely route traffic between the different layers.

For each of these reference architectures, we have provided a detailed set of automation to create the environment for the software. If you do not have an OpenShift environment provisioned, please use one of these; they are optimized for the installation of this solution.

Note: The Cloud Pak for Data 4.0 system requirements recommend at least 3 worker nodes for the base platform, with a minimum of 16 vCPU and 64 GB RAM per node (128 GB RAM is recommended). For the Data Fabric Solution, a total of 8 worker nodes with a minimum of 16 vCPU and 64 GB RAM per node is required.
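If you are bringing your own cluster, a quick way to confirm the worker count and per-node capacity is with the OpenShift CLI; a minimal sketch, assuming you are already logged in with oc:

    # List worker nodes with their CPU and memory capacity
    oc get nodes -l node-role.kubernetes.io/worker \
      -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'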

Cloud Platform Automation and Documentation

IBM Cloud | IBM Cloud Quick Start, IBM Cloud Standard, IBM Cloud Advanced
AWS | AWS Quick Start, AWS Standard, AWS Advanced
Azure | Azure Quick Start, Azure Standard, Azure Advanced
Bring Your Own Infrastructure | You will need a cluster with at least 16 CPUs and 64 GB of memory per node, and at least 8 nodes, to set up the Data Fabric Solution on Cloud Pak for Data.

Getting Started

Within this repository you will find a set of Terraform template bundles that embody best practices for configuring and setting up Data Fabric in multiple cloud environments. This README.md describes the SRE steps required to configure and set up the Data Fabric Solution.

This suite of automation can be used for a Proof of Technology environment to configure and set up the Data Fabric Solution with a fully working end-to-end cloud-native environment. The software installs using GitOps best practices with Red Hat OpenShift GitOps.

Data Fabric Solution Architecture

The following reference architecture represents the logical view of how the Data Fabric Solution works after it is installed. Data Fabric is configured on top of Data Foundation, which is deployed with either Portworx or OpenShift Data Foundation storage, within an OpenShift cluster on the cloud provider of your choice.

Reference Architecture

Deploying Data Fabric Solution

The following instructions will help you configure and set up the Data Fabric Solution on an OpenShift Kubernetes environment on AWS, Azure, or IBM Cloud.

Licenses and Entitlements

Details on Cloud Pak for Data licensing are available at https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=planning-licenses-entitlements

Obtaining your IBM entitlement API key

You must have your IBM entitlement API key to access images in the IBM Entitled Registry.

After you purchase Cloud Pak for Data, an entitlement API key for the software is associated with your My IBM account. You need this key to complete the Cloud Pak for Data installation. To obtain the entitlement key, complete the following steps:

  • Log in to Container software library on My IBM with the IBM ID and password that are associated with the entitled software.
  • On the Get entitlement key tab, select Copy key to copy the entitlement key to the clipboard.
  • Save the API key for later in this installation.
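Optionally, you can sanity-check the key before proceeding by logging in to the IBM Entitled Registry with it; a minimal sketch using the docker CLI (podman works the same way):

    # The Entitled Registry is cp.icr.io; the username is the literal string "cp" and the password is your entitlement key
    docker login cp.icr.io -u cp -p <your-entitlement-key>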

Data Fabric Layered Installation

The Data Fabric automation is broken into what we call layers of automation, or bundles. The bundles enable SRE activities to be optimized. The automation is generic across clouds, apart from the storage configuration options, which are platform specific.

BOM ID | Name | Description | Run Time
200 | 200 - OpenShift GitOps | Set up OpenShift GitOps tools in an OpenShift cluster. This is required to install the software using GitOps approaches. | 10 Mins
210 | 210 - IBM Portworx Storage / 210 - IBM OpenShift Data Foundation / 210 - AWS Portworx Storage / 210 - Azure Portworx Storage | Use this automation to deploy a storage solution for your cluster. | 10 Mins
300 | 300 - Cloud Pak for Data Entitlement | Update the OpenShift cluster with your entitlement key | 5 Mins
305 | 305 - Cloud Pak for Data Foundation | Deploy the Cloud Pak for Data (4.0) Foundation components | 45 Mins
600 | 600 - datafabric-services-odf / 600 - datafabric-services-portworx | Deploy the Data Fabric Services with ODF or Portworx | 120 Mins
610 | 610 - datafabric-setup | Deploy the Data Fabric setup | 5 Mins

At this time the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped container image or with natively installed tools. We provide a container image that has all the common SRE tools installed: CLI Tools Image, Source Code for CLI Tools.

Installation Steps

Before you start the installation, please install the pre-req tools on your machine.

We have tested this on a modern Mac laptop and are testing on M1 machines. On an M1 Mac you will need to set up the tools natively in macOS and not run the launch.sh script.

Pre-Req Setup

Please install the following Pre-Req tools to help you get started with the SRE tasks for installing Data Foundation into an existing OpenShift Cluster on AWS, Azure, or IBM Cloud.

Pre-requisites:

  • Check that you have a valid GitHub ID that can be used to create a repository in your own GitHub organization or GitHub Enterprise account.

  • Install a code editor; we recommend Visual Studio Code.

  • Install Brew
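If you are starting from a clean macOS machine, the editor and the runtimes referenced later can be installed with Brew; an illustrative sketch (the cask names are the standard Homebrew ones, install only what you plan to use):

    # Code editor plus the container/VM runtimes used later in this README
    brew install --cask visual-studio-code docker multipass
    # Git is needed to clone this repository
    brew install git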

Ensure you have the following before continuing:

  • A GitHub account exists

  • A GitHub token is available with permissions set to create and remove repositories

  • You are able to log in to the OpenShift cluster and obtain an OpenShift login token

  • A Cloud Pak entitlement key, which can be obtained by visiting the IBM Container Library as described above

Installing Data Foundation

The installation process uses a standard GitOps repository that has been built using the modules to support the Data Foundation installation. The automation is consistent across the three cloud environments: AWS, Azure, and IBM Cloud.

Set up the runtime environment

At this time the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped docker image or a virtual machine. We provide both a container image and a virtual machine cloud-init script that have all the common SRE tools installed.

We recommend using Docker Desktop if choosing the container image method, and Multipass if choosing the virtual machine method. Detailed instructions for downloading and configuring both Docker Desktop and Multipass can be found in RUNTIMES.md.
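Before continuing, it is worth confirming that the runtime you chose is working; for example:

    # Container image method
    docker version
    # Virtual machine method
    multipass version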

Set up environment credentials

  1. The first step is to clone the automation code to your local machine. Run this git command in your favorite command line shell.

    git clone https://github.com/IBM/automation-data-fabric.git
    
  2. Navigate into the automation-data-fabric folder using your command line.

    a. The README.md has comprehensive instructions on how to install this into cloud environments other than TechZone. This document focuses on getting it running in a TechZone-requested environment.

  3. Next you will need to set up your credentials.properties file. This will enable a secure deployment to your cluster.

    cp credentials.template credentials.properties
    code credentials.properties

    In the credentials.properties file you will need to populate the values for your deployment.

    ## Add the values for the Credentials to access the OpenShift Environment
    ## Instructions to access this information can be found in the README.MD
    ## This is a template file and the ./launch.sh script looks for a file based on this template named credentials.properties
    
    ## gitops_repo_host: The host for the git repository
    export TF_VAR_gitops_repo_host=github.com
    ## gitops_repo_username: The username of the user with access to the repository
    export TF_VAR_gitops_repo_username=
    ## gitops_repo_token: The personal access token used to access the repository
    export TF_VAR_gitops_repo_token=
    
    ## TF_VAR_server_url: The url for the OpenShift api server
    export TF_VAR_server_url=
    ## TF_VAR_cluster_login_token: Token used for authentication to the api server
    export TF_VAR_cluster_login_token=
    
    ## TF_VAR_entitlement_key: The entitlement key used to access the IBM software images in the container registry. Visit https://myibm.ibm.com/products-services/containerlibrary to get the key
    export TF_VAR_entitlement_key=
    
    # Only needed if targeting IBM Cloud Deployment
    export TF_VAR_ibmcloud_api_key=
    
    ##
    ## AWS credentials
    ## Credentials are required to create the AWS S3 bucket and upload the Datafiles to the S3 bucket
    ## (https://github.com/IBM/automation-data-fabric/tree/main/610-datafabric-setup/terraform/Datafiles).
    ## These credentials must have particular permissions in order to interact with the account and the
    ## OpenShift cluster. Use the provided `aws-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    export TF_VAR_access_key=
    export TF_VAR_secret_key=
    
    
    ##
    ## Azure credentials
    ## Credentials are required to install Portworx on an Azure account. These credentials must have
    ## particular permissions in order to interact with the account and the OpenShift cluster. Use the
    ## provided `azure-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    
    ## TF_VAR_azure_subscription_id: The subscription id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_subscription_id=
    ## TF_VAR_azure_tenant_id: The tenant id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_tenant_id=
    ## TF_VAR_azure_client_id: The client id of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_id=
    ## TF_VAR_azure_client_secret: The client secret of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_secret=
    
  4. Add your GitHub username and your Personal Access Token to gitops_repo_username and gitops_repo_token

  5. From your OpenShift console, click the top right menu, select Copy login command, and click Display Token

  6. Copy the API Token value into the cluster_login_token value

  7. Copy the Server URL into the server_url value, only the part starting with https

  8. Copy the entitlement key, which can be obtained by visiting the IBM Container Library as described above, and place it in the entitlement_key variable.
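If you are already logged in to the cluster with the oc CLI, the server URL and login token from steps 5-7 can also be read from the command line; a minimal sketch:

    # API server URL for TF_VAR_server_url
    oc whoami --show-server
    # Login token for TF_VAR_cluster_login_token
    oc whoami --show-token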

Configure Storage

Deploying on IBM Cloud (Portworx or ODF)
  1. Provide the IBM Cloud API Key for the target IBM Cloud account as the value for TF_VAR_ibmcloud_api_key
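If you do not already have an API key, one can be created with the IBM Cloud CLI; an illustrative sketch (the key name and description are arbitrary):

    # Create an API key and write it to a local file; copy its value into TF_VAR_ibmcloud_api_key
    ibmcloud iam api-key-create data-fabric-automation -d "Data Fabric automation" --file ibmcloud-api-key.json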
Deploying on Azure (Portworx)

If Cloud Pak for Data (CP4D) will be deployed on OpenShift running on Azure, the credentials for the Azure account need to be provided. Several CLIs are required for these steps.

You can install these CLIs on your local machine, or run the following commands within the provided container image by running launch.sh.

  1. Log into your Azure account

    az login
  2. Run the azure-portworx-credentials.sh script to gather/create the credentials:

    ./azure-portworx-credentials.sh -t {cluster type} -g {resource group name} -n {cluster name} [-s {subscription id}]

    where:

    • cluster type is the type of OpenShift cluster (aro or ipi).
    • resource group name is the name of the Azure resource group where the cluster has been provisioned.
    • cluster name is the name of the OpenShift cluster.
    • subscription id is the subscription id of the Azure account. If a value is not provided it will be looked up.
  3. Update credentials.properties with the values output from the script.

    {
      "azure_client_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_client_secret": "XXXXXXX",
      "azure_tenant_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_subscription_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    }
  4. If you used the container image to run the script, type exit to close the container shell, then re-run launch.sh to pick up the changes to the environment variables.
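For example, steps 1 and 2 for an ARO cluster might look like the following (the resource group and cluster names are placeholders):

    az login
    # Gather/create the service principal credentials for the cluster
    ./azure-portworx-credentials.sh -t aro -g my-resource-group -n my-aro-cluster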

Configure the automation

Get the Portworx configuration spec (for AWS or Azure deployments)
  1. Follow the steps to download the Portworx configuration spec
  2. Copy the downloaded file into the root directory of the cloned automation-data-fabric repository
Set up the automation workspace
  1. Launch the automation runtime.

    • If using Docker Desktop, run ./launch.sh. This will start a container image with the prompt opened in the /terraform directory.
    • If using Multipass, run multipass shell cli-tools to start the interactive shell, and cd into the /automation/{template} directory, where {template} is the folder into which you cloned this repo. Be sure to run source credentials.properties once in the shell.
  2. Next we need to create a workspace to run the Terraform automation. Below you can see the parameters to configure your workspace for terraform execution.

    /terraform $ ./setup-workspace.sh -h
    Creates a workspace folder and populates it with automation bundles you require.
     
    Usage: setup-workspace.sh
    options:
    -p     Cloud provider (aws, azure, ibm)
    -s     Storage (portworx or odf)
    -n     (optional) prefix that should be used for all variables
    -x     (optional) Portworx spec file - the name of the file containing the Portworx configuration spec yaml
    -c     (optional) Self-signed Certificate Authority issuer CRT file
    -h     Print this help
    

    You will need to select the cloud provider of your choice, the storage option, and, if desired, a prefix for naming new resource instances on the cloud account. If you are using Azure, you will need a Portworx spec file name (as described above), and if your cluster is using a self-signed SSL certificate, you will need a copy of the issuer certificate and its file name.

  3. Run the command setup-workspace.sh -p ibm -s portworx -n df and include optional parameters as needed.

    /terraform $ ./setup-workspace.sh -p ibm -s portworx -n df
    Setting up workspace in '/terraform/../workspaces/current'
    *****
    Setting up workspace from '' template
    *****
    Setting up automation  /workspaces/current
    /terraform
    Setting up current/200-openshift-gitops from 200-openshift-gitops
      Skipping 210-aws-portworx-storage because it does't match ibm
      Skipping 210-azure-portworx-storage because it does't match ibm
    Setting up current/210-ibm-odf-storage from 210-ibm-odf-storage
    Setting up current/210-ibm-portworx-storage from 210-ibm-portworx-storage
    Setting up current/300-cloud-pak-for-data-entitlement from 300-cloud-pak-for-data-entitlement
    Setting up current/305-cloud-pak-for-data-foundation from 305-cloud-pak-for-data-foundation
    Setting up current/600-datafabric-services-odf [or] 600-datafabric-services-portworx from 600-datafabric-services-odf [or] 600-datafabric-services-portworx
    Setting up current/610-datafabric-setup from 610-datafabric-setup
    move to /workspaces/current this is where your automation is configured
    
  4. The default terraform.tfvars file is symbolically linked into the new workspaces/current folder, which enables you to edit the file in your native operating system using your editor of choice.

  5. Edit the default terraform.tfvars file; this will enable you to set up the GitOps parameters.

You will be prompted for the following variables; some suggested values are shown.

Variable | Description | Suggested Value
gitops-repo_host | The host for the git repository. | github.com
gitops-repo_type | The type of the hosted git repository (github or gitlab). | github
gitops-repo_org | The org/group/username where the git repository exists. | GitHub userid or org - if left blank the value will default to your username
gitops-repo_repo | The short name of the repository to create. | cp4d-gitops

The gitops-repo_repo, gitops-repo_token, entitlement_key, server_url, and cluster_login_token values will be loaded automatically from the credentials.properties file that was configured in an earlier step.

  1. The cp4d-instance_storage_vendor variable should have already been populated by the setup-workspace.sh script. This should have the value portworx or ocs, depending on the selected storage option.

  2. You will see that the repo_type and repo_host are set to GitHub; you can change these to other Git providers, like GitHub Enterprise or GitLab.

  3. For the repo_org value, set it to your default org name or specify a custom org value. This is the organization where the GitOps repository will be created. Click the top right menu and select Your Profile to go to your default organization.

  4. Set the repo_repo value to a unique name that you will recognize as the place where the GitOps configuration is going to be placed before Data Foundation is installed into the cluster.

  5. You can change the gitops-cluster-config_banner_text banner text to something useful for your client project or demo.

  6. Save the terraform.tfvars file

  7. Navigate into the /workspaces/current folder

    ❗️ Do not skip this step. You must execute from the /workspaces/current folder.

Automated Deployment
  1. To perform the deployment automatically, execute the ./apply-all.sh script in the /workspaces/current directory. This will apply each of the Data Foundation layers sequentially. The script completes in 10-15 minutes, and the Data Foundation install will continue asynchronously in the background; this can take an additional 45 minutes.

    Alternatively you can run each of the layers individually, by following the manual deployment instructions.

    Once complete, skip to the Access the Data Foundation Deployment section
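In the runtime shell, the automated path boils down to the following (assuming the workspace was set up as described above):

    cd /workspaces/current
    ./apply-all.sh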

Manual Deployment
  1. You can also deploy each layer manually. To begin, navigate into the 200-openshift-gitops folder and run the following commands

    cd 200-openshift-gitops
    terraform init
    terraform apply --auto-approve
    
  2. This will kick off the automation for setting up the GitOps operator in your cluster. Once complete, you should see a message similar to:

    Apply complete! Resources: 78 added, 0 changed, 0 destroyed.
    
  3. You can check the progress by looking in two places. First, look in your GitHub repository; you will see that the Git repository has been created based on the name you provided. The Cloud Pak for Data (CP4D) install will populate this with information that lets OpenShift GitOps install the software. Second, look at the OpenShift console: click Workloads->Pods and you will see the GitOps operator being installed.

  4. Change directories to the 210-* folder and run the following commands to deploy storage into your cluster:

    If you are using IBM TechZone's IBM Cloud ROKS OpenShift Cluster (VPC), OCS is already configured. You can skip this step.

    cd 210-ibm-portworx-storage
    terraform init
    terraform apply --auto-approve
    

    This folder will vary based on the platform and storage options that you selected in earlier steps.

    Storage configuration will run asynchronously in the background inside of the Cluster and should be complete within 10 minutes.

  5. Change directories to the 300-cloud-pak-for-data-entitlement folder and run the following commands to deploy entitlements into your cluster:

    cd ../300-cloud-pak-for-data-entitlement
    terraform init
    terraform apply --auto-approve
    

    This step does not require worker nodes to be restarted as some other installation methods describe.

  6. Change directories to the 305-cloud-pak-for-data-foundation folder and run the following commands to deploy Data Foundation 4.0 into the cluster.

    cd ../305-cloud-pak-for-data-foundation
    terraform init
    terraform apply --auto-approve
    

    Data Foundation deployment will run asynchronously in the background, and may require up to 45 minutes to complete.

  7. Change directories to the 600-datafabric-services-odf [or] 600-datafabric-services-portworx folder and run the following commands to deploy Data Fabric Services (WKC, WS, WML, DV) into the cluster.

    cd ../600-datafabric-services-odf [or] 600-datafabric-services-portworx
    terraform init
    terraform apply --auto-approve
    

    Data Fabric Services (WKC, WS, WML, DV & DV provision) will run asynchronously in the background, and may require up to 120 minutes to complete.

  8. Change directories to the 610-datafabric-setup folder and run the following commands to (a) create the AWS S3 bucket, (b) upload the Datafiles to the AWS S3 bucket, and (c) configure the Data Fabric Solution on top of Cloud Pak for Data.

    cd ../610-datafabric-setup
    terraform init
    terraform apply --auto-approve
    

    Data Fabric setup will run asynchronously in the background, and may require up to 5 minutes to complete.

  9. You can check the progress of the deployment by opening up Argo CD (OpenShift GitOps). From the OpenShift user interface, click the Application menu (3x3 icon) on the header and select the Cluster Argo CD menu item.

    This process will take between 3 to 4 hours to complete. During the deployment, several cluster projects/namespaces and deployments will be created.
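Progress can also be watched from the command line; a minimal sketch, assuming the OpenShift GitOps operator's default openshift-gitops namespace:

    # Argo CD (OpenShift GitOps) pods
    oc get pods -n openshift-gitops
    # Argo CD applications created from the GitOps repository
    oc get applications.argoproj.io -A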

Access the Data Foundation Deployment
  1. Once deployment is complete, go back into the OpenShift cluster user interface and navigate to view the Routes for the cp4d namespace. Here you can see the URL of the deployed Data Foundation instance. Open this URL in a new browser window.


  2. Navigate to Secrets in the cp4d namespace and find the admin-user-details secret. Copy the value of the initial_admin_password key inside that secret.

  3. Go back to the Cloud Pak for Data Foundation instance that you opened in a separate window. Log in using the username admin with the password copied in the previous step.
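The same route and password can be pulled with the CLI; a minimal sketch, assuming the cp4d namespace used above:

    # URL of the deployed instance
    oc get routes -n cp4d
    # Initial admin password from the admin-user-details secret
    oc extract secret/admin-user-details -n cp4d --keys=initial_admin_password --to=-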

Summary

This concludes the instructions for installing Data Foundation on AWS, Azure, and IBM Cloud.

Now that the Data Foundation deployment is complete you can deploy Cloud Pak for Data services into this cluster.

Uninstalling & Troubleshooting

Please refer to the Troubleshooting Guide for uninstallation instructions and instructions to correct common issues.

If you continue to experience issues with this automation, please file an issue or reach out on our public Discord server.

How to Generate this repository from the source Bill of Materials

This set of automation packages was generated using the open-source iascable tool. This tool enables a Bill of Materials yaml file to describe your software requirements. If you want upstream releases or versions, you can use iascable to generate a new Terraform module.

The iascable tool is targeted for use by advanced SRE developers. It requires deep knowledge of how the modules plug together into a customized architecture. This repository is a fully tested output from that tool, which makes it ready for projects to consume.

About

Automation to provision Data Fabric on an OpenShift cluster
