IBM/automation-data-fabric

Cloud Pak for Data 4.0 - Data Fabric Solution Automation for AWS, Azure, and IBM Cloud

Change Log

  • 08/2022 - Initial Release

This collection of Cloud Pak for Data - Data Fabric Terraform automation layers has been crafted from a set of Terraform modules created by the IBM GSI Ecosystem Lab team, part of the IBM Partner Ecosystem organization. Please contact Matthew Perrins (mjperrin@us.ibm.com), Andrew Trice (amtrice@us.ibm.com), Sumeet Kapoor (sumeet_kapoor@in.ibm.com), Sasikanth Gumpana (bgumpana@in.ibm.com), or Snehal Pansare (spansari@in.ibm.com) for more details, or raise an issue on the repository.

The automation supports the installation of the Data Fabric Solution on three cloud platforms (AWS, Azure, and IBM Cloud). Cloud Pak for Data provides the foundation required to install additional tools, services, or cartridges, such as Watson Knowledge Catalog, Watson Studio, Watson Machine Learning, and Data Virtualization, or multi-product solutions like Data Fabric.

Target Infrastructure

The Cloud Pak for Data 4.0 - Data Fabric automation assumes you have an OpenShift cluster already configured on your cloud of choice. The supported managed options are ROSA for AWS, ARO for Azure, or ROKS for IBM Cloud.

Before you start to install and configure Cloud Pak for Data, you will need to identify what your target infrastructure is going to be. You can start from scratch and use one of the pre-defined reference architectures from IBM or bring your own.

Reference Architectures

The reference architectures are provided in three different forms, with increasing security and associated sophistication to support production configuration. These three forms are as follows:

  • Quick Start - a simple architecture to quickly get an OpenShift cluster provisioned

  • Standard - a standard production deployment environment with typical security protections, private endpoints, VPN server, key management encryption, etc.

  • Advanced - a more advanced deployment that employs network isolation to securely route traffic between the different layers.

For each of these reference architectures, we have provided a detailed set of automation to create the environment for the software. If you do not have an OpenShift environment provisioned, please use one of these; they are optimized for the installation of this solution.

Note: The Cloud Pak for Data 4.0 system requirements recommend at least 3 worker nodes for the base platform, with a minimum of 16 vCPU and 64 GB RAM per node (128 GB RAM is recommended). For the Data Fabric Solution, a total of 8 worker nodes with a minimum of 16 vCPU and 64 GB RAM per node is required.
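If you are bringing your own cluster, a quick way to confirm the worker count and per-node capacity is with the OpenShift CLI; a minimal sketch, assuming you are already logged in with oc:

    # List worker nodes with their CPU and memory capacity
    oc get nodes -l node-role.kubernetes.io/worker \
      -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'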

Cloud Platform Automation and Documentation

IBM Cloud | IBM Cloud Quick Start, IBM Cloud Standard, IBM Cloud Advanced
AWS | AWS Quick Start, AWS Standard, AWS Advanced
Azure | Azure Quick Start, Azure Standard, Azure Advanced
Bring Your Own Infrastructure | You will need a cluster with at least 16 CPUs and 64 GB of memory per node, and at least 8 nodes, to set up the Data Fabric Solution on Cloud Pak for Data.

Getting Started

Within this repository you will find a set of Terraform template bundles that embody best practices for configuring and setting up Data Fabric in multiple cloud environments. This README.md describes the SRE steps required to configure and set up the Data Fabric Solution.

This suite of automation can be used for a Proof of Technology environment to configure and set up the Data Fabric Solution with a fully working end-to-end cloud-native environment. The software installs using GitOps best practices with Red Hat OpenShift GitOps.

Data Fabric Solution Architecture

The following reference architecture represents the logical view of how the Data Fabric Solution works after it is installed. Data Fabric is configured on top of Data Foundation, which is deployed with either Portworx or OpenShift Data Foundation storage, within an OpenShift cluster on the cloud provider of your choice.

Reference Architecture

Deploying Data Fabric Solution

The following instructions will help you configure and set up the Data Fabric Solution on an OpenShift Kubernetes environment on AWS, Azure, or IBM Cloud.

Licenses and Entitlements

Details on Cloud Pak for Data licensing are available at https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=planning-licenses-entitlements

Obtaining your IBM entitlement API key

You must have your IBM entitlement API key to access images in the IBM Entitled Registry.

After you purchase Cloud Pak for Data, an entitlement API key for the software is associated with your My IBM account. You need this key to complete the Cloud Pak for Data installation. To obtain the entitlement key, complete the following steps:

  • Log in to Container software library on My IBM with the IBM ID and password that are associated with the entitled software.
  • On the Get entitlement key tab, select Copy key to copy the entitlement key to the clipboard.
  • Save the API key for later in this installation.
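Optionally, you can sanity-check the key before proceeding by logging in to the IBM Entitled Registry with it; a minimal sketch using the docker CLI (podman works the same way):

    # The Entitled Registry is cp.icr.io; the username is the literal string "cp" and the password is your entitlement key
    docker login cp.icr.io -u cp -p <your-entitlement-key>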

Data Fabric Layered Installation

The Data Fabric automation is broken into what we call layers of automation, or bundles. The bundles enable SRE activities to be optimized. The automation is generic across clouds, apart from the storage configuration options, which are platform specific.

BOM ID | Name | Description | Run Time
200 | 200 - OpenShift GitOps | Set up OpenShift GitOps tools in an OpenShift cluster. This is required to install the software using GitOps approaches. | 10 Mins
210 | 210 - IBM Portworx Storage / 210 - IBM OpenShift Data Foundation / 210 - AWS Portworx Storage / 210 - Azure Portworx Storage | Use this automation to deploy a storage solution for your cluster. | 10 Mins
300 | 300 - Cloud Pak for Data Entitlement | Update the OpenShift cluster with your entitlement key | 5 Mins
305 | 305 - Cloud Pak for Data Foundation | Deploy the Cloud Pak for Data (4.0) Foundation components | 45 Mins
600 | 600 - datafabric-services-odf / 600 - datafabric-services-portworx | Deploy the Data Fabric Services with ODF or Portworx | 120 Mins
610 | 610 - datafabric-setup | Deploy the Data Fabric setup | 5 Mins

At this time the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped container image or with natively installed tools. We provide a container image that has all the common SRE tools installed: CLI Tools Image, Source Code for CLI Tools.

Installation Steps

Before you start the installation, please install the pre-req tools on your machine.

We have tested this on a modern Mac laptop and are testing on M1 machines. On an M1 Mac you will need to set up the tools natively in macOS and not run the launch.sh script.

Pre-Req Setup

Please install the following Pre-Req tools to help you get started with the SRE tasks for installing Data Foundation into an existing OpenShift Cluster on AWS, Azure, or IBM Cloud.

Pre-requisites:

  • Check that you have a valid GitHub ID that can be used to create a repository in your own GitHub organization or GitHub Enterprise account.

  • Install a code editor; we recommend Visual Studio Code.

  • Install Brew
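If you are starting from a clean macOS machine, the editor and the runtimes referenced later can be installed with Brew; an illustrative sketch (the cask names are the standard Homebrew ones, install only what you plan to use):

    # Code editor plus the container/VM runtimes used later in this README
    brew install --cask visual-studio-code docker multipass
    # Git is needed to clone this repository
    brew install git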

Ensure you have the following before continuing:

  • A GitHub account exists

  • A GitHub token is available with permissions set to create and remove repositories

  • You are able to log in to the OpenShift cluster and obtain an OpenShift login token

  • A Cloud Pak entitlement key, which can be obtained by visiting the IBM Container Library as described above

Installing Data Foundation

The installation process uses a standard GitOps repository that has been built using the modules to support the Data Foundation installation. The automation is consistent across the three cloud environments: AWS, Azure, and IBM Cloud.

Set up the runtime environment

At this time the most reliable way of running this automation is with Terraform on your local machine, either through a bootstrapped docker image or a virtual machine. We provide both a container image and a virtual machine cloud-init script that have all the common SRE tools installed.

We recommend using Docker Desktop if choosing the container image method, and Multipass if choosing the virtual machine method. Detailed instructions for downloading and configuring both Docker Desktop and Multipass can be found in RUNTIMES.md.
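Before continuing, it is worth confirming that the runtime you chose is working; for example:

    # Container image method
    docker version
    # Virtual machine method
    multipass version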

Set up environment credentials

  1. The first step is to clone the automation code to your local machine. Run this git command in your favorite command line shell.

    git clone https://github.com/IBM/automation-data-fabric.git
    
  2. Navigate into the automation-data-fabric folder using your command line.

    a. The README.md has comprehensive instructions on how to install this into cloud environments other than TechZone. This document focuses on getting it running in a TechZone-requested environment.

  3. Next you will need to set up your credentials.properties file. This will enable a secure deployment to your cluster.

    cp credentials.template credentials.properties
    code credentials.properties

    In the credentials.properties file you will need to populate the values for your deployment.

    ## Add the values for the Credentials to access the OpenShift Environment
    ## Instructions to access this information can be found in the README.MD
    ## This is a template file and the ./launch.sh script looks for a file based on this template named credentials.properties
    
    ## gitops_repo_host: The host for the git repository
    export TF_VAR_gitops_repo_host=github.com
    ## gitops_repo_username: The username of the user with access to the repository
    export TF_VAR_gitops_repo_username=
    ## gitops_repo_token: The personal access token used to access the repository
    export TF_VAR_gitops_repo_token=
    
    ## TF_VAR_server_url: The url for the OpenShift api server
    export TF_VAR_server_url=
    ## TF_VAR_cluster_login_token: Token used for authentication to the api server
    export TF_VAR_cluster_login_token=
    
    ## TF_VAR_entitlement_key: The entitlement key used to access the IBM software images in the container registry. Visit https://myibm.ibm.com/products-services/containerlibrary to get the key
    export TF_VAR_entitlement_key=
    
    # Only needed if targeting IBM Cloud Deployment
    export TF_VAR_ibmcloud_api_key=
    
    ##
    ## AWS credentials
    ## Credentials are required to create the AWS S3 bucket and upload the Datafiles to the S3 bucket
    ## (https://github.com/IBM/automation-data-fabric/tree/main/610-datafabric-setup/terraform/Datafiles).
    ## These credentials must have particular permissions in order to interact with the account and the
    ## OpenShift cluster. Use the provided `aws-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    export TF_VAR_access_key=
    export TF_VAR_secret_key=
    
    
    ##
    ## Azure credentials
    ## Credentials are required to install Portworx on an Azure account. These credentials must have
    ## particular permissions in order to interact with the account and the OpenShift cluster. Use the
    ## provided `azure-portworx-credentials.sh` script to retrieve/generate these credentials.
    ##
    
    ## TF_VAR_azure_subscription_id: The subscription id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_subscription_id=
    ## TF_VAR_azure_tenant_id: The tenant id for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_tenant_id=
    ## TF_VAR_azure_client_id: The client id of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_id=
    ## TF_VAR_azure_client_secret: The client secret of the user for the Azure account. This is required if Azure portworx is used
    export TF_VAR_azure_client_secret=
    
  4. Add your GitHub username and your Personal Access Token to gitops_repo_username and gitops_repo_token

  5. From your OpenShift console, click the top right menu, select Copy login command, and click Display Token

  6. Copy the API Token value into the cluster_login_token value

  7. Copy the Server URL into the server_url value, only the part starting with https

  8. Copy the entitlement key, which can be obtained by visiting the IBM Container Library as described above, and place it in the entitlement_key variable.
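If you are already logged in to the cluster with the oc CLI, the server URL and login token from steps 5-7 can also be read from the command line; a minimal sketch:

    # API server URL for TF_VAR_server_url
    oc whoami --show-server
    # Login token for TF_VAR_cluster_login_token
    oc whoami --show-token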

Configure Storage

Deploying on IBM Cloud (Portworx or ODF)
  1. Provide the IBM Cloud API Key for the target IBM Cloud account as the value for TF_VAR_ibmcloud_api_key
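If you do not already have an API key, one can be created with the IBM Cloud CLI; an illustrative sketch (the key name and description are arbitrary):

    # Create an API key and write it to a local file; copy its value into TF_VAR_ibmcloud_api_key
    ibmcloud iam api-key-create data-fabric-automation -d "Data Fabric automation" --file ibmcloud-api-key.json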
Deploying on Azure (Portworx)

If Cloud Pak for Data (CP4D) will be deployed on OpenShift running on Azure, the credentials for the Azure account need to be provided. Several CLIs are required for these steps.

You can install these CLIs on your local machine, or run the following commands within the provided container image by running launch.sh.

  1. Log into your Azure account

    az login
  2. Run the azure-portworx-credentials.sh script to gather/create the credentials:

    ./azure-portworx-credentials.sh -t {cluster type} -g {resource group name} -n {cluster name} [-s {subscription id}]

    where:

    • cluster type is the type of OpenShift cluster (aro or ipi).
    • resource group name is the name of the Azure resource group where the cluster has been provisioned.
    • cluster name is the name of the OpenShift cluster.
    • subscription id is the subscription id of the Azure account. If a value is not provided it will be looked up.
  3. Update credentials.properties with the values output from the script.

    {
      "azure_client_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_client_secret": "XXXXXXX",
      "azure_tenant_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
      "azure_subscription_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    }
  4. If you used the container image to run the script, type exit to close the container shell, then re-run launch.sh to pick up the changes to the environment variables.
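For example, steps 1 and 2 for an ARO cluster might look like the following (the resource group and cluster names are placeholders):

    az login
    # Gather/create the service principal credentials for the cluster
    ./azure-portworx-credentials.sh -t aro -g my-resource-group -n my-aro-cluster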

Configure the automation

Get the Portworx configuration spec (for AWS or Azure deployments)
  1. Follow the steps to download the Portworx configuration spec
  2. Copy the downloaded file into the root directory of the cloned automation-data-fabric repository
Set up the automation workspace
  1. Launch the automation runtime.

    • If using Docker Desktop, run ./launch.sh. This will start a container image with the prompt opened in the /terraform directory.
    • If using Multipass, run multipass shell cli-tools to start the interactive shell, and cd into the /automation/{template} directory, where {template} is the folder into which you cloned this repo. Be sure to run source credentials.properties once in the shell.
  2. Next we need to create a workspace to run the Terraform automation. Below you can see the parameters to configure your workspace for terraform execution.

    /terraform $ ./setup-workspace.sh -h
    Creates a workspace folder and populates it with automation bundles you require.
     
    Usage: setup-workspace.sh
    options:
    -p     Cloud provider (aws, azure, ibm)
    -s     Storage (portworx or odf)
    -n     (optional) prefix that should be used for all variables
    -x     (optional) Portworx spec file - the name of the file containing the Portworx configuration spec yaml
    -c     (optional) Self-signed Certificate Authority issuer CRT file
    -h     Print this help
    

    You will need to select the cloud provider of your choice, the storage option, and, if desired, a prefix for naming new resource instances on the cloud account. If you are using Azure, you will need a Portworx spec file name (as described above), and if your cluster is using a self-signed SSL certificate, you will need a copy of the issuer certificate and its file name.

  3. Run the command setup-workspace.sh -p ibm -s portworx -n df and include optional parameters as needed.

    /terraform $ ./setup-workspace.sh -p ibm -s portworx -n df
    Setting up workspace in '/terraform/../workspaces/current'
    *****
    Setting up workspace from '' template
    *****
    Setting up automation  /workspaces/current
    /terraform
    Setting up current/200-openshift-gitops from 200-openshift-gitops
      Skipping 210-aws-portworx-storage because it does't match ibm
      Skipping 210-azure-portworx-storage because it does't match ibm
    Setting up current/210-ibm-odf-storage from 210-ibm-odf-storage
    Setting up current/210-ibm-portworx-storage from 210-ibm-portworx-storage
    Setting up current/300-cloud-pak-for-data-entitlement from 300-cloud-pak-for-data-entitlement
    Setting up current/305-cloud-pak-for-data-foundation from 305-cloud-pak-for-data-foundation
    Setting up current/600-datafabric-services-odf [or] 600-datafabric-services-portworx from 600-datafabric-services-odf [or] 600-datafabric-services-portworx
    Setting up current/610-datafabric-setup from 610-datafabric-setup
    move to /workspaces/current this is where your automation is configured
    
  4. The default terraform.tfvars file is symbolically linked into the new workspaces/current folder, which enables you to edit the file in your native operating system using your editor of choice.

  5. Edit the default terraform.tfvars file; this will enable you to set up the GitOps parameters.

You will be prompted for the following variables; some suggested values are shown.

Variable | Description | Suggested Value
gitops-repo_host | The host for the git repository. | github.com
gitops-repo_type | The type of the hosted git repository (github or gitlab). | github
gitops-repo_org | The org/group/username where the git repository exists. | GitHub userid or org - if left blank the value will default to your username
gitops-repo_repo | The short name of the repository to create. | cp4d-gitops

The gitops-repo_repo, gitops-repo_token, entitlement_key, server_url, and cluster_login_token values will be loaded automatically from the credentials.properties file that was configured in an earlier step.

  1. The cp4d-instance_storage_vendor variable should have already been populated by the setup-workspace.sh script. This should have the value portworx or ocs, depending on the selected storage option.

  2. You will see that the repo_type and repo_host are set to GitHub; you can change these to other Git providers, like GitHub Enterprise or GitLab.

  3. For the repo_org value, set it to your default org name or specify a custom org value. This is the organization where the GitOps repository will be created. Click the top right menu and select Your Profile to go to your default organization.

  4. Set the repo_repo value to a unique name that you will recognize as the place where the GitOps configuration is going to be placed before Data Foundation is installed into the cluster.

  5. You can change the gitops-cluster-config_banner_text banner text to something useful for your client project or demo.

  6. Save the terraform.tfvars file

  7. Navigate into the /workspaces/current folder

    ❗️ Do not skip this step. You must execute from the /workspaces/current folder.

Automated Deployment
  1. To perform the deployment automatically, execute the ./apply-all.sh script in the /workspaces/current directory. This will apply each of the Data Foundation layers sequentially. The script completes in 10-15 minutes, and the Data Foundation install will continue asynchronously in the background; this can take an additional 45 minutes.

    Alternatively you can run each of the layers individually, by following the manual deployment instructions.

    Once complete, skip to the Access the Data Foundation Deployment section
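In the runtime shell, the automated path boils down to the following (assuming the workspace was set up as described above):

    cd /workspaces/current
    ./apply-all.sh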

Manual Deployment
  1. You can also deploy each layer manually. To begin, navigate into the 200-openshift-gitops folder and run the following commands

    cd 200-openshift-gitops
    terraform init
    terraform apply --auto-approve
    
  2. This will kick off the automation for setting up the GitOps operator in your cluster. Once complete, you should see a message similar to:

    Apply complete! Resources: 78 added, 0 changed, 0 destroyed.
    
  3. You can check the progress by looking in two places. First, look in your GitHub repository; you will see that the Git repository has been created based on the name you provided. The Cloud Pak for Data (CP4D) install will populate this with information that lets OpenShift GitOps install the software. Second, look at the OpenShift console: click Workloads->Pods and you will see the GitOps operator being installed.

  4. Change directories to the 210-* folder and run the following commands to deploy storage into your cluster:

    If you are using IBM TechZone's IBM Cloud ROKS OpenShift Cluster (VPC), OCS is already configured. You can skip this step.

    cd 210-ibm-portworx-storage
    terraform init
    terraform apply --auto-approve
    

    This folder will vary based on the platform and storage options that you selected in earlier steps.

    Storage configuration will run asynchronously in the background inside of the Cluster and should be complete within 10 minutes.

  5. Change directories to the 300-cloud-pak-for-data-entitlement folder and run the following commands to deploy entitlements into your cluster:

    cd ../300-cloud-pak-for-data-entitlement
    terraform init
    terraform apply --auto-approve
    

    This step does not require worker nodes to be restarted as some other installation methods describe.

  6. Change directories to the 305-cloud-pak-for-data-foundation folder and run the following commands to deploy Data Foundation 4.0 into the cluster.

    cd ../305-cloud-pak-for-data-foundation
    terraform init
    terraform apply --auto-approve
    

    Data Foundation deployment will run asynchronously in the background, and may require up to 45 minutes to complete.

  7. Change directories to the 600-datafabric-services-odf [or] 600-datafabric-services-portworx folder and run the following commands to deploy Data Fabric Services (WKC, WS, WML, DV) into the cluster.

    cd ../600-datafabric-services-odf [or] 600-datafabric-services-portworx
    terraform init
    terraform apply --auto-approve
    

    Data Fabric Services (WKC, WS, WML, DV & DV provision) will run asynchronously in the background, and may require up to 120 minutes to complete.

  8. Change directories to the 610-datafabric-setup folder and run the following commands to (a) create the AWS S3 bucket, (b) upload the Datafiles to the AWS S3 bucket, and (c) configure the Data Fabric Solution on top of Cloud Pak for Data.

    cd ../610-datafabric-setup
    terraform init
    terraform apply --auto-approve
    

    Data Fabric setup will run asynchronously in the background, and may require up to 5 minutes to complete.

  9. You can check the progress of the deployment by opening up Argo CD (OpenShift GitOps). From the OpenShift user interface, click the Application menu (3x3 icon) on the header and select the Cluster Argo CD menu item.

    This process will take between 3 to 4 hours to complete. During the deployment, several cluster projects/namespaces and deployments will be created.
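Progress can also be watched from the command line; a minimal sketch, assuming the OpenShift GitOps operator's default openshift-gitops namespace:

    # Argo CD (OpenShift GitOps) pods
    oc get pods -n openshift-gitops
    # Argo CD applications created from the GitOps repository
    oc get applications.argoproj.io -A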

Access the Data Foundation Deployment
  1. Once deployment is complete, go back into the OpenShift cluster user interface and navigate to view the Routes for the cp4d namespace. Here you can see the URL of the deployed Data Foundation instance. Open this URL in a new browser window.


  2. Navigate to Secrets in the cp4d namespace and find the admin-user-details secret. Copy the value of the initial_admin_password key inside that secret.

  3. Go back to the Cloud Pak for Data Foundation instance that you opened in a separate window. Log in using the username admin with the password copied in the previous step.
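The same route and password can be pulled with the CLI; a minimal sketch, assuming the cp4d namespace used above:

    # URL of the deployed instance
    oc get routes -n cp4d
    # Initial admin password from the admin-user-details secret
    oc extract secret/admin-user-details -n cp4d --keys=initial_admin_password --to=-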

Summary

This concludes the instructions for installing Data Foundation on AWS, Azure, and IBM Cloud.

Now that the Data Foundation deployment is complete you can deploy Cloud Pak for Data services into this cluster.

Uninstalling & Troubleshooting

Please refer to the Troubleshooting Guide for uninstallation instructions and instructions to correct common issues.

If you continue to experience issues with this automation, please file an issue or reach out on our public Discord server.

How to Generate this repository from the source Bill of Materials

This set of automation packages was generated using the open-source iascable tool. This tool enables a Bill of Materials yaml file to describe your software requirements. If you want upstream releases or versions, you can use iascable to generate a new Terraform module.

The iascable tool is targeted for use by advanced SRE developers. It requires deep knowledge of how the modules plug together into a customized architecture. This repository is a fully tested output from that tool, which makes it ready for projects to consume.

About

Automation to provision Data Fabric on an OpenShift cluster
