
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Using Terraform with Databricks

In this lab you will learn how to:
* Install and configure open source Terraform
* Remotely administer Databricks using open source Terraform and Terraform Cloud

## Overview

Terraform is a software tool that allows you to define infrastructure as code. Terraform integrates with hundreds of upstream APIs including Databricks. Most of the resources exposed by Databricks APIs can be managed with Terraform.

Terraform is accessible in two ways:
* A free, open source, self-managed tool available in binary form that runs on a variety of operating systems
* A managed SaaS platform known as **Terraform Cloud** that offers free and paid tiers

We will explore both options in this lab, starting with the self-managed version. Along the way, we'll get acquainted with the constructs of a Terraform environment and its operation, before seeing how it all fits in with Terraform Cloud.

## Open source Terraform

HashiCorp offers a completely free version of Terraform that you download and manage on your own. Users typically install it in their own environment, where they can invoke it manually or integrate it with upstream CI/CD processes. In this lab, we will use the execution environment provided by the attached all-purpose cluster to demonstrate installation and usage.

When managing Terraform on your own, there are a few important considerations to keep in mind. The details fall outside the scope of this lab, but we mention them here so that you will be aware of them if you choose to go down this route.

* **Configuration files:** Terraform configurations are defined by a collection of text files written in the Terraform language. Since this is infrastructure as code, these files should be treated like any other code and managed using revision control.
* **Authentication:** Because Terraform uses Databricks APIs, it needs authentication credentials. If you happen to be using the Databricks CLI in the same environment, Terraform can use its authentication setup. Otherwise, environment variables are generally considered the safest option. As a last resort, credentials can be embedded in the configuration files themselves, but be careful with this since it's easy to inadvertently expose them to others directly or through revision control.
* **State management:** Terraform tracks and records the current state of the system using a *backend*, and this part is crucial for Terraform to function correctly and reliably. The backend storage must be persistent (at least for the life of the resources it manages) and accessible by all who may be managing the configuration.
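
As a minimal sketch of the environment-variable approach, the check below confirms that the two variables the Databricks provider reads for token authentication (`DATABRICKS_HOST` and `DATABRICKS_TOKEN`) are present before Terraform is invoked. The helper name `missing_credentials` and the example host are hypothetical and used only for illustration.

```python
import os

# Variable names the Databricks Terraform provider reads for token authentication
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping (placeholder host, not a real workspace)
example_env = {"DATABRICKS_HOST": "https://example.cloud.databricks.com"}
print(missing_credentials(example_env))
```

Calling `missing_credentials()` with no argument checks the current process environment, which is what Terraform itself will see when launched from the same session.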

### Classroom Setup

In [None]:
%run ./Includes/Classroom-Setup-03

### Other Conventions

In [None]:
# List the DA object components
print(f"Username: {DA.username}")
print(f"Catalog Name: {DA.catalog_name}")
print(f"Schema Name: {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location: {DA.paths.datasets}")
print(f"Secondary Principal: {DA.iam.secondary}")
print(f"Cluster Name: {DA.cluster_name}")

In [None]:
%sh wget -P /tmp https://releases.hashicorp.com/terraform/1.2.8/terraform_1.2.8_linux_amd64.zip

In [None]:
%sh unzip -d $VIRTUAL_ENV/bin /tmp/terraform_1.2.8_linux_amd64.zip 

In [None]:
%sh terraform -v

### Record Credentials

In [None]:
DA.get_credentials()

### Create Databricks Personal Access Token

In [None]:
import os

# Create the folder and file path
folder_path = './terraform_resources'
file_path = os.path.join(folder_path, 'databricks.tf')

# Ensure the directory exists
os.makedirs(folder_path, exist_ok=True)

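# Write the provider configuration to the file. The f-string fills in the host
# and token from the environment, and doubled braces emit literal braces.
# os.environ[...] raises KeyError if a variable is missing, rather than writing "None".
with open(file_path, 'w') as f:
    f.write(f"""
            provider "databricks" {{
                host  = "{os.environ['DATABRICKS_HOST']}"
                token = "{os.environ['DATABRICKS_TOKEN']}"
            }}
            """)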
    
print(f"Terraform configuration written to {file_path}")

### Configuring provider

In [None]:
import os

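# Create the folder and file path. A separate file is used here so that the
# provider block written to databricks.tf is not overwritten.
folder_path = './terraform_resources'
file_path = os.path.join(folder_path, 'terraform.tf')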

# Ensure the directory exists
os.makedirs(folder_path, exist_ok=True)

# Write the Terraform settings to the file. The backend block must be nested
# inside the terraform block.
with open(file_path, 'w') as f:
    f.write("""
            terraform {
                required_providers {
                    databricks = {
                        source  = "databricks/databricks"
                        version = "~> 1.0.0"
                    }
                }
                backend "local" {
                    path = "/tmp/terraform/terraform.tfstate"
                }
            }
            """)
    
print(f"Terraform configuration written to {file_path}")

### Initializing Terraform

In [None]:
%sh terraform -chdir=./terraform_resources init

In [None]:
%sql
SELECT "${DA.schema_name}" AS Target

### Declaring a new schema
For those who followed along with the labs *Using Databricks Utilities and CLI* and *Using Databricks APIs*, let's work toward defining a Terraform configuration that builds the elements that we created in those labs. As a first step, let's establish a new schema in the `main` catalog named `myschema_tfos`.

To add elements to a Terraform configuration, we can add an arbitrarily named `.tf` file to the folder. In this case we will add the file `schema.tf` to specify the schema.

Terraform configuration files are written in the Terraform language, which is built on a simple, declarative syntax. The configuration files describe the desired state of the system, which makes defining and managing the system extremely easy for an admin since Terraform manages all the changes needed to get the system to the desired state.
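
To make the idea of a declarative desired state concrete, here is a conceptual sketch in plain Python. It is not Terraform's actual algorithm: it simply compares a recorded state with a desired configuration and lists the actions needed to reconcile them. The function `plan_changes` and the resource names are hypothetical.

```python
def plan_changes(current, desired):
    """Compare two {resource_name: attributes} mappings and list the actions needed."""
    actions = []
    # Resources that are declared: create them if absent, update them if they differ
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    # Resources that exist but are no longer declared: delete them
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical recorded state and desired configuration
current = {"schema.old": {"comment": "legacy"}, "schema.kept": {"comment": "v1"}}
desired = {"schema.kept": {"comment": "v2"}, "schema.new": {"comment": "fresh"}}
print(plan_changes(current, desired))
```

The admin only edits the desired mapping; working out which actions to take is left to the tool, which is the role Terraform plays for real infrastructure.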

Because we're running this in the context of a notebook, there's some extra code wrapped around the actual configuration; the essence of the configuration is found within the triple-quoted string.

In [None]:
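import os

# Write schema.tf next to the other configuration files so that
# `terraform -chdir=./terraform_resources` picks it up
folder_path = './terraform_resources'
os.makedirs(folder_path, exist_ok=True)

with open(os.path.join(folder_path, 'schema.tf'), 'w') as f:
    f.write("""
    resource "databricks_schema" "myschema" {
        catalog_name = "main"
        name         = "myschema_tfos"
        comment      = "This schema is managed by Terraform Open Source"
    }
    """)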

### Declaring a new table

In [None]:
import os

# Create the folder and file path
folder_path = './terraform_resources'
file_path = os.path.join(folder_path, 'table.tf')

# Ensure the directory exists
os.makedirs(folder_path, exist_ok=True)

# Write the Terraform configuration to the file
with open(file_path, 'w') as f:
    f.write("""
            resource "databricks_sql_table" "Terraform_Table" {
                catalog_name = "dbacademy"
                schema_name  = "gifted_target"
                name         = "terraform_table"
                table_type  = "MANAGED"
                data_source_format = "DELTA"
                cluster_id = "1211-060256-y75n00eh"
                
                column {
                    name = "id"
                    type = "int"
                    comment = "Primary Key"
                }
                column {
                    name = "name"
                    type = "string"
                    comment = "Name of gifted student"
                }
                column {
                    name = "score"
                    type = "float"
                    comment = "Score obtained by the student"
                }
            }
            """)
    
print(f"Terraform configuration written to {file_path}")
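
When a table has many columns, the repeated `column` blocks can be generated from a Python list instead of being typed by hand. The helper below is a hypothetical sketch (the name `render_columns` is not part of any library), and it assumes the column names, types, and comments contain no double quotes, since it performs no escaping.

```python
def render_columns(columns, indent="    "):
    """Render (name, type, comment) tuples as HCL `column` blocks."""
    blocks = []
    for name, col_type, comment in columns:
        # Doubled braces in an f-string produce literal braces in the output
        blocks.append(
            f'{indent}column {{\n'
            f'{indent}    name    = "{name}"\n'
            f'{indent}    type    = "{col_type}"\n'
            f'{indent}    comment = "{comment}"\n'
            f'{indent}}}'
        )
    return "\n".join(blocks)


# The same three columns used in the table definition above
columns = [
    ("id", "int", "Primary Key"),
    ("name", "string", "Name of gifted student"),
    ("score", "float", "Score obtained by the student"),
]
print(render_columns(columns))
```

The returned text can be interpolated into the body of the `databricks_sql_table` resource before the file is written.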

### Examining the plan

In [None]:
%sh terraform -chdir=./terraform_resources plan
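
If you want to act on the plan programmatically, `terraform plan` accepts the `-detailed-exitcode` flag, which returns 0 when there are no changes, 1 on error, and 2 when changes are pending. The following is a sketch only: the helper names are hypothetical, and `run_plan` assumes the `terraform` binary is on the `PATH` and that the configuration lives in `./terraform_resources`.

```python
import subprocess

# Meaning of the exit codes returned by `terraform plan -detailed-exitcode`
PLAN_STATUS = {0: "no changes", 1: "error", 2: "changes pending"}


def describe_plan_exit(code):
    """Translate a plan exit code into a short status string."""
    return PLAN_STATUS.get(code, f"unexpected exit code {code}")


def run_plan(config_dir="./terraform_resources"):
    """Run the plan in config_dir and return its status (requires terraform on PATH)."""
    result = subprocess.run(
        ["terraform", f"-chdir={config_dir}", "plan", "-detailed-exitcode", "-input=false"],
        capture_output=True,
        text=True,
    )
    return describe_plan_exit(result.returncode)


# Interpreting a code without invoking Terraform
print(describe_plan_exit(2))
```

In a CI/CD pipeline, a status of "changes pending" could gate a manual approval step before the apply is run.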

### Apply plan

In [None]:
%sh terraform -chdir=./terraform_resources apply -auto-approve

### Cleanup

In [None]:
%sh terraform -chdir=./terraform_resources apply -destroy -auto-approve

In [None]:
%sh rm -rf ./terraform_resources

In [None]:
%sh rm -rf ./var