
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Deploying a workspace in a customer-managed VPC

In this lab you will learn how to:
* Create your own VPC
* Integrate your VPC into the Databricks account console
* Create new workspaces using that VPC

## Prerequisites

If you would like to follow along with this lab, you will need:
* administrator access to your AWS console, with the ability to create VPCs, buckets and IAM roles
* account administrator capabilities in your Databricks account in order to access the account console
* ideally, to have completed the *Supporting Databricks workspaces and metastores* lab from the *AWS Databricks Platform Administration Fundamentals* course, as this lab is largely an extension of that one

## Supporting a workspace in a custom VPC

You will recall that in the *Supporting Databricks workspaces and metastores* lab from the *AWS Databricks Platform Administration Fundamentals* course, we created the AWS and Databricks elements needed to support the creation of a Databricks workspace using the default VPC configuration. In this lab, we'll work through a modified approach that enables us to have full control of the VPC. While some of this may seem familiar, there are differences to accommodate the custom VPC.

With that said, let's proceed.

## Creating a VPC

The first thing we need in this scenario is a VPC. In the *Supporting Databricks workspaces and metastores* lab, we allowed Databricks to manage this aspect for us, but here we must create and configure a suitable VPC for workspace deployment.

1. In the AWS VPC console, let's select the region in which we're deploying our workspaces; let's use *us-east-1*.
1. Click **Create VPC**.
1. Let's select **VPC and more**.
1. Let's specify a value for **Name tag auto-generation**. Databricks recommends including the region in the name. Let's use *dbacademy-test-vpc-us-east-1*.
1. Let's leave the IPv4 and IPv6 CIDR block settings as they are, though we could modify these if needed.
1. Select *2* for the number of public subnets. Databricks doesn't need them both, but they are required to enable NATs.
1. Select *2* for the number of private subnets. Each workspace needs two, so two will be sufficient to get started with one workspace.
1. Select *In 1 AZ* for **NAT gateways**.
1. Ensure that both **Enable DNS hostnames** and **Enable DNS resolution** are enabled.
1. Finally, let's click **Create VPC**. 

This will trigger the creation of the VPC and all related resources, and will take a few moments to complete. Once done, you can proceed.

### Configuring the VPC

Databricks has some requirements for its VPCs, as outlined in the <a href="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#vpc-requirements-1" target="_blank">documentation</a>. Though the default parameters will work for Databricks workspaces, you'll likely want to reconfigure various aspects of your VPC at some point.

In the **VPC Management Console** let's use the filter functionality to isolate items related to the VPC we created. From here we can review or configure elements related to the VPC, which we will do shortly. For now, let's proceed to create a workspace using this VPC.

## Creating a cross-account IAM role

In this section, we'll create the cross-account role. Rather than using the one we created in the *Supporting Databricks workspaces and metastores* lab, we'll create a new one with fewer permissions. The policy we use here is a reduced version of the policy needed when using Databricks default VPCs: it does not allow Databricks to manage VPCs or their associated resources, such as addresses, routes and route tables, subnets, gateways, and security groups.

1. In the AWS IAM console, let's select **Roles**.
1. Click **Create role**.
1. Select **AWS account**. This will let us set up a cross-account trust relationship that will allow Databricks to provision resources in our account.
   * Select **Another AWS account**.
   * For **Account ID**, let's substitute in the Databricks account ID, *414351767826*.
   * Select **Require external ID**.
   * For **External ID**, let's paste our Databricks account ID. We can easily get this from the user menu in the account console.
   * Now let's click **Next** until we get to the final page.
   * Let's assign the name for our role (use *dbacademy-test-cross-account-role-novpc*).
   * Click **Create role**.
1. Now let's view the role we just created.
1. Let's click the **Permissions** tab.
1. Let's select **Add permissions > Create inline policy**.
1. Click the **JSON** tab.
1. Replace the default policy with the following:
    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1403287045000",
          "Effect": "Allow",
          "Action": [
            "ec2:AssociateIamInstanceProfile",
            "ec2:AttachVolume",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateTags",
            "ec2:CreateVolume",
            "ec2:DeleteTags",
            "ec2:DeleteVolume",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeIamInstanceProfileAssociations",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeNatGateways",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribePrefixLists",
            "ec2:DescribeReservedInstancesOfferings",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVolumes",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcs",
            "ec2:DetachInternetGateway",
            "ec2:DetachVolume",
            "ec2:DisassociateIamInstanceProfile",
            "ec2:ReplaceIamInstanceProfileAssociation",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupIngress",
            "ec2:RunInstances",
            "ec2:TerminateInstances"
          ],
          "Resource": [
            "*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
              "iam:CreateServiceLinkedRole",
              "iam:PutRolePolicy"
          ],
          "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
          "Condition": {
            "StringLike": {
                "iam:AWSServiceName": "spot.amazonaws.com"
            }
          }
        }
      ]
    }
    ```
1. Now let's click **Review policy** to get to the final page.
1. Let's assign the name for our policy (use *dbacademy-test-cross-account-policy-novpc*).
1. Click **Create policy**.
1. Let's take note of the **ARN**; the account administrator will need this in order to create a credential configuration that captures this IAM role.
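
The console steps above produce a trust relationship on the role. As a rough sketch of what that trust policy looks like, the Python below builds the equivalent JSON document. The function name `build_trust_policy` and the placeholder external ID are illustrative assumptions; the AWS account ID is the one given in the steps above, and the policy the console actually generates may differ in detail.

```python
import json

# Databricks' AWS account ID, as given in the role-creation steps.
DATABRICKS_AWS_ACCOUNT_ID = "414351767826"


def build_trust_policy(databricks_account_id: str) -> dict:
    """Return a trust policy that lets the Databricks AWS account assume the
    role, but only when it presents our Databricks account ID as external ID.

    Hypothetical helper for illustration; not part of any Databricks or AWS SDK.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT_ID}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": databricks_account_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder external ID; substitute your own Databricks account ID.
    placeholder_id = "00000000-0000-0000-0000-000000000000"
    print(json.dumps(build_trust_policy(placeholder_id), indent=2))
```

Comparing this output with the **Trust relationships** tab of the role is one way to confirm that the external ID was entered correctly.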

## Creating the workspace root storage bucket

As we did in the *Supporting Databricks workspaces and metastores* lab, let's create an S3 bucket to function as the workspace root storage.

1. In the AWS S3 console, click **Create bucket**.
1. Let's specify a name. When choosing your own names, be mindful to not include dots in your names (use *dbacademy-test-workspace-bucket-novpc*).
1. Let's choose the region where we created the VPC.
1. For this example, let's accept the default settings for the rest, and create the bucket. We will need to revisit it in a moment to add a policy.
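
If you generate bucket names in a script, a small check can catch unsuitable names before you reach the console. The sketch below is an assumption-laden simplification: the regular expression covers only the basic shape of an S3 bucket name (length and allowed characters) rather than every S3 naming rule, and the function name `is_suitable_bucket_name` is hypothetical. It also applies this lab's advice to avoid dots.

```python
import re

# Simplified shape of an S3 bucket name: 3-63 characters drawn from lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or
# digit. The full S3 rules contain further restrictions not modelled here.
_BUCKET_NAME_SHAPE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_suitable_bucket_name(name: str) -> bool:
    """Return True if the name has a valid basic shape and contains no dots."""
    return bool(_BUCKET_NAME_SHAPE.match(name)) and "." not in name


if __name__ == "__main__":
    for candidate in ("dbacademy-test-workspace-bucket-novpc", "my.bucket.name"):
        print(candidate, is_suitable_bucket_name(candidate))
```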

## Creating Databricks cloud resources

With everything created on the AWS side, let's go to the Databricks account console to create the resources needed to set up a new workspace.

### Creating the credential configuration

If you'll recall, the credential configuration is the piece that encapsulates the cross-account IAM role. As we did in the *Supporting Databricks workspaces and metastores* lab, let's create a credential configuration for the cross-account IAM role we just created.

1. In the account console, let's click on the **Cloud Resources** icon in the left sidebar.
1. Let's click the **Credential configuration** tab.
1. Let's click **Add credential configuration**.
1. Let's provide a name for the configuration. This name will have no user visibility (use *dbacademy-test-credential-configuration-novpc*).
1. Paste the **ARN** for the role we created moments ago.
1. Finally, let's click **Add**.

### Creating a storage configuration

If you'll recall, the storage configuration is the piece that encapsulates the S3 bucket that will store workspace-related objects. Let's create that now.

1. Still in the **Cloud Resources** page, let's click the **Storage configuration** tab.
1. Let's click **Add storage configuration**.
1. Let's provide a name for the configuration. This name will have no user visibility (use *dbacademy-test-storage-configuration-novpc*).
1. Let's enter the name for the bucket we created moments ago (*dbacademy-test-workspace-bucket-novpc*).
1. Now we need to add a policy to that bucket. Let's click the **Generate policy** link and copy the JSON policy description.
1. Finally, let's click **Add**.

With a policy on the clipboard, let's revisit the S3 console to add that policy to the bucket we created earlier.

1. In the AWS S3 console, let's find the bucket we created and select it.
1. Let's click the **Permissions** tab.
1. In the **Bucket policy** area, click **Edit**.
1. Let's paste the JSON policy.
1. Finally, let's click **Save changes**.

### Creating the network configuration

The network configuration encapsulates the VPC and subnets that the workspace will use. In order to create this we will need, at a minimum, the following pieces of information related to the VPC we created earlier:
* the VPC ID
* the IDs of the two private subnets
* the security group ID

Let's obtain that information now.

1. In the VPC Management Console let's filter on our VPC.
1. Let's take note of the VPC ID.
1. Select **Subnets**. The 4 subnets related to our VPC are displayed. Two of these are public and two are private; we are primarily interested in the private ones for now, which can be identified by their names. Let's take note of the Subnet IDs for both.
1. Finally, let's select **Security groups** and take note of the Security group ID.

Let's return to the **Cloud Resources** page of the account console.

1. In the **Network** tab, let's click **Add network configuration**.
1. Let's provide a name for the configuration. This name will have no user visibility (use *dbacademy-test-network-configuration-ws1*).
1. Supply the values we gathered for **VPC ID**, **Subnet IDs** and **Security group IDs**.
1. Finally, let's click **Add**.
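
The same network configuration can, in principle, be registered through the Databricks Account API rather than the console. The sketch below only assembles and sanity-checks the request body; it does not send anything. The field names (`network_name`, `vpc_id`, `subnet_ids`, `security_group_ids`) reflect my understanding of the Account API's network endpoint and should be verified against the current API reference. The helper name and all IDs shown are hypothetical placeholders.

```python
def build_network_config(name, vpc_id, subnet_ids, security_group_ids):
    """Assemble a request body describing a network configuration.

    Performs light validation based on the ID prefixes AWS uses
    (vpc-, subnet-, sg-). Hypothetical helper for illustration only.
    """
    subnet_ids = list(subnet_ids)
    security_group_ids = list(security_group_ids)

    if not vpc_id.startswith("vpc-"):
        raise ValueError(f"not a VPC ID: {vpc_id!r}")
    if len(set(subnet_ids)) < 2:
        raise ValueError("a workspace needs two distinct private subnets")
    if not all(s.startswith("subnet-") for s in subnet_ids):
        raise ValueError("subnet IDs should start with 'subnet-'")
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    if not all(g.startswith("sg-") for g in security_group_ids):
        raise ValueError("security group IDs should start with 'sg-'")

    return {
        "network_name": name,
        "vpc_id": vpc_id,
        "subnet_ids": subnet_ids,
        "security_group_ids": security_group_ids,
    }


if __name__ == "__main__":
    # All IDs below are made-up placeholders.
    body = build_network_config(
        "dbacademy-test-network-configuration-ws1",
        "vpc-0123456789abcdef0",
        ["subnet-0aaaaaaaaaaaaaaa1", "subnet-0bbbbbbbbbbbbbbb2"],
        ["sg-0ccccccccccccccc3"],
    )
    print(body)
```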

## Creating a workspace

With all the supporting resources in place, we are now ready to create a workspace.

1. In the account console, let's click on the **Workspaces** icon in the left sidebar.
1. Let's click **Create workspace**.
1. Let's provide the **Workspace name** (let's use *dbacademy-test-workspace-ws1* for this example).
1. Let's fill out the **Workspace URL**.
1. Let's choose the region that matches the region in which we created the other resources.
1. Let's choose the credential configuration and storage configuration we created previously.
1. Let's leave **Unity Catalog** disabled. The VPC configuration in this example does not impact the procedure for creating and setting up a metastore, which we did in the *Supporting Databricks workspaces and metastores* lab.
1. Let's open **Advanced configurations**.
1. For **Network configuration**, let's select the network configuration we created earlier.
1. Finally, let's click **Save**.

The workspace will take a few moments to provision. Apart from completing faster, there will be no apparent difference. But remember that in this scenario, the Databricks control plane is creating the workspace under a significantly reduced set of permissions, using a VPC that we created ourselves.
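
To make the relationship between the pieces explicit, the sketch below shows how the three configurations we created (credential, storage, and network) come together in a single workspace definition. It only builds a dictionary. The field names are modelled on my understanding of the Databricks Account API's workspace endpoint and should be checked against the current API reference; the helper name and the ID values are hypothetical.

```python
def build_workspace_request(workspace_name, aws_region, credentials_id,
                            storage_configuration_id, network_id,
                            deployment_name=None):
    """Assemble a workspace definition that references existing configurations.

    network_id is what makes this a customer-managed VPC deployment: it points
    at the network configuration wrapping our VPC, subnets and security group.
    Hypothetical helper for illustration only.
    """
    request = {
        "workspace_name": workspace_name,
        "aws_region": aws_region,
        "credentials_id": credentials_id,
        "storage_configuration_id": storage_configuration_id,
        "network_id": network_id,
    }
    if deployment_name:
        # Optional: corresponds to the workspace URL prefix.
        request["deployment_name"] = deployment_name
    return request


if __name__ == "__main__":
    # The three IDs are placeholders for values shown in the account console.
    print(build_workspace_request(
        "dbacademy-test-workspace-ws1",
        "us-east-1",
        "<credential-configuration-id>",
        "<storage-configuration-id>",
        "<network-configuration-id>",
    ))
```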

## Creating additional workspaces

Housing multiple workspaces is a common use case for customer-managed VPCs. But it's important to note that each workspace requires two private subnets that cannot be shared. Because of this, we must:
* Create two additional subnets in our VPC
* Create a new network configuration (since the account console will not allow a second workspace to be created using the same network configuration)

Before we proceed, note the following constraints:
* The subnets must be private (that is, IP addresses are private, with routing to the outside provided via a NAT)
* The subnets must be assigned an address block that doesn't overlap with any other subnets in the VPC
* The two must be in different availability zones
* Both must have a route to the outside using the VPC's NAT
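
Some of these constraints can be checked mechanically before creating anything. The following sketch uses Python's standard `ipaddress` module to verify that a planned pair of subnets uses private address ranges, sits in different availability zones, and does not overlap existing subnets. The function name and input shapes are assumptions made for illustration, and the routing constraint is not covered here.

```python
import ipaddress


def check_workspace_subnets(new_subnets, existing_cidrs):
    """Return a list of problems; an empty list means the pair looks usable.

    new_subnets: list of (cidr, availability_zone) tuples for the planned pair.
    existing_cidrs: CIDR strings of subnets already present in the VPC.
    """
    problems = []
    if len(new_subnets) != 2:
        problems.append("exactly two subnets are expected per workspace")

    zones = [az for _, az in new_subnets]
    if len(set(zones)) != len(zones):
        problems.append("subnets must be in different availability zones")

    nets = [ipaddress.ip_network(cidr) for cidr, _ in new_subnets]
    existing = [ipaddress.ip_network(cidr) for cidr in existing_cidrs]
    for i, net in enumerate(nets):
        if not net.is_private:
            problems.append(f"{net} is not a private address range")
        for other in nets[i + 1:]:
            if net.overlaps(other):
                problems.append(f"{net} overlaps planned subnet {other}")
        for used in existing:
            if net.overlaps(used):
                problems.append(f"{net} overlaps existing subnet {used}")
    return problems


if __name__ == "__main__":
    planned = [("10.0.160.0/20", "us-east-1a"), ("10.0.176.0/20", "us-east-1b")]
    in_use = ["10.0.0.0/20", "10.0.16.0/20", "10.0.128.0/20", "10.0.144.0/20"]
    print(check_workspace_subnets(planned, in_use))
```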

Let's do this now.

### Creating subnets

Let's go ahead and create the subnets.

1. In the **VPC Management Console** let's filter on our VPC.
1. Now let's select **Subnets**. Note the IPv4 CIDR blocks of the existing subnets, for we must create two new subnets that do not overlap. Based on the standard configuration offered by the VPC wizard, *10.0.160.0/20* and *10.0.176.0/20* are available.
1. Let's click **Create subnet**.
1. Let's select our VPC, *dbacademy-test-vpc-us-east-1-vpc*.
1. Let's specify a name. If we wish, we can adopt the convention used by the VPC creation wizard, or we can use a simpler approach. For the purpose of this exercise, let's simply use *my-subnet-01*.
1. Let's select *us-east-1a* for the **Availability Zone**.
1. Let's specify *10.0.160.0/20* for the **IPv4 CIDR block**.
1. Now let's click **Add new subnet** to fill in information for the second subnet:
   * *my-subnet-02* for the name
   * *us-east-1b* for the **Availability Zone**
   * *10.0.176.0/20* for the **IPv4 CIDR block**
1. Finally, let's click **Create subnet**.
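
Rather than working out free address blocks by eye, you can enumerate them. This sketch lists the /20 blocks of a VPC that do not overlap any existing subnet. The VPC range *10.0.0.0/16* and the four in-use blocks are assumptions about what the VPC wizard creates by default; substitute the values shown in your own **Subnets** list.

```python
import ipaddress


def free_blocks(vpc_cidr, used_cidrs, new_prefix=20):
    """List blocks of size /new_prefix in the VPC that overlap no used subnet."""
    vpc = ipaddress.ip_network(vpc_cidr)
    used = [ipaddress.ip_network(cidr) for cidr in used_cidrs]
    return [
        str(block)
        for block in vpc.subnets(new_prefix=new_prefix)
        if not any(block.overlaps(u) for u in used)
    ]


if __name__ == "__main__":
    # Assumed wizard defaults: two public and two private /20 subnets.
    in_use = ["10.0.0.0/20", "10.0.16.0/20", "10.0.128.0/20", "10.0.144.0/20"]
    for block in free_blocks("10.0.0.0/16", in_use):
        print(block)
```

Under those assumptions, the output includes *10.0.160.0/20* and *10.0.176.0/20*, the two blocks used in this exercise.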

#### Creating route tables

The two subnets we created will by default be associated with the VPC's default route table. However, this route table lacks the routing needed to communicate with the outside world.

According to the <a href="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnets" target="_blank">documentation</a>, the route table for workspace subnets must have quad-zero (0.0.0.0/0) traffic that targets a NAT Gateway or your own managed NAT device or proxy appliance.

Let's set up a new route table that will accomplish this now.

1. In the **VPC Management Console** let's select **Route tables**.
1. Let's click **Create route table**.
1. Let's specify a name. Once again, we can keep the naming simple by choosing a name like *my-route-table-01*.
1. Let's select our VPC, *dbacademy-test-vpc-us-east-1-vpc*.
1. Let's click **Create route table**.
1. With the newly created table displayed, let's click **Edit routes**.
1. Now let's click **Add route**.
1. Specify *0.0.0.0/0* for the **Destination**.
1. For the **Target**, let's select *NAT gateway*. This will present the one and only NAT gateway available in the VPC, so let's choose that.
1. Let's click **Save changes**.

With a route table configured, let's associate it with one of our subnets.

1. In the **VPC Management Console** let's select **Subnets**.
1. Let's locate and select the first subnet we created (*my-subnet-01*).
1. Select **Actions > Edit route table association**.
1. Select the route table we just created (*my-route-table-01*) and then click **Save**.

Now, let's repeat this process once more to create a similarly configured second route table, *my-route-table-02*, and associate that with *my-subnet-02*.
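
If you inspect route tables programmatically, the quad-zero requirement is easy to test. The sketch below checks a route table description for an active *0.0.0.0/0* route that targets a NAT gateway. The dictionary shape (`Routes`, `DestinationCidrBlock`, `NatGatewayId`, `State`) follows what the AWS EC2 API returns when describing route tables, to the best of my knowledge; the function name and the IDs are hypothetical, and the check does not cover the self-managed NAT device or proxy appliance alternatives.

```python
def has_default_nat_route(route_table: dict) -> bool:
    """Return True if the table has an active 0.0.0.0/0 route to a NAT gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("NatGatewayId")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


if __name__ == "__main__":
    # Made-up example resembling my-route-table-01 after the steps above.
    table = {
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0", "State": "active"},
        ]
    }
    print(has_default_nat_route(table))
```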

Before proceeding, let's take note of the two new subnet IDs that we will need to create a new network configuration. The VPC and security group IDs remain unchanged from before.

### Creating a new network configuration

Let's return to the **Cloud Resources** page of the account console to create a new network configuration encapsulating our new subnets.

1. In the **Network** tab, let's click **Add network configuration**.
1. Let's provide a name for the configuration (use *dbacademy-test-network-configuration-ws2*).
1. Supply the values for **VPC ID**, the **Subnet IDs** for the two subnets we just created, and **Security group IDs**.
1. Finally, let's click **Add**.

### Creating a second workspace

Finally, let's create a new workspace.

1. In the account console, let's click on the **Workspaces** icon in the left sidebar.
1. Let's click **Create workspace**.
1. Let's provide the **Workspace name** (let's use *dbacademy-test-workspace-ws2* for this example).
1. Let's fill out the **Workspace URL**.
1. Let's choose the region that matches the region in which we created the other resources.
1. Let's choose the credential configuration and storage configuration we used for the previous workspace.
1. As before, let's leave **Unity Catalog** disabled.
1. Let's open **Advanced configurations** and select the new network configuration.
1. Finally, let's click **Save**.

Once again, there will be no apparent difference, but now the two workspaces will be sharing a VPC, its configuration, and all AWS resources within it. The ability to architect your Databricks setup in this way provides a significant amount of flexibility.

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>