@HariGS-DB
closes #178

This PR adds a standard module for AWS infrastructure. The goal is to provide a common AWS base layer for developing any Databricks examples that involve AWS. It contains the following:

  1. Code for S3 buckets, IAM roles, the VPC, and security groups
  2. An optional flag to create resources for a Private Link back end
  3. An optional flag to create a hub-and-spoke architecture
  4. An optional flag to create a firewall to control outbound traffic
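A minimal sketch of how a caller might consume the module with all three optional features switched on. `prefix` and `region` are inputs that appear elsewhere in this PR; the `source` path and the three `enable_*` flag names are hypothetical placeholders, not the module's actual variable names.

```hcl
module "aws_infra" {
  # Hypothetical relative path to the module added in this PR
  source = "./modules/aws/aws-infra"

  prefix = "demo"
  region = "us-east-1"

  # Hypothetical flag names for the three optional features listed above
  enable_private_link = true # Private Link back-end resources
  enable_hub_spoke    = true # Transit Gateway hub-and-spoke networking
  enable_firewall     = true # AWS Network Firewall for outbound control
}
```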

HariGS-DB requested review from a team as code owners November 11, 2025 16:18
HariGS-DB requested review from alexott and rauchy November 11, 2025 16:18
alexott requested a review from Copilot November 25, 2025 14:34
Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive AWS infrastructure module (aws-infra) for Databricks deployments, providing a standardized foundation for developing AWS-based Databricks examples. The module creates VPC infrastructure, S3 storage buckets, IAM roles, and security configurations with optional features for Private Link, hub-spoke architecture, and network firewalls.

Key changes:

  • Creates core AWS infrastructure components (VPC, S3 buckets, IAM roles, VPC endpoints)
  • Adds optional Private Link support for secure Databricks connectivity
  • Implements optional hub-spoke architecture with Transit Gateway and Network Firewall
  • Includes comprehensive documentation and example configurations
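To illustrate the kind of FQDN-based egress control the firewall provides, here is a rough sketch of a stateful domain allowlist for AWS Network Firewall. It is not the PR's `firewall.tf`: `var.allowed_fqdns`, its example default, and the `capacity` value are assumptions, and `var.prefix` is assumed to be the module's existing prefix input.

```hcl
variable "allowed_fqdns" {
  description = "Hypothetical input: domains the workspace may reach (a leading dot also matches subdomains)."
  type        = list(string)
  default     = [".cloud.databricks.com"] # illustrative only; a real list is longer
}

# Stateful domain allowlist: HTTP/TLS traffic whose Host header or TLS SNI
# does not match an entry is denied by the generated rules.
resource "aws_networkfirewall_rule_group" "fqdn_allow" {
  name     = "${var.prefix}-fqdn-allow"
  type     = "STATEFUL"
  capacity = 100

  rule_group {
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets              = var.allowed_fqdns
      }
    }
  }
}
```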

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `modules/aws/aws-infra/main.tf` | Orchestrates module components and conditionally creates the hub networking submodule |
| `modules/aws/aws-infra/variables.tf` | Defines input variables for networking, storage, IAM, security, and advanced networking configurations |
| `modules/aws/aws-infra/locals.tf` | Computes local values for tags, availability zones, subnet CIDRs, and IAM configurations |
| `modules/aws/aws-infra/networking.tf` | Creates VPC infrastructure using the AWS VPC module with Databricks-specific security group rules |
| `modules/aws/aws-infra/workspacestorage.tf` | Creates the root S3 bucket for the Databricks workspace with encryption and public access blocking |
| `modules/aws/aws-infra/ucstorage.tf` | Creates Unity Catalog metastore and data S3 buckets with security configurations |
| `modules/aws/aws-infra/iam.tf` | Creates IAM roles and policies for Databricks cross-account access and Unity Catalog |
| `modules/aws/aws-infra/vpc-endpoints.tf` | Creates VPC endpoints for S3, STS, and Kinesis services |
| `modules/aws/aws-infra/private-link.tf` | Creates Private Link resources including subnets, security groups, and VPC endpoints |
| `modules/aws/aws-infra/outputs.tf` | Exposes VPC ID, S3 bucket names, and IAM role ARNs |
| `modules/aws/aws-infra/versions.tf` | Specifies Terraform and provider version requirements |
| `modules/aws/aws-infra/modules/hub-networking/transit-gateway.tf` | Implements the Transit Gateway with hub-and-spoke architecture and routing configuration |
| `modules/aws/aws-infra/modules/hub-networking/firewall.tf` | Creates Network Firewall with FQDN and network-based rule groups |
| `modules/aws/aws-infra/modules/hub-networking/variables.tf` | Defines hub networking submodule input variables |
| `modules/aws/aws-infra/modules/hub-networking/locals.tf` | Computes hub VPC subnet CIDRs and the Transit Gateway name |
| `modules/aws/aws-infra/modules/hub-networking/outputs.tf` | Exposes the hub VPC ID |
| `modules/aws/aws-infra/components/iam.tf` | Duplicate IAM configuration file (appears to be unused) |
| `modules/aws/aws-infra/README.md` | Comprehensive documentation with architecture diagrams, usage examples, and configuration details |
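For readers of `vpc-endpoints.tf`, one plausible shape for the S3, STS, and Kinesis endpoints is sketched below. It assumes the VPC comes from `terraform-aws-modules/vpc` under the module name `vpc`, and `aws_security_group.databricks` is a hypothetical name for the workspace security group; the PR's actual resource names may differ.

```hcl
# Gateway endpoint: S3 traffic from the private subnets stays on the AWS network.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = module.vpc.private_route_table_ids
}

# Interface endpoints for STS and Kinesis, resolved through private DNS.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["sts", "kinesis-streams"])

  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.databricks.id] # hypothetical SG name
  private_dns_enabled = true
}
```

S3 uses a gateway endpoint because it is free and works through route tables; STS and Kinesis only offer interface endpoints.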
Comments suppressed due to low confidence (1)

modules/aws/aws-infra/components/iam.tf:1

  • This file appears to be a duplicate of modules/aws/aws-infra/iam.tf with identical content. Having duplicate IAM configurations can lead to maintenance issues and confusion about which file is the source of truth. Remove this duplicate file and use only modules/aws/aws-infra/iam.tf.

"Module" = "aws-infra"
"Prefix" = var.prefix
"Region" = var.region
"CreatedDate" = formatdate("YYYY-MM-DD", timestamp())
Copilot AI Nov 25, 2025

Using timestamp() in common tags will cause Terraform to detect changes on every plan/apply, even when no actual infrastructure changes are needed. This is a known anti-pattern that leads to unnecessary plan noise and potential state drift. Consider removing this tag or using a static value set once during initial deployment.
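One way to get a stable creation date, sketched under the assumption that adding the `hashicorp/time` provider is acceptable: `time_static` records a timestamp once when it is created and keeps it in state, so later plans do not see a change. The local name `common_tags` is an assumption; the tag keys mirror the snippet above.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Captured once at first apply and persisted in state; unlike timestamp(),
# it does not change on subsequent plans.
resource "time_static" "created" {}

locals {
  common_tags = {
    "Module"      = "aws-infra"
    "Prefix"      = var.prefix
    "Region"      = var.region
    "CreatedDate" = formatdate("YYYY-MM-DD", time_static.created.rfc3339)
  }
}
```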

Suggested change
"CreatedDate" = formatdate("YYYY-MM-DD", timestamp())

depends_on = [aws_s3_bucket_public_access_block.root]
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

Suggested change

restrict_public_buckets = true
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

Suggested change

}
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

Suggested change

# Current region (for firewall rules)
current_region = var.region
}

Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

Suggested change

This module creates a complete AWS infrastructure foundation optimized for Databricks, featuring:

- **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking
- **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog
Collaborator

Is the encryption optional/configurable?
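If encryption should be configurable, a minimal sketch is to accept an optional KMS key and fall back to SSE-S3 when none is given. `var.kms_key_arn` is a hypothetical input, and `aws_s3_bucket.root` assumes the root bucket resource is named `root`, matching the `aws_s3_bucket_public_access_block.root` reference in this PR.

```hcl
variable "kms_key_arn" {
  description = "Hypothetical input: optional KMS key ARN; null falls back to SSE-S3 (AES256)."
  type        = string
  default     = null
}

resource "aws_s3_bucket_server_side_encryption_configuration" "root" {
  bucket = aws_s3_bucket.root.id # assumes the root bucket resource is named "root"

  rule {
    apply_server_side_encryption_by_default {
      # SSE-KMS with the caller's key when one is supplied, otherwise SSE-S3.
      sse_algorithm     = var.kms_key_arn == null ? "AES256" : "aws:kms"
      kms_master_key_id = var.kms_key_arn
    }
  }
}
```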


- **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking
- **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog
- **👤 IAM Integration**: Cross-account and Unity Catalog roles with Databricks-generated policies
Collaborator

How configurable are the policies? For the cross-account role we have different policy types: restricted/managed/...
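A sketch of how the policy type could be exposed, assuming the module generates the cross-account policy with the Databricks provider's `databricks_aws_crossaccount_policy` data source, which accepts `managed`, `customer`, and `restricted`. The variable name and `aws_iam_role.cross_account` are hypothetical. The `restricted` type needs additional account, region, VPC, and security-group arguments, which are omitted here.

```hcl
variable "cross_account_policy_type" {
  description = "Hypothetical input: Databricks cross-account policy type."
  type        = string
  default     = "managed"

  validation {
    condition     = contains(["managed", "customer", "restricted"], var.cross_account_policy_type)
    error_message = "cross_account_policy_type must be one of: managed, customer, restricted."
  }
}

# Generates the policy JSON for the selected type.
data "databricks_aws_crossaccount_policy" "this" {
  policy_type = var.cross_account_policy_type
}

resource "aws_iam_role_policy" "cross_account" {
  name   = "${var.prefix}-cross-account-policy"
  role   = aws_iam_role.cross_account.id # hypothetical role name
  policy = data.databricks_aws_crossaccount_policy.this.json
}
```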

## Module Components

### Core Components (Always Created)
- **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module)
Collaborator

Do we always need a NAT gateway when implementing a Hub & Spoke architecture?
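A hedged sketch of one possible answer: a spoke VPC that sends its default route to the Transit Gateway does not need its own NAT gateway, because egress happens in the hub. It uses real inputs of `terraform-aws-modules/vpc` (`enable_nat_gateway`, `single_nat_gateway`, `public_subnets`), while `enable_hub_spoke`, `transit_gateway_id`, and the CIDR/AZ variables are hypothetical names.

```hcl
variable "enable_hub_spoke" {
  description = "Hypothetical flag: send spoke egress through the hub instead of a local NAT gateway."
  type        = bool
  default     = false
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "${var.prefix}-vpc"
  cidr            = var.vpc_cidr             # hypothetical input
  azs             = var.availability_zones   # hypothetical input
  private_subnets = var.private_subnet_cidrs # hypothetical input
  # No public subnets (and therefore no internet gateway) in a spoke.
  public_subnets = var.enable_hub_spoke ? [] : var.public_subnet_cidrs

  # Only a standalone VPC needs its own (single, shared) NAT gateway.
  enable_nat_gateway = !var.enable_hub_spoke
  single_nat_gateway = true
}

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  count = var.enable_hub_spoke ? 1 : 0

  transit_gateway_id = var.transit_gateway_id # hypothetical input
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.private_subnets
}

# Default route from each private route table to the Transit Gateway.
resource "aws_route" "spoke_default" {
  count = var.enable_hub_spoke ? length(module.vpc.private_route_table_ids) : 0

  route_table_id         = module.vpc.private_route_table_ids[count.index]
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = var.transit_gateway_id

  depends_on = [aws_ec2_transit_gateway_vpc_attachment.spoke]
}
```

In this layout the hub VPC still carries the NAT gateway (and the firewall, when enabled), so the NAT cost is paid once rather than per spoke.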

### Core Components (Always Created)
- **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module)
- **workspacestorage.tf** - Root S3 bucket for Databricks workspace
- **ucstorage.tf** - Unity Catalog S3 buckets (metastore & data)
Collaborator

I think that it makes sense to have a separate module for UC S3 buckets, but allow them to be linked with this module.
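A sketch of how the split could look from the caller's side, assuming the existing `create_metastore_bucket` flag is used to turn off the UC buckets in `aws-infra`. The `uc-storage` module, its inputs, and the `vpc_id` output name are hypothetical.

```hcl
# Base infrastructure with the Unity Catalog buckets switched off.
module "aws_infra" {
  source = "./modules/aws/aws-infra"

  prefix                  = "demo"
  region                  = "us-east-1"
  create_metastore_bucket = false
}

# Hypothetical standalone module that owns the UC buckets and IAM role.
module "uc_storage" {
  source = "./modules/aws/uc-storage" # hypothetical path

  prefix = "demo"
  region = "us-east-1"

  # Linking point: hypothetical output name, e.g. used to scope bucket
  # policies to the workspace VPC.
  vpc_id = module.aws_infra.vpc_id
}
```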

Comment on lines +214 to +217
# Unity Catalog Configuration
create_metastore_bucket = true
unity_catalog_account_id = "414351767826"
external_id = "12345678-1234-1234-1234-123456789abc"
Collaborator

I think it makes sense to move this to a separate module.
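If the UC configuration moves to its own module, that module could generate its IAM policies with the Databricks provider instead of hardcoding the Databricks-side account. This sketch assumes a provider version that ships the `databricks_aws_unity_catalog_assume_role_policy` and `databricks_aws_unity_catalog_policy` data sources; check the provider docs for the exact arguments. `local.uc_role_name` and `aws_s3_bucket.metastore` are hypothetical names.

```hcl
data "aws_caller_identity" "current" {}

locals {
  uc_role_name = "${var.prefix}-uc-access" # hypothetical naming scheme
}

# Trust policy for the UC role, generated by the Databricks provider.
data "databricks_aws_unity_catalog_assume_role_policy" "this" {
  aws_account_id = data.aws_caller_identity.current.account_id
  role_name      = local.uc_role_name
  external_id    = var.external_id # Databricks account ID, as in the README example
}

# S3 access policy for the metastore bucket.
data "databricks_aws_unity_catalog_policy" "this" {
  aws_account_id = data.aws_caller_identity.current.account_id
  bucket_name    = aws_s3_bucket.metastore.id # hypothetical bucket resource
  role_name      = local.uc_role_name
}

resource "aws_iam_role" "uc" {
  name               = local.uc_role_name
  assume_role_policy = data.databricks_aws_unity_catalog_assume_role_policy.this.json
}

resource "aws_iam_role_policy" "uc" {
  name   = "${local.uc_role_name}-policy"
  role   = aws_iam_role.uc.id
  policy = data.databricks_aws_unity_catalog_policy.this.json
}
```

Because the trust-policy data source supplies the Databricks-side principal, a separate `unity_catalog_account_id` input may no longer be needed.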

}
```

## Inputs
Collaborator

Let's use terraform-docs to generate these tables.

}

# Cross-Account Role Policy
data "aws_iam_policy_document" "cross_account_policy" {
Collaborator

Why is this duplicated in modules/aws/aws-infra/components/iam.tf?


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Standardize aws infra modules

3 participants