-
Notifications
You must be signed in to change notification settings - Fork 197
Feature/aws infra module #224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a comprehensive AWS infrastructure module (aws-infra) for Databricks deployments, providing a standardized foundation for developing AWS-based Databricks examples. The module creates VPC infrastructure, S3 storage buckets, IAM roles, and security configurations with optional features for Private Link, hub-spoke architecture, and network firewalls.
Key changes:
- Creates core AWS infrastructure components (VPC, S3 buckets, IAM roles, VPC endpoints)
- Adds optional Private Link support for secure Databricks connectivity
- Implements optional hub-spoke architecture with Transit Gateway and Network Firewall
- Includes comprehensive documentation and example configurations
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
modules/aws/aws-infra/main.tf |
Orchestrates module components and conditionally creates hub networking submodule |
modules/aws/aws-infra/variables.tf |
Defines input variables for networking, storage, IAM, security, and advanced networking configurations |
modules/aws/aws-infra/locals.tf |
Computes local values for tags, availability zones, subnet CIDRs, and IAM configurations |
modules/aws/aws-infra/networking.tf |
Creates VPC infrastructure using AWS VPC module with Databricks-specific security group rules |
modules/aws/aws-infra/workspacestorage.tf |
Creates root S3 bucket for Databricks workspace with encryption and public access blocking |
modules/aws/aws-infra/ucstorage.tf |
Creates Unity Catalog metastore and data S3 buckets with security configurations |
modules/aws/aws-infra/iam.tf |
Creates IAM roles and policies for Databricks cross-account access and Unity Catalog |
modules/aws/aws-infra/vpc-endpoints.tf |
Creates VPC endpoints for S3, STS, and Kinesis services |
modules/aws/aws-infra/private-link.tf |
Creates Private Link resources including subnets, security groups, and VPC endpoints |
modules/aws/aws-infra/outputs.tf |
Exposes VPC ID, S3 bucket names, and IAM role ARNs |
modules/aws/aws-infra/versions.tf |
Specifies Terraform and provider version requirements |
modules/aws/aws-infra/modules/hub-networking/transit-gateway.tf |
Implements Transit Gateway with hub-spoke architecture and routing configuration |
modules/aws/aws-infra/modules/hub-networking/firewall.tf |
Creates Network Firewall with FQDN and network-based rule groups |
modules/aws/aws-infra/modules/hub-networking/variables.tf |
Defines hub networking submodule input variables |
modules/aws/aws-infra/modules/hub-networking/locals.tf |
Computes hub VPC subnet CIDRs and Transit Gateway name |
modules/aws/aws-infra/modules/hub-networking/outputs.tf |
Exposes hub VPC ID |
modules/aws/aws-infra/components/iam.tf |
Duplicate IAM configuration file (appears to be unused) |
modules/aws/aws-infra/README.md |
Comprehensive documentation with architecture diagrams, usage examples, and configuration details |
Comments suppressed due to low confidence (1)
modules/aws/aws-infra/components/iam.tf:1
- This file appears to be a duplicate of
modules/aws/aws-infra/iam.tfwith identical content. Having duplicate IAM configurations can lead to maintenance issues and confusion about which file is the source of truth. Remove this duplicate file and use onlymodules/aws/aws-infra/iam.tf.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| "Module" = "aws-infra" | ||
| "Prefix" = var.prefix | ||
| "Region" = var.region | ||
| "CreatedDate" = formatdate("YYYY-MM-DD", timestamp()) |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using timestamp() in common tags will cause Terraform to detect changes on every plan/apply, even when no actual infrastructure changes are needed. This is a known anti-pattern that leads to unnecessary plan noise and potential state drift. Consider removing this tag or using a static value set once during initial deployment.
| "CreatedDate" = formatdate("YYYY-MM-DD", timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, time_static would be better here: https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/static#basic-usage
| depends_on = [aws_s3_bucket_public_access_block.root] | ||
| } | ||
|
|
||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| restrict_public_buckets = true | ||
| } | ||
|
|
||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| } | ||
| } | ||
|
|
||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| # Current region (for firewall rules) | ||
| current_region = var.region | ||
| } | ||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| This module creates a complete AWS infrastructure foundation optimized for Databricks, featuring: | ||
|
|
||
| - **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking | ||
| - **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the encryption optional/configurable?
|
|
||
| - **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking | ||
| - **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog | ||
| - **👤 IAM Integration**: Cross-account and Unity Catalog roles with Databricks-generated policies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How configurable are policies? For cross-account we have different policy types: restricted/managed/...
| ## Module Components | ||
|
|
||
| ### Core Components (Always Created) | ||
| - **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always need a NAT gateway when implementing a Hub & Spoke architecture?
| ### Core Components (Always Created) | ||
| - **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module) | ||
| - **workspacestorage.tf** - Root S3 bucket for Databricks workspace | ||
| - **ucstorage.tf** - Unity Catalog S3 buckets (metastore & data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it makes sense to have a separate module for UC S3 buckets, but allow them to be linked with this module.
| # Unity Catalog Configuration | ||
| create_metastore_bucket = true | ||
| unity_catalog_account_id = "414351767826" | ||
| external_id = "12345678-1234-1234-1234-123456789abc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it makes sense to move to a separate module
| } | ||
| ``` | ||
|
|
||
| ## Inputs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use terraform-docs for generation of that tables
| } | ||
|
|
||
| # Cross-Account Role Policy | ||
| data "aws_iam_policy_document" "cross_account_policy" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| # Cross-Account Role Policy | ||
| data "aws_iam_policy_document" "cross_account_policy" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the duplication with modules/aws/aws-infra/components/iam.tf?
closes #178
This PR adds a standard module for all things AWS infra. The objective is to use the base aws infra for developing any databricks examples involving aws. It contains the following: