Refactor Lambda deployment to use container images and streamline workflows#96
Refactor Lambda deployment to use container images and streamline workflows#96
Conversation
- Introduced Dockerfile.lambda for building a container image for Lambda functions, utilizing the AWS Lambda Python base image. - Updated pyproject.toml to define dependency groups for Lambda functions. - Modified GitHub Actions workflows to deploy Lambda functions using the new container image approach, replacing S3 bucket references with ECR image URIs. - Refactored CloudFormation templates to accommodate the new Lambda deployment method, including permissions for ECR access. - Removed legacy Lambda packaging scripts and updated related documentation to reflect the transition to container-based deployment.
|
@claude please review this PR |
|
Claude finished @jfrench9's task in 3m 11s —— View job PR Review Progress
Code ReviewThis is a well-structured refactoring that modernizes Lambda deployments from ZIP files to container images. The implementation is solid overall, but there are several areas that need attention before merging. Critical Issues1. Missing Lambda dependency group validation (cloudformation/*.yaml)The Issue: The dependency group includes Recommendation:
Location: 2. Incomplete workflow file rename (.github/workflows/deploy-postgres.yml)The workflow was renamed from Issue: Need to verify:
Recommendation: Verify the old workflow file was properly deleted in the PR. Location: 3. ARM64-only deployment lacks fallback (Dockerfile.lambda, build.yml)The Lambda container image is built exclusively for ARM64 (Graviton), which is great for cost savings. However, there's no fallback or documentation about what happens if ARM64 becomes unavailable in a region. Issue:
Recommendation:
Location: High Priority Issues4. ECR image pull permissions are broad (cloudformation/*.yaml)All CloudFormation templates add ECR permissions with a wildcard for Issue: This is correct, but the ARN uses a hardcoded repository name: Resource: !Sub "arn:${AWS::Partition}:ecr:${AWS::Region}:${AWS::AccountId}:repository/robosystems"Recommendation: Verify the ECR repository is actually named "robosystems" in all environments. If it varies by environment, this needs to be parameterized. Location: All cloudformation/*.yaml files (postgres.yaml:276, valkey.yaml:387, graph-infra.yaml:464, graph-volumes.yaml:188, etc.) 5. Lambda timeout unchanged despite container overhead (cloudformation/*.yaml)Container-based Lambdas typically have longer cold start times than ZIP-based deployments, especially for the first invocation or after scaling. However, all Lambda timeout values remain unchanged at 300 seconds (5 minutes). Issue: The PR description mentions "Monitor Lambda cold start times as container images may affect startup performance" but doesn't adjust timeouts as a precaution. Recommendation:
Location: All Lambda function definitions in CloudFormation templates 6. Manifest generation removed but no migration path documented (bin/tools/package-scripts.sh)The packaging script previously generated a manifest file ( Issue:
Recommendation:
Location: Medium Priority Issues7. Docker buildx cache strategy could be optimized (.github/workflows/build.yml)The Lambda image build uses Recommendation: Review if the main image build should also use Location: 8. Missing rollback documentationThe PR description mentions this is a breaking change but doesn't provide a rollback procedure. Issue: If the container deployment fails in production, teams need to know:
Recommendation: Add a "Rollback Procedure" section to the PR description or link to runbook documentation. 9. uv export command uses --frozen but no lock file validation (Dockerfile.lambda)The Dockerfile runs Issue:
Recommendation:
Location: Minor Issues / Suggestions10. Inconsistent Lambda handler naming convention (Dockerfile.lambda)The Lambda functions use two different naming patterns:
vs. the Dockerfile's default:
Observation: This is actually consistent (all use Recommendation: The default CMD in Dockerfile.lambda could include a comment explaining that each CloudFormation template overrides this via Location: 11. Build workflow output doesn't validate both images built successfullyThe build workflow now outputs two images ( Recommendation: Add a validation step at the end of the Location: 12. Dependency version ranges could be tighter (pyproject.toml)The Lambda dependency group uses wide version ranges:
Recommendation:
Location: 13. CloudFormation template parameter descriptions could mention container deploymentSeveral templates still have parameter descriptions that reference "S3 bucket" or "deployment package" without mentioning the new container-based approach. Example: Recommendation: Ensure all parameter descriptions are updated to reflect container deployment (spot check shows they are mostly updated, good work). Security Review✅ No major security concerns identified
Minor security note: The ARM64 base image Breaking Changes VerificationThe PR description lists breaking changes, which are accurate: ✅ Deployment Method: Correctly documented Testing RecommendationsThe PR description includes good testing notes. I'd add:
SummaryOverall Assessment: This is a well-executed refactoring with a clear architecture improvement. The code quality is high, and the approach is sound. However, there are several issues that should be addressed before merging: Must Fix Before Merge:
Should Fix Before Merge: Nice to Have: Approval Status: Request Changes - The redis version upgrade and missing documentation are blocking issues that should be resolved before merge. CI Status: 2 workflow runs currently in progress
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ImageConfig: | ||
| Command: | ||
| - graph_volume_monitor.lambda_handler |
There was a problem hiding this comment.
Point volume monitor to the actual handler function
The Lambda container entrypoint is set to graph_volume_monitor.lambda_handler, but bin/lambda/graph_volume_monitor.py only defines def handler(...) (line 48) and does not export lambda_handler. With container images, Lambda resolves the handler from ImageConfig.Command, so this will raise a “handler not found” error on every invocation and break scheduled monitoring/alarm-driven expansions. Use graph_volume_monitor.handler or add an alias in the module so the configured handler exists.
Useful? React with 👍 / 👎.
| ImageConfig: | ||
| Command: | ||
| - graph_volume_detachment.lambda_handler |
There was a problem hiding this comment.
Point volume detachment to the actual handler function
The Lambda container entrypoint is set to graph_volume_detachment.lambda_handler, but bin/lambda/graph_volume_detachment.py defines def handler(...) (line 31) and has no lambda_handler. With container-based Lambda, this mismatch causes cold start failures and prevents the ASG lifecycle hook from detaching volumes. Update the command to graph_volume_detachment.handler or export a lambda_handler alias in the module.
Useful? React with 👍 / 👎.
| runs-on: ubuntu-latest | ||
| outputs: | ||
| lambda_image: ${{ steps.build-lambda.outputs.lambda_image }} | ||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| repository: ${{ github.repository }} | ||
| ref: ${{ github.ref }} | ||
| token: ${{ secrets.ACTIONS_TOKEN }} | ||
|
|
||
| - name: Set up QEMU | ||
| uses: docker/setup-qemu-action@v3 | ||
|
|
||
| - name: Set up Docker Buildx | ||
| uses: docker/setup-buildx-action@v3 | ||
| with: | ||
| driver-opts: | | ||
| image=moby/buildkit:master | ||
| network=host | ||
|
|
||
| - name: Configure AWS credentials | ||
| uses: aws-actions/configure-aws-credentials@v4 | ||
| with: | ||
| aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | ||
| aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | ||
| aws-region: ${{ inputs.aws_region }} | ||
|
|
||
| - name: Login to Amazon ECR | ||
| id: login-ecr | ||
| uses: aws-actions/amazon-ecr-login@v2 | ||
| with: | ||
| mask-password: "true" | ||
|
|
||
| - name: Build and push Lambda container image | ||
| id: build-lambda | ||
| env: | ||
| ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }} | ||
| ECR_REPOSITORY: ${{ inputs.ecr_repository }} | ||
| run: | | ||
| # Determine version tag (same logic as main build) | ||
| if [[ "$GITHUB_REF" == refs/tags/v*.*.* ]]; then | ||
| VERSION_TAG=${GITHUB_REF#refs/tags/} | ||
| elif [[ "$GITHUB_REF" == refs/heads/release/* ]]; then | ||
| BRANCH_VERSION=${GITHUB_REF#refs/heads/release/} | ||
| VERSION_TAG="v${BRANCH_VERSION}-hotfix.${GITHUB_SHA:0:8}" | ||
| else | ||
| VERSION_TAG="git-${GITHUB_SHA:0:8}" | ||
| fi | ||
|
|
||
| # Lambda image tags (ARM64 only - for Graviton cost savings) | ||
| LAMBDA_TAG="lambda-${{ inputs.environment }}" | ||
| LAMBDA_VERSION_TAG="lambda-$VERSION_TAG" | ||
|
|
||
| echo "🐑 Building Lambda container image: $LAMBDA_TAG" | ||
|
|
||
| # Build Lambda image (ARM64 only) | ||
| docker buildx build \ | ||
| --platform linux/arm64 \ | ||
| --push \ | ||
| --cache-from type=registry,ref=$ECR_REGISTRY/$ECR_REPOSITORY:lambda-cache \ | ||
| --cache-to type=registry,ref=$ECR_REGISTRY/$ECR_REPOSITORY:lambda-cache,mode=max \ | ||
| -t $ECR_REGISTRY/$ECR_REPOSITORY:$LAMBDA_TAG \ | ||
| -t $ECR_REGISTRY/$ECR_REPOSITORY:$LAMBDA_VERSION_TAG \ | ||
| -f Dockerfile.lambda . | ||
|
|
||
| # Verify the image was pushed | ||
| aws ecr describe-images --repository-name $ECR_REPOSITORY --image-ids imageTag=$LAMBDA_TAG || exit 1 | ||
|
|
||
| LAMBDA_IMAGE="$ECR_REGISTRY/$ECR_REPOSITORY:$LAMBDA_TAG" | ||
| echo "lambda_image=$LAMBDA_IMAGE" >> $GITHUB_OUTPUT | ||
| echo "✅ Lambda image pushed: $LAMBDA_IMAGE" |
Check warning
Code scanning / CodeQL
Workflow does not contain permissions Medium
…cripts - Introduced a new alias `lambda_handler` for the existing `handler` function in both `graph_volume_detachment.py` and `graph_volume_monitor.py`. - This change facilitates Lambda container deployment using CloudFormation ImageConfig.Command.
Summary
This PR modernizes our Lambda deployment architecture by introducing container image support and significantly streamlining our CI/CD workflows. The changes reduce complexity while improving maintainability and deployment reliability.
Key Accomplishments
Lambda Container Image Support
Workflow Optimization
Infrastructure Updates
Dependency Management
Breaking Changes
Testing Notes
Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
refactor/lambda-packagingmainCo-Authored-By: Claude noreply@anthropic.com