# Complete Implementation Guide: Patient Data Pipeline with Step Functions, dbt-glue, and AWS Glue

This guide walks step by step through building the data pipeline, taking you from an empty repository to a working pipeline with detailed instructions for every component.

## Project Overview

We're building a serverless data pipeline that:
1. Accepts Excel files with patient data (with messy `sex` column values)
2. Transforms the data using dbt models running on AWS Glue
3. Validates data quality with automated tests
4. Saves clean data to S3 for downstream consumption
5. Orchestrates everything with AWS Step Functions

## Phase 1: Repository and Project Structure Setup

### Step 1.1: Create the Project Repository

First, let's set up your project structure. This will be your main repository.

```bash
# Create the main project directory
mkdir patient-data-pipeline
cd patient-data-pipeline

# Initialize git repository
git init
```

### Step 1.2: Create the Complete Project Structure

Based on dbt best practices[1][2], create this exact folder structure:

```bash
# Create the main project directories
mkdir -p dbt_project/models/staging
mkdir -p dbt_project/models/marts/patient_analytics
mkdir -p dbt_project/tests
mkdir -p dbt_project/macros
mkdir -p infrastructure/lambda_functions/dbt_runner
mkdir -p infrastructure/step_functions
mkdir -p test_data
mkdir -p docs

# Create the files we'll need
touch dbt_project/dbt_project.yml
touch dbt_project/profiles.yml
touch infrastructure/requirements.txt
touch README.md
touch .gitignore
```

Your final structure should look like this:
```
patient-data-pipeline/
├── dbt_project/
│   ├── dbt_project.yml
│   ├── profiles.yml
│   ├── models/
│   │   ├── staging/
│   │   │   ├── _sources.yml
│   │   │   ├── stg_patient_data.sql
│   │   │   └── schema.yml
│   │   └── marts/
│   │       └── patient_analytics/
│   ├── tests/
│   ├── macros/
│   └── README.md
├── infrastructure/
│   ├── lambda_functions/
│   │   └── dbt_runner/
│   │       ├── lambda_function.py
│   │       └── requirements.txt
│   └── step_functions/
│       └── state_machine.json
├── test_data/
│   └── sample_patient_data.xlsx
├── docs/
├── README.md
└── .gitignore
```

### Step 1.3: Create Essential Configuration Files

Create `.gitignore`:
```gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv/
pip-log.txt
pip-delete-this-directory.txt

# dbt
target/
dbt_packages/
logs/
.dbt/

# AWS
.aws/
*.zip
*.pem

# OS
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
*.swp
*.swo

# Terraform (if used later)
*.tfstate
*.tfstate.backup
.terraform/
```

## Phase 2: Local Development Environment Setup

### Step 2.1: Python Environment Setup

Based on the latest dbt-glue requirements[3][4], set up your local environment:

```bash
# Create Python virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install required packages
pip install dbt-core==1.8.0
pip install "dbt-glue>=1.8.0"  # Quoted so the shell does not treat >= as a redirect
pip install boto3
pip install pandas
pip install openpyxl  # For Excel file handling
```

Create `infrastructure/requirements.txt`:
```txt
dbt-core==1.8.0
dbt-glue>=1.8.0
boto3>=1.26.0
pandas>=1.5.0
openpyxl>=3.1.0
```

### Step 2.2: AWS CLI Configuration

Ensure your AWS CLI is configured with appropriate credentials:

```bash
# Configure AWS CLI (if not already done)
aws configure

# Test your connection
aws sts get-caller-identity
```

## Phase 3: dbt Project Setup

### Step 3.1: Create dbt Project Configuration

Create `dbt_project/dbt_project.yml`:
```yaml
name: 'patient_data_pipeline'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'patient_data_pipeline'

# These configurations specify where dbt should look for different types of files.
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:
  - "target"
  - "dbt_packages"

models:
  patient_data_pipeline:
    # Configuration for staging models
    staging:
      +materialized: view
      +tags: ["staging"]
    
    # Configuration for mart models  
    marts:
      +materialized: table
      +tags: ["marts"]
      patient_analytics:
        +schema: patient_analytics
```

### Step 3.2: Create dbt Profile Configuration

Create `dbt_project/profiles.yml` based on the dbt-glue setup guide[3][5]:
```yaml
patient_data_pipeline:
  target: dev
  outputs:
    dev:
      type: glue
      role_arn: "arn:aws:iam::YOUR_ACCOUNT_ID:role/GlueInteractiveSessionRole"  # You'll create this in Phase 4
      region: us-east-1  # Change to your preferred region
      workers: 2
      worker_type: G.1X
      idle_timeout: 10
      glue_version: "4.0"
      schema: "patient_data_dev"
      location: "s3://patient-data-lake-YOUR_ACCOUNT_ID/processed/"  # You'll create this bucket
      glue_session_reuse: true
      session_provisioning_timeout_in_seconds: 120
```

### Step 3.3: Create dbt Models

Create `dbt_project/models/staging/_sources.yml`:
```yaml
version: 2

sources:
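  - name: raw_data
    schema: patient_data_dev  # Glue database holding the table; without this, dbt uses the source name as the schema
    description: "Raw patient data from Excel uploads"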
    tables:
      - name: patient_excel_data
        description: "Patient data with messy sex column values (0,1,M,F,male,female)"
        columns:
          - name: patient_id
            description: "Unique patient identifier"
            tests:
              - not_null
              - unique
          - name: sex
            description: "Sex column with mixed formats that need standardization"
```

Create `dbt_project/models/staging/stg_patient_data.sql`:
```sql
{{ config(
    materialized='table',
    tags=['staging', 'patient_data']
) }}

WITH source_data AS (
    SELECT 
        patient_id,
        sex AS raw_sex,
        CURRENT_TIMESTAMP() AS processed_at,
        '{{ run_started_at }}' AS dbt_run_timestamp
    FROM {{ source('raw_data', 'patient_excel_data') }}
),

cleaned_data AS (
    SELECT 
        patient_id,
        CASE 
            WHEN UPPER(TRIM(raw_sex)) IN ('M', 'MALE', '1') THEN 'M'
            WHEN UPPER(TRIM(raw_sex)) IN ('F', 'FEMALE', '0') THEN 'F'
            ELSE NULL
        END AS sex,
        raw_sex AS original_sex_value,
        processed_at,
        dbt_run_timestamp
    FROM source_data
)

SELECT * 
FROM cleaned_data
WHERE sex IS NOT NULL  -- Only keep records with valid sex values
```
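
To sanity-check the mapping before running it on Glue, the `CASE` expression can be mirrored in plain Python. This is only a sketch of the same rules, not code that dbt runs; the function name `normalize_sex` is hypothetical, and the last two sample values are added here to exercise trimming and the `ELSE` branch:

```python
from typing import Optional


def normalize_sex(raw: Optional[str]) -> Optional[str]:
    """Mirror the CASE expression: UPPER(TRIM(raw_sex)) mapped to 'M', 'F', or None."""
    if raw is None:
        return None  # SQL NULL falls through to the ELSE branch
    # strip(" ") removes spaces only, like TRIM in the model
    value = raw.strip(" ").upper()
    if value in {"M", "MALE", "1"}:
        return "M"
    if value in {"F", "FEMALE", "0"}:
        return "F"
    return None


raw_values = ["M", "F", "1", "0", "male", "female", "MALE", " f ", "unknown"]
cleaned = [normalize_sex(v) for v in raw_values]
# Rows whose cleaned value is None are dropped by the model's final WHERE clause
kept = [c for c in cleaned if c is not None]
print(kept)
```

Note that the model silently drops unmapped values such as `unknown`; if those rows matter downstream, consider keeping them and flagging them instead.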

Create `dbt_project/models/staging/schema.yml`:
```yaml
version: 2

models:
  - name: stg_patient_data
    description: "Cleaned and standardized patient data"
    columns:
      - name: patient_id
        description: "Unique patient identifier"
        tests:
          - not_null
          - unique
        
      - name: sex
        description: "Patient sex standardized to M or F"
        tests:
          - not_null
          - accepted_values:
              values: ['M', 'F']
              quote: true
        
      - name: original_sex_value
        description: "Original raw value before transformation"
        
      - name: processed_at
        description: "Timestamp when the record was processed"
        tests:
          - not_null
```
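
As a rough illustration of what the `patient_id` and `sex` tests above check, here is a plain-Python sketch. It counts failing rows rather than reproducing dbt's exact failure counts, and the function name `check_rows` and the sample rows are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Optional


def check_rows(rows: List[Dict[str, Optional[str]]]) -> Dict[str, int]:
    """Count failures for the not_null, unique, and accepted_values checks."""
    ids = [row.get("patient_id") for row in rows]
    id_counts = Counter(i for i in ids if i is not None)
    return {
        "patient_id_not_null": sum(1 for i in ids if i is None),
        # unique: number of distinct IDs that occur more than once
        "patient_id_unique": sum(1 for n in id_counts.values() if n > 1),
        "sex_not_null": sum(1 for row in rows if row.get("sex") is None),
        # accepted_values skips NULLs; the not_null check covers those
        "sex_accepted_values": sum(
            1
            for row in rows
            if row.get("sex") is not None and row.get("sex") not in ("M", "F")
        ),
    }


rows = [
    {"patient_id": "P001", "sex": "M"},
    {"patient_id": "P001", "sex": "F"},   # duplicate ID
    {"patient_id": "P003", "sex": "X"},   # value outside ['M', 'F']
    {"patient_id": None, "sex": None},    # NULLs in both columns
]
failures = check_rows(rows)
print(failures)
```

A result with all counts at zero corresponds to `dbt test` passing for these two columns.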

### Step 3.4: Create Test Data

Create `test_data/sample_patient_data.xlsx` with this content (create in Excel or using Python):

```python
# Run this to create test data
import pandas as pd

test_data = {
    'patient_id': ['P001', 'P002', 'P003', 'P004', 'P005', 'P006', 'P007'],
    'sex': ['M', 'F', '1', '0', 'male', 'female', 'MALE']
}

df = pd.DataFrame(test_data)
df.to_excel('test_data/sample_patient_data.xlsx', index=False)
print("Test data created successfully!")
```

## Phase 4: AWS Infrastructure Setup

### Step 4.1: Create IAM Roles

You'll need several IAM roles. Create these using the AWS Console or CLI.

**IAM Role 1: Glue Interactive Session Role**

Based on AWS Glue IAM requirements[6][7], create this role:

1. Go to AWS IAM Console → Roles → Create Role
2. Choose "Glue" as the service
3. Attach these policies:
   - `AWSGlueServiceRole`
   - `AmazonS3FullAccess` (or a more restrictive policy)
4. Name it: `GlueInteractiveSessionRole`

**IAM Role 2: Lambda Execution Role**

1. Create role for Lambda service
2. Attach these policies:
   - `AWSLambdaBasicExecutionRole`
   - `AWSGlueConsoleFullAccess`
   - `AmazonS3ReadOnlyAccess`
3. Name it: `PatientDataLambdaRole`

**IAM Role 3: Step Functions Execution Role**

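1. Create role for Step Functions service
2. Attach these policies:
   - `AWSStepFunctionsFullAccess`
   - `AWSLambdaRole` (allows `lambda:InvokeFunction`; there is no AWS managed policy named `AWSLambdaInvokeFunction`)
   - A policy that allows `s3:GetObject` on the processed bucket and `s3:PutObject` on the clean bucket, which the `ExportCleanData` state in Phase 6 needs
3. Name it: `PatientDataStepFunctionsRole`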

### Step 4.2: Create S3 Buckets

Create the required S3 buckets:

```bash
# Look up your AWS account ID (used to make the bucket names unique)
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Create raw data bucket
aws s3 mb s3://patient-data-raw-${AWS_ACCOUNT_ID}

# Create processed data bucket  
aws s3 mb s3://patient-data-lake-${AWS_ACCOUNT_ID}

# Create clean data bucket
aws s3 mb s3://patient-data-clean-${AWS_ACCOUNT_ID}

# Enable EventBridge notifications on the raw bucket
aws s3api put-bucket-notification-configuration \
  --bucket patient-data-raw-${AWS_ACCOUNT_ID} \
  --notification-configuration '{
    "EventBridgeConfiguration": {}
  }'
```

### Step 4.3: Create AWS Glue Database

```bash
# Create Glue database for our data
aws glue create-database \
  --database-input '{
    "Name": "patient_data_dev",
    "Description": "Database for patient data pipeline development"
  }'
```

## Phase 5: Lambda Function Creation

### Step 5.1: Create the dbt Runner Lambda Function

Create `infrastructure/lambda_functions/dbt_runner/lambda_function.py`:

```python
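import json
import os
import shutil
import subprocess
import boto3
import logging
from typing import Dict, Any

# Configure logging. The Lambda runtime pre-installs a root log handler, so
# logging.basicConfig() would be a no-op; set the level on the logger instead.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

def lambda_handler(event: Dict[str, Any], context) -> Dict[str, Any]:
    """
    Lambda function to run dbt transformations using AWS Glue Interactive Sessions
    """
    try:
        # Extract S3 details from the event
        s3_bucket = event.get('s3Bucket', '')
        s3_key = event.get('s3Key', '')
        
        logger.info(f"Processing file: s3://{s3_bucket}/{s3_key}")
        
        # Set up environment for dbt
        os.environ['DBT_PROFILES_DIR'] = '/tmp'
        os.environ['AWS_DEFAULT_REGION'] = os.environ.get('AWS_REGION', 'us-east-1')

        # The deployment package is read-only and dbt writes target/ and logs/
        # next to the project, so copy the project files into /tmp first
        task_root = os.environ.get('LAMBDA_TASK_ROOT', '/var/task')
        for name in ('dbt_project.yml', 'profiles.yml'):
            shutil.copy(os.path.join(task_root, name), '/tmp')
        for name in ('models', 'macros', 'tests'):
            src = os.path.join(task_root, name)
            if os.path.isdir(src):
                shutil.copytree(src, os.path.join('/tmp', name), dirs_exist_ok=True)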
        
        # Ensure the Excel data is accessible in Glue Data Catalog
        setup_glue_table(s3_bucket, s3_key)
        
        # Run dbt debug to verify connection
        debug_result = run_dbt_command(['dbt', 'debug', '--profiles-dir', '/tmp'])
        if debug_result['return_code'] != 0:
            raise Exception(f"dbt debug failed: {debug_result['stderr']}")
        
        # Run dbt transformations
        run_result = run_dbt_command([
            'dbt', 'run', 
            '--models', 'stg_patient_data',
            '--profiles-dir', '/tmp'
        ])
        
        # Run dbt tests
        test_result = run_dbt_command([
            'dbt', 'test', 
            '--models', 'stg_patient_data',
            '--profiles-dir', '/tmp'
        ])
        
        # Check results
        transformation_success = run_result['return_code'] == 0
        tests_passed = test_result['return_code'] == 0
        
        response = {
            'statusCode': 200,
            'transformationSuccess': transformation_success,
            'testsPass': tests_passed,
            'runOutput': run_result.get('stdout', ''),
            'testOutput': test_result.get('stdout', ''),
            'processedFile': f"s3://{s3_bucket}/{s3_key}"
        }
        
        if not (transformation_success and tests_passed):
            response['errors'] = {
                'run_errors': run_result.get('stderr', ''),
                'test_errors': test_result.get('stderr', '')
            }
            
        logger.info(f"Pipeline completed. Success: {transformation_success and tests_passed}")
        return response
        
    except Exception as e:
        logger.error(f"Error in lambda_handler: {str(e)}")
        return {
            'statusCode': 500,
            'error': str(e),
            'transformationSuccess': False,
            'testsPass': False
        }

def run_dbt_command(command: list) -> Dict[str, Any]:
    """Run a dbt command and capture output"""
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=900,  # 15 minute timeout
            cwd='/tmp'
        )
        
        return {
            'return_code': result.returncode,
            'stdout': result.stdout,
            'stderr': result.stderr
        }
    except subprocess.TimeoutExpired:
        return {
            'return_code': 1,
            'stdout': '',
            'stderr': 'Command timed out after 15 minutes'
        }
    except Exception as e:
        return {
            'return_code': 1,
            'stdout': '',
            'stderr': str(e)
        }

def setup_glue_table(bucket: str, key: str) -> None:
    """
    Set up Glue table to make Excel data accessible to dbt
    This is a simplified version - in production, you'd want more robust table creation
    """
    glue_client = boto3.client('glue')
    
    # Fixed names that match the dbt source definition (the key is not used here)
    table_name = 'patient_excel_data'
    database_name = 'patient_data_dev'
    
    try:
        # Create or update Glue table pointing to the converted data
        # Note: This assumes the Excel file has been converted to Parquet
        # and written under s3://<bucket>/processed/
        # In a real implementation, you'd have a separate process to convert Excel to Parquet
        
        table_input = {
            'Name': table_name,
            'TableType': 'EXTERNAL_TABLE',
            'Parameters': {'classification': 'parquet'},
            'StorageDescriptor': {
                'Columns': [
                    {'Name': 'patient_id', 'Type': 'string'},
                    {'Name': 'sex', 'Type': 'string'}
                ],
                'Location': f's3://{bucket}/processed/',
                # Parquet formats, consistent with the conversion assumed above
                'InputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
                'OutputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat',
                'SerdeInfo': {
                    'SerializationLibrary': 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
                }
            }
        }
        
        # Try to update existing table, create if it doesn't exist
        try:
            glue_client.update_table(
                DatabaseName=database_name,
                TableInput=table_input
            )
        except glue_client.exceptions.EntityNotFoundException:
            glue_client.create_table(
                DatabaseName=database_name,
                TableInput=table_input
            )
            
        logger.info(f"Glue table {table_name} set up successfully")
        
    except Exception as e:
        logger.error(f"Error setting up Glue table: {str(e)}")
        raise
```
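
The handler above assumes that something has already converted the uploaded workbook to Parquet under the `processed/` prefix. A minimal sketch of that missing step, assuming `pandas`, `openpyxl`, and `pyarrow` are available (`pyarrow` is not in this guide's requirements files); the helper names `parquet_key_for` and `convert_excel_to_parquet` are hypothetical:

```python
import posixpath


def parquet_key_for(excel_key: str, prefix: str = "processed") -> str:
    """Map an uploaded .xlsx key to a Parquet key under the table's S3 prefix."""
    stem, ext = posixpath.splitext(posixpath.basename(excel_key))
    if ext.lower() != ".xlsx":
        raise ValueError(f"not an .xlsx key: {excel_key!r}")
    return f"{prefix}/{stem}.parquet"


def convert_excel_to_parquet(local_xlsx: str, local_parquet: str) -> int:
    """Convert one workbook to Parquet and return the number of rows written."""
    # Imported lazily: pandas, openpyxl, and pyarrow are assumed to be installed
    import pandas as pd

    # dtype=str keeps values such as 1 and 0 as text, matching the string columns
    df = pd.read_excel(local_xlsx, dtype=str)
    df.to_parquet(local_parquet, index=False)
    return len(df)


target_key = parquet_key_for("test_uploads/sample_patient_data.xlsx")
print(target_key)
```

In a Lambda, the workbook would first be downloaded to `/tmp` (for example with boto3's `download_file`), converted, and then uploaded to `target_key` with `upload_file`. Because the output key ends in `.parquet`, it does not match the `.xlsx` suffix filter used by the EventBridge rule in Phase 7.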

Create `infrastructure/lambda_functions/dbt_runner/requirements.txt`:
```txt
dbt-core==1.8.0
dbt-glue>=1.8.0
boto3>=1.26.0
```

### Step 5.2: Deploy the Lambda Function

Package and deploy the Lambda function:

```bash
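cd infrastructure/lambda_functions/dbt_runner

# Install dbt and its dependencies next to the handler so they ship in the package.
# Build on Linux (or in a matching container) so compiled wheels work on Lambda.
pip install -r requirements.txt --target .

# Create deployment package
zip -r ../../../dbt-runner-lambda.zip . -x "*.pyc" "*__pycache__/*"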

# Add your dbt project to the Lambda package
cd ../../../dbt_project
zip -r ../dbt-runner-lambda.zip . -x "target/*" "dbt_packages/*" "logs/*"

cd ..

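# Deploy Lambda function
# Note: direct uploads with --zip-file are limited to 50 MB; for a larger package,
# upload the zip to S3 and reference it with --code instead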
aws lambda create-function \
  --function-name patient-data-dbt-runner \
  --runtime python3.11 \
  --role arn:aws:iam::YOUR_ACCOUNT_ID:role/PatientDataLambdaRole \
  --handler lambda_function.lambda_handler \
  --zip-file fileb://dbt-runner-lambda.zip \
  --timeout 900 \
  --memory-size 1024
```

## Phase 6: Step Functions Workflow Creation

### Step 6.1: Create State Machine Definition

Create `infrastructure/step_functions/state_machine.json` based on Step Functions best practices[8][9]:

```json
{
  "Comment": "Patient Data Processing Pipeline with dbt and AWS Glue",
  "StartAt": "ProcessWithDBT",
  "States": {
    "ProcessWithDBT": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:us-east-1:YOUR_ACCOUNT_ID:function:patient-data-dbt-runner",
        "Payload": {
          "s3Bucket.$": "$.detail.bucket.name",
          "s3Key.$": "$.detail.object.key"
        }
      },
      "ResultPath": "$.dbtResult",
      "Next": "CheckProcessingResults",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 30,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "ProcessingFailed",
          "ResultPath": "$.error"
        }
      ]
    },
    
    "CheckProcessingResults": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            {
              "Variable": "$.dbtResult.Payload.transformationSuccess",
              "BooleanEquals": true
            },
            {
              "Variable": "$.dbtResult.Payload.testsPass", 
              "BooleanEquals": true
            }
          ],
          "Next": "ExportCleanData"
        }
      ],
      "Default": "ProcessingFailed"
    },
    
    "ExportCleanData": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
      "Parameters": {
        "Bucket": "patient-data-clean-YOUR_ACCOUNT_ID",
        "CopySource.$": "States.Format('patient-data-lake-YOUR_ACCOUNT_ID/processed/stg_patient_data/{}', $.detail.object.key)",
        "Key.$": "States.Format('processed/{}/clean_patient_data.parquet', $.detail.object.key)"
      },
      "Next": "Success",
      "ResultPath": "$.exportResult",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "ExportFailed",
          "ResultPath": "$.exportError"
        }
      ]
    },
    
    "Success": {
      "Type": "Succeed",
      "Comment": "Pipeline completed successfully"
    },
    
    "ProcessingFailed": {
      "Type": "Fail",
      "Error": "ProcessingError",
      "Cause": "dbt transformation or testing failed"
    },
    
    "ExportFailed": {
      "Type": "Fail", 
      "Error": "ExportError",
      "Cause": "Failed to export clean data to final bucket"
    }
  }
}
```
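
The `CheckProcessingResults` paths start with `$.dbtResult.Payload` because the `lambda:invoke` integration wraps the handler's return value in a `Payload` field, and `ResultPath` stores that wrapper under `dbtResult`. A simplified Python sketch of the choice logic, for reasoning about the routing only; the function name `next_state` is hypothetical, and unlike Step Functions (which raises a runtime error when a referenced path is missing) this sketch routes missing fields to `ProcessingFailed`:

```python
from typing import Any, Dict


def next_state(state_input: Dict[str, Any]) -> str:
    """Evaluate the CheckProcessingResults rule against a state input."""
    payload = state_input.get("dbtResult", {}).get("Payload", {})
    # BooleanEquals matches real booleans only, so compare with `is True`
    if payload.get("transformationSuccess") is True and payload.get("testsPass") is True:
        return "ExportCleanData"
    return "ProcessingFailed"


passing = {
    "detail": {"bucket": {"name": "example-bucket"}, "object": {"key": "a.xlsx"}},
    "dbtResult": {"Payload": {"statusCode": 200, "transformationSuccess": True, "testsPass": True}},
}
failing = {
    "dbtResult": {"Payload": {"statusCode": 200, "transformationSuccess": True, "testsPass": False}},
}
print(next_state(passing), next_state(failing))
```

Because the Lambda handler catches its own exceptions and returns a `statusCode` of 500, most dbt failures reach this choice state rather than the task's `Catch` block.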

### Step 6.2: Create the State Machine

```bash
# Create the state machine
aws stepfunctions create-state-machine \
  --name patient-data-pipeline \
  --definition file://infrastructure/step_functions/state_machine.json \
  --role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/PatientDataStepFunctionsRole
```

## Phase 7: EventBridge Rule Setup

### Step 7.1: Create EventBridge Rule

Based on S3 to EventBridge integration patterns[10][11], create the rule:

```bash
# Create EventBridge rule to trigger Step Functions on S3 uploads
aws events put-rule \
  --name patient-data-s3-trigger \
  --event-pattern '{
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
      "bucket": {
        "name": ["patient-data-raw-'${AWS_ACCOUNT_ID}'"]
      },
      "object": {
        "key": [{
          "suffix": ".xlsx"
        }]
      }
    }
  }' \
  --state ENABLED

# Add Step Functions as target
# Note: the role passed as RoleArn must trust events.amazonaws.com and allow states:StartExecution
aws events put-targets \
  --rule patient-data-s3-trigger \
  --targets '[{
    "Id": "1",
    "Arn": "arn:aws:states:us-east-1:'${AWS_ACCOUNT_ID}':stateMachine:patient-data-pipeline",
    "RoleArn": "arn:aws:iam::'${AWS_ACCOUNT_ID}':role/PatientDataStepFunctionsRole"
  }]'
```
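
To see which uploads the rule would pick up, the event pattern can be approximated in Python. This sketch covers only the four fields used in the pattern above, not EventBridge's full matching rules; the function name `matches_rule`, the account ID, and the trimmed sample event are hypothetical:

```python
from typing import Any, Dict

RAW_BUCKET = "patient-data-raw-123456789012"  # hypothetical account ID


def matches_rule(event: Dict[str, Any], bucket: str = RAW_BUCKET) -> bool:
    """Return True when an event satisfies the patient-data-s3-trigger pattern."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.s3"
        and event.get("detail-type") == "Object Created"
        and detail.get("bucket", {}).get("name") == bucket
        # The suffix comparison is case-sensitive, so "REPORT.XLSX" does not match
        and str(detail.get("object", {}).get("key", "")).endswith(".xlsx")
    )


# Trimmed version of an S3 "Object Created" event; these are the same fields
# the state machine reads via $.detail.bucket.name and $.detail.object.key
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": RAW_BUCKET},
        "object": {"key": "test_uploads/sample_patient_data.xlsx"},
    },
}
print(matches_rule(sample_event))
```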

## Phase 8: Testing the Complete Pipeline

### Step 8.1: Prepare Test Environment

First, test your dbt setup locally:

```bash
cd dbt_project

# Test dbt connection
dbt debug --profiles-dir .

# If debug passes, run the staging model (this fails until the AWS resources from Phase 4 exist)
dbt run --profiles-dir . --models stg_patient_data
```

### Step 8.2: Test the Complete Pipeline

1. **Upload test data to trigger the pipeline:**

```bash
# Upload the test Excel file
aws s3 cp test_data/sample_patient_data.xlsx s3://patient-data-raw-${AWS_ACCOUNT_ID}/test_uploads/sample_patient_data.xlsx
```

2. **Monitor the execution:**

```bash
# Check Step Functions executions
aws stepfunctions list-executions \
  --state-machine-arn arn:aws:states:us-east-1:${AWS_ACCOUNT_ID}:stateMachine:patient-data-pipeline \
  --max-items 5
```

3. **Check the results:**

```bash
# List processed files
aws s3 ls s3://patient-data-clean-${AWS_ACCOUNT_ID}/processed/ --recursive

# Check Glue Data Catalog
aws glue get-tables --database-name patient_data_dev
```
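
The monitoring step can also be scripted. A minimal sketch that counts recent executions by status, assuming a boto3 Step Functions client is passed in; the function name `summarize_executions` and the stub client are hypothetical, and the stub exists only so the example runs without AWS credentials:

```python
from collections import Counter
from typing import Any, Dict


def summarize_executions(sfn_client: Any, state_machine_arn: str, limit: int = 5) -> Dict[str, int]:
    """Count the most recent executions by status (RUNNING, SUCCEEDED, FAILED, ...)."""
    response = sfn_client.list_executions(
        stateMachineArn=state_machine_arn,
        maxResults=limit,
    )
    return dict(Counter(e["status"] for e in response["executions"]))


class StubClient:
    """Stand-in for boto3.client("stepfunctions") so the sketch runs offline."""

    def list_executions(self, stateMachineArn: str, maxResults: int) -> Dict[str, Any]:
        executions = [{"status": "SUCCEEDED"}, {"status": "FAILED"}, {"status": "SUCCEEDED"}]
        return {"executions": executions[:maxResults]}


arn = "arn:aws:states:us-east-1:123456789012:stateMachine:patient-data-pipeline"
summary = summarize_executions(StubClient(), arn)
print(summary)
```

Against a real account, pass `boto3.client("stepfunctions")` instead of `StubClient()`.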

## Phase 9: Troubleshooting and Monitoring

### Step 9.1: Common Issues and Solutions

1. **dbt connection issues:**
   - Verify IAM roles have correct permissions (the Lambda role needs `iam:PassRole` on `GlueInteractiveSessionRole` to start a Glue session)
   - Check that Glue database exists
   - Ensure S3 buckets are accessible

2. **Lambda timeout issues:**
   - Confirm the Lambda timeout is set to 900 seconds (15 minutes, the Lambda maximum)
   - Check CloudWatch logs for detailed errors

3. **Step Functions failures:**
   - Review execution history in Step Functions console
   - Check EventBridge rule patterns match your S3 bucket

### Step 9.2: Monitoring Setup

Create CloudWatch dashboards to monitor:
- Lambda function execution times and errors
- Step Functions execution success/failure rates  
- S3 bucket object counts
- Glue interactive session metrics

## Next Steps and Enhancements

Once your basic pipeline is working, consider these enhancements:

1. **Add data validation with Great Expectations integration**
2. **Implement data lineage tracking**
3. **Add automated data quality monitoring**
4. **Set up CI/CD for dbt model deployment**
5. **Add human-in-the-loop approval for data releases**
6. **Implement data versioning and rollback capabilities**

This implementation provides a foundation for a patient data pipeline with automated data quality checks. Before using it with real patient data, replace the broad managed policies (such as `AmazonS3FullAccess`) with more restrictive ones.

[1] https://www.thedataschool.co.uk/curtis-paterson/organising-a-dbt-project-best-practices/
[2] https://hevodata.com/data-transformation/dbt-best-practices/
[3] https://docs.getdbt.com/docs/core/connect-data-platform/glue-setup
[4] https://pypi.org/project/dbt-glue/
[5] https://aws.amazon.com/blogs/big-data/build-your-data-pipeline-in-your-aws-modern-data-platform-using-aws-lake-formation-aws-glue-and-dbt-core/
[6] https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html
[7] https://docs.aws.amazon.com/glue/latest/dg/glue-is-security.html
[8] https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html
[9] https://docs.aws.amazon.com/en_us/step-functions/latest/dg/concepts-amazon-states-language.html
[10] https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-cloudwatch-events-s3.html
[11] https://repost.aws/questions/QUw_gvpQ5tSAez6mg7FdwOjg/s3-trigger-for-step-function-using-event-bridge
[12] https://aws.amazon.com/blogs/big-data/build-and-manage-your-modern-data-stack-using-dbt-and-aws-glue-through-dbt-glue-the-new-trusted-dbt-adapter/
[13] https://dzone.com/articles/orchestrating-dbt-workflows-the-duel-of-apache-air
[14] https://www.mechanicalrock.io/blog/automating-data-workflows-with-aws-step-functions-an-example-using-fivetran-and-dbt
[15] https://www.thedataschool.co.uk/curtis-paterson/organising-a-dbt-project-best-practices
[16] https://www.secoda.co/learn/how-to-set-up-aws-glue-with-dbt-developer-hub
[17] https://dev.to/virajlakshitha/orchestrating-the-cloud-building-robust-workflows-with-aws-step-functions-3574
[18] https://jaehyeon.me/blog/2022-10-09-dbt-on-aws-part-2-glue/
[19] https://www.packtpub.com/en-us/product/data-engineering-with-aws-second-edition-9781804614426/chapter/orchestrating-the-data-pipeline-12/section/hands-on-orchestrating-a-data-pipeline-using-aws-step-functions-ch12lvl1sec85
[20] https://docs.getdbt.com/best-practices/how-we-mesh/mesh-3-structures
[21] https://www.jeeviacademy.com/how-to-use-aws-step-functions-to-orchestrate-workflows/
[22] https://docs.getdbt.com/best-practices/how-we-structure/5-the-rest-of-the-project
[23] https://github.com/AirLiquide/al-dbt-glue
[24] https://docs.aws.amazon.com/step-functions/latest/dg/sample-lambda-orchestration.html
[25] https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview
[26] https://www.youtube.com/watch?v=B7wfWCpfIdQ
[27] https://aws.amazon.com/step-functions/
[28] https://stackoverflow.com/questions/71255224/how-to-run-dbt-in-aws-lambda
[29] https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-resource-stepfunctions-statemachine.html
[30] https://docs.getdbt.com/reference/resource-configs/glue-configs
[31] https://aws.amazon.com/blogs/big-data/create-a-modern-data-platform-using-the-data-build-tool-dbt-in-the-aws-cloud/
[32] https://www.getorchestra.io/guides/dlt-concepts-deployment-on-aws-lambda
[33] https://www.linkedin.com/pulse/dbt-integration-aws-glue-runbook-chaitanya-varma
[34] https://discourse.getdbt.com/t/dbt-cloud-webhook-with-aws-lambda/10776
[35] https://docs.aws.amazon.com/step-functions/latest/dg/concepts-statemachines.html
[36] https://docs.aws.amazon.com/step-functions/latest/dg/input-output-example.html
[37] https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-configuration/
[38] https://www.datacamp.com/tutorial/aws-lambda
[39] https://asecure.cloud/a/p_stepfunctions_step_functions_state_machine_with_definitionsubstitutions/
[40] https://docs.aws.amazon.com/mwaa/latest/userguide/samples-dbt.html
[41] https://docs.aws.amazon.com/step-functions/latest/dg/concepts-input-output-filtering.html
[42] https://github.com/TheDataFoundryAU/dbt_sample_project
[43] https://dev.to/pizofreude/study-notes-421-422-dbt-project-setup-2d4d
[44] https://repost.aws/questions/QUIPd2s9UoTkGIMiVCepHHjg/how-to-trigger-a-step-function-from-a-s3-object-notification
[45] https://docs.getdbt.com/docs/build/projects
[46] https://www.youtube.com/watch?v=vpDOkl4X0Dc
[47] https://joonsolutions.com/build-dbt-project-from-scratch/
[48] https://docs.aws.amazon.com/step-functions/latest/dg/using-user-notifications-sfn.html
[49] https://aws.amazon.com/about-aws/whats-new/2023/09/aws-glue-interactive-sessions-kernel-support-iam-conditionals/
[50] https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
[51] https://tuanchris.com/blog/2021-10-11-anatomy-of-a-dbt-project/
[52] https://cevo.com.au/post/dbt-on-aws-part-2/
[53] https://stackoverflow.com/questions/58221024/trigger-aws-step-function-state-machine-with-s3-event-from-a-different-aws-accou
[54] https://articles.xebia.com/birth-of-dbt-excel-adapter
[55] https://xebia.com/blog/implementing-aws-cdk-cicd-with-cdk-pipelines/
[56] https://github.com/duckdb/dbt-duckdb
[57] https://dev.classmethod.jp/articles/s3-event-eventbridge-rule-step-functions/
[58] https://github.com/jbcodeforce/aws-cdk-project-templates
[59] https://stackoverflow.com/questions/63490730/using-external-parquet-tables-in-a-dbt-pipeline
[60] https://docs.aws.amazon.com/step-functions/latest/dg/eventbridge-integration.html
[61] https://github.com/danmgs/AWS.Pipeline.CloudFormation
[62] https://docs.getdbt.com/reference/resource-configs/duckdb-configs
[63] https://dev.to/zirkelc/eventbridge-rules-to-invoke-lambda-and-stepfunction-584m
[64] https://dev.to/annpastushko/create-serverless-data-pipeline-using-aws-cdk-python-5cg2
[65] https://pypi.org/project/dbt-excel/
[66] https://www.youtube.com/watch?v=f_uNbo2cunk
[67] https://blog.serverlessadvocate.com/serverless-aws-cdk-pipeline-best-practices-patterns-part-1-ab80962f109d
[68] https://stackoverflow.com/questions/64152343/convert-excel-to-parquet-file
[69] https://docs.aws.amazon.com/step-functions/latest/dg/service-quotas.html
[70] https://github.com/bhanotblocker/CDKTemplates
[71] https://github.com/AlexanderVR/dbt-parquet
[72] https://serverlessland.com/patterns/s3-eventbridge-sfn