diff --git a/workshops/modernizr/entity_relationships_diagram.md b/workshops/modernizr/entity_relationships_diagram.md new file mode 100644 index 00000000..56740a0c --- /dev/null +++ b/workshops/modernizr/entity_relationships_diagram.md @@ -0,0 +1,118 @@ +# Entity Relationship Diagram - Modernizr E-commerce Platform + +This diagram shows the entity relationships for the DynamoDB design based on the API access patterns analysis. + +```mermaid +erDiagram + USERS { + string id PK + string username UK "Unique" + string email UK "Unique" + string password_hash + string first_name + string last_name + boolean is_seller + datetime created_at + datetime updated_at + } + + CATEGORIES { + string id PK + string name + string description + string parent_id FK "Self-referencing" + datetime created_at + datetime updated_at + } + + PRODUCTS { + string id PK + string name + string description + decimal price + integer stock_quantity + string category_id FK + string seller_id FK + string image_url + datetime created_at + datetime updated_at + } + + ORDERS { + string id PK + string user_id FK + string status + decimal total_amount + datetime order_date + datetime updated_at + string shipping_address + } + + ORDER_ITEMS { + string order_id PK,FK + string product_id PK,FK + integer quantity + decimal unit_price + decimal total_price + } + + CART_ITEMS { + string user_id PK,FK + string product_id PK,FK + integer quantity + datetime added_at + datetime updated_at + } + + %% User Relationships + USERS ||--o{ PRODUCTS : "sells (as seller)" + USERS ||--o{ ORDERS : "places (as customer)" + USERS ||--o{ CART_ITEMS : "has items in cart" + + %% Category Relationships + CATEGORIES ||--o{ PRODUCTS : "contains" + CATEGORIES ||--o{ CATEGORIES : "parent-child hierarchy" + + %% Product Relationships + PRODUCTS ||--o{ ORDER_ITEMS : "ordered in" + PRODUCTS ||--o{ CART_ITEMS : "added to cart" + + %% Order Relationships + ORDERS ||--o{ ORDER_ITEMS : "contains items" +``` + +## Key Relationship Notes: + +### **One-to-Many Relationships:** +- **Users → Products**: A seller (user) can have many products +- **Users → Orders**: A customer (user) can have many orders +- **Users → Cart Items**: A user can have many items in their cart +- **Categories → Products**: A category can contain many products +- **Products → Order Items**: A product can appear in many order items +- **Products → Cart Items**: A product can be in many users' carts +- **Orders → Order Items**: An order can contain many order items + +### **Self-Referencing Relationship:** +- **Categories → Categories**: Categories form a hierarchical tree structure where each category can have a parent category + +### **Composite Primary Keys:** +- **Order Items**: `(order_id, product_id)` - Links orders to products with quantities +- **Cart Items**: `(user_id, product_id)` - Links users to products in their shopping cart + +### **Unique Constraints:** +- **Users**: `username` and `email` must be unique across all users + +This entity model supports all 48 access patterns identified in the API analysis, including: +- User authentication and profile management +- Product catalog browsing and management +- Category hierarchy navigation +- Order processing and history +- Shopping cart operations +- Seller-specific functionality + +The relationships enable efficient querying for complex operations like: +- Product searches with category filtering +- User order history with item details +- Seller product management +- Cart validation and checkout processing +- Category breadcrumb 
navigation diff --git a/workshops/modernizr/prompts/07-data-migration-execution/design.md b/workshops/modernizr/prompts/07-data-migration-execution/design.md index 4d0ba6b5..866f42dd 100644 --- a/workshops/modernizr/prompts/07-data-migration-execution/design.md +++ b/workshops/modernizr/prompts/07-data-migration-execution/design.md @@ -2,448 +2,763 @@ ## Overview -The Data Migration Execution stage implements a safe, monitored data migration process using MySQL views for data transformation and AWS Glue ETL jobs for data transfer. The design emphasizes data integrity, comprehensive validation, and detailed monitoring throughout the migration process. +The Data Migration Execution stage implements a comprehensive MCP server-driven approach to migrate data from MySQL to DynamoDB. The design emphasizes proper MCP server integration, replacing subprocess calls with native MCP server operations for MySQL, AWS Glue, DynamoDB, and S3 services. ## Architecture -### Migration Execution Architecture +### MCP Server-Driven Migration Architecture ```mermaid graph TD - A[migrationContract.json] --> B[MySQL Views Generation] - B --> C[View Execution on MySQL] - C --> D[AWS Glue Job Creation] - D --> E[ETL Job Execution] - E --> F[Data Validation] - F --> G[Migration Reports] + A[Migration Contract] --> B[MCP Server Orchestrator] + B --> C[MySQL MCP Server] + B --> D[Glue MCP Server] + B --> E[DynamoDB MCP Server] + B --> F[S3 MCP Server] - subgraph "Data Sources" - H[MySQL Database] - I[Generated Views] - end + C --> G[View Generation & Execution] + D --> H[ETL Job Creation & Management] + E --> I[Table Creation & Validation] + F --> J[Script Management & Storage] + + G --> K[Data Transformation] + H --> L[Migration Execution] + I --> M[Schema Validation] + J --> N[Resource Management] + + K --> O[Migration Monitoring] + L --> O + M --> O + N --> O + + O --> P[Validation & Reporting] - subgraph "AWS Infrastructure" - J[DynamoDB Tables] - K[AWS Glue Jobs] - L[CloudWatch Monitoring] + subgraph "MCP Server Integration" + Q[MySQL Operations] + R[Glue Operations] + S[DynamoDB Operations] + T[S3 Operations] end - B --> H - C --> I - D --> K - E --> J - F --> L + C --> Q + D --> R + E --> S + F --> T ``` ## Components and Interfaces -### 1. MySQL Views Generation System -**Purpose**: Transform MySQL data structure to match DynamoDB requirements - -**View Generation Process**: -```python -class MySQLViewGenerator: - def __init__(self, migration_contract: MigrationContract): - self.contract = migration_contract - self.generated_views = [] +### 1. 
MCP Server Orchestrator +**Purpose**: Coordinate operations across multiple MCP servers and manage migration workflow + +**Orchestration Process**: +```typescript +interface MCPServerOrchestrator { + validateMCPServers(): Promise; + executeMigration(contract: MigrationContract): Promise; + monitorProgress(): Promise; + handleErrors(error: MCPError): Promise; +} + +class DataMigrationOrchestrator implements MCPServerOrchestrator { + private mysqlMCP: MySQLMCPClient; + private glueMCP: GlueMCPClient; + private dynamodbMCP: DynamoDBMCPClient; + private s3MCP: S3MCPClient; - def generate_views_for_all_tables(self) -> str: - """Generate MySQL views for each DynamoDB table in the migration contract""" - sql_statements = [] - - for table_config in self.contract: - if table_config['type'] == 'Table': - view_sql = self.generate_view_for_table(table_config) - sql_statements.append(view_sql) - self.generated_views.append({ - 'table_name': table_config['table'], - 'view_name': f"migration_view_{table_config['table']}", - 'source_table': table_config['source_table'] - }) - - return '\n\n'.join(sql_statements) + constructor() { + this.mysqlMCP = new MySQLMCPClient(); + this.glueMCP = new GlueMCPClient(); + this.dynamodbMCP = new DynamoDBMCPClient(); + this.s3MCP = new S3MCPClient(); + } - def generate_view_for_table(self, table_config: dict) -> str: - """Generate SQL view for a specific DynamoDB table""" - view_name = f"migration_view_{table_config['table']}" - source_table = table_config['source_table'] - - # Build SELECT clause with attribute mappings - select_clauses = [] - join_clauses = [] - - for attr_name, attr_config in table_config['attributes'].items(): - if attr_config.get('denormalized', False): - # Handle denormalized attributes with joins - join_info = attr_config['join'] - join_table = attr_config['source_table'] - - select_clauses.append( - f"{join_table}.{attr_config['source_column']} AS {attr_name}" - ) - - join_clauses.append( - f"LEFT JOIN {join_table} ON {source_table}.{join_info['local_column']} = {join_table}.{join_info['source_column']}" - ) - else: - # Handle regular attributes - select_clauses.append( - f"{source_table}.{attr_config['source_column']} AS {attr_name}" - ) + async validateMCPServers(): Promise { + const validations = await Promise.all([ + this.validateMySQLMCP(), + this.validateGlueMCP(), + this.validateDynamoDBMCP(), + this.validateS3MCP() + ]); + + return { + allServersOperational: validations.every(v => v.isOperational), + serverStatuses: validations, + readyForMigration: validations.every(v => v.isOperational && v.hasRequiredPermissions) + }; + } + + async executeMigration(contract: MigrationContract): Promise { + // Phase 1: MySQL View Generation and Execution + const viewResults = await this.executeViewGeneration(contract); + + // Phase 2: DynamoDB Table Creation + const tableResults = await this.createDynamoDBTables(contract); + + // Phase 3: Glue ETL Job Creation and Execution + const migrationResults = await this.executeGlueJobs(contract, viewResults, tableResults); + + // Phase 4: Data Validation and Reporting + const validationResults = await this.validateMigration(contract, migrationResults); - # Construct the complete SQL view - sql = f""" -CREATE OR REPLACE VIEW {view_name} AS -SELECT - {', '.join(select_clauses)} -FROM {source_table} -{' '.join(join_clauses)} -WHERE {source_table}.deleted_at IS NULL -- Exclude soft-deleted records -ORDER BY {source_table}.id; -""" - - return sql.strip() + return { + success: validationResults.isValid, + 
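      // The fields below summarize each phase's output; generateMetrics is assumed to
      // aggregate per-phase record counts and timings (its implementation is not shown in this design).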
viewsCreated: viewResults.createdViews, + tablesCreated: tableResults.createdTables, + jobsExecuted: migrationResults.executedJobs, + validationReport: validationResults, + migrationMetrics: this.generateMetrics(viewResults, tableResults, migrationResults) + }; + } + + private async validateMySQLMCP(): Promise { + try { + const connectionTest = await this.mysqlMCP.testConnection(); + const permissionTest = await this.mysqlMCP.validatePermissions(['CREATE VIEW', 'SELECT', 'SHOW TABLES']); + + return { + serverType: 'MySQL MCP', + isOperational: connectionTest.success, + hasRequiredPermissions: permissionTest.hasAllPermissions, + lastChecked: new Date(), + details: { + connectionInfo: connectionTest.connectionInfo, + availableOperations: permissionTest.availableOperations + } + }; + } catch (error) { + return { + serverType: 'MySQL MCP', + isOperational: false, + hasRequiredPermissions: false, + lastChecked: new Date(), + error: error.message + }; + } + } + + private async validateGlueMCP(): Promise { + try { + const serviceTest = await this.glueMCP.testService(); + const permissionTest = await this.glueMCP.validatePermissions([ + 'glue:CreateJob', + 'glue:StartJobRun', + 'glue:GetJobRun', + 's3:PutObject', + 's3:GetObject' + ]); + + return { + serverType: 'Glue MCP', + isOperational: serviceTest.success, + hasRequiredPermissions: permissionTest.hasAllPermissions, + lastChecked: new Date(), + details: { + availableRegions: serviceTest.availableRegions, + supportedJobTypes: serviceTest.supportedJobTypes + } + }; + } catch (error) { + return { + serverType: 'Glue MCP', + isOperational: false, + hasRequiredPermissions: false, + lastChecked: new Date(), + error: error.message + }; + } + } +} ``` -### 2. AWS Glue ETL Job Management System -**Purpose**: Create and execute ETL jobs for data migration +### 2. 
MySQL MCP Integration System +**Purpose**: Handle all MySQL operations through MCP server calls + +**MySQL Operations**: +```typescript +interface MySQLMCPClient { + discoverConnection(): Promise; + executeQuery(query: string): Promise; + createView(viewDefinition: ViewDefinition): Promise; + validateView(viewName: string): Promise; +} -**Glue Job Configuration**: -```python -class GlueJobManager: - def __init__(self, region: str, migration_contract: MigrationContract): - self.glue_client = boto3.client('glue', region_name=region) - self.region = region - self.contract = migration_contract +class MySQLMCPOperations implements MySQLMCPClient { + private mcpClient: MCPClient; - def create_migration_jobs(self, view_names: List[str]) -> List[str]: - """Create AWS Glue jobs for each migration view""" - job_names = [] - - for i, table_config in enumerate(self.contract): - if table_config['type'] == 'Table': - job_name = f"mysql-to-dynamodb-{table_config['table']}" - view_name = view_names[i] - - job_definition = self.create_job_definition( - job_name, - table_config, - view_name - ) - - self.glue_client.create_job(**job_definition) - job_names.append(job_name) + constructor(mcpEndpoint: string) { + this.mcpClient = new MCPClient(mcpEndpoint); + } + + async discoverConnection(): Promise { + const response = await this.mcpClient.call('mysql_discover_connection', { + discovery_hosts: ['localhost', '127.0.0.1'], + discovery_ports: [3306, 3307, 33060], + discovery_users: ['root', 'mysql', 'admin'], + timeout: 30 + }); + + if (!response.success) { + throw new Error(`MySQL connection discovery failed: ${response.error}`); + } - return job_names + return { + host: response.connection.host, + port: response.connection.port, + user: response.connection.user, + database: response.connection.database, + availableDatabases: response.connection.available_databases + }; + } - def create_job_definition(self, job_name: str, table_config: dict, view_name: str) -> dict: - """Create Glue job definition for a specific table migration""" + async createView(viewDefinition: ViewDefinition): Promise { + const response = await this.mcpClient.call('mysql_execute_ddl', { + query: viewDefinition.sql, + operation_type: 'CREATE_VIEW', + view_name: viewDefinition.name + }); + + if (!response.success) { + throw new Error(`View creation failed for ${viewDefinition.name}: ${response.error}`); + } + return { - 'Name': job_name, - 'Role': 'arn:aws:iam::ACCOUNT:role/GlueServiceRole', # User must provide - 'Command': { - 'Name': 'glueetl', - 'ScriptLocation': f's3://migration-scripts/{job_name}.py', - 'PythonVersion': '3' - }, - 'DefaultArguments': { - '--job-bookmark-option': 'job-bookmark-enable', - '--enable-metrics': 'true', - '--enable-continuous-cloudwatch-log': 'true', - '--mysql-view-name': view_name, - '--dynamodb-table-name': table_config['table'], - '--region': self.region, - '--migration-contract': json.dumps(table_config) - }, - 'MaxRetries': 3, - 'Timeout': 2880, # 48 hours - 'GlueVersion': '3.0', - 'NumberOfWorkers': 10, - 'WorkerType': 'G.1X', - 'Tags': { - 'Project': 'MySQL-DynamoDB-Migration', - 'Table': table_config['table'], - 'Environment': 'Migration' + viewName: viewDefinition.name, + created: true, + rowCount: await this.getViewRowCount(viewDefinition.name), + executionTime: response.execution_time + }; + } + + async generateViewsFromContract(contract: MigrationContract): Promise { + const results: ViewGenerationResult[] = []; + + for (const tableConfig of contract.tables) { + try { + const viewDefinition = 
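        // Build the CREATE OR REPLACE VIEW statement for this contract entry;
        // by the convention used in generateViewDefinition below it is named ddb_<table>_view.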
this.generateViewDefinition(tableConfig); + const creationResult = await this.createView(viewDefinition); + + results.push({ + tableName: tableConfig.table, + viewName: viewDefinition.name, + success: true, + result: creationResult + }); + } catch (error) { + results.push({ + tableName: tableConfig.table, + viewName: `ddb_${tableConfig.table.toLowerCase()}_view`, + success: false, + error: error.message + }); } } + + return results; + } - def execute_migration_jobs(self, job_names: List[str]) -> List[str]: - """Execute all migration jobs and return job run IDs""" - job_run_ids = [] - - for job_name in job_names: - response = self.glue_client.start_job_run( - JobName=job_name, - Arguments={ - '--migration-timestamp': datetime.now().isoformat() - } - ) - job_run_ids.append(response['JobRunId']) + private generateViewDefinition(tableConfig: MigrationContractEntry): ViewDefinition { + const viewName = `ddb_${tableConfig.table.toLowerCase()}_view`; + let sql = `CREATE OR REPLACE VIEW ${viewName} AS SELECT `; + + // Build SELECT clause based on attributes and join patterns + const selectClauses: string[] = []; - return job_run_ids + for (const attribute of tableConfig.attributes) { + if (attribute.join) { + selectClauses.push(this.generateJoinClause(attribute)); + } else { + selectClauses.push(`${tableConfig.source_table}.${attribute.source_column} AS ${attribute.name}`); + } + } + + sql += selectClauses.join(', '); + sql += ` FROM ${tableConfig.source_table}`; + + // Add JOIN clauses for complex patterns + const joinClauses = this.generateJoinClauses(tableConfig); + if (joinClauses.length > 0) { + sql += ' ' + joinClauses.join(' '); + } + + return { + name: viewName, + sql: sql, + sourceTable: tableConfig.source_table, + targetTable: tableConfig.table + }; + } + + private generateJoinClause(attribute: AttributeDefinition): string { + switch (attribute.join.type) { + case 'self-join': + return `COALESCE(${attribute.join.join_alias}.${attribute.join.select_column}, '${attribute.join.null_value}') AS ${attribute.name}`; + + case 'foreign-key': + return `${attribute.join.target_table}.${attribute.join.select_column} AS ${attribute.name}`; + + case 'chain': + if (attribute.join.chain_separator) { + const chainParts = attribute.join.joins.map(join => join.select_column); + return `CONCAT_WS('${attribute.join.chain_separator}', ${chainParts.join(', ')}) AS ${attribute.name}`; + } else { + const lastJoin = attribute.join.joins[attribute.join.joins.length - 1]; + return `${lastJoin.target_table}.${lastJoin.select_column} AS ${attribute.name}`; + } + + case 'conditional': + return `CASE WHEN ${attribute.join.condition} THEN ${attribute.join.target_table}.${attribute.join.select_column} ELSE '${attribute.join.else_value}' END AS ${attribute.name}`; + + case 'json-construction': + return `(SELECT JSON_ARRAYAGG(JSON_OBJECT(${this.buildJsonObjectFields(attribute.join.construction.select_columns)})) FROM ${attribute.join.target_table} WHERE ${attribute.join.join_condition} ${attribute.join.construction.order_by ? 'ORDER BY ' + attribute.join.construction.order_by : ''} ${attribute.join.construction.limit ? 'LIMIT ' + attribute.join.construction.limit : ''}) AS ${attribute.name}`; + + default: + throw new Error(`Unsupported join type: ${attribute.join.type}`); + } + } +} ``` -### 3. 
Data Validation and Integrity System -**Purpose**: Ensure data integrity throughout the migration process - -**Validation Framework**: -```python -class MigrationValidator: - def __init__(self, mysql_connection, dynamodb_client, migration_contract): - self.mysql = mysql_connection - self.dynamodb = dynamodb_client - self.contract = migration_contract +### 3. Glue MCP Integration System +**Purpose**: Handle all AWS Glue operations through MCP server calls + +**Glue Operations**: +```typescript +interface GlueMCPClient { + createJob(jobDefinition: GlueJobDefinition): Promise; + startJobRun(jobName: string, parameters?: Record): Promise; + monitorJobRun(jobName: string, runId: string): Promise; + getJobLogs(jobName: string, runId: string): Promise; +} + +class GlueMCPOperations implements GlueMCPClient { + private mcpClient: MCPClient; + private s3MCP: S3MCPClient; + + constructor(mcpEndpoint: string, s3MCPClient: S3MCPClient) { + this.mcpClient = new MCPClient(mcpEndpoint); + this.s3MCP = s3MCPClient; + } - def validate_migration_completeness(self) -> ValidationReport: - """Validate that all data has been successfully migrated""" - report = ValidationReport() + async createJob(jobDefinition: GlueJobDefinition): Promise { + // First, upload the script to S3 via S3 MCP + const scriptUploadResult = await this.s3MCP.uploadObject({ + bucket: jobDefinition.scriptBucket, + key: jobDefinition.scriptKey, + content: jobDefinition.scriptContent, + contentType: 'text/x-python' + }); - for table_config in self.contract: - if table_config['type'] == 'Table': - table_validation = self.validate_table_migration(table_config) - report.add_table_validation(table_validation) + if (!scriptUploadResult.success) { + throw new Error(`Failed to upload Glue script: ${scriptUploadResult.error}`); + } + + // Create the Glue job via MCP + const response = await this.mcpClient.call('glue_create_job', { + job_name: jobDefinition.jobName, + role_arn: jobDefinition.roleArn, + script_location: `s3://${jobDefinition.scriptBucket}/${jobDefinition.scriptKey}`, + command_name: 'pythonshell', + python_version: '3.9', + max_retries: 0, + timeout: 60, + default_arguments: { + '--additional-python-modules': 'mysql-connector-python,boto3' + } + }); + + if (!response.success) { + throw new Error(`Glue job creation failed: ${response.error}`); + } - return report + return { + jobName: jobDefinition.jobName, + jobArn: response.job_arn, + scriptLocation: `s3://${jobDefinition.scriptBucket}/${jobDefinition.scriptKey}`, + created: true + }; + } - def validate_table_migration(self, table_config: dict) -> TableValidationResult: - """Validate migration for a specific table""" - source_table = table_config['source_table'] - target_table = table_config['table'] - - # Count records in source - mysql_count = self.count_mysql_records(source_table) - - # Count records in DynamoDB - dynamodb_count = self.count_dynamodb_records(target_table) - - # Sample data validation - sample_validation = self.validate_sample_data(table_config) - - return TableValidationResult( - table_name=target_table, - source_count=mysql_count, - target_count=dynamodb_count, - count_match=mysql_count == dynamodb_count, - sample_validation=sample_validation, - data_integrity_score=self.calculate_integrity_score(sample_validation) - ) + async startJobRun(jobName: string, parameters?: Record): Promise { + const response = await this.mcpClient.call('glue_start_job_run', { + job_name: jobName, + arguments: parameters || {} + }); + + if (!response.success) { + throw new 
Error(`Failed to start Glue job ${jobName}: ${response.error}`); + } + + return { + jobName: jobName, + runId: response.job_run_id, + status: 'STARTING', + startedAt: new Date() + }; + } - def validate_sample_data(self, table_config: dict, sample_size: int = 100) -> SampleValidationResult: - """Validate a sample of migrated data for accuracy""" - source_table = table_config['source_table'] - target_table = table_config['table'] + async monitorJobRun(jobName: string, runId: string): Promise { + const response = await this.mcpClient.call('glue_get_job_run', { + job_name: jobName, + run_id: runId + }); - # Get sample records from MySQL - mysql_sample = self.get_mysql_sample(source_table, sample_size) + if (!response.success) { + throw new Error(`Failed to get job run status: ${response.error}`); + } - validation_results = [] + return { + jobName: jobName, + runId: runId, + status: response.job_run.job_run_state, + startedAt: new Date(response.job_run.started_on), + completedAt: response.job_run.completed_on ? new Date(response.job_run.completed_on) : undefined, + errorMessage: response.job_run.error_message, + executionTime: response.job_run.execution_time + }; + } + + async createJobsFromContract( + contract: MigrationContract, + mysqlConnection: MySQLConnectionInfo, + createdViews: string[] + ): Promise { + const results: GlueJobCreationResult[] = []; - for mysql_record in mysql_sample: - # Get corresponding DynamoDB record - pk_value = mysql_record[table_config['pk']] - dynamodb_record = self.get_dynamodb_record(target_table, pk_value) + for (const tableConfig of contract.tables) { + const viewName = `ddb_${tableConfig.table.toLowerCase()}_view`; + + if (!createdViews.includes(viewName)) { + results.push({ + tableName: tableConfig.table, + jobName: `${tableConfig.table.toLowerCase()}_migration_job`, + success: false, + error: `View ${viewName} not found in created views` + }); + continue; + } - if dynamodb_record: - comparison = self.compare_records(mysql_record, dynamodb_record, table_config) - validation_results.append(comparison) - else: - validation_results.append(RecordComparison( - mysql_id=pk_value, - found_in_dynamodb=False, - differences=['Record not found in DynamoDB'] - )) - - return SampleValidationResult( - total_sampled=len(mysql_sample), - successful_matches=len([r for r in validation_results if r.is_match]), - validation_results=validation_results - ) + try { + const jobDefinition = this.generateJobDefinition(tableConfig, mysqlConnection, viewName); + const creationResult = await this.createJob(jobDefinition); + + results.push({ + tableName: tableConfig.table, + jobName: jobDefinition.jobName, + success: true, + result: creationResult + }); + } catch (error) { + results.push({ + tableName: tableConfig.table, + jobName: `${tableConfig.table.toLowerCase()}_migration_job`, + success: false, + error: error.message + }); + } + } + + return results; + } + + private generateJobDefinition( + tableConfig: MigrationContractEntry, + mysqlConnection: MySQLConnectionInfo, + viewName: string + ): GlueJobDefinition { + const jobName = `${tableConfig.table.toLowerCase()}_migration_job`; + const scriptContent = this.generateETLScript(tableConfig, mysqlConnection, viewName); + + return { + jobName: jobName, + roleArn: process.env.GLUE_ROLE_ARN!, + scriptBucket: process.env.S3_SCRIPT_BUCKET!, + scriptKey: `migration-scripts/${jobName}.py`, + scriptContent: scriptContent, + sourceView: viewName, + targetTable: tableConfig.table + }; + } + + private generateETLScript( + tableConfig: 
MigrationContractEntry, + mysqlConnection: MySQLConnectionInfo, + viewName: string + ): string { + return ` +import sys +import boto3 +import mysql.connector +from awsglue.utils import getResolvedOptions +import json + +# Get job arguments +args = getResolvedOptions(sys.argv, ['JOB_NAME']) + +def main(): + # Connect to MySQL + mysql_conn = mysql.connector.connect( + host='${mysqlConnection.host}', + port=${mysqlConnection.port}, + user='${mysqlConnection.user}', + password='', + database='${mysqlConnection.database}' + ) + + # Connect to DynamoDB + dynamodb = boto3.resource('dynamodb', region_name='${process.env.AWS_REGION}') + table = dynamodb.Table('${tableConfig.table}') - def compare_records(self, mysql_record: dict, dynamodb_record: dict, table_config: dict) -> RecordComparison: - """Compare individual records between MySQL and DynamoDB""" - differences = [] + try: + cursor = mysql_conn.cursor(dictionary=True) + cursor.execute('SELECT * FROM ${viewName}') - for attr_name, attr_config in table_config['attributes'].items(): - mysql_value = mysql_record.get(attr_config['source_column']) - dynamodb_value = dynamodb_record.get(attr_name) + batch_size = 25 # DynamoDB batch write limit + batch = [] + total_processed = 0 + + for row in cursor: + # Convert row to DynamoDB item format + item = {} + for key, value in row.items(): + if value is not None: + if isinstance(value, (int, float)): + item[key] = {'N': str(value)} + elif isinstance(value, bool): + item[key] = {'BOOL': value} + else: + item[key] = {'S': str(value)} + + batch.append({'PutRequest': {'Item': item}}) - # Handle type conversions and formatting - if not self.values_match(mysql_value, dynamodb_value, attr_config['type']): - differences.append(f"{attr_name}: MySQL={mysql_value}, DynamoDB={dynamodb_value}") - - return RecordComparison( - mysql_id=mysql_record[table_config['pk']], - found_in_dynamodb=True, - is_match=len(differences) == 0, - differences=differences - ) + # Write batch when full + if len(batch) >= batch_size: + response = table.batch_writer().batch_write_item( + RequestItems={'${tableConfig.table}': batch} + ) + total_processed += len(batch) + batch = [] + print(f'Processed {total_processed} records') + + # Write remaining items + if batch: + table.batch_writer().batch_write_item( + RequestItems={'${tableConfig.table}': batch} + ) + total_processed += len(batch) + + print(f'✅ Migration completed: {total_processed} records migrated to ${tableConfig.table}') + + except Exception as e: + print(f'❌ Migration failed for ${tableConfig.table}: {str(e)}') + raise e + finally: + mysql_conn.close() + +if __name__ == '__main__': + main() +`; + } +} ``` -### 4. Monitoring and Reporting System -**Purpose**: Provide comprehensive monitoring and reporting throughout migration +### 4. 
DynamoDB MCP Integration System +**Purpose**: Handle all DynamoDB operations through MCP server calls -**Monitoring Implementation**: -```python -class MigrationMonitor: - def __init__(self, cloudwatch_client, glue_client): - self.cloudwatch = cloudwatch_client - self.glue = glue_client - self.metrics = [] +**DynamoDB Operations**: +```typescript +interface DynamoDBMCPClient { + createTable(tableDefinition: DynamoDBTableDefinition): Promise; + describeTable(tableName: string): Promise; + validateTableSchema(tableName: string, expectedSchema: TableSchema): Promise; + monitorTableMetrics(tableName: string): Promise; +} + +class DynamoDBMCPOperations implements DynamoDBMCPClient { + private mcpClient: MCPClient; + + constructor(mcpEndpoint: string) { + this.mcpClient = new MCPClient(mcpEndpoint); + } - def monitor_job_progress(self, job_run_ids: List[str]) -> MigrationProgress: - """Monitor progress of all migration jobs""" - progress = MigrationProgress() + async createTable(tableDefinition: DynamoDBTableDefinition): Promise { + const response = await this.mcpClient.call('dynamodb_create_table', { + table_name: tableDefinition.tableName, + key_schema: tableDefinition.keySchema, + attribute_definitions: tableDefinition.attributeDefinitions, + billing_mode: 'PAY_PER_REQUEST', + global_secondary_indexes: tableDefinition.globalSecondaryIndexes, + stream_specification: { + stream_enabled: true, + stream_view_type: 'NEW_AND_OLD_IMAGES' + }, + point_in_time_recovery_specification: { + point_in_time_recovery_enabled: true + }, + sse_specification: { + enabled: true + }, + deletion_protection_enabled: true + }); - for job_run_id in job_run_ids: - job_status = self.get_job_status(job_run_id) - progress.add_job_status(job_status) - - if job_status.state in ['FAILED', 'ERROR']: - error_details = self.get_job_error_details(job_run_id) - progress.add_error(error_details) + if (!response.success) { + if (response.error.includes('ResourceInUseException')) { + return { + tableName: tableDefinition.tableName, + created: false, + alreadyExists: true, + tableArn: await this.getTableArn(tableDefinition.tableName) + }; + } + throw new Error(`Table creation failed: ${response.error}`); + } - return progress + return { + tableName: tableDefinition.tableName, + created: true, + tableArn: response.table_description.table_arn, + tableStatus: response.table_description.table_status + }; + } - def generate_migration_report(self, validation_report: ValidationReport, job_statuses: List[JobStatus]) -> MigrationReport: - """Generate comprehensive migration completion report""" - return MigrationReport( - migration_timestamp=datetime.now(), - total_tables_migrated=len([s for s in job_statuses if s.state == 'SUCCEEDED']), - total_records_migrated=sum(v.target_count for v in validation_report.table_validations), - data_integrity_score=validation_report.overall_integrity_score, - job_execution_summary=job_statuses, - validation_summary=validation_report, - recommendations=self.generate_recommendations(validation_report, job_statuses) - ) + async createTablesFromContract(contract: MigrationContract): Promise { + const results: TableCreationResult[] = []; + + for (const tableConfig of contract.tables) { + try { + const tableDefinition = this.generateTableDefinition(tableConfig); + const creationResult = await this.createTable(tableDefinition); + + results.push(creationResult); + } catch (error) { + results.push({ + tableName: tableConfig.table, + created: false, + error: error.message + }); + } + } + + return results; + } - def 
publish_metrics(self, metric_name: str, value: float, unit: str = 'Count'): - """Publish custom metrics to CloudWatch""" - self.cloudwatch.put_metric_data( - Namespace='MySQL-DynamoDB-Migration', - MetricData=[ - { - 'MetricName': metric_name, - 'Value': value, - 'Unit': unit, - 'Timestamp': datetime.now() + private generateTableDefinition(tableConfig: MigrationContractEntry): DynamoDBTableDefinition { + const keySchema = [ + { AttributeName: tableConfig.pk, KeyType: 'HASH' } + ]; + + const attributeDefinitions = [ + { AttributeName: tableConfig.pk, AttributeType: 'S' } + ]; + + if (tableConfig.sk) { + keySchema.push({ AttributeName: tableConfig.sk, KeyType: 'RANGE' }); + attributeDefinitions.push({ AttributeName: tableConfig.sk, AttributeType: 'S' }); + } + + const globalSecondaryIndexes = tableConfig.gsis?.map(gsi => { + const gsiKeySchema = [{ AttributeName: gsi.pk, KeyType: 'HASH' }]; + + if (!attributeDefinitions.find(attr => attr.AttributeName === gsi.pk)) { + attributeDefinitions.push({ AttributeName: gsi.pk, AttributeType: 'S' }); + } + + if (gsi.sk) { + gsiKeySchema.push({ AttributeName: gsi.sk, KeyType: 'RANGE' }); + if (!attributeDefinitions.find(attr => attr.AttributeName === gsi.sk)) { + attributeDefinitions.push({ AttributeName: gsi.sk, AttributeType: 'S' }); } - ] - ) + } + + return { + IndexName: gsi.index_name, + KeySchema: gsiKeySchema, + Projection: { ProjectionType: 'ALL' } + }; + }) || []; + + return { + tableName: tableConfig.table, + keySchema: keySchema, + attributeDefinitions: attributeDefinitions, + globalSecondaryIndexes: globalSecondaryIndexes + }; + } +} ``` ## Data Models -### Migration Contract Reference -```python -@dataclass -class MigrationContractEntry: - table: str - type: str # 'Table' or 'GSI' - source_table: str - pk: str - sk: Optional[str] - gsis: Optional[List[GSIDefinition]] - attributes: Dict[str, AttributeDefinition] - satisfies: List[str] - estimated_item_size_bytes: int -``` - -### Validation Results -```python -@dataclass -class ValidationReport: - migration_timestamp: datetime - table_validations: List[TableValidationResult] - overall_integrity_score: float - total_records_validated: int - issues_found: List[str] - recommendations: List[str] - -@dataclass -class TableValidationResult: - table_name: str - source_count: int - target_count: int - count_match: bool - sample_validation: SampleValidationResult - data_integrity_score: float +### MCP Server Status +```typescript +interface MCPServerStatus { + allServersOperational: boolean; + serverStatuses: MCPServerValidation[]; + readyForMigration: boolean; +} + +interface MCPServerValidation { + serverType: string; + isOperational: boolean; + hasRequiredPermissions: boolean; + lastChecked: Date; + details?: any; + error?: string; +} ``` -### Migration Progress Tracking -```python -@dataclass -class MigrationProgress: - total_jobs: int - completed_jobs: int - failed_jobs: int - in_progress_jobs: int - job_statuses: List[JobStatus] - errors: List[ErrorDetails] - estimated_completion_time: Optional[datetime] +### Migration Result +```typescript +interface MigrationResult { + success: boolean; + viewsCreated: string[]; + tablesCreated: string[]; + jobsExecuted: string[]; + validationReport: ValidationReport; + migrationMetrics: MigrationMetrics; +} + +interface MigrationMetrics { + totalRecordsMigrated: number; + migrationDuration: number; + averageThroughput: number; + errorRate: number; + tableMetrics: TableMigrationMetrics[]; +} ``` ## Error Handling -### MySQL View Generation Errors -- Invalid 
migration contract format -- Missing source tables or columns -- SQL syntax errors in generated views -- Database connectivity issues - -### AWS Glue Job Errors -- Insufficient IAM permissions -- Resource allocation failures -- Data transformation errors +### MCP Server Communication Errors +- Server unavailability or timeout errors +- Authentication and permission failures - Network connectivity issues +- Invalid request format or parameters -### Data Validation Errors -- Record count mismatches -- Data type conversion failures -- Missing or corrupted records -- Performance issues with large datasets - -### Recovery Procedures -- Automatic retry mechanisms for transient failures -- Partial migration recovery and resumption -- Data rollback procedures for critical failures -- Manual intervention procedures for complex issues - -## Security and Compliance - -### Data Protection -- Encryption in transit for all data transfers -- Secure credential management for database connections -- Audit logging for all migration activities -- Data masking for sensitive information during validation - -### Access Control -- Least privilege access for all migration components -- Separate roles for different migration phases -- Network isolation for migration infrastructure -- Comprehensive access logging and monitoring - -### Compliance Considerations -- Data residency requirements for cross-region migrations -- Audit trail maintenance for compliance reporting -- Data retention policies for migration artifacts -- Privacy protection during data transformation +### Migration-Specific Errors +- MySQL view creation failures +- DynamoDB table creation conflicts +- Glue job execution failures +- Data validation and integrity issues + +### Recovery Mechanisms +- Automatic retry with exponential backoff +- Circuit breaker patterns for persistent failures +- Graceful degradation and partial migration support +- Comprehensive error logging and reporting ## Performance Optimization -### Parallel Processing -- Concurrent execution of multiple Glue jobs -- Parallel data validation processes -- Batch processing for large datasets -- Resource optimization for cost efficiency - -### Memory and Storage Management -- Efficient memory usage in ETL processes -- Temporary storage management for large datasets -- Cleanup procedures for intermediate data -- Resource monitoring and optimization - -### Network Optimization -- VPC endpoints for AWS service communication -- Connection pooling for database connections -- Bandwidth optimization for large data transfers -- Regional placement for optimal performance \ No newline at end of file +### Batch Processing +- Optimize batch sizes for DynamoDB writes +- Parallel processing for independent tables +- Memory-efficient data streaming +- Connection pooling and reuse + +### Monitoring and Alerting +- Real-time progress tracking +- Performance metrics collection +- Capacity utilization monitoring +- Error rate and latency tracking + +This design replaces subprocess calls with proper MCP server integration, providing better error handling, monitoring, and maintainability while leveraging the full capabilities of specialized MCP servers for each service. 
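As a concrete illustration of the "automatic retry with exponential backoff" called out under Recovery Mechanisms above, a minimal wrapper around individual MCP calls might look like the following. The `withRetry` helper and the usage shown are assumptions for illustration, not part of any MCP SDK:

```typescript
// Minimal sketch: retry a single MCP operation with exponential backoff and jitter.
// Assumption: the wrapped operation rejects on failure, as the MCP client wrappers in this design do.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // Backoff schedule: 0.5s, 1s, 2s, ... plus up to 250 ms of jitter to avoid retry storms.
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1) + Math.random() * 250;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Hypothetical usage inside the orchestrator:
// const runStatus = await withRetry(() => this.glueMCP.startJobRun(jobName));
```

A circuit breaker, also listed above, would wrap the same call sites and stop retrying once a given MCP server is known to be unavailable.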
\ No newline at end of file diff --git a/workshops/modernizr/prompts/07-data-migration-execution/requirements.md b/workshops/modernizr/prompts/07-data-migration-execution/requirements.md index 4e8986d8..3014367c 100644 --- a/workshops/modernizr/prompts/07-data-migration-execution/requirements.md +++ b/workshops/modernizr/prompts/07-data-migration-execution/requirements.md @@ -2,42 +2,66 @@ ## Introduction -The Data Migration Execution stage performs the actual data migration from MySQL to DynamoDB using AWS Glue ETL jobs and MySQL views. This stage emphasizes data integrity, comprehensive validation, and safe migration practices with proper monitoring and rollback capabilities. +The Data Migration Execution stage orchestrates the complete migration of data from MySQL to DynamoDB using specialized MCP servers for database operations, AWS Glue ETL jobs, and comprehensive monitoring. This stage emphasizes proper MCP server integration, data integrity, and comprehensive validation throughout the migration process. ## Requirements ### Requirement 1 -**User Story:** As a data engineer, I want to create MySQL views that transform data according to the migration contract, so that I can extract data in the format required for DynamoDB loading. +**User Story:** As a data engineer, I want to generate and execute MySQL views using MCP server integration, so that I can transform relational data according to the migration contract specifications with proper database connectivity and error handling. #### Acceptance Criteria -1. WHEN creating views THEN the system SHALL run generate_mysql_views.py script for each DynamoDB table defined in the migration contract -2. WHEN generating SQL THEN the system SHALL create a single .sql file containing all resulting SQL statements -3. WHEN executing SQL THEN the system SHALL run the generated SQL against the user's MySQL database using MCP server connectivity -4. WHEN handling transformations THEN the system SHALL apply all data transformations specified in the migration contract -5. WHEN managing denormalized data THEN the system SHALL properly handle joins and data denormalization as specified in the contract +1. WHEN generating views THEN the system SHALL use the migration contract as the definitive source for view definitions and transformations +2. WHEN connecting to MySQL THEN the system SHALL use MySQL MCP server for all database operations including connection discovery and query execution +3. WHEN creating views THEN the system SHALL handle all join patterns defined in the migration contract (self-join, foreign-key, multi-column, conditional, chain, lookup-table, json-construction) +4. WHEN executing SQL THEN the system SHALL use MCP server calls for secure and monitored database operations +5. WHEN validating views THEN the system SHALL verify that all views are created successfully and return expected data transformations ### Requirement 2 -**User Story:** As a migration engineer, I want to create and execute AWS Glue ETL jobs that safely migrate data with validation, so that I can ensure data integrity throughout the migration process. +**User Story:** As a cloud engineer, I want to create and execute AWS Glue ETL jobs using Glue MCP server integration, so that I can migrate data from MySQL views to DynamoDB tables with proper monitoring and error handling. #### Acceptance Criteria -1. WHEN creating Glue jobs THEN the system SHALL run create_glue_and_run.py with parameters from the migration contract -2. 
WHEN configuring jobs THEN the system SHALL use view names from the previous step and proper AWS credentials -3. WHEN executing migration THEN the system SHALL perform incremental data migration with conditional writes to prevent data corruption -4. WHEN handling errors THEN the system SHALL implement comprehensive error handling and retry mechanisms -5. WHEN validating data THEN the system SHALL provide data validation and integrity checks throughout the process +1. WHEN creating Glue jobs THEN the system SHALL use Glue MCP server for all AWS Glue operations including job creation, script management, and execution +2. WHEN configuring jobs THEN the system SHALL use migration contract specifications to determine source views, target tables, and transformation logic +3. WHEN managing scripts THEN the system SHALL use S3 MCP server operations for script upload and management +4. WHEN executing jobs THEN the system SHALL provide real-time monitoring and progress tracking through MCP server calls +5. WHEN handling errors THEN the system SHALL implement comprehensive retry mechanisms and error reporting through MCP server integration ### Requirement 3 -**User Story:** As a system administrator, I want comprehensive monitoring and validation of the migration process, so that I can ensure successful completion and troubleshoot any issues. +**User Story:** As a database administrator, I want to manage DynamoDB operations using DynamoDB MCP server integration, so that I can ensure proper table creation, data validation, and performance monitoring throughout the migration process. #### Acceptance Criteria -1. WHEN monitoring migration THEN the system SHALL provide real-time progress tracking and status updates -2. WHEN handling errors THEN the system SHALL log detailed error information and provide troubleshooting guidance -3. WHEN ensuring consistency THEN the system SHALL verify AWS region consistency between infrastructure and Glue jobs -4. WHEN using connectivity THEN the system SHALL use MCP servers for database connectivity where available -5. WHEN completing migration THEN the system SHALL generate comprehensive migration completion and validation reports \ No newline at end of file +1. WHEN creating tables THEN the system SHALL use DynamoDB MCP server for all table creation and configuration operations +2. WHEN validating schema THEN the system SHALL verify that created tables match migration contract specifications including keys, GSIs, and attributes +3. WHEN monitoring migration THEN the system SHALL use DynamoDB MCP server to track write operations, capacity utilization, and error rates +4. WHEN validating data THEN the system SHALL perform comprehensive data integrity checks using MCP server operations +5. WHEN handling capacity THEN the system SHALL monitor and adjust table capacity settings through DynamoDB MCP server calls + +### Requirement 4 + +**User Story:** As a DevOps engineer, I want comprehensive monitoring and validation of the migration process using MCP server integration, so that I can ensure data integrity, track progress, and handle errors effectively. + +#### Acceptance Criteria + +1. WHEN monitoring progress THEN the system SHALL provide real-time status updates for all migration phases using MCP server telemetry +2. WHEN validating data THEN the system SHALL perform comprehensive data integrity checks comparing source and target data through MCP server operations +3. 
WHEN handling errors THEN the system SHALL implement intelligent retry mechanisms and error recovery using MCP server capabilities +4. WHEN reporting results THEN the system SHALL generate detailed migration reports including success rates, error logs, and performance metrics +5. WHEN completing migration THEN the system SHALL provide comprehensive validation reports and operational guidance for ongoing management + +### Requirement 5 + +**User Story:** As a system architect, I want proper MCP server orchestration and coordination, so that I can ensure all migration components work together seamlessly with proper error handling and monitoring. + +#### Acceptance Criteria + +1. WHEN orchestrating migration THEN the system SHALL coordinate between MySQL MCP, Glue MCP, DynamoDB MCP, and S3 MCP servers +2. WHEN managing dependencies THEN the system SHALL ensure proper sequencing of operations across different MCP servers +3. WHEN handling failures THEN the system SHALL implement cross-MCP server error handling and recovery mechanisms +4. WHEN monitoring health THEN the system SHALL validate that all required MCP servers are operational before starting migration +5. WHEN completing operations THEN the system SHALL ensure proper cleanup and resource management across all MCP servers \ No newline at end of file diff --git a/workshops/modernizr/prompts/07-data-migration-execution/tasks.md b/workshops/modernizr/prompts/07-data-migration-execution/tasks.md index 64af6246..b1bf7745 100644 --- a/workshops/modernizr/prompts/07-data-migration-execution/tasks.md +++ b/workshops/modernizr/prompts/07-data-migration-execution/tasks.md @@ -1,10 +1,201 @@ # Data Migration Execution - Tasks -- [ ] 1. Execute data migration script. +- [ ] 1. Generate MySQL views for data transformation - [ ] 1.1 Validate migration contract and infrastructure inputs - - **INPUT**: Use artifacts/stage-02/migrationContract.json from the data modeling stage and deployed infrastructure from infrastructure deployment stage - - Ask the user to provide AWS credentials to add as environment variables + - **INPUT**: Use migrationContract.json from the data modeling stage and deployed infrastructure from infrastructure deployment stage + - Verify that the migration contract file exists and is properly formatted + - Confirm that DynamoDB tables have been successfully deployed and are accessible - Check AWS region consistency between infrastructure deployment and planned Glue jobs - - Execute using python the file `tools/contract_driven_migration_glue_mcp.py` - - Validate that all required fields are present in the migration contract - - _Requirements: 1.1, 3.3_ \ No newline at end of file + - _Requirements: 1.1, 3.3_ + + - [ ] 1.2 Generate MySQL views using the provided script + - **CRITICAL**: Run generate_mysql_views.py script for each DynamoDB table defined in the migration contract + - Use the migration contract as input to the script with proper parameters + - Ensure the script processes all table definitions and handles denormalized attributes correctly + - **CRITICAL**: Create a single .sql file containing all resulting SQL statements + - Validate that the generated SQL is syntactically correct and follows MySQL standards + - _Requirements: 1.1, 1.2_ + + - [ ] 1.3 Execute generated SQL against MySQL database + - **CRITICAL**: Run the generated SQL against the user's MySQL database using MCP server connectivity + - Use MCP server tools for secure database connectivity and SQL execution + - Verify that all views are created 
successfully without errors + - _Requirements: 1.3, 1.4_ + + - [ ] 1.4 Validate view creation and data transformation + - Test each generated view to ensure it returns data in the expected format + - Verify that denormalized data is properly handled with correct joins + - **CRITICAL**: Properly handle joins and data denormalization as specified in the contract + - Check that data types are correctly mapped and transformed + - Validate that the views exclude soft-deleted records and apply appropriate filters + - _Requirements: 1.5_ + +- [ ] 2. Execute contract-driven migration using Data Processing MCP server + - [ ] 2.1 Prepare migration environment and validate MCP server availability + - Verify that AWS credentials are properly configured with sufficient permissions for Glue operations + - Confirm that the target AWS region matches the infrastructure deployment region + - **CRITICAL**: Ensure AWS region consistency between infrastructure and Glue jobs + - Validate that config.json file is properly configured with migration contract path, AWS settings, and MySQL discovery parameters + - Confirm that MCP servers have proper permissions and can execute required operations + - _Requirements: 2.2, 3.3_ + + - [ ] 2.3 Monitor migration execution and handle errors + - **CRITICAL**: Monitor the contract-driven migration script execution for comprehensive error handling + - Track progress through each phase: MySQL view creation, DynamoDB table creation, Glue job execution + - Verify that the script properly handles MySQL connection discovery and database connectivity + - Ensure that DynamoDB tables are created with correct schema including GSIs and key structures + - Monitor Glue job creation, script upload to S3, and job execution via MCP server calls + - **CRITICAL**: Validate that error handling and retry mechanisms are working properly throughout the process + - _Requirements: 2.4, 3.1, 3.2_ + +- [ ] 3. Execute data migration with monitoring and validation + - [ ] 3.1 Execute migration jobs with comprehensive monitoring + - Start all configured Glue ETL jobs and monitor their execution progress + - **CRITICAL**: Provide real-time progress tracking and status updates + - Monitor resource utilization and performance metrics during execution + - **CRITICAL**: Log detailed error information and provide troubleshooting guidance + - Set up alerts for job failures or performance issues + - _Requirements: 3.1, 3.2_ + + - [ ] 3.2 Implement incremental migration with data validation + - **CRITICAL**: Perform incremental data migration with conditional writes to prevent data corruption + - Use DynamoDB conditional writes to prevent overwriting existing data + - Implement data validation checks during the migration process + - Monitor data consistency and integrity throughout the migration + - **CRITICAL**: Provide data validation and integrity checks throughout the process + - _Requirements: 2.3, 2.5_ + + - [ ] 3.3 Monitor migration progress and handle errors + - Track migration progress for each table and provide regular status updates + - Monitor CloudWatch metrics for Glue job performance and DynamoDB table metrics + - Handle migration errors gracefully with appropriate retry and recovery mechanisms + - **CRITICAL**: Use MCP servers for database connectivity where available + - Provide detailed error reporting and troubleshooting guidance for any issues + - _Requirements: 3.4_ + +- [ ] 4. 
Validate migration completeness and data integrity + - [ ] 4.1 Perform comprehensive data validation + - Compare record counts between MySQL source tables and DynamoDB target tables + - Validate a statistical sample of migrated records for data accuracy and completeness + - Check that all required attributes are properly mapped and transformed + - Verify that denormalized data is correctly joined and represented + - Test that all access patterns from the migration contract work with migrated data + - _Requirements: 2.5_ + + - [ ] 4.2 Execute data integrity checks + - Validate that primary key constraints are maintained in DynamoDB + - Check that unique constraints are properly enforced through lookup tables + - Verify that referential integrity is maintained for denormalized data + - Test that all GSIs contain the expected data and support required access patterns + - Perform end-to-end testing of critical application workflows with migrated data + - _Requirements: 2.5_ + + - [ ] 4.3 Generate validation reports and metrics + - Create detailed validation reports showing migration success rates and data integrity scores + - Document any data discrepancies found during validation with recommended remediation + - Generate performance metrics for the migration process including throughput and latency + - Create summary reports for stakeholders showing migration completion status + - Provide recommendations for any post-migration optimizations or corrections needed + - _Requirements: 3.5_ + +- [ ] 5. Generate comprehensive migration reports and documentation + - [ ] 5.1 Create migration completion reports + - **CRITICAL**: Generate comprehensive migration completion and validation reports + - Document total records migrated, migration duration, and performance metrics + - Include data integrity scores and validation results for each table + - Provide detailed error logs and resolution status for any issues encountered + - Create executive summary reports for project stakeholders + - _Requirements: 3.5_ + + - [ ] 5.2 Document post-migration procedures and recommendations + - Create operational runbooks for ongoing DynamoDB table management + - Document monitoring and alerting procedures for production operations + - Provide recommendations for performance optimization and cost management + - Create troubleshooting guides for common post-migration issues + - Document rollback procedures and disaster recovery plans + - _Requirements: 3.5_ + + - [ ] 5.3 Provide cleanup and maintenance guidance + - Document cleanup procedures for temporary migration resources (Glue jobs, S3 buckets, etc.) + - Provide guidance for removing MySQL views and temporary database objects + - Create maintenance schedules for ongoing DynamoDB operations (backups, monitoring, etc.) 
+ - Document capacity planning and scaling procedures for production workloads + - Provide cost optimization recommendations based on actual migration results + - _Requirements: 3.5_ + +## Output Validation Checklist + +Before marking this stage complete, verify: +- [ ] MySQL views are generated and executed via contract-driven migration script using MCP server +- [ ] MySQL connection is discovered automatically via Data Processing MCP server +- [ ] DynamoDB tables are created via MCP server based on migration contract specifications +- [ ] AWS Glue ETL jobs are created and executed using contract_driven_migration_glue_mcp.py +- [ ] AWS region consistency is maintained between infrastructure and Glue jobs +- [ ] Complete migration process is executed via Data Processing MCP server with comprehensive monitoring +- [ ] All MCP server calls (MySQL, DynamoDB, Glue, S3) are executed successfully +- [ ] Data validation confirms successful migration with integrity checks +- [ ] Migration completion reports are generated with detailed metrics and recommendations +- [ ] Post-migration documentation and operational guidance are provided + +## Critical Execution Guidelines + +**Script Execution Requirements**: +- **ALWAYS** use data-migration-tools/contract_driven_migration_glue_mcp.py for complete migration execution +- **VERIFY** that config.json is properly configured with migration contract path and AWS settings +- **ENSURE** that Data Processing MCP server (Glue MCP) is available and accessible +- **VALIDATE** that the script handles all phases: MySQL views, DynamoDB tables, and Glue jobs via MCP + +**Data Integrity Requirements**: +- **ALWAYS** use conditional writes to prevent data corruption during migration +- **ALWAYS** perform comprehensive data validation throughout the process +- **VERIFY** that record counts match between source and target systems +- **VALIDATE** that data transformations are applied correctly according to the contract + +**Monitoring and Error Handling**: +- **ALWAYS** provide real-time progress tracking and status updates +- **ALWAYS** log detailed error information for troubleshooting +- **IMPLEMENT** comprehensive retry mechanisms for transient failures +- **ENSURE** that all errors are properly handled and documented + +**Regional Consistency**: +- **VERIFY** that AWS region is consistent between infrastructure deployment and Glue jobs +- **ENSURE** that all AWS resources are created in the same region +- **VALIDATE** that cross-region dependencies are properly handled if they exist + +## Troubleshooting Guide + +**Contract-Driven Migration Script Issues**: +- Verify that config.json exists and contains proper migration contract path and AWS configuration +- Check that Data Processing MCP server (Glue MCP) is running and accessible +- Ensure that the migration contract file is properly formatted and accessible +- Validate that MySQL connection discovery parameters are correctly configured in config.json + +**MCP Server Connectivity Issues**: +- Verify that MySQL MCP server can discover and connect to the database automatically +- Check that AWS credentials have sufficient permissions for DynamoDB, Glue, and S3 operations +- Ensure that all MCP server calls are executing successfully (MySQL, DynamoDB, Glue, S3) +- Validate that the script properly handles MCP server responses and error conditions + +**AWS Glue Job Issues via MCP**: +- Monitor Glue job creation and execution through MCP server calls +- Check that Glue scripts are properly uploaded to S3 via MCP 
server +- Verify that Glue jobs are created with correct IAM roles and configurations +- Ensure that Glue job execution completes successfully with proper error handling + +**Data Migration Issues**: +- Monitor CloudWatch logs for detailed error information from Glue jobs +- Check DynamoDB table metrics for throttling or capacity issues +- Verify that conditional writes are working properly to prevent data corruption +- Ensure that data transformations are being applied correctly + +**Validation Issues**: +- Compare record counts between source and target systems to identify discrepancies +- Check data type mappings and transformations for accuracy +- Verify that denormalized data is properly joined and represented +- Test access patterns with migrated data to ensure functionality + +**Performance Issues**: +- Monitor Glue job resource utilization and adjust worker configuration if needed +- Check DynamoDB table capacity and scaling settings +- Optimize data transformation logic for better performance +- Consider parallel processing for large datasets to improve throughput \ No newline at end of file