# Common Interview Questions and Answers

## 1. Explain the difference between Amazon SQS and Amazon SNS. In what scenarios would you use SNS → SQS → Lambda instead of using Lambda directly?

**Difference between SQS and SNS**:

**Amazon SQS (Simple Queue Service)**:
- Message queue service
- Pull-based: Consumers poll the queue for messages
- One-to-one communication (one message to one consumer, though multiple consumers can compete for messages)
- Messages persist until explicitly deleted
- Supports delayed processing and message ordering (FIFO)
- Use case: Work queues, task distribution, buffering

**Amazon SNS (Simple Notification Service)**:
- Pub/sub messaging service
- Push-based: SNS pushes messages to subscribers
- One-to-many communication (fan-out pattern)
- No durable storage for consumers: messages are pushed to subscribers at publish time (with delivery retries) rather than held for later retrieval
- Use case: Broadcasting notifications, multi-subscriber scenarios

**When to use SNS → SQS → Lambda instead of Lambda directly**:

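1. **Buffering and rate limiting**: SQS acts as a buffer between SNS and Lambda, preventing Lambda from being overwhelmed by sudden traffic spikes. The Lambda service polls the queue on your behalf, and you can cap throughput with the batch size and maximum concurrency settings on the event source mapping.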

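2. **Retry logic and fault tolerance**: If Lambda fails to process a message, the message becomes visible in SQS again after the visibility timeout and is retried; a redrive policy can move it to a dead-letter queue after a set number of receives. With direct invocation from SNS, Lambda is invoked asynchronously and retries a failed event twice by default. After that the event is discarded unless a DLQ or on-failure destination is configured.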

3. **Message persistence**: SQS stores messages durably. If Lambda is temporarily unavailable, messages wait in the queue. Direct Lambda invocation doesn't persist messages.

4. **Decoupling and flexibility**: SQS decouples the notification system from processing logic. You can change Lambda functions, add multiple consumers, or temporarily pause processing without affecting message ingestion.

5. **Batch processing**: Lambda can process multiple SQS messages in a single invocation, improving efficiency. Direct SNS invocation processes one message per invocation.

6. **Fan-out with different processing speeds**: If you have multiple Lambda functions with different processing times, SQS queues allow each Lambda to consume at its own pace.

**Example scenario**:
```text
File uploaded to S3
    → SNS (broadcasts to multiple subscribers)
        → SQS Queue 1 → Lambda 1 (data validation)
        → SQS Queue 2 → Lambda 2 (thumbnail generation)
        → SQS Queue 3 → Lambda 3 (metadata extraction)

Each Lambda processes independently, with its own retry logic and pace.
```
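A consumer at the end of one of these queues could be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes SNS raw message delivery is disabled (so each SQS body is an SNS envelope whose `Message` field holds the S3 event as a JSON string) and that `ReportBatchItemFailures` is enabled on the event source mapping, so only the failed messages are retried. `process_s3_object` is a hypothetical placeholder for the real work.

```python
import json
from urllib.parse import unquote_plus


def process_s3_object(bucket, key):
    """Hypothetical placeholder for the real work (validation, thumbnails, ...)."""
    print(f"Processing s3://{bucket}/{key}")


def lambda_handler(event, context):
    """Process a batch of SQS messages that wrap SNS-delivered S3 events."""
    failures = []
    for message in event.get("Records", []):
        message_id = message["messageId"]
        try:
            # SQS body -> SNS envelope -> original S3 event
            sns_envelope = json.loads(message["body"])
            s3_event = json.loads(sns_envelope["Message"])
            for record in s3_event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                # Object keys are URL-encoded in S3 notifications
                key = unquote_plus(record["s3"]["object"]["key"])
                process_s3_object(bucket, key)
        except Exception as exc:
            # Report only this message as failed; the rest of the batch is deleted
            print(f"Failed message {message_id}: {exc}")
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

Returning the failed message IDs instead of raising keeps one bad message from forcing the whole batch back onto the queue.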



## 2. Explain how an S3 event can trigger downstream processing

**S3 Event-Driven Processing Flow**:

1. **Event configuration**: Configure S3 bucket to send notifications when specific events occur (ObjectCreated, ObjectRemoved, etc.).

2. **Event types**:
   - ObjectCreated (PUT, POST, COPY, CompleteMultipartUpload)
   - ObjectRemoved (Delete, DeleteMarkerCreated)
   - ObjectRestore (Glacier restore)
   - Replication events

3. **Event destinations**:
   - Lambda function (direct invocation)
   - SQS queue (message sent to queue)
   - SNS topic (notification published)
   - EventBridge (advanced routing)

4. **Event payload**: S3 sends a JSON payload containing:
   - Bucket name
   - Object key (file path)
   - Event type
   - Request parameters
   - User identity
   - Source IP address

5. **Processing flow example**:

```text
User uploads "data.csv" to S3 bucket
    ↓
S3 generates ObjectCreated:Put event
    ↓
S3 sends event to Lambda function
    ↓
Lambda receives event payload:
{
  "Records": [{
    "s3": {
      "bucket": {"name": "my-bucket"},
      "object": {"key": "data.csv", "size": 1024}
    }
  }]
}
    ↓
Lambda extracts bucket and key
    ↓
Lambda uses boto3 to read file from S3
    ↓
Lambda processes data (validate, transform, etc.)
    ↓
Lambda writes results to another S3 bucket or database
    ↓
Lambda logs execution details to CloudWatch
```
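The "extract bucket and key" step in the flow above can be sketched as a small helper. It is an illustrative sketch, and the name `extract_s3_objects` is an assumption rather than an AWS API. It iterates over every record, since one event can carry several, and URL-decodes the key, since S3 encodes keys in notifications.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return a list of (bucket, key, size) tuples, one per record in the event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # S3 URL-encodes keys in notifications (a space arrives as "+")
        key = unquote_plus(s3_info["object"]["key"])
        # size may be missing (for example in hand-written test events)
        size = s3_info["object"].get("size")
        objects.append((bucket, key, size))
    return objects


sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {"key": "reports/q1+summary.csv", "size": 1024},
        }
    }]
}
print(extract_s3_objects(sample_event))
# -> [('my-bucket', 'reports/q1 summary.csv', 1024)]
```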

**Advantages**:
- Near real-time: Processing starts within seconds of file upload
- Fully automated: No manual intervention required
- Scalable: Lambda scales automatically with number of files
- Cost-effective: Pay only for processing time
- Event-driven: No polling or scheduled checks needed

**Common use cases**:
- ETL pipelines triggered by data file uploads
- Image/video processing upon media upload
- Log file analysis
- Data validation and quality checks
- Backup and archival workflows



## 3. Why is Lambda not suitable for heavy data processing jobs like Spark or Hadoop?

**Lambda limitations for heavy data processing**:

1. **Maximum execution time: 15 minutes**
   - Spark/Hadoop jobs often run for hours or even days
   - Lambda terminates after 900 seconds (15 minutes)
   - Big data processing requires long-running computations

2. **Limited memory: Maximum 10 GB**
   - Spark processes data in-memory for speed
   - Large datasets require tens or hundreds of GB of RAM
   - Lambda's 10 GB limit is insufficient for big data workloads

3. **Limited CPU: Scales with memory, max ~6 vCPUs**
   - Hadoop/Spark leverage clusters with hundreds of CPUs
   - Parallel processing across many nodes is essential
   - Lambda's CPU is limited and doesn't support distributed computing

4. **Limited local storage: 512 MB to 10 GB /tmp**
   - Big data processing requires substantial temporary storage
   - Spark spills data to disk when memory is full
   - Lambda's ephemeral storage is too small for large intermediate results

5. **No distributed computing framework**
   - Spark/Hadoop use distributed file systems (HDFS)
   - Built-in resource management (YARN)
   - Data locality and partition management
   - Lambda functions are isolated and don't share state or coordinate

6. **Stateless execution**
   - Each Lambda invocation is independent
   - No shared state between invocations
   - Spark maintains execution context across transformations

7. **Cost inefficiency for long jobs**
   - Lambda charges per GB-second of execution, plus a per-request fee
   - Long-running jobs become expensive
   - Dedicated compute (EC2, EMR) is more cost-effective for sustained workloads

**What to use instead**:

- **AWS EMR (Elastic MapReduce)**: Managed Hadoop/Spark clusters
- **AWS Glue**: Serverless ETL with Spark engine
- **Amazon Athena**: SQL queries on S3 data
- **AWS Batch**: Run batch computing workloads
- **EC2 with Spark**: Self-managed Spark clusters

**Lambda's sweet spot**:
- Short-lived tasks (< 15 minutes)
- Lightweight data processing (< 10 GB memory)
- Event-driven workflows
- API backends
- File format conversions
- Small-scale data transformations
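
As a rough way to apply the limits above, the check below compares a job's estimated needs against the numbers quoted in this section. It is a sketch under stated assumptions: `JobEstimate` and `lambda_blockers` are hypothetical names, and the estimates themselves would come from your own profiling.

```python
from dataclasses import dataclass

# Limits as stated in this section
MAX_RUNTIME_SECONDS = 900   # 15 minutes
MAX_MEMORY_GB = 10
MAX_TMP_STORAGE_GB = 10


@dataclass
class JobEstimate:
    runtime_seconds: float
    peak_memory_gb: float
    scratch_disk_gb: float


def lambda_blockers(job):
    """Return the Lambda limits this job would exceed (empty list if it fits)."""
    blockers = []
    if job.runtime_seconds > MAX_RUNTIME_SECONDS:
        blockers.append("runtime")
    if job.peak_memory_gb > MAX_MEMORY_GB:
        blockers.append("memory")
    if job.scratch_disk_gb > MAX_TMP_STORAGE_GB:
        blockers.append("ephemeral storage")
    return blockers


csv_conversion = JobEstimate(runtime_seconds=120, peak_memory_gb=1, scratch_disk_gb=0.5)
spark_aggregation = JobEstimate(runtime_seconds=7200, peak_memory_gb=64, scratch_disk_gb=200)

print(lambda_blockers(csv_conversion))      # fits within every limit
print(lambda_blockers(spark_aggregation))   # exceeds all three
```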



## 4. What does "serverless" mean in AWS Lambda?

**Serverless definition**:

Serverless computing is a cloud execution model where the cloud provider (AWS) dynamically manages the allocation and provisioning of servers. Developers focus solely on code without managing infrastructure.

**Key aspects of serverless in AWS Lambda**:

1. **No server management**:
   - No EC2 instances to launch or configure
   - No operating system updates or patches
   - No capacity planning or scaling decisions
   - AWS handles all infrastructure automatically

2. **Automatic scaling**:
   - Lambda automatically scales from zero to thousands of concurrent executions, subject to the account's concurrency quota
   - No need to configure auto-scaling rules
   - Scales up during traffic spikes, scales down to zero when idle

3. **Pay-per-use pricing**:
   - Charged only for actual execution time (billed per millisecond)
   - No charges when code isn't running
   - No idle time costs (unlike EC2 instances running 24/7)

4. **Event-driven execution**:
   - Code runs in response to events (S3 uploads, API calls, schedule)
   - No long-running processes or daemon services
   - Ephemeral: Each execution is independent and short-lived

5. **Built-in high availability**:
   - AWS runs Lambda across multiple availability zones
   - Automatic failover and redundancy
   - No need to build infrastructure-level failover yourself (application-level retries and idempotency are still your responsibility)

6. **No infrastructure to maintain**:
   - No SSH access to servers
   - No system administration tasks
   - Focus purely on application logic

**What serverless does NOT mean**:

- Servers still exist (AWS manages them behind the scenes)
- Not always cheaper (depends on usage patterns)
- Not suitable for all workloads (long-running, stateful apps)
- Still has limitations (execution time, memory, concurrent executions)
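
The "not always cheaper" point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, the rates are placeholders rather than actual AWS prices, the flat-rate server cost is invented for comparison, and free-tier allowances are ignored.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate a monthly Lambda bill from usage and caller-supplied rates."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = (invocations / 1_000_000) * price_per_million_requests
    return compute_cost + request_cost


# Placeholder rates for illustration only; look up current pricing for your region
RATE_PER_GB_SECOND = 0.00002
RATE_PER_MILLION_REQUESTS = 0.20
FLAT_SERVER_MONTHLY = 30.00  # hypothetical always-on instance

for monthly_invocations in (100_000, 1_000_000, 10_000_000):
    cost = lambda_monthly_cost(monthly_invocations, 500, 1024,
                               RATE_PER_GB_SECOND, RATE_PER_MILLION_REQUESTS)
    cheaper = "Lambda" if cost < FLAT_SERVER_MONTHLY else "flat-rate server"
    print(f"{monthly_invocations:>10,} invocations: ${cost:,.2f} -> {cheaper}")
```

With these placeholder numbers, the pay-per-use model wins at low volume and loses once traffic is sustained, which is the usage-pattern dependence described above.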

**Benefits of serverless**:
- Faster time to market (no infrastructure setup)
- Lower operational overhead
- Automatic scaling and high availability
- Cost efficiency for sporadic workloads

**Trade-offs**:
- Cold start latency (first invocation delay)
- Limited execution time (15 minutes max)
- Vendor lock-in
- Debugging and monitoring challenges



# Python Lambda Function for S3 Event Processing

## Requirements

Write a Python AWS Lambda function that:
- Is triggered by an S3 ObjectCreated event
- Extracts bucket and object key from the event
- Reads the file content from S3 using boto3
- Prints the file size and first 100 characters to CloudWatch
- Returns HTTP status code 200

## Complete Implementation

```python
import json
from urllib.parse import unquote_plus

import boto3

# Initialize the S3 client once, outside the handler, so warm invocations reuse it
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    """
    Lambda function triggered by S3 ObjectCreated event

    Args:
        event: S3 event payload containing bucket and object information
        context: Lambda runtime information

    Returns:
        Response with status code 200
    """

    try:
        # Extract bucket name and object key from event
        # S3 event structure: event['Records'][0]['s3']['bucket']['name']
        record = event['Records'][0]
        bucket_name = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in S3 notifications (e.g. a space becomes "+")
        object_key = unquote_plus(record['s3']['object']['key'])
        
        print(f"Processing file: s3://{bucket_name}/{object_key}")
        
        # Read file content from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=object_key)
        file_content = response['Body'].read()
        
        # Get file size
        file_size = len(file_content)
        print(f"File size: {file_size} bytes")
        
        # Decode content and get first 100 characters
        # Handle both text and binary files
        try:
            content_text = file_content.decode('utf-8')
            first_100_chars = content_text[:100]
            print(f"First 100 characters: {first_100_chars}")
        except UnicodeDecodeError:
            # Binary file
            print("Binary file detected - cannot display text preview")
            first_100_chars = str(file_content[:100])
            print(f"First 100 bytes (hex): {file_content[:100].hex()}")
        
        # Return success response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'File processed successfully',
                'bucket': bucket_name,
                'key': object_key,
                'size': file_size,
                'preview': first_100_chars
            })
        }
        
    except Exception as e:
        print(f"Error processing file: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps({
                'message': 'Error processing file',
                'error': str(e)
            })
        }
```

## Lambda IAM Role Permissions

The Lambda execution role must have these permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

## S3 Event Configuration

Configure S3 bucket to trigger Lambda:

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "uploads/"
            },
            {
              "Name": "suffix",
              "Value": ".csv"
            }
          ]
        }
      }
    }
  ]
}
```
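
To reason about which uploads this filter would let through, the prefix and suffix rules can be approximated locally. S3 evaluates the rules itself on the service side; the helper below is only a hypothetical sketch (the name `key_matches` is an assumption) for checking expectations before deploying a configuration.

```python
def key_matches(filter_rules, key):
    """Return True if the object key satisfies every prefix/suffix rule."""
    for rule in filter_rules:
        name, value = rule["Name"].lower(), rule["Value"]
        if name == "prefix" and not key.startswith(value):
            return False
        if name == "suffix" and not key.endswith(value):
            return False
    return True


# Same rules as in the notification configuration above
rules = [
    {"Name": "prefix", "Value": "uploads/"},
    {"Name": "suffix", "Value": ".csv"},
]

for candidate in ("uploads/data.csv", "uploads/data.json", "archive/data.csv"):
    print(candidate, key_matches(rules, candidate))
```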

## Sample S3 Event Payload

```json
{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-1",
      "eventTime": "2024-01-01T12:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "my-bucket",
          "arn": "arn:aws:s3:::my-bucket"
        },
        "object": {
          "key": "uploads/data.csv",
          "size": 1024,
          "eTag": "d41d8cd98f00b204e9800998ecf8427e"
        }
      }
    }
  ]
}
```

## Expected CloudWatch Logs Output

```text
START RequestId: abc123 Version: $LATEST
Processing file: s3://my-bucket/uploads/data.csv
File size: 1024 bytes
First 100 characters: id,name,email,age
1,John Doe,john@example.com,30
2,Jane Smith,jane@example.com,25
3,Bob J
END RequestId: abc123
REPORT RequestId: abc123  Duration: 234.56 ms  Billed Duration: 235 ms  Memory Size: 128 MB  Max Memory Used: 45 MB
```

## Testing the Function

Test event for Lambda console:

```json
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "my-test-bucket"
        },
        "object": {
          "key": "test-file.txt"
        }
      }
    }
  ]
}
```

## Enhanced Version with Error Handling

```python
import json
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    """
    Enhanced Lambda function with comprehensive error handling
    """

    try:
        # Validate event structure
        if not event.get('Records'):
            raise ValueError("Invalid S3 event: No records found")

        record = event['Records'][0]

        # Extract S3 information
        bucket_name = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in S3 notifications (e.g. a space becomes "+")
        object_key = unquote_plus(record['s3']['object']['key'])
        object_size = record['s3']['object'].get('size', 0)

        # eventName is absent from minimal console test events, so don't require it
        print(f"Event: {record.get('eventName', 'unknown')}")
        print(f"Bucket: {bucket_name}")
        print(f"Object: {object_key}")
        print(f"Size: {object_size} bytes")
        
        # Skip processing for very large files
        MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB
        if object_size > MAX_FILE_SIZE:
            print(f"File too large ({object_size} bytes). Skipping preview.")
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'message': 'File too large for preview',
                    'size': object_size
                })
            }
        
        # Read file from S3
        try:
            response = s3_client.get_object(Bucket=bucket_name, Key=object_key)
            file_content = response['Body'].read()
            actual_size = len(file_content)
            
            print(f"Successfully read file. Size: {actual_size} bytes")
            
        except ClientError as e:
            error_code = e.response['Error']['Code']
            if error_code == 'NoSuchKey':
                print(f"Object not found: {object_key}")
            elif error_code == 'AccessDenied':
                print(f"Access denied to object: {object_key}")
            else:
                print(f"S3 error: {error_code}")
            raise
        
        # Process content
        try:
            content_text = file_content.decode('utf-8')
            first_100 = content_text[:100]
            print(f"First 100 characters:\n{first_100}")
            
        except UnicodeDecodeError:
            print("Binary file detected")
            first_100 = file_content[:100].hex()
            print(f"First 100 bytes (hex): {first_100}")
        
        # Success response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'Successfully processed file',
                'bucket': bucket_name,
                'key': object_key,
                'size': actual_size,
                'preview_length': len(first_100)
            })
        }
        
    except KeyError as e:
        print(f"Missing required field in event: {str(e)}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': f'Invalid event structure: {str(e)}'})
        }
        
    except Exception as e:
        print(f"Unexpected error: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
```

## Key Points to Remember

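1. **Event structure**: S3 records arrive under `event['Records']`. Reading `event['Records'][0]` is fine for a demo, but one event can carry several records, so iterate over the list in production code. Object keys in the payload are URL-encoded.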
2. **IAM permissions**: Lambda needs `s3:GetObject` permission
3. **Error handling**: Handle both S3 errors and content decoding errors
4. **CloudWatch logging**: Use `print()` statements for CloudWatch Logs
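5. **Response format**: Return a structured response with statusCode and body. S3 invokes Lambda asynchronously and ignores the return value, so re-raise the exception if you want Lambda's built-in retries to run.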
6. **Memory efficiency**: For large files, consider streaming or processing in chunks
7. **Timeouts**: Set an appropriate timeout; the 3-second default may be too short for reading larger objects
8. **Binary files**: Handle binary content gracefully (images, PDFs, etc.)
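
For the memory-efficiency point, one option is to request only the bytes needed for the preview instead of reading the whole object. The sketch below uses the `Range` parameter of `get_object` and takes the client as an argument so it can be tried locally. `preview_object` and `FakeS3Client` are hypothetical names, and the stand-in client imitates only the two response fields used here. Note that 100 bytes is not always 100 characters, and a ranged request against an empty object is typically rejected by S3, so that case needs its own handling.

```python
import io


def preview_object(s3_client, bucket, key, max_bytes=100):
    """Fetch only the first max_bytes of an object and report its total size."""
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes=0-{max_bytes - 1}",  # HTTP byte ranges are inclusive
    )
    head = response["Body"].read()
    # For ranged reads, ContentRange looks like "bytes 0-99/1024";
    # the number after the slash is the full object size
    total_size = int(response["ContentRange"].split("/")[-1])
    # A byte cut can split a multi-byte UTF-8 character, so replace bad bytes
    return total_size, head.decode("utf-8", errors="replace")


class FakeS3Client:
    """Hypothetical in-memory stand-in for local experiments (not part of boto3)."""

    def __init__(self, data):
        self.data = data

    def get_object(self, Bucket, Key, Range):
        start, end = (int(n) for n in Range[len("bytes="):].split("-"))
        chunk = self.data[start:end + 1]
        return {
            "Body": io.BytesIO(chunk),
            "ContentRange": f"bytes {start}-{start + len(chunk) - 1}/{len(self.data)}",
        }


size, preview = preview_object(FakeS3Client(b"id,name\n" * 128), "my-bucket", "data.csv")
print(size, repr(preview[:16]))
```

In a real function, `boto3.client('s3')` would be passed in place of the stand-in client.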