This project is a fully serverless, event-driven application that automatically analyzes images uploaded to an S3 bucket and stores a list of detected labels in a DynamoDB table. It demonstrates a modern, scalable, and cost-effective architecture for processing data and integrating AI services.
The architecture is entirely event-driven. The workflow starts when a user uploads a file, and every subsequent step runs automatically, with no servers to provision or manage.
graph TD;
A[User] -->|1. Uploads image file| B(Amazon S3 Bucket);
B -->|2. Triggers Event| C{AWS Lambda Function};
C -->|3. Sends image for analysis| D(Amazon Rekognition);
D -->|4. Returns detected labels| C;
C -->|5. Writes labels to database| E[(Amazon DynamoDB Table)];
- Amazon S3: Chosen as a scalable and durable object store for the input images. Its key feature, S3 Event Notifications, serves as the trigger that initiates the entire workflow automatically, which is the cornerstone of an event-driven design.
- AWS Lambda: The core of the application, used for serverless compute. It was chosen because it removes server management, reduces cost (you pay only for execution time), and integrates directly with other AWS services. It contains the business logic that connects S3, Rekognition, and DynamoDB.
- Amazon Rekognition: A managed, pre-trained AI service for image analysis. It was chosen to provide powerful machine learning capabilities without the complexity of training and deploying a custom model. This allows for rapid development of intelligent applications.
- Amazon DynamoDB: A fully managed NoSQL database used to store the results. It was chosen for its high performance, scalability, and low-latency access, making it ideal for a serverless application that needs to read and write data quickly.
- AWS IAM: Used to create a secure execution role for the Lambda function. This enforces the Principle of Least Privilege, ensuring the function has only the specific permissions it needs to read from S3, call Rekognition, and write to DynamoDB, and nothing more.
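As an illustration of the least-privilege idea described for the execution role, the sketch below builds a policy document that allows only the three calls the function makes. It is not the policy used in this project. The function name `build_least_privilege_policy`, the bucket name, and the table ARN are hypothetical, and the sketch assumes logging permissions for CloudWatch are granted separately.

```python
import json


def build_least_privilege_policy(bucket_name: str, table_arn: str) -> dict:
    """Return an IAM policy document limited to the calls the function makes.

    bucket_name and table_arn are placeholders supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to uploaded images. When Rekognition is given an
                # S3 object reference, the caller needs permission to read it.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # DetectLabels acts on the image passed in the request,
                # not on a named Rekognition resource.
                "Effect": "Allow",
                "Action": "rekognition:DetectLabels",
                "Resource": "*",
            },
            {
                # Write access to the single results table only.
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": table_arn,
            },
        ],
    }


def policy_as_json(bucket_name: str, table_arn: str) -> str:
    """Serialize the policy so it can be pasted into the IAM console or a template."""
    return json.dumps(build_least_privilege_policy(bucket_name, table_arn), indent=2)
```

Calling `policy_as_json("example-image-bucket", "arn:aws:dynamodb:eu-west-1:123456789012:table/ExampleImageLabels")` yields a JSON document that can be attached to the role as an inline policy. Both arguments are made-up values.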
- A user uploads an image file (e.g., dog.jpg) to the designated S3 bucket.
- The S3 bucket automatically sends an "object created" event notification to the Lambda function. The event includes the bucket name and the object key (the name of the uploaded file).
- The event invokes the Lambda function, which reads the bucket name and object key from the event payload.
- The function makes an API call to Amazon Rekognition, telling it to analyze the specified image directly from the S3 bucket.
- Rekognition analyzes the image and returns a list of labels with confidence scores (e.g., "Dog", "Golden Retriever", "Pet").
- The Lambda function processes this list and writes a new item to the DynamoDB table, containing the image's filename and the list of detected labels.
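The steps above can be sketched as a small handler. This is a minimal sketch under stated assumptions, not the project's actual code: the table name `ImageLabels`, the attribute names `ImageName` and `Labels`, the `TABLE_NAME` environment variable, and the `MaxLabels`/`MinConfidence` values are all hypothetical. The optional `rekognition` and `table` parameters exist only so the logic can be exercised with stand-in objects outside AWS.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical table name; overridable through an environment variable.
TABLE_NAME = os.environ.get("TABLE_NAME", "ImageLabels")


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def build_item(key, labels):
    """Shape the DynamoDB item: the filename plus the detected label names."""
    return {"ImageName": key, "Labels": [label["Name"] for label in labels]}


def lambda_handler(event, context, rekognition=None, table=None):
    # Create real AWS clients only when none were injected (e.g. by a test).
    if rekognition is None or table is None:
        import boto3  # available by default in the Lambda Python runtime

        if rekognition is None:
            rekognition = boto3.client("rekognition")
        if table is None:
            table = boto3.resource("dynamodb").Table(TABLE_NAME)

    processed = 0
    for bucket, key in extract_s3_objects(event):
        # Rekognition reads the image straight from S3; no bytes pass through Lambda.
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,        # assumed limit
            MinConfidence=80,    # assumed threshold, in percent
        )
        table.put_item(Item=build_item(key, response["Labels"]))
        processed += 1
    return {"processed": processed}
```

Keeping the event parsing and item shaping in separate functions makes them easy to check locally before deploying, which helps when the only other feedback is an empty table.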
This project involved significant real-world troubleshooting that highlighted key architectural principles:
- Service Regional Availability is Critical: The most significant challenge was an EndpointConnectionError. After extensive debugging, the root cause was discovered: the initial region (eu-north-1) did not support the Amazon Rekognition service, so the SDK was trying to reach a regional endpoint that did not exist. This taught a critical lesson in cloud architecture: always verify that all required services are available in your chosen region before beginning a project. The solution was to decommission the resources and rebuild them in a region that offers every required service (eu-west-1).
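- IAM Permissions are Explicit: An AccessDeniedException occurred when the function tried to write to DynamoDB. This reinforced that IAM is "deny by default": any action that is not explicitly allowed is refused. The issue was resolved by attaching the AmazonDynamoDBFullAccess managed policy to the Lambda function's execution role. That policy grants far more than the function needs; a policy that allows only dynamodb:PutItem on the results table would fit the least-privilege goal better.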
- Lambda Configuration is Key: An initial timeout error occurred because the default 3-second Lambda timeout was too short for the image analysis to complete. The solution was to increase the function's timeout to a more reasonable value (10 seconds), demonstrating the importance of configuring function settings for a specific workload.
- CloudWatch Logs are Essential for Debugging: When the DynamoDB table was empty with no obvious errors, the definitive solution was to check the Lambda function's CloudWatch Logs. The logs provided the exact error messages needed to diagnose and solve every issue encountered, proving their value as the primary debugging tool in a serverless environment.
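The region lesson above lends itself to a pre-flight check. The sketch below is one possible approach, not something this project used: `missing_services` and `boto3_regions_for` are hypothetical helper names. The boto3 call `Session.get_available_regions` reads endpoint data bundled with the SDK, so its answer may lag behind the AWS regional services list and should be treated as a first filter only.

```python
def missing_services(region, required, regions_for):
    """Return the required services that are not offered in `region`.

    regions_for is any callable mapping a service name to a list of regions,
    which keeps this function independent of how availability is looked up.
    """
    return [svc for svc in required if region not in regions_for(svc)]


def boto3_regions_for(service):
    """Look up regions for a service from the endpoint data shipped with boto3."""
    import boto3  # imported lazily so missing_services works without the SDK

    return boto3.Session().get_available_regions(service)


def check_region(region, required=("s3", "lambda", "rekognition", "dynamodb")):
    """Raise early if any required service is unavailable in the chosen region."""
    missing = missing_services(region, list(required), boto3_regions_for)
    if missing:
        raise RuntimeError(
            f"Region {region} does not list these services: {', '.join(missing)}"
        )
```

Running `check_region("eu-north-1")` before creating any resources would surface a missing service as an explicit error instead of a connection failure at runtime, provided the installed SDK's endpoint data reflects the gap.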