ECE 590.24 Data Analysis At Scale in Cloud

Previous projects:

Jupyter workflows using Docker Container

Serverless Data Engineering Pipeline

This is individual project 3 for my course, Data Analysis At Scale in Cloud. In this project, I build an OCR application on AWS Lambda that uses the Rekognition APIs to detect text in S3 objects and store the resulting labels in DynamoDB. More features of the application are shown in the screencast here.
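The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/app.py: the table name `ocr-results`, the item schema, and the helper `text_items` are assumptions made for the example. The Rekognition call used is `detect_text`, which returns a `TextDetections` list containing both `LINE` and `WORD` entries.

```python
import urllib.parse


def text_items(bucket, key, rekognition_response):
    """Turn a DetectText response into one DynamoDB item per detected line."""
    items = []
    for detection in rekognition_response.get("TextDetections", []):
        # DetectText returns both LINE and WORD entries; keep whole lines only.
        if detection.get("Type") != "LINE":
            continue
        items.append({
            "id": f"{bucket}/{key}#{detection['Id']}",  # hypothetical key schema
            "text": detection["DetectedText"],
            # Stored as a string because the DynamoDB resource API rejects floats.
            "confidence": str(round(detection["Confidence"], 2)),
        })
    return items


def lambda_handler(event, context):
    import boto3  # imported lazily so text_items can be tried without the SDK

    rekognition = boto3.client("rekognition")
    table = boto3.resource("dynamodb").Table("ocr-results")  # hypothetical table name

    saved = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = rekognition.detect_text(
            Image={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        for item in text_items(bucket, key, response):
            table.put_item(Item=item)
            saved += 1
    return {"saved": saved}
```

Keeping the response-to-item mapping in a plain function makes it possible to check the logic with a hand-written response before anything is deployed.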

Project structure

Here is a brief overview of the repo.

.
├── README.md                   <-- This instructions file
├── src                         <-- Source code for the Lambda function
│   ├── __init__.py
│   └── app.py                  <-- Lambda function code
├── template.yaml               <-- SAM template
└── SampleEvent.json            <-- Sample S3 event
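For orientation, here is a rough sketch of how a sample S3 event could be read in Python. The event below is a trimmed, hypothetical stand-in, since the actual contents of SampleEvent.json are not shown here; the helper name `s3_objects` is likewise an assumption for illustration.

```python
import json
import urllib.parse

# Trimmed, hypothetical S3 "ObjectCreated" event; the real file may carry more fields.
SAMPLE_EVENT = json.loads("""
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {"name": "your-bucket-name"},
        "object": {"key": "scans/my+receipt.png"}
      }
    }
  ]
}
""")


def s3_objects(event):
    """Return (bucket, key) pairs for every S3 record in the event."""
    pairs = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space is sent as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


print(s3_objects(SAMPLE_EVENT))
```

Decoding the key matters in practice: passing the encoded form to another AWS API would point at an object that does not exist.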

Requirements

How to build (on Cloud9 environment)

python3 -m venv ~/.ocrlambda  ## create virtual environment
source ~/.ocrlambda/bin/activate  ## activate it so the installs below go into the venv
pip install pip setuptools wheel pyyaml -U  ## upgrade pip and the build tools
pip install boto3 botocore awscli aws-sam-cli -U  ## install the Python SDK and the aws and sam command line tools
sam init --location gh:aws-samples/cookiecutter-aws-sam-s3-rekognition-dynamodb-python  ## get the rekognition template

After you answer a few configuration prompts, the project structure is ready. You can check it against the overview under Project structure.

Next, cd into your project directory and run the following commands to build the function, create an S3 bucket, package the Lambda function, and upload it to the bucket.

touch requirements.txt
sam build --use-container
aws s3 mb s3://your-bucket-name
sam package \
    --template-file template.yaml \
    --output-template-file packaged.yaml \
    --s3-bucket your-bucket-name

The sam deploy command creates a CloudFormation stack and deploys the SAM resources.

sam deploy \
    --template-file packaged.yaml \
    --stack-name aws-sam-ocr \
    --capabilities CAPABILITY_IAM \
    --region your-region
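Once the deploy command returns, one way to confirm the result from Python is to query the stack with boto3. The sketch below is an assumption about how you might do that, not part of this repo: `summarize_stack` and `check_deployment` are hypothetical helper names, and the output keys depend on what template.yaml declares.

```python
def summarize_stack(describe_stacks_response):
    """Extract the status and outputs from a CloudFormation DescribeStacks response."""
    stack = describe_stacks_response["Stacks"][0]
    # Outputs is absent when the template declares none.
    outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
    return stack["StackStatus"], outputs


def check_deployment(stack_name="aws-sam-ocr", region="your-region"):
    """Look up the deployed stack; requires AWS credentials and a real region name."""
    import boto3  # imported lazily so summarize_stack works without the SDK

    cloudformation = boto3.client("cloudformation", region_name=region)
    return summarize_stack(cloudformation.describe_stacks(StackName=stack_name))
```

A status of CREATE_COMPLETE or UPDATE_COMPLETE indicates that the stack finished deploying; any status containing ROLLBACK means the deployment failed and was reverted.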

JiajunSong629/Quick_OCR_with_AWS_Lambda
