aws-samples/address-enrichment-and-caching-using-stepfunctions

Address Enrichment and Caching Using AWS Step Functions by Leveraging Amazon Location Service

Traditional methods of performing address enrichment on geospatial datasets can be expensive and time consuming.

Using Amazon Location Service with AWS Step Functions for orchestration and with Amazon DynamoDB for caching in a serverless data processing pipeline, you may achieve significant performance improvements and cost savings on address enrichment jobs that use geospatial data.

This sample builds on an earlier sample that uses only Lambda functions (available here).

Some of the improvements in this project include:

The repository contains a SAM template for deploying a serverless address enrichment pipeline using:

It also uses sample data sourced from publicly available datasets that you can deploy and use to test the application.

This project addresses a common customer concern: how to improve application performance while also optimizing costs.

High-Level Architecture

image

  1. The Scatter Lambda function takes a dataset from the S3 bucket labeled input and breaks it into equal-sized shards.
  2. The Process Lambda function takes each shard from the pre-processed bucket and performs address enrichment in parallel, calling the Amazon Location Service Places API and storing
  3. The Gather Lambda function takes each shard from the post-processed bucket and appends them into a complete dataset with additional address information.
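The scatter and gather steps above can be illustrated with a minimal in-memory sketch. It is not the repository's handler code: the function names `scatter` and `gather`, the shard count, and the sample rows are assumptions made for illustration, and the real functions read from and write to S3 rather than working on strings.

```python
import csv
import io


def scatter(csv_text, shard_count):
    """Split CSV text into at most shard_count shards of near-equal size."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    # Ceiling division so every record lands in some shard.
    size = max(1, -(-len(records) // shard_count))
    # Each shard repeats the header so it can be processed on its own.
    return [
        [header] + records[start:start + size]
        for start in range(0, len(records), size)
    ]


def gather(shards):
    """Append processed shards into one dataset, keeping a single header."""
    if not shards:
        return []
    merged = [shards[0][0]]
    for shard in shards:
        merged.extend(shard[1:])
    return merged


# Hypothetical sample data, used only to exercise the sketch.
sample = "id,address\n1,A St\n2,B St\n3,C St\n4,D St\n5,E St\n"
shards = scatter(sample, 2)
merged = gather(shards)
```

Because each shard carries its own header, the shards can be enriched independently and in parallel before being merged back together.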

Deploying the Project

Prerequisites:

To use the SAM CLI, you need the following tools:

This Sample Includes:

  • template.yaml: Contains the AWS SAM template that defines your application's AWS resources, including a Place Index for Amazon Location Service
  • statemachine/location_service_scatter_gather.asl.yaml: Contains the Step Functions ASL definition
  • functions/scatter/: Contains the Lambda handler logic behind the scatter function and its requirements
  • functions/process/: Contains the Lambda handler logic for the processor function which calls the Amazon Location Service Places API to perform address enrichment
  • functions/gather/: Contains the Lambda handler logic for the gather function, which appends all of the processed data into a complete dataset
  • tests/: TBD - Needs to contain test cases (Unit and Integration Tests)
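As a rough illustration of how a processor function might combine the Places API with a cache, the sketch below uses a cache-aside lookup. It is an assumption-laden sketch, not the code in functions/process/: the names `normalize`, `enrich`, and `location_service_geocoder` are hypothetical, a plain dict stands in for the DynamoDB table, and the coordinates returned by the fake geocoder are placeholders.

```python
def normalize(address):
    """Build a cache key that ignores case and repeated whitespace."""
    return " ".join(address.lower().split())


def enrich(addresses, geocode, cache):
    """Cache-aside lookup: call geocode only for addresses not yet cached."""
    results = []
    for address in addresses:
        key = normalize(address)
        if key not in cache:
            cache[key] = geocode(address)  # only cache misses reach the API
        results.append((address, cache[key]))
    return results


def location_service_geocoder(index_name):
    """Return a geocode function backed by a place index (needs AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("location")

    def geocode(address):
        response = client.search_place_index_for_text(
            IndexName=index_name, Text=address, MaxResults=1
        )
        matches = response["Results"]
        # Point is [longitude, latitude]; None when nothing matched.
        return matches[0]["Place"]["Geometry"]["Point"] if matches else None

    return geocode


# Stand-in geocoder so the sketch can be run offline.
calls = []


def fake_geocode(address):
    calls.append(address)
    return [0.0, 0.0]  # placeholder coordinates, not real output


cache = {}  # stands in for the DynamoDB cache table
enriched = enrich(["550 Main St", "550  main st", "10 Elm St"], fake_geocode, cache)
```

In this run the second address normalizes to the same key as the first, so the geocoder is called twice for three inputs. Repeated addresses in a dataset are where a cache of this kind would save API calls.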

Deploy the SAM App:

  1. Use git clone https://github.com/aws-samples/address-enrichment-and-caching-using-stepfunctions to clone the repository to an environment where AWS SAM and Python are installed.
  2. Use cd address-enrichment-and-caching-using-stepfunctions to change into the project directory containing the template.yaml file that SAM uses to build your application.
  3. If you have Docker installed, you can use sam build --use-container, otherwise, you can use sam build to build your application using SAM. You should see:
Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Test Function in the Cloud: sam sync --stack-name {stack-name} --watch
[*] Deploy: sam deploy --guided
  4. Use sam deploy --guided to deploy the application to your AWS account. Enter responses based on your environment:
Configuring SAM deploy
======================

        Looking for config file [samconfig.toml] :  Not found

        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [sam-app]: address-enrichment
        AWS Region [us-west-2]: us-east-1
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: Y
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: Y
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]: N
        Save arguments to configuration file [Y/n]: Y
        SAM configuration file [samconfig.toml]: 
        SAM configuration environment [default]: 

Testing the Application

Download the samples below, unzip the files, and upload the CSV to your input S3 bucket to trigger the address enrichment pipeline.

Geocoding: City of Hartford, CT Business Listing Dataset

Reverse Geocoding: Miami Housing Dataset
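If you prefer to script the upload, a minimal boto3 sketch might look like the following. The helper names `input_bucket_name` and `upload_csv` are hypothetical, the bucket naming pattern is inferred from the bucket list in the Cleanup section, and uploading to the bucket root is an assumption; check the deployed stack for the actual bucket name.

```python
from pathlib import Path


def input_bucket_name(stack_name, region, account_id):
    """Assumed naming pattern: input-<stack-name>-<region>-<account-number>."""
    return f"input-{stack_name}-{region}-{account_id}"


def upload_csv(csv_path, bucket, s3_client=None):
    """Upload a local CSV to the bucket, using the file name as the object key."""
    if s3_client is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        s3_client = boto3.client("s3")
    key = Path(csv_path).name
    s3_client.upload_file(str(csv_path), bucket, key)
    return key
```

For example, `upload_csv("listing.csv", input_bucket_name("address-enrichment", "us-east-1", "123456789012"))` would upload a hypothetical local file named listing.csv, where the account number shown is a placeholder.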

Cleanup

To avoid incurring charges, clean up the AWS resources that were created while following this sample.

Pre-req:

Make sure you empty the following S3 buckets before deleting the CloudFormation stack (the deletion will fail for non-empty buckets):

  • input-stack-name-aws-region-aws-accountnumber
  • raw-stack-name-aws-region-aws-accountnumber
  • processed-stack-name-aws-region-aws-accountnumber
  • destination-stack-name-aws-region-aws-accountnumber
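Emptying the four buckets listed above can also be scripted. The sketch below assumes the naming pattern shown in that list; the helper names `stack_bucket_names` and `empty_buckets` are hypothetical. It deletes every object irreversibly, so double-check the names before running it.

```python
BUCKET_PREFIXES = ("input", "raw", "processed", "destination")


def stack_bucket_names(stack_name, region, account_id):
    """Build the four bucket names from the assumed naming pattern."""
    return [f"{prefix}-{stack_name}-{region}-{account_id}" for prefix in BUCKET_PREFIXES]


def empty_buckets(names, s3_resource=None):
    """Delete all objects (and any object versions) from each named bucket."""
    if s3_resource is None:
        import boto3  # imported lazily; requires AWS credentials at call time

        s3_resource = boto3.resource("s3")
    for name in names:
        bucket = s3_resource.Bucket(name)
        bucket.objects.all().delete()
        # Covers buckets with versioning enabled, where old versions would remain.
        bucket.object_versions.delete()
```

For example, `empty_buckets(stack_bucket_names("address-enrichment", "us-east-1", "123456789012"))` would target the buckets of a stack deployed with the values used earlier, where the account number is a placeholder.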

Method 1:

To delete the resources you created as part of this sample, you can run sam delete:

sam delete
        Are you sure you want to delete the stack address-enrichment in the region us-east-1 ? [y/N]: y
        Are you sure you want to delete the folder address-enrichment in S3 which contains the artifacts? [y/N]: y
        - Deleting S3 object with key address-enrichment/c2710045fb8c4c4d77e47fba2f9754e4
        - Deleting S3 object with key address-enrichment/c5ca75d7c52419e4077a3c030d76d812
        - Deleting S3 object with key address-enrichment/04c2cdceeee06f8998eccf77fc6ffb9b
        - Deleting S3 object with key address-enrichment/f1e2091b2a434fd87f023b603e23fe10
        - Deleting S3 object with key address-enrichment/5a46e427cf72552a09e714f3a5c16461.template
        - Deleting Cloudformation stack address-enrichment

Deleted successfully

Method 2:

Alternatively, you can delete the AWS CloudFormation stack by logging in to the AWS Console and navigating to the AWS CloudFormation service. Select Stacks, select the stack you want to delete, and then click the Delete button at the top right.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
