Lambda to start EMR and run a map reduce job Read more:

Initialization & Setup

  1. Upload files to S3: upload your mapper, reducer, input file and initialization script to S3
  2. Update the bucket name, file names and other variables in
  3. Create a Lambda in AWS Console and upload
  4. Create an S3 trigger on the Lambda you created and set the trigger path to input.csv
  5. Upload the input CSV again to the trigger path to invoke the Lambda
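
As a rough illustration of what the S3 trigger hands to the Lambda, the sketch below pulls the bucket name and object key out of an S3 event notification. The function names `extract_s3_objects` and `lambda_handler` are hypothetical and not taken from this repository, and the handler only echoes what triggered it; in the real function this is where the EMR cluster would be started.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return a list of (bucket, key) pairs found in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 events (spaces become '+').
            objects.append((bucket, unquote_plus(key)))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: it only reports which uploads triggered it.
    # Starting the EMR cluster would happen here in the real function.
    uploaded = extract_s3_objects(event)
    return {"triggered_by": uploaded}
```

An event with no `Records` entry simply yields an empty list, so the handler can be invoked by hand from the console without failing.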

EMR Attributes


In this section, you can specify the EMR cluster configuration:

  • InstanceRole - The role of the instance group: MASTER or CORE
  • InstanceType - The EC2 instance type, which determines the size of the server
  • InstanceCount - The number of instances to launch for that role
  • Ec2KeyName - The name of an existing key pair, without the file extension. This allows you to SSH into the cluster
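
The attributes above could be assembled as in the sketch below, assuming the script passes them to boto3's `run_job_flow` as its `Instances` argument (the source does not show the call). The helper name `build_instances`, the instance types, the core count and the key pair name are placeholders, and `KeepJobFlowAliveWhenNoSteps` is an extra setting that the list above does not mention.

```python
def build_instances(master_type="m5.xlarge", core_type="m5.xlarge",
                    core_count=2, key_name="my-key-pair"):
    """Build a dict in the shape boto3's run_job_flow expects for Instances.

    All default values are placeholders, not values from this repository.
    """
    return {
        "InstanceGroups": [
            {
                "Name": "Master node",
                "InstanceRole": "MASTER",
                "InstanceType": master_type,
                "InstanceCount": 1,
            },
            {
                "Name": "Core nodes",
                "InstanceRole": "CORE",
                "InstanceType": core_type,
                "InstanceCount": core_count,
            },
        ],
        # Key pair name without the file extension, used for SSH access.
        "Ec2KeyName": key_name,
        # Assumption: shut the cluster down once all steps have finished.
        "KeepJobFlowAliveWhenNoSteps": False,
    }
```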


BootstrapActions sets up the environment for your mapper and reducer scripts. Here you can optionally specify a script that installs the software, libraries and packages your files need. This script is executed on every node of your cluster, whether master or core.
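
A minimal sketch of one bootstrap action, assuming it is passed to boto3's `run_job_flow` as the `BootstrapActions` argument. The helper name `build_bootstrap_actions`, the action name and the S3 path in the usage line are hypothetical.

```python
def build_bootstrap_actions(script_s3_path, args=None):
    """Build a one-element BootstrapActions list for run_job_flow.

    script_s3_path is the s3:// location of the initialization script.
    """
    return [
        {
            "Name": "Install dependencies",
            "ScriptBootstrapAction": {
                "Path": script_s3_path,
                # Optional command-line arguments passed to the script.
                "Args": list(args or []),
            },
        }
    ]


# Example with a placeholder bucket and script name.
actions = build_bootstrap_actions("s3://my-bucket/init.sh")
```

Returning an empty list instead would skip bootstrapping, which matches the optional nature of the script.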


This is where you define a step, which runs once the EMR cluster is ready. The current script has a single step that runs a hadoop-streaming command, which is our MapReduce job. You can add more steps if needed.
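
A hadoop-streaming step might look like the sketch below. It assumes the step is submitted through boto3's `run_job_flow` (`Steps` argument) using `command-runner.jar`, and that the mapper, reducer and input sit at the root of a single bucket. The helper name `build_streaming_step`, the file names, the output prefix and the `ActionOnFailure` choice are illustrative, not taken from the actual script.

```python
def build_streaming_step(bucket, mapper="mapper.py", reducer="reducer.py",
                         input_key="input.csv", output_prefix="output"):
    """Build one step dict that runs a hadoop-streaming job.

    Assumes the mapper and reducer are stored at the root of the bucket.
    """
    base = f"s3://{bucket}"
    return {
        "Name": "Hadoop streaming job",
        # Assumption: give up and terminate the cluster if the step fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                # Ship the scripts to every node before the job starts.
                "-files", f"{base}/{mapper},{base}/{reducer}",
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", f"{base}/{input_key}",
                "-output", f"{base}/{output_prefix}",
            ],
        },
    }


# Additional steps would be appended to this list.
steps = [build_streaming_step("my-bucket")]
```

Hadoop refuses to write to an output directory that already exists, so a unique output prefix per run (for example one containing a timestamp) is a sensible choice.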

