Building a near real-time discovery platform
with AWS Lambda, Amazon Kinesis Firehose and Elasticsearch

This repository contains the sample code used in the AWS Big Data Blog post Building a Near-Real-Time Discovery Platform with AWS.

Prerequisites

Overview of Example

AWS Lambda function

The AWS Lambda function (lambda-s3-twitter-to-es-python/lambda_function.py) is triggered whenever a new file is created on S3. The function does the following:

  1. Reads the file content
  2. Parses the content to JSON format (Elasticsearch stores documents in JSON format)
  3. Analyzes Twitter data (lambda-s3-twitter-to-es-python/tweet_utils.py):
      a. Extracts Twitter mentions (@username) from the tweet text.
      b. Extracts sentiment based on emoticons. If there’s no emoticon in the text, the function falls back to textblob sentiment analysis.
  4. Loads the data to Elasticsearch (twitter_to_es.py) using the elasticsearch-py library
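
Step 3 above can be sketched roughly as follows. This is a minimal illustration, not the code in tweet_utils.py: the regular expression, the emoticon sets, the "mixed" label, and the function names are all assumptions made for the example, and the textblob fallback is represented by a caller-supplied function so the sketch has no third-party dependencies.

```python
import re

# Hypothetical pattern: "@" not preceded by a word character (so e-mail
# addresses are skipped), followed by one or more word characters.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")

# Hypothetical emoticon sets; the repository's own lists may differ.
POSITIVE = {":)", ":-)", ":D", ";)"}
NEGATIVE = {":(", ":-(", ":'("}


def extract_mentions(text):
    """Return the usernames mentioned in the tweet text, without the '@'."""
    return MENTION_RE.findall(text)


def emoticon_sentiment(text, fallback=None):
    """Classify sentiment from emoticons; defer to `fallback` if none are found.

    `fallback` is any callable taking the text and returning a label, for
    example a thin wrapper around textblob (assumption, not shown here).
    """
    tokens = set(text.split())
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    if has_pos and has_neg:
        return "mixed"
    return fallback(text) if fallback else "neutral"
```

Passing the fallback as a parameter keeps the emoticon rule testable on its own; in the deployed function the fallback would be the textblob-based analysis.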

You can download and unzip the deployment package (packages/s3_twitter_to_es.zip), which already includes elasticsearch and textblob modules, or create a deployment package yourself.

In the steps below, replace each <<PARAMETER>> placeholder with your own value.

Modify lambda-s3-twitter-to-es-python/config.py by assigning es_host and es_port to the host and port of your Elasticsearch endpoint
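
For orientation, here is a hedged sketch of what the two settings might look like and how documents could be laid out for the Elasticsearch bulk API. The host value, the index name "twitter", the use of id_str as the document ID, and the bulk_body helper are assumptions for illustration; the repository itself loads data through elasticsearch-py in twitter_to_es.py, and older Elasticsearch versions also expect a _type field in the action line.

```python
import json

# Hypothetical config.py values; replace them with your own endpoint.
es_host = "search-example.us-east-1.es.amazonaws.com"
es_port = 443


def bulk_body(docs, index="twitter"):
    """Build a newline-delimited bulk request body for the given documents.

    Each document contributes an action line followed by a source line,
    and the whole body ends with a newline, as the bulk API requires.
    """
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id_str"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

Using the tweet ID as the document ID means a file that is processed twice overwrites the same documents instead of creating duplicates.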

Zip your deployment folder

cd path/to/s3-twitter-to-es-python
zip -r -9 ../s3-twitter-to-es-python.zip .

Create an IAM role and name it <<LAMBDA_EXEC_ROLE>>

Create the AWS Lambda function

aws lambda create-function \
--region <<REGION>> \
--function-name s3-twitter-to-es-python  \
--zip-file fileb://path/to/s3_twitter_to_es.zip \
--role arn:aws:iam::<<ACCOUNT_ID>>:role/<<LAMBDA_EXEC_ROLE>> \
--handler lambda_function.handler \
--runtime python2.7 \
--timeout 120

Add S3 as the event source for the Lambda function, using your <<S3_BUCKET>> and <<S3_PREFIX>>
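
Once S3 is the event source, each invocation receives an S3 event notification. The sketch below shows one way a handler could pull the bucket and object key out of that event before reading the file. The function name and the prefix filter are hypothetical, and the sketch is written for Python 3 (the function above is created with the python2.7 runtime, where unquote_plus lives in urllib instead).

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event, prefix=""):
    """Return (bucket, key) pairs for the objects named in an S3 event.

    Object keys arrive URL-encoded in the notification, so they are
    decoded before use. Keys outside `prefix` are skipped.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key and key.startswith(prefix):
            objects.append((bucket, key))
    return objects
```

The event source configuration already filters by prefix, so the extra check here is only a defensive measure.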

Running example

Create a Firehose IAM role named “firehose_delivery_role” based on the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<<S3_BUCKET>>",
        "arn:aws:s3:::<<S3_BUCKET>>/*"
      ]
    }
  ]
}

Set up the Node.js application

cd firehose-twitter-streaming-nodejs
npm install

Modify the configuration in config.js

• firehose
  o DeliveryStreamName: Name your stream. The app creates the delivery stream if it does not exist
  o BucketARN: Use the ARN of the <<S3_BUCKET>> that you entered as the event source for the Lambda function
  o RoleARN: Use the ARN of the Firehose role you created earlier (“firehose_delivery_role”)
  o Prefix: Use the <<S3_PREFIX>> that you entered as the event source for the Lambda function
• twitter: Enter your Twitter application keys
• region: Your Firehose region (e.g. us-east-1, us-west-2, eu-west-1)
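
To show how the firehose settings above fit together, here is a small Python sketch that assembles the equivalent delivery stream parameters. The sample application itself is written in Node.js and reads these values from config.js; the delivery_stream_params helper and its argument names are hypothetical and only mirror the keys listed above.

```python
def delivery_stream_params(stream_name, bucket, role_arn, prefix):
    """Assemble delivery stream settings from the values described above.

    The bucket name is expanded to a bucket ARN, which is the form the
    BucketARN setting expects.
    """
    return {
        "DeliveryStreamName": stream_name,
        "S3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": "arn:aws:s3:::" + bucket,
            "Prefix": prefix,
        },
    }


# Assumption: with boto3 installed and credentials configured, these
# parameters could be passed on like this:
#   boto3.client("firehose", region_name="us-east-1").create_delivery_stream(**params)
```

Keeping the bucket and prefix identical to the Lambda event source is what connects the two halves of the pipeline: Firehose writes under that prefix, and each new object triggers the function.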

Run the application

node twitter_stream_producer_app

Please see Building a Near-Real-Time Discovery Platform with AWS for more details on using Elasticsearch and Kibana as the discovery platform.
