Zubair Nabi edited this page Jul 29, 2014 · 4 revisions

Overview

This project contains two email processing applications used for a detailed performance comparison between IBM InfoSphere Streams and Apache Storm. The applications process emails from the Enron dataset and calculate metrics on them.

For a detailed description of the applications, please refer to the report here: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf

In each of the applications, there are three distinct tasks that need to be performed:

  1. Preprocessing: Merge the Enron dataset into a single file. This is common to Storm and Streams.
  2. Dataset Creation: Take the merged Enron dataset, and serialize and compress it.
  3. Execution: Execute the main processing benchmark.
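The first two tasks can be sketched in a few lines of Python. This is only an illustration of the general shape of the pipeline, not the project's actual tooling: the function names, the boundary marker between emails, and the JSON-lines-plus-gzip format are all assumptions made for this example. The real benchmarks may use a different serialization and compression scheme, so refer to the linked pages for the actual steps.

```python
import gzip
import json
import os

# Hypothetical marker written between emails in the merged file (an assumption).
SEPARATOR = "\n=====EMAIL-BOUNDARY=====\n"


def merge_emails(root_dir, merged_path):
    """Task 1 (sketch): concatenate every file under root_dir into one file.

    merged_path should live outside root_dir so it is not re-read.
    Returns the number of files merged.
    """
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            dirnames.sort()  # keep the traversal order deterministic
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    out.write(f.read())
                out.write(SEPARATOR)
                count += 1
    return count


def serialize_and_compress(merged_path, dataset_path):
    """Task 2 (sketch): write one JSON record per email into a gzip file.

    Returns the number of records written.
    """
    with open(merged_path, encoding="utf-8") as f:
        emails = [e for e in f.read().split(SEPARATOR) if e]
    with gzip.open(dataset_path, "wt", encoding="utf-8") as out:
        for body in emails:
            # json.dumps escapes newlines, so each record stays on one line.
            out.write(json.dumps({"body": body}) + "\n")
    return len(emails)


def read_dataset(dataset_path):
    """Read the compressed dataset back, as a benchmark input stage might."""
    with gzip.open(dataset_path, "rt", encoding="utf-8") as f:
        return [json.loads(line)["body"] for line in f]
```

A run would call `merge_emails` on the extracted Enron directory, pass the merged file to `serialize_and_compress`, and then feed the records from `read_dataset` into the processing stage. Keeping the merge separate from serialization mirrors the split above, where the merged file is shared by both benchmarks and only the dataset creation differs.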

Before you get started, make sure that the following software is installed on your system.

Software Requirements:

For Apache Storm Benchmark:

For InfoSphere Streams Benchmark:

Next Steps

[Preprocess Enron Email Dataset](Preprocess Enron Email Dataset)

[Create dataset for InfoSphere Streams benchmark](Create dataset for InfoSphere Streams benchmark)

[Running InfoSphere Streams benchmark](Running InfoSphere Streams benchmark)

[Create dataset for Apache Storm benchmark](Create dataset for Apache Storm benchmark)

[Running Apache Storm benchmark](Running Apache Storm benchmark )