This is an application that uses the Hadoop framework and the AWS EMR service to distribute tasks across multiple nodes, as an exercise in distributed computing.
This application was built with Hadoop 3.3.0 and Java 15.0.1.
Hadoop was installed using this tutorial: https://brain-mentors.com/hadoopinstallation/
Download and install the requirements listed above, or deploy directly to AWS.
To clean the generated files and build a single fat JAR file:
sbt clean assembly
Then run the application with a prompt asking which task to execute:
sbt "run log/LogFileGenerator.2022-09-20.log reports"
Alternatively, run a task directly as described below.
Shows the distribution of different types of messages across predefined time intervals
sbt "runMain TypeDistribution log/LogFileGenerator.2022-09-20.log reports"
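The counting idea behind this task can be sketched in plain Java, without any Hadoop dependency. This is only an illustration, not the project's actual TypeDistribution code: the class name `TypeDistributionSketch`, the assumed log line layout (`HH:mm:ss.SSS [thread] LEVEL logger - message`), and the `intervalMinutes` parameter are all assumptions made for the example. The "map" step turns each line into an interval-and-type key, and the "reduce" step sums one count per key.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TypeDistributionSketch {
    // Assumed line layout: "HH:mm:ss.SSS [thread] LEVEL logger - message".
    private static final Pattern LINE =
        Pattern.compile("^(\\d{2}):(\\d{2}):\\d{2}\\.\\d{3}\\s+\\[[^\\]]*\\]\\s+(\\w+)\\s");

    // "Map" step: turn one log line into a key such as "17:30 WARN".
    // Returns null when the line does not match the assumed layout.
    static String keyFor(String line, int intervalMinutes) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return null;
        int minuteOfDay = Integer.parseInt(m.group(1)) * 60 + Integer.parseInt(m.group(2));
        int start = (minuteOfDay / intervalMinutes) * intervalMinutes; // interval start
        return String.format(Locale.ROOT, "%02d:%02d %s", start / 60, start % 60, m.group(3));
    }

    // "Reduce" step: add 1 to the running total of each key.
    static Map<String, Integer> distribution(List<String> lines, int intervalMinutes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String key = keyFor(line, intervalMinutes);
            if (key != null) counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "17:32:23.456 [main] WARN  Generator$ - abc",
            "17:34:01.002 [main] WARN  Generator$ - def",
            "17:41:10.900 [main] ERROR Generator$ - ghi");
        // prints {17:30 WARN=2, 17:40 ERROR=1}
        System.out.println(distribution(sample, 10));
    }
}
```

In a real Hadoop job the same key would be emitted by a mapper and summed by a reducer; the in-memory `TreeMap` here only stands in for that shuffle-and-sum step.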
Lists the time intervals that contain ERROR messages, sorted in descending order by the number of ERROR messages
sbt "runMain ErrorIntervalSort log/LogFileGenerator.2022-09-20.log reports"
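The ordering this task describes can be illustrated with a small plain-Java sketch. It assumes the ERROR counts per interval have already been computed and are available as a map; the class name `ErrorIntervalSortSketch`, the interval labels, and the tie-breaking rule (by interval label) are hypothetical choices for the example, not taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ErrorIntervalSortSketch {
    // Orders intervals by ERROR count, largest first.
    // Ties are broken by interval label so the result is deterministic.
    static List<Map.Entry<String, Integer>> sortDescending(Map<String, Integer> errorsPerInterval) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(errorsPerInterval.entrySet());
        rows.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("17:30", 2, "17:40", 5, "17:50", 2);
        // prints [17:40=5, 17:30=2, 17:50=2]
        System.out.println(sortDescending(counts));
    }
}
```

With Hadoop, sorting by value is often done in a second job that swaps key and value and uses a descending comparator; whether this project does so is not stated here.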
Produces the number of generated log messages
sbt "runMain TypeCount log/LogFileGenerator.2022-09-20.log reports"
Reports the number of characters in each log message, for each log message type
sbt "runMain CharacterCount log/LogFileGenerator.2022-09-20.log reports"
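As a rough illustration of this task, the sketch below groups message lengths by log message type in plain Java. It is not the project's CharacterCount implementation: the class name `CharacterCountSketch` and the assumed line layout (`HH:mm:ss.SSS [thread] LEVEL logger - message`, with the message being everything after the ` - ` separator) are assumptions. The description above does not say whether lengths are reported individually or aggregated, so the sketch simply keeps every length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterCountSketch {
    // Assumed layout: "HH:mm:ss.SSS [thread] LEVEL logger - message".
    // Group 1 is the message type, group 2 is the message text.
    private static final Pattern LINE =
        Pattern.compile("^\\S+\\s+\\[[^\\]]*\\]\\s+(\\w+)\\s+\\S+\\s+-\\s(.*)$");

    // Collects the character count of every message under its type.
    static Map<String, List<Integer>> lengthsByType(List<String> lines) {
        Map<String, List<Integer>> lengths = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) continue; // skip lines outside the assumed layout
            lengths.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                   .add(m.group(2).length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "17:32:23.456 [main] WARN  Generator$ - abc",
            "17:34:01.002 [main] WARN  Generator$ - hello",
            "17:41:10.900 [main] ERROR Generator$ - xy");
        // prints {ERROR=[2], WARN=[3, 5]}
        System.out.println(lengthsByType(sample));
    }
}
```

If only one figure per type is wanted, such as the longest message, the list could be replaced by a running maximum in the same loop.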
Here is a short video showing how to deploy to AWS's EMR service