This repository has been archived by the owner on May 12, 2021. It is now read-only.
METRON-1708 Run the Batch Profiler in Spark [Feature Branch] #1161
Closed
nickwallen wants to merge 16 commits into apache:feature/METRON-1699-create-batch-profiler from nickwallen:METRON-1708-v2
nickwallen changed the title from "METRON-1708 Run the Batch Profiler in Spark" to "METRON-1708 Run the Batch Profiler in Spark [Feature Branch]" on Aug 14, 2018
nickwallen changed the base branch from master to feature/METRON-1699-create-batch-profiler on August 14, 2018 13:59
I had to change the period settings in
Thanks @merrimanr! I updated the testing steps in the PR description to account for what you found. I will add a README for the Spark Profiler in #1163, which will allow it to include some installation steps.
asfgit pushed a commit that referenced this pull request on Aug 27, 2018
This has been merged.
This adds the ability to run the Batch Profiler from the command line. This also packages up the Batch Profiler into a tarball.
This is a pull request against the METRON-1699-create-batch-profiler feature branch.
This is dependent on the following PRs. By filtering on the last commit, this PR can be reviewed before the others are reviewed and merged.
Testing
Start up the development environment. Allow Metron to run for a while so that a fair amount of telemetry is archived in HDFS.
Stop all Metron services.
Install Spark2 using Ambari.
Deploy the Batch Profiler to the development environment.
From the host machine, outside the development VM, run the following.
Then from the development VM, run the following.
Create a profile by editing $METRON_HOME/config/zookeeper/profiler.json as follows.
Count the number of messages in the 'indexing' topic. This should not be changing.
In this case there are 8,131 messages.
Delete any previously written profile measurements from HBase.
Confirm that all of the messages were successfully indexed in HDFS.
Alter $METRON_HOME/config/batch-profiler.properties as follows.
Fix up some of the Spark configuration.
You may need to create the Spark history directory in HDFS (if doing this in Full Dev).
You may want to edit the log4j properties file that sits in the config directory under $SPARK_HOME, or create one if it does not exist.
Run the Batch Profiler.
You should see something like the following.
Fetch the profile measurements created by the Profiler.
Change the period duration for the Profiler Client to match the 1 minute duration that was used by the Batch Profiler.
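Why the two durations have to agree can be sketched as follows. This is a minimal illustration, not the Profiler's actual code: it assumes that measurements are bucketed by dividing the event timestamp by the period duration, and the function name period_id, the sample timestamps, and the 15 minute contrast value are all hypothetical.

```python
from datetime import timedelta


def period_id(timestamp_ms: int, period: timedelta) -> int:
    """Bucket an epoch-millisecond timestamp into a fixed-width period (assumed scheme)."""
    period_ms = int(period.total_seconds() * 1000)
    return timestamp_ms // period_ms


# Hypothetical timestamps, one per minute, in epoch milliseconds.
timestamps = [1_534_255_200_000 + i * 60_000 for i in range(30)]

one_minute = timedelta(minutes=1)        # duration used when writing
fifteen_minutes = timedelta(minutes=15)  # a mismatched duration, for contrast only

written = {period_id(t, one_minute) for t in timestamps}
looked_up = {period_id(t, fifteen_minutes) for t in timestamps}

print(len(written))                   # distinct 1 minute buckets
print(len(looked_up))                 # distinct 15 minute buckets
print(written.isdisjoint(looked_up))  # True when no bucket id is shared
```

Under that assumption, a client that looks up periods with a different duration computes bucket ids that were never written, so it finds no measurements.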
The Profiler counted a couple hundred messages each minute.
Overall, there were 30 measurements captured from the archived telemetry.
The Profiler counted a total of 8,130 messages.
Validate the range of time over which we have telemetry.
Launch the Spark Shell.
In the spark shell, run the following.
We see that 1,769,980 milliseconds is about 30 minutes. That matches the 30 measurements that have been captured by the Profiler.
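The arithmetic behind this check can be reproduced with a few lines. This is only a sanity-check sketch that uses the figures reported above (1,769,980 milliseconds, 30 measurements, 8,130 messages); the variable names are illustrative.

```python
elapsed_ms = 1_769_980   # time range of the archived telemetry, from the Spark shell
measurements = 30        # measurements written by the Batch Profiler
total_messages = 8_130   # total messages counted by the Profiler

elapsed_minutes = elapsed_ms / 1000 / 60
avg_per_measurement = total_messages / measurements

print(f"{elapsed_minutes:.1f} minutes")                        # 29.5 minutes
print(f"{avg_per_measurement:.0f} messages per measurement")   # 271 messages per measurement
```

Roughly 29.5 minutes of telemetry is consistent with 30 one-minute measurements, and an average of 271 messages per measurement agrees with "a couple hundred messages each minute".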
Pull Request Checklist