Added a collector for Hadoop Metrics #72

Merged
merged 1 commit into from
Jun 6, 2012

Conversation

Contributor

@tehmaze tehmaze commented Jun 6, 2012

Hadoop can generate metrics files in which each line contains a timestamp followed by key-value pairs. The collector sends every numeric value to Graphite.
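To illustrate the idea, here is a minimal sketch of how one such line could be split into numeric values and everything else. The line layout (`<timestamp> <record>: key=value, key=value`), the function name `parse_metrics_line`, and the sample line are assumptions made for this example; this is not the code from the merged commit.

```python
import re

# Assumed layout: "<timestamp> <record name>: key=value, key=value, ..."
LINE_RE = re.compile(r'^(?P<timestamp>\d+)\s+(?P<record>\S+):\s+(?P<pairs>.*)$')


def parse_metrics_line(line):
    """Split one metrics line into (timestamp, record, tags, values).

    Values that parse as numbers go into `values`; everything else
    (for example hostName or processName) goes into `tags`.
    Returns None when the line does not match the assumed layout.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None

    tags = {}
    values = {}
    for pair in match.group('pairs').split(','):
        key, sep, raw = pair.strip().partition('=')
        if not sep:
            continue  # skip fragments that are not key=value
        try:
            values[key] = float(raw)
        except ValueError:
            tags[key] = raw

    return int(match.group('timestamp')), match.group('record'), tags, values


# Hypothetical sample line, not taken from the fixtures.
sample = ('1338000000000 jvm.metrics: hostName=node1, '
          'processName=DataNode, gcCount=12, memHeapUsedM=3.5')
print(parse_metrics_line(sample))
```

Only the entries in `values` would be candidates for publishing; the non-numeric `tags` are kept because they can help tell otherwise identical records apart.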

kormoc added a commit that referenced this pull request Jun 6, 2012
Added a collector for Hadoop Metrics
@kormoc kormoc merged commit 315e757 into BrightcoveOS:master Jun 6, 2012
Contributor

kormoc commented Jun 6, 2012

Thanks!

Contributor

kormoc commented Jun 6, 2012

While writing the unit test for the collector, I noticed that it doesn't handle multiple lines per log file very well. The https://github.com/BrightcoveOS/Diamond/blob/master/src/collectors/HadoopCollector/fixtures/jvmmetrics.log file contains multiple lines for the same jvm.metrics key. It seems to me that we want to extract the hostname and process name and add them to the stats key?

Contributor Author

tehmaze commented Jun 6, 2012

Please re-open this issue so I can investigate. It's probably best to construct a key using the hostName and processName keys instead of using the second argument. What do you reckon?

Contributor

kormoc commented Jun 6, 2012

I don't see a way to reopen the request. We can just start a new issue to track this.

It looks like each style of log needs a different set of keys:

rpcmetrics.log looks like it would use hostname and port
mrmetrics.log looks like it would only need to use a custom key for mapred.job and it would be hostname, group, and counter
jvmmetrics.log looks like it would use hostname and process name
dfsmetrics.log looks fine as is

Does that seem reasonable?
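The per-log key scheme proposed above could be sketched roughly as follows. The record names `rpc.metrics` and `mapred.job`, the exact tag spellings (`port`, `group`, `counter`), the `KEY_TAGS` table, and both helper functions are assumptions for illustration; replacing dots in tag values with underscores is also just one possible choice, made here so that a fully qualified hostname does not split into several Graphite path components.

```python
import re

# Hypothetical table: which tags distinguish lines of each record type.
# Records that are not listed (e.g. the dfs metrics) keep their key as is.
KEY_TAGS = {
    'rpc.metrics': ('hostName', 'port'),
    'mapred.job': ('hostName', 'group', 'counter'),
    'jvm.metrics': ('hostName', 'processName'),
}


def sanitize(value):
    """Replace characters that would break or split a dotted metric path."""
    return re.sub(r'[^A-Za-z0-9_-]', '_', value)


def metric_path(record, tags, metric):
    """Build "<record>.<tag values...>.<metric>" from the tags for this record."""
    parts = [record]
    for tag in KEY_TAGS.get(record, ()):
        if tag in tags:
            parts.append(sanitize(str(tags[tag])))
    parts.append(metric)
    return '.'.join(parts)


# Two processes on the same host no longer collide on one key.
tags_a = {'hostName': 'node1.example.com', 'processName': 'DataNode'}
tags_b = {'hostName': 'node1.example.com', 'processName': 'TaskTracker'}
print(metric_path('jvm.metrics', tags_a, 'gcCount'))
print(metric_path('jvm.metrics', tags_b, 'gcCount'))
```

One caveat with this approach: a numeric-looking tag such as a port has to be kept as a string by whatever produces `tags`, otherwise it would be treated as a metric value rather than as part of the key.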
