MetricsServlet sometimes dies when a Gauge throws an exception #68

Closed · smanek opened this issue Aug 19, 2011 · 3 comments

Comments


smanek commented Aug 19, 2011

We've seen a few backtraces like these on our servers:

2011-08-18 19:25:04,012 (50338760) [http-8080-11] ERROR com.greplin.common.server.ErrorLoggingFilter - Uncaught error in servlet
org.codehaus.jackson.JsonGenerationException: Can not write text value, expecting field name
at org.codehaus.jackson.impl.JsonGeneratorBase._reportError(JsonGeneratorBase.java:481)
at org.codehaus.jackson.impl.Utf8Generator._verifyValueWrite(Utf8Generator.java:919)
at org.codehaus.jackson.impl.Utf8Generator.writeString(Utf8Generator.java:446)
at com.yammer.metrics.reporting.MetricsServlet.writeGauge(MetricsServlet.java:325)
at com.yammer.metrics.reporting.MetricsServlet.writeMetric(MetricsServlet.java:267)
at com.yammer.metrics.reporting.MetricsServlet.writeRegularMetrics(MetricsServlet.java:256)
at com.yammer.metrics.reporting.MetricsServlet.handleMetrics(MetricsServlet.java:243)
at com.yammer.metrics.reporting.MetricsServlet.doGet(MetricsServlet.java:151)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.greplin.common.server.ServletContextFilter.doFilter(ServletContextFilter.java:37)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.greplin.common.server.ErrorLoggingFilter.doFilter(ErrorLoggingFilter.java:31)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)

I looked at the writeGauge source code and tried to reproduce the bug in a simple test case, without much luck. Just thought I'd report the bug in case anyone else has any ideas. This seems to occur very rarely, fwiw (100s of servers, with their status pages being polled dozens of times per hour, and we only see ~1 of these per day).
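For anyone else attempting a repro, a minimal sketch of the kind of test case meant here, assuming the yammer metrics 2.x-era API (Metrics.newGauge plus a Gauge whose value() throws); exact class and method names may differ slightly across the BETA builds:

```java
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Gauge;

public class FailingGaugeRepro {
    public static void main(String[] args) {
        // Hypothetical repro: register a gauge whose value() always throws,
        // then poll the MetricsServlet's JSON endpoint repeatedly and check
        // whether the generator state is corrupted for metrics written after it.
        Metrics.newGauge(FailingGaugeRepro.class, "failing-gauge", new Gauge<Long>() {
            @Override
            public Long value() {
                throw new RuntimeException("simulated gauge failure");
            }
        });
    }
}
```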

@codahale
Contributor

Hmm. It looks like there's an exception thrown while trying to write out the value of the gauge. Jackson considers the field written despite the fact that something went wrong — perhaps a downstream IO error?

It might help to kick out a custom build with some debug logging associated with that exception handling — without knowing more about what that exception was, it's hard to say how we can handle this better. I haven't seen this in production on our machines, FWIW.
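For reference, a minimal sketch of how Jackson 1.x's generator state machine produces this exact message: in object context the generator expects a field name before any value, so if an earlier write failed partway and left the context inconsistent, the next value write trips this check.

```java
import java.io.StringWriter;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerator;

public class GeneratorStateDemo {
    public static void main(String[] args) throws Exception {
        JsonGenerator json = new JsonFactory().createJsonGenerator(new StringWriter());
        json.writeStartObject();
        // In OBJECT context the generator expects a field name next; writing a
        // text value here throws the same JsonGenerationException as the trace:
        // "Can not write text value, expecting field name"
        json.writeString("gauge value");
    }
}
```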

@codahale
Contributor

Do you still see this in production?


smanek commented Nov 16, 2011

Yep (we're now on beta 18). Still very rare, but consistently 2-3 get logged per day (out of tens of thousands of metrics fetches over HTTP per day).

If no one else is seeing this you can just close this issue - it doesn't have any real negative impact for us, and the code in Metrics looked fine.

Screenshot of our internal exception aggregation tool: https://skitch.com/e-smanek/ge6gu/can-not-write-text-value-expecting-field-name-greplin-exception-catcher
