New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress all kinds of Exceptions raised by report() #1049

Merged
merged 1 commit into from Jan 30, 2017

Conversation

Projects
None yet
4 participants
@iksaif
Contributor

iksaif commented Jan 30, 2017

With the current code exceptions that do not inherit from RuntimeException will
make this task die silently.

@arteam arteam merged commit f49d1b7 into dropwizard:3.2-development Jan 30, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Jan 30, 2017

Member

Looks good. We should catch all exceptions, despite it's a rare case when a reporter implementation throws a checked exception.

Member

arteam commented Jan 30, 2017

Looks good. We should catch all exceptions, despite it's a rare case when a reporter implementation throws a checked exception.

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Jan 30, 2017

Member

Thank you for your contribution!

Member

arteam commented Jan 30, 2017

Thank you for your contribution!

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Jan 30, 2017

Contributor

Thanks for merging. Any idea when 3.2.0 will be released ?

Contributor

iksaif commented Jan 30, 2017

Thanks for merging. Any idea when 3.2.0 will be released ?

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Jan 30, 2017

Member

I'm currently merging/reviewing outstanding pull requests and bug fixes from the backlog to the 3.2.x branch. A big part of backlog was processed, but I would like to land #1047 and #1048 to 3.2.* and after that make a release. I will discuss it with @ryantenney (he has the deployment rights). Hopefully, we can make a release this week, but I can't promise it.

Member

arteam commented Jan 30, 2017

I'm currently merging/reviewing outstanding pull requests and bug fixes from the backlog to the 3.2.x branch. A big part of backlog was processed, but I would like to land #1047 and #1048 to 3.2.* and after that make a release. I will discuss it with @ryantenney (he has the deployment rights). Hopefully, we can make a release this week, but I can't promise it.

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Jan 30, 2017

Contributor

Great thanks, I'll ask for a version bump in Cassandra once it's released

Contributor

iksaif commented Jan 30, 2017

Great thanks, I'll ask for a version bump in Cassandra once it's released

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Feb 2, 2017

Member

If the Cassandra project is depended on it, I think it makes sense to backport the fix to the 3.1.* branch as well, so it's easier to upgrade for the Cassandra folks.

Member

arteam commented Feb 2, 2017

If the Cassandra project is depended on it, I think it makes sense to backport the fix to the 3.1.* branch as well, so it's easier to upgrade for the Cassandra folks.

arteam added a commit that referenced this pull request Feb 2, 2017

arteam added a commit that referenced this pull request Feb 15, 2017

Merge changes from the 3.1 branch to 3.2 (#1071)
* Support for publishing metrics of BigInteger and BigDecimal type

* Update third-party.rst

New reporter - metrics-munin-reporter

* Add links to more third party libs

* Update third-party to include metrics-circonus

* Fix spelling in core.rst

* #814 always close socket

(cherry picked from commit 2762ce5)

* Revert "Add log4j2 xml-based config support"

This reverts commit 6667014.

* Revert "Added log4j2 InstrumentedAppender xml support to docs"

This reverts commit 26c4cd2.

* Revert "Fix javadoc link"

This reverts commit 28e6556.

* Revert "Add CPU profiling support to the AdminServlet (fixes #927)"

This reverts commit bbb52c1.

* Revert "Support for publishing metrics of BigInteger and BigDecimal type"

This reverts commit 7cfc23c.

* Suppress all kinds of Exceptions raised by report() (#1049) (#1056)

* Support the JDK's ThreadLocalRandom (#1052)

Add a proxy which provides a ThreadLocalRandom implementation depending on its
availability in runtime. If the JDK provides ThreadLocalRandom, use it. Otherwise,
fallback to the internal implementation.

* Support JDK's LongAdder (#1055)

* Support the JDK's LongAdder

Add an ability to use the JDK's implementation if it's available in the
runtime. Use a proxy for creating LongAdders which tries to use the JDK's
implementation and fallbacks to the internal one if the JDK doesn't
provide any.

Unfortunatelly, `LongAdder`s don't have a common interface, therefore we
introduce `LongAdderAdapter` as a common interface for `LongAdder`s. It
will be used in other components as an interface, but a concrete
implementation of it will be created by `LongAdderProxy`.

This pattern allows to us to still run code in the JRE, but enjoy the
benefits of the stock `LongAdder` in modern JDKs (less bugs, better
performance).

* Run tests only on JDK8

We need a compile dependency on LongAdders which is not available
in JDK7.

* Retry DNS lookups (#1064)

Do retry DNS lookups, which are just done in the constructor of
InetSocketAddress.

Reason might be that in some environments DNS is very dynamic and thus a name
might be sometimes resolveable and sometimes not. The way this is currently
implemented the InetSocketAddress is always cached, which in itself caches the
return value of the last DNS lookup, even if that failed.

* Backport rescale bug (#1046)

* Added ExponentiallyDecayingReservoir test illustrating the rescale race condition.

(cherry picked from commit d9af3c1)

* Fixed the race condition in ExponentiallyDecayingReservoir's rescale method.

(cherry picked from commit b022f04)
@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Feb 17, 2017

Contributor

Looks like #1047 and #1048 got merged, I guess the release is getting closer ?

Contributor

iksaif commented Feb 17, 2017

Looks like #1047 and #1048 got merged, I guess the release is getting closer ?

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Feb 17, 2017

Member

Looks like so. @ryantenney told me yesterday he is ready to cut the releases.

Member

arteam commented Feb 17, 2017

Looks like so. @ryantenney told me yesterday he is ready to cut the releases.

@jplock jplock added this to the 3.2.0 milestone Feb 17, 2017

@jplock jplock added the improvement label Feb 17, 2017

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Mar 17, 2017

Contributor

Looks like this wasn't enough :/ We should replace Exception by Throwable

Contributor

iksaif commented Mar 17, 2017

Looks like this wasn't enough :/ We should replace Exception by Throwable

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Mar 17, 2017

Member

Does the Cassandra reporter throw Errors? I thought it's a bad practice to catch them, because you could end up swallowing unrecoverable errors.

Member

arteam commented Mar 17, 2017

Does the Cassandra reporter throw Errors? I thought it's a bad practice to catch them, because you could end up swallowing unrecoverable errors.

@vladimir-bukhtoyarov

This comment has been minimized.

Show comment
Hide comment
@vladimir-bukhtoyarov

vladimir-bukhtoyarov Mar 18, 2017

Contributor

Yes, Cassandra throws exception on snapshot extraction time, and it does it by design(not because of bug). Currently I am trying to promote this request which solves this issue CASSANDRA-13223. I hope I will achieve success after fixes for mentioned notes.

I want to raise following thought: it is better to surround by try/catch the code places like this, because if any histogram throws exception then we have no reports at all. Missing of reports can lead to epic fail when something going wrong and you are unable to know how much wrong. I think that MetricsServlet should act in similar way and surrounds by try/catch each particular metric. And that why I do not use any built-in Dropwizard reporter, because I do not want to come in situation when reports are missed because of tricky bugs(like this #1104) or absurd design solution(like Cassandra).

Contributor

vladimir-bukhtoyarov commented Mar 18, 2017

Yes, Cassandra throws exception on snapshot extraction time, and it does it by design(not because of bug). Currently I am trying to promote this request which solves this issue CASSANDRA-13223. I hope I will achieve success after fixes for mentioned notes.

I want to raise following thought: it is better to surround by try/catch the code places like this, because if any histogram throws exception then we have no reports at all. Missing of reports can lead to epic fail when something going wrong and you are unable to know how much wrong. I think that MetricsServlet should act in similar way and surrounds by try/catch each particular metric. And that why I do not use any built-in Dropwizard reporter, because I do not want to come in situation when reports are missed because of tricky bugs(like this #1104) or absurd design solution(like Cassandra).

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Mar 19, 2017

Member

There is a PR for GraphiteReporter which provides something like that (#984), I could take a look at it.

I think it's not a bad idea and could improve reliability and resilience. Doing so, we avoid situations when a failure for a set of metrics prevents the whole MetricsRegistry from reporting (or worse sends bogus data). If we adopt it, it means we will not imply that all metrics implementation don't throw exceptions, but embrace failures and find a means to mitigate from them. That might be not so easy, but it looks like a step in the right direction.

Member

arteam commented Mar 19, 2017

There is a PR for GraphiteReporter which provides something like that (#984), I could take a look at it.

I think it's not a bad idea and could improve reliability and resilience. Doing so, we avoid situations when a failure for a set of metrics prevents the whole MetricsRegistry from reporting (or worse sends bogus data). If we adopt it, it means we will not imply that all metrics implementation don't throw exceptions, but embrace failures and find a means to mitigate from them. That might be not so easy, but it looks like a step in the right direction.

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Mar 20, 2017

Contributor

@arteam the thing is that currently the silentely ignore Throwables and Errors subclasses. IMOH it's better to log something.

Contributor

iksaif commented Mar 20, 2017

@arteam the thing is that currently the silentely ignore Throwables and Errors subclasses. IMOH it's better to log something.

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Mar 20, 2017

Member

I'm still a lit bit reluctant to change Exception to Throwable. This is an easy fix for the Cassandra issue, but potentially can have a negative impact on other reporters. I still believe that handling Errors is not a right thing to do. Could point me on the place where Cassandra throws an Error?

Member

arteam commented Mar 20, 2017

I'm still a lit bit reluctant to change Exception to Throwable. This is an easy fix for the Cassandra issue, but potentially can have a negative impact on other reporters. I still believe that handling Errors is not a right thing to do. Could point me on the place where Cassandra throws an Error?

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif Mar 20, 2017

Contributor

That's the thing: I don't know since the reporter just silently gets un-scheduled forever. I'll try a patched version catching everything to see if I can find out.

Contributor

iksaif commented Mar 20, 2017

That's the thing: I don't know since the reporter just silently gets un-scheduled forever. I'll try a patched version catching everything to see if I can find out.

@arteam

This comment has been minimized.

Show comment
Hide comment
@arteam

arteam Mar 20, 2017

Member

Ah, I see. Thanks for hunting the issue!

Member

arteam commented Mar 20, 2017

Ah, I see. Thanks for hunting the issue!

iksaif pushed a commit to criteo-forks/metrics that referenced this pull request Apr 25, 2017

@iksaif

This comment has been minimized.

Show comment
Hide comment
@iksaif

iksaif May 4, 2017

Contributor

Ok I can confirm that fixes the issue because there are a lot of cases where an AssertionError (Error, not Exception) is being throwned, for example:

{code}
/var/log/cassandra/system.log.18.zip:ERROR [metrics-graphite-reporter-1-thread-1] 2017-05-02 15:17:47,178 ScheduledReporter.java:176 - Exception thrown from GraphiteReporter#report. Exception was suppressed.
/var/log/cassandra/system.log.18.zip:java.lang.AssertionError: null
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.io.util.Memory.size(Memory.java:376) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.obs.OffHeapBitSet.serializedSize(OffHeapBitSet.java:131) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.BloomFilterSerializer.serializedSize(BloomFilterSerializer.java:65) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.BloomFilter.serializedSize(BloomFilter.java:64) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.io.sstable.format.SSTableReader.getBloomFilterSerializedSize(SSTableReader.java:1295) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$29.getValue(TableMetrics.java:601) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$29.getValue(TableMetrics.java:596) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$35.getValue(TableMetrics.java:742) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$35.getValue(TableMetrics.java:736) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
{code}

I've sent #1128

Contributor

iksaif commented May 4, 2017

Ok I can confirm that fixes the issue because there are a lot of cases where an AssertionError (Error, not Exception) is being throwned, for example:

{code}
/var/log/cassandra/system.log.18.zip:ERROR [metrics-graphite-reporter-1-thread-1] 2017-05-02 15:17:47,178 ScheduledReporter.java:176 - Exception thrown from GraphiteReporter#report. Exception was suppressed.
/var/log/cassandra/system.log.18.zip:java.lang.AssertionError: null
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.io.util.Memory.size(Memory.java:376) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.obs.OffHeapBitSet.serializedSize(OffHeapBitSet.java:131) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.BloomFilterSerializer.serializedSize(BloomFilterSerializer.java:65) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.utils.BloomFilter.serializedSize(BloomFilter.java:64) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.io.sstable.format.SSTableReader.getBloomFilterSerializedSize(SSTableReader.java:1295) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$29.getValue(TableMetrics.java:601) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$29.getValue(TableMetrics.java:596) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$35.getValue(TableMetrics.java:742) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
/var/log/cassandra/system.log.18.zip: at org.apache.cassandra.metrics.TableMetrics$35.getValue(TableMetrics.java:736) ~[apache-cassandra-3.10-criteo1.jar:3.10-criteo1]
{code}

I've sent #1128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment