New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-3655 Global Phoenix Client Metrics for PQS #315
Conversation
The patch might be tricky to port to other branches. Will also look into that once the initial one gets in. |
} | ||
|
||
private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) { | ||
synchronized (GlobalPhoenixMetricsTestSink.lock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it were me, I would have used a Queue or something to save off the metrics, but there's nothing wrong with your approach here :)
private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) { | ||
synchronized (GlobalPhoenixMetricsTestSink.lock) { | ||
for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) { | ||
if (expectedMetrics.containsKey(metric.name())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check doesn't catch the case where you have expectedMetrics
but GlobalPhoenixMetricsTestSink.metrics
is empty. I'd switch this around to iterate over your expectations, ensuring that they all exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also soon add a test that disables these metrics as well.
I'd switch this around to iterate over your expectations, ensuring that they all exist.
The reason why I didn't do it that way because I didn't wanted to iterate over the it for every expected metric. It doesn't really matter since its a test though. One approach can be to remove each entry from the map and check if the size is 0 at the end. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One approach can be to remove each entry from the map and check if the size is 0 at the end. How does that sound?
That would work too!
# For right now, all metrics are exported to a phoenix table | ||
*.source.start_mbeans=false | ||
hbase.source.start_mbeans=true | ||
hbase.sink.sink0.class=org.apache.phoenix.monitoring.GlobalPhoenixMetricsTestSink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave a comment in the test that this configuration is here? Otherwise, might seem like voodoo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It was hard for me as well since things were not working as expected in the first place. Took a little while to realize this file was the reason :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, perfect :)
|
||
private static MetricRegistry createMetricRegistry() { | ||
LOG.info("Creating Metric Registry for Phoenix Global Metrics"); | ||
MetricRegistryInfo registryInfo = new MetricRegistryInfo("PHOENIX", "Phoenix Global Metrics", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "Global" mean in this context? Is "Phoenix Client Metrics" more reasonable? Maybe "Phoenix,sub=CLIENT" down below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global here refers that it is an aggregation across all clients (or all Phoenix Connections).
Maybe "Phoenix,sub=CLIENT" down below?
Seems reasonable as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global here refers that it is an aggregation across all clients (or all Phoenix Connections).
In my mind, if you show me "Phoenix Client Metrics", I would naturally assume that they are for all clients that have metrics enabled (as opposed to breakouts by individual clients). As such, the word "Global" to me is just filler, but maybe that's just my interpretation of it.
Addressed your comments @joshelser, Please review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close now, Karan.
You do any testing in a "real" environment? Does this work/feel like you expect it to?
LOG.info("Values from Hadoop Metrics Sink match actual values"); | ||
return true; | ||
} | ||
Thread.sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this to:
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return false;
}
Then, remove the throws InterruptedException
.
We should be catching, re-setting the interrupted state and just returning ASAP. Likely, this code never gets invoked in a unit-test context, but good practice.
private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) { | ||
synchronized (GlobalPhoenixMetricsTestSink.lock) { | ||
for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) { | ||
if (expectedMetrics.containsKey(metric.name())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought there was an issue with unboxing the value from expectedMetrics.get(metric.name())
.
I think it would be cleaner to do:
Long value = expectedMetrics.get(metric.name());
if (value != null) {
long expectedValue = value.longValue();
...
private static GlobalMetricRegistriesAdapter INSTANCE; | ||
|
||
private GlobalMetricRegistriesAdapter() { | ||
DefaultMetricsSystem.initialize("HBase"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/HBase/Phoenix/
|
||
private GlobalMetricRegistriesAdapter() { | ||
DefaultMetricsSystem.initialize("HBase"); | ||
JvmMetrics.initSingleton("HBase", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think collecting JVM metrics from client applications is heavy-handed. I'd suggest you drop this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I added this since this is going to be a part of PQS, where having these metrics would help. Should I cover this with a config parameter then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. I keep getting confused over that.
JVM metrics being exported from PQS makes total sense. +1 to that
I am now wondering: should we be exporting Phoenix metrics out of PQS under "Client" or make some specific "sub=PQS" or similar for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep it as client since they represent client in the end, even though it PQS that is instantiating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will you differentiate this "client" (PQS) from other clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I didn't you on this one. Could you elaborate?
Also, I realized that stopping JVM metrics from Phoenix Level is not an option here. Since phoenix embeds hbase-client which instantiates JVM Metrics by default. Checkout this stack trace.
init:57, BaseSourceImpl$DefaultMetricsSystemInitializer (org.apache.hadoop.hbase.metrics) <init>:112, BaseSourceImpl (org.apache.hadoop.hbase.metrics) <init>:56, MetricsZooKeeperSourceImpl (org.apache.hadoop.hbase.zookeeper) <init>:51, MetricsZooKeeperSourceImpl (org.apache.hadoop.hbase.zookeeper) newInstance0:-1, NativeConstructorAccessorImpl (sun.reflect) newInstance:62, NativeConstructorAccessorImpl (sun.reflect) newInstance:45, DelegatingConstructorAccessorImpl (sun.reflect) newInstance:423, Constructor (java.lang.reflect) newInstance:442, Class (java.lang) nextService:380, ServiceLoader$LazyIterator (java.util) next:404, ServiceLoader$LazyIterator (java.util) next:480, ServiceLoader$1 (java.util) getInstance:59, CompatibilitySingletonFactory (org.apache.hadoop.hbase) <init>:38, MetricsZooKeeper (org.apache.hadoop.hbase.zookeeper) <init>:130, RecoverableZooKeeper (org.apache.hadoop.hbase.zookeeper) connect:143, ZKUtil (org.apache.hadoop.hbase.zookeeper) <init>:181, ZooKeeperWatcher (org.apache.hadoop.hbase.zookeeper) <init>:155, ZooKeeperWatcher (org.apache.hadoop.hbase.zookeeper) <init>:43, ZooKeeperKeepAliveConnection (org.apache.hadoop.hbase.client) getKeepAliveZooKeeperWatcher:1737, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client) getClusterId:104, ZooKeeperRegistry (org.apache.hadoop.hbase.client) retrieveClusterId:945, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client) <init>:721, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client) newInstance0:-1, NativeConstructorAccessorImpl (sun.reflect) newInstance:62, NativeConstructorAccessorImpl (sun.reflect) newInstance:45, DelegatingConstructorAccessorImpl (sun.reflect) newInstance:423, Constructor (java.lang.reflect) createConnection:238, ConnectionFactory (org.apache.hadoop.hbase.client) createConnection:439, ConnectionManager (org.apache.hadoop.hbase.client) createConnectionInternal:348, ConnectionManager (org.apache.hadoop.hbase.client) createConnection:144, HConnectionManager (org.apache.hadoop.hbase.client) createConnection:47, HConnectionFactory$HConnectionFactoryImpl (org.apache.phoenix.query) openConnection:428, ConnectionQueryServicesImpl (org.apache.phoenix.query) access$400:270, ConnectionQueryServicesImpl (org.apache.phoenix.query) call:2539, ConnectionQueryServicesImpl$12 (org.apache.phoenix.query) call:2515, ConnectionQueryServicesImpl$12 (org.apache.phoenix.query) call:76, PhoenixContextExecutor (org.apache.phoenix.util) init:2515, ConnectionQueryServicesImpl (org.apache.phoenix.query) getConnectionQueryServices:255, PhoenixDriver (org.apache.phoenix.jdbc) createConnection:150, PhoenixEmbeddedDriver (org.apache.phoenix.jdbc) connect:221, PhoenixDriver (org.apache.phoenix.jdbc) getConnection:664, DriverManager (java.sql) getConnection:208, DriverManager (java.sql) createConnection:640, JdbcMeta (org.apache.calcite.avatica.jdbc) openConnection:625, JdbcMeta (org.apache.calcite.avatica.jdbc) apply:285, LocalService (org.apache.calcite.avatica.remote) accept:1770, Service$OpenConnectionRequest (org.apache.calcite.avatica.remote) accept:1750, Service$OpenConnectionRequest (org.apache.calcite.avatica.remote) apply:94, AbstractHandler (org.apache.calcite.avatica.remote) apply:46, ProtobufHandler (org.apache.calcite.avatica.remote) handle:127, AvaticaProtobufHandler (org.apache.calcite.avatica.server) handle:52, HandlerList (org.eclipse.jetty.server.handler) handle:97, HandlerWrapper (org.eclipse.jetty.server.handler) handle:499, Server (org.eclipse.jetty.server) handle:311, HttpChannel (org.eclipse.jetty.server) onFillable:257, HttpConnection (org.eclipse.jetty.server) run:544, AbstractConnection$2 (org.eclipse.jetty.io) runJob:635, QueuedThreadPool (org.eclipse.jetty.util.thread) run:555, QueuedThreadPool$3 (org.eclipse.jetty.util.thread) run:748, Thread (java.lang)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I didn't you on this one. Could you elaborate?
You're using openTSDB to collect all of this data and your use case is to "understand how PQS runs". To do this, you will need to know what hosts that PQS is running on to just look at PQS metrics. I was suggesting that having a unique name (or a tag) would make an operator's life much more simple -- it would be clear (and presumably easier to filter on) to find just PQS metrics.
For example, the difference of issuing a filter "host in pqs_server1, pqs_server2, pqs_server3" as opposed to just "tag=PQS", or similar.
Now that I'm thinking about this, leaving this as {{Phoenix,sub=CLIENT}} may be OK and you can instead just configure the push of metrics data to Hadoop Metrics2 to include a "tag" that indicates this is from PQS. That would make me happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand this correctly, I will include a new property which takes a csv of tags as input. Later on, I can split by comma and add those tags to HBaseMetrics2HadoopMetricsAdapter#snapshotAllMetrics()
. @joshelser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on how you expose this to let PQS say "I am PQS" (e.g. whether CSV, list, whatever).
But yeah, modifying snapshotAllMetrics
to include a Hadoop Metrics2 MetricsTag should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Thanks!
@@ -204,6 +230,45 @@ private static void resetGlobalMetrics() { | |||
} | |||
} | |||
|
|||
// Phoenix Client Metrics are transported via Hadoop-metrics2 sink | |||
// The test sink is defined at GlobalPhoenixMetricsTestSink | |||
// Configuration for Hadoop-metrics2 comes from hadoop-metrics2.properties file located in test/resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
public class GlobalPhoenixMetricsTestSink implements MetricsSink { | ||
|
||
public static final String PHOENIX_METRICS_RECORD_NAME = "PHOENIX"; | ||
static Object lock = new Object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave a comment here about the importance of this lock? e.g. that the caller should hold it to read metrics
.
@joshelser I was trying out this patch with an internal PQS version. All these metrics are pushed to JMX and we monitor those via OpenTSDB tcollectors. I haven't done any perf-test here, but I can create a "rate of" on the metrics to track the increase in values over time. Still working on some more metrics patches internally before I can see this end to end. From @twdsilva and @apurtell experience, seems like these metrics should help monitor the health and usage of PQS. |
Cool! I was just looking for a sanity check that you are seeing all of the metrics you expect coming out of PQS and that they are "useful". No need to perform any perf testing for me. |
@joshelser I think the only question pending in this PR is if we want JVMMetrics or not, which are crucial from PQS POV, otherwise not really for a regular client. We can discuss what approach to finalize and then I can commit this PR. Wdyt? |
Agreed. What about just making a new configuration key in QueryServices/QueryServicesOptions for including JVM metrics in client-metrics which defaults to Going to pull this down right now and look at it too out of curiosity. |
Pulling this into |
Just needs to be switched to the |
Thanks @joshelser will update the PR soon. We have immediate requirements for HBase-1.x versions here. That is why I mentioned that at the start of PR itself that it might be tricky to port these patches. You can try it out with 4.x-HBase-1.4 branch, which is where my current PR is based on. Let me know if you think anything else is required here. |
Just used JVisualVM to look at these metrics via the MBeans directly. Worked just fine. I think naming/convention when being collected from a distributed environment needs to be thought about more though. |
} | ||
|
||
public static GlobalMetricRegistriesAdapter getInstance() { | ||
if (INSTANCE == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do you need double checking here?
if (INSTANCE == null) {
synchronized(this) {
if(INSTANCE == null) {
...
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, this method is called from static block in GlobalClientMetrics
, which can happen only once, so we dont have any race conditions here.
I have two options here, add a lock here or move this initialization to static block of this class. What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am moving away from lazy initialization all together. No need of premature optimization here I believe.
|
||
public void snapshotAllMetrics(MetricRegistry metricRegistry, MetricsRecordBuilder builder) { | ||
Map<String, Metric> metrics = metricRegistry.getMetrics(); | ||
Iterator i$ = metrics.entrySet().iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this? i$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally should not have used it here. I copied the code from a class file. Will update it.
# For right now, all metrics are exported to a phoenix table | ||
*.source.start_mbeans=false | ||
phoenix.source.start_mbeans=true | ||
phoenix.sink.sink0.class=org.apache.phoenix.monitoring.GlobalPhoenixMetricsTestSink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it usually starts with sink 1 . (?) at least ni HBase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just an identifier and can be anything.
Thanks @xcangCRM and @joshelser for the review. I have addressed your review comments and added tag support to differentiate between various types of client when publishing metrics to JMX. |
One final request, @karanmehta93 -- can you set You have my +1 after that. |
@joshelser I don't see an easy way of doing it, since we read the tag from |
@karanmehta93 but in PQS, we're controlling the Configuration. Isn't this as simple as changing |
Yes I want to do that for sure here, and was also planning to file a Jira for that, but seems like I get the change done here. |
Nope, it's me :). I didn't realize that the Factory was creating a new I'm not sure what it takes to write a custom Factory (I think I did it once in an IT), but would be OK if you defer this. After you commit this, please update the website with these necessary changes (e.g. at least tell people that PQS exports metrics). |
Yes, agreed on that one.
Checkout
Sure will file a new Jira for documentation change as well. One last question, Is it fine for me to change the |
I'd just leave this as-is for now. I don't think it's "broken" how it is right now (PQS doesn't try to set anything in the Configuration for the PhoenixDriver to pick up). |
It's just that some project using PQS wants to change configuration via code, It wont be possible since |
Downstream users would instantiate QueryServer directly, not call |
Great! That makes sense. |
edfd58d
to
96df948
Compare
96df948
to
1c82e03
Compare
@joshelser Please review.