
PHOENIX-3655 Global Phoenix Client Metrics for PQS #315

Merged
merged 1 commit into apache:4.x-HBase-1.4 from karanmehta93:4.x-HBase-1.4 on Aug 2, 2018

Conversation

karanmehta93
Member

@joshelser Please review.

@karanmehta93
Member Author

The patch might be tricky to port to other branches. Will also look into that once the initial one gets in.

}

private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) {
    synchronized (GlobalPhoenixMetricsTestSink.lock) {
Member

If it were me, I would have used a Queue or something to save off the metrics, but there's nothing wrong with your approach here :)

private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) {
    synchronized (GlobalPhoenixMetricsTestSink.lock) {
        for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) {
            if (expectedMetrics.containsKey(metric.name())) {
Member

This check doesn't catch the case where you have expectedMetrics but GlobalPhoenixMetricsTestSink.metrics is empty. I'd switch this around to iterate over your expectations, ensuring that they all exist.
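For illustration, a minimal sketch of the inverted check, reusing the names from the snippet above (an editor's sketch, not the actual patch):

// Iterate over the expectations so a missing metric fails the check,
// even when the sink has collected nothing at all.
private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) {
    synchronized (GlobalPhoenixMetricsTestSink.lock) {
        for (Map.Entry<String, Long> expected : expectedMetrics.entrySet()) {
            boolean found = false;
            for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) {
                if (metric.name().equals(expected.getKey())
                        && metric.value().longValue() == expected.getValue().longValue()) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
    }
    return true;
}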

Member Author

I will also soon add a test that disables these metrics.

I'd switch this around to iterate over your expectations, ensuring that they all exist.

The reason I didn't do it that way is that I didn't want to iterate over the sink's metrics for every expected metric. It doesn't really matter since it's a test, though. One approach could be to remove each entry from the map and check that its size is 0 at the end. How does that sound?

Member

One approach can be to remove each entry from the map and check if the size is 0 at the end. How does that sound?

That would work too!
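A sketch of the "remove each entry, then check the map is empty" variant agreed on above; the class and field names mirror the test snippet and may differ from the final patch:

// Copy the expectations, cross matched entries off, and require that nothing is left.
private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) {
    Map<String, Long> remaining = new HashMap<>(expectedMetrics);
    synchronized (GlobalPhoenixMetricsTestSink.lock) {
        for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) {
            Long expectedValue = remaining.get(metric.name());
            if (expectedValue != null && metric.value().longValue() == expectedValue.longValue()) {
                remaining.remove(metric.name());
            }
        }
    }
    return remaining.isEmpty();
}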

# For right now, all metrics are exported to a phoenix table
*.source.start_mbeans=false
hbase.source.start_mbeans=true
hbase.sink.sink0.class=org.apache.phoenix.monitoring.GlobalPhoenixMetricsTestSink
Member

Can you leave a comment in the test that this configuration is here? Otherwise, it might seem like voodoo.

Member Author

Sure. It was hard for me as well, since things were not working as expected in the first place. It took a little while to realize this file was the reason :)

Member

Haha, perfect :)


private static MetricRegistry createMetricRegistry() {
    LOG.info("Creating Metric Registry for Phoenix Global Metrics");
    MetricRegistryInfo registryInfo = new MetricRegistryInfo("PHOENIX", "Phoenix Global Metrics",
Member

What does "Global" mean in this context? Is "Phoenix Client Metrics" more reasonable? Maybe "Phoenix,sub=CLIENT" down below?

Member Author

Global here means that it is an aggregation across all clients (or all Phoenix Connections).

Maybe "Phoenix,sub=CLIENT" down below?

Seems reasonable as well.

Member

Global here means that it is an aggregation across all clients (or all Phoenix Connections).

In my mind, if you show me "Phoenix Client Metrics", I would naturally assume that they are for all clients that have metrics enabled (as opposed to breakouts by individual clients). As such, the word "Global" to me is just filler, but maybe that's just my interpretation of it.

@karanmehta93
Member Author

Addressed your comments, @joshelser. Please review!

Member

@joshelser left a comment

Getting close now, Karan.

Did you do any testing in a "real" environment? Does this work/feel like you expect it to?

        LOG.info("Values from Hadoop Metrics Sink match actual values");
        return true;
    }
    Thread.sleep(1000);
Member

Change this to:

try {
  Thread.sleep(1000);
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
  return false;
}

Then, remove the throws InterruptedException.
We should be catching the exception, resetting the interrupted state, and just returning ASAP. Likely, this code never gets exercised in a unit-test context, but it's good practice.

private boolean verifyMetricsFromSinkOnce(Map<String, Long> expectedMetrics) {
    synchronized (GlobalPhoenixMetricsTestSink.lock) {
        for (AbstractMetric metric : GlobalPhoenixMetricsTestSink.metrics) {
            if (expectedMetrics.containsKey(metric.name())) {
Member

I thought there was an issue with unboxing the value from expectedMetrics.get(metric.name()).

I think it would be cleaner to do:

Long value = expectedMetrics.get(metric.name());
if (value != null) {
  long expectedValue = value.longValue();
  ...

private static GlobalMetricRegistriesAdapter INSTANCE;

private GlobalMetricRegistriesAdapter() {
    DefaultMetricsSystem.initialize("HBase");
Member

s/HBase/Phoenix/


private GlobalMetricRegistriesAdapter() {
    DefaultMetricsSystem.initialize("HBase");
    JvmMetrics.initSingleton("HBase", "");
Member

I think collecting JVM metrics from client applications is heavy-handed. I'd suggest you drop this.

Member Author

Actually, I added this since it is going to be a part of PQS, where having these metrics would help. Should I guard this with a config parameter then?

Member

Oh, right. I keep getting confused over that.

JVM metrics being exported from PQS makes total sense. +1 to that

I am now wondering: should we be exporting Phoenix metrics out of PQS under "Client" or make some specific "sub=PQS" or similar for it?

Member Author

Let's keep it as client, since they represent the client in the end, even though it is PQS that is instantiating it.

Member

How will you differentiate this "client" (PQS) from other clients?

Member Author

I am sorry, I didn't follow you on this one. Could you elaborate?
Also, I realized that stopping JVM metrics at the Phoenix level is not an option here, since Phoenix embeds hbase-client, which instantiates JVM metrics by default. Check out this stack trace.

init:57, BaseSourceImpl$DefaultMetricsSystemInitializer (org.apache.hadoop.hbase.metrics)
<init>:112, BaseSourceImpl (org.apache.hadoop.hbase.metrics)
<init>:56, MetricsZooKeeperSourceImpl (org.apache.hadoop.hbase.zookeeper)
<init>:51, MetricsZooKeeperSourceImpl (org.apache.hadoop.hbase.zookeeper)
newInstance0:-1, NativeConstructorAccessorImpl (sun.reflect)
newInstance:62, NativeConstructorAccessorImpl (sun.reflect)
newInstance:45, DelegatingConstructorAccessorImpl (sun.reflect)
newInstance:423, Constructor (java.lang.reflect)
newInstance:442, Class (java.lang)
nextService:380, ServiceLoader$LazyIterator (java.util)
next:404, ServiceLoader$LazyIterator (java.util)
next:480, ServiceLoader$1 (java.util)
getInstance:59, CompatibilitySingletonFactory (org.apache.hadoop.hbase)
<init>:38, MetricsZooKeeper (org.apache.hadoop.hbase.zookeeper)
<init>:130, RecoverableZooKeeper (org.apache.hadoop.hbase.zookeeper)
connect:143, ZKUtil (org.apache.hadoop.hbase.zookeeper)
<init>:181, ZooKeeperWatcher (org.apache.hadoop.hbase.zookeeper)
<init>:155, ZooKeeperWatcher (org.apache.hadoop.hbase.zookeeper)
<init>:43, ZooKeeperKeepAliveConnection (org.apache.hadoop.hbase.client)
getKeepAliveZooKeeperWatcher:1737, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client)
getClusterId:104, ZooKeeperRegistry (org.apache.hadoop.hbase.client)
retrieveClusterId:945, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client)
<init>:721, ConnectionManager$HConnectionImplementation (org.apache.hadoop.hbase.client)
newInstance0:-1, NativeConstructorAccessorImpl (sun.reflect)
newInstance:62, NativeConstructorAccessorImpl (sun.reflect)
newInstance:45, DelegatingConstructorAccessorImpl (sun.reflect)
newInstance:423, Constructor (java.lang.reflect)
createConnection:238, ConnectionFactory (org.apache.hadoop.hbase.client)
createConnection:439, ConnectionManager (org.apache.hadoop.hbase.client)
createConnectionInternal:348, ConnectionManager (org.apache.hadoop.hbase.client)
createConnection:144, HConnectionManager (org.apache.hadoop.hbase.client)
createConnection:47, HConnectionFactory$HConnectionFactoryImpl (org.apache.phoenix.query)
openConnection:428, ConnectionQueryServicesImpl (org.apache.phoenix.query)
access$400:270, ConnectionQueryServicesImpl (org.apache.phoenix.query)
call:2539, ConnectionQueryServicesImpl$12 (org.apache.phoenix.query)
call:2515, ConnectionQueryServicesImpl$12 (org.apache.phoenix.query)
call:76, PhoenixContextExecutor (org.apache.phoenix.util)
init:2515, ConnectionQueryServicesImpl (org.apache.phoenix.query)
getConnectionQueryServices:255, PhoenixDriver (org.apache.phoenix.jdbc)
createConnection:150, PhoenixEmbeddedDriver (org.apache.phoenix.jdbc)
connect:221, PhoenixDriver (org.apache.phoenix.jdbc)
getConnection:664, DriverManager (java.sql)
getConnection:208, DriverManager (java.sql)
createConnection:640, JdbcMeta (org.apache.calcite.avatica.jdbc)
openConnection:625, JdbcMeta (org.apache.calcite.avatica.jdbc)
apply:285, LocalService (org.apache.calcite.avatica.remote)
accept:1770, Service$OpenConnectionRequest (org.apache.calcite.avatica.remote)
accept:1750, Service$OpenConnectionRequest (org.apache.calcite.avatica.remote)
apply:94, AbstractHandler (org.apache.calcite.avatica.remote)
apply:46, ProtobufHandler (org.apache.calcite.avatica.remote)
handle:127, AvaticaProtobufHandler (org.apache.calcite.avatica.server)
handle:52, HandlerList (org.eclipse.jetty.server.handler)
handle:97, HandlerWrapper (org.eclipse.jetty.server.handler)
handle:499, Server (org.eclipse.jetty.server)
handle:311, HttpChannel (org.eclipse.jetty.server)
onFillable:257, HttpConnection (org.eclipse.jetty.server)
run:544, AbstractConnection$2 (org.eclipse.jetty.io)
runJob:635, QueuedThreadPool (org.eclipse.jetty.util.thread)
run:555, QueuedThreadPool$3 (org.eclipse.jetty.util.thread)
run:748, Thread (java.lang)

Member

I am sorry, I didn't follow you on this one. Could you elaborate?

You're using OpenTSDB to collect all of this data and your use case is to "understand how PQS runs". To do this, you will need to know which hosts PQS is running on just to look at PQS metrics. I was suggesting that having a unique name (or a tag) would make an operator's life much simpler -- it would be clear (and presumably easier to filter on) to find just the PQS metrics.

For example, the difference between issuing a filter "host in pqs_server1, pqs_server2, pqs_server3" as opposed to just "tag=PQS", or similar.

Now that I'm thinking about this, leaving this as {{Phoenix,sub=CLIENT}} may be OK and you can instead just configure the push of metrics data to Hadoop Metrics2 to include a "tag" that indicates this is from PQS. That would make me happy.

Member Author

So if I understand this correctly, I will include a new property which takes a CSV of tags as input. Later on, I can split by comma and add those tags in HBaseMetrics2HadoopMetricsAdapter#snapshotAllMetrics(). @joshelser

Member

I don't have a strong opinion on how you expose this to let PQS say "I am PQS" (e.g. whether CSV, list, whatever).

But yeah, modifying snapshotAllMetrics to include a Hadoop Metrics2 MetricsTag should do the trick.
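A minimal sketch of what attaching such a tag could look like when snapshotting into a Hadoop Metrics2 record; the tag name, description, and surrounding class are assumptions for illustration, not the committed change:

import org.apache.hadoop.metrics2.MetricsRecordBuilder;
import org.apache.hadoop.metrics2.lib.Interns;

public class ClientTagSketch {
    // Attach an operator-facing tag (e.g. "PQS") to the record being built,
    // so PQS metrics can be filtered without knowing the PQS host list.
    public void addClientTag(MetricsRecordBuilder builder, String metricTag) {
        if (metricTag != null && !metricTag.isEmpty()) {
            builder.tag(Interns.info("clientTag", "Type of Phoenix client publishing these metrics"),
                    metricTag);
        }
    }
}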

Member Author

Cool. Thanks!

@@ -204,6 +230,45 @@ private static void resetGlobalMetrics() {
}
}

// Phoenix Client Metrics are transported via Hadoop-metrics2 sink
// The test sink is defined at GlobalPhoenixMetricsTestSink
// Configuration for Hadoop-metrics2 comes from hadoop-metrics2.properties file located in test/resources
Member

Nice!

public class GlobalPhoenixMetricsTestSink implements MetricsSink {

    public static final String PHOENIX_METRICS_RECORD_NAME = "PHOENIX";
    static Object lock = new Object();
Member

Can you leave a comment here about the importance of this lock? e.g. that the caller should hold it to read metrics.
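For example, the requested comment on the lock could read something like this (an illustrative sketch, not the committed wording):

// The metrics system thread writes into this sink while the test thread reads from it;
// callers must hold this lock while reading the collected metrics so they never observe
// a partially written snapshot.
static Object lock = new Object();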

@karanmehta93
Member Author

@joshelser I was trying out this patch with an internal PQS version. All these metrics are pushed to JMX and we monitor those via OpenTSDB tcollectors. I haven't done any perf testing here, but I can create a "rate of" view on the metrics to track the increase in values over time. I am still working on some more metrics patches internally before I can see this end to end. From @twdsilva's and @apurtell's experience, it seems like these metrics should help monitor the health and usage of PQS.

@joshelser
Member

All these metrics are pushed to JMX and we monitor those via OpenTSDB tcollectors.

Cool! I was just looking for a sanity check that you are seeing all of the metrics you expect coming out of PQS and that they are "useful". No need to perform any perf testing for me.

@karanmehta93
Member Author

@joshelser I think the only question pending in this PR is whether we want JVM metrics or not; they are crucial from the PQS POV, but not really for a regular client. We can discuss which approach to finalize and then I can commit this PR. Wdyt?

@joshelser
Member

I think the only question pending in this PR is whether we want JVM metrics or not; they are crucial from the PQS POV, but not really for a regular client.

Agreed. What about just making a new configuration key in QueryServices/QueryServicesOptions for including JVM metrics in client metrics, which defaults to false? Then, have PQS always set it to true via code.

Going to pull this down right now and look at it too out of curiosity.
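For illustration, the suggested key and default could take a shape like the following; the constant names and the PQS call site are assumptions, not the actual Phoenix code:

import org.apache.hadoop.conf.Configuration;

public final class JvmMetricsConfigSketch {
    // Hypothetical key that would live in QueryServices, with its default in QueryServicesOptions.
    public static final String CLIENT_JVM_METRICS_ENABLED = "phoenix.client.jvm.metrics.enabled";
    public static final boolean DEFAULT_CLIENT_JVM_METRICS_ENABLED = false;

    private JvmMetricsConfigSketch() {
    }

    // PQS would force this on in code, while plain clients keep the false default.
    public static void enableForPqs(Configuration conf) {
        conf.setBoolean(CLIENT_JVM_METRICS_ENABLED, true);
    }
}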

@joshelser
Member

Pulling this into master, it looks like there are some commons-configuration issues that will need to be worked out for GlobalPhoenixMetricsTestSink.java

@joshelser
Member

it looks like there are some commons-configuration issues that will need to be worked out for GlobalPhoenixMetricsTestSink.java

Just needs to be switched to the org.apache.commons.configuration2 package. Also, GlobalPhoenixMetricsTestSink has a lot of unused imports. Would be good to make a pass over your changes for those.

@karanmehta93
Member Author

Thanks @joshelser, I will update the PR soon. We have immediate requirements for HBase 1.x versions here; that is why I mentioned at the start of the PR that it might be tricky to port these patches. You can try it out with the 4.x-HBase-1.4 branch, which is what my current PR is based on. Let me know if you think anything else is required here.

@joshelser
Member

Just used JVisualVM to look at these metrics via the MBeans directly. Worked just fine.

I think the naming conventions need more thought when these metrics are being collected from a distributed environment, though.

}

public static GlobalMetricRegistriesAdapter getInstance() {
    if (INSTANCE == null) {
Contributor

nit: do you need double-checked locking here?
if (INSTANCE == null) {
    synchronized (GlobalMetricRegistriesAdapter.class) {
        if (INSTANCE == null) {
            ...
        }
    }
}

Member Author

As of now, this method is called from a static block in GlobalClientMetrics, which can happen only once, so we don't have any race conditions here.
I have two options here: add a lock, or move this initialization to a static block of this class. What do you suggest?

Member Author

I think I am moving away from lazy initialization altogether. No need for premature optimization here, I believe.
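A sketch of the eager-initialization shape being described; the constructor body is elided and the names follow the snippets in this review:

public class GlobalMetricRegistriesAdapter {

    // The JVM guarantees the static initializer runs exactly once, so no locking is needed.
    private static final GlobalMetricRegistriesAdapter INSTANCE = new GlobalMetricRegistriesAdapter();

    private GlobalMetricRegistriesAdapter() {
        // metrics system setup happens here, once per JVM
    }

    public static GlobalMetricRegistriesAdapter getInstance() {
        return INSTANCE;
    }
}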


public void snapshotAllMetrics(MetricRegistry metricRegistry, MetricsRecordBuilder builder) {
    Map<String, Metric> metrics = metricRegistry.getMetrics();
    Iterator i$ = metrics.entrySet().iterator();
Contributor

Why do you need this i$?

Member Author

Yes, ideally I should not have used it here. I copied the code from a class file. Will update it.
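For reference, the decompiler-style iterator can be replaced with an enhanced for loop; a sketch assuming the same types and imports as the surrounding adapter class:

public void snapshotAllMetrics(MetricRegistry metricRegistry, MetricsRecordBuilder builder) {
    Map<String, Metric> metrics = metricRegistry.getMetrics();
    // Enhanced for loop instead of the decompiled "Iterator i$" pattern.
    for (Map.Entry<String, Metric> entry : metrics.entrySet()) {
        String name = entry.getKey();
        Metric metric = entry.getValue();
        // snapshot each metric into the builder, as in the original method
    }
}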

# For right now, all metrics are exported to a phoenix table
*.source.start_mbeans=false
phoenix.source.start_mbeans=true
phoenix.sink.sink0.class=org.apache.phoenix.monitoring.GlobalPhoenixMetricsTestSink
Contributor

I think it usually starts with sink1 (?), at least in HBase.

Member Author

It is just an identifier and can be anything.

@karanmehta93
Member Author

Thanks @xcangCRM and @joshelser for the review. I have addressed your review comments and added tag support to differentiate between the various types of clients when publishing metrics to JMX.

@joshelser
Member

One final request, @karanmehta93 -- can you set phoenix.client.metrics.tag in QueryServer.java proactively, please?

You have my +1 after that.

@karanmehta93
Member Author

@joshelser I don't see an easy way of doing it, since we read the tag via metricTag = QueryServicesOptions.withDefaults().getClientMetricTag(), which internally uses InstanceResolver, which depends on implementations of ConfigurationFactory. Ideally, I would need to add a META-INF/ entry to provide that factory and make it work. I don't think we want to enforce that. If there is any other option, let me know.

@joshelser
Member

@karanmehta93 but in PQS, we're controlling the Configuration. Isn't this as simple as changing QueryServer#main(String[]) to use HBaseFactoryProvider.getConfigurationFactory().getConfiguration(); instead of HBaseConfiguration.create()?

@karanmehta93
Member Author

Isn't this as simple as changing QueryServer#main(String[]) to use HBaseFactoryProvider.getConfigurationFactory().getConfiguration(); instead of HBaseConfiguration.create()?

Yes, I want to do that for sure here, and I was also planning to file a Jira for that, but it seems like I can get the change done here.
You also want me to set the value of phoenix.client.metrics.tag to PQS, right? For that to happen, I will need a factory, because no matter what conf we pass to the tool, it is not propagated to phoenix-core. QueryServicesOptions.withDefaults() always asks HBaseFactoryProvider.getConfigurationFactory().getConfiguration() for the values. Am I missing something here?

@joshelser
Member

Am I missing something here?

Nope, it's me :). I didn't realize that the Factory was creating a new Configuration every time. The setup of GlobalClientMetrics just... doesn't give us any way to do this nicely :(

I'm not sure what it takes to write a custom Factory (I think I did it once in an IT), but it would be OK if you defer this.

After you commit this, please update the website with these necessary changes (e.g. at least tell people that PQS exports metrics).

@karanmehta93
Member Author

karanmehta93 commented Aug 2, 2018

The setup of GlobalClientMetrics just... doesn't give us any way to do this nicely

Yes, agreed on that one.

I'm not sure what it takes to write a custom Factory (I think I did it once in an IT), but it would be OK if you defer this.

Check out PhoenixMetricsDisabledIT in this patch; I am doing the same. I will defer it for now. Thanks.

After you commit this, please update the website with these necessary changes (e.g. at least tell people that PQS exports metrics).

Sure will file a new Jira for documentation change as well.

One last question: is it fine for me to change QueryServer#main() to use HBaseFactoryProvider.getConfigurationFactory().getConfiguration() instead of the regular one? We could then control the config from other projects, if required. What do you think?

@joshelser
Member

Is it fine for me to change

I'd just leave this as-is for now. I don't think it's "broken" how it is right now (PQS doesn't try to set anything in the Configuration for the PhoenixDriver to pick up).

@karanmehta93
Member Author

I'd just leave this as-is for now. I don't think it's "broken" how it is right now (PQS doesn't try to set anything in the Configuration for the PhoenixDriver to pick up).

It's just that if some project using PQS wants to change configuration via code, it won't be possible, since HBaseConfiguration.create() will only read XML on the classpath. Either way, I am fine. Let me know your thoughts.

@joshelser
Member

It's just that if some project using PQS wants to change configuration via code, it won't be possible, since HBaseConfiguration.create() will only read XML on the classpath

Downstream users would instantiate QueryServer directly, not call main(). Still, the bigger issue is the ConfigurationFactory being ineffective.

@karanmehta93
Member Author

Great! That makes sense.
Thanks for the review, @joshelser. I will be (squashing and) pushing this to the 4.x-HBase-1.4 branch and will backport / forward-port this patch as well.

@asfgit asfgit merged commit 1c82e03 into apache:4.x-HBase-1.4 Aug 2, 2018
@karanmehta93 karanmehta93 deleted the 4.x-HBase-1.4 branch August 3, 2018 17:05
@karanmehta93 karanmehta93 restored the 4.x-HBase-1.4 branch August 3, 2018 17:05