Dashboard + Turbine for multiple services on the same server #117
Comments
Hi @pparth Turbine supports monitoring multiple clusters of servers at the same time. You can see some information about this here: https://github.com/Netflix/Turbine/wiki/Configuration (@opuneet Can each cluster be given a different port, or is that global?)
A Hystrix/Turbine dashboard represents the metrics for a "cluster" as defined in Turbine, and Netflix generally defines that to be a cluster of servers with a single application on it. For example, most of the screenshots you see are from the Netflix API, a single application deployed across hundreds of servers that uses 100+ backend services served by dozens of backend applications, each with its own cluster of servers. The Hystrix dashboard shows the HystrixCommand metrics of the Netflix API interacting with all of those services. Each of the other backend clusters of servers has its own Hystrix dashboard and Turbine monitor.
However, what you define as a "cluster" is completely up to you in Turbine. We just happen to do it by application, as each application is a logical place to monitor and separate metrics for us. If you want all servers from different applications to merge together and be presented on a single dashboard, you can configure Turbine to do that for you. It is just a matter of which instances you tell Turbine to monitor. If the same HystrixCommand is used in different applications, the metrics for them will be merged together by Turbine if you have a single 'cluster' configured in Turbine that pulls from multiple applications. This may make sense for your use case - it doesn't for us. For example, different applications have different configurations of HystrixCommands (timeouts, concurrency limits, etc) - even if the HystrixCommand itself is the same one used by 2 applications, the config and metrics are different.
As for multiple JVM processes on the same machine - obviously they can't all stream over the same port, so you'll need to configure different ports for them and then configure Turbine to monitor the different ports. Perhaps each port represents a "cluster" for you in Turbine. In short ... Turbine just aggregates whatever metrics it gets from the streams you point it at. It doesn't matter if they are different applications/services/machines etc, it's just data for Turbine. It is you who defines the logical boundaries by what you call a "cluster" in Turbine. |
Turbine can work with multiple clusters, and each cluster can specify its own configuration for connecting to a host within that cluster. But I think that the "logical" cluster trick may not work here. Turbine maintains state for all these instances in order to keep persistent connections to them, and it relies on the "hostname" to do so: if the hostname is the same, it won't instantiate a new connection to that same server (on a different port). This was done intentionally to ensure that Turbine does not open multiple connections to our servers in production, and hence is less intrusive to our prod servers. |
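To illustrate the hostname behavior described above, here is a minimal sketch in plain Java. It is not Turbine's actual code: the class `ConnectionRegistrySketch`, its `keyByHostAndPort` flag, and the host and port values are all hypothetical, chosen only to show why a second service on the same host gets skipped when instances are identified by hostname alone.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Turbine's actual code: a registry that decides
 * whether a new stream connection should be opened for an instance.
 */
class ConnectionRegistrySketch {

    /** When false, instances are identified by hostname only (the behavior described above). */
    private final boolean keyByHostAndPort;

    /** Keys of the instances that already have a connection. */
    private final Set<String> connected = new HashSet<>();

    ConnectionRegistrySketch(boolean keyByHostAndPort) {
        this.keyByHostAndPort = keyByHostAndPort;
    }

    /** Returns true if a new connection would be opened, false if the instance counts as already connected. */
    boolean connect(String hostname, int port) {
        String key = keyByHostAndPort ? hostname + ":" + port : hostname;
        return connected.add(key); // Set.add returns false when the key is already present
    }

    public static void main(String[] args) {
        ConnectionRegistrySketch byHost = new ConnectionRegistrySketch(false);
        System.out.println(byHost.connect("server1", 8052)); // true: first connection to server1
        System.out.println(byHost.connect("server1", 8053)); // false: same hostname, second service is skipped

        ConnectionRegistrySketch byHostAndPort = new ConnectionRegistrySketch(true);
        System.out.println(byHostAndPort.connect("server1", 8052)); // true
        System.out.println(byHostAndPort.connect("server1", 8053)); // true: different port, separate connection
    }
}
```

Keying by hostname keeps the number of connections per production server at one, which matches the stated intent; keying by host and port is what a same-host, multi-service deployment would need.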
Hi @pparth There may be something else that we could do here, but we'd have to write code and plug in a new aggregator. All the aggregators use a global singleton called MonitorConsole to track these connections, and there is one connection per host / instance. The new aggregator implementation would inherit all the basic aggregation logic from the AggregateClusterMonitor class, but would maintain its own MonitorConsole object, so connections from one aggregator to individual hosts would not step on the connections to the same hosts maintained by another aggregator. Each aggregator instance would map to a logical cluster, and you would have 5 clusters mapping to your 5 services. Make sense? The downside to this approach is that you would see the metrics for one cluster at a time, since that is how one connects to the Turbine aggregated output stream, and hence you would not be able to see the metrics from all 5 logical services in one Hystrix dash. But the Hystrix dash could still be used to connect to all 5 Turbine clusters separately. |
Hello guys, Ben, here at Odesk, we have essentially the same architecture: we have a single application called "O2 API" that consists of a number of services, possibly up to a few dozen in full deployment. These services call each other through Hystrix-wrapped connections. We may eventually end up deploying each service to its own host, but I really think that the most logical approach for architects to take in the early stages of the implementation is to deploy all the services of the application on the same host and have a cluster of 3+ hosts with a load balancer in front. The problem is that Turbine does not seem to work well with this configuration. I already tried the logical trick but it has major problems:
So, I think that the major problem is that the Dashboard, alone or with Turbine, does not support the consolidation of Hystrix commands coming from 2+ services listening on different ports on the same host. This requirement is much more critical than the aggregation of data from different hosts, which is something to be checked at the next stage. I can't tell whether this requirement should be supported by the Dashboard itself or with the help of Turbine. But it's definitely a must for any small to medium sized deployment. |
Hello guys, @benjchristensen and @opuneet Are you going to consider the aforementioned requirement for the Dashboard to consolidate commands from multiple services listening on different ports on the same host? Or do you think this is out of scope and we have to stick with the single service per host architecture? Is there a possible roadmap if you decide otherwise? Thank you for your time! |
Hey @pparth, sorry for not getting back to you earlier. Even if I make changes so that you can get the aggregated Turbine streams for multiple clusters over the same connection (the Hystrix dash needs a single connection), you could still have the same Hystrix commands defined in different clusters, and hence multiple data events for these commands will stomp on each other in the Hystrix dash. There may be a way to solve this by indicating the cluster name along with each data event that is streamed out from Turbine, but this would then also involve major changes to the Hystrix dash to indicate which Hystrix command belongs to which cluster. We didn't have this kind of use case when we were designing these 2 components, hence this may take some more thought. I've opened an issue for Turbine here Netflix/Turbine#9 to track the logical cluster problem. Thanks, |
Hello @opuneet , Just my thoughts of course, you know better! |
Hi @pparth I think I understand what you are saying, but I want to clarify the meaning of some terms used here just to ensure that we are all on the same page w.r.t. your use case / problem definition. I'm restating a few facts here so that there is no confusion on this thread. Apologies in advance for the long-winded email, but I think it's necessary since this thread is getting pretty confusing.
How Turbine works and what do we mean by 'cluster'
Cluster is a Turbine concept and not a Hystrix dash concept. Turbine defines 'cluster' as a logical group of hosts, and runs an individual aggregator for each defined cluster. All data emanating from hosts in the same cluster is aggregated together by the aggregator, which then produces a single aggregate output stream. Hence all Hystrix cmd metrics that have the same aggregation key (basically the name) within the same cluster get aggregated together into a single metric. This is the value add of using Turbine - you have multiple hosts such as an AWS ASG, and you need to aggregate the same data from all your ec2 instances within the same ASG to get an ASG level view of your Hystrix metrics. Your cluster here is an ASG, but you could also use a group of ASGs or some arbitrary list of instances within the same cluster. Hence the term logical cluster.
If 2 or more Hystrix cmd instances with the same name should not be aggregated together, then they cannot be in the same cluster, since that is essentially how Turbine works. Cluster is the scoping mechanism for aggregation.
What is the limitation in Turbine here w.r.t. your use case
Turbine makes a brittle assumption that each host / instance belongs to exactly one cluster, and hence it cannot connect to the same host on 5 different ports which represent 5 different services. This is a limitation and can be fixed in the near future, but Turbine will still assume that each service on a different port is logically separate and hence should be part of a different cluster.
In any case, it looks like you don't want the same Hystrix cmd metrics from different services to be aggregated together, and hence the different services here map to different Turbine clusters. So using cluster to map to your services gives you a natural isolation boundary for your metrics.
How does Hystrix dash work
The Hystrix dash needs a connection url and simply gets data / metrics from this connection and displays that data using its JavaScript code, which essentially reacts to the HTML5 spec-compliant events coming over the connection stream (possibly from Turbine or even from an individual host). It treats each event as a self-describing Hystrix command metrics instance. So the overall summary here is ...
Your use case.
Do you really need Turbine here?
Well, if you have one physical host with 5 different services, then no, you do not. But if you scale your 5 services horizontally in the future, you will need something to combine metrics from different hosts together, regardless of the isolation boundaries. This is what Turbine was designed for.
What could we do to achieve your use case
It looks like no matter what we do for Turbine, we still need to do some work on the Hystrix dash to achieve the union-with-isolation-boundaries feature that you want. The Hystrix dash will have to be aware of these isolation boundaries so that metrics do not stomp on each other. I'm not going to make implementation suggestions this early, and I think that doing this is non-trivial in the near future. @benjchristensen and I can discuss this and get back to you. |
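The cluster scoping described above can be sketched in a few lines of plain Java. This is an illustration only, not Turbine's implementation: the `Sample` type, the `aggregate` method, and the cluster and command names are hypothetical, and a single request count stands in for the full set of Hystrix metrics. The point is that samples sharing both cluster and command name are summed, while the same command name in another cluster stays separate.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of cluster-scoped aggregation; these are not Turbine's real classes. */
class ClusterScopedAggregation {

    /** One metrics sample reported by one host. */
    static final class Sample {
        final String cluster;      // the Turbine cluster the reporting host belongs to
        final String commandName;  // the aggregation key, basically the Hystrix command name
        final long requestCount;   // stand-in for the real set of metrics

        Sample(String cluster, String commandName, long requestCount) {
            this.cluster = cluster;
            this.commandName = commandName;
            this.requestCount = requestCount;
        }
    }

    /** Sums request counts per "cluster/commandName", so the cluster acts as the aggregation scope. */
    static Map<String, Long> aggregate(List<Sample> samples) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Sample s : samples) {
            totals.merge(s.cluster + "/" + s.commandName, s.requestCount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
                new Sample("serviceA", "GetUser", 10),  // host 1 of serviceA
                new Sample("serviceA", "GetUser", 5),   // host 2 of serviceA, merged with host 1
                new Sample("serviceB", "GetUser", 7));  // same command name, different cluster
        System.out.println(aggregate(samples));
        // prints {serviceA/GetUser=15, serviceB/GetUser=7}
    }
}
```

If both services were placed in one cluster, the key would collapse to the command name alone and the three samples would merge into a single total of 22, which is the stomping problem the thread is trying to avoid.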
Hello @opuneet and thank you for your effort! Your post was very helpful. Your description of the way Turbine works was really enlightening.
1) If you don't change the signature of the data stream, then the only way for the Dashboard to isolate the commands is to add an arbitrary prefix or suffix to the command name, which essentially is not helpful at all.
2) Again, without changing the data stream signature, a useful option would be for us to somehow augment the command name with the name of the service where it is hosted. So, I could name the command "servicename_commandName" and then you could have the isolation boundary you request. A Dashboard user could then just sort by name and have a decent view of the commands. It's not perfect, but it will do. So, if you think that this solution will help you in the short run, and help you provide a version of an implementation quicker, please let me know. I am pretty sure that we can do it.
3) The best solution in the long run would be to change the data stream signature so that it contains the service (host app) name. The Dashboard should be aware of this new piece of data. Firstly, it can define the isolation boundary requested, and secondly it can give the user the opportunity to have a detail view per service. This is very helpful when a large number of services are deployed and solution 2) is getting out of hand. Now, how can the Hystrix world know about the service name? I think that a nice, Netflix-integrated solution would be to automatically get the name from the ApplicationId attribute of the DeploymentContext defined on the static ConfigurationManager of Archaius. This way, there is no need for other, custom implementations.
Sorry if I went too far. I'm just really excited about the use of these technologies. Hope I've been helpful. |
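A minimal sketch of option 2 above, in plain Java with no Hystrix dependency. The helper class `ServicePrefixedCommandNames`, its methods, and the service and command names are hypothetical; it also assumes service names never contain the "_" separator. In a real application, the prefixed string would presumably be passed to `HystrixCommandKey.Factory.asKey(...)` when constructing the command.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for the "servicename_commandName" convention; not part of Hystrix. */
class ServicePrefixedCommandNames {

    /** Separator between service name and command name (assumes service names contain no "_"). */
    static final String SEPARATOR = "_";

    /** Builds the name a command would report under, e.g. "connections_GetUser". */
    static String prefixed(String serviceName, String commandName) {
        return serviceName + SEPARATOR + commandName;
    }

    /** Recovers the service part of a prefixed name, or "" if there is no separator. */
    static String serviceOf(String prefixedName) {
        int i = prefixedName.indexOf(SEPARATOR);
        return i < 0 ? "" : prefixedName.substring(0, i);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                prefixed("profiles", "GetUser"),
                prefixed("connections", "GetUser"),
                prefixed("connections", "ListFriends")));
        Collections.sort(names); // sorting by name groups the commands of each service together
        System.out.println(names);
        // prints [connections_GetUser, connections_ListFriends, profiles_GetUser]
    }
}
```

Because the two `GetUser` commands now carry different names, they no longer share an aggregation key, and sorting by name in the Dashboard groups each service's commands together.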
Hey @pparth Yes, I see that 3. makes sense. We'll discuss and see if and when we can make these changes. Meanwhile 2. can work for you and I've just released changes to Turbine to enable connecting to the same host for a different service. Please note that I've tested using my unit tests but haven't really had a chance to test this using a real server since we don't usually run services at Netflix using that mechanism. |
Nice! I'll do my code changes in order to augment my command names with the service name, while waiting for the release to reach Maven Central. So to clear this out, a working configuration for, say, 2 services will be as follows?
|
Yes I think that this will work. |
Ok, I tested the new version and it seems to work fine. So, what do we have now? A Turbine cluster view that is exactly the same as the Dashboard view you get when connecting directly to the service port that the cluster maps to (service port == cluster port). So, in the aforementioned example, the Dashboard's direct connection to http://localhost:8052/hystrix.stream shows the same data as the Dashboard's connection, through Turbine, to http://turbine-hostname:port/turbine.stream?cluster=connectionsCS. I'm looking forward to the next steps... |
Hello, |
Is this still worth keeping open? |
Hi @pparth I'm going to close this issue because there's been no activity on it for a long time. Please reopen if necessary. Note that there was a related issue here with Turbine Netflix/Turbine#9 To summarize this really long thread, we basically needed a way to combine metrics from multiple Turbine clusters and also to represent multiple logical clusters on a single physical group of servers. In the related Turbine issue, I've patched Turbine so that one can run multiple apps on the same physical h/w and then represent each distinct app within a distinct Turbine cluster, and hence get aggregated metrics for the entire cluster. The Hystrix dash will be able to connect to each of these streams and give you the cluster level metrics for each app. |
@opuneet I am facing a similar problem and looking up the docs but I haven't found anything about this matter. Our architecture consists of 2 apps, deployed in a cluster of 2 servers:
We would like to have a unified dashboard showing the info of these two services but can't manage to configure it. So far we have made a wild guess with something like this:
turbine.aggregator.clusterConfig=myCluster
turbine.ConfigPropertyBasedDiscovery.myCluster.instances=server1,server2
Thanks in advance
EDIT: With the config above, we only get updates from app2. |
@codependent Did you ever find a solution to your issue with 2 servers? I'm having pretty much the same issue and can't seem to figure out what the problem is. Thanks! |
@seh13 Yes, it was pretty straightforward. I don't have access to the code right now, I'll post it here tomorrow. |
Hi @seh13, these are the relevant parts of my cfg:
You can keep adding domains+app to Hope it helps |
@codependent Thanks a lot! I'll give that a try Update: Worked perfectly, thanks again! |
Hello again Ben,
I used the Dashboard to monitor a single service successfully. I tried to use Turbine in order to aggregate Hystrix metrics from multiple services residing on the same server, but this seems to be impossible.
So, let me clarify things a little:
The architecture of the relevant Netflix implementation, which is reflected in the Dashboard images presented all over the Dashboard and Turbine Wiki, essentially implies that each service resides on its own server. In order to aggregate Hystrix metrics from all services (each defining its own set of Hystrix commands), all services should generate the metrics stream on the same port (e.g. 8080), which is configured in Turbine. The question is: can Turbine aggregate different sets of Hystrix command metrics this way?
So, if I have 5 different services, each with a different set of Hystrix commands, located on 5 different servers, can I have a Dashboard where I could see the union of all Hystrix commands, using Turbine to define a Default cluster of these instances?
What happens when the same Hystrix command is defined on different hosts?