Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing UI displays resource name for outgoing resources #1040

Merged
merged 5 commits into from Nov 28, 2023

Conversation

JamesNK
Copy link
Member

@JamesNK JamesNK commented Nov 27, 2023

Fixes #1052

I had an idea to improve tracing.

Before:
image

After:
image

Background:
Distributed tracing commonly has a client recording it is making a call, and the peer recording receiving it. However, sometimes the peer doesn't log anything to OTEL. For example, the redis client records that it calls redis, but redis itself doesn't send data to OTEL. Internally this is called an outgoing peer. When it happens, the tracing UI shows an arrow and the peer address (e.g. redis address).

Aspire has a unique opportunity. It combines defining resources, starting resources, and displaying telemetry for resources. Aspire knows the relationship between addresses and resources. That allows the tracing UI to use the outgoing peer address to lookup the resource name.

For example, there is an outgoing peer call to address localhost:69322. The redis container has a service with that address. The tracing UI links the address and the service together and therefore displays redis.

This PR adds some coupling between telemetry and hosting. I've placed all logic inside IOutgoingPeerResolver. A dummy resolver that just returns the address back can be used by the UI if hosting isn't available.

Note: The IP address continues to be available in the detail view:
image

_resourceNameMapping[result.Name] = result;
}

_watchTask = Task.Run(async () =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the IDashboardViewModelService is already caching the changes (at least for the lifetime of that object, and we need to resolve your other comment on that matter first I suppose), I'm wondering if we need this whole extra layer of caching on top of it. I guess the call to GetResourceMonitor (inside GetResources) is too expensive to do repeatedly. Just seems like a bit much for what amounts to "iterate the current snapshot". I wonder if we could expose the current snapshot better somehow instead? Just spitballing. Nothing wrong with this class as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied what the console logs page does.

I think DashboardViewModelService is very complex. A method to subscribe to resource updates and a method to get the resources would be less efficient (does it matter?) but much simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the console logs needs the watcher part to be able to update the list as it changes. But theoretically here you only need the current snapshot at the time of a lookup. I think we can tweak the view model service api to make it simpler. We've added/removed various methods of accessing as we've needed them/not needed them anymore. At minimum I think we could add a call to get the current snapshot. I guess for your use case though, when would you call that? On every lookup? It could theoretically get stale otherwise... but would that matter in practice (especially since this is not the page you hit when the dashboard loads)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for your use case though, when would you call that? On every lookup? It could theoretically get stale otherwise... but would that matter in practice (especially since this is not the page you hit when the dashboard loads)?

That's why there is a callback to tell you data has changed and a method to get the data. That's the pattern telemetry uses.

Debounce logic to throttle callbacks could be used if there are many changes. e.g. wait 100ms between callback notifications. (I haven't done that in telemetry yet, but I plan to)

@tlmii
Copy link
Member

tlmii commented Nov 27, 2023

This is a great idea. I think it'll help with @leslierichardson95's concerns about needing a legend for what the symbols mean too (#698). I'd say lgtm overall, but we need to sort out the service lifetime question first I think.

@kvenkatrajan kvenkatrajan added the enhancement An enhancement to an existing feature or capability. label Nov 27, 2023
@JamesNK JamesNK added this to the preview 2 (Dec) milestone Nov 27, 2023
@JamesNK JamesNK removed this from the preview 2 (Dec) milestone Nov 27, 2023
Copy link
Member

@tlmii tlmii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I played with it a bit and I don't think the flash-of-unresolved-peer is that bad, especially since TraceDetail isn't the main page so it'll really only happen if you refresh the page (or open in another tab). Navigating around it doesn't manifest.

One thing that this does bring up is if we can do anything to make it more clear why something isn't resolved. E.g.:

image

Not saying we should hold this PR. But I think when I see this I immediately wonder "wait, what is that and why does the dashboard not know what it is?"

@DamianEdwards
Copy link
Member

One thing that this does bring up is if we can do anything to make it more clear why something isn't resolved. E.g.:

It would be AMAZING if we could auto-resolve the injected VS endpoints (I think that's what that is). Do they inject already as an environment variable that we could rely on? If not, let's talk to them about a way to enable this.

@JamesNK JamesNK merged commit f566139 into main Nov 28, 2023
3 checks passed
@JamesNK JamesNK deleted the jamesnk/outgoingpeer-resolver branch November 28, 2023 00:12
@JamesNK
Copy link
Member Author

JamesNK commented Nov 28, 2023

Good idea. New issue: #1056

@@ -173,8 +173,7 @@ private async Task ProcessContainerChange(WatchEventType watchEventType, Contain

var objectChangeType = ToObjectChangeType(watchEventType);

var containerViewModel = ConvertToContainerViewModel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these calls changed? Static methods were intentional. There is no concurrency control in those methods.

Copy link
Member Author

@JamesNK JamesNK Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to access another instance field (_resourceAssociatedServicesMap). The static methods already had a high number of parameters, and they always used the same values:

  • _applicationModel
  • _servicesMap.Values
  • _endpointsMap.Values
  • _resourceAssociatedServicesMap

These weren't being copied before they're passed to the static methods. Why was it better that they were static?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class access fields in background thread without any concurrency checks. Static methods just reduced the surface area where the fields are accessed to very few places where it wouldn't be concurrent. By adding more methods which can access instance field, we increased the surface area where concurrency issue can arise. I will think something about solving both - concurrency and not having to pass a lot of parameters.

@davidfowl
Copy link
Member

I like the outcome of this change but I worry this won't be doable outside of development scenarios. We want to think through how this will work when the dashboard is in deployed environments as well.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-dashboard enhancement An enhancement to an existing feature or capability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve tracing UI to display aspire resource name for outgoing calls
7 participants