Small microsecond spans after most $in mongo queries and mislabeling of getMore operations #719
Comments
The way the integration works right now is that we instrument any command going through the driver. This includes queries, but also administrative commands and any other available command. We do not alter how these are captured and displayed, specifically to give you more visibility, but I understand that for your use case it's not necessarily the best approach. It's unlikely this will change for the Do you think this would help for your use case?
I would argue that currently the spans are not displaying exactly what the driver is doing. There is no way to differentiate between which spans are a round trip to the database, and which are not (remember, my suspicion is that the microsecond duration spans do not result in a network hop), and there is no differentiation between the initial find and the getMore paging through the cursor. I really want to use the APM information to know what cursors are paged through to find opportunities to reduce the total number of network hops by increasing You're right that the
Makes sense. I'll have to look into it to see how we could restructure things to isolate actual calls to Mongo.
It likely represents the
This actually happens because they all share the same resource name, which will no longer be the case with the new logic, so the stats will be properly aggregated with this additional dimension to separate
The problem with this approach that I didn't initially consider is that a lot of collection methods return a cursor, and it's also possible to instantiate cursors directly. Cursors don't have a clear beginning and end, since they can be reinitialized, killed, rewound, or just garbage collected when unused, which would keep the server cursor open forever unless there is some sort of timeout. This means it's basically not possible to do this in a way that will not cause other issues.
Fixed in #1159. Each span now correctly corresponds to an actual command sent to Mongo with more relevant metadata.
Describe the bug
This is potentially a bug, but it is more of a question about why dd-trace behaves the way it does.
Summary:
Details
We noticed that when a query returns a large number of documents, dd-trace produces multiple spans, one for the initial query and a new one for each of the getMore operations that fetch the remaining documents:
This find returns > 101 documents, so there is at least one getMore request involved in retrieving all of the results. The APM spans were useful for noticing this, but it would be nice if the getMore was actually labeled as a getMore instead of a find.
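As a rough sketch of the labeling requested above: in a MongoDB command document the first key is the command name, so a find and a getMore can be told apart from the command itself. The `describeCommand` helper and the `resource` string format below are hypothetical and are not taken from dd-trace.

```javascript
// Hypothetical helper for illustration only; not dd-trace's implementation.
// In a MongoDB command document, the first key is the command name.
function describeCommand(command) {
  const name = Object.keys(command)[0];
  // For getMore, the first value is the cursor id and the collection name
  // is carried in a separate `collection` field.
  const target = name === 'getMore' ? command.collection : command[name];
  // Administrative commands such as { ping: 1 } have no collection.
  const collection = typeof target === 'string' ? target : undefined;
  const resource = collection ? `${name} ${collection}` : name;
  return { name, collection, resource };
}

const first = describeCommand({ find: 'users', filter: { _id: { $in: [1, 2, 3] } } });
const next = describeCommand({ getMore: 12345, collection: 'users', batchSize: 101 });

console.log(first.resource); // find users
console.log(next.resource);  // getMore users
```

With a label derived this way, the initial query and each subsequent page would show up under different names instead of all appearing as a find.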
We thought the third microsecond duration span was odd, but hoped it would go away when we used a large enough batchSize to eliminate the subsequent getMore operations:
You'll notice that the microsecond duration span is still there. What does it actually represent? IMO it should be lumped together with whatever operation it is actually associated with, either the find, or subsequent getMore operations.
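To make the batchSize reasoning above concrete, here is a simplified estimate of how many commands a cursor needs when an explicit batchSize is set. The `estimateCommands` function is a hypothetical model, not driver code: it ignores the 16 MB per-batch size limit and the case where one extra getMore is needed before the cursor is known to be exhausted.

```javascript
// Simplified, hypothetical model of cursor paging with an explicit batchSize.
// It ignores the 16 MB batch size limit and the possible extra getMore that
// returns no documents when the result count is an exact multiple of batchSize.
function estimateCommands(totalDocs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  // The initial find always counts as one batch, even for an empty result.
  const batches = Math.max(1, Math.ceil(totalDocs / batchSize));
  return { find: 1, getMore: batches - 1 };
}

console.log(estimateCommands(250, 100));  // { find: 1, getMore: 2 }
console.log(estimateCommands(250, 1000)); // { find: 1, getMore: 0 }
```

Under this model, any batchSize at least as large as the result set leaves only the find, which is why a span that remains after that change cannot be explained by paging.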
Currently both of these factors greatly skew the APM data and make it difficult to use it to track performance. For the example query provided above, we currently see a tri-modal distribution, where the first peak represents the microsecond response time spans, and the other two peaks represent the find and the getMore operation (not necessarily in that order):
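The skew can be illustrated with a small sketch that aggregates the same spans under one shared name and then under per-command names. The span records, the durations, and the `meanBy` helper are all made up for illustration; they are not real measurements or dd-trace output.

```javascript
// Hypothetical span records; the durations are invented for illustration.
const spans = [
  { command: 'find', durationMs: 12 },
  { command: 'getMore', durationMs: 40 },
  { command: 'find', durationMs: 14 },
  { command: 'getMore', durationMs: 44 },
];

// Groups spans by the key returned from keyFn and returns the mean duration per group.
function meanBy(items, keyFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    const group = groups.get(key) || { total: 0, count: 0 };
    group.total += item.durationMs;
    group.count += 1;
    groups.set(key, group);
  }
  const result = {};
  for (const [key, group] of groups) {
    result[key] = group.total / group.count;
  }
  return result;
}

// Everything under one name: a single mean that matches neither population.
console.log(meanBy(spans, () => 'all mongo spans')); // { 'all mongo spans': 27.5 }
// One name per command: each population gets its own statistics.
console.log(meanBy(spans, (span) => span.command));  // { find: 13, getMore: 42 }
```

Separating the groups does not change any individual span, only how the statistics are bucketed.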
Environment