
Collect Ecto stacktraces #885

Open
gmile opened this issue Sep 6, 2023 · 9 comments

@gmile

gmile commented Sep 6, 2023

It's nice to see slow queries coming from Ecto in the dashboard. However, it's often really hard to match a query to the code that emits it. For example, I'm looking at a vast codebase that spans over a thousand modules, and it's sometimes really challenging to locate the Elixir/Ecto code that produces a slow query.

For any given query reported in AppSignal, it would be great to be able to trace it back to the code that produced it.

Ecto has been reporting stacktraces in its telemetry events for a while now, see: elixir-ecto/ecto#3798. It would be really useful to have that in AppSignal.
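For reference, enabling this on the Ecto side looks roughly like the sketch below (assuming an app named :my_app, a repo called MyApp.Repo, Ecto's default telemetry prefix, and a made-up slow-query threshold):

```elixir
# In config/config.exs: `stacktrace: true` asks Ecto to capture the
# caller's stacktrace and put it under the :stacktrace key of the
# query telemetry metadata.
config :my_app, MyApp.Repo,
  stacktrace: true

# A minimal handler that logs the captured stacktrace for slow queries.
# The [:my_app, :repo, :query] event name assumes the default telemetry
# prefix for MyApp.Repo.
defmodule MyApp.SlowQueryLogger do
  require Logger

  def attach do
    :telemetry.attach(
      "slow-query-stacktraces",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event(_event, measurements, metadata, _config) do
    ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)

    if ms > 100 and metadata[:stacktrace] do
      Logger.warning(fn ->
        "Slow query (#{ms}ms):\n" <> Exception.format_stacktrace(metadata.stacktrace)
      end)
    end
  end
end
```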

There's also an ongoing discussion about this topic on ElixirForum: https://elixirforum.com/t/tracking-down-slow-queries-in-ecto/58121

@gmile gmile added the chore label Sep 6, 2023
@crbelaus

crbelaus commented Sep 8, 2023

This would be a very welcome improvement. I am also in the process of fixing some slow queries in my application (also very big) and sometimes it is really hard to correlate the query with the code that triggers it.

@jeffkreeftmeijer
Member

Hey Ievgen and Cristian,

I do agree that it might be useful to know where queries are executed, but we're currently not set up for that. Right now, events don't support stack traces unless they have an error attached to them.

Also, I'm not sure what the performance impact of reporting stack traces for every query would be. Not specifically on AppSignal's side, as adding data to the span should be fairly quick, but calculating the stack trace might have an impact on high-throughput applications, so I'm not sure if I'd recommend this for production use.
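For anyone who wants a rough sense of that cost, here is a quick, unscientific way to measure capturing a stacktrace in the current process (a sketch only; the real cost depends on stack depth and hardware, and Ecto's implementation may capture it differently):

```elixir
# Time how long it takes to capture the current process' stacktrace,
# which is roughly the extra work a per-query stacktrace would add.
{microseconds, {:current_stacktrace, trace}} =
  :timer.tc(fn -> Process.info(self(), :current_stacktrace) end)

IO.puts("Captured #{length(trace)} frames in #{microseconds}µs")
```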

I'll keep this issue open to allow for some discussion about this and to maybe pick this up in the future, but it's not something we'll be picking up in the coming period, I'm afraid.

@crbelaus

While fixing slow queries I've found two different situations:

  • Slow queries that are directly executed in a Phoenix request or an Oban job are correctly linked to the code that runs them. If there is an error, it shows up in the "web" or "oban" namespace accordingly.
  • Other queries don't seem to be linked to any part of our codebase, so there is no way to find the offending code other than searching for the query manually. In this case errors are not linked to the code either and show up in the "background" namespace.

I think that the second kind of query is the problematic one. When there are errors I always see the following in AppSignal's stacktrace:

lib/ecto/adapters/sql.ex:913 Ecto.Adapters.SQL.raise_sql_call_error/1
lib/ecto/adapters/sql.ex:828 Ecto.Adapters.SQL.execute/6
lib/ecto/repo/queryable.ex:229 Ecto.Repo.Queryable.execute/4
lib/ecto/repo/queryable.ex:19 Ecto.Repo.Queryable.all/3
lib/ecto/repo/preloader.ex:272 Ecto.Repo.Preloader.fetch_query/8
lib/task/supervised.ex:101 Task.Supervised.invoke_mfa/2
lib/task/supervised.ex:36 Task.Supervised.reply/4

Ecto.Repo.Preloader is always present, which makes me think that those queries may be the result of a parallel preload that is not currently being linked to the parent span, so AppSignal cannot correlate it with the proper action.

This has been reported in #603 and I think it may be the same issue. In #603 there is a link to an ElixirForum post with a potential fix to link parallel preload queries to the parent span, but I have not tried it myself yet.
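As a data point, one way to check whether parallel preloading is what is hiding these queries (a sketch; MyApp.Repo, Post and the association names are placeholders on my side) is to compare runs with and without Ecto's `in_parallel` preload option, which keeps the preload queries in the caller process:

```elixir
posts = MyApp.Repo.all(Post)

# Default behaviour: multiple associations may be preloaded in parallel
# Task processes, which would explain queries showing up detached from
# the parent span and in the "background" namespace.
MyApp.Repo.preload(posts, [:comments, :author])

# With parallel preloading disabled, the preload queries run in the
# caller process and should stay attached to the request/job span.
MyApp.Repo.preload(posts, [:comments, :author], in_parallel: false)
```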

@jeffkreeftmeijer jeffkreeftmeijer removed their assignment Oct 18, 2023
@unflxw
Contributor

unflxw commented Nov 20, 2023

@crbelaus @gmile We're considering supporting this feature by adding the last line of the backtrace (the one furthest from Ecto's internals) to the event. Here's a quick mockup with placeholder text:

[mockup screenshot]

You would be able to click on a "git" icon next to it and see the corresponding source code in GitHub. Would this be useful to you?

@gmile
Author

gmile commented Nov 20, 2023

Hi, @unflxw! Happy to hear AppSignal is considering implementing this feature. Noted that you're considering reporting the location line emitted right before Ecto's internals.

Regarding your screenshot, would lib/task/supervised.ex:36 be the location the query originates from? Or does this just happen to be a non-ideal example?

My point is, the chosen backtrace line should point to a location in application code, not to Ecto or Elixir internals. But in your screenshot it appears to come from Elixir internals, as suggested by lib/task/supervised.ex:36, which is less than ideal I think 🤔

@unflxw
Contributor

unflxw commented Nov 20, 2023

Hi @gmile! My apologies, it is placeholder text -- I quickly copy-pasted a line of an Elixir backtrace from your comment above.

Ideally we'd pick the line in the backtrace that is furthest from Ecto internals, as that line is the most likely one to be close to the developer's code, and as such meaningful to the developer. Going from the sample logs in elixir-ecto/ecto#3798 (comment), that would be EctoBug.reproduce/1 :)
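To illustrate the heuristic (a sketch only, not our actual implementation; the list of "internal" module prefixes is made up for this example), picking the first frame, walking outward from the query call, that doesn't belong to Ecto, DBConnection or Task internals could look like this:

```elixir
defmodule StacktracePicker do
  # Module prefixes treated as "internal" for this example only.
  @internal_prefixes ["Elixir.Ecto.", "Elixir.DBConnection.", "Elixir.Task."]

  # Expects the stacktrace format Ecto puts in its telemetry metadata:
  # a list of {module, function, arity, location} tuples.
  def app_frame(stacktrace) do
    Enum.find(stacktrace, fn
      {module, _fun, _arity, _location} ->
        name = Atom.to_string(module)
        not Enum.any?(@internal_prefixes, &String.starts_with?(name, &1))

      _other ->
        false
    end)
  end
end
```

On the sample backtrace from elixir-ecto/ecto#3798, a filter like this would skip the Ecto.Adapters.SQL and Ecto.Repo frames and land on the EctoBug.reproduce/1 frame.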

@gmile
Author

gmile commented Nov 20, 2023

@unflxw noted, it's a placeholder glitch :) In that case the answer to your question is yes: this would certainly be very useful!

@crbelaus

I also agree with @gmile, it would be extremely useful!

@unflxw unflxw self-assigned this Dec 14, 2023
@unflxw
Contributor

unflxw commented Dec 14, 2023

Since this issue is seen as a workaround for #603, I have decided to first approach fixing #603 in isolation, then look into implementing this.

While it is true that having stacktrace information would work around the loss of context on the query caused by #603, the two are fundamentally different issues.

@unflxw unflxw removed their assignment May 13, 2024