Observability Roadmap #6560

samstokes (Maintainer) · 2026-03-31T22:27:50Z

Daft should offer an excellent experience for developing a job locally, deploying it at production scale, and maintaining it as requirements and data evolve. Observability is key to that experience, ensuring that:

as you develop locally, you can see what your query is doing
as you deploy and scale, you can monitor your workloads
when problems arise, you can diagnose what went wrong and learn how to fix it

We are planning the following improvements:

1) The Daft Dashboard

Upgrade the dashboard to better explain query execution, from local development through production:

Fully support distributed execution and show parallelism
Make the dashboard available by default, by self-hosting and auto-starting the dashboard process
Show cluster health and utilization metrics

2) Query debugging experience

Any crash or slow query should be diagnosable without needing to rerun the query to reproduce it:

Give each query a structured event log showing operator and task activity for post-hoc analysis
Improve error messages and stack traces to make errors easier to pinpoint
Document debugging best practices for humans and AI agents
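To illustrate the kind of post-hoc analysis a structured event log could enable, here is a minimal sketch in plain Python. The JSON Lines layout and the field names (`ts`, `operator`, `task_id`, `event`) are hypothetical and chosen only for this example; they are not Daft's actual log format, which the roadmap does not specify.

```python
import json
from collections import defaultdict

# Hypothetical event log in JSON Lines form. The field names are
# assumptions made for illustration, not Daft's actual schema.
SAMPLE_LOG = """\
{"ts": 0.0, "operator": "scan", "task_id": 1, "event": "task_start"}
{"ts": 1.5, "operator": "scan", "task_id": 1, "event": "task_end"}
{"ts": 1.5, "operator": "project", "task_id": 2, "event": "task_start"}
{"ts": 2.0, "operator": "project", "task_id": 2, "event": "task_end"}
{"ts": 2.0, "operator": "aggregate", "task_id": 3, "event": "task_start"}
"""


def summarize(log_text):
    """Return (seconds spent per operator, tasks that started but never ended)."""
    started = {}                    # task_id -> (operator, start timestamp)
    durations = defaultdict(float)  # operator -> total seconds across its tasks
    for line in log_text.splitlines():
        if not line.strip():
            continue
        ev = json.loads(line)
        if ev["event"] == "task_start":
            started[ev["task_id"]] = (ev["operator"], ev["ts"])
        elif ev["event"] == "task_end":
            operator, start_ts = started.pop(ev["task_id"])
            durations[operator] += ev["ts"] - start_ts
    # Anything left in `started` was still running when the log ends,
    # which is a natural first place to look after a crash.
    unfinished = sorted((op, tid) for tid, (op, _) in started.items())
    return dict(durations), unfinished


durations, unfinished = summarize(SAMPLE_LOG)
print(durations)
print(unfinished)
```

In this sample the `aggregate` task starts but never ends, so it shows up in `unfinished`. That is the sort of answer a persisted log can give without rerunning the query.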

3) Memory observability

When a job crashes because it ran out of memory, Daft should tell you what used the memory and what to change.

Track peak memory usage per operator for built-in operators (e.g. accumulating operators, image operations)
Report per-process memory and CPU metrics, available via OTel and the event log
Support on-demand heap profiling without a restart
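As a rough illustration of per-operator peak tracking, the sketch below uses Python's standard `tracemalloc` module. The `track_peak` helper and the operator names are hypothetical and not part of Daft. `tracemalloc` only sees allocations made through Python's allocator, so this is an analogy for the idea, not a description of how Daft would measure native memory.

```python
import tracemalloc
from contextlib import contextmanager

peak_bytes = {}  # operator name -> peak traced bytes observed while it ran


@contextmanager
def track_peak(operator_name):
    """Record the peak traced memory allocated while the wrapped block runs."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    tracemalloc.reset_peak()  # requires Python 3.9+
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        yield
    finally:
        _, peak = tracemalloc.get_traced_memory()
        previous = peak_bytes.get(operator_name, 0)
        peak_bytes[operator_name] = max(previous, peak - baseline)
        if started_here:
            tracemalloc.stop()


# A streaming-style step: the generator keeps only one item alive at a time.
with track_peak("streaming_map"):
    total = sum(x * 2 for x in range(100_000))

# An accumulating step: the whole input is materialized before sorting.
with track_peak("accumulating_sort"):
    rows = sorted(list(range(100_000)), reverse=True)

worst = max(peak_bytes, key=peak_bytes.get)
print(worst, peak_bytes[worst])
```

Attributing the peak to a named operator is what turns "the process ran out of memory" into "this accumulating step held the whole input at once", which in turn suggests what to change.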
