Skip to content

H-4708: error-stack: Extract owned types from Report #7324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

zakstucke
Copy link

@zakstucke zakstucke commented Jun 3, 2025

🌟 What is the purpose of this PR?

Derived from #7274

Some error types are not Clone, or are expensive to clone, but need to be owned to handle them. Not being able to extract/take owned types from a Report has been one of the trickiest aspects of a migration to ES I've experienced.

The current implementation just replaces the types internally with an internal type that will have the same printable format. Methods return an untyped Report<dyn Error> because this means C: Unsized and methods such as current_context or further calls to pop_current_context cannot be used, downcast etc must be used instead as type safety on C has been lost.

downcast_take<T>() must also remove C from Report<C> and return an untyped (Report<dyn Error>), because T may equal C and act like pop_current_context().

I'm not tied to the implementation, but the api hole is:
There needs to be a way to pull owned types out of an existing report, for both the top level C and arbitrary nested contexts + attachments, without consuming the report, or affecting the printable representation.

This implementation seemed to solve that goal with the most concise change.

// These types CANNOT be called on Report<dyn Error>
impl<C> Report<C> {
    pub fn pop_current_context(self) -> (C, Report<dyn Error>)
        where
            C: Send + Sync + 'static

    pub fn into_current_context(self) -> C
        where
            C: Send + Sync + 'static
}

// These types CAN be called on Report<dyn Error>
impl<C: ?Sized> Report<C> {
    pub fn downcast<T: Send + Sync + 'static>(self) -> Result<T, Self>

    pub fn downcast_take<T: Send + Sync + 'static>(self) -> Result<(T, Report<dyn Error>), Self>
}

@github-actions github-actions bot added area/libs > error-stack Affects the `error-stack` crate (library) area/libs Relates to first-party libraries/crates/packages (area) area/tests New or updated tests labels Jun 3, 2025
@zakstucke
Copy link
Author

zakstucke commented Jun 3, 2025

@indietyp I'll reply to your comments relating to this over here:

Could you elaborate on this, maybe even provide code samples - if available (or even pseudocode or a concrete example) - at least I've used the current API for many different projects and haven't found need for this supposed API hole, so would we very interested in hearing the use case! 😄

The times this has been a noticeable issue are usually some variation of this:

use error_stack::Report;

mod foo {
    use super::*;

    #[derive(Debug)]
    pub struct MyError {
        private: (),
    }

    impl core::fmt::Display for MyError {
        fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
            write!(f, "MyError occurred")
        }
    }

    impl core::error::Error for MyError {}

    pub fn foo() -> Report<MyError> {
        Report::new(MyError { private: () })
    }
}

// Note no Report<> wrapping here
fn bar() -> foo::MyError {
    let mut report = foo::foo();
    // - The foo module cannot be modified
    // - MyError cannot be instanstiated
    // - MyError does not have a default
    // - MyError must be returned unwrapped and this API cannot change
    todo!()
}

Existing systems that cannot have interfaces changed and limited control over the error type itself.
downcast_mut and mem::take don't work because one cannot construct a new MyError, further it would alter the representation of the formatted Report.
This example applies to both top level typesafe context in some cases, the type safety would ideally be kept (pop_current_context), and nested contexts and attachments (downcast_take)

We cannot just simply replace the Context with an attachment, it completely changes the semantics of a report and violates many of the internal variants. Each attachment is attached to a context, by replacing it with another context you just "joink" the rug and attach attachments to an unrelated context that might not even find these useful. What about the types it provides via the provide api?

The current implementation just replaces one dyn core::error::Error with a private internal type in regards to FrameKind::Context that maintains the printable representation, which is also a valid Error/Context. I thought it would be reasonable to assume that any expectations of working with a "taken" context or attachment, other than for printing, would be low, likewise with the provider api on that specific dyn FrameImpl, although I don't know too much about that RFC! By replacing dyn core::error::Error directly, I was able to leave the frame structure untouched and minimally rewrite code for this PoC.

I have no special love for this PR's implementation! This was just an initial attempt at getting those new tests written without touching the frame structure. I'd just love something that makes an external API like this work :).

@zakstucke
Copy link
Author

zakstucke commented Jun 3, 2025

Every context must have a single context as the first layer, both formatting and internals rely on it. It's a context stacked on top of other contexts with attachments sandwiched in between

As I said, afaik this PR specifically avoids changing any structure, or am I missing something?

As @TimDiekmann eluded to there are some questions about Report<[C]> types, how do they fit into the equation - when I replace a report, does that mean that one arm now doesn't have a report? What about the introduced asymmetry? Which arm is chosen?

They don't act any differently, a downcast_take or a pop_current_context return a Report<dyn Error>, further types can be extracted with downcast_take, but the current context type safety is lost because this is a destructive operation. The internal structure of the Report has not changed, just one of the dyn Error's inside has been replaced with a printable dummy instead, or one of the attachments if this matched an attachment instead.

Edit: actually, [C] is ?Sized so it would have it's own pop_current_contexts() -> (Vec<C>, Report<dyn Error>) implementation in its own impl block, like the current current_contexts method is defined. pop_current_context cannot be called on Report<[C]>.

@TimDiekmann
Copy link
Member

Thanks for the detailed example with foo and bar – that made the use case very tangible. I’ve encountered similar situations myself, especially when integrating with external APIs that don’t offer meaningful support for error creation or modification. In those cases, using error-stack by attaching the original error as a context is currently the most straightforward approach.

There are, however, alternative ways to achieve similar outcomes – for example, by using the provide API. This allows you to attach data like Option<MyError> and later extract it via request_value. While it’s significantly more cumbersome and less ergonomic, it accomplishes a similar goal. My hunch is that provide and request_value aren’t on your radar yet – which is totally reasonable, given how under-the-surface they still are.

You also emphasized that your main concern is having some API to extract context-like data, and that the exact mechanism is less important to you. I understand and sympathize with that perspective. But that brings us to what I’d call the “plot twist” in this discussion: @indietyp's critique.

He’s absolutely right to point out that this change fundamentally alters what a Report is and what guarantees it provides. That’s not a small detail. One of the core expectations of Report is that once you’ve built it, its shape and meaning remain predictable. Introducing context replacement – especially in a way that isn’t constrained to concrete types or that breaks down when dealing with Report<[C]>s – violates that expectation. It creates the risk of users relying on something that subtly mutates or reinterprets state post-construction, which undermines the mental model we’ve tried to establish.

I still think the general direction of enabling richer introspection or mutation makes sense. In fact, I’ve wanted something like this myself in the past (and ended up working around it using Option attachments). But this iteration isn’t quite the right vehicle yet. It needs sharper boundaries – for instance, only allowing context replacement on Report<C>, not variadic contexts, and doing so in a way that’s explicit and type-safe.

This connects to a larger point: our internal model of Report has grown organically and is now showing some strain. A rework that treats context and attachments more cohesively might make this kind of feature much more natural. Until then, though, I think we should be cautious about shipping an API that weakens the current invariants for short-term ergonomic wins.

@zakstucke
Copy link
Author

zakstucke commented Jun 3, 2025

Thanks @TimDiekmann! Unfortunately the provider api (afaik?!) is nightly only, which makes it a no go for me.

One of the core expectations of Report is that once you’ve built it, its shape and meaning remain predictable. Introducing context replacement – especially in a way that isn’t constrained to concrete types or that breaks down when dealing with Report<[C]>s – violates that expectation. It creates the risk of users relying on something that subtly mutates or reinterprets state post-construction, which undermines the mental model we’ve tried to establish.

I am still somewhat missing exactly what safety guarantees have changed. At a basic level calling .._take or pop_.. with mut self is clearly destructive, as is the existing downcast_mut which could be used in a similarly destructive way today. Otherwise:

  • Upon taking (or in reality internal replacement), some frames that used to return a type id to T, now return an "unknown" type id, the same as if another system added a frame to the report. This to an external user acts as if the original type never existed, which was clear to the user when calling pop or take.
  • Internally, a placeholder exists in it's place, to prevent mutating the structure, or the printable representation.
  • "popping" or "taking" from a Report<C> or Report<[C]> will always return the remaining report as untyped Report<dyn Error>, to ensure soundness.

Would it be possible to give me a concrete example that wouldn't apply to downcast_mut?

@vilkinsons vilkinsons changed the title error-stack: extract owned types from Report. H-4708: error-stack: Extract owned types from Report Jun 4, 2025
@vilkinsons vilkinsons requested a review from TimDiekmann June 4, 2025 07:38
@indietyp
Copy link
Member

indietyp commented Jun 4, 2025

The problem is that this proposal changes a Report on a semantic level, which leads to unintended and subtle consequences downstream. Let's walk through a real example.

To illustrate, I'll use lowercase letters (a, b, ...) for attachments, uppercase letters (A, B, ...) for contexts, and:

  • α(A) to denote a replacement of a context A with your proposed dummy wrapper
  • β(a) to represent values added via the provide API

On Stack Structure:

graph TD;
	a --> b --> A --> c --> d --> B --> e --> f --> A1[A]
    A -.-> g["β(g)"]
	A -.-> h["β(h)"]
Loading

This structure includes:

  • Multiple instances of A (totally valid)
  • Attached values associated with A via provide

After calling pop_current_context, we get:

graph TD;
	a --> b --> A0["α(A)"] --> c --> d --> B --> e --> f --> A1[A]
Loading
  1. downcast_mut subtly changes semantics:
    • Previously, it returned the first A; now it returns the second.
    • This is almost never the intended behavior - yet it's invisible, since the formatting of A and α(A) is identical.
  2. contains becomes misleading:
    • It still returns true because of the second A, but it no longer reflects the original structure.
    • If there were no second A, it would suddenly return false after popping - breaking downstream logic that previously matched. And for consumers who never saw the original Report, the formatting still looks identical, so there’s no obvious sign the structure was mutated.
  3. Loss of provided information:
    • Values like β(g) and β(h) tied to A are now inaccessible.
    • There's no signal to the caller that this data has been silently invalidated and dropped.

This undermines the user's entire mental model of what a Report is. Attachments and contexts don't just format into a string - they compose behavior and meaning. The moment we allow replacement (even with a dummy), we introduce holes in type-driven reasoning.

This is also exacerbated by the fact that it allows for conditional replacement, code might rely on downcast_mut::<A>() to consistently return the outermost A, but now that invariant is broken. Depending on when pop_current_context() was called, you might get the upper or lower one. This behavior is unpredictable, context-sensitive, and not reflected in the API or the formatting layer.

On Tree Structure:

This is especially problematic in non-linear (tree-shaped) reports:

graph TD;
	a --> b --> A --> c
	c --> d --> B
	c --> e --> f --> A1[A]
	c --> g --> A2[A]
Loading

If we pop A, which branch do introspection methods like downcast_mut use now? What happens if ReportSink changes the order? These subtle shifts aren't observable via formatting, but they matter deeply for correctness.

On Ergonomics and Intent

I appreciate the motivation - it's a pain I've encountered too. But in practice, I've found that this isn't usually the right solution.

Turning an error into a context changes its role: it's no longer "the thing being propagated," it's part of the story - one of many layers of context and attachments building toward a final cause. If you pull it back out as if it were the primary error, you risk discarding important framing (attachments, causality, or even the root issue).

That's why I've never reached for an API like this: I want to propagate the whole report, not just a single layer. Anything else strips away what makes error-stack so valuable.

On Interop

Most Rust libraries that expect a specific error type also provide ergonomic interop layers through conversion or formatting- not mutation. This is why I've never personally felt the need to destructure a Report. Instead, the idiomatic approach is to convert or wrap, preserving the full context chain:

This pattern is consistent across the ecosystem: conversion is the escape hatch, and it maintains all invariants of the original Report. By contrast, mutating a report in-place via context removal breaks assumptions these systems rely on, even if the change is invisible at the formatting layer.

@zakstucke
Copy link
Author

zakstucke commented Jun 4, 2025

Thanks for the detailed walkthrough @indietyp!

Totally agree with how you outline the change in behaviour after calling pop/take in each of your steps in your walkthrough, I think we just differ in that I see those changes as expected by the user and reasonable, who knowingly called pop or take.

I completely get the tradeoffs you're concerned about, I think we just value the tradeoffs slightly differently and it's your project, you'll correctly have the final say :)

I think the api hole can still be filled with a watered down version of this PR, that focuses on consuming the entire report, removing the pop_current_context() and downcast_take() methods that open up the major concerns above.

I've updated the PR to be more targeted to not include methods that return a modified report, only a report consuming api.

The new api consists of:

impl<C> Report<C> {
    pub fn into_current_context(self) -> C
        where
            C: Send + Sync + 'static
}

impl<C: ?Sized> Report<C> {
    pub fn into_frame_contents(self) -> impl IntoIterator<Item = Box<dyn Any + Send + Sync>>

    pub fn downcast<T: Send + Sync + 'static>(self) -> Result<T, Self>
}

impl<C> Report<[C]> {
    pub fn into_current_contexts(self) -> impl Iterator<Item = C>
    where
        C: Send + Sync + 'static
}

What do you think? This does not provide ownership over anything unless the report is consumed. This API would cover all the issues I've encountered, without introducing the concerning soundness holes of modifying a report.

I'm not sure on the best name for into_frame_contents, and haven't thought about the provider API and whether they should be included as outputs.

@TimDiekmann TimDiekmann removed their request for review June 30, 2025 08:10
@TimDiekmann
Copy link
Member

Closing this because the parent ticket is closed as well. Also, as @indietyp pointed out, moving out owned types from a Report is complicated and it's breaking invariants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/libs > error-stack Affects the `error-stack` crate (library) area/libs Relates to first-party libraries/crates/packages (area) area/tests New or updated tests
Development

Successfully merging this pull request may close these issues.

3 participants