Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ill-typed upgrades cause data-loss #2692

Closed
crusso opened this issue Jul 29, 2021 · 9 comments · May be fixed by #2691
Closed

Ill-typed upgrades cause data-loss #2692

crusso opened this issue Jul 29, 2021 · 9 comments · May be fixed by #2691

Comments

@crusso
Copy link
Contributor

crusso commented Jul 29, 2021

Doing an ill-typed upgrade can cause stable data loss.

The problem is the recent change to candid to support defaulting of optional fields.

Motoko encodes stable variables in a record of options.

Doing an upgrade to an incompatible type will cause those optional fields to default to null, discarding any data those fields might have held (and just running the initializer instead).

A conformance check prior to upgrade could warn about this potential data loss, but absent tooling for that, we are in the situation where users can easily lose their data on upgrade by doing a simple, and natural, mistake like adding a field to stable variable of record type (or, more commonly, map thereof).

#2691 contains a repro of a sequence of ugrades that should fail but don't.

https://forum.dfinity.org/t/questions-about-data-structures-and-migrations/822/15 is a real world example of a user being tripped up by this, trying to "upgrade" a stable array of tuples containing records by adding a field.
Instead of failing the upgrade, the stable variable is regarded as uninitialized and initialized from this initializer expression (losing the data).

Solution: I briefly considered just changing the record of options encoding to a record of variants encoding for stable variables, which would be fine, but isn't backwards compatible with previously compiled code, so a non-starter.

I'll investigate whether we can just change the deserialization behaviour for "T.Memory" records to avoid the option defaulting instead.

@rossberg, @nomeata whaddya think?

Of course, we could say we really need the static check, but I think a defensive implementation is called for anyway.

@crusso crusso linked a pull request Jul 29, 2021 that will close this issue
@crusso
Copy link
Contributor Author

crusso commented Jul 30, 2021

Hmm, for consistency, I wonder if we should just live with this but make the future static checker warn on lossy re-typing of stable vars.

@rossberg
Copy link
Contributor

Ouch, yeah. It's really overdue that we implement this upgrade check!

@crusso
Copy link
Contributor Author

crusso commented Jul 30, 2021

Yeah, but it's not entirely clear to me what to check.

  1. Do we output the candidish type and check an extended candidish subtyping relation? Then we need to implement an extended candid just for that.

  2. Or do we use a slightly extended relation on the Motoko (stable) types? If so, we need to serialize the Motoko types (that might involve mu types) to a format we can check. Probably useful for separate compilation later, but definitely not supported right now.

  3. Another alternative, avoiding serialization of types, is just to type check both programs (before and after) in memory and check the relationship between the in memory types (but that seems gross and won't really accommodate language changes).

I think 1) is probably the way to go, since we have most of the machinery already and just need to extend candid a bit.

@crusso
Copy link
Contributor Author

crusso commented Jul 30, 2021

Another question is where to store this info. In a separate file, in a custom section or in the wasm binary as an additonal query method as Joachim hacked up for the (external) Candid interface. The latter actually has some appeal.

@FloorLamp
Copy link
Contributor

Hi Claudio, has there been any progress on this issue? I recently lost some state due to this, would be good to prevent this from happening to others!

@anthonymq
Copy link

Still no progress ? What do you suggest to prevent data loss while we wait for a clean solution ?
I'm coding a bash script to dump the data (only for the canister owner) and restore the data. The thing is that i'm getting candid data and it's a bit hard to parse it and modify if there is a new field in a type.

@crusso
Copy link
Contributor Author

crusso commented Nov 14, 2021

PR #2887 (not yet merged) adds some support for manually checking compatibility of stable signatures using moc.

Roughly,

moc --stable-types foo.mo will write the stable signature of an actor(class) to file foo.most.
moc --stable-compatible old.most new.most will check that the stable interface can evolve form old.most to new.most in a type safe way without unintentional data loss.

You can use tool didc to check that the dids are compatible

didc --check new.did old.did

The separate didc tool can be already be used to manually check compatibility of the candid interfaces.

What remains is integrating this with the replica and dfx to prevent an unsafe ugrade, but that's still underway and may take a while.

I'm also working on some informal examples to explain what is allowed and what isn't #2897.

The basic rules for stable-compatible are that you can evolve

  • a stable variable to a super-type with the same mutability (so that the upgrade can consume the old value)
  • add a new stable variable with different name and any type (it takes its value from the initializer when first introduced)

The Candid interfaces can evolve to a Candid subtype (note the inversion of the relation), so that old clients can use the new interface without breaking.

In practice, with the current implementation, one can safely change the mutability of a stable variable in an upgrade and drop an existing stable variable, but we have decided to error on those cases for now (but might only warn in future).

@nomeata
Copy link
Collaborator

nomeata commented Jan 10, 2022

#2887 is merged. Shall we close this?

@chenyan-dfinity
Copy link
Contributor

The replica change is still not in production, but we can close this, as it's a motoko issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants