Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarity: introduce merge function #2117

Merged
merged 12 commits into from
Dec 9, 2020
Merged

Clarity: introduce merge function #2117

merged 12 commits into from
Dec 9, 2020

Conversation

lgalabru
Copy link
Contributor

@lgalabru lgalabru commented Dec 2, 2020

This PR is addressing #1733 by introducing a (set base-tuple tuple).
The initial issue was suggesting merge, it could also be combine, or set (i personally like the symmetry with get) - please chime in on the naming :).

Todo left:

  • Set cost functions

@lgalabru lgalabru self-assigned this Dec 2, 2020
@njordhov
Copy link
Contributor

njordhov commented Dec 2, 2020

The verb set is used in the mutating functions map-set and var-set making it a mismatch for a pure function that returns a new merged record.

The term merge is widely used for this functionality: The Clojure lisp dialect has merge; Typed Racket lisp has record-merge2; Ruby has merge; In Javascript, Immutable has merge; Lodash FP has merge.

The name merge works well as an input function for e.g.fold:

(fold merge record-list initial-record)

Alternatively, what about using the declarative name record and allow one or more arguments that are merged into a record?

@lgalabru lgalabru marked this pull request as ready for review December 3, 2020 15:02
src/vm/functions/mod.rs Outdated Show resolved Hide resolved
@@ -254,6 +254,14 @@ impl<'a> DefinitionSorter {
}
return Ok(());
}
NativeFunctions::TupleSet => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be handled with a special case in this match? Or could TupleSet just be handled like any other native function (i.e., fall through to line 270, which just iterates over the expressions)?

Copy link
Contributor

@njordhov njordhov Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #2123

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed with a1a3082

Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me -- just a couple small comments. One is that I agree with @njordhov that the function can (and should) be implemented as a normal native function. I also agree that it makes more sense to call this merge -- I think set connotes that it is a mutating method, which it isn't.

src/vm/types/mod.rs Outdated Show resolved Hide resolved
base.type_signature.clone(),
))?;

let new_value = match existing_value {
Copy link
Member

@jcnelson jcnelson Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand the implications of this code:

  • (set { a: { c: 1 }, b: 2 } { a: 1 }) will fail at runtime, because when processing a in base, existing_value would be a Value::Tuple but value would be a Value::Uint(_), triggering the CheckErrors::ExpectedTuple error.
  • (set { a: 1, b: 2 } { a: { c: 1 } }) would succeed, because when processing a in base, existing_value would be a Value::Uint(_) and would just fall through this case. The resulting tuple would be { a: { c: 1 }, b: 2 }.
  • (set { a: { c: 1 }, b: 2 } { a: { d: 2 } }) would fail, because the inner call to TupleData::deep_merge() would fail for the inner { c: 1 } and { d: 2 } values for a.
  • (set { a: { c: 1, d: 2 }, b: 2 } { a: { c: 2 } }) succeed, and the tuple values for a in both base and updates would be merged through this inner call to TupleData::deep_merge(). The resulting tuple would be { a: { c: 1, d: 2 }, b: 2}.

Is my understanding of the above correct? If so, then I think it's kind of surprising. As a programmer, I would have expected the following behaviors:

  • (set { a: { c: 1 }, b: 2 } { a: 1 }) should succeed, but produce the tuple { a: 1, b: 2 }.
  • (set { a: 1, b: 2 } {a: { c: 1 } }) should behave as described above, and produce { a: { c: 1 }, b: 2 }.
  • (set { a: { c: 1 }, b: 2 } { a: { d: 2 } }) should succeed, but produce the tuple { a: { d: 2 }, b: 2 }.
  • (set { a: { c: 1, d: 2 }, b: 2 } { a: { c : 2 } }) should succeed, but produce the tuple { a: { c: 2 }, b: 2 }.

I figured that if I wanted to merge two tuples, the merger ought to be "shallow" and only affect the tuple's immediate keys (instead of a "deep" set, which merges all contained tuples as well). If I wanted the "deep" behavior, I would have explicitly called set on the inner tuples. Making the merger "deep" by default makes cost-tracking more difficult, because the "deep" merge algorithm introduces recursion -- you'll need to bill the user for processing each sub-tuple, as well as find a way to statically determine that a single (set ..) can't cause a stack overflow in the VM (leading to a node crash).

What do you all think? Also cc @kantai

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm -- yes, I think I agree with this. I think it makes more sense for this function to produce a shallow "union" tuple, rather than trying to do a deep merge.

I'm not so worried about the cost-tracking of a deep merge, which could be implemented as a different function (though, I'm not sure how useful it would be -- in our boot contracts and from the contracts people have mostly produced, tuples tend not to nest deeply, and when they do, it's rare that only a field of a child tuple would be updated) -- the cost of a deep merge is still linear in the size of the two inputs, and the depth of the merge is bounded by the type depth limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I designed this feature based on what I've been experiencing while writing contracts. I think in 100% of cases, I would have use this function for updating a tuple coming from storage.
If you look at this use case, then you never want to update the "base" type signature, and be as terse as possible.

Copy link
Member

@jcnelson jcnelson Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I designed this feature based on what I've been experiencing while writing contracts. I think in 100% of cases, I would have use this function for updating a tuple coming from storage.
If you look at this use case, then you never want to update the "base" type signature, and be as terse as possible.

This use-case doesn't invalidate the argument for creating a "union" tuple via (set ..). A subsequent call to var-set, map-set or map-insert would still need to pass type-check. Assuming we did something like what I suggest above, then this code snippet would not pass analysis, for example:

(define-map foo { a: uint, b : { c: uint } } { d: uint, e: { f: uint } })
(let (
   (test (unwrap-panic (map-get { a: u1, b: { c: u2 } })))
) (
   ;; will fail to type-check, since `(set ..)` has type `{ d: uint, e: { f: uint, g: uint } }`,
   ;; which is incompatible with the value type definition for `foo`
   map-set { a: u1, b: { c: u2 } } (set test { d: u3, e: { g: u4 } }))
))

So, having the "union" behavior shouldn't prevent you from safely loading, updating, and storing structured data.

Copy link
Contributor

@njordhov njordhov Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in 100% of cases, I would have use this function for updating a tuple coming from storage.

@lgalabru A common use case for merge in Clarity is to merge the state in a reducer with updated values. See for example the add-pox-addr-to-ith-reward-cycle reducer function tin the pox contract, which could use a (merge params {i: ...}) expression to replace the explicit passing of properties in line 264-267:

https://github.com/blockstack/stacks-blockchain/blob/1ad60a216e53473d6fb103e2e171cfe1003b9883/src/chainstate/stacks/boot/pox.clar#L254-L284

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll fold :trollface:

Copy link
Contributor Author

@lgalabru lgalabru Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on re-implementing this feature, but one problem that we would now be facing, is how do we handle optionals without turning this merge feature into a footgun - as in what should be the type signature of the following cases:

  1. (merge { a: (some { x: 0, y: 1 }), b: 2, c: 3 } { a: none })
  2. (merge { a: 1, b: 2, c: 3 } { a: none })
  3. (merge { b: 2, c: 3 } { a: none })

Happy to get a new round of thoughts on this.

Copy link
Contributor

@njordhov njordhov Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we handle optionals

@lgalabru Option properties should be handled like any other type. Consider that it is is just an enum with two variants (Some and None) just like a Response is an enum with two variants (Ok and Err). Despite the Clarity type name, the property is not optional (clarity-lang/reference#22).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all cases, I think the output type would be the same:

    (merge { a: (some { x: 0, y: 1 }), b: 2, c: 3 } { a: none }) -> { a: (optional NoType), b: int, c: int }
    (merge { a: 1, b: 2, c: 3 } { a: none }) -> { a: (optional NoType), b: int, c: int }
    (merge { b: 2, c: 3 } { a: none }) -> { a: (optional NoType), b: int, c: int }

This should be the natural output from a type checker that just set the output tuple type to be the "shallow" union of the two input tuples. The NoType here won't present difficulties to the type checker unless someone tried to unwrap! the optional, which is a type error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I was not familiar with the type admission rules of NoType / UnknownType, and it looks like this works as expected:

        (define-map users uint { address: principal, name: (optional (string-ascii 32)) })
        (let
            ((user (unwrap-panic (map-get? users u0))))
            (map-set users u0 (merge user { name: none })))

Approach revisited in 8b5c1f0.

Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some clarity (no pun intended) in the intended behavior of set, as outlined in my comments.

@lgalabru lgalabru changed the base branch from feat/tuples-volume-1 to next December 8, 2020 19:37
Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for addressing my feedback!

@@ -64,7 +64,7 @@ impl AssetIdentifier {

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct TupleTypeSignature {
type_map: BTreeMap<ClarityName, TypeSignature>,
pub type_map: BTreeMap<ClarityName, TypeSignature>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be pub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah when doing the merge operation, I'm draining, in order to avoid cloning. Do you prefer a method returning a mutable ref? or cloning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow response -- it looks like the draining/mutation is all happening in shallow_merge, which is defined in this same module, so the pub shouldn't be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, got confused by pub behavior, thanks.
Addressed with 65706f3

Copy link
Contributor

@njordhov njordhov Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgalabru Perhaps a reference counted pointer could be of use here, which may avoid much of the cloning through structural sharing in persistent data structures.
https://doc.rust-lang.org/book/ch15-04-rc.html

Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Only comment is about the visibility of TupleTypeSignature::type_map, I think that field should remain private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants