Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use functors in Diffing, instead of too many parameters #3

Closed

Conversation

gasche
Copy link

@gasche gasche commented Jun 8, 2021

This PR is on top of #2. I wrote it as an integrable proof-of-concept: if @Octachron agrees that this makes the code nicer, ideally he would be willing to finish the job by also functorizing Diffing_with_keys on his own. I now implemented functorization for both Diffing and Diffing_with_keys, completing the change I had in mind.

@Octachron
Copy link
Owner

I agree that Diffing.diff is essentially a functor with only one function in the body. This is really the edge case where I hesitate between functors and functions with too many arguments.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

The gains from my perspective are the following:

  • less type parameters all around: I can actually read the types now
  • no need to carry ~test ~update all around

The loss is the fact that some forms of parametrization between different "instances" cannot be easily expressed anymore: map is not expressible for example. But we didn't need those anyway. Having to repeat the type declarations in the producer and the consumer also gets tiresome, so I kept a parametrized version of change outside the functor.

(There is room for potential naming discussions: maybe the types eq and diff should be eq_witness and diff_witness for example?)

diff depending on the placement choices for a prefix of the
input. This is done by returning (optional) extensions for the
left or right input array. *)
val update : change -> state -> state * left array option * right array option
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option type is splitting the "no extension" value into None | Some [||]. If the aim is to explicit the fact that extensions are not mandatory, I would rather kept the not variadic variant (and ideally keep the limitation that extension are only allowed on one side).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to make the same comment: the previous (slightly convoluted) type was there on purpose, we want to enforce that users are only extending on one side only, and that it doesn't change during execution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change of type is indeed blurring the distinction between "I statically know that we will never extend along this direction" and "dynamically I don't want to extend this time". It results in a substantial simplification -- but we could always revert to the previous complex API.
(One thing I can do but haven't yet is get rid of the dynamic = [||] check in the diffing code, given that producers now choose None in this situation instead.)

Why is it important to prevent an update function from trying to update both directions? I don't understand the details of the algorithm but the code suggests that extending alternatively in either directions would in fact be fine.

Copy link
Collaborator

@Drup Drup Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Termination is not guaranteed if the client can extends on both side (it could alternate a bit on each side in a loop).

We could just write it in the documentation, but this is a low-usage API, so we should use the inconvenient-but-safe API. With the update that returns 2 options, it's too easy to do a local change in Includemod_error (for instance), without realizing that an invariant is broken.

Copy link
Author

@gasche gasche Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is termination guaranteed if the update function always add a new line? The column width remains fixed, but I would naively assume that you need to explore the infinitely-growing lines in case they contain a better path?

it's too easy to do a local change in Includemod_error (for instance), without realizing that an invariant is broken.

What is the invariant we are talking about? If it is "we only add in the same direction", why is it worth enforcing in the code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update is only called in the Keep/Change case, which corresponds to a diagonal move in the matrix.

I don't get it. Here is the code I see in trunk:

let compute_inner_cell ~weight ~test ~update tbl i j =
  let compute_proposition i j diff =
    let* diff = diff in
    let+ localstate = Matrix.state tbl i j in
    weight diff + Matrix.weight tbl i j, (diff, localstate)
  in
  let del =
    let diff = let+ x = Matrix.line tbl (i-1) j in Delete x in
    compute_proposition (i-1) j diff
  in
  let insert =
    let diff = let+ x = Matrix.column tbl i (j-1) in Insert x in
    compute_proposition i (j-1) diff
  in
  let diag =
    let diff =
      let* state = Matrix.state tbl (i-1) (j-1) in
      let* line = Matrix.line tbl (i-1) (j-1) in
      let* column = Matrix.column tbl (i-1) (j-1) in
      match test state.state line column with
      | Ok ok -> Some (Keep (line, column, ok))
      | Error err -> Some (Change (line, column, err))
    in
    compute_proposition (i-1) (j-1) diff
  in
  let*! newweight, (diff, localstate) =
    select_best_proposition [diag;del;insert]
  in
  let state = update diff localstate in
  Matrix.set tbl i j ~weight:newweight ~state ~diff:(Some diff)

It looks to me like update is called unconditionally, on a diff whose category (insert/delete/keep/change) depends on the user-provided weight function.

I think that the property you are referring to is the fact that, in the user code in includemod.ml, the update function in fact only provides a non-empty array in the Keep/Change case. This is what I call a dynamic property (in particular, there is no enforcement by the type system).

Copy link
Author

@gasche gasche Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the following code diverges in trunk:

let () =
  let li =
    variadic_diff
      ~weight:(function _ -> 0)
      ~test:(fun _ _ _ -> Error ())
      ~update:(With_left_extensions (fun _ () -> (), [|()|]))
      () [|()|] [|()|]
  in
  print_endline (string_of_int (List.length li))

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a deal this way?

Not at all.

Exposing variadic diffing as the only interface seems unnecessarily error prone.
(Of course the update type is an implementation detail that can go away).

Similarly, I would rather make the variadic interface as constrained as possible (with one side at a time, which removes a possible mistake at the consumer sode).

And adding those specialized functors seems straigthforward?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And adding those specialized functors seems straigthforward?

Yes, of course. I can also reinstate the ugly update (I don't particularly care) if this is a blocking point.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface could be strengthened to avoid this issue, but your example is in the territory of "clearly malicious codewhereas the current iteration was aimed at relieving theupdate` writer from the burden of remembering which side should be extended.

Copy link
Collaborator

@Drup Drup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning, I was totally on board with this change ... until I realized Diffable needs to be a functor parametrized by the local environment. If that's the case, we gain approximately nothing, we just trade one kind of complexity for another. If on top of that you need to simplify the arguments (like update so that everyone follow the same mold, it's really not worth it all that much.

diff depending on the placement choices for a prefix of the
input. This is done by returning (optional) extensions for the
left or right input array. *)
val update : change -> state -> state * left array option * right array option
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to make the same comment: the previous (slightly convoluted) type was there on purpose, we want to enforce that users are only extending on one side only, and that it doesn't change during execution.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

@Drup What is the problem with having some Diff callsites use a Diffable parametrized over the local environment? I could get rid of this by defining them as a local module inside the corresponding diff function, but I thought it would be clearer this way. It is purely a client-side decision, not related to the functorized API on the Diffing side. (I could of course shoehorn (loc, env) as a non-changing part of the state, or add yet another type component to the functor input module for this callsite information, but that would be more complex.)

@Drup
Copy link
Collaborator

Drup commented Jun 8, 2021

It's not a "problem" per se. but you are trading parametric polymorphism with 4 or 5 arguments, with a functor+local functor declaration. The complexity difference just isn't that big.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

It's the same on the callsite, but the definition side is substantially simpler, I think? Both in the type declarations and in the implementation.

(I realize now reading the functor implementation again that I forgot to propagate the weight, test, update functor parameters to the internal functions, which are still parametrized when they don't need to. Let me change that.)

@gasche gasche force-pushed the semdiff_type_decl-functors branch from 125520e to 70c5baa Compare June 8, 2021 11:37
@gasche
Copy link
Author

gasche commented Jun 8, 2021

Done.

On types:

-val diff :
-  weight:(('l, 'r, 'eq, 'diff) change -> int) ->
-  test:('state -> 'l -> 'r -> ('eq, 'diff) result) ->
-  update:(('l, 'r, 'eq, 'diff) change -> 'state -> 'state) ->
-  'state -> 'l array -> 'r array -> ('l, 'r, 'eq, 'diff) patch

+ val weight : change -> int
+ val test : state -> left -> right -> (eq, diff) result
+ val update : change -> state -> state * left array option * right array option
+ val diff : T.state -> T.left array -> T.right array -> T.patch

On terms:

-let diff ~weight ~test ~update state line column =
-  let update d fs = { fs with state = update d fs.state } in
-  let fullstate = { line; column; state } in
-  compute_matrix ~weight ~test ~update fullstate
-  |> construct_patch
- 
+ let diff state line column =
+  { state; line; column }
+  |> compute_matrix
+  |> construct_patch

@gasche
Copy link
Author

gasche commented Jun 8, 2021

(I'm thinking of going ahead and functorizing Diffing_with_keys as well.)

@Octachron
Copy link
Owner

Concerning the Diffable functor, this can be avoided by integrating the environment and location in the state, isn't it?

@gasche
Copy link
Author

gasche commented Jun 8, 2021

I pushed a new commit that also functorizes Diffing_with_keys, resulting in a pleasant simplification. (The user does not have to deal with two diffing modules at once.) There is a bit of boilerplate in building the Diffable argument to the Diffing-with-keys functor, but I think this is less frightening for users/maintainers than unspeakable (or rather unwritable) parametric polymorphism.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

There you go: the Diffable modules are now inside the Diffing functions instead of being functorized over their parameters, a 40% reduction in the number of functor applications introduced by this patchset.

@Octachron
Copy link
Owner

Also as a generic comment, the functorization of the Diffing module seems orthogonal to the parent PR? Once we converge on an interface, I would propose that I (and @Drup ?) review it as an independent PR. Then I will the functorization commit of the keyed version.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

I wanted to see Diffing_with_keys functorized as I was reviewing the PR, and I thought that getting my hands dirty would be even better in terms of code-understanding-sharing than just asking someone else to consider extra work. (Sure was.) Functorizing Diffing was a natural first step before functorizing Diffing_with_keys. I don't have strong opinions in the order in which the PRs should be considered.

@gasche
Copy link
Author

gasche commented Jun 8, 2021

I'm a bit frustrated by the discussion around this PR. I have the impression that I made a real effort working on this code to get the hang of it; not expecting a medal, but you could sound more enthusiastic about #2 at least!

@Octachron
Copy link
Owner

Thank you for taking the time of understanding the rootPR.
The dead code fix in #2 is nice.
Nevertheless, I am not enthusiastic about either PRs.

#2 is mostly moving around complexity (maybe decreasing it a little?).
#3 seems to be mostly increasing complexity by lifting back a functor that was lowered to the core language. If the core issue was the unreadable types, maybe something like 0ac0fef would work better?

@gasche
Copy link
Author

gasche commented Jun 9, 2021

In order to facilitate comparison with other approaches, I implemented the restricted interfaces with Diffing.Make, Diffing.MakeVariadicLeft and Diffing.MakeVariadicRight. There is a bit more (harmless) boilerplate in the library (four interfaces and four functors, three of them exposed and one hidden), but it does simplify the client code a bit.

@gasche
Copy link
Author

gasche commented Jun 9, 2021

Re. termination: I think that the key argument is that termination is ensured when the update function that the "input sizes" of all produced intermediary states are globally bounded in both directions. One sufficient criterion for this is that (1) one size never extends, and (2) the other never extends on Insert/Delete, but both aspects could be relaxed, it's just that nobody cares to write the more general implementation and that we don't have use-cases for now. (The current codebase is not correct in presence of extension on both sides; I think that the matrix computation works but reconstruction of the best path does not. This is a sensible reason to hide the extending-on-both-sides capability from the user.)

@Octachron
Copy link
Owner

I have played with a version with twice the number of functors: https://github.com/Octachron/ocaml#semdiff_functor_types . Having separate type definition functors make the impact on the implementation less intrusive while making sure that each diffing function is constrained to its own change type.
Of course, this is also probably totally over-engineered.

@gasche
Copy link
Author

gasche commented Jun 9, 2021

I think the result is reasonably nice. Would you care to submit it as a PR to your PR, superseding these ones?

Some comments looking at semdiff_type_decl...semdiff_functor_types (I can't post comments inline), in reverse patch order:

  • in Diffing_with_keys, I find the name Extended_defs unfelicitous ; these definitions are arguably not an "extension" of the previous ones (we are never working with both at the same time), but rather a sort of "inner core" on which the features related to the Defs are built. So I would call them Inner_def.
  • In Diffing, sig val diff : ... end is duplicated three times. This also happens in my version (in the present PR; ), but I can't avoid it as the type of my diff depends on the functor argument. This is not the case in your version, you could have a Defined.Diff module type for this, and you could in fact also reuse it in Diffing_with_keys.
  • Going in a different direction: I find your diffing_with_keys.mli hard to read, because it exposes a lot of details on the setup structure of Diffing, you have to be familiar with this module to understand the Diffing_with_keys signature. (This was already somewhat the case with my version, but less so.). Most of this internal layout is not in fact needed to describe the type of the final diff function. I think you should try unfolding/inlining the functor layers here, see if you can get a definition that is conceptually slightly more redundant, but more self-contained and much easier to read in practice.
  • Why are you using a generalize function to witness the relation between the specialized change and the generic version, instead of just exposing type change = (left, right, ...) Generic.change in the signature?
  • Defined is not a very nice name (the fact that its argument is named Defs suggest that you ran out of naming inspiration at that point.) If we want a bland name, what about just Make? Its result could be named Diff, and then the result of Simple would be just F (instead of Diff).

@Octachron
Copy link
Owner

Indeed, I completely ran out of naming fuel along the way.

Why are you using a generalize function to witness the relation between the specialized change and the generic version, instead of just exposing type change = (left, right, ...) Generic.change in the signature?

This has the advantage of making the update and test handles the correct change type by construction since the specialized change type is the only one in scope.

@Octachron
Copy link
Owner

Superseded by #4

@Octachron Octachron closed this Jun 22, 2021
Octachron pushed a commit that referenced this pull request Apr 12, 2024
Effect syntax: use Ctype.new_local_type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants