New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use functors in Diffing, instead of too many parameters #3
Use functors in Diffing, instead of too many parameters #3
Conversation
54c3936
to
125520e
Compare
I agree that |
The gains from my perspective are the following:
The loss is the fact that some forms of parametrization between different "instances" cannot be easily expressed anymore: (There is room for potential naming discussions: maybe the types |
diff depending on the placement choices for a prefix of the | ||
input. This is done by returning (optional) extensions for the | ||
left or right input array. *) | ||
val update : change -> state -> state * left array option * right array option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option type is splitting the "no extension" value into None | Some [||]
. If the aim is to explicit the fact that extensions are not mandatory, I would rather kept the not variadic variant (and ideally keep the limitation that extension are only allowed on one side).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to make the same comment: the previous (slightly convoluted) type was there on purpose, we want to enforce that users are only extending on one side only, and that it doesn't change during execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change of type is indeed blurring the distinction between "I statically know that we will never extend along this direction" and "dynamically I don't want to extend this time". It results in a substantial simplification -- but we could always revert to the previous complex API.
(One thing I can do but haven't yet is get rid of the dynamic = [||]
check in the diffing code, given that producers now choose None in this situation instead.)
Why is it important to prevent an update function from trying to update both directions? I don't understand the details of the algorithm but the code suggests that extending alternatively in either directions would in fact be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Termination is not guaranteed if the client can extends on both side (it could alternate a bit on each side in a loop).
We could just write it in the documentation, but this is a low-usage API, so we should use the inconvenient-but-safe API. With the update
that returns 2 options, it's too easy to do a local change in Includemod_error
(for instance), without realizing that an invariant is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is termination guaranteed if the update function always add a new line? The column width remains fixed, but I would naively assume that you need to explore the infinitely-growing lines in case they contain a better path?
it's too easy to do a local change in Includemod_error (for instance), without realizing that an invariant is broken.
What is the invariant we are talking about? If it is "we only add in the same direction", why is it worth enforcing in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
is only called in theKeep/Change
case, which corresponds to a diagonal move in the matrix.
I don't get it. Here is the code I see in trunk:
let compute_inner_cell ~weight ~test ~update tbl i j =
let compute_proposition i j diff =
let* diff = diff in
let+ localstate = Matrix.state tbl i j in
weight diff + Matrix.weight tbl i j, (diff, localstate)
in
let del =
let diff = let+ x = Matrix.line tbl (i-1) j in Delete x in
compute_proposition (i-1) j diff
in
let insert =
let diff = let+ x = Matrix.column tbl i (j-1) in Insert x in
compute_proposition i (j-1) diff
in
let diag =
let diff =
let* state = Matrix.state tbl (i-1) (j-1) in
let* line = Matrix.line tbl (i-1) (j-1) in
let* column = Matrix.column tbl (i-1) (j-1) in
match test state.state line column with
| Ok ok -> Some (Keep (line, column, ok))
| Error err -> Some (Change (line, column, err))
in
compute_proposition (i-1) (j-1) diff
in
let*! newweight, (diff, localstate) =
select_best_proposition [diag;del;insert]
in
let state = update diff localstate in
Matrix.set tbl i j ~weight:newweight ~state ~diff:(Some diff)
It looks to me like update
is called unconditionally, on a diff
whose category (insert/delete/keep/change) depends on the user-provided weight
function.
I think that the property you are referring to is the fact that, in the user code in includemod.ml
, the update function in fact only provides a non-empty array in the Keep/Change case. This is what I call a dynamic property (in particular, there is no enforcement by the type system).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the following code diverges in trunk:
let () =
let li =
variadic_diff
~weight:(function _ -> 0)
~test:(fun _ _ _ -> Error ())
~update:(With_left_extensions (fun _ () -> (), [|()|]))
() [|()|] [|()|]
in
print_endline (string_of_int (List.length li))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a deal this way?
Not at all.
Exposing variadic diffing as the only interface seems unnecessarily error prone.
(Of course the update type is an implementation detail that can go away).
Similarly, I would rather make the variadic interface as constrained as possible (with one side at a time, which removes a possible mistake at the consumer sode).
And adding those specialized functors seems straigthforward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And adding those specialized functors seems straigthforward?
Yes, of course. I can also reinstate the ugly update
(I don't particularly care) if this is a blocking point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface could be strengthened to avoid this issue, but your example is in the territory of "clearly malicious codewhereas the current iteration was aimed at relieving the
update` writer from the burden of remembering which side should be extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning, I was totally on board with this change ... until I realized Diffable
needs to be a functor parametrized by the local environment. If that's the case, we gain approximately nothing, we just trade one kind of complexity for another. If on top of that you need to simplify the arguments (like update
so that everyone follow the same mold, it's really not worth it all that much.
diff depending on the placement choices for a prefix of the | ||
input. This is done by returning (optional) extensions for the | ||
left or right input array. *) | ||
val update : change -> state -> state * left array option * right array option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to make the same comment: the previous (slightly convoluted) type was there on purpose, we want to enforce that users are only extending on one side only, and that it doesn't change during execution.
@Drup What is the problem with having some Diff callsites use a Diffable parametrized over the local environment? I could get rid of this by defining them as a local module inside the corresponding |
It's not a "problem" per se. but you are trading parametric polymorphism with 4 or 5 arguments, with a functor+local functor declaration. The complexity difference just isn't that big. |
It's the same on the callsite, but the definition side is substantially simpler, I think? Both in the type declarations and in the implementation. (I realize now reading the functor implementation again that I forgot to propagate the |
125520e
to
70c5baa
Compare
Done. On types: -val diff :
- weight:(('l, 'r, 'eq, 'diff) change -> int) ->
- test:('state -> 'l -> 'r -> ('eq, 'diff) result) ->
- update:(('l, 'r, 'eq, 'diff) change -> 'state -> 'state) ->
- 'state -> 'l array -> 'r array -> ('l, 'r, 'eq, 'diff) patch
+ val weight : change -> int
+ val test : state -> left -> right -> (eq, diff) result
+ val update : change -> state -> state * left array option * right array option
+ val diff : T.state -> T.left array -> T.right array -> T.patch On terms: -let diff ~weight ~test ~update state line column =
- let update d fs = { fs with state = update d fs.state } in
- let fullstate = { line; column; state } in
- compute_matrix ~weight ~test ~update fullstate
- |> construct_patch
-
+ let diff state line column =
+ { state; line; column }
+ |> compute_matrix
+ |> construct_patch |
(I'm thinking of going ahead and functorizing |
Concerning the Diffable functor, this can be avoided by integrating the environment and location in the state, isn't it? |
I pushed a new commit that also functorizes |
There you go: the |
Also as a generic comment, the functorization of the |
I wanted to see |
I'm a bit frustrated by the discussion around this PR. I have the impression that I made a real effort working on this code to get the hang of it; not expecting a medal, but you could sound more enthusiastic about #2 at least! |
Thank you for taking the time of understanding the rootPR. #2 is mostly moving around complexity (maybe decreasing it a little?). |
In order to facilitate comparison with other approaches, I implemented the restricted interfaces with |
Re. termination: I think that the key argument is that termination is ensured when the update function that the "input sizes" of all produced intermediary states are globally bounded in both directions. One sufficient criterion for this is that (1) one size never extends, and (2) the other never extends on Insert/Delete, but both aspects could be relaxed, it's just that nobody cares to write the more general implementation and that we don't have use-cases for now. (The current codebase is not correct in presence of extension on both sides; I think that the matrix computation works but reconstruction of the best path does not. This is a sensible reason to hide the extending-on-both-sides capability from the user.) |
I have played with a version with twice the number of functors: https://github.com/Octachron/ocaml#semdiff_functor_types . Having separate type definition functors make the impact on the implementation less intrusive while making sure that each diffing function is constrained to its own change type. |
I think the result is reasonably nice. Would you care to submit it as a PR to your PR, superseding these ones? Some comments looking at semdiff_type_decl...semdiff_functor_types (I can't post comments inline), in reverse patch order:
|
Indeed, I completely ran out of naming fuel along the way.
This has the advantage of making the |
Superseded by #4 |
Effect syntax: use Ctype.new_local_type
This PR is on top of #2. I wrote it as an integrable proof-of-concept:
if @Octachron agrees that this makes the code nicer, ideally he would be willing to finish the job by also functorizing Diffing_with_keys on his own. I now implemented functorization for bothDiffing
andDiffing_with_keys
, completing the change I had in mind.