`Seq.zip`, `Seq.map2` etc behave different from `List.zip`, `List.map2`, `Array.zip` etc w.r.t. raising for different lengths #14121

abelbraaksma · 2022-10-16T14:22:26Z

In cases where you apply a pairwise operation to two sequences, like Seq.zip, the behavior in F# Core is defined by the implementation of Seq.map2 and List.map2 and the like. The behavior between collection types is different, however.

Array.zip and List.zip throw an ArgumentException whenever the sequences have differing lengths
Seq.zip on the other hand doesn't.

I doubt this behavior can be changed (backwards compatibility), but I do think it is a bug/oversight or whatchamacallit. I'm currently working on implementing and extending TaskSeq, based on @dsyme's original code from this repo and raised this as a question to myself: fsprojects/FSharp.Control.TaskSeq#32. Then I figured, let's broaden the discussion scope ;).

Repro steps

// this is fine
[1;2;3] |> Seq.zip ["one"; "two" ] |> Seq.toList

// this raises
[1;2;3] |> List.zip ["one"; "two" ]

// this raises too
[1;2;3] |> Array.zip ["one"; "two" ]

Also, this is quite weird:

// returns true??
[1;2;3] |> Seq.forall2 (=) Seq.empty  // true
[1;2;3] |> Seq.forall2 (=) [1;2;3;4]  // true

[1;2;3] |> List.forall2 (=) Seq.empty  // exception
[1;2;3] |> List.forall2 (=) [1;2;3;4]  // exception

Expected behavior

The same behavior for all collection types.

Actual behavior

Functions like Seq.map2, Seq.map3, Seq.mapi2, Seq.zip do not raise an ArgumentException when the sizes are different. However, the implementations do read past the end of the sequence and even if false, read the next item of the paired sequence as well (see MapEnumerator code here). In other words, the information whether one or both sequences are exhausted is available.

Known workarounds

In lazily evaluated sequences, the only workaround is to "roll your own". Easy enough, but still. Alternatively, you could, of course, cache the sequences as an eager sequence like List or Array.

Related information

I did try to find a motivation for this behavior in the source code an online, but failed to do so. There's certainly an argument to be made for not raising an exception, but then one would expect that to be the case for all collection types.

Perhaps there's something with lazy sequences that suggest not raising exceptions in general. But something like [1..3] |> Seq.take 4 raises (not immediately, but when iterating over the sequence), in other words, it does not seem to be a taboo.

The text was updated successfully, but these errors were encountered:

vzarytovskii · 2022-10-17T05:24:49Z

I guess the main difference is that seq can be lazy/a computation, and we may not know its length on advance, so to check it we may need to enumerate it.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

abelbraaksma · 2022-10-17T10:06:58Z

so to check it we may need to enumerate it.

@vzarytovskii thanks! Note that we're talking here specifically about pairwise enumerations here (zip, mapi2, map2, iter2 etc). Both sequences are advanced at the same time. So by the time one is exhausted, we know from the other one (through the paired call to MoveNext) whether it is also exhausted.

In other words, we can throw without having to enumerate the whole sequence, we would throw only if the two MoveNext calls return different results. This is similar to Seq.take throwing on [1; 2] |> Seq.take 3 |> Seq.toList. It throws not because it iterates in advance, but only once the sequence is consumed. Because it knows that it has reached the end. However, without Seq.toList here, it would not throw.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

But then, why does it throw in other scenarios where exhaustion comes into play? It seems inconsistent.

ly29 · 2022-10-17T10:11:54Z

Altering this behavior would be a breaking change. I like the python like behaivour of not throwing things which make List.zip unusable in many cases. Compare to the slice operations that are safe now.

vzarytovskii · 2022-10-17T10:13:59Z

But then, why does it throw in other scenarios where exhaustion comes into play? It seems inconsistent.

Yeah, we can do the check while enumerating it, however will unlikely change the behaviour, since it will be a huge breaking change.

abelbraaksma · 2022-10-17T10:32:22Z

Oh, definitely. I understand it would be a breaking change. I should probably have opened this as a discussion instead. My main reason for asking, I guess, is to find out whether this was an oversight at the time (i.e. parity with other collections), or deliberate. As such, I'm considering what approach to take for TaskSeq. To raise (current behavior) or not.

I like the python like behaivour of not throwing things which make List.zip unusable in many cases. Compare to the slice operations that are safe now.

This is a good argument against throwing, for sure. I guess that, if there's a deterministic behavior that doesn't throw it should be preferred. Confusingly there's Seq.take (throws) and Seq.truncate (does not throw). Perhaps one of the difficulties is that there are essentially three scenarios (from a user perspective):

Safe operation, return deterministic result (like Seq.truncate and Seq.zip currently)
Unsafe operation, throw when conditions are not met (like Seq.take)
Protected operation, return None when failing (like Seq.tryExactlyOne, Seq.tryHead. Oddly, Seq.tail exists and throws, but Seq.tryTail does not exist)

Offering all three for each and every function seems a bit much. Also, the tryXXX functions cannot work in most cases for Seq since they are lazily evaluated and would require iterating at least once to return a tried result (tryTail, though, wouldn't require full iteration).

abelbraaksma · 2022-10-17T10:35:44Z

Wait, I get it for tryTail now. If there are side effects, the minimum for it to return Some successfully is to eagerly evaluate at least two elements, and the user would have to re-iterate that second item the next time around.

But it shouldn't evaluate anything, unless asked. I.e. let x = [1] |> Seq.tail |> Seq.tail is fine. It should only throw once you do let y = Seq.toList x or something of the sort.

edgarfgp · 2022-10-17T10:40:32Z

This seems to be inconsistent with LINQ . where is does not throw for the examples show in this issue .

#13207 . Is also not consistent with LINQ .

abelbraaksma · 2022-10-17T10:46:18Z

@edgarfgp, yes, I'm working on that issue (unrelated to this, though). Note that LINQ itself behaves inconsistent wrt that issue, see PR for discussion, but this has since been fixed in the latest .NET, I believe.

dsyme · 2022-10-17T14:52:47Z

This is by design, it's just always been this way, it's checked by tests in the repo.

dsyme · 2022-10-17T14:54:10Z

I guess the main difference is that seq can be lazy/a computation, and we may not know its length on advance, so to check it we may need to enumerate it.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

This is the rationale really.

abelbraaksma · 2022-10-17T23:07:03Z

I understand the argument “it’s always been this way”, and that we can’t change it. I don’t understand the rationale, as none of my examples require iteration of the whole sequence to throw. In fact, already during the standard operation, all information is available to throw or not (mainly, the two or three booleans that know whether to continue).

This is different from Haskell, btw, in which the order of arguments determines which of the MoveNexts are called, which IMO is wrong in a different way (non commutative arguments).

Anyway, fair enough to close this out, I agree we shouldn’t change the defaults here.

abelbraaksma added Bug Needs-Triage labels Oct 16, 2022

This was referenced Oct 16, 2022

Should we throw an exception for TaskSeq.zip when the sequences are of unequal length? fsprojects/FSharp.Control.TaskSeq#32

Closed

Comparing sequences fails fsprojects/FsUnit#221

Closed

vzarytovskii added this to the Backlog milestone Oct 17, 2022

dsyme closed this as completed Oct 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Seq.zip`, `Seq.map2` etc behave different from `List.zip`, `List.map2`, `Array.zip` etc w.r.t. raising for different lengths #14121

`Seq.zip`, `Seq.map2` etc behave different from `List.zip`, `List.map2`, `Array.zip` etc w.r.t. raising for different lengths #14121

abelbraaksma commented Oct 16, 2022 •

edited

Loading

vzarytovskii commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022 •

edited

Loading

ly29 commented Oct 17, 2022 •

edited

Loading

vzarytovskii commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022 •

edited

Loading

edgarfgp commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

dsyme commented Oct 17, 2022

dsyme commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

Seq.zip, Seq.map2 etc behave different from List.zip, List.map2, Array.zip etc w.r.t. raising for different lengths #14121

Seq.zip, Seq.map2 etc behave different from List.zip, List.map2, Array.zip etc w.r.t. raising for different lengths #14121

Comments

abelbraaksma commented Oct 16, 2022 • edited Loading

Repro steps

Expected behavior

Actual behavior

Known workarounds

Related information

vzarytovskii commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022 • edited Loading

ly29 commented Oct 17, 2022 • edited Loading

vzarytovskii commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022 • edited Loading

edgarfgp commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

dsyme commented Oct 17, 2022

dsyme commented Oct 17, 2022

abelbraaksma commented Oct 17, 2022

`Seq.zip`, `Seq.map2` etc behave different from `List.zip`, `List.map2`, `Array.zip` etc w.r.t. raising for different lengths #14121

`Seq.zip`, `Seq.map2` etc behave different from `List.zip`, `List.map2`, `Array.zip` etc w.r.t. raising for different lengths #14121

abelbraaksma commented Oct 16, 2022 •

edited

Loading

abelbraaksma commented Oct 17, 2022 •

edited

Loading

ly29 commented Oct 17, 2022 •

edited

Loading

abelbraaksma commented Oct 17, 2022 •

edited

Loading