Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seq.zip, Seq.map2 etc behave different from List.zip, List.map2, Array.zip etc w.r.t. raising for different lengths #14121

Closed
abelbraaksma opened this issue Oct 16, 2022 · 11 comments

Comments

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Oct 16, 2022

In cases where you apply a pairwise operation to two sequences, like Seq.zip, the behavior in F# Core is defined by the implementation of Seq.map2 and List.map2 and the like. The behavior between collection types is different, however.

  • Array.zip and List.zip throw an ArgumentException whenever the sequences have differing lengths
  • Seq.zip on the other hand doesn't.

I doubt this behavior can be changed (backwards compatibility), but I do think it is a bug/oversight or whatchamacallit. I'm currently working on implementing and extending TaskSeq, based on @dsyme's original code from this repo and raised this as a question to myself: fsprojects/FSharp.Control.TaskSeq#32. Then I figured, let's broaden the discussion scope ;).

Repro steps

// this is fine
[1;2;3] |> Seq.zip ["one"; "two" ] |> Seq.toList

// this raises
[1;2;3] |> List.zip ["one"; "two" ]

// this raises too
[1;2;3] |> Array.zip ["one"; "two" ]

Also, this is quite weird:

// returns true??
[1;2;3] |> Seq.forall2 (=) Seq.empty  // true
[1;2;3] |> Seq.forall2 (=) [1;2;3;4]  // true

[1;2;3] |> List.forall2 (=) Seq.empty  // exception
[1;2;3] |> List.forall2 (=) [1;2;3;4]  // exception

Expected behavior

The same behavior for all collection types.

Actual behavior

Functions like Seq.map2, Seq.map3, Seq.mapi2, Seq.zip do not raise an ArgumentException when the sizes are different. However, the implementations do read past the end of the sequence and even if false, read the next item of the paired sequence as well (see MapEnumerator code here). In other words, the information whether one or both sequences are exhausted is available.

Known workarounds

In lazily evaluated sequences, the only workaround is to "roll your own". Easy enough, but still. Alternatively, you could, of course, cache the sequences as an eager sequence like List or Array.

Related information

I did try to find a motivation for this behavior in the source code an online, but failed to do so. There's certainly an argument to be made for not raising an exception, but then one would expect that to be the case for all collection types.

Perhaps there's something with lazy sequences that suggest not raising exceptions in general. But something like [1..3] |> Seq.take 4 raises (not immediately, but when iterating over the sequence), in other words, it does not seem to be a taboo.

@vzarytovskii
Copy link
Member

I guess the main difference is that seq can be lazy/a computation, and we may not know its length on advance, so to check it we may need to enumerate it.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Oct 17, 2022

so to check it we may need to enumerate it.

@vzarytovskii thanks! Note that we're talking here specifically about pairwise enumerations here (zip, mapi2, map2, iter2 etc). Both sequences are advanced at the same time. So by the time one is exhausted, we know from the other one (through the paired call to MoveNext) whether it is also exhausted.

In other words, we can throw without having to enumerate the whole sequence, we would throw only if the two MoveNext calls return different results. This is similar to Seq.take throwing on [1; 2] |> Seq.take 3 |> Seq.toList. It throws not because it iterates in advance, but only once the sequence is consumed. Because it knows that it has reached the end. However, without Seq.toList here, it would not throw.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

But then, why does it throw in other scenarios where exhaustion comes into play? It seems inconsistent.

@ly29
Copy link

ly29 commented Oct 17, 2022

Altering this behavior would be a breaking change. I like the python like behaivour of not throwing things which make List.zip unusable in many cases. Compare to the slice operations that are safe now.

@vzarytovskii
Copy link
Member

But then, why does it throw in other scenarios where exhaustion comes into play? It seems inconsistent.

Yeah, we can do the check while enumerating it, however will unlikely change the behaviour, since it will be a huge breaking change.

@abelbraaksma
Copy link
Contributor Author

Oh, definitely. I understand it would be a breaking change. I should probably have opened this as a discussion instead. My main reason for asking, I guess, is to find out whether this was an oversight at the time (i.e. parity with other collections), or deliberate. As such, I'm considering what approach to take for TaskSeq. To raise (current behavior) or not.

I like the python like behaivour of not throwing things which make List.zip unusable in many cases. Compare to the slice operations that are safe now.

This is a good argument against throwing, for sure. I guess that, if there's a deterministic behavior that doesn't throw it should be preferred. Confusingly there's Seq.take (throws) and Seq.truncate (does not throw). Perhaps one of the difficulties is that there are essentially three scenarios (from a user perspective):

  1. Safe operation, return deterministic result (like Seq.truncate and Seq.zip currently)
  2. Unsafe operation, throw when conditions are not met (like Seq.take)
  3. Protected operation, return None when failing (like Seq.tryExactlyOne, Seq.tryHead. Oddly, Seq.tail exists and throws, but Seq.tryTail does not exist)

Offering all three for each and every function seems a bit much. Also, the tryXXX functions cannot work in most cases for Seq since they are lazily evaluated and would require iterating at least once to return a tried result (tryTail, though, wouldn't require full iteration).

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Oct 17, 2022

Wait, I get it for tryTail now. If there are side effects, the minimum for it to return Some successfully is to eagerly evaluate at least two elements, and the user would have to re-iterate that second item the next time around.

But it shouldn't evaluate anything, unless asked. I.e. let x = [1] |> Seq.tail |> Seq.tail is fine. It should only throw once you do let y = Seq.toList x or something of the sort.

@edgarfgp
Copy link
Contributor

This seems to be inconsistent with LINQ . where is does not throw for the examples show in this issue .

#13207 . Is also not consistent with LINQ .

@abelbraaksma
Copy link
Contributor Author

@edgarfgp, yes, I'm working on that issue (unrelated to this, though). Note that LINQ itself behaves inconsistent wrt that issue, see PR for discussion, but this has since been fixed in the latest .NET, I believe.

@vzarytovskii vzarytovskii added this to the Backlog milestone Oct 17, 2022
@dsyme
Copy link
Contributor

dsyme commented Oct 17, 2022

This is by design, it's just always been this way, it's checked by tests in the repo.

@dsyme dsyme closed this as completed Oct 17, 2022
@dsyme
Copy link
Contributor

dsyme commented Oct 17, 2022

I guess the main difference is that seq can be lazy/a computation, and we may not know its length on advance, so to check it we may need to enumerate it.

I would say it makes sense for seq not to throw, since it's quite different from the collection types such as list or array

This is the rationale really.

@abelbraaksma
Copy link
Contributor Author

I understand the argument “it’s always been this way”, and that we can’t change it. I don’t understand the rationale, as none of my examples require iteration of the whole sequence to throw. In fact, already during the standard operation, all information is available to throw or not (mainly, the two or three booleans that know whether to continue).

This is different from Haskell, btw, in which the order of arguments determines which of the MoveNexts are called, which IMO is wrong in a different way (non commutative arguments).

Anyway, fair enough to close this out, I agree we shouldn’t change the defaults here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

5 participants