-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16694] [CORE] Use for/foreach rather than map for Unit expressions whose side effects are required #14332
Conversation
Test build #62764 has finished for PR 14332 at commit
|
Jenkins retest this please |
Test build #62767 has finished for PR 14332 at commit
|
sys.exit(1) | ||
parser.parse(args, defaultParams) match { | ||
case Some(params) => run(params) | ||
case _ => sys.exit(1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't change these kinds of constructs. It's almost purely a stylistic issue (for which I don't believe we have an officially declared preference), and 1) pattern matching over Options is poor form, IMO; 2) the pattern matching alternative here doesn't any more unambiguously imply or signal use of side effects.
If I had my way, we'd fold
Options, but that's an argument I lost long ago:
val anOption: Option[T] = ...
anOption.fold { handleNone } { t => ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is map
. On style: why would you use map
with an expression whose type is Unit
? theoretically that's never correct. (It's a warning in IJ.) The question isn't readability.
The deeper issue turns up with view-like collections, where the map
body won't actually execute at all. I don't know if that type of problem is actually possible with Option
though. I figured it can't hurt to foreclose that possibility entirely with a more explicit construct.
fold
is less obvious and has the same theoretical problem. I would not use that pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing incorrect about generating an expression of type Unit; what I am talking about is only pattern matching on Options, not the view
issue; IJ has lots of opinions, not all of them correct.
What theoretical problem are you talking about with fold
over Option
? The view
issue does not come into play since the pattern matching with Option construct is not even valid if you are dealing with anOption.view
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and the "less obvious" argument is why my opinion on fold
with Option
was rejected -- even though it is a perfectly logical and obvious thing to do for a functional programmer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing wrong with an expression of type Unit
per se, but it can only provide side effects. map
defines a transformation, but such an expression doesn't transform into anything. Further, when the transformations (thus side effects) occur varies. If the primary purpose is to trigger side effects, right now, we have foreach
right? That's what's going on in the rest of the change. I think that's uncontroversial.
How about an argument from consistency? The other changes exist because of a potential correctness problem. I don't think the same problem can ever happen with Option
because of getOrElse
. But it seems reasonable to avoid the same construct anyway, per argument above, and per consistency. The question is, what to replace it with?
if-else
or match
both seem fine and straightforward to me. Both are also at least as "obvious" to me as map
or fold
, probably moreso. From an FP perspective, are those actually appropriate with side-effect-only expressions. At least, why would those be a better choice than something built around foreach
?
Yeah I don't think fold
can have the same problem actually since it would always force evaluation, scratch that. Or else, it seems to in a test like the map
one in the JIRA. It's viable as a way to trigger side effects, it seems, even if I'd say it's still not great FP.
I personally would not use it as an alternative for handling Option
because the semantics of fold
are about combining multiple values into one. However this usage isn't combining anything, and there aren't even values except trivially Unit
. Is it obvious in option.fold(foo)(bar)
that foo
happens if it's not defined and bar
happens when it is? compared to if-else
or match
, where it's entirely explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument from consistency says to treat Option
as a collection or monad, not as something special, and to treat Unit
as just another type, not as something special. Treated consistently, the return of the zero-value in a fold over a None is not more surprising than folding over an empty List producing the zero-value. To a functional programmer aware of the semantics of fold, producing zeroes from folding over empty collections is every bit as explicit as an if-else or pattern match.
The argument about using fold
with Option
in Spark isn't going anywhere at this point, but you should look at the Scala API docs, which include the comment that "[a] less-idiomatic way to use scala.Option values is via pattern matching." Also see other commentary, such as http://blog.originate.com/blog/2014/06/15/idiomatic-scala-your-options-do-not-match/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Option
has foreach
as well and would be fine if the use case here were also to "do X if the Option
exists" but here we also need "... or do Y if it doesn't". I'm not sure it's that Unit
is special or not, but that it signals there is no meaningful result, which highlights the real issue: the only purpose is the side effect. The difference between transforming a return value with fold
or whatever and then exiting with it, and using these constructs to exit, is precisely the difference here. You are arguing that side-effects are neither here nor there when thinking about this from a theoretical functional programming perspective? can't agree with that at all! Transform without side effects? makes total sense for FP idioms. "Transforms" with no result and only side effects? not so much.
The blog you link makes the point that fold
is concise, but less readable, and I agree, and I think that's a negative. The Scala API doc shows match
in an example where side-effects are desired (println
) and not for transformation. How about if-else
? I do not particularly think match
is necessary.
It's a good discussion but for consistency and readability I think this change (or to if-else
) is positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that you are reading what you want to see instead of what is really there. All of the examples in the Scala API doc involve side-effects. The three examples are essentially equivalent in their intent and effects, and using pattern matching with Option
is clearly called out as the least idiomatic.
fold
isn't in question since it has already been ruled out by other Spark committers as beyond the ken of those unfamiliar with idiomatic functional programming. I see no reason, however, to do anything different in Spark's code than what the Scala API declares to be "most idiomatic way".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reading http://www.scala-lang.org/api/2.11.8/#scala.Option
The only operations where a function operates on an Option
have no side effect:
val upper = name map { _.trim } filter { _.length != 0 } map { _.toUpperCase }
Side effects are applied to the result of getOrElse
only. What are you reading?
I wonder if the example conflates two things. It's definitely less idiomatic to accomplish the transformation with match
. println(option match ...)
would be odd. It pushes the println
inside though which makes it less clear what about this is unidiomatic. The code snippets don't actually do the same thing, even.
I think you're suggesting that this implies the former is more idiomatic. Yes, but it's also not what you're arguing for. It's crucially different in that we're talking about Option.map
or fold
to achieve side effects. The example specifically doesn't do that. Calling them "essentially equivalent" hand-waves past the very point.
I understand FP fine; none of this is beyond any of us. I just don't think your argument holds together. I am not going to change this code based on this. I'd welcome other opinions though.
The Scala API docs clearly specify that the most idiomatic usage of Option is to treat it as a collection or monad and use map, flatMap, filter, or foreach. Those docs also clearly specify that using Option values via pattern matching is less idiomatic. There is nothing fundamentally wrong with Usage of But neither should we shy away from any and all usages of |
Jenkins retest this please |
Test build #63003 has finished for PR 14332 at commit
|
…body, not actually defining a transformation
Test build #63018 has finished for PR 14332 at commit
|
Merged to master |
… execute it eagerly ### What changes were proposed in this pull request? This PR is basically a followup of #14332. Calling `map` alone might leave it not executed due to lazy evaluation, e.g.) ``` scala> val foo = Seq(1,2,3) foo: Seq[Int] = List(1, 2, 3) scala> foo.map(println) 1 2 3 res0: Seq[Unit] = List((), (), ()) scala> foo.view.map(println) res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...) scala> foo.view.foreach(println) 1 2 3 ``` We should better use `foreach` to make sure it's executed where the output is unused or `Unit`. ### Why are the changes needed? To prevent the potential issues by not executing `map`. ### Does this PR introduce _any_ user-facing change? No, the current codes look not causing any problem for now. ### How was this patch tested? I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally. Closes #31110 from HyukjinKwon/SPARK-34059. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
… sure execute it eagerly ### What changes were proposed in this pull request? This is a backport of #31110. I ran intelliJ inspection again in this branch. This PR is basically a followup of #14332. Calling `map` alone might leave it not executed due to lazy evaluation, e.g.) ``` scala> val foo = Seq(1,2,3) foo: Seq[Int] = List(1, 2, 3) scala> foo.map(println) 1 2 3 res0: Seq[Unit] = List((), (), ()) scala> foo.view.map(println) res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...) scala> foo.view.foreach(println) 1 2 3 ``` We should better use `foreach` to make sure it's executed where the output is unused or `Unit`. ### Why are the changes needed? To prevent the potential issues by not executing `map`. ### Does this PR introduce _any_ user-facing change? No, the current codes look not causing any problem for now. ### How was this patch tested? I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally. Closes #31137 from HyukjinKwon/SPARK-34059-3.1. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
… sure execute it eagerly ### What changes were proposed in this pull request? This is a backport of #31110. I ran intelliJ inspection again in this branch. This PR is basically a followup of #14332. Calling `map` alone might leave it not executed due to lazy evaluation, e.g.) ``` scala> val foo = Seq(1,2,3) foo: Seq[Int] = List(1, 2, 3) scala> foo.map(println) 1 2 3 res0: Seq[Unit] = List((), (), ()) scala> foo.view.map(println) res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...) scala> foo.view.foreach(println) 1 2 3 ``` We should better use `foreach` to make sure it's executed where the output is unused or `Unit`. ### Why are the changes needed? To prevent the potential issues by not executing `map`. ### Does this PR introduce _any_ user-facing change? No, the current codes look not causing any problem for now. ### How was this patch tested? I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally. Closes #31138 from HyukjinKwon/SPARK-34059-3.0. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
… sure execute it eagerly ### What changes were proposed in this pull request? This is a backport of #31110. I ran intelliJ inspection again in this branch. This PR is basically a followup of #14332. Calling `map` alone might leave it not executed due to lazy evaluation, e.g.) ``` scala> val foo = Seq(1,2,3) foo: Seq[Int] = List(1, 2, 3) scala> foo.map(println) 1 2 3 res0: Seq[Unit] = List((), (), ()) scala> foo.view.map(println) res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...) scala> foo.view.foreach(println) 1 2 3 ``` We should better use `foreach` to make sure it's executed where the output is unused or `Unit`. ### Why are the changes needed? To prevent the potential issues by not executing `map`. ### Does this PR introduce _any_ user-facing change? No, the current codes look not causing any problem for now. ### How was this patch tested? I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally. Closes #31139 from HyukjinKwon/SPARK-34059-2.4. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Use foreach/for instead of map where operation requires execution of body, not actually defining a transformation
How was this patch tested?
Jenkins