Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16694] [CORE] Use for/foreach rather than map for Unit expressions whose side effects are required #14332

Closed
wants to merge 1 commit into from

Conversation

srowen
Copy link
Member

@srowen srowen commented Jul 24, 2016

What changes were proposed in this pull request?

Use foreach/for instead of map where operation requires execution of body, not actually defining a transformation

How was this patch tested?

Jenkins

@SparkQA
Copy link

SparkQA commented Jul 24, 2016

Test build #62764 has finished for PR 14332 at commit 6fabd16.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented Jul 24, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Jul 24, 2016

Test build #62767 has finished for PR 14332 at commit 6fabd16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sys.exit(1)
parser.parse(args, defaultParams) match {
case Some(params) => run(params)
case _ => sys.exit(1)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't change these kinds of constructs. It's almost purely a stylistic issue (for which I don't believe we have an officially declared preference), and 1) pattern matching over Options is poor form, IMO; 2) the pattern matching alternative here doesn't any more unambiguously imply or signal use of side effects.

If I had my way, we'd fold Options, but that's an argument I lost long ago:

val anOption: Option[T] = ...
anOption.fold { handleNone } { t => ... }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is map. On style: why would you use map with an expression whose type is Unit? theoretically that's never correct. (It's a warning in IJ.) The question isn't readability.

The deeper issue turns up with view-like collections, where the map body won't actually execute at all. I don't know if that type of problem is actually possible with Option though. I figured it can't hurt to foreclose that possibility entirely with a more explicit construct.

fold is less obvious and has the same theoretical problem. I would not use that pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing incorrect about generating an expression of type Unit; what I am talking about is only pattern matching on Options, not the view issue; IJ has lots of opinions, not all of them correct.

What theoretical problem are you talking about with fold over Option? The view issue does not come into play since the pattern matching with Option construct is not even valid if you are dealing with anOption.view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and the "less obvious" argument is why my opinion on fold with Option was rejected -- even though it is a perfectly logical and obvious thing to do for a functional programmer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing wrong with an expression of type Unit per se, but it can only provide side effects. map defines a transformation, but such an expression doesn't transform into anything. Further, when the transformations (thus side effects) occur varies. If the primary purpose is to trigger side effects, right now, we have foreach right? That's what's going on in the rest of the change. I think that's uncontroversial.

How about an argument from consistency? The other changes exist because of a potential correctness problem. I don't think the same problem can ever happen with Option because of getOrElse. But it seems reasonable to avoid the same construct anyway, per argument above, and per consistency. The question is, what to replace it with?

if-else or match both seem fine and straightforward to me. Both are also at least as "obvious" to me as map or fold, probably moreso. From an FP perspective, are those actually appropriate with side-effect-only expressions. At least, why would those be a better choice than something built around foreach?

Yeah I don't think fold can have the same problem actually since it would always force evaluation, scratch that. Or else, it seems to in a test like the map one in the JIRA. It's viable as a way to trigger side effects, it seems, even if I'd say it's still not great FP.

I personally would not use it as an alternative for handling Option because the semantics of fold are about combining multiple values into one. However this usage isn't combining anything, and there aren't even values except trivially Unit. Is it obvious in option.fold(foo)(bar) that foo happens if it's not defined and bar happens when it is? compared to if-else or match, where it's entirely explicit?

Copy link
Contributor

@markhamstra markhamstra Jul 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument from consistency says to treat Option as a collection or monad, not as something special, and to treat Unit as just another type, not as something special. Treated consistently, the return of the zero-value in a fold over a None is not more surprising than folding over an empty List producing the zero-value. To a functional programmer aware of the semantics of fold, producing zeroes from folding over empty collections is every bit as explicit as an if-else or pattern match.

The argument about using fold with Option in Spark isn't going anywhere at this point, but you should look at the Scala API docs, which include the comment that "[a] less-idiomatic way to use scala.Option values is via pattern matching." Also see other commentary, such as http://blog.originate.com/blog/2014/06/15/idiomatic-scala-your-options-do-not-match/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Option has foreach as well and would be fine if the use case here were also to "do X if the Option exists" but here we also need "... or do Y if it doesn't". I'm not sure it's that Unit is special or not, but that it signals there is no meaningful result, which highlights the real issue: the only purpose is the side effect. The difference between transforming a return value with fold or whatever and then exiting with it, and using these constructs to exit, is precisely the difference here. You are arguing that side-effects are neither here nor there when thinking about this from a theoretical functional programming perspective? can't agree with that at all! Transform without side effects? makes total sense for FP idioms. "Transforms" with no result and only side effects? not so much.

The blog you link makes the point that fold is concise, but less readable, and I agree, and I think that's a negative. The Scala API doc shows match in an example where side-effects are desired (println) and not for transformation. How about if-else? I do not particularly think match is necessary.

It's a good discussion but for consistency and readability I think this change (or to if-else) is positive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that you are reading what you want to see instead of what is really there. All of the examples in the Scala API doc involve side-effects. The three examples are essentially equivalent in their intent and effects, and using pattern matching with Option is clearly called out as the least idiomatic.

fold isn't in question since it has already been ruled out by other Spark committers as beyond the ken of those unfamiliar with idiomatic functional programming. I see no reason, however, to do anything different in Spark's code than what the Scala API declares to be "most idiomatic way".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading http://www.scala-lang.org/api/2.11.8/#scala.Option
The only operations where a function operates on an Option have no side effect:
val upper = name map { _.trim } filter { _.length != 0 } map { _.toUpperCase }
Side effects are applied to the result of getOrElse only. What are you reading?

I wonder if the example conflates two things. It's definitely less idiomatic to accomplish the transformation with match. println(option match ...) would be odd. It pushes the println inside though which makes it less clear what about this is unidiomatic. The code snippets don't actually do the same thing, even.

I think you're suggesting that this implies the former is more idiomatic. Yes, but it's also not what you're arguing for. It's crucially different in that we're talking about Option.map or fold to achieve side effects. The example specifically doesn't do that. Calling them "essentially equivalent" hand-waves past the very point.

I understand FP fine; none of this is beyond any of us. I just don't think your argument holds together. I am not going to change this code based on this. I'd welcome other opinions though.

@markhamstra
Copy link
Contributor

The Scala API docs clearly specify that the most idiomatic usage of Option is to treat it as a collection or monad and use map, flatMap, filter, or foreach. Those docs also clearly specify that using Option values via pattern matching is less idiomatic. There is nothing fundamentally wrong with map applying a function that produces results of type Unit, nor is such an operation some kind of special not-a-transformation that must always be treated differently from operations that have been blessed as legitimate transformations. Sure, when you eventually come to the boundaries of purely functional code and it is time to deal with side effects, then the distinctness of Unit comes into play. But it is a basic FP principle or best practice to not treat types as specific instead of abstract until you logically must deal with the specifics -- and that is just not the general case with map functions that produce Unit, nor with folding an Option.

Usage of Unit consistent with usage of other types in things like map and fold can be confusing to the many Spark contributors who aren't completely at home with idiomatic FP, and that is why, e.g., we chose not to allow folding of Option within the Spark code. That's not going to change based on any argument in this PR's comments.

But neither should we shy away from any and all usages of Unit producing functions within map -- especially not when that usage is the preferred, idiomatic usage of Option. The issue that SPARK-16694 should be dealing with is not a fundamental problem with map and Unit, but rather with problems caused by the not completely sound implementation of view in Scala. There are known problems and ugly corner cases associated with view, and that's why I think Scala programmers would be well advised to avoid it when they can, or to use it with great caution when they must. Alternatively, you can more safely use something like Josh Sureth's Viewduction work: https://github.com/jsuereth/viewducers

@srowen
Copy link
Member Author

srowen commented Jul 29, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Jul 29, 2016

Test build #63003 has finished for PR 14332 at commit 6fabd16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…body, not actually defining a transformation
@SparkQA
Copy link

SparkQA commented Jul 29, 2016

Test build #63018 has finished for PR 14332 at commit 3a5c5a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented Jul 30, 2016

Merged to master

@asfgit asfgit closed this in 0dc4310 Jul 30, 2016
@srowen srowen deleted the SPARK-16694 branch July 30, 2016 15:07
viirya pushed a commit that referenced this pull request Jan 10, 2021
… execute it eagerly

### What changes were proposed in this pull request?

This PR is basically a followup of #14332.
Calling `map` alone might leave it not executed due to lazy evaluation, e.g.)

```
scala> val foo = Seq(1,2,3)
foo: Seq[Int] = List(1, 2, 3)

scala> foo.map(println)
1
2
3
res0: Seq[Unit] = List((), (), ())

scala> foo.view.map(println)
res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...)

scala> foo.view.foreach(println)
1
2
3
```

We should better use `foreach` to make sure it's executed where the output is unused or `Unit`.

### Why are the changes needed?

To prevent the potential issues by not executing `map`.

### Does this PR introduce _any_ user-facing change?

No, the current codes look not causing any problem for now.

### How was this patch tested?

I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally.

Closes #31110 from HyukjinKwon/SPARK-34059.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
HyukjinKwon added a commit that referenced this pull request Jan 12, 2021
… sure execute it eagerly

### What changes were proposed in this pull request?

This is a backport of #31110. I ran intelliJ inspection again in this branch.

This PR is basically a followup of #14332.
Calling `map` alone might leave it not executed due to lazy evaluation, e.g.)

```
scala> val foo = Seq(1,2,3)
foo: Seq[Int] = List(1, 2, 3)

scala> foo.map(println)
1
2
3
res0: Seq[Unit] = List((), (), ())

scala> foo.view.map(println)
res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...)

scala> foo.view.foreach(println)
1
2
3
```

We should better use `foreach` to make sure it's executed where the output is unused or `Unit`.

### Why are the changes needed?

To prevent the potential issues by not executing `map`.

### Does this PR introduce _any_ user-facing change?

No, the current codes look not causing any problem for now.

### How was this patch tested?

I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally.

Closes #31137 from HyukjinKwon/SPARK-34059-3.1.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Jan 12, 2021
… sure execute it eagerly

### What changes were proposed in this pull request?

This is a backport of #31110. I ran intelliJ inspection again in this branch.

This PR is basically a followup of #14332.
Calling `map` alone might leave it not executed due to lazy evaluation, e.g.)

```
scala> val foo = Seq(1,2,3)
foo: Seq[Int] = List(1, 2, 3)

scala> foo.map(println)
1
2
3
res0: Seq[Unit] = List((), (), ())

scala> foo.view.map(println)
res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...)

scala> foo.view.foreach(println)
1
2
3
```

We should better use `foreach` to make sure it's executed where the output is unused or `Unit`.

### Why are the changes needed?

To prevent the potential issues by not executing `map`.

### Does this PR introduce _any_ user-facing change?

No, the current codes look not causing any problem for now.

### How was this patch tested?

I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally.

Closes #31138 from HyukjinKwon/SPARK-34059-3.0.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Jan 12, 2021
… sure execute it eagerly

### What changes were proposed in this pull request?

This is a backport of #31110. I ran intelliJ inspection again in this branch.

This PR is basically a followup of #14332.
Calling `map` alone might leave it not executed due to lazy evaluation, e.g.)

```
scala> val foo = Seq(1,2,3)
foo: Seq[Int] = List(1, 2, 3)

scala> foo.map(println)
1
2
3
res0: Seq[Unit] = List((), (), ())

scala> foo.view.map(println)
res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...)

scala> foo.view.foreach(println)
1
2
3
```

We should better use `foreach` to make sure it's executed where the output is unused or `Unit`.

### Why are the changes needed?

To prevent the potential issues by not executing `map`.

### Does this PR introduce _any_ user-facing change?

No, the current codes look not causing any problem for now.

### How was this patch tested?

I found these item by running IntelliJ inspection, double checked one by one, and fixed them. These should be all instances across the codebase ideally.

Closes #31139 from HyukjinKwon/SPARK-34059-2.4.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants