Skip to content

Comments

[SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Use other functions to simplify the code pattern of s.c.MapOps#view.mapValues#44122

Closed
LuciferYang wants to merge 1 commit intoapache:masterfrom
LuciferYang:SPARK-46212

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Dec 2, 2023

What changes were proposed in this pull request?

This pr simplifies s.c.MapOps.view.mapValues using the following approach:

  • For the s.c.immutable.MapOps type, replace it with the s.c.immutable.MapOps#transform function.
def transform[W](f: (K, V) => W): CC[K, W] = map { case (k, v) => (k, f(k, v)) }

Like the case in CountMinSketchSuite:

sampledItems.groupBy(identity).view.mapValues(_.length.toLong)

  • For the s.c.MapOps type, since the transform function does not exist for this type, replace it directly with the map function.
def map[K2, V2](f: ((K, V)) => (K2, V2)): CC[K2, V2] = mapFactory.from(new View.Map(this, f))

Like the case in KafkaTestUtils:

zkClient.getPartitionsForTopics(zkClient.getAllTopicsInCluster()).view.mapValues(_.size).toSeq

  • For the s.c.mutable.MapOps type, the transform function has also been deprecated. At the same time, the signature of transform and its replacement function mapValuesInPlace is as follows:
  @deprecated("Use mapValuesInPlace instead", "2.13.0")
  @inline final def transform(f: (K, V) => V): this.type = mapValuesInPlace(f)

  def mapValuesInPlace(f: (K, V) => V): this.type = {...}

The target type of the value in the function is V, which is different from the target type of the value in s.c.immutable.MapOps#transform, which is W. This does not meet the desired requirement. So in this scenario, it can be divided into two sub-scenarios for handling:

  1. If the mutable.Map are using needs to be eventually converted to an immutable.Map, first convert it to an immutable.Map and then use the transform function for replacement. Like the case in SparkConnectPlanner:

namedArguments.asScala.view.mapValues(transformExpression).toMap)

  1. If the mutable.Map are using does not need to be converted to an immutable.Map in the end, directly use the map function from scala.collection.MapOps for replacement. Like the case in SparkSession:

.putAllNamedArguments(args.asScala.view.mapValues(lit(_).expr).toMap.asJava)))

In addition, there is a special case in PythonWorkerFactory:

For this case, it only needs to destroy each Process in values without returning any value. Therefore, it has been rewritten using .values.foreach.

Why are the changes needed?

The coding pattern of s.c.MapOps.view.mapValues seems verbose, it can be simplified using other functions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft December 2, 2023 10:30
@LuciferYang LuciferYang changed the title [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace s.c.MapOps#view.mapValues with s.c.mutable.MapOps.map/s.c.MapOps.map/s.c.immutable.MapOps#transform [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace s.c.MapOps#view.mapValues with s.c.mutable.MapOps#map/s.c.MapOps#map/s.c.immutable.MapOps#transform Dec 2, 2023
@LuciferYang LuciferYang changed the title [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace s.c.MapOps#view.mapValues with s.c.mutable.MapOps#map/s.c.MapOps#map/s.c.immutable.MapOps#transform [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace s.c.MapOps#view.mapValues with s.c.MapOps#map/s.c.immutable.MapOps#transform Dec 2, 2023
@LuciferYang
Copy link
Contributor Author

cc @cloud-fan

@LuciferYang LuciferYang marked this pull request as ready for review December 2, 2023 11:05
@LuciferYang LuciferYang changed the title [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace s.c.MapOps#view.mapValues with s.c.MapOps#map/s.c.immutable.MapOps#transform [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Use other functions to simplify the code pattern of s.c.MapOps#view.mapValues Dec 2, 2023
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (from my side). Thank you, @LuciferYang .

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.

@cloud-fan
Copy link
Contributor

late LGTM, thanks for simplifying it!

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
…F][EXAMPLES] Use other functions to simplify the code pattern of `s.c.MapOps#view.mapValues`

### What changes were proposed in this pull request?
This pr simplifies `s.c.MapOps.view.mapValues` using the following approach:

- For the `s.c.immutable.MapOps` type, replace it with the `s.c.immutable.MapOps#transform` function.

```scala
def transform[W](f: (K, V) => W): CC[K, W] = map { case (k, v) => (k, f(k, v)) }
```

Like the case in `CountMinSketchSuite`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/common/sketch/src/test/scala/org/apache/spark/util/sketch/CountMinSketchSuite.scala#L59

- For the `s.c.MapOps` type, since the `transform` function does not exist for this type, replace it directly with the `map` function.

```scala
def map[K2, V2](f: ((K, V)) => (K2, V2)): CC[K2, V2] = mapFactory.from(new View.Map(this, f))
```

Like the case in `KafkaTestUtils`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala#L381

- For the `s.c.mutable.MapOps` type, the `transform` function has also been deprecated. At the same time, the signature of `transform` and its replacement function `mapValuesInPlace` is as follows:

```scala

  deprecated("Use mapValuesInPlace instead", "2.13.0")
  inline final def transform(f: (K, V) => V): this.type = mapValuesInPlace(f)

  def mapValuesInPlace(f: (K, V) => V): this.type = {...}
```

The target type of the value in the function is `V`, which is different from the target type of the value in `s.c.immutable.MapOps#transform`, which is `W`. This does not meet the desired requirement. So in this scenario, it can be divided into two sub-scenarios for handling:

1. If the `mutable.Map` are using needs to be eventually converted to an `immutable.Map`, first convert it to an `immutable.Map` and then use the `transform` function for replacement. Like the case in `SparkConnectPlanner`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L292

2. If the `mutable.Map` are using does not need to be converted to an `immutable.Map` in the end, directly use the `map` function from `scala.collection.MapOps` for replacement. Like the case in `SparkSession`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala#L313

In addition, there is a special case in `PythonWorkerFactory`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L381

For this case, it only needs to `destroy` each `Process` in `values` without returning any value. Therefore, it has been rewritten using `.values.foreach`.

### Why are the changes needed?
The coding pattern of `s.c.MapOps.view.mapValues` seems verbose, it can be simplified using other functions.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#44122 from LuciferYang/SPARK-46212.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
…F][EXAMPLES] Use other functions to simplify the code pattern of `s.c.MapOps#view.mapValues`

### What changes were proposed in this pull request?
This pr simplifies `s.c.MapOps.view.mapValues` using the following approach:

- For the `s.c.immutable.MapOps` type, replace it with the `s.c.immutable.MapOps#transform` function.

```scala
def transform[W](f: (K, V) => W): CC[K, W] = map { case (k, v) => (k, f(k, v)) }
```

Like the case in `CountMinSketchSuite`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/common/sketch/src/test/scala/org/apache/spark/util/sketch/CountMinSketchSuite.scala#L59

- For the `s.c.MapOps` type, since the `transform` function does not exist for this type, replace it directly with the `map` function.

```scala
def map[K2, V2](f: ((K, V)) => (K2, V2)): CC[K2, V2] = mapFactory.from(new View.Map(this, f))
```

Like the case in `KafkaTestUtils`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala#L381

- For the `s.c.mutable.MapOps` type, the `transform` function has also been deprecated. At the same time, the signature of `transform` and its replacement function `mapValuesInPlace` is as follows:

```scala

  deprecated("Use mapValuesInPlace instead", "2.13.0")
  inline final def transform(f: (K, V) => V): this.type = mapValuesInPlace(f)

  def mapValuesInPlace(f: (K, V) => V): this.type = {...}
```

The target type of the value in the function is `V`, which is different from the target type of the value in `s.c.immutable.MapOps#transform`, which is `W`. This does not meet the desired requirement. So in this scenario, it can be divided into two sub-scenarios for handling:

1. If the `mutable.Map` are using needs to be eventually converted to an `immutable.Map`, first convert it to an `immutable.Map` and then use the `transform` function for replacement. Like the case in `SparkConnectPlanner`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L292

2. If the `mutable.Map` are using does not need to be converted to an `immutable.Map` in the end, directly use the `map` function from `scala.collection.MapOps` for replacement. Like the case in `SparkSession`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala#L313

In addition, there is a special case in `PythonWorkerFactory`:

https://github.com/apache/spark/blob/0d40b1aea758b95a4416c8653599af8713a4aa16/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L381

For this case, it only needs to `destroy` each `Process` in `values` without returning any value. Therefore, it has been rewritten using `.values.foreach`.

### Why are the changes needed?
The coding pattern of `s.c.MapOps.view.mapValues` seems verbose, it can be simplified using other functions.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#44122 from LuciferYang/SPARK-46212.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants