[SPARK-24073][SQL]: Rename DataReaderFactory to InputPartition. #21145
Conversation
Test build #89799 has finished for PR 21145 at commit
Branch force-pushed from c364c05 to cdf2b4d.
Test build #89806 has finished for PR 21145 at commit
Test build #89808 has finished for PR 21145 at commit
Branch force-pushed from cea3b86 to 609ec14.
Test build #89848 has finished for PR 21145 at commit
@@ -299,13 +299,13 @@ private[kafka010] class KafkaMicroBatchReader(
   }
 }

-/** A [[DataReaderFactory]] for reading Kafka data in a micro-batch streaming query. */
+/** A [[ReadTask]] for reading Kafka data in a micro-batch streaming query. */
 private[kafka010] case class KafkaMicroBatchDataReaderFactory(
@rdblue, the renaming in this PR seems to be partial. Could you replace KafkaMicroBatchDataReaderFactory with KafkaMicroBatchReadTask together in this PR? I guess there will be more occurrences like this.
This fixes the API, not implementations, and it already touches 30+ files.
I'd rather not fix the downstream classes for two reasons. First, to avoid this becoming really large. Second, we need to be able to evolve these APIs without requiring changes to all implementations. I think we should avoid requiring changes to make everything consistent, or else there's tension between making necessary changes to the API and trying to move existing code to that API.
Yes. This kind of change always becomes unnecessarily big. Since this PR turns the master branch into an inconsistent state, could you file a JIRA issue for the remaining tasks this PR avoids? Then someone else can help make Apache Spark more consistent later, in the Apache Spark 2.4 (or 3.0) timeframe.

> I think we should avoid requiring changes to make everything consistent
Sure, good idea.
IMO, it's better to keep it the current way. The current naming makes it consistent on the read/write side, and I think the renaming would add to the already confusing interfaces.
@arunmahadevan, the problem is that the current naming is misleading. This is not a factory (it only produces one specific reader) and it does not have the same lifecycle as the write-side factory. Using parallel naming for the two implies an equivalence that doesn't exist.
It sounds like both
I think
@cloud-fan and @henryr, do you have an opinion about naming here?
I don't mind. I certainly agree that this is not a factory.
Neither name is perfect.
@gengliangwang, we can follow up with a rename for the streaming classes that already use this API. But there is no need to do that right now and make this commit larger. I think I've already made it clear that I think
I don't see the problem with the name ReadTask. In RDDs, we call the serializable representation of a partition for distribution to executors just Partition, and I've always found this pretty intuitive. Certainly it wouldn't be better to call it ComputeIteratorFactory instead.
@@ -22,20 +22,20 @@
 import org.apache.spark.annotation.InterfaceStability;

 /**
- * A reader factory returned by {@link DataSourceReader#createDataReaderFactories()} and is
+ * A read task returned by {@link DataSourceReader#createReadTasks()} and is
I still think ReadTask is confusing. I was asked by multiple people what a ReadTask is, especially since a Task is already clearly defined as a unit of execution in Spark: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L31-L51 Let us avoid using Task here.
Ok - how about ReadTaskDescriptor? (In that case I think it would be ok to leave the method name as createReadTasks().)
Hadoop uses the term "input split" for this. Would it be more clear if Spark adopted the same language?
InputSplit seems a pretty good name for batch, since one split will be processed by one Spark task. What do the streaming folks think about it?
Now I'm rethinking the suggestion: InputSplit is a well-known Hadoop class that we probably shouldn't duplicate. What about using InputPartition instead? That makes it clear that the partitioning is on the input data, and it uses the more common term in Spark.
Is everyone okay with this? @jose-torres @gengliangwang @cloud-fan @henryr @arunmahadevan @gatorsmile
InputPartition sounds fine, but is it ok to have a method like "createDataReader" inside it? Will it create confusion when an "inputPartition" is a member of other classes like DataSourceRDDPartition?

It appears that the DataReaderFactory is kind of a wrapper for the Reader, so that the Reader itself need not be serializable. I am also ok with leaving it as is (though technically it may not be a factory).
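The "wrapper so the Reader need not be serializable" point can be sketched as follows. This is an illustrative sketch with hypothetical names, not Spark's actual classes: only the partition handle must be Serializable, while the reader it creates executor-side may hold non-serializable state (sockets, file handles, Kafka consumers) freely.

```scala
// Stand-in for a reader holding non-serializable state (e.g. an open stream).
class LineReader(lines: Seq[String]) extends AutoCloseable {
  private val it = lines.iterator // not Serializable, and doesn't need to be
  def hasNext: Boolean = it.hasNext
  def read(): String = it.next()
  override def close(): Unit = ()
}

// Serializable handle shipped from the driver to executors; it creates the
// reader only after arriving on the executor.
case class LinesInputPartition(lines: Seq[String]) extends Serializable {
  def createReader(): LineReader = new LineReader(lines)
}

val part = LinesInputPartition(Seq("a", "b"))
val reader = part.createReader()
val values = Iterator.continually(reader).takeWhile(_.hasNext).map(_.read()).toList
reader.close()
println(values) // List(a, b)
```

The design choice being debated is whether this split of responsibilities deserves a name other than "factory", since the handle produces exactly one reader for exactly one slice of data.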
Okay, one more idea that captures the relationship between DataReader and ReadTask: what about using DataReadable? That's similar to the Iterable and Iterator relationship that the two classes have.
So we want to expose two things in the naming:
- it represents an input RDD partition
- it creates a DataReader

I think the first really needs to be pointed out explicitly, while the second is less of a problem: it is not that confusing that a DataReader is created from a partition. So +1 on InputPartition.
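The Iterable/Iterator-style relationship the thread converges on can be sketched like this. Signatures are simplified and hypothetical, not the exact org.apache.spark.sql.sources.v2.reader API: InputPartition plays the Iterable role, InputPartitionReader the Iterator role, with an explicit create/close lifecycle.

```scala
// Iterator-like role: pulls records one at a time, closed when done.
trait InputPartitionReader[T] extends AutoCloseable {
  def next(): Boolean
  def get(): T
}

// Iterable-like role: a serializable description of one input partition that
// can create its reader on an executor.
trait InputPartition[T] extends Serializable {
  def createPartitionReader(): InputPartitionReader[T]
}

// Toy partition over an integer range [start, end).
case class RangeInputPartition(start: Int, end: Int) extends InputPartition[Int] {
  def createPartitionReader(): InputPartitionReader[Int] =
    new InputPartitionReader[Int] {
      private var current = start - 1
      def next(): Boolean = { current += 1; current < end }
      def get(): Int = current
      def close(): Unit = ()
    }
}

val reader = RangeInputPartition(0, 3).createPartitionReader()
val values = Iterator.continually(reader).takeWhile(_.next()).map(_.get()).toList
reader.close()
println(values) // List(0, 1, 2)
```

This is why "factory" was felt to be misleading: each partition produces exactly one reader for its own slice of the data, rather than stamping out interchangeable instances.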
@gatorsmile, the Spark UI has used the term "task" for years to refer to the same thing. I don't think it is unreasonable to use the same term.
Branch force-pushed from 609ec14 to 250c1de.
Test build #90334 has finished for PR 21145 at commit
Branch force-pushed from 250c1de to 560ad6a.
Test build #90378 has finished for PR 21145 at commit
Branch force-pushed from 560ad6a to 3eff34b.
@cloud-fan, I've updated this PR to use
Test build #90380 has finished for PR 21145 at commit
Branch force-pushed from 3eff34b to ec53d12.
Test build #90381 has finished for PR 21145 at commit
Renames:
- DataReaderFactory -> InputPartition
- DataReader -> InputPartitionReader
- createDataReaderFactories -> planInputPartitions
- createUnsafeDataReaderFactories -> planUnsafeInputPartitions
- createBatchDataReaderFactories -> planBatchInputPartitions

This fixes the changes in SPARK-23219, which renamed ReadTask to DataReaderFactory. The intent of that change was to make the read and write API match (the write side uses DataWriterFactory), but the underlying problem is that the two classes are not equivalent.

ReadTask/DataReader function as Iterable/Iterator. One InputPartition is a specific partition of the data to be read, in contrast to DataWriterFactory, where the same factory instance is used in all write tasks. InputPartition's purpose is to manage the lifecycle of the associated reader, which is now called InputPartitionReader, with an explicit create operation to mirror the close operation. This was no longer clear from the API because DataReaderFactory appeared to be more generic than it is, and it isn't clear why a set of them is produced for a read.
Branch force-pushed from ec53d12 to 1423979.
Test build #90382 has finished for PR 21145 at commit
@@ -76,5 +76,5 @@
  * If this method fails (by throwing an exception), the action would fail and no Spark job was
  * submitted.
  */
-List<DataReaderFactory<Row>> createDataReaderFactories();
+List<InputPartition<Row>> planInputPartitions();
In the Hadoop world there is InputFormat.getSplits; shall we follow that and use getInputPartitions here?
I think plan is a more accurate verb. To some Java people, get implies that the call is very cheap because it is associated with getters, which typically just return a field's value. Since that's not the case here and callers shouldn't consider this method cheap, I think it makes sense to use a different name that reflects what is actually happening: split planning.
Any other suggestions about naming? We are going to rename
Overall, +1 for the change.
LGTM to
LGTM. I can own cleaning up the names of the streaming classes, probably wrapping that into the broader task of getting a design doc for the streaming reader API.
Thanks @jose-torres! I appreciate not blocking this commit on those changes, since it would have been difficult to keep this up to date with the other paths changing while we discussed what to call these classes.
Thanks! Merged to master.
What changes were proposed in this pull request?
In apache#21145, DataReaderFactory is renamed to InputPartition. This PR is to revise wording in the comments to make it more clear.

How was this patch tested?
None

Author: Gengliang Wang <gengliang.wang@databricks.com>
Closes apache#21326 from gengliangwang/revise_reader_comments.
Author: Ryan Blue <blue@apache.org>
Closes apache#21145 from rdblue/SPARK-24073-revert-data-reader-factory-rename.
(cherry picked from commit 62d0139)
What changes were proposed in this pull request?
Renames:
- DataReaderFactory to InputPartition
- DataReader to InputPartitionReader
- createDataReaderFactories to planInputPartitions
- createUnsafeDataReaderFactories to planUnsafeInputPartitions
- createBatchDataReaderFactories to planBatchInputPartitions
This fixes the changes in SPARK-23219, which renamed ReadTask to
DataReaderFactory. The intent of that change was to make the read and
write API match (write side uses DataWriterFactory), but the underlying
problem is that the two classes are not equivalent.
ReadTask/DataReader function as Iterable/Iterator. One InputPartition is
a specific partition of the data to be read, in contrast to
DataWriterFactory where the same factory instance is used in all write
tasks. InputPartition's purpose is to manage the lifecycle of the
associated reader, which is now called InputPartitionReader, with an
explicit create operation to mirror the close operation. This was no
longer clear from the API because DataReaderFactory appeared to be more
generic than it is and it isn't clear why a set of them is produced for
a read.
How was this patch tested?
Existing tests, which have been updated to use the new name.
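The renamed read path described above can be sketched end to end. This is a hedged, simplified sketch with hypothetical names (Reader, Partition, SourceReader, SeqPartition, IntSource), not the actual Spark interfaces: the driver calls planInputPartitions() once, each serializable partition is shipped to a task, and each task creates and drains its own reader.

```scala
trait Reader[T] extends AutoCloseable {
  def next(): Boolean
  def get(): T
}

trait Partition[T] extends Serializable {
  def createReader(): Reader[T]
}

trait SourceReader[T] {
  // formerly createDataReaderFactories(); "plan" signals potentially
  // expensive work such as listing files or splitting offset ranges
  def planInputPartitions(): Seq[Partition[T]]
}

// One partition over an in-memory slice of ints.
case class SeqPartition(data: Seq[Int]) extends Partition[Int] {
  def createReader(): Reader[Int] = new Reader[Int] {
    private val it = data.iterator
    private var current = 0
    def next(): Boolean = { val more = it.hasNext; if (more) current = it.next(); more }
    def get(): Int = current
    def close(): Unit = ()
  }
}

// A toy source that plans its input into roughly equal partitions.
class IntSource(data: Seq[Int], parts: Int) extends SourceReader[Int] {
  def planInputPartitions(): Seq[Partition[Int]] =
    data.grouped(math.max(1, data.size / parts)).map(SeqPartition(_)).toSeq
}

// Simulate one Spark task per planned partition.
val results = new IntSource(1 to 6, 3).planInputPartitions().map { p =>
  val r = p.createReader()
  try Iterator.continually(r).takeWhile(_.next()).map(_.get()).toList
  finally r.close()
}
println(results.flatten) // elements 1..6 recovered across 3 partitions
```

This mirrors the commit's rationale: the partition, not a shared factory, owns the reader's lifecycle, and "plan" rather than "get" or "create" names the potentially expensive driver-side step.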