Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27049][SQL] Support handling partition values in the abstraction of file source V2 #23961

Closed
wants to merge 1 commit into from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

In FileFormat, the method buildReaderWithPartitionValues appends the partition values to the end of the result returned by buildReader, so that for data sources like CSV/JSON/AVRO only need to implement buildReader to read a single file without taking care of partition values.

This PR proposes to support handling partition values in file source v2 abstraction by:

  1. Have two methods buildReader and buildReaderWithPartitionValues in FilePartitionReaderFactory, which have exactly the same meaning as they are in FileFormat
  2. Rename buildColumnarReader as buildColumnarReaderWithPartitionValues to make the naming consistent.

How was this patch tested?

Existing unit tests.

@SparkQA
Copy link

SparkQA commented Mar 4, 2019

Test build #103002 has finished for PR 23961 at commit f2f5668.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class FilePartitionReaderFactory(

@gengliangwang
Copy link
Member Author

retest this please.

@gengliangwang
Copy link
Member Author

@cloud-fan

@SparkQA
Copy link

SparkQA commented Mar 5, 2019

Test build #103021 has finished for PR 23961 at commit f2f5668.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class FilePartitionReaderFactory(

@@ -29,13 +29,3 @@ class PartitionRecordReader[T](

override def close(): Unit = rowReader.close()
}

class PartitionRecordReaderWithProject[X, T](
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this class to append partition values?

I feel it's weird to have these 3 methods in FilePartitionReaderFactory: buildReader, buildReaderWithPartitionValues and buildColumnarReaderWithPartitionValues. They are not symmetrical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which class do you mean? I think it is bit complex to refactor the current ColumnarBatch to support appending partition values. I should create another JIRA for it. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just different ways to abstract the code: 1) add common methods in the parent class 2) create util classes.

In this case I feel 2) is better, because the added methods in the parent class are not symmetrical.

@gengliangwang
Copy link
Member Author

Close this one and create #23987

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants