Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR][DOC] Update Partition Discovery section to enumerate all available file sources #19139

Closed
wants to merge 4 commits into from
Closed

[MINOR][DOC] Update Partition Discovery section to enumerate all available file sources #19139

wants to merge 4 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Sep 5, 2017

What changes were proposed in this pull request?

All built-in data sources support Partition Discovery. We had better update the document to give the users more benefit clearly.

AFTER

1

How was this patch tested?

SKIP_API=1 jekyll serve --watch

@gatorsmile
Copy link
Member

Are the partition discovery applicable to all the built-in data sources?

@SparkQA
Copy link

SparkQA commented Sep 5, 2017

Test build #81421 has finished for PR 19139 at commit 462fe27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

As you pointed out, I checked it again. Right, HadoopFsRelationTest have it.
Should we merge ParquetPartitionDiscoverySuite and OrcPartitionDiscoverySuite into HadoopFsRelationTest, too?

@gatorsmile
Copy link
Member

abstract class HadoopFsRelationTest is extended by four data sources. You can move some test cases into it. However, I am not sure whether you do it for all of them.

@dongjoon-hyun
Copy link
Member Author

@gatorsmile .
For this PR, I just update the doc only. Thank you so much!

@SparkQA
Copy link

SparkQA commented Sep 5, 2017

Test build #81423 has finished for PR 19139 at commit fd00fbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -733,7 +733,7 @@ SELECT * FROM parquetTable

Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
table, data are usually stored in different directories, with partitioning column values encoded in
the path of each partition directory. The Parquet data source is now able to discover and infer
the path of each partition directory. All built-in data sources are able to discover and infer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> All built-in file sources (including ...)

You know, it is not applicable to JDBC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Yep. I'll enumerate it.

@gatorsmile
Copy link
Member

BTW, please update the PR title.

@dongjoon-hyun dongjoon-hyun changed the title [MINOR][DOC] Add ORC in Partition Discovery section. [MINOR][DOC] Update Partition Discovery section to enumerate all available data sources Sep 5, 2017
@dongjoon-hyun
Copy link
Member Author

I added (including TEXT/CSV/JSON/ORC/Parquet) and updated the PR title, too.

@SparkQA
Copy link

SparkQA commented Sep 5, 2017

Test build #81428 has finished for PR 19139 at commit 128c779.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -733,8 +733,9 @@ SELECT * FROM parquetTable

Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
table, data are usually stored in different directories, with partitioning column values encoded in
the path of each partition directory. The Parquet data source is now able to discover and infer
partitioning information automatically. For example, we can store all our previously used
the path of each partition directory. All built-in data sources (including TEXT/CSV/JSON/ORC/Parquet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data sources -> file sources
TEXT -> Text

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's updated.

@gatorsmile
Copy link
Member

LGTM except a minor comment

@SparkQA
Copy link

SparkQA commented Sep 5, 2017

Test build #81429 has finished for PR 19139 at commit 018fdb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [MINOR][DOC] Update Partition Discovery section to enumerate all available data sources [MINOR][DOC] Update Partition Discovery section to enumerate all available file sources Sep 5, 2017
@gatorsmile
Copy link
Member

Thanks! Merged to master/2.2

asfgit pushed a commit that referenced this pull request Sep 5, 2017
…ailable file sources

## What changes were proposed in this pull request?

All built-in data sources support `Partition Discovery`. We had better update the document to give the users more benefit clearly.

**AFTER**

<img width="906" alt="1" src="https://user-images.githubusercontent.com/9700541/30083628-14278908-9244-11e7-98dc-9ad45fe233a9.png">

## How was this patch tested?

```
SKIP_API=1 jekyll serve --watch
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #19139 from dongjoon-hyun/partitiondiscovery.

(cherry picked from commit 9e451bc)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@asfgit asfgit closed this in 9e451bc Sep 5, 2017
@dongjoon-hyun
Copy link
Member Author

Thank you, @gatorsmile !

@dongjoon-hyun dongjoon-hyun deleted the partitiondiscovery branch September 5, 2017 22:15
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…ailable file sources

## What changes were proposed in this pull request?

All built-in data sources support `Partition Discovery`. We had better update the document to give the users more benefit clearly.

**AFTER**

<img width="906" alt="1" src="https://user-images.githubusercontent.com/9700541/30083628-14278908-9244-11e7-98dc-9ad45fe233a9.png">

## How was this patch tested?

```
SKIP_API=1 jekyll serve --watch
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#19139 from dongjoon-hyun/partitiondiscovery.

(cherry picked from commit 9e451bc)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants