Skip to content

[SPARK-15871][SQL] Add assertNotPartitioned check in DataFrameWriter#13597

Closed
lw-lin wants to merge 1 commit intoapache:masterfrom
lw-lin:add-assertNotPartitioned
Closed

[SPARK-15871][SQL] Add assertNotPartitioned check in DataFrameWriter#13597
lw-lin wants to merge 1 commit intoapache:masterfrom
lw-lin:add-assertNotPartitioned

Conversation

@lw-lin
Copy link
Contributor

@lw-lin lw-lin commented Jun 10, 2016

What changes were proposed in this pull request?

It doesn't make sense to specify partitioning parameters, when we write data out from Datasets/DataFrames into jdbc tables or streaming ForeachWriters.

This patch adds assertNotPartitioned check in DataFrameWriter.

operation should check not partitioned?
mode
outputMode
trigger
format
option/options
partitionBy
bucketBy
sortBy
save
queryName
startStream
foreach yes
insertInto
saveAsTable
jdbc yes
json
parquet
orc
text
csv

How was this patch tested?

New dedicated tests.


private def assertNotBucketed(operation: String): Unit = {
if (numBuckets.isDefined || sortColumnNames.isDefined) {
throw new IllegalArgumentException(
Copy link
Contributor Author

@lw-lin lw-lin Jun 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, do you mind if this patch change IllegalArgumentException here to AnalysisException? In most places in this class we throw AnalysisException, :-)

This change should be OK as this has only been in the 2.0-preview release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong preference here.

@SparkQA
Copy link

SparkQA commented Jun 10, 2016

Test build #60292 has finished for PR 13597 at commit 83b2230.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lw-lin
Copy link
Contributor Author

lw-lin commented Jun 10, 2016

@marmbrus @cloud-fan @zsxwing , would you mind taking a look? Thanks! :-)

@zsxwing
Copy link
Member

zsxwing commented Jun 10, 2016

LGTM. @marmbrus what do you think?

@marmbrus
Copy link
Contributor

Seems fine to me.

@zsxwing
Copy link
Member

zsxwing commented Jun 10, 2016

Thanks, merging to master and 2.0.

@asfgit asfgit closed this in fb21902 Jun 10, 2016
asfgit pushed a commit that referenced this pull request Jun 10, 2016
## What changes were proposed in this pull request?

It doesn't make sense to specify partitioning parameters, when we write data out from Datasets/DataFrames into `jdbc` tables or streaming `ForeachWriter`s.

This patch adds `assertNotPartitioned` check in `DataFrameWriter`.

<table>
<tr>
	<td align="center"><strong>operation</strong></td>
	<td align="center"><strong>should check not partitioned?</strong></td>
</tr>
<tr>
	<td align="center">mode</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">outputMode</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">trigger</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">format</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">option/options</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">partitionBy</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">bucketBy</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">sortBy</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">save</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">queryName</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">startStream</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">foreach</td>
	<td align="center">yes</td>
</tr>
<tr>
	<td align="center">insertInto</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">saveAsTable</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">jdbc</td>
	<td align="center">yes</td>
</tr>
<tr>
	<td align="center">json</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">parquet</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">orc</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">text</td>
	<td align="center"></td>
</tr>
<tr>
	<td align="center">csv</td>
	<td align="center"></td>
</tr>
</table>

## How was this patch tested?

New dedicated tests.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #13597 from lw-lin/add-assertNotPartitioned.

(cherry picked from commit fb21902)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@lw-lin lw-lin deleted the add-assertNotPartitioned branch June 11, 2016 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants