[SPARK-5061][Alex Baretta] SQLContext: overload createParquetFile #3882

Closed
wants to merge 1 commit into from


alexbaretta
Contributor

Overload of createParquetFile taking a StructType instead of a TypeTag

Overload taking a StructType instead of TypeTag
@AmplabJenkins

Can one of the admins verify this patch?

@ash211
Contributor

ash211 commented Jan 3, 2015

Jenkins this is ok to test

* @group userf
*/
@Experimental
def createParquetFile(
Contributor


I kind of think createEmptyParquetFile would be a better name for this method, since most Parquet files have data, I'd think.

Contributor Author


Andrew,

OK, but keep in mind that my patch overloads an existing method. If you
think createParquetFile should be renamed to createEmptyParquetFile you
should probably file a separate JIRA.

Also, arguably "creating a file" implies that it is empty.

Alex
On Jan 2, 2015 5:11 PM, "Andrew Ash" notifications@github.com wrote:

In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
#3882 (diff):

+   * val schema = StructType(List(StructField("name", StringType),StructField("age", IntegerType)))
+   * createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
+   * sql("INSERT INTO people SELECT 'michael', 29")
+   * }}}
+   * @param schema StructType describing the records to be stored in the Parquet file.
+   * @param path The path where the directory containing parquet metadata should be created.
+   *             Data inserted into this table will also be stored at this location.
+   * @param allowExisting When false, an exception will be thrown if this directory already exists.
+   * @param conf A Hadoop configuration object that can be used to specify options to the parquet
+   *             output format.
+   * @group userf
+   */
+  @Experimental
+  def createParquetFile(
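
For readers skimming the thread, the difference between the two overloads can be sketched without Spark at all. The following is a toy model, not the patch itself and not Spark's API: `Field`, `Schema`, `SchemaFor`, `Person`, `Catalog` and `createEmptyTable` are all hypothetical names invented here, and the `SchemaFor` type class only stands in for the role the `TypeTag` context bound plays (a schema fixed by a compile-time type). The second overload takes the schema as an ordinary value, which is what matters when the columns are only known at runtime. The `allowExisting` check follows the behaviour described in the quoted Scaladoc.

```scala
// Toy model only: none of these types are Spark classes.
final case class Field(name: String, dataType: String)
final case class Schema(fields: List[Field])

// Stand-in for the TypeTag context bound: a schema tied to a static type.
trait SchemaFor[A] { def schema: Schema }

final case class Person(name: String, age: Int)
object Person {
  // Hand-written here; a TypeTag would let a library derive this by reflection.
  implicit val schemaFor: SchemaFor[Person] = new SchemaFor[Person] {
    val schema: Schema = Schema(List(Field("name", "string"), Field("age", "int")))
  }
}

object Catalog {
  // Maps a path to the schema of the (empty) table created there.
  private val tables = scala.collection.mutable.Map.empty[String, Schema]

  // Existing style: the schema comes from the type parameter A.
  def createEmptyTable[A](path: String, allowExisting: Boolean)(
      implicit ev: SchemaFor[A]): Schema =
    createEmptyTable(ev.schema, path, allowExisting)

  // Proposed style: the schema is passed explicitly as a runtime value.
  def createEmptyTable(schema: Schema, path: String, allowExisting: Boolean): Schema = {
    if (!allowExisting && tables.contains(path))
      throw new IllegalStateException(s"$path already exists")
    tables(path) = schema
    schema
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Schema known at compile time, via the Person type.
    val fromType = Catalog.createEmptyTable[Person]("people", false)

    // Schema assembled at runtime, e.g. from a list of column descriptions.
    val columns = List("name" -> "string", "age" -> "int")
    val runtimeSchema = Schema(columns.map { case (n, t) => Field(n, t) })
    val fromValue = Catalog.createEmptyTable(runtimeSchema, "people_dynamic", false)

    println(fromType == fromValue)
  }
}
```

Both overloads in this sketch deliberately avoid default arguments, since Scala allows only one overloaded alternative of a method to define them.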


@SparkQA

SparkQA commented Jan 3, 2015

Test build #25000 has started for PR 3882 at commit f6e40b5.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25000 has finished for PR 3882 at commit f6e40b5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25000/

@alexbaretta alexbaretta closed this Jan 8, 2015
@alexbaretta alexbaretta deleted the createParquetFile branch January 8, 2015 21:51
@alexbaretta
Contributor Author

In retrospect, amending my commit might not have been the right thing to do. Any feedback on how to properly amend a PR would be appreciated.

4 participants