37 changes: 37 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -269,6 +269,43 @@ class SQLContext(@transient val sparkContext: SparkContext)
path, ScalaReflection.attributesFor[A], allowExisting, conf, this))
}


/**
* :: Experimental ::
* Creates an empty parquet file with the provided schema. The parquet file thus created
* can be registered as a table, which can then be used as the target of future
* `insertInto` operations.
*
* {{{
* val sqlContext = new SQLContext(...)
* import sqlContext._
*
* val schema = StructType(List(StructField("name", StringType), StructField("age", IntegerType)))
* createParquetFile(schema, "path/to/file.parquet").registerTempTable("people")
* sql("INSERT INTO people SELECT 'michael', 29")
* }}}
*
* @param schema StructType describing the records to be stored in the Parquet file.
* @param path The path where the directory containing parquet metadata should be created.
* Data inserted into this table will also be stored at this location.
* @param allowExisting When false, an exception will be thrown if this directory already exists.
* @param conf A Hadoop configuration object that can be used to specify options to the parquet
* output format.
*
* @group userf
*/
@Experimental
def createParquetFile(
Contributor

I kind of think createEmptyParquetFile would be a better name for this method, since most Parquet files have data I'd think

Contributor Author

Andrew,

OK, but keep in mind that my patch overloads an existing method. If you think createParquetFile should be renamed to createEmptyParquetFile, you should probably file a separate JIRA.

Also, arguably "creating a file" implies that it is empty.

Alex
On Jan 2, 2015 5:11 PM, "Andrew Ash" notifications@github.com wrote:


      schema: StructType,
      path: String,
      allowExisting: Boolean = true,
      conf: Configuration = new Configuration()): SchemaRDD = {
    new SchemaRDD(
      this,
      ParquetRelation.createEmpty(
        path, schema.toAttributes, allowExisting, conf, this))
  }
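The flow the Scaladoc describes can be sketched as a small driver program. This is a hypothetical example, not part of the patch: the `local` master, the app name, the `/tmp/people.parquet` output path, and the standalone `main` wrapper are all illustrative assumptions, and it presumes a Spark-1.x-era classpath where `StructType`, `StructField`, `StringType`, and `IntegerType` are reachable from `org.apache.spark.sql`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, StructType, StructField, StringType, IntegerType}

// Hypothetical driver exercising the new createParquetFile overload
// end-to-end; assumes a Spark 1.x-era runtime on the classpath.
object CreateParquetFileExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("createParquetFile-example"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._

    // Explicit StructType schema, mirroring the Scaladoc example above.
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType)))

    // Create the empty Parquet "table" and register it so SQL can target it.
    // allowExisting = false turns a pre-existing directory into an error.
    createParquetFile(schema, "/tmp/people.parquet", allowExisting = false)
      .registerTempTable("people")

    // The empty file is now a valid insertInto/INSERT target.
    sql("INSERT INTO people SELECT 'michael', 29")

    sc.stop()
  }
}
```

Note how `allowExisting = true` (the default) would instead make the call a no-op-style create over an existing directory, which is why the sketch passes `false` to surface path collisions early.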

/**
* Registers the given RDD as a temporary table in the catalog. Temporary tables exist only
* during the lifetime of this instance of SQLContext.