[SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable #13098

Closed · Changes from 4 commits
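The change in this PR is mechanical throughout: every reference to the deprecated `Dataset.registerTempTable` is replaced with `createOrReplaceTempView`, and the surrounding comments and docstrings are reworded from "table" to "view". As a minimal before/after sketch (the DataFrame and view names here are illustrative, not taken from the diff):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("temp-view-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Deprecated since Spark 2.0:
#   df.registerTempTable("records")
# Replacement; also overwrites an existing view of the same name:
df.createOrReplaceTempView("records")

spark.sql("SELECT id FROM records WHERE value = 'a'").show()
```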
46 changes: 23 additions & 23 deletions docs/sql-programming-guide.md
@@ -529,7 +529,7 @@ case class Person(name: String, age: Int)

// Create an RDD of Person objects and register it as a table.
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
-people.registerTempTable("people")
+people.createOrReplaceTempView("people")

// SQL statements can be run by using the sql methods provided by sqlContext.
val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")
@@ -605,7 +605,7 @@ JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").m

// Apply a schema to an RDD of JavaBeans and register it as a table.
DataFrame schemaPeople = sqlContext.createDataFrame(people, Person.class);
-schemaPeople.registerTempTable("people");
+schemaPeople.createOrReplaceTempView("people");

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
@@ -643,7 +643,7 @@ people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Infer the schema, and register the DataFrame as a table.
schemaPeople = sqlContext.createDataFrame(people)
-schemaPeople.registerTempTable("people")
+schemaPeople.createOrReplaceTempView("people")

# SQL can be run over DataFrames that have been registered as a table.
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
@@ -703,8 +703,8 @@ val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
// Apply the schema to the RDD.
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)

-// Register the DataFrames as a table.
-peopleDataFrame.registerTempTable("people")
+// Creates a temporary view.
+peopleDataFrame.createOrReplaceTempView("people")

[Contributor comment] Creates a temporary view using the DataFrame

// SQL statements can be run by using the sql methods provided by sqlContext.
val results = sqlContext.sql("SELECT name FROM people")
@@ -771,10 +771,10 @@ JavaRDD<Row> rowRDD = people.map(
// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

-// Register the DataFrame as a table.
-peopleDataFrame.registerTempTable("people");
+// Creates a temporary view.
+peopleDataFrame.createOrReplaceTempView("people");

[Contributor comment] Same as above.

-// SQL can be run over RDDs that have been registered as tables.
+// SQL can be run over a temporary view.

[Contributor comment] SQL can be run over a temporary view created using DataFrames.

DataFrame results = sqlContext.sql("SELECT name FROM people");

// The results of SQL queries are DataFrames and support all the normal RDD operations.
@@ -824,8 +824,8 @@ schema = StructType(fields)
# Apply the schema to the RDD.
schemaPeople = sqlContext.createDataFrame(people, schema)

-# Register the DataFrame as a table.
-schemaPeople.registerTempTable("people")
+# Creates a temporary view
+schemaPeople.createOrReplaceTempView("people")

[Contributor comment] Same.

# SQL can be run over DataFrames that have been registered as a table.
results = sqlContext.sql("SELECT name FROM people")
@@ -844,7 +844,7 @@ for name in names.collect():
# Data Sources

Spark SQL supports operating on a variety of data sources through the `DataFrame` interface.
-A DataFrame can be operated on as normal RDDs and can also be registered as a temporary table.
+A DataFrame can be operated on as normal RDDs and can also be used to create a temporary view.
Registering a DataFrame as a table allows you to run SQL queries over its data. This section
describes the general methods for loading and saving data using the Spark Data Sources and then
goes into specific options that are available for the built-in data sources.
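A hedged sketch of that load-register-query flow with a built-in source (the Parquet path is illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load from a built-in data source, expose the result as a temporary
# view, then run SQL over it -- the flow the hunks below migrate.
users = spark.read.parquet("examples/src/main/resources/users.parquet")
users.createOrReplaceTempView("users")
spark.sql("SELECT * FROM users").show()
```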
@@ -1072,8 +1072,8 @@ people.write.parquet("people.parquet")
// The result of loading a Parquet file is also a DataFrame.
val parquetFile = sqlContext.read.parquet("people.parquet")

-//Parquet files can also be registered as tables and then used in SQL statements.
-parquetFile.registerTempTable("parquetFile")
+//Parquet files can also be used to create a temporary view and then used in SQL statements.
+parquetFile.createOrReplaceTempView("parquetFile")

[Contributor comment] Nit: Space after //.

val teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
{% endhighlight %}
@@ -1094,8 +1094,8 @@ schemaPeople.write().parquet("people.parquet");
// The result of loading a parquet file is also a DataFrame.
DataFrame parquetFile = sqlContext.read().parquet("people.parquet");

-// Parquet files can also be registered as tables and then used in SQL statements.
-parquetFile.registerTempTable("parquetFile");
+// Parquet files can also be used to create a temporary view and then used in SQL statements.
+parquetFile.createOrReplaceTempView("parquetFile");
DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");
List<String> teenagerNames = teenagers.javaRDD().map(new Function<Row, String>() {
public String call(Row row) {
@@ -1120,8 +1120,8 @@ schemaPeople.write.parquet("people.parquet")
# The result of loading a parquet file is also a DataFrame.
parquetFile = sqlContext.read.parquet("people.parquet")

-# Parquet files can also be registered as tables and then used in SQL statements.
-parquetFile.registerTempTable("parquetFile");
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
+parquetFile.createOrReplaceTempView("parquetFile");
teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
teenNames = teenagers.map(lambda p: "Name: " + p.name)
for teenName in teenNames.collect():
@@ -1144,7 +1144,7 @@ write.parquet(schemaPeople, "people.parquet")
# The result of loading a parquet file is also a DataFrame.
parquetFile <- read.parquet(sqlContext, "people.parquet")

-# Parquet files can also be registered as tables and then used in SQL statements.
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
registerTempTable(parquetFile, "parquetFile")
teenagers <- sql(sqlContext, "SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
schema <- structType(structField("name", "string"))
@@ -1507,7 +1507,7 @@ people.printSchema()
// |-- name: string (nullable = true)

// Register this DataFrame as a table.
-people.registerTempTable("people")
+people.createOrReplaceTempView("people")

[Contributor comment] Update the comment.

// SQL statements can be run by using the sql methods provided by sqlContext.
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
@@ -1544,8 +1544,8 @@ people.printSchema();
// |-- age: long (nullable = true)
// |-- name: string (nullable = true)

-// Register this DataFrame as a table.
-people.registerTempTable("people");
+// Creates a temporary view
+people.createOrReplaceTempView("people");

[Contributor comment] Same as aforementioned.

// SQL statements can be run by using the sql methods provided by sqlContext.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
@@ -1582,8 +1582,8 @@ people.printSchema()
# |-- age: long (nullable = true)
# |-- name: string (nullable = true)

-# Register this DataFrame as a table.
-people.registerTempTable("people")
+# Creates a temporary view.
+people.createOrReplaceTempView("people")

[Contributor comment] Same as aforementioned.

# SQL statements can be run by using the sql methods provided by `sqlContext`.
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
12 changes: 6 additions & 6 deletions docs/streaming-programming-guide.md
@@ -1553,8 +1553,8 @@ words.foreachRDD { rdd =>
// Convert RDD[String] to DataFrame
val wordsDataFrame = rdd.toDF("word")

-// Register as table
-wordsDataFrame.registerTempTable("words")
+// Create a temporary view
+wordsDataFrame.createOrReplaceTempView("words")

// Do word count on DataFrame using SQL and print it
val wordCountsDataFrame =
@@ -1606,8 +1606,8 @@ words.foreachRDD(
});
DataFrame wordsDataFrame = sqlContext.createDataFrame(rowRDD, JavaRow.class);

-// Register as table
-wordsDataFrame.registerTempTable("words");
+// Creates a temporary view
+wordsDataFrame.createOrReplaceTempView("words");

// Do word count on table using SQL and print it
DataFrame wordCountsDataFrame =
@@ -1646,8 +1646,8 @@ def process(time, rdd):
rowRdd = rdd.map(lambda w: Row(word=w))
wordsDataFrame = sqlContext.createDataFrame(rowRdd)

-# Register as table
-wordsDataFrame.registerTempTable("words")
+# Creates a temporary view
+wordsDataFrame.createOrReplaceTempView("words")

# Do word count on table using SQL and print it
wordCountsDataFrame = sqlContext.sql("select word, count(*) as total from words group by word")
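For reference, a self-contained PySpark sketch of the foreachRDD-plus-SQL pattern these streaming hunks edit, using the non-deprecated view API (function and stream names are illustrative):

```python
from pyspark.sql import Row, SparkSession

def process(time, rdd):
    # Each micro-batch RDD becomes a DataFrame, is exposed as a
    # temporary view, and is queried with SQL.
    if rdd.isEmpty():
        return
    spark = SparkSession.builder.getOrCreate()
    words_df = spark.createDataFrame(rdd.map(lambda w: Row(word=w)))
    words_df.createOrReplaceTempView("words")  # was registerTempTable
    spark.sql("select word, count(*) as total from words group by word").show()

# Wiring (elsewhere in the example): words.foreachRDD(process),
# where `words` is a DStream of strings.
```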
@@ -73,11 +73,11 @@ public Person call(String line) {
}
});

-// Apply a schema to an RDD of Java Beans and register it as a table.
+// Apply a schema to an RDD of Java Beans and create a temporary view
Dataset<Row> schemaPeople = spark.createDataFrame(people, Person.class);
schemaPeople.createOrReplaceTempView("people");

-// SQL can be run over RDDs that have been registered as tables.
+// SQL can be run over RDDs which backs a temporary view.
Dataset<Row> teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

// The results of SQL queries are DataFrames and support all the normal RDD operations.
@@ -101,7 +101,7 @@ public String call(Row row) {
// The result of loading a parquet file is also a DataFrame.
Dataset<Row> parquetFile = spark.read().parquet("people.parquet");

-//Parquet files can also be registered as tables and then used in SQL statements.
+// A temporary view can be created by using Parquet files and then used in SQL statements.
parquetFile.createOrReplaceTempView("parquetFile");
Dataset<Row> teenagers2 =
spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");
@@ -130,7 +130,7 @@ public String call(Row row) {
// |-- age: IntegerType
// |-- name: StringType

-// Register this DataFrame as a table.
+// Creates a temporary view
peopleFromJsonFile.createOrReplaceTempView("people");

// SQL statements can be run by using the sql methods provided by `spark`
@@ -94,7 +94,7 @@ public JavaRecord call(String word) {
});
Dataset<Row> wordsDataFrame = spark.createDataFrame(rowRDD, JavaRecord.class);

-// Register as table
+// Creates a temporary view
wordsDataFrame.createOrReplaceTempView("words");

// Do word count on table using SQL and print it
2 changes: 1 addition & 1 deletion examples/src/main/python/sql.py
@@ -66,7 +66,7 @@
# |-- age: long (nullable = true)
# |-- name: string (nullable = true)

-# Register this DataFrame as a temporary table.
+# Creates a temporary view.
people.createOrReplaceTempView("people")

# SQL statements can be run by using the sql methods provided by `spark`
@@ -70,7 +70,7 @@ def process(time, rdd):
rowRdd = rdd.map(lambda w: Row(word=w))
wordsDataFrame = spark.createDataFrame(rowRdd)

-# Register as table
+# Creates a temporary view
wordsDataFrame.createOrReplaceTempView("words")

# Do word count on table using SQL and print it
@@ -35,8 +35,8 @@ object RDDRelation {
import spark.implicits._

val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
-// Any RDD containing case classes can be registered as a table. The schema of the table is
-// automatically inferred using scala reflection.
+// Any RDD containing case classes can be used to create a temporary view. The schema of the
+// view is automatically inferred using scala reflection.
df.createOrReplaceTempView("records")

// Once tables have been registered, you can run SQL queries over them.
@@ -66,7 +66,7 @@ object RDDRelation {
// Queries can be run using the DSL on parquet files just like the original RDD.
parquetFile.where($"key" === 1).select($"value".as("a")).collect().foreach(println)

-// These files can also be registered as tables.
+// These files can also be used to create a temporary view.
parquetFile.createOrReplaceTempView("parquetFile")
spark.sql("SELECT * FROM parquetFile").collect().foreach(println)

@@ -70,9 +70,9 @@ object HiveFromSpark {
case Row(key: Int, value: String) => s"Key: $key, Value: $value"
}

-// You can also register RDDs as temporary tables within a HiveContext.
+// You can also use RDDs to create temporary views within a HiveContext.
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
-rdd.toDF().registerTempTable("records")
+rdd.toDF().createOrReplaceTempView("records")

// Queries can then join RDD data with data stored in Hive.
println("Result of SELECT *:")
@@ -66,7 +66,7 @@ object SqlNetworkWordCount {
// Convert RDD[String] to RDD[case class] to DataFrame
val wordsDataFrame = rdd.map(w => Record(w)).toDF()

-// Register as table
+// Creates a temporary view
wordsDataFrame.createOrReplaceTempView("words")

// Do word count on table using SQL and print it
@@ -68,7 +68,7 @@ public void pipeline() {
Pipeline pipeline = new Pipeline()
.setStages(new PipelineStage[]{scaler, lr});
PipelineModel model = pipeline.fit(dataset);
-model.transform(dataset).registerTempTable("prediction");
+model.transform(dataset).createOrReplaceTempView("prediction");
Dataset<Row> predictions = spark.sql("SELECT label, probability, prediction FROM prediction");
predictions.collectAsList();
}
@@ -54,7 +54,7 @@ public void setUp() {
List<LabeledPoint> points = generateLogisticInputAsList(1.0, 1.0, 100, 42);
datasetRDD = jsc.parallelize(points, 2);
dataset = spark.createDataFrame(datasetRDD, LabeledPoint.class);
-dataset.registerTempTable("dataset");
+dataset.createOrReplaceTempView("dataset");
}

@After
@@ -68,7 +68,7 @@ public void logisticRegressionDefaultParams() {
LogisticRegression lr = new LogisticRegression();
Assert.assertEquals(lr.getLabelCol(), "label");
LogisticRegressionModel model = lr.fit(dataset);
-model.transform(dataset).registerTempTable("prediction");
+model.transform(dataset).createOrReplaceTempView("prediction");
Dataset<Row> predictions = spark.sql("SELECT label, probability, prediction FROM prediction");
predictions.collectAsList();
// Check defaults
@@ -97,14 +97,14 @@ public void logisticRegressionWithSetters() {

// Modify model params, and check that the params worked.
model.setThreshold(1.0);
-model.transform(dataset).registerTempTable("predAllZero");
+model.transform(dataset).createOrReplaceTempView("predAllZero");
Dataset<Row> predAllZero = spark.sql("SELECT prediction, myProbability FROM predAllZero");
for (Row r : predAllZero.collectAsList()) {
Assert.assertEquals(0.0, r.getDouble(0), eps);
}
// Call transform with params, and check that the params worked.
model.transform(dataset, model.threshold().w(0.0), model.probabilityCol().w("myProb"))
-  .registerTempTable("predNotAllZero");
+  .createOrReplaceTempView("predNotAllZero");
Dataset<Row> predNotAllZero = spark.sql("SELECT prediction, myProb FROM predNotAllZero");
boolean foundNonZero = false;
for (Row r : predNotAllZero.collectAsList()) {
@@ -130,7 +130,7 @@ public void logisticRegressionPredictorClassifierMethods() {
LogisticRegressionModel model = lr.fit(dataset);
Assert.assertEquals(2, model.numClasses());

-model.transform(dataset).registerTempTable("transformed");
+model.transform(dataset).createOrReplaceTempView("transformed");
Dataset<Row> trans1 = spark.sql("SELECT rawPrediction, probability FROM transformed");
for (Row row : trans1.collectAsList()) {
Vector raw = (Vector) row.get(0);
@@ -50,7 +50,7 @@ public void setUp() {
List<LabeledPoint> points = generateLogisticInputAsList(1.0, 1.0, 100, 42);
datasetRDD = jsc.parallelize(points, 2);
dataset = spark.createDataFrame(datasetRDD, LabeledPoint.class);
-dataset.registerTempTable("dataset");
+dataset.createOrReplaceTempView("dataset");
}

@After
Expand All @@ -65,7 +65,7 @@ public void linearRegressionDefaultParams() {
assertEquals("label", lr.getLabelCol());
assertEquals("auto", lr.getSolver());
LinearRegressionModel model = lr.fit(dataset);
-model.transform(dataset).registerTempTable("prediction");
+model.transform(dataset).createOrReplaceTempView("prediction");
Dataset<Row> predictions = spark.sql("SELECT label, prediction FROM prediction");
predictions.collect();
// Check defaults
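The test pattern above — expose the `transform()` output as a view, then assert over a SQL query — looks like this in PySpark (a sketch with toy data, not the suite's actual fixtures):

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
train = spark.createDataFrame(
    [(1.0, Vectors.dense(0.0, 1.1)), (0.0, Vectors.dense(2.0, 1.0))],
    ["label", "features"])

model = LogisticRegression(maxIter=5).fit(train)
model.transform(train).createOrReplaceTempView("prediction")  # was registerTempTable
spark.sql("SELECT label, probability, prediction FROM prediction").show()
```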
4 changes: 2 additions & 2 deletions python/pyspark/sql/context.py
@@ -57,7 +57,7 @@ def __init__(self, sparkContext, sparkSession=None, jsqlContext=None):
... b=True, list=[1, 2, 3], dict={"s": 0}, row=Row(a=1),
... time=datetime(2014, 8, 1, 14, 1, 5))])
>>> df = allTypes.toDF()
->>> df.registerTempTable("allTypes")
+>>> df.createOrReplaceTempView("allTypes")
>>> sqlContext.sql('select i+1, d+1, not b, list[1], dict["s"], time, row.a '
... 'from allTypes where b and i > 0').collect()
[Row((i + CAST(1 AS BIGINT))=2, (d + CAST(1 AS DOUBLE))=2.0, (NOT b)=False, list[1]=2, \
@@ -106,7 +106,7 @@ def getOrCreate(cls, sc):
def newSession(self):
"""
Returns a new SQLContext as new session, that has separate SQLConf,
-registered temporary tables and UDFs, but shared SparkContext and
+registered temporary views and UDFs, but shared SparkContext and
table cache.
"""
return self.__class__(self._sc, self.sparkSession.newSession())
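A hedged sketch of the session isolation this docstring describes, shown via `SparkSession`, which has the same semantics: temporary views created in one session are not visible from a sibling session, while cached tables are shared.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(5).createOrReplaceTempView("nums")

other = spark.newSession()  # separate SQLConf, temp views, and UDFs
spark.sql("SELECT count(*) FROM nums").show()  # fine: the view lives in this session
# other.sql("SELECT count(*) FROM nums")       # would fail: view not visible here
```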
2 changes: 1 addition & 1 deletion python/pyspark/sql/readwriter.py
@@ -266,7 +266,7 @@ def table(self, tableName):
:param tableName: string, name of the table.

>>> df = spark.read.parquet('python/test_support/sql/parquet_partitioned')
->>> df.registerTempTable('tmpTable')
+>>> df.createOrReplaceTempView('tmpTable')
>>> spark.read.table('tmpTable').dtypes
[('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')]
"""
2 changes: 1 addition & 1 deletion python/pyspark/sql/session.py
@@ -186,7 +186,7 @@ def newSession(self):
def newSession(self):
"""
Returns a new SparkSession as new session, that has separate SQLConf,
-registered temporary tables and UDFs, but shared SparkContext and
+registered temporary views and UDFs, but shared SparkContext and
table cache.
"""
return self.__class__(self._sc, self._jsparkSession.newSession())