[SPARK-32064][SQL] Supporting create temporary table #28901
LantaoJin wants to merge 6 commits into apache:master
gatorsmile left a comment:
This feature requires a lot of changes in different places. We need to define whether it should be global or local, whether we should create such a schema in each session, and how to handle errors when the expected table drop does not finish completely.
Trying to understand your use case first. Instead of creating a regular table, you want to create a temp table that does not need to be manually dropped?
@gatorsmile Yes, just like a Hive temporary table or a Teradata volatile table. We are migrating our Spark to v3.0. This is one of our in-house features, and it has been widely used in our production.
@gatorsmile @cloud-fan The current implementation is not complex. Any comments?
For proper support, this requires more discussion about the semantics. Also, we need to list the expected behaviors for all the statements listed in https://spark.apache.org/docs/latest/sql-ref-syntax.html . So far, this PR and the design doc do not have the corresponding contents.
test("create temporary table using data source") {
maybe create a new suite for these?
val SPARK_SCRATCH_DIR =
  buildStaticConf("spark.scratchdir")
    .doc("Scratch space for Spark temporary table and so on. Similar with hive.exec.scratchdir")
Let's not bring up Hive here. Slowly nobody will care about Hive.
Also, should this be spark.sql.scratchdir?
Sure. If you could tell me more about what we need to discuss and what details should be written in the documentation, that would be very helpful to me. As for the concept of a "temporary table", I think it is widely used in the database domain (MySQL, PostgreSQL, Oracle, etc.) and also in the data warehouse domain (Hive, Teradata, etc.). Even though their implementations and grammar may differ somewhat, their purposes are similar in my opinion. This implementation and grammar of
If I write the output to a temp location and then create a temp view, is it similar to the temp table? Except that a temp table can be removed when the session terminates.
A temp view has no path and no materialized data, so the answer is no. You can simply treat a
Imagine a use case like the one below:
Let me clarify my previous comment a little bit: If I write the output to a temp location and then create a temp view to read from this temp location, ... I agree with your use case, but I'm a bit worried about adding a big new API (temp table) if there are easy workarounds.
Ah, I see what you mean now, using
How about using
I think you mean
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Many databases and data warehouse SQL engines support temporary tables. A temporary table, as its name implies, is a short-lived table that exists only for the current session.
Hive Temporary Table
Teradata Volatile Table
PostgreSQL Temporary Table
In Spark, there is no temporary table. The DDL "CREATE TEMPORARY TABLE AS SELECT" creates a temporary view, and a temporary view is entirely different from a temporary table.
This ticket adds support for a Spark-native temporary table. More details are described in the design docs.
Parent ticket https://issues.apache.org/jira/browse/SPARK-32063
Why are the changes needed?
A temporary view is just a view. It doesn't materialize data in storage, so it has the following shortcomings:
Does this PR introduce any user-facing change?
Yes.
Before the patch, it will create a local temporary view. After this patch, it will create a temporary table.
Before the patch, it will throw an exception. After this patch, it will create a temporary table.
Add a new API in Catalog.scala
How was this patch tested?
Add unit tests.