
[SPARK-3481] [SQL] Eliminate the error log in local Hive comparison test #2352


Conversation

chenghao-intel
Contributor

Logically, we should remove the Hive tables/databases first, and only then reset the Hive configuration and re-point it to the new data warehouse directory, etc.
Otherwise it raises exceptions like "Database does not exist: default" during local testing.
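For reference, a minimal sketch of the intended ordering; this is NOT the actual TestHive code, and the helper names (dropAllTestTablesAndDatabases, runSqlHive, configure) are illustrative placeholders for whatever the real context uses:

```scala
// Sketch only: shows the order of operations this PR argues for.
object ResetOrderingSketch {
  // Placeholder operations standing in for the real test-context logic.
  def dropAllTestTablesAndDatabases(): Unit = println("drop test tables/databases")
  def runSqlHive(cmd: String): Unit = println(s"hive> $cmd")
  def configure(): Unit = println("re-point metastore/warehouse to temp dirs")

  def reset(): Unit = {
    // 1. Remove test tables/databases first, while the metastore still points
    //    at the temporary directories that actually contain them.
    dropAllTestTablesAndDatabases()
    // 2. Only then reset all Hive configuration back to default values.
    runSqlHive("RESET")
    // 3. Finally, re-point javax.jdo.option.ConnectionURL and
    //    hive.metastore.warehouse.dir back to the temporary test directories.
    configure()
  }
}
```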

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have started for PR 2352 at commit 74fd76b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2352 at commit 74fd76b.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

@chenghao-intel Actually this issue has bothered us for some time and makes the Maven build on Jenkins fail, but we have never been able to reproduce it locally... Would you mind elaborating on the exact reproduction steps? Details like Maven profiles and other parameters would be very helpful. Thanks!

@chenghao-intel
Contributor Author

I checked out the latest master and ran:

sbt/sbt -Phive assembly 'test-only org.apache.spark.sql.hive.execution.HiveQuerySuite'

@liancheng
Contributor

Hmm... I couldn't reproduce the HiveQuerySuite failure, but I can reliably reproduce a similar failure with StatisticsSuite, and your patch does fix that one.

Just to make sure I understand this correctly: the RESET command resets all Hive configurations to their default values, including javax.jdo.option.ConnectionURL and hive.metastore.warehouse.dir. In the case of TestHiveContext, the default values of these two properties are, respectively:

  • jdbc:derby:;databaseName=metastore_db;create=true, and
  • /user/hive/warehouse

which override the random temporary directories specified by TestHiveContext. Then, when this line gets executed, we look for the "default" database in the wrong place, which causes the "default" database missing error.
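To make this concrete, here is an illustration-only sketch of how a test context could point these two properties at per-run temporary directories; the setConf helper below is a stand-in, not the real TestHiveContext API:

```scala
import java.nio.file.Files

object TempHiveDirsSketch {
  // Stand-in for however the real test context applies Hive settings.
  def setConf(key: String, value: String): Unit = println(s"SET $key=$value")

  def configure(): Unit = {
    val metastorePath = Files.createTempDirectory("metastore").toFile.getCanonicalPath
    val warehousePath = Files.createTempDirectory("warehouse").toFile.getCanonicalPath
    // Local Derby metastore living in a per-run temporary directory...
    setConf("javax.jdo.option.ConnectionURL",
      s"jdbc:derby:;databaseName=$metastorePath;create=true")
    // ...and a per-run temporary warehouse instead of /user/hive/warehouse.
    setConf("hive.metastore.warehouse.dir", warehousePath)
  }
}
```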

The weird thing is that this part of the code has existed for a long time (ever since Spark SQL became part of Spark), but it never failed before :( While debugging this "default" database missing issue, I also observed some race-condition and execution-order related issues, which may be the reason this bug has stayed hidden for so long...

Anyway, this PR LGTM. Thanks for fixing this!

@marmbrus Let's see whether this can bring our Jenkins Maven build back!

@chenghao-intel
Contributor Author

Yes, I think you understand it correctly, but I am not sure why the unit tests passed on Jenkins previously. Probably the multithreading stuff played a trick here. Let's see if this fix helps.

@liancheng
Contributor

Actually the SBT Jenkins build is still all right; it's the Maven build that is broken. That's even stranger, since you can easily reproduce it with SBT...

@liancheng
Contributor

Got more clues on this, which explain why HiveQuerySuite didn't fail previously. (But @chenghao-intel, why does it fail on your side? Still mysterious.) Basically, we were jumping between two different sets of local metastore/warehouse directories while testing. The detailed process is:

  1. When the TestHive singleton object is instantiated, we create a pair of temporary directories and configure them as local testing metastore/warehouse. Let's abbreviate them as m1 and w1.

    At this point, these two directories exist but remain empty; the default Hive database will be created lazily later.

  2. Then HiveQuerySuite gets started. Whenever a test case created via HiveComparisonTest.createQueryTest is executed, we first execute a SHOW TABLES command (notice the "MINOR HACK" comment).

    An important thing that happens here is that the "default" database gets created lazily in m1 at this point.

  3. Then reset() is called.

    Within reset(), first of all, we execute a Hive RESET command, which sets all configurations back to their default values, including javax.jdo.option.ConnectionURL and hive.metastore.warehouse.dir. This means the metastore is reset to the metastore_db directory under the current working directory, and the warehouse is reset to /user/hive/warehouse (which usually doesn't exist).

  4. Then follows the getAllTables call, which is used to delete all tables in the "default" database.

    During the getAllTables call, the metastore_db directory is created if it's not there, and again, Hive lazily creates an empty "default" database in it. Hmm... wait, so here we end up with two "default" databases, one in m1 and another in metastore_db! As a result, these lines are actually always trying to clean up tables and databases under the newly created metastore_db directory, which is empty (see the sketch at the end of this comment).

  5. Finally, we call configure() again, which sets the metastore/warehouse directories back to m1/w1, which remain intact.

In a word, the TL;DR here is: previously, test databases and test tables created by suites inheriting from HiveComparisonTest never really got cleaned up, and the "MINOR HACK" perfectly covered up what is probably the oldest bug in the history of Spark SQL! By applying this PR, we should be able to remove this hack safely.
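Here is an illustration-only sketch of the cleanup from step 4, using placeholder helpers (the real code goes through the Hive catalog client, not these names). The point is purely about ordering: if this runs after RESET, it lists and drops tables in the freshly created, empty ./metastore_db, so the contents of m1 are never touched.

```scala
object CleanupSketch {
  // Placeholder metastore operations; names are illustrative only.
  def getAllTables(db: String): Seq[String] = Seq.empty
  def getAllDatabases: Seq[String] = Seq("default")
  def dropTable(db: String, table: String): Unit = println(s"DROP TABLE $db.$table")
  def dropDatabase(db: String): Unit = println(s"DROP DATABASE $db CASCADE")

  // Runs against whichever metastore the *current* Hive configuration points
  // at -- which, after RESET, is the empty ./metastore_db rather than m1.
  def cleanup(): Unit = {
    getAllTables("default").foreach(t => dropTable("default", t))
    getAllDatabases.filterNot(_ == "default").foreach(dropDatabase)
  }
}
```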

@liancheng
Contributor

If you run StatisticsSuite separately with either sbt test-only or mvn -DwildcardSuites, you can always reproduce the "default" database missing exception, because no command like SHOW TABLES gets executed first, so the "default" database is never created in the temporary testing warehouse directory.
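For example, commands along these lines should trigger it (the exact suite path and profile flags below are assumptions and may differ in your checkout):

sbt/sbt -Phive 'test-only org.apache.spark.sql.hive.StatisticsSuite'

mvn -Phive -DwildcardSuites=org.apache.spark.sql.hive.StatisticsSuite test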

@marmbrus
Contributor

Thanks for finding this! I've merged this to master, 1.1, and 1.0.

@JoshRosen I think this should fix the Jenkins errors. Please let me know if SQL is responsible for any more failures.

asfgit pushed a commit that referenced this pull request Sep 12, 2014
Logically, we should remove the Hive tables/databases first, and only then reset the Hive configuration and re-point it to the new data warehouse directory, etc.
Otherwise it raises exceptions like "Database does not exist: default" during local testing.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #2352 from chenghao-intel/test_hive and squashes the following commits:

74fd76b [Cheng Hao] eliminate the error log

(cherry picked from commit 8194fc6)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit closed this in 8194fc6 on Sep 12, 2014
asfgit pushed a commit that referenced this pull request Sep 13, 2014
This is a follow-up of #2352. Now we can finally remove the evil "MINOR HACK", which covered up the oldest bug in the history of Spark SQL (see details [here](#2352 (comment))).

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2377 from liancheng/remove-evil-minor-hack and squashes the following commits:

0869c78 [Cheng Lian] Removes the evil MINOR HACK
@chenghao-intel
Contributor Author

Thank you @liancheng for such a detailed explanation. Actually I didn't know about all that when submitting this PR. :)

asfgit pushed a commit that referenced this pull request Sep 23, 2014
A follow-up of #2377 and #2352; see details there.

Author: wangfei <wangfei1@huawei.com>

Closes #2505 from scwf/patch-6 and squashes the following commits:

4874ec8 [wangfei] removes the evil MINOR HACK