
[SPARK-3481] [SQL] Eliminate the error log in local Hive comparison test #2352


Conversation

chenghao-intel
Contributor

Logically, we should remove the Hive tables/databases first, and only then reset the Hive configuration and re-point it to the new data warehouse directory, etc.
Otherwise it raises exceptions like "Database does not exist: default" during local testing.
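For reference, a minimal sketch of the intended ordering; this is NOT the actual TestHive code, and the helper names (dropAllTestTablesAndDatabases, runSqlHive, configure) are illustrative placeholders for whatever the real context uses:

```scala
// Sketch only: shows the order of operations this PR argues for.
object ResetOrderingSketch {
  // Placeholder operations standing in for the real test-context logic.
  def dropAllTestTablesAndDatabases(): Unit = println("drop test tables/databases")
  def runSqlHive(cmd: String): Unit = println(s"hive> $cmd")
  def configure(): Unit = println("re-point metastore/warehouse to temp dirs")

  def reset(): Unit = {
    // 1. Remove test tables/databases first, while the metastore still points
    //    at the temporary directories that actually contain them.
    dropAllTestTablesAndDatabases()
    // 2. Only then reset all Hive configuration back to default values.
    runSqlHive("RESET")
    // 3. Finally, re-point javax.jdo.option.ConnectionURL and
    //    hive.metastore.warehouse.dir back to the temporary test directories.
    configure()
  }
}
```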

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have started for PR 2352 at commit 74fd76b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2352 at commit 74fd76b.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

@chenghao-intel Actually this issue has bothered us for some time and makes the Maven build on Jenkins fail, but we have never been able to reproduce it locally... Would you mind elaborating on the exact reproduction steps? Details like Maven profiles and other parameters would be very helpful. Thanks!

@chenghao-intel
Contributor Author

I checked out the latest master and ran:

sbt/sbt -Phive assembly 'test-only org.apache.spark.sql.hive.execution.HiveQuerySuite'

@liancheng
Contributor

Hmm... I couldn't reproduce the HiveQuerySuite failure, but I can reliably reproduce a similar failure with StatisticsSuite, and your patch does fix that one.

Just to make sure I understand this correctly: the RESET command resets all Hive configurations to their default values, including javax.jdo.option.ConnectionURL and hive.metastore.warehouse.dir. In the case of TestHiveContext, the default values of these two properties are, respectively:

  • jdbc:derby:;databaseName=metastore_db;create=true, and
  • /user/hive/warehouse

which override the random temporary directories specified by TestHiveContext. Then, when this line gets executed, we look for the "default" database in the wrong place, which causes the "default" database missing error.
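To make this concrete, here is an illustration-only sketch of how a test context could point these two properties at per-run temporary directories; the setConf helper below is a stand-in, not the real TestHiveContext API:

```scala
import java.nio.file.Files

object TempHiveDirsSketch {
  // Stand-in for however the real test context applies Hive settings.
  def setConf(key: String, value: String): Unit = println(s"SET $key=$value")

  def configure(): Unit = {
    val metastorePath = Files.createTempDirectory("metastore").toFile.getCanonicalPath
    val warehousePath = Files.createTempDirectory("warehouse").toFile.getCanonicalPath
    // Local Derby metastore living in a per-run temporary directory...
    setConf("javax.jdo.option.ConnectionURL",
      s"jdbc:derby:;databaseName=$metastorePath;create=true")
    // ...and a per-run temporary warehouse instead of /user/hive/warehouse.
    setConf("hive.metastore.warehouse.dir", warehousePath)
  }
}
```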

The weird thing is that this part of the code has existed for a long time (ever since Spark SQL became part of Spark), but it never failed before :( While debugging this "default" database missing issue, I also observed some race-condition and execution-order related issues, which may be the reason this bug has stayed hidden for so long...

Anyway, this PR LGTM. Thanks for fixing this!

@marmbrus Let's see whether this can bring our Jenkins Maven build back!

@chenghao-intel
Contributor Author

Yes, I think you understand it correctly, but I am not sure why the unit tests passed on Jenkins previously. Probably the multithreading stuff played a trick here. Let's see if this fix helps.

@liancheng
Contributor

Actually the SBT Jenkins build is still all right; it's the Maven build that is broken. That's even stranger, since you can easily reproduce it with SBT...

@liancheng
Contributor

Got more clues on this, which explain why HiveQuerySuite didn't fail previously. (But @chenghao-intel, why does it fail on your side? Still mysterious.) Basically, we were jumping between two different sets of local metastore/warehouse directories while testing. The detailed process is:

  1. When the TestHive singleton object is instantiated, we create a pair of temporary directories and configure them as local testing metastore/warehouse. Let's abbreviate them as m1 and w1.

    At this point, these two directories exist but remain empty; the default Hive database will be created lazily later.

  2. Then HiveQuerySuite gets started. Whenever a test case created via HiveComparisonTest.createQueryTest is executed, we first execute a SHOW TABLES command (notice the "MINOR HACK" comment).

    An important thing that happens here is that the "default" database gets created lazily in m1 at this point.

  3. Then reset() is called.

    Within reset(), first of all, we execute a Hive RESET command, which sets all configurations back to their default values, including javax.jdo.option.ConnectionURL and hive.metastore.warehouse.dir. This means the metastore is reset to the metastore_db directory under the current working directory, and the warehouse is reset to /user/hive/warehouse (which usually doesn't exist).

  4. Then follows the getAllTables call, which is used to delete all tables in the "default" database.

    During the getAllTables call, the metastore_db directory is created if it's not there, and again, Hive lazily creates an empty "default" database in it. Hmm... wait, so here we end up with two "default" databases, one in m1 and another in metastore_db! As a result, these lines are actually always trying to clean up tables and databases under the newly created metastore_db directory, which is empty (see the sketch at the end of this comment).

  5. Finally, we call configure() again, which sets the metastore/warehouse directories back to m1/w1, which remain intact.

In a word, the TL;DR here is: previously, test databases and test tables created by suites inheriting from HiveComparisonTest never really got cleaned up, and the "MINOR HACK" perfectly covered up what is probably the oldest bug in the history of Spark SQL! By applying this PR, we should be able to remove this hack safely.
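Here is an illustration-only sketch of the cleanup from step 4, using placeholder helpers (the real code goes through the Hive catalog client, not these names). The point is purely about ordering: if this runs after RESET, it lists and drops tables in the freshly created, empty ./metastore_db, so the contents of m1 are never touched.

```scala
object CleanupSketch {
  // Placeholder metastore operations; names are illustrative only.
  def getAllTables(db: String): Seq[String] = Seq.empty
  def getAllDatabases: Seq[String] = Seq("default")
  def dropTable(db: String, table: String): Unit = println(s"DROP TABLE $db.$table")
  def dropDatabase(db: String): Unit = println(s"DROP DATABASE $db CASCADE")

  // Runs against whichever metastore the *current* Hive configuration points
  // at -- which, after RESET, is the empty ./metastore_db rather than m1.
  def cleanup(): Unit = {
    getAllTables("default").foreach(t => dropTable("default", t))
    getAllDatabases.filterNot(_ == "default").foreach(dropDatabase)
  }
}
```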

@liancheng
Contributor

If you run StatisticsSuite separately with either sbt test-only or mvn -DwildcardSuites, you can always reproduce the "default" database missing exception, because no command like SHOW TABLES gets executed first, so the "default" database is never created in the temporary testing warehouse directory.
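For example, commands along these lines should trigger it (the exact suite path and profile flags below are assumptions and may differ in your checkout):

sbt/sbt -Phive 'test-only org.apache.spark.sql.hive.StatisticsSuite'

mvn -Phive -DwildcardSuites=org.apache.spark.sql.hive.StatisticsSuite test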

@marmbrus
Contributor

Thanks for finding this! I've merged this to master, 1.1, and 1.0.

@JoshRosen I think this should fix the Jenkins errors. Please let me know if SQL is responsible for any more failures.

asfgit pushed a commit that referenced this pull request Sep 12, 2014
Logically, we should remove the Hive tables/databases first, and only then reset the Hive configuration and re-point it to the new data warehouse directory, etc.
Otherwise it raises exceptions like "Database does not exist: default" during local testing.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #2352 from chenghao-intel/test_hive and squashes the following commits:

74fd76b [Cheng Hao] eliminate the error log

(cherry picked from commit 8194fc6)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit closed this in 8194fc6 on Sep 12, 2014
asfgit pushed a commit that referenced this pull request Sep 13, 2014
This is a follow-up of #2352. Now we can finally remove the evil "MINOR HACK", which covered up the oldest bug in the history of Spark SQL (see details [here](#2352 (comment))).

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2377 from liancheng/remove-evil-minor-hack and squashes the following commits:

0869c78 [Cheng Lian] Removes the evil MINOR HACK
@chenghao-intel
Contributor Author

Thank you @liancheng for such a detailed explanation. Actually I didn't know about all that when submitting this PR. :)

asfgit pushed a commit that referenced this pull request Sep 23, 2014
A follow-up of #2377 and #2352; see details there.

Author: wangfei <wangfei1@huawei.com>

Closes #2505 from scwf/patch-6 and squashes the following commits:

4874ec8 [wangfei] removes the evil MINOR HACK