Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference #28449

Closed
wants to merge 19 commits into from

Conversation

huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414

Why are the changes needed?

Make the doc cleaner and easily editable by MD editors.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually build and check

huaxingao and others added 12 commits May 4, 2020 10:48
### What changes were proposed in this pull request?
Set `spark.ui.enabled` to `false` in `SqlBasedBenchmark.getSparkSession`. This disables UI in all SQL benchmarks by default.

### Why are the changes needed?
UI overhead lowers numbers in the `Relative` column and impacts on `Stdev` in benchmark results.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Checked by running `DateTimeRebaseBenchmark`.

Closes apache#28432 from MaxGekk/ui-off-in-benchmarks.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
…ical

### What changes were proposed in this pull request?

Internal usages like `{stop,warning,message}({paste,paste0,sprintf}` and `{stop,warning,message}(some_literal_string_as_variable` have been removed and replaced as appropriate.

### Why are the changes needed?

CRAN policy recommends against using such constructions to build error messages, in particular because it makes the process of creating portable error messages for the package more onerous.

### Does this PR introduce any user-facing change?

There may be some small grammatical changes visible in error messaging.

### How was this patch tested?

Not done

Closes apache#28365 from MichaelChirico/r-stop-paste.

Authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…meBenchmark`

### What changes were proposed in this pull request?
- Changed to the number of rows in benchmark cases from 3 to the actual number `N`.
- Regenerated benchmark results in the environment:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 |

### Why are the changes needed?
The changes are needed to have:
- Correct benchmark results
- Base line for other perf improvements that can be checked in the same environment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the benchmark and checking its output.

Closes apache#28440 from MaxGekk/SPARK-31527-DateTimeBenchmark-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
…rated code on driver should not embed platform-specific constant

### What changes were proposed in this pull request?

Allow customized timeouts for `runSparkSubmit`, which will make flaky tests more likely to pass by using a larger timeout value.

I was able to reproduce the test failure on my laptop, which took 1.5 - 2 minutes to finish the test. After increasing the timeout, the test now can pass locally.

### Why are the changes needed?

This allows slow tests to use a larger timeout, so they are more likely to succeed.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The test was able to pass on my local env after the change.

Closes apache#28438 from tianshizz/SPARK-31267.

Authored-by: Tianshi Zhu <zhutianshirea@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…figuration is not taking effect

### What changes were proposed in this pull request?
This pr port [HIVE-10415](https://issues.apache.org/jira/browse/HIVE-10415): `hive.start.cleanup.scratchdir` configuration is not taking effect.

### Why are the changes needed?

I encountered this issue:
![image](https://user-images.githubusercontent.com/5399861/80869375-aeafd080-8cd2-11ea-8573-93ec4b422be1.png)
I'd like to make `hive.start.cleanup.scratchdir` effective to reduce this issue.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test

Closes apache#28436 from wangyum/SPARK-31626.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…eader caused by datetime rebase

### What changes were proposed in this pull request?

Push the rebase logic to the lower level of the parquet vectorized reader, to make the final code more vectorization-friendly.

### Why are the changes needed?

Parquet vectorized reader is carefully implemented, to make it more likely to be vectorized by the JVM. However, the newly added datetime rebase degrade the performance a lot, as it breaks vectorization, even if the datetime values don't need to rebase (this is very likely as dates before 1582 is rare).

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Run part of the `DateTimeRebaseBenchmark` locally. The results:
before this patch
```
[info] Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1582, vec on, rebase off                     2677           2838         142         37.4          26.8       1.0X
[info] after 1582, vec on, rebase on                      3828           4331         805         26.1          38.3       0.7X
[info] before 1582, vec on, rebase off                    2903           2926          34         34.4          29.0       0.9X
[info] before 1582, vec on, rebase on                     4163           4197          38         24.0          41.6       0.6X

[info] Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1900, vec on, rebase off                     3537           3627         104         28.3          35.4       1.0X
[info] after 1900, vec on, rebase on                      6891           7010         105         14.5          68.9       0.5X
[info] before 1900, vec on, rebase off                    3692           3770          72         27.1          36.9       1.0X
[info] before 1900, vec on, rebase on                     7588           7610          30         13.2          75.9       0.5X
```

After this patch
```
[info] Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1582, vec on, rebase off                     2758           2944         197         36.3          27.6       1.0X
[info] after 1582, vec on, rebase on                      2908           2966          51         34.4          29.1       0.9X
[info] before 1582, vec on, rebase off                    2840           2878          37         35.2          28.4       1.0X
[info] before 1582, vec on, rebase on                     3407           3433          24         29.4          34.1       0.8X

[info] Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1900, vec on, rebase off                     3861           4003         139         25.9          38.6       1.0X
[info] after 1900, vec on, rebase on                      4194           4283          77         23.8          41.9       0.9X
[info] before 1900, vec on, rebase off                    3849           3937          79         26.0          38.5       1.0X
[info] before 1900, vec on, rebase on                     7512           7546          55         13.3          75.1       0.5X
```

Date type is 30% faster if the values don't need to rebase, 20% faster if need to rebase.
Timestamp type is 60% faster if the values don't need to rebase, no difference if need to rebase.

Closes apache#28406 from cloud-fan/perf.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Fixed typo in `docs` directory and in `project/MimaExcludes.scala`

### Why are the changes needed?
Better readability of documents

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No test needed

Closes apache#28447 from kiszk/typo_20200504.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
…session catalog

## What changes were proposed in this pull request?

SHOW TBLPROPERTIES does not get the correct table properties for tables using the Session Catalog. This PR fixes that, by explicitly falling back to the V1 implementation if the table is in fact a V1 table. We also hide the reserved table properties for V2 tables, as users do not have control over setting these table properties. Henceforth, if they cannot be set or controlled by the user, then they shouldn't be displayed as such.

### Why are the changes needed?

Shows the incorrect table properties, i.e. only what exists in the Hive MetaStore for V2 tables that may have table properties outside of the MetaStore.

### Does this PR introduce _any_ user-facing change?

Fixes a bug

### How was this patch tested?

Regression test

Closes apache#28434 from brkyvz/ddlCommands.

Authored-by: Burak Yavuz <brkyvz@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30.

### Why are the changes needed?

SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes.
- http://www.slf4j.org/news.html

> When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins with the existing tests.

Closes apache#28446 from dongjoon-hyun/SPARK-31633.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122282 has finished for PR 28449 at commit 45f939f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…arquet reader caused by datetime rebase"

This reverts commit 931c0bc.
…hdir configuration is not taking effect"

This reverts commit 065871c.
…ite.Generated code on driver should not embed platform-specific constant"

This reverts commit da32137.
@huaxingao huaxingao closed this May 4, 2020
@huaxingao huaxingao deleted the html-cleanup branch May 4, 2020 18:29
@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122281 has finished for PR 28449 at commit b558cc1.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants